Keep What Audio Cannot Say: Context-Preserving Token Pruning for Omni-LLMs

arXiv:2605.11605v1

Omnimodal Large Language Models (Omni-LLMs) incur substantial computational overhead because they process a large number of multimodal input tokens, which makes token reduction essential for real-world deployment. Existing Omni-LLM pruning methods typically reduce this cost by keeping only the tokens that are either important for the current query or strongly aligned with cross-modal cues.
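The two selection criteria attributed to existing methods can be illustrated with a minimal sketch. This is not the paper's method and not any specific prior method. It assumes each token is a plain embedding vector, that relevance is measured by cosine similarity, that "important for the current query" means similarity to a single pooled query embedding, and that "aligned with cross-modal cues" means an audio token's highest similarity to any visual token. The function names `prune_by_query_relevance` and `prune_by_cross_modal_alignment` and the `keep_ratio` parameter are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def _top_k_indices(scores: List[float], keep_ratio: float) -> List[int]:
    """Keep the ceil(n * keep_ratio) highest-scoring indices, returned in original order."""
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    if not scores:
        return []
    k = max(1, math.ceil(len(scores) * keep_ratio))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    # Re-sort the survivors so the pruned sequence preserves token order.
    return sorted(ranked[:k])


def prune_by_query_relevance(tokens: List[Vector], query: Vector,
                             keep_ratio: float) -> List[int]:
    """Hypothetical query-driven pruning: score each token by similarity to the query."""
    return _top_k_indices([cosine(t, query) for t in tokens], keep_ratio)


def prune_by_cross_modal_alignment(audio_tokens: List[Vector],
                                   visual_tokens: List[Vector],
                                   keep_ratio: float) -> List[int]:
    """Hypothetical alignment-driven pruning: score each audio token by its best
    match among the visual tokens, so audio with no visual counterpart scores low."""
    scores = [max((cosine(a, v) for v in visual_tokens), default=0.0)
              for a in audio_tokens]
    return _top_k_indices(scores, keep_ratio)
```

For example, with tokens `[[1, 0], [0, 1], [0.9, 0.1], [-1, 0]]`, query `[1, 0]`, and `keep_ratio=0.5`, the query-driven rule keeps indices `[0, 2]`. Under both rules, tokens that match neither the query nor the other modality are the first to be dropped, even when they carry context that no other token provides. The paper's title points at this failure mode.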