HieraVid: Hierarchical Token Pruning for Fast Video Large Language Models

arXiv:2604.01881v1 Announce Type: new Video Large Language Models (VideoLLMs) have demonstrated impressive capabilities in video understanding, yet the massive number of input video tokens imposes a significant computational burden at deployment. Existing methods mainly prune video tokens at the input level while neglecting the inherent information structure embedded in videos and large language models (LLMs). To address this, we propose HieraVid, a hierarchical pruning framework that progressively and dynamically reduces visual redundancy.