Energy-Driven Adaptive Visual Token Pruning for Efficient Vision-Language Models

arXiv:2603.05950v1 Announce Type: cross Visual token reduction is critical for accelerating Vision-Language Models (VLMs), yet most existing approaches rely on a fixed token budget shared across all inputs, overlooking the substantial variation in information density between images. We propose E-AdaPrune, an energy-driven adaptive pruning framework that determines the token budget from the singular value spectrum of the visual feature space. By preserving a certain proportion of spectral energy, our method allocates more tokens to information-dense scenes while aggressively compressing redundant ones, without.
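
The budget rule described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `energy_token_budget`, the `energy_ratio` parameter, its default of 0.9, and the choice to measure spectral energy as the sum of squared singular values are all assumptions made for illustration, since the abstract does not give these details. The sketch only computes how many tokens to keep; how the kept tokens are selected is not covered here.

```python
import numpy as np


def energy_token_budget(features, energy_ratio=0.9, min_tokens=1):
    """Return a token budget from the singular value spectrum of `features`.

    `features` is an (num_tokens, dim) array of visual token features.
    Assumption (not from the source): spectral energy is the sum of squared
    singular values, and the budget is the smallest k whose top-k singular
    values hold at least `energy_ratio` of that energy.
    """
    # Singular values only, returned in descending order.
    singular_values = np.linalg.svd(features, compute_uv=False)
    energy = singular_values ** 2
    total = energy.sum()
    if total == 0:
        # Degenerate all-zero input: fall back to the minimum budget.
        return min_tokens
    cumulative = np.cumsum(energy) / total
    # First index whose cumulative energy reaches the target, as a count.
    k = int(np.searchsorted(cumulative, energy_ratio, side="left")) + 1
    return int(np.clip(k, min_tokens, len(singular_values)))


# Hypothetical inputs: a redundant image (all tokens identical up to scale)
# and a dense one (all tokens mutually orthogonal).
redundant = np.outer(np.ones(16), np.arange(1.0, 9.0))
dense = np.eye(16)
budget_redundant = energy_token_budget(redundant)
budget_dense = energy_token_budget(dense)
```

Under these assumptions, the rank-one input concentrates all of its energy in a single singular value and receives a small budget, while the input with a flat spectrum needs most of its tokens to reach the same energy ratio.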