When Token Pruning is Worse than Random: Understanding Visual Token Information in VLLMs

arXiv:2512.07580v2. Vision Large Language Models (VLLMs) incur high computational costs because they rely on hundreds of visual tokens to represent images. Token pruning is a promising way to accelerate inference, but this paper identifies a key observation: in deeper layers (e.g., beyond the 20th), existing