ID-Selection: Importance-Diversity Based Visual Token Selection for Efficient LVLM Inference

arXiv:2604.05601v1

Recent work has explored visual token pruning to accelerate inference in large vision-language models (LVLMs). However, existing methods often struggle to balance token importance and diversity: importance-based methods tend to retain redundant tokens, whereas diversity-based methods may overlook informative ones. This trade-off becomes especially problematic under high reduction ratios, where only a small subset of visual tokens can be preserved.
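
The trade-off described above can be seen on a toy example. The sketch below is only an illustration, not the ID-Selection method: the feature vectors, the importance scores, and the helper names `top_k_by_importance` and `farthest_point_selection` are all hypothetical. Plain top-k scoring stands in for an importance-based method, and greedy farthest-point sampling stands in for a diversity-based one.

```python
import math

# Hypothetical toy data: six "visual tokens" with 2-D features.
# Tokens 0-2 are near-duplicates with high importance, token 3 is
# distinct and moderately important, tokens 4-5 are distinct background.
features = [
    (1.0, 0.0), (0.99, 0.01), (0.98, 0.02),
    (0.5, 0.5),
    (-1.0, 0.0), (0.0, -1.0),
]
scores = [0.90, 0.88, 0.87, 0.60, 0.05, 0.04]


def top_k_by_importance(scores, k):
    """Importance-only baseline: indices of the k highest-scoring tokens."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]


def farthest_point_selection(features, k, start=0):
    """Diversity-only baseline: greedily add the token that is farthest
    (in Euclidean distance) from the tokens selected so far."""
    selected = [start]
    while len(selected) < k:
        remaining = [i for i in range(len(features)) if i not in selected]
        best = max(
            remaining,
            key=lambda i: min(math.dist(features[i], features[j]) for j in selected),
        )
        selected.append(best)
    return selected


print(top_k_by_importance(scores, 3))        # [0, 1, 2]
print(farthest_point_selection(features, 3)) # [0, 4, 5]
```

With a budget of three tokens, the importance-only baseline keeps three near-identical tokens, while the diversity-only baseline spreads out over the feature space but skips token 3, which carries the fourth-highest score.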