Decoupled Similarity for Task-Aware Token Pruning in Large Vision-Language Models

arXiv:2604.11240v1

Token pruning has emerged as an effective approach to reducing the substantial computational overhead of Large Vision-Language Models (LVLMs): it discards less informative visual tokens while preserving performance. However, existing methods typically each rely on a single attention source drawn from one of several LVLM components. Because these individual attention distributions are biased, the resulting pruning decisions are incomplete and suboptimal.
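For context, the following is a minimal Python sketch of the generic single-source, attention-based pruning that the abstract critiques. Each visual token receives one importance score, and only the top-scoring fraction is kept. The function name `prune_visual_tokens`, the `keep_ratio` parameter, and the assumption that the scores come from one attention source are hypothetical and for illustration only. This is not the paper's decoupled-similarity method, whose details are not given here.

```python
from typing import List, Sequence, Tuple


def prune_visual_tokens(
    tokens: Sequence[Sequence[float]],
    attention: Sequence[float],
    keep_ratio: float,
) -> Tuple[List[int], List[Sequence[float]]]:
    """Keep the visual tokens with the highest attention scores.

    Hypothetical sketch: `attention[i]` is assumed to be a single-source
    importance score for token i (e.g. the attention it receives from one
    model component). Returns the kept indices in their original order,
    together with the kept token vectors.
    """
    if len(tokens) != len(attention):
        raise ValueError("tokens and attention must have the same length")
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")

    # Number of tokens to keep (rounded, at least one).
    k = max(1, round(len(tokens) * keep_ratio))

    # Rank indices by descending score; ties go to the lower index.
    ranked = sorted(range(len(tokens)), key=lambda i: (-attention[i], i))

    # Restore the original order so the tokens' spatial layout is preserved.
    kept = sorted(ranked[:k])
    return kept, [tokens[i] for i in kept]


if __name__ == "__main__":
    toks = [[0.1], [0.2], [0.3], [0.4]]
    scores = [0.05, 0.6, 0.1, 0.25]
    print(prune_visual_tokens(toks, scores, 0.5))
```

Because the ranking depends on a single score vector, any bias in that one attention distribution translates directly into which tokens are discarded. This is the limitation the abstract attributes to existing methods.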