LRCP: Low-Rank Compressibility Guided Visual Token Pruning for Efficient LVLMs

arXiv:2605.15621v1

Large vision-language models (LVLMs) achieve strong multimodal understanding, but their inference cost grows rapidly with the number of visual tokens, especially for high-resolution images and long videos. Existing attention-based methods estimate token importance from attention scores, which may