ERASE: Eliminating Redundant Visual Tokens via Adaptive Two-Stage Token Pruning

arXiv:2605.09982v1 Announce Type: new Recent advancements in Vision-Language Models (VLMs) enable large language models (LLMs) to process high-resolution images, significantly improving real-world multimodal understanding. However, this capability