Language-Guided Token Compression with Reinforcement Learning in Large Vision-Language Models

arXiv:2603.13394v1 Announce Type: new Large Vision-Language Models (LVLMs) incur substantial inference costs because they must process a vast number of visual tokens. Existing methods typically struggle to model progressive visual token reduction as a multi-step decision process with sequential dependencies, and they often rely on hand-engineered scoring rules that lack adaptive optimization for complex reasoning trajectories.