VISion On Request: Enhanced VLLM efficiency with sparse, dynamically selected, vision-language interactions

arXiv:2603.23495v1 Announce Type: cross Existing approaches for improving the efficiency of Large Vision-Language Models (LVLMs) are largely based on the concept of visual token reduction. This approach, however, creates an information bottleneck that impairs performance, especially on challenging tasks that require fine-grained understanding and reasoning. In this work, we challenge this paradigm by