Starve to Perceive: Taming Lazy Perception in VLMs with Constrained Visual Bandwidth

arXiv:2605.18603v1. Vision-Language Models (VLMs) deployed as situated agents in high-resolution visual environments require active perception: the ability to dynamically decide where to look through operations like zooming, cropping, and panning. However, current