Glance-or-Gaze: Incentivizing LMMs to Adaptively Focus Search via Reinforcement Learning

arXiv:2601.13942v2 Announce Type: replace-cross Large Multimodal Models (LMMs) have achieved remarkable success in visual understanding, yet they struggle with knowledge-intensive queries involving long-tail entities or evolving information because their parametric knowledge is static. Recent search-augmented approaches attempt to address this limitation, but existing methods rely on indiscriminate whole-image retrieval that