Mitigating Visual Context Degradation in Large Multimodal Models: A Training-Free Decoupled Agentic Framework

arXiv:2509.23322v2

With the continued scaling of Large Language Models (LLMs) and advances in reinforcement learning, LLMs have demonstrated exceptional reasoning capabilities, enabling them to tackle a wide range of complex problems. Inspired by these achievements, researchers have extended related techniques to Large Multimodal Models (LMMs). However, a critical limitation has emerged: the progressive loss of visual grounding.