Reasoning-Aligned Perception Decoupling for Scalable Multi-modal Reasoning

arXiv:2506.04559v3 Announce Type: replace Recent breakthroughs in reasoning language models have significantly advanced text-based reasoning. On the other hand, Multi-modal Large Language Models (MLLMs) still lag behind, hindered by their outdated internal LLMs. Upgrading these LLMs is often prohibitively expensive, as it requires costly vision-language alignment re