From Seeing to Thinking: Decoupling Perception and Reasoning Improves Post-Training of Vision-Language Models

arXiv:2605.20177v1 Announce Type: new Recent advances in vision-language models (VLMs) emphasize long chain-of-thought reasoning, yet we find that their performance on visual tasks is limited primarily by weak visual perception rather than by reasoning itself. In this work, we systematically study the interplay between perception and reasoning in VLM post-