More Thought, Less Accuracy? On the Dual Nature of Reasoning in Vision-Language Models

arXiv:2509.25848v3 Announce Type: replace-cross

Reasoning has emerged as a pivotal capability in Large Language Models (LLMs). Trained with Reinforcement Learning (RL), typically Group Relative Policy Optimization (GRPO), these models can solve complex tasks such as mathematics and code generation. Building on these advances, recent research has sought to extend reasoning to Vision-Language Models (VLMs), yielding promising results across diverse visual tasks.