Learning to Think Fast and Slow for Visual Language Models

arXiv:2511.16670v2

When faced with complex problems, we tend to engage in slower, deliberate thinking; for simple questions, we give quick, intuitive responses. This dual-system approach lets us allocate cognitive resources efficiently, reserving deeper analytical effort for tasks that truly require it. However, most existing reasoning-oriented visual language models (VLMs) are trained to generate uniformly long reasoning, which wastes substantial tokens when a concise answer would suffice.