AI RESEARCH
Thinking with Patterns: Breaking the Perceptual Bottleneck in Visual Planning via Pattern Induction
arXiv cs.LG
arXiv:2605.16848v1 Announce Type: cross
Planning from raw visual input remains a significant challenge for current Vision-Language Models (VLMs) when the complexity of the input exceeds their one-step perception capability. Recent advances in Thinking with Images (TWI) suggest a reasonable solution: decompose the perception process into simpler steps by iteratively acquiring and incorporating local visual evidence. However, even though current VLMs are well trained in general TWI ability, their perceptual bottleneck in the planning domain remains.