LAST: Leveraging Tools as Hints to Enhance Spatial Reasoning for Multimodal Large Language Models

arXiv:2604.09712v1 Announce Type: cross

Spatial reasoning is a cornerstone capability for intelligent systems to perceive and interact with the physical world. However, multimodal large language models (MLLMs) frequently suffer from hallucinations and imprecision when parsing complex geometric layouts. As data-driven scaling struggles to internalize structured geometric priors and spatial constraints, integrating mature, specialized vision models presents a compelling alternative.