Think with 3D: Geometric Imagination Grounded Spatial Reasoning from Limited Views

arXiv:2510.18632v2 Announce Type: replace-cross Although vision-language models (VLMs) have made remarkable progress across a wide range of multimodal tasks, understanding 3D spatial relationships from limited views remains a significant challenge. Previous reasoning methods typically rely on pure text (e.g., topological cognitive maps) or on 2D visual cues. However, the limited representational capacity of these methods hinders performance on tasks that require 3D spatial imagination.