SpatialForge: Bootstrapping 3D-Aware Spatial Reasoning from Open-World 2D Images

arXiv:2605.11462v1 Announce Type: cross Recent advancements in Large Vision-Language Models (VLMs) have demonstrated exceptional semantic understanding, yet these models consistently struggle with spatial reasoning, often failing at fundamental geometric tasks such as depth ordering and precise coordinate grounding. Recent efforts