Boosting MLLM Spatial Reasoning with Geometrically Referenced 3D Scene Representations

arXiv:2603.08592v1 Announce Type: new While Multimodal Large Language Models (MLLMs) have achieved remarkable success in 2D visual understanding, their ability to reason about 3D space remains limited. To address this gap, we