OnlineSI: Taming Large Language Model for Online 3D Understanding and Grounding

arXiv:2601.16538v2 Announce Type: replace In recent years, researchers have become increasingly interested in equipping Multimodal Large Language Models (MLLMs) with spatial understanding and reasoning capabilities. However, most existing methods overlook the need to operate continuously in an ever-changing world, and are ill-suited to deployment on embodied systems in real-world environments. In this work, we