Vision to Geometry: 3D Spatial Memory for Sequential Embodied MLLM Reasoning and Exploration

arXiv:2512.02458v2 Announce Type: replace

Embodied agents are expected to assist humans by actively exploring unknown environments and reasoning about spatial contexts. When deployed in the real world, agents often face sequential tasks, where each new task begins only after the previous one is completed, and some tasks may include infeasible objectives, such as searching for non-existent objects.