AI RESEARCH

Monocular Open Vocabulary Occupancy Prediction for Indoor Scenes

arXiv CS.CV

arXiv:2602.22667v2 Announce Type: replace Open-vocabulary 3D occupancy is vital for embodied agents, which need to understand complex indoor environments where semantic categories are abundant and evolve beyond fixed taxonomies. While recent work has explored open-vocabulary occupancy in outdoor driving scenarios, such methods transfer poorly indoors, where geometry is denser, layouts are more intricate, and semantics are far more fine-grained. To address these challenges, we adopt a geometry-only supervision paradigm that uses only binary occupancy labels (occupied vs. free).
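
To make "geometry-only supervision" concrete, the following is a minimal sketch of a per-voxel loss that sees only binary occupancy labels and no semantic classes. The excerpt does not state which loss the authors use; binary cross-entropy on logits is an assumption here, and the function name `binary_occupancy_loss` and the toy grid are hypothetical.

```python
import math


def binary_occupancy_loss(logits, labels):
    """Mean binary cross-entropy over a flattened voxel grid.

    logits: list of floats, one predicted occupancy logit per voxel.
    labels: list of 0/1 ints, 0 = free and 1 = occupied.
    Assumption: BCE is used for illustration only; the source excerpt
    does not specify the actual training objective.
    """
    if len(logits) != len(labels):
        raise ValueError("logits and labels must have the same length")
    if not logits:
        raise ValueError("at least one voxel is required")
    total = 0.0
    for z, y in zip(logits, labels):
        # Numerically stable form of BCE-with-logits:
        # max(z, 0) - z * y + log(1 + exp(-|z|))
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return total / len(logits)


# Hypothetical 2x2x2 grid, flattened to 8 voxels.
labels = [1, 0, 1, 0, 1, 1, 0, 1]
good_logits = [4.0, -4.0, 3.0, -3.0, 2.0, 2.0, -2.0, 5.0]
bad_logits = [-z for z in good_logits]  # every prediction inverted

good_loss = binary_occupancy_loss(good_logits, labels)
bad_loss = binary_occupancy_loss(bad_logits, labels)
print(f"good: {good_loss:.4f}  bad: {bad_loss:.4f}")
```

Because the labels carry no category information, a loss of this kind can only teach where space is occupied; any open-vocabulary semantics would have to come from another source, which this excerpt does not describe.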