AI RESEARCH

Holi-Spatial: Evolving Video Streams into Holistic 3D Spatial Intelligence

arXiv CS.CV

arXiv:2603.07660v1

The pursuit of spatial intelligence fundamentally relies on access to large-scale, fine-grained 3D data. However, existing approaches predominantly construct spatial understanding benchmarks by generating question-answer (QA) pairs from a limited number of manually annotated datasets, rather than systematically annotating new large-scale 3D scenes from raw web data. As a result, their scalability is severely constrained, and model performance is further hindered by domain gaps inherent in these narrowly curated datasets.