SceneScribe-1M: A Large-Scale Video Dataset with Comprehensive Geometric and Semantic Annotations

ArXi:2604.07990v1 Announce Type: new The convergence of 3D geometric perception and video synthesis has created an unprecedented demand for large-scale video data that is rich in both semantic and spatio-temporal information. While existing datasets have advanced either 3D understanding or video generation, a significant gap remains in providing a unified resource that s both domains at scale. To bridge this chasm, we