OneWorld: Taming Scene Generation with 3D Unified Representation Autoencoder

arXiv:2603.16099v1 Announce Type: new Existing diffusion-based 3D scene generation methods primarily operate in 2D image/video latent spaces, which makes maintaining cross-view appearance and geometric consistency inherently challenging. To bridge this gap, we present OneWorld, a framework that performs diffusion directly within a coherent 3D representation space.