AI RESEARCH

Stepper: Stepwise Immersive Scene Generation with Multiview Panoramas

arXiv cs.CV

arXiv:2603.28980v1 Announce Type: new

The synthesis of immersive 3D scenes from text is maturing rapidly. Two developments drive this progress: novel video generative models and feed-forward 3D reconstruction. The field has vast potential in AR/VR and world modeling. Panoramic images have proven effective for scene initialization, but existing approaches face a trade-off between visual fidelity and explorability. Autoregressive expansion suffers from context drift, while panoramic video generation is limited to low resolution.