ST-Gen4D: Embedding 4D Spatiotemporal Cognition into World Model for 4D Generation

arXiv:2605.07390v1 Announce Type: new Generative models have succeeded in producing seemingly coherent 2D videos, but they still struggle to model the physical world because they lack a 4D spatiotemporal scale. Existing 4D generative models typically embed macro-scale constraints directly to improve overall spatiotemporal consistency. However, these methods ensure only global appearance coherence and fail to capture the local dynamics of the physical world.