PhysAlign: Physics-Coherent Image-to-Video Generation through Feature and 3D Representation Alignment

arXiv:2603.13770v1 Announce Type: new Video Diffusion Models (VDMs) offer a promising approach for simulating dynamic scenes and environments, with broad applications in robotics and media generation. However, existing models often generate temporally incoherent content that violates basic physical intuition, significantly limiting their practical applicability. We propose PhysAlign, an efficient framework for physics-coherent image-to-video (I2V) generation that explicitly addresses this limitation.