EA-WM: Event-Aware Generative World Model with Structured Kinematic-to-Visual Action Fields

arXiv:2605.06192v1 Announce Type: new Pretrained video diffusion models provide powerful spatiotemporal generative priors, making them a natural foundation for robotic world models. While recent world-action models jointly optimize future videos and actions, they predominantly treat video generation as an auxiliary representation for policy learning.