DriveDreamer-Policy: A Geometry-Grounded World-Action Model for Unified Generation and Planning

arXiv:2604.01765v1 Announce Type: new

Recently, world-action models (WAMs) have emerged to bridge vision-language-action (VLA) models and world models, unifying the reasoning and instruction-following capabilities of the former with the spatio-temporal world modeling of the latter. However, existing WAM approaches often focus on modeling 2D appearance or latent representations, with limited geometric grounding, an essential element for embodied systems operating in the physical world.