GigaWorld-Policy: An Efficient Action-Centered World-Action Model

arXiv:2603.17240v1 Announce Type: new World-Action Models (WAM) initialized from pre-trained video generation backbones have demonstrated remarkable potential for robot policy learning. However, existing approaches face two critical bottlenecks that hinder performance and deployment. First, jointly reasoning over future visual dynamics and corresponding actions incurs substantial inference overhead. Second, joint modeling often entangles visual and motion representations, making motion prediction accuracy heavily dependent on the quality of future video forecasts. To address these issues, we