CRAFT: Video Diffusion for Bimanual Robot Data Generation

arXiv:2604.03552v1 Announce Type: cross Bimanual robot learning from demonstrations is fundamentally limited by the cost and narrow visual diversity of real-world data, which constrains policy robustness across viewpoints, object configurations, and embodiments. We present Canny-guided Robot Data Generation using Video Diffusion Transformers (CRAFT), a video diffusion-based framework for scalable bimanual demonstration generation that synthesizes temporally coherent manipulation videos while producing action labels.