AI RESEARCH

Pixel Motion Diffusion is What We Need for Robot Control

arXiv cs.CV

arXiv:2509.22652v2 Announce Type: replace-cross

We present DAWN (Diffusion is All We Need for robot control), a unified diffusion-based framework for language-conditioned robotic manipulation that bridges high-level motion intent and low-level robot action via a structured pixel motion representation. In DAWN, both the high-level and low-level controllers are modeled as diffusion processes, yielding a fully trainable, end-to-end system with interpretable intermediate motion abstractions.