E0: Enhancing Generalization and Fine-Grained Control in VLA Models via Tweedie Discrete Diffusion

arXiv:2511.21542v2

Vision-Language-Action (VLA) models offer a unified framework for robotic manipulation by integrating visual perception, language understanding, and control generation. However, existing VLA systems still struggle to generalize across diverse tasks, scenes, and camera viewpoints, and often produce coarse or unstable actions.