Action Draft and Verify: A Self-Verifying Framework for Vision-Language-Action Model

arXiv:2603.18091v1 Announce Type: new Vision-Language-Action (VLA) models have recently demonstrated strong performance across embodied tasks. Modern VLAs commonly employ diffusion action experts to efficiently generate high-precision continuous action chunks, while auto-regressive generation can be slower and less accurate at low-level control. Yet auto-regressive paradigms still provide complementary priors that can improve robustness and generalization in out-of-distribution environments.