DeepThinkVLA: Enhancing Reasoning Capability of Vision-Language-Action Models

arXiv:2511.15669v2

Does Chain-of-Thought (CoT) reasoning genuinely improve Vision-Language-Action (VLA) models, or does it merely add overhead? Existing CoT-VLA systems report limited and inconsistent gains, yet no prior work has rigorously diagnosed when and why CoT helps robots act.