Diagnose, Correct, and Learn from Manipulation Failures via Visual Symbols

arXiv:2512.02787v3 Announce Type: replace-cross Vision-Language-Action (VLA) models have recently achieved remarkable progress in robotic manipulation, yet they remain limited in failure diagnosis and learning from failures. Additionally, existing failure datasets are mostly generated programmatically in simulation, which limits their generalization to the real world. In light of these limitations, we