Scaling Cross-Environment Failure Reasoning Data for Vision-Language Robotic Manipulation

arXiv:2512.01946v3

Robust robotic manipulation requires reliable failure detection and recovery. Although recent Vision-Language Models (VLMs) show promise in robot failure detection, their generalization is severely limited by the scarcity and narrow coverage of failure data. To address this bottleneck, we propose an automatic framework for generating diverse robotic planning and execution failures across both simulated and real-world environments.