AI RESEARCH
Your Vision-Language-Action Model Already Has Attention Heads For Path Deviation Detection
arXiv CS.CV
arXiv:2603.13782v1 Announce Type: cross Vision-Language-Action (VLA) models have demonstrated strong potential for predicting semantic actions in navigation tasks, demonstrating the ability to reason over complex linguistic instructions and visual contexts. However, they are fundamentally hindered by visual-reasoning hallucinations that lead to trajectory deviations. Addressing this issue has conventionally required