Overcoming Dynamics-Blindness: Training-Free Pace-and-Path Correction for VLA Models

arXiv:2605.11459v1 Announce Type: cross

Vision-Language-Action (VLA) models achieve remarkable flexibility and generalization beyond classical control paradigms. However, most prevailing VLAs are trained under a single-frame observation paradigm, which leaves them structurally blind to temporal dynamics. Consequently, these models degrade severely in non-stationary scenarios, even when trained or finetuned on dynamic datasets. Existing approaches either require expensive re