$\Delta$VLA: Prior-Guided Vision-Language-Action Models via World Knowledge Variation

arXiv:2603.08361v1 Announce Type: new Recent vision-language-action (VLA) models have significantly advanced robotic manipulation by unifying perception, reasoning, and control. To achieve this integration, recent studies adopt a predictive paradigm that models future visual states or world knowledge to guide action generation. However, these models emphasize forecasting outcomes rather than reasoning about the underlying process of change, and it is this reasoning that is essential for determining how to act.