How VLAs (Really) Work In Open-World Environments

arXiv:2604.21192v1 Announce Type: cross Vision-language-action models (VLAs) have been used extensively in robotics applications, achieving great success in various manipulation problems. Recently, VLAs have been applied to long-horizon tasks and evaluated on benchmarks such as BEHAVIOR1K (B1K), which involve solving complex household chores. The common metric for measuring progress on such benchmarks is success rate or a partial score based on the satisfaction of progress-agnostic criteria, meaning that only the final states of the objects are considered, regardless of the events that led to those states.
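
To make the notion of a progress-agnostic metric concrete, the following is a minimal sketch, not B1K's actual scoring code. It assumes a partial score can be modeled as the fraction of goal predicates that hold in the final state. The names `partial_score`, `is_success`, the dictionary-based state, and the toy cup-and-plate task are all hypothetical and chosen only for illustration.

```python
from typing import Callable, Dict, List

# Hypothetical representation: a state maps object names to a location label.
State = Dict[str, str]
Predicate = Callable[[State], bool]


def partial_score(final_state: State, goals: List[Predicate]) -> float:
    """Fraction of goal predicates satisfied by the final state only."""
    if not goals:
        raise ValueError("at least one goal predicate is required")
    satisfied = sum(1 for goal in goals if goal(final_state))
    return satisfied / len(goals)


def is_success(final_state: State, goals: List[Predicate]) -> bool:
    """Binary success: every goal predicate holds in the final state."""
    return all(goal(final_state) for goal in goals)


# Hypothetical toy task: put the cup in the sink and the plate in the cabinet.
goals: List[Predicate] = [
    lambda s: s.get("cup") == "in_sink",
    lambda s: s.get("plate") == "in_cabinet",
]

# Two hypothetical episodes that end in the same final state.
careful = [
    {"cup": "on_table", "plate": "on_table"},
    {"cup": "in_sink", "plate": "on_table"},
]
clumsy = [
    {"cup": "on_table", "plate": "on_table"},
    {"cup": "on_floor", "plate": "on_table"},  # cup dropped along the way
    {"cup": "in_sink", "plate": "on_table"},
]

# Only the last state of each episode is scored; intermediate events are ignored.
score_careful = partial_score(careful[-1], goals)
score_clumsy = partial_score(clumsy[-1], goals)
```

Under these assumptions, both episodes receive the same partial score even though one of them dropped the cup on the floor, which is the sense in which such a metric disregards the events that led to the final state.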