Premover: Fast Vision-Language-Action Control by Acting Before Instructions Are Complete

arXiv:2605.12160v1 Announce Type: cross Vision-Language-Action (VLA) policies are typically evaluated as if the user had finished typing or speaking before the robot begins acting. In real deployment, however, users take several seconds to enter a request, leaving the policy idle for a substantial fraction of the interaction. We