Understanding Asynchronous Inference Methods for Vision-Language-Action Models

arXiv:2605.08168v1 Announce Type: cross Vision-Language-Action (VLA) models offer a promising path to generalist robot control, but their inference latency causes observation staleness when generated actions are executed asynchronously. Several methods have been proposed concurrently to mitigate this problem: inference-time inpainting (IT