AI RESEARCH

vla-eval: A Unified Evaluation Harness for Vision-Language-Action Models

arXiv cs.AI

arXiv:2603.13966v1

Vision-Language-Action (VLA) models are typically evaluated using per-benchmark scripts that each model repository maintains independently, which leads to duplicated code, dependency conflicts, and underspecified protocols. We present vla-eval, an open-source evaluation harness that decouples model inference from benchmark execution through a WebSocket+msgpack protocol, with Docker-based environment isolation.
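
To make the decoupling idea concrete, here is a minimal sketch of a benchmark side and a model side that share nothing except serialized messages. It is an illustration under stated assumptions, not vla-eval's actual API or schema: the function names (`model_server`, `benchmark_client`), the message fields (`episode`, `step`, `instruction`, `action_dim`, `action`), and the zero-action policy are all hypothetical. The WebSocket transport is replaced by a direct function call so the example stays self-contained, and JSON is used as a stand-in codec only when the `msgpack` package is not installed.

```python
import json

try:
    import msgpack  # third-party; the wire format named in the abstract

    def encode(message: dict) -> bytes:
        return msgpack.packb(message, use_bin_type=True)

    def decode(payload: bytes) -> dict:
        return msgpack.unpackb(payload, raw=False)

except ImportError:
    # Stand-in codec so the sketch runs without msgpack installed.
    def encode(message: dict) -> bytes:
        return json.dumps(message).encode("utf-8")

    def decode(payload: bytes) -> dict:
        return json.loads(payload.decode("utf-8"))


def model_server(payload: bytes) -> bytes:
    """Hypothetical model side: takes an observation message, returns an action."""
    request = decode(payload)
    # Placeholder policy (assumption): a zero action of the requested size.
    action = [0.0] * request["action_dim"]
    reply = {"episode": request["episode"], "step": request["step"], "action": action}
    return encode(reply)


def benchmark_client(send, episode: int, steps: int, action_dim: int) -> list:
    """Hypothetical benchmark side: it only ever exchanges bytes with the model."""
    actions = []
    for step in range(steps):
        request = {
            "episode": episode,
            "step": step,
            "instruction": "pick up the block",  # illustrative task text
            "action_dim": action_dim,
        }
        # In a real deployment `send` would go over a WebSocket connection;
        # here it is a direct call standing in for that round trip.
        reply = decode(send(encode(request)))
        actions.append(reply["action"])
    return actions


actions = benchmark_client(model_server, episode=0, steps=3, action_dim=7)
print(len(actions), len(actions[0]))
```

Because the benchmark loop depends only on `send` accepting and returning bytes, the model process can live in a separate environment with its own dependencies, which is the kind of isolation the abstract attributes to the Docker-based setup.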