vla-eval: A Unified Evaluation Harness for Vision-Language-Action Models

arXiv:2603.13966v1 Announce Type: new

Vision-Language-Action (VLA) models are typically evaluated using per-benchmark scripts maintained independently by each model repository, leading to duplicated code, dependency conflicts, and underspecified protocols. We present vla-eval, an open-source evaluation harness that decouples model inference from benchmark execution through a WebSocket+msgpack protocol with Docker-based environment isolation.
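
To illustrate what decoupling inference from benchmark execution can look like, the following is a minimal sketch and not the vla-eval API. The benchmark loop and the model only exchange serialized bytes, so each side could run in its own process or container. All names here (`handle_request`, `run_episode`, the `observation`/`action` message fields) are hypothetical. The `send` callable stands in for a WebSocket round trip, and the codec falls back to JSON when the third-party `msgpack` package is not installed.

```python
import json
from typing import Any, Callable, Dict, List

# Codec: use msgpack if available, otherwise JSON as a stand-in (assumption:
# the real harness uses msgpack; JSON is only a fallback for this sketch).
try:
    import msgpack

    def encode(obj: Any) -> bytes:
        return msgpack.packb(obj, use_bin_type=True)

    def decode(data: bytes) -> Any:
        return msgpack.unpackb(data, raw=False)
except ImportError:
    def encode(obj: Any) -> bytes:
        return json.dumps(obj).encode("utf-8")

    def decode(data: bytes) -> Any:
        return json.loads(data.decode("utf-8"))


def handle_request(payload: bytes) -> bytes:
    """Model side (hypothetical): decode an observation, return an action.

    The 'policy' is a toy placeholder that pushes each state value toward zero.
    """
    msg: Dict[str, Any] = decode(payload)
    action = [-x for x in msg["state"]]
    return encode({"type": "action", "step": msg["step"], "action": action})


def run_episode(send: Callable[[bytes], bytes], steps: int) -> List[List[float]]:
    """Benchmark side (hypothetical): step a toy environment.

    `send` is any bytes-in, bytes-out transport; here it stands in for a
    WebSocket request/response exchange with the model server.
    """
    state = [1.0, -2.0]
    actions: List[List[float]] = []
    for step in range(steps):
        request = encode({
            "type": "observation",
            "step": step,
            "instruction": "move to origin",
            "state": state,
        })
        reply = decode(send(request))
        actions.append(reply["action"])
        # Toy dynamics: apply half of the commanded action.
        state = [s + 0.5 * a for s, a in zip(state, reply["action"])]
    return actions


if __name__ == "__main__":
    # In-process call in place of a network connection.
    print(run_episode(handle_request, steps=3))
```

Because `run_episode` depends only on the `send` callable, swapping the in-process call for a real network client would not change the benchmark code, which is the property the abstract describes.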