ReXSonoVQA: A Video QA Benchmark for Procedure-Centric Ultrasound Understanding

arXiv:2604.10916v1 Announce Type: cross Ultrasound acquisition requires skilled probe manipulation and real-time adjustments. Vision-language models (VLMs) could enable autonomous ultrasound systems, but existing benchmarks evaluate only static images, not dynamic procedural understanding. We