Toward Scalable Audio Description Quality Control: A Workflow for Evaluating Human and VLM Raters

arXiv:2602.01390v2 Announce Type: replace-cross Digital video is central to communication, education, and entertainment, but without audio description (AD), blind and low-vision users are excluded. While crowdsourced platforms and vision-language models (VLMs) expand AD production, quality is rarely checked systematically. Existing evaluations rely on NLP metrics and short-clip guidelines, leaving open the question of how to assess long-form AD quality at scale.