CiteVQA: Benchmarking Evidence Attribution for Trustworthy Document Intelligence

arXiv:2605.12882v1 Announce Type: new Multimodal Large Language Models (MLLMs) have significantly advanced document understanding, yet current Doc-VQA evaluations score only the final answer and leave the supporting evidence unchecked. This answer-only approach masks a critical failure mode: a model can land on the correct answer while grounding it in the wrong passage -- a serious risk in high-stakes domains like law, finance, and medicine, where every