DRAGON: A Benchmark for Evidence-Grounded Visual Reasoning over Diagrams

arXiv:2604.25231v1 Announce Type: cross Diagram question answering (DQA) requires models to interpret structured visual representations such as charts, maps, infographics, circuit schematics, and scientific diagrams. Recent vision-language models (VLMs) often achieve high answer accuracy on these tasks, yet correct answers do not guarantee that models ground their reasoning in the diagram regions that support the prediction. Models may instead rely on textual correlations or dataset artifacts without identifying the visual evidence required to verify the answer.