AutoVQA-G: Self-Improving Agentic Framework for Automated Visual Question Answering and Grounding Annotation

arXiv:2604.17488v1

High-quality visual question answering with grounding (VQA-G) datasets, which pair visual questions with evidential grounding, are crucial for advancing vision-language models (VLMs), but annotating them manually does not scale. Existing automated methods are often hindered by two key issues: (1) inconsistent data fidelity caused by model hallucinations and (2) brittle verification mechanisms based on simple heuristics. To address these limitations, we