Beyond Pixels: Introspective and Interactive Grounding for Visualization Agents

arXiv:2604.21134v1 Announce Type: new
Vision-Language Models (VLMs) frequently misread values, hallucinate details, and confuse overlapping elements in charts. Current approaches rely solely on pixel interpretation, creating a Pixel-Only Bottleneck: agents treat interactive charts as static images and lose access to the structured specification that encodes exact values. We