SciVisAgentBench: A Benchmark for Evaluating Scientific Data Analysis and Visualization Agents
arXiv CS.AI
arXiv:2603.29139v1 Announce Type: new
Recent advances in large language models (LLMs) have enabled agentic systems that translate natural-language intent into executable scientific visualization (SciVis) tasks. Despite rapid progress, the community lacks a principled, reproducible benchmark for evaluating these emerging SciVis agents in realistic, multi-step analysis settings. We present SciVisAgentBench, a comprehensive and extensible benchmark for evaluating scientific data analysis and visualization agents.