AI RESEARCH

SciVisAgentBench: A Benchmark for Evaluating Scientific Data Analysis and Visualization Agents

arXiv CS.AI

ArXi:2603.29139v1 Announce Type: new Recent advances in large language models (LLMs) have enabled agentic systems that translate natural language intent into executable scientific visualization (SciVis) tasks. Despite rapid progress, the community lacks a principled and reproducible benchmark for evaluating these emerging SciVis agents in realistic, multi-step analysis settings. We present SciVisAgentBench, a comprehensive and extensible benchmark for evaluating scientific data analysis and visualization agents.