AI RESEARCH

DataClaw: A Process-Oriented Agent Benchmark for Exploratory Real-World Data Analysis

arXiv CS.AI

arXiv:2605.02503v1 Announce Type: new Evaluating autonomous data analysis agents requires testing their ability to perform exploratory analysis in underexplored data environments. However, many existing benchmarks emphasize final-answer accuracy in prior-guided data settings and provide limited support for evaluating the reasoning process. We