AstaBench: Rigorous Benchmarking of AI Agents with a Scientific Research Suite

arXiv:2510.21652v2 Announce Type: replace-cross AI agents hold the potential to revolutionize scientific productivity by automating literature reviews, replicating experiments, analyzing data, and even proposing new directions of inquiry. Indeed, there are now many such agents, ranging from general-purpose "deep research" systems to specialized science-specific agents, such as AI Scientist and AIGS. Rigorous evaluation of these agents is critical for progress.