AI RESEARCH
Collider-Bench: Benchmarking AI Agents with Particle Physics Analysis Reproduction
arXiv cs.LG
arXiv:2605.13950v1 Announce Type: new Autonomous language-model agents are increasingly evaluated on long-horizon tool-use tasks, but existing benchmarks rarely capture the complexity and nuance of real scientific work. To address this gap, we