ReplicatorBench: Benchmarking LLM Agents for Replicability in Social and Behavioral Sciences

arXiv:2602.11354v2 Announce Type: replace Interest is growing in AI agents that automatically assess scientific papers. Existing benchmarks focus primarily on the computational aspect of this task, testing agents' ability to reproduce or replicate research outcomes when given access to the code and data.