RExBench: Can coding agents autonomously implement AI research extensions?

arXiv:2506.22598v3 Announce Type: replace Agents based on Large Language Models (LLMs) have shown promise for performing sophisticated software engineering tasks autonomously. In addition, there has been progress towards developing agents that can perform parts of the research pipeline in machine learning and the natural sciences. We argue that research extension and its implementation is a critical capability for such systems, and