ResearchEnvBench: Benchmarking Agents on Environment Synthesis for Research Code Execution

arXiv:2603.06739v1 Announce Type: cross Autonomous agents are increasingly expected to conduct scientific research, and recent benchmarks report progress in code repair and autonomous experimentation. However, these evaluations typically assume a pre-configured execution environment. Building that environment requires resolving complex software dependencies, aligning hardware and framework versions, and configuring distributed execution, yet this capability remains largely unbenchmarked. We