SWE-CI: Evaluating Agent Capabilities in Maintaining Codebases via Continuous Integration

arXiv:2603.03823v2 Announce Type: replace-cross

Large language model (LLM)-powered agents have demonstrated strong capabilities in automating software engineering tasks such as static bug fixing, as evidenced by benchmarks like SWE-bench. In the real world, however, the development of mature software typically involves complex requirement changes and long-term feature iterations, a process that static, one-shot repair paradigms fail to capture.