SWE Context Bench: A Benchmark for Context Learning in Coding

arXiv:2602.08316v2 Announce Type: replace-cross Large language models are increasingly used as programming agents for repository-level software engineering tasks. While recent benchmarks evaluate correctness in realistic codebases, they largely treat tasks as independent and do not assess whether agents can reuse prior experience or context across related problems. As a result, the ability of agents to accumulate, retrieve, and apply prior experience, as well as the efficiency gains from such reuse, remains difficult to measure. We.