Why we no longer evaluate SWE-bench Verified

OpenAI Blog

SWE-bench Verified is increasingly contaminated and mismeasures frontier coding progress. Our analysis shows flawed tests and