⚡️The End of SWE-Bench Verified — Mia Glaese & Olivia Watkins, OpenAI Frontier Evals & Human Data
Latent Space
Generative AI
It's time to take the next step up in frontier agent evals.