Benchmarking AI Agents on Code Maintenance Is Finally Here

Towards AI

They’re Mostly Failing