Edit, But Verify: An Empirical Audit of Instructed Code-Editing Benchmarks

arXiv:2604.05100v1 Announce Type: cross Instructed code editing, where an LLM modifies existing code based on a natural language instruction, accounts for roughly 19% of real-world coding assistant interactions. Yet very few benchmarks directly evaluate this capability. From a survey of over 150 code-related benchmarks, we find that only two, CanItEdit and EDIT-Bench, target instructed code editing with human-authored instructions and test-based evaluation.