Needle in the Repo: A Benchmark for Maintainability in AI-Generated Repository Edits

arXiv:2603.27745v1 (Announce Type: cross)

AI coding agents can now complete complex programming tasks, but existing evaluations largely emphasize behavioral correctness and often overlook maintainability risks such as weak modularity or poor testability. We present Needle in the Repo (NITR), a diagnostic probe-and-oracle framework for evaluating whether behaviorally correct repository edits preserve maintainable structure.