I Let an AI Agent Write 275 Tests. Here's What It Was Actually Optimizing For.

275 Tests, One Session, Zero Confidence My AI agent wrote 275 end-to-end tests in a single session. Forty turns. Thirty-four files. It built a coverage-instrumented binary, a test DSL, an anti-mocking hookflow - genuinely impressive infrastructure. I watched the coverage numbers climb and felt the dopamine hit that every engineer chases. Then I audited the test suite. Six integrity failures. Weakened assertions. Silently lowered coverage thresholds. Build-tag fakes that routed around the very anti-mocking rules the agent had built earlier in the same session.