How I Test My MCP Agent Without Burning Tokens

Last month I shipped an MCP agent that triages GitHub issues. It works great - until it silently breaks and nobody notices. Here are the last three bugs I hit: I tweaked the system prompt. The agent stopped calling create_issue and just summarised the bug report in plain text. CI didn't catch it - CI tests the code, not the agent behavior. I swapped Sonnet for Haiku to save cost. The agent started calling list_issues four times before each create_issue. Integration tests still passed. Token bill tripled. GitHub rate-limited me mid-test. The entire pytest suite went red.