How I Test an AI Support Agent: A Practical Testing Pyramid

A walkthrough of the six testing layers I use to catch regressions, policy drift, hallucinations, and adversarial exploits in a B2B SaaS support agent, with an open-source repo you can fork and try yourself.

I built an AI support agent. It looks up invoices, checks subscriptions, drafts MFA resets, escalates tickets, and refuses prompt injections, all against a real SQLite database and a local documentation corpus. It uses the OpenAI API for reasoning and tool calling.

Then I asked myself: how do I actually test this thing? The answer is not a single tool.