Prompt Versioning in Production: What We Learned Running LLM Agents for 3 Months

Our SDR agent's system prompt went through seven iterations before it stopped guessing email addresses. Here is what that process taught us about treating prompts as production code. We run six AI agents in production, daily, on an automated schedule. Each agent has a system prompt d as a markdown file in a git repository. Over three months, those prompts have accumulated commits than most of our Python scripts. The prompts are the most frequently edited files in the codebase. This was not what we expected. We expected to write a prompt, tune it for a week, and leave it alone.