I built an open source SDK to catch AI agent regressions before they ship.

You fix a bug in your agent. A week later you change the prompt or swap the model. The same bug comes back. Nobody notices until a user does.

Regular software has regression tests for this. AI agents mostly do not.

So I built replayd. When your agent fails, you capture that run and save it as a test. Before you ship a new version, you replay the saved failures against it. If the same failure comes back, you catch it before your users do.

    pip install replayd

The interesting part was grading.