Built a tool for testing AI agents in multi-turn conversations

We built ArkSim, which helps simulate multi-turn conversations between agents and synthetic users to see how an agent behaves across longer interactions. This can help find issues like:

- Agents losing context during longer interactions
- Unexpected conversation paths
- Failures that only appear after several turns

The idea is to test conversation flows the way real interactions unfold, instead of just single prompts, and to catch issues early on.
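To make the idea concrete, here is a minimal, self-contained sketch of a multi-turn simulation loop. It does not use ArkSim's API: `forgetful_agent`, `simulate`, and `name_recalled` are hypothetical names made up for illustration, and the "agent" is a toy function that only sees the last few messages, standing in for a real agent that loses context.

```python
from typing import Callable, List, Tuple

Message = Tuple[str, str]  # (role, text)


def forgetful_agent(history: List[Message], window: int = 3) -> str:
    """Toy agent (hypothetical): it only sees the last `window` messages."""
    visible = history[-window:]
    name = None
    for role, text in visible:
        if role == "user" and text.lower().startswith("my name is "):
            name = text[len("my name is "):].strip(". ")
    last_user = history[-1][1].lower()
    if "what is my name" in last_user:
        return f"Your name is {name}." if name else "Sorry, I don't know your name."
    return "Okay."


def simulate(
    agent: Callable[[List[Message]], str],
    user_turns: List[str],
    check: Callable[[str, str], bool],
) -> Tuple[List[Message], List[int]]:
    """Replay scripted synthetic-user turns and record which turns fail the check."""
    history: List[Message] = []
    failures: List[int] = []
    for turn, utterance in enumerate(user_turns, start=1):
        history.append(("user", utterance))
        reply = agent(history)
        history.append(("agent", reply))
        if not check(utterance, reply):
            failures.append(turn)
    return history, failures


def name_recalled(utterance: str, reply: str) -> bool:
    """Only applies when the user asks for their name; other turns pass."""
    if "what is my name" in utterance.lower():
        return "Ada" in reply
    return True


# The same question passes in a short conversation and fails in a longer one.
short = ["My name is Ada.", "What is my name?"]
long = ["My name is Ada.", "Tell me a joke.", "Another one.", "What is my name?"]

_, short_failures = simulate(forgetful_agent, short, name_recalled)
_, long_failures = simulate(forgetful_agent, long, name_recalled)

print("short conversation failures:", short_failures)
print("long conversation failures:", long_failures)
```

A single-prompt test would only ever exercise the short case, so the failure on the fourth turn of the longer conversation would go unnoticed. In practice the scripted turns would be replaced by a synthetic user that generates its own messages.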