We analyzed 10,000 voice AI calls. The LLM was rarely the problem.

We built Dograh OSS, an open-source voice AI platform. When we started, we assumed most failures would come from the LLM: bad answers, missed intent, prompt edge cases. So we spent a lot of early effort there. Then we looked at the data. We ran automated QA where an LLM reviews every turn in every call and