3 AIs Reviewed the Same Codebase. They Disagreed on 2 Findings. That is the Point.

We have a rule at Verivus Labs: before code ships, three AI models review it independently, and nothing merges without unconditional approval from Claude, Codex, and Gemini. We wrote about the mechanics of that process in The Codex Review Gate.

That process works well on our own code. We wanted to know whether it finds real issues in code we did not write, code that is already well maintained and well structured. Simon Willison's llm is one of the better-engineered CLI tools in the Python ecosystem.