A 4-model adversarial review pipeline that costs $0.30 per audit

TL;DR: Single-model code review has correlated blind spots. A panel of frontier models from different vendor families catches more, but only if a final verification step throws out the hallucinations the panel converged on. Crucible is a Claude Code skill that does both for about $0.30 a run.

If GPT misses a bug, the runner-up GPT-class model usually misses it too. Same