Opus 4.7 (high) takes #1 on the LLM Debate Benchmark, leading the previous champion, Sonnet 4.6 (high), by 106 BT points. Incredibly, it has not lost a single completed side-swapped matchup: 51 wins, 4 ties, and 0 losses.
r/singularity
•
Generative AI
AI Research
Info, transcripts, model profiles, comparisons: Models debate the same motion twice with sides swapped. Opus 4.7 often wins by finding the hinge of the debate, dragging the whole exchange back to it, and forcing the other model to defend on its terms. Each completed debate is judged by a three-model panel. Panels avoid same-family judges against the debaters. submitted by /u/zero0_one1 [link] [comments]