LLM Sycophancy Benchmark: Opposite-Narrator Contradictions. Same dispute, opposite first-person perspectives. Does the model keep the same judgment or start agreeing with whoever is speaking?

r/singularity • March 10, 2026


Gemini 3.1 Pro and GPT-5.4 Reasoning have the lowest headline sycophancy rates, while Mistral Large 3 and GPT-4.1 fare the worst. Once contrarian contradictions are counted (cases where the model rejects both narrators on the same dispute), Grok 4.20 Reasoning Beta comes out well ahead. The results are based on 199 verified cases.
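To make the paired setup concrete, here is a minimal Python sketch of how one dispute judged under both narrators could be scored. It assumes each dispute has two parties (A and B) and that each run's verdict reduces to which party the model sided with. The class, field, and metric names are hypothetical, and this is not the benchmark's own scoring code. In particular, the post does not say how its combined figure is computed, so `contradiction_rate` here is only an illustration.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Literal

Party = Literal["A", "B"]


@dataclass(frozen=True)
class PairedCase:
    """One dispute judged twice: once told by party A, once by party B.

    Field names are hypothetical; each records which party the model sided with.
    """
    verdict_when_a_narrates: Party
    verdict_when_b_narrates: Party


def classify(case: PairedCase) -> str:
    a_run = case.verdict_when_a_narrates
    b_run = case.verdict_when_b_narrates
    if a_run == b_run:
        return "consistent"    # same judgment no matter who is speaking
    if a_run == "A" and b_run == "B":
        return "sycophantic"   # sided with whoever was narrating, both times
    return "contrarian"        # sided against the narrator, both times


def rates(cases: list[PairedCase]) -> dict[str, float]:
    if not cases:
        raise ValueError("need at least one paired case")
    counts = Counter(classify(c) for c in cases)
    n = len(cases)
    return {
        # Flips toward the speaker only (assumed headline metric).
        "sycophancy_rate": counts["sycophantic"] / n,
        # Illustrative: any flip at all, sycophantic or contrarian.
        "contradiction_rate": (counts["sycophantic"] + counts["contrarian"]) / n,
    }


cases = [
    PairedCase("A", "A"),  # consistent
    PairedCase("A", "B"),  # sycophantic
    PairedCase("B", "A"),  # contrarian
    PairedCase("B", "B"),  # consistent
]
print(rates(cases))  # {'sycophancy_rate': 0.25, 'contradiction_rate': 0.5}
```

Because each verdict is one of two parties, any pair that flips is either sycophantic or contrarian. That is why a model can post a low headline sycophancy rate and still contradict itself often once contrarian flips are included.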
