AI RESEARCH

Not Just RLHF: Why Alignment Alone Won't Fix Multi-Agent Sycophancy

arXiv CS.AI

arXiv:2605.12991v1 Announce Type: cross LLM-based multi-agent pipelines flip from correct to incorrect answers under simulated peer disagreement, at rates we term yield. This vulnerability is widely attributed to RLHF-induced sycophancy. We test that attribution across four model families and find it largely wrong: pretrained base models exhibit the same substitution pattern as their Instruct variants and, on average, show higher yield.
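
To make the yield metric concrete, here is a minimal Python sketch. It assumes yield is the fraction of initially correct answers that become incorrect once simulated peers disagree; the abstract does not give the exact formula, so this definition, the `Trial` record, and the `yield_rate` helper are all illustrative assumptions rather than the authors' implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trial:
    """One question answered before and after simulated peer disagreement (hypothetical record)."""
    gold: str    # reference answer
    before: str  # model's answer with no peer input
    after: str   # model's answer after peers disagree


def yield_rate(trials):
    """Fraction of initially correct answers that flip to incorrect.

    Assumed definition, not taken from the paper. Returns None when no
    answer was initially correct, since the rate is undefined in that case.
    """
    initially_correct = [t for t in trials if t.before == t.gold]
    if not initially_correct:
        return None
    flipped = sum(1 for t in initially_correct if t.after != t.gold)
    return flipped / len(initially_correct)


# Made-up trials for illustration only.
trials = [
    Trial(gold="B", before="B", after="C"),  # correct -> incorrect (yields)
    Trial(gold="A", before="A", after="A"),  # correct -> correct (holds)
    Trial(gold="D", before="C", after="C"),  # never correct, excluded
    Trial(gold="A", before="A", after="B"),  # correct -> incorrect (yields)
]
print(yield_rate(trials))
```

Under this assumed definition, comparing a base model with its Instruct variant amounts to computing `yield_rate` on the same questions for each model and comparing the two numbers. Conditioning on initially correct answers keeps questions the model never got right from diluting the rate.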