AI RESEARCH

From Sycophantic Consensus to Pluralistic Repair: Why AI Alignment Must Surface Disagreement

arXiv CS.LG

ArXi:2605.14912v1 Announce Type: cross Pluralistic alignment is typically operationalised as preference aggregation: producing responses that span (Overton), steer toward (Steerable), or proportionally represent (Distributional) diverse human values. We argue that aggregation alone is an incomplete primitive for deployed pluralistic alignment. Under genuine value pluralism, the failure mode of contemporary RLHF-trained assistants is not insufficient coverage but sycophantic consensus: a learned tendency to agree with, validate, and minimise friction with the immediate interlocutor.