Beacon: Single-Turn Diagnosis and Mitigation of Latent Sycophancy in Large Language Models

arXiv:2510.16727v2 Announce Type: replace Large language models internalize a structural trade-off between truthfulness and obsequious flattery, emerging from reward optimization that conflates helpfulness with polite submission. This latent bias, known as sycophancy, manifests as a preference for user agreement over principled reasoning. We