
Response Time Enhances Alignment with Heterogeneous Preferences

arXiv CS.LG

arXiv:2605.06987v1 Announce Type: new Aligning large language models (LLMs) to human preferences typically relies on aggregating pooled feedback into a single reward model. However, this standard approach assumes that all labelers share the same underlying preferences, ignoring the fact that real-world labelers are highly heterogeneous and usually anonymous. Consequently, relying solely on binary choice data fundamentally distorts the learned policy, making the true population-average preference unidentifiable.
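
The unidentifiability claim can be illustrated with a small numeric sketch. It assumes, purely for illustration, that each labeler picks response A over B with a logistic (Bradley-Terry style) probability of their own utility gap; that choice model, the two-group population, and all names and numbers below are hypothetical and not taken from the paper. Under these assumptions, a single reward model fit to the pooled choices recovers a gap whose sign differs from the population-average gap, and a homogeneous population with that fitted gap would produce exactly the same choice data.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function: probability of choosing A given utility gap x."""
    return 1.0 / (1.0 + math.exp(-x))


def logit(p: float) -> float:
    """Inverse of the logistic function."""
    return math.log(p / (1.0 - p))


# Hypothetical population: (share of labelers, utility gap u(A) - u(B)).
# 80% mildly prefer B; 20% strongly prefer A.
population = [(0.8, -1.0), (0.2, 6.0)]

# True population-average utility gap (positive means A is better on average).
mean_gap = sum(share * gap for share, gap in population)

# Probability that an anonymous, randomly drawn labeler chooses A.
pooled_p = sum(share * sigmoid(gap) for share, gap in population)

# Gap that a single logistic reward model would infer from the pooled choices.
fitted_gap = logit(pooled_p)

print(f"population-average gap: {mean_gap:+.3f}")
print(f"pooled P(choose A):     {pooled_p:.3f}")
print(f"gap fit to pooled data: {fitted_gap:+.3f}")
```

In this toy setting the binary choices alone cannot separate the heterogeneous population from a homogeneous one with utility gap `fitted_gap`, since both yield the same pooled choice probability; extra signal beyond the choices would be needed to tell them apart.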