AI RESEARCH
AI Alignment Breaks at the Edge
arXiv CS.CL
•
ArXi:2602.20042v2 Announce Type: replace General Alignment has improved average-case helpfulness and safety, but current alignment practice still rewards confident, single-turn responses. The problem is not only that models fail on edge cases; it is that current evaluation makes many of these failures hard to see. We take the position that alignment must move beyond average-case evaluation by making failures under value conflict, plural stakeholder disagreement, and epistemic ambiguity visible and actionable.