Deployment-Relevant Alignment Cannot Be Inferred from Model-Level Evaluation Alone
arXiv cs.AI
arXiv:2605.04454v1 Announce Type: new
Alignment evaluation in machine learning has largely become the evaluation of models. Influential benchmarks score model outputs on fixed inputs along dimensions such as truthfulness, instruction following, or pairwise preference, and these scores are often used to support claims about the alignment of deployed systems. This paper argues that deployment-relevant alignment cannot be inferred from model-level evaluation alone. Instead, alignment claims should be indexed to the level at which the evidence is collected: model-level, response-level, interaction-level, or deployment-level.
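One way to make this indexing concrete is to attach an explicit evidence level to every piece of evidence and every claim, so that a claim is only checked against evidence gathered at its own level. The sketch below is a hypothetical illustration, not the paper's method: the names `EvidenceLevel`, `Evidence`, `AlignmentClaim`, `supporting_evidence`, and `is_supported` are invented here, and the rule that only same-level evidence counts as support is a simplifying assumption.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class EvidenceLevel(Enum):
    """The four evidence levels named in the abstract."""
    MODEL = "model-level"
    RESPONSE = "response-level"
    INTERACTION = "interaction-level"
    DEPLOYMENT = "deployment-level"


@dataclass(frozen=True)
class Evidence:
    # Hypothetical record: a named evaluation and the level it was collected at.
    source: str
    level: EvidenceLevel


@dataclass(frozen=True)
class AlignmentClaim:
    # Hypothetical record: a claim and the level at which it is asserted.
    statement: str
    level: EvidenceLevel


def supporting_evidence(claim: AlignmentClaim, evidence: list[Evidence]) -> list[Evidence]:
    """Return only the evidence collected at the same level as the claim.

    Assumption for illustration: evidence from one level is never counted
    as support for a claim indexed to a different level.
    """
    return [e for e in evidence if e.level is claim.level]


def is_supported(claim: AlignmentClaim, evidence: list[Evidence]) -> bool:
    # A claim is supported only if at least one same-level item exists.
    return len(supporting_evidence(claim, evidence)) > 0


if __name__ == "__main__":
    benchmarks = [
        Evidence("truthfulness benchmark", EvidenceLevel.MODEL),
        Evidence("pairwise preference benchmark", EvidenceLevel.MODEL),
    ]
    deployed = AlignmentClaim("the deployed assistant is aligned", EvidenceLevel.DEPLOYMENT)
    model = AlignmentClaim("model outputs are truthful on the benchmark", EvidenceLevel.MODEL)
    print(is_supported(deployed, benchmarks))  # False: only model-level evidence
    print(is_supported(model, benchmarks))     # True
```

Under these assumptions, benchmark scores alone can back a model-level claim but leave a deployment-level claim unsupported, which mirrors the abstract's argument. A fuller treatment might allow some cross-level support, but the abstract does not specify such rules, so the sketch does not guess at them.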