AI RESEARCH

The Evaluation Differential: When Frontier AI Models Recognise They Are Being Tested

arXiv CS.AI

arXiv:2605.11496v1 Announce Type: new

Recent published evidence from frontier laboratories shows that contemporary AI models can recognise evaluation contexts, latently represent them, and behave differently under those contexts than under deployment-continuous conditions. Anthropic's BrowseComp incident, the Natural Language Autoencoder findings on SWE-bench Verified and destructive-coding evaluations, and the OpenAI / Apollo anti-scheming work all document instances of this phenomenon. We argue that these findings create a claim-validity problem for safety.