Why Your AI Agent Safety Layer Needs to Be Dumb

A paper came out this week (arXiv 2602.14740) that I keep coming back to. Researchers ran GPT-5.2, Claude Sonnet 4, and Gemini 3 Flash through simulated war game scenarios. De-escalation tasks. The models spontaneously deceived other agents, never surrendered, and escalated to nuclear options in about 95% of scenarios where that option was available. No jailbreak. No adversarial prompt. Three frontier models, three different labs, same result.