AI RESEARCH

Jailbroken Frontier Models Retain Their Capabilities

arXiv CS.AI

ArXi:2605.00267v1 Announce Type: cross As language model safeguards become robust, attackers are pushed toward developing increasingly complex jailbreaks. Prior work has found that this complexity imposes a "jailbreak tax" that degrades the target model's task performance. We show that this tax scales inversely with model capability and that the most advanced jailbreaks effectively yield no reduction in model capabilities.