AI RESEARCH

Systematic Scaling Analysis of Jailbreak Attacks in Large Language Models

arXiv cs.LG

arXiv:2603.11149v1

Large language models remain vulnerable to jailbreak attacks, yet we still lack a systematic understanding of how jailbreak success scales with attacker effort across methods, model families, and harm types. We initiate a scaling-law framework for jailbreaks by treating each attack as a compute-bounded optimization procedure and measuring progress on a shared FLOPs axis.
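To make the idea of a shared FLOPs axis concrete, here is a minimal Python sketch of how the cost of different attacks might be put on one scale. The abstract does not say how FLOPs are counted, so the accounting is an assumption: it uses the common rule of thumb for dense transformers of roughly 2N FLOPs per token for a forward pass and 6N for a forward plus backward pass, where N is the parameter count. The names `AttackStep`, `step_flops`, and `success_vs_flops` are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass
class AttackStep:
    """One iteration of a hypothetical attack (illustrative, not from the paper)."""
    forward_tokens: int   # tokens processed in forward-only passes (e.g. sampling, scoring)
    backward_tokens: int  # tokens processed in forward+backward passes (e.g. gradient steps)
    success: bool         # whether this step produced an attempt judged a jailbreak


def step_flops(n_params: int, step: AttackStep) -> int:
    # Assumed rule of thumb: ~2*N FLOPs per token forward, ~6*N forward+backward.
    return 2 * n_params * step.forward_tokens + 6 * n_params * step.backward_tokens


def success_vs_flops(n_params: int, steps: list[AttackStep]) -> list[tuple[int, bool]]:
    """Return (cumulative FLOPs, succeeded so far) after each step."""
    total = 0
    succeeded = False
    curve = []
    for step in steps:
        total += step_flops(n_params, step)
        succeeded = succeeded or step.success
        curve.append((total, succeeded))
    return curve
```

Under this accounting, a sampling-based attack (forward passes only) and a gradient-based attack (forward and backward passes) both yield curves over the same cumulative-FLOPs axis, so their success rates can be compared at equal attacker compute.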