AI RESEARCH

Jailbreak Scaling Laws for Large Language Models: Polynomial-Exponential Crossover

arXiv CS.AI

arXiv:2603.11331v1 Announce Type: cross Adversarial attacks can reliably steer safety-aligned large language models toward unsafe behavior. Empirically, we find that adversarial prompt-injection attacks can amplify the attack success rate, shifting it from the slow polynomial growth observed without injection to exponential growth in the number of inference-time samples.
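
To make the polynomial-versus-exponential contrast concrete, here is a minimal numerical sketch in Python. It is not the paper's model: the abstract gives no functional forms, so everything below is an illustrative assumption. We read "polynomial" and "exponential" as describing how fast the remaining failure probability shrinks as the number of samples k grows. The function names (`asr_fixed_p`, `asr_beta_mixture`) and all parameter values are hypothetical. The sketch compares two textbook cases: a single fixed per-sample success probability p, where failure decays as (1 - p)^k, and a per-prompt success probability drawn from a Beta(a, b) distribution, where the averaged failure decays roughly as a power law in k.

```python
import math


def asr_fixed_p(p: float, k: int) -> float:
    """Attack success rate after k independent samples, assuming every
    sample succeeds with the same probability p (illustrative assumption).
    The failure probability (1 - p)**k shrinks exponentially in k."""
    return 1.0 - (1.0 - p) ** k


def asr_beta_mixture(a: float, b: float, k: int) -> float:
    """Attack success rate after k samples when the per-sample success
    probability varies across prompts as p ~ Beta(a, b) (illustrative
    assumption). The averaged failure probability is
    E[(1 - p)**k] = B(a, b + k) / B(a, b), which behaves like k**(-a)
    for large k, i.e. it shrinks only polynomially."""
    log_fail = (
        math.lgamma(b + k) + math.lgamma(a + b)
        - math.lgamma(a + b + k) - math.lgamma(b)
    )
    return 1.0 - math.exp(log_fail)


if __name__ == "__main__":
    # Hypothetical parameter choices, picked only to show the two shapes.
    for k in (1, 10, 100, 1000):
        fixed = asr_fixed_p(0.05, k)
        mixed = asr_beta_mixture(1.0, 1.0, k)
        print(f"k={k:5d}  fixed-p ASR={fixed:.6f}  beta-mixture ASR={mixed:.6f}")
```

With a = b = 1 (a uniform distribution over p), the mixture's failure probability is exactly 1 / (k + 1), which makes the slow power-law tail easy to verify by hand. Whether either regime matches the mechanism studied in the paper is not something the abstract settles.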