AI RESEARCH
Jailbreak Scaling Laws for Large Language Models: Polynomial-Exponential Crossover
arXiv cs.AI
arXiv:2603.11331v1 Announce Type: cross
Adversarial attacks can reliably steer safety-aligned large language models toward unsafe behavior. Empirically, we find that adversarial prompt-injection attacks can change how attack success rate scales with the number of inference-time samples, shifting it from the slow polynomial growth observed without injection to exponential growth.
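To make the difference between the two regimes concrete, here is a minimal sketch built on a simple toy model that is our own assumption, not the paper's analysis. Each attack prompt has a per-sample success probability p, and a prompt counts as broken if any of k independent samples succeeds. If every prompt shares the same p, the unbroken fraction (1 - p)**k shrinks exponentially in k. If p instead varies across prompts (here, hypothetically, p ~ Beta(a, b)), the expected unbroken fraction B(a, b + k) / B(a, b) shrinks only polynomially, roughly like k**(-a). The function names (`failure_homogeneous`, `failure_heterogeneous`, `asr`) and the parameter values are illustrative choices.

```python
import math

# Toy model (illustrative assumption, not the paper's method): an attack on a
# prompt succeeds if any of k independent inference-time samples succeeds.


def failure_homogeneous(p: float, k: int) -> float:
    """Unbroken fraction after k samples when every prompt has the same
    per-sample success probability p. Decays exponentially: (1 - p)**k."""
    return (1.0 - p) ** k


def _log_beta(x: float, y: float) -> float:
    """log B(x, y) computed via log-gamma for numerical stability."""
    return math.lgamma(x) + math.lgamma(y) - math.lgamma(x + y)


def failure_heterogeneous(a: float, b: float, k: int) -> float:
    """Expected unbroken fraction after k samples when p ~ Beta(a, b) across
    prompts: E[(1 - p)**k] = B(a, b + k) / B(a, b). For large k this decays
    polynomially, roughly like k**(-a)."""
    return math.exp(_log_beta(a, b + k) - _log_beta(a, b))


def asr(failure: float) -> float:
    """Attack success rate is the complement of the unbroken fraction."""
    return 1.0 - failure


if __name__ == "__main__":
    # Hypothetical parameters: shared p = 0.01 vs. p ~ Beta(0.3, 3.0).
    for k in (1, 10, 100, 1000):
        exp_fail = failure_homogeneous(0.01, k)
        poly_fail = failure_heterogeneous(0.3, 3.0, k)
        print(f"k={k:5d}  ASR(exp)={asr(exp_fail):.4f}  ASR(poly)={asr(poly_fail):.4f}")
```

Doubling k squares the homogeneous failure rate but shrinks the heterogeneous one only by a factor near 2**(-a). Which regime a real attack falls into is the empirical question the abstract reports on; this toy model does not settle it.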