Evolving Jailbreaks: Automated Multi-Objective Long-Tail Attacks on Large Language Models

arXiv:2603.20122v1 Announce Type: cross Large Language Models (LLMs) have been widely deployed, especially through free Web-based applications that expose them to diverse user-generated inputs, including those from long-tail distributions such as low-resource languages and encrypted private data. This open-ended exposure increases the risk of jailbreak attacks that undermine model safety alignment.