EvoJail: Evolutionary Diverse Jailbreak Prompt Generation for Large Language Models

arXiv:2605.02921v1 Announce Type: cross As LLMs continue to shape real-world applications, automated jailbreak generation becomes essential to reveal safety weaknesses and guide model improvement. Existing automatic jailbreak generation methods have not yet fully considered two important aspects: adaptability to evolving safety-finetuned models, which affects their effectiveness on newer model versions, and diversity in generated prompts, the lack of which leads to narrow or repetitive attack patterns.