JULI: Jailbreak Large Language Models by Self-Introspection

arXiv:2505.11790v4

Large Language Models (LLMs) are trained with safety alignment to prevent them from generating malicious content. Although some attacks have highlighted vulnerabilities in these safety-aligned LLMs, they typically have limitations, such as requiring access to the model weights or to the generation process. Since