AI RESEARCH

AJF: Adaptive Jailbreak Framework Based on the Comprehension Ability of Black-Box Large Language Models

arXiv cs.CL

arXiv:2505.23404v5 Announce Type: replace

Recent advances in adversarial jailbreak attacks have exposed critical vulnerabilities in Large Language Models (LLMs), enabling attackers to circumvent alignment safeguards through increasingly sophisticated prompt manipulations. Our experiments find that the effectiveness of jailbreak strategies is influenced by the comprehension ability of the target LLM. Building on this insight, we propose an Adaptive Jailbreak Framework (AJF) based on the comprehension ability of black-box large language models.