AI RESEARCH
Logic Jailbreak: Efficiently Unlocking LLM Safety Restrictions Through Formal Logical Expression
arXiv CS.CL
•
ArXi:2505.13527v3 Announce Type: replace Despite substantial advancements in aligning large language models (LLMs) with human values, current safety mechanisms remain susceptible to jailbreak attacks. We hypothesize that this vulnerability stems from distributional discrepancies between alignment-oriented prompts and malicious prompts. To investigate this, we