Hiding in Plain Sight: A Steganographic Approach to Stealthy LLM Jailbreaks

arXiv:2505.16765v2

Jailbreak attacks pose a serious threat to Large Language Models (LLMs) by bypassing their safety mechanisms. A truly advanced jailbreak is defined not only by its effectiveness but also, critically, by its stealthiness. However, existing methods face a fundamental trade-off between semantic stealth (hiding malicious intent) and linguistic stealth (appearing natural), which leaves them vulnerable to detection. To resolve this trade-off, we propose StegoAttack, a framework that leverages steganography.