AI RESEARCH

Structured Semantic Cloaking for Jailbreak Attacks on Large Language Models

arXiv CS.CL

arXiv:2603.16192v1 Announce Type: new Modern LLMs employ safety mechanisms that extend beyond surface-level input filtering to latent semantic representations and generation-time reasoning. These mechanisms let models recover obfuscated malicious intent during inference and refuse accordingly, which renders many surface-level obfuscation jailbreak attacks ineffective. We propose Structured Semantic Cloaking (S2C), a novel multi-dimensional jailbreak attack framework that manipulates how malicious semantic intent is reconstructed during model inference.