AI RESEARCH

Metaphor-based Jailbreak Attacks on Text-to-Image Models

arXiv cs.AI

arXiv:2512.10766v2

Text-to-image (T2I) models commonly incorporate defense mechanisms to prevent the generation of sensitive images. Unfortunately, recent jailbreak attacks have shown that adversarial prompts can effectively bypass these mechanisms and induce T2I models to produce sensitive content, revealing critical safety vulnerabilities. However, existing attack methods implicitly assume that the attacker knows the type of deployed defenses, which limits their effectiveness against unknown or diverse defense mechanisms.