AI RESEARCH

JANUS: A Lightweight Framework for Jailbreaking Text-to-Image Models via Distribution Optimization

arXiv cs.LG

arXiv:2603.21208v1 Announce Type: cross

Text-to-image (T2I) models such as Stable Diffusion and DALL·E remain susceptible to generating harmful or Not-Safe-For-Work (NSFW) content under jailbreak attacks, despite deployed safety filters. Existing jailbreak attacks either optimize a proxy loss rather than the true end-to-end objective, or depend on large, costly RL-trained generators.