AI RESEARCH
GPO-V: Jailbreak Diffusion Vision Language Model by Global Probability Optimization
arXiv CS.CV
arXiv:2605.07399v1 Announce Type: new Diffusion Vision-Language Models (dVLMs), built upon the non-causal foundations of Diffusion Large Language Models (dLLMs), have demonstrated remarkable efficacy in multimodal tasks by departing from the traditional autoregressive generation paradigm. While dVLMs appear inherently robust against conventional jailbreak tactics, which we categorize as Fixed Prefix Optimization (FPO) (e.g., anchoring responses with "Sure, here is"), this perceived resilience is deceptive.