When One Modality Rules Them All: Backdoor Modality Collapse in Multimodal Diffusion Models

arXiv:2603.06508v1 Announce Type: new

While diffusion models have revolutionized visual content generation, their rapid adoption has underscored the critical need to investigate their vulnerabilities, such as susceptibility to backdoor attacks. In multimodal diffusion models, it is natural to expect that attacking multiple modalities simultaneously (e.g., text and image) would yield complementary effects and strengthen the overall backdoor.