Breaking the Illusion: Consensus-Based Generative Mitigation of Adversarial Illusions in Multi-Modal Embeddings

arXiv:2511.21893v2 Announce Type: replace Multi-modal foundation models align images, text, and other modalities in a shared embedding space but remain vulnerable to adversarial illusions, in which imperceptible perturbations disrupt cross-modal alignment and mislead downstream tasks. To counteract adversarial illusions, we propose a task-agnostic mitigation mechanism that purifies the attacker's perturbed input using generative models, e.g., Variational Autoencoders (VAEs), to restore natural alignment.
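
The purification idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the encoder, the generative model, or how consensus is formed. Here `toy_embed` is a hypothetical stand-in for a multi-modal encoder, `toy_purify` is a hypothetical stand-in for one stochastic VAE encode-sample-decode pass, and averaging the embeddings of several purified samples is only one assumed way to realize a "consensus".

```python
import math
import random


def normalize(v):
    """Return v scaled to unit L2 norm (or a copy if v is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def toy_embed(x):
    # Hypothetical encoder stand-in: identity map followed by L2 normalization.
    # A real system would call an image/text/audio embedding model here.
    return normalize(x)


def toy_purify(x, rng, noise=0.05):
    # Hypothetical VAE stand-in: a 3-tap moving average plays the role of a
    # lossy reconstruction, and Gaussian noise plays the role of sampling
    # from the latent distribution.
    out = []
    for i in range(len(x)):
        window = x[max(0, i - 1): i + 2]
        out.append(sum(window) / len(window) + rng.gauss(0.0, noise))
    return out


def consensus_embedding(x, embed, purify, num_samples=8, seed=0):
    """Embed several purified copies of x and aggregate them.

    Assumption: consensus is taken as the normalized mean embedding.
    """
    rng = random.Random(seed)
    embeddings = [embed(purify(x, rng)) for _ in range(num_samples)]
    dim = len(embeddings[0])
    mean = [sum(e[j] for e in embeddings) / num_samples for j in range(dim)]
    return normalize(mean)


if __name__ == "__main__":
    possibly_perturbed = [1.0, 2.0, 3.0, 4.0]
    print(consensus_embedding(possibly_perturbed, toy_embed, toy_purify))
```

Because the downstream task only sees the aggregated embedding, a sketch like this stays task-agnostic: the purification step sits in front of the encoder and does not need to know how the embedding is used afterwards.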