Hidden in the Multiplicative Interaction: Uncovering Fragility in Multimodal Contrastive Learning

arXiv:2604.05834v1 Announce Type: new Multimodal contrastive learning is increasingly extended beyond image-text pairs to three or more modalities. Among recent contrastive methods, Symile is a strong approach for this setting because its multiplicative interaction objective captures higher-order cross-modal dependence. Yet we find that Symile treats all modalities symmetrically and does not explicitly model reliability differences, a limitation that becomes especially pronounced in trimodal multiplicative interactions.
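
To make the symmetry concern concrete, here is a minimal sketch, assuming the trimodal score is a multilinear inner product (the sum over dimensions of the elementwise product of three embeddings). The function name and toy vectors are hypothetical, chosen for illustration, and are not taken from the paper. The sketch shows two properties of such a score: it does not change when the modalities are reordered, and degrading a single modality rescales the whole score because every term contains a factor from each modality.

```python
import numpy as np


def multilinear_inner_product(a, b, c):
    """Trimodal score: sum over dimensions of the elementwise product a * b * c.

    Illustrative helper only; the name and signature are assumptions.
    """
    return float(np.sum(a * b * c))


# Toy embeddings for three modalities (made-up values).
a = np.array([1.0, 2.0, 0.5])
b = np.array([0.5, 1.0, 2.0])
c = np.array([2.0, 0.5, 1.0])

score = multilinear_inner_product(a, b, c)

# Symmetry: reordering the modalities leaves the score unchanged,
# so no modality is treated as more or less reliable than another.
score_permuted = multilinear_inner_product(c, a, b)

# Sensitivity: shrinking one modality by a factor of 10 shrinks the
# whole score by the same factor, since each term multiplies all three.
c_degraded = 0.1 * c
score_degraded = multilinear_inner_product(a, b, c_degraded)

print(score, score_permuted, score_degraded)
```

Under these assumptions, one unreliable modality can dominate the score with nothing in the objective to discount it, which is the kind of fragility the abstract describes.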