The More, the Merrier: Contrastive Fusion for Higher-Order Multimodal Alignment

arXiv:2511.21331v2

Learning joint representations across multiple modalities remains a central challenge in multimodal machine learning. Prevailing approaches operate predominantly in pairwise settings, aligning two modalities at a time. While some recent methods aim to capture higher-order interactions among multiple modalities, they often overlook or insufficiently preserve pairwise relationships, which limits their effectiveness on single-modality tasks. In this work, we …
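The pairwise setting the abstract contrasts against can be illustrated with a minimal sketch. It assumes CLIP-style symmetric InfoNCE as the pairwise objective and sums it over every pair of modalities. This is not the paper's Contrastive Fusion method, which the truncated abstract does not describe. The function names, the `0.07` temperature, and the modality keys are illustrative assumptions. The sketch makes the limitation concrete: with M modalities there are M(M-1)/2 loss terms, and no single term ever sees more than two modalities at once.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    # Row-wise L2 normalization so dot products become cosine similarities.
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def info_nce(a, b, temperature=0.07):
    """Symmetric InfoNCE loss between two batches of paired embeddings.

    Assumption: row i of `a` and row i of `b` are a matched pair
    (e.g. an image and its caption); all other rows act as negatives.
    """
    a, b = l2_normalize(a), l2_normalize(b)
    logits = a @ b.T / temperature  # (N, N) scaled cosine-similarity matrix
    labels = np.arange(len(a))      # positives lie on the diagonal

    def cross_entropy(lg):
        lg = lg - lg.max(axis=1, keepdims=True)  # numerical stability
        log_probs = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -log_probs[labels, labels].mean()

    # Average the a->b and b->a retrieval directions.
    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))


def pairwise_alignment_loss(embeddings, temperature=0.07):
    """Sum InfoNCE over every unordered pair of modalities.

    Each term involves exactly two modalities, so interactions among
    three or more modalities are never modeled jointly.
    """
    names = sorted(embeddings)
    total = 0.0
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            total += info_nce(embeddings[names[i]], embeddings[names[j]], temperature)
    return total


# Hypothetical demo: three modalities that share a latent signal plus noise.
rng = np.random.default_rng(0)
shared = rng.normal(size=(8, 16))
batch = {
    "image": shared + 0.1 * rng.normal(size=(8, 16)),
    "text": shared + 0.1 * rng.normal(size=(8, 16)),
    "audio": shared + 0.1 * rng.normal(size=(8, 16)),
}
print(pairwise_alignment_loss(batch))  # sum of 3 pairwise terms
```

Because each term only pulls two modalities together, adding a fourth modality adds three more independent pairwise terms rather than one joint objective. That gap is the one higher-order methods aim to close.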