DecAlign: Hierarchical Cross-Modal Alignment for Decoupled Multimodal Representation Learning

arXiv:2503.11892v3 Announce Type: replace Multimodal representation learning aims to capture both shared and complementary semantic information across multiple modalities. However, the intrinsic heterogeneity of these modalities makes effective cross-modal collaboration and integration substantially harder to achieve. To address this, we