UniMoCo: Unified Modality Completion for Robust Multi-Modal Embeddings

arXiv:2505.11815v2

Current vision-language models (VLMs) have been explored for multi-modal embedding tasks such as information retrieval. However, they struggle when real-world queries and targets involve diverse modality combinations, because existing approaches often fail to align all modality combinations within a unified embedding space during