AI RESEARCH

PLUME: Latent Reasoning Based Universal Multimodal Embedding

arXiv CS.CV

arXiv:2604.02073v1

Universal multimodal embedding (UME) maps heterogeneous inputs into a shared retrieval space with a single model. Recent approaches improve UME by generating explicit chain-of-thought (CoT) rationales before extracting embeddings, which helps multimodal large language models infer complex query intent. However, explicit CoT incurs substantial inference overhead and can compress rich multimodal evidence into a narrow textual bottleneck.