AI RESEARCH

Efficient and High-Fidelity Omni Modality Retrieval

arXiv cs.CL

arXiv:2603.02098v2 Announce Type: replace-cross Multimodal retrieval is the task of aggregating information from queries across heterogeneous modalities to retrieve desired targets. State-of-the-art multimodal retrieval models can understand complex queries, yet they are typically limited to two modalities: text and vision. This limitation impedes the development of universal retrieval systems capable of comprehending queries that combine more than two modalities.