
AMES: Approximate Multi-modal Enterprise Search via Late Interaction Retrieval

arXiv CS.LG

arXiv:2603.13537v1 Announce Type: cross We present AMES (Approximate Multimodal Enterprise Search), a unified, backend-agnostic multimodal late interaction retrieval architecture. AMES demonstrates that fine-grained multimodal late interaction retrieval can be deployed within a production-grade enterprise search engine without architectural redesign. Text tokens, image patches, and video frames are embedded into a shared representation space using multi-vector encoders, enabling cross-modal retrieval without modality-specific retrieval logic.
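
To make the idea of late interaction over multi-vector embeddings concrete, the following is a minimal sketch of the MaxSim scoring rule popularized by ColBERT-style retrievers. The abstract does not state which scoring operator AMES uses, so MaxSim is an assumption here, and the function names (`normalize`, `dot`, `maxsim_score`) and the toy two-dimensional vectors are hypothetical. The sketch only illustrates why a shared representation space removes the need for modality-specific logic: the same function scores a document whether its vectors came from text tokens, image patches, or video frames.

```python
from math import sqrt
from typing import List, Sequence

Vector = Sequence[float]


def normalize(v: Vector) -> List[float]:
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_vecs: Sequence[Vector], doc_vecs: Sequence[Vector]) -> float:
    """Assumed MaxSim late interaction score.

    For each query vector, take the highest cosine similarity against any
    document vector, then sum those maxima over all query vectors.
    """
    if not query_vecs or not doc_vecs:
        return 0.0
    q = [normalize(v) for v in query_vecs]
    d = [normalize(v) for v in doc_vecs]
    return sum(max(dot(qv, dv) for dv in d) for qv in q)


# Toy data: every item is just a bag of vectors in the same space,
# regardless of which (hypothetical) encoder produced them.
query = [[1.0, 0.0], [0.0, 1.0]]                    # e.g. two query tokens
text_doc = [[1.0, 0.0], [1.0, 1.0]]                 # e.g. two text tokens
image_doc = [[0.0, 1.0], [1.0, 0.0], [-1.0, 0.0]]   # e.g. three image patches

# Rank both documents with the same scoring function.
ranked = sorted(
    [("text_doc", maxsim_score(query, text_doc)),
     ("image_doc", maxsim_score(query, image_doc))],
    key=lambda pair: pair[1],
    reverse=True,
)
```

In this toy example the image document ranks first because each query vector finds an exact match among its patches. A production system would likely replace the exhaustive inner loop with approximate candidate generation followed by reranking, but the abstract gives no details on that step.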