MetaEmbed: Scaling Multimodal Retrieval at Test-Time with Flexible Late Interaction

arXiv:2509.18095v2 Announce Type: replace-cross

Universal multimodal embedding models have achieved great success in capturing semantic relevance between queries and candidates. However, current methods either condense queries and candidates into a single vector, which can limit expressiveness for fine-grained information, or produce so many vectors that multi-vector retrieval becomes prohibitively expensive. In this work, we