Indexing Multimodal Language Models for Large-scale Image Retrieval

arXiv:2604.13268v1 Announce Type: cross

Multimodal Large Language Models (MLLMs) have demonstrated strong cross-modal reasoning capabilities, yet their potential for vision-only tasks remains underexplored. We investigate MLLMs as