AI RESEARCH

From Scenes to Elements: Multi-Granularity Evidence Retrieval for Verifiable Multimodal RAG

arXiv cs.CL

arXiv:2605.15019v1 Announce Type: new

Multimodal Retrieval-Augmented Generation (RAG) systems retrieve evidence at coarse granularities (entire images or scenes), creating a mismatch with fine-grained user queries and making failures unverifiable. We