AI RESEARCH
Utility-Oriented Visual Evidence Selection for Multimodal Retrieval-Augmented Generation
arXiv CS.AI
arXiv:2605.13277v1 Announce Type: cross
Visual evidence selection is a critical component of multimodal retrieval-augmented generation (RAG). Existing methods, however, typically rely on semantic relevance or surface-level similarity, which are often misaligned with the actual utility of visual evidence for downstream reasoning. We reformulate multimodal evidence selection from an information-theoretic perspective by defining evidence utility as the information gain induced on a model's output distribution. To overcome the intractability of answer-space optimization, we.
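As a rough illustration of the definition above, the sketch below scores each candidate image by how much it reduces the entropy of a model's answer distribution. It is not the paper's method: the abstract is cut off before the method is described and does not say which form of information gain is used. Entropy reduction is one common reading and is an assumption here. The function names, file names, and probabilities are hypothetical, and the sketch assumes a small, enumerable answer set, which is the setting the abstract describes as intractable in general.

```python
import math
from typing import Dict, List, Sequence


def entropy(p: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(x * math.log(x) for x in p if x > 0.0)


def information_gain(prior: Sequence[float], posterior: Sequence[float]) -> float:
    """Entropy reduction from the answer distribution without evidence
    (prior) to the answer distribution with evidence (posterior)."""
    return entropy(prior) - entropy(posterior)


def rank_evidence(prior: Sequence[float],
                  posteriors: Dict[str, Sequence[float]]) -> List[str]:
    """Order candidate evidence items from highest to lowest gain."""
    return sorted(
        posteriors,
        key=lambda name: information_gain(prior, posteriors[name]),
        reverse=True,
    )


# Hypothetical answer distributions over four candidate answers.
prior = [0.25, 0.25, 0.25, 0.25]            # model with no visual evidence
posteriors = {
    "chart.png": [0.85, 0.05, 0.05, 0.05],  # sharply concentrates the answer
    "photo.png": [0.40, 0.30, 0.20, 0.10],  # mildly informative
    "logo.png":  [0.25, 0.25, 0.25, 0.25],  # changes nothing
}

ranking = rank_evidence(prior, posteriors)
print(ranking)
```

In this toy setting an image that leaves the output distribution unchanged gets zero utility, whatever its semantic similarity to the query. That is the misalignment between relevance and utility the abstract points to.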