CURE: A Multimodal Benchmark for Clinical Understanding and Retrieval Evaluation

arXiv:2603.19274v1 (Announce Type: cross)

Multimodal large language models (MLLMs) demonstrate considerable potential in clinical diagnostics, a domain that inherently requires synthesizing complex visual and textual data alongside consulting authoritative medical literature. However, existing benchmarks primarily evaluate MLLMs in end-to-end answering scenarios. This limits the ability to disentangle a model's foundational multimodal reasoning from its proficiency in evidence retrieval and application. We