Localizing Before Answering: A Hallucination Evaluation Benchmark for Grounded Medical Multimodal LLMs

arXiv:2505.00744v5 Announce Type: replace Medical Large Multi-modal Models (LMMs) have demonstrated remarkable capabilities in medical data interpretation. However, these models frequently generate hallucinations contradicting source evidence, particularly due to inadequate localization reasoning. This work reveals a critical limitation in current medical LMMs: instead of analyzing relevant pathological regions, they often rely on linguistic patterns or attend to irrelevant image areas when responding to disease-related queries. To address this, we