Mitigating Multimodal LLMs Hallucinations via Relevance Propagation at Inference Time

arXiv:2605.01766v1 Announce Type: new Multimodal large language models (MLLMs) have revolutionized the landscape of AI, demonstrating impressive capabilities in tackling complex vision and audio-language tasks. However, a critical challenge remains: these models often suffer from hallucinations, generating outputs that diverge from the provided perceptual inputs. This tendency stems from an inherent imbalance in modality utilization during inference, where the dominance of textual tokens undermines the potential of perceptual inputs.