MHSA: A Lightweight Framework for Mitigating Hallucinations via Steered Attention in LVLMs

arXiv:2605.14966v1

Large vision-language models (LVLMs) have achieved remarkable performance across diverse multimodal tasks, yet they continue to suffer from hallucinations, generating content that is inconsistent with the visual input. Prior work, DHCP (Detecting Hallucinations by Cross-modal Attention Pattern), has explored hallucination detection from the perspective of cross-modal attention but does not address hallucination mitigation.