Mitigating Object Hallucinations in Large Vision-Language Models via Attention Calibration

arXiv:2502.01969v2. Large Vision-Language Models (LVLMs) exhibit impressive multimodal reasoning capabilities but remain highly susceptible to object hallucination, in which a model generates responses that are not factually aligned with the visual content. Recent work attributes this issue to an inherent bias of LVLMs, whereby the attention map over vision tokens focuses spuriously on certain positions, and proposes to mitigate it by reordering the visual tokens.