Through the Magnifying Glass: Adaptive Perception Magnification for Hallucination-Free VLM Decoding

arXiv:2503.10183v4 Announce Type: replace

Existing vision-language models (VLMs) often suffer from visual hallucination, where the generated responses contain inaccuracies that are not grounded in the visual input. Efforts to address this issue without model finetuning primarily mitigate hallucination by contrastively reducing language biases or amplifying the weights of visual embeddings during decoding. However, these approaches remain limited in their ability to capture fine-grained visual details.