Adaptive Residual-Update Steering for Low-Overhead Hallucination Mitigation in Large Vision Language Models

arXiv:2511.10292v2

Large Vision-Language Models (LVLMs) typically process visual inputs as a prefix to the language decoder. As the model autoregressively generates text, this initial visual information inevitably undergoes "dilution", leading the model to over-rely on language priors and hallucinate objects. Existing interventions attempt to correct this by contrasting logits or iteratively refining outputs, but they incur prohibitive latency costs.