Reducing Hallucination in Vision-Language Models via Stage-wise Preference Optimization under Distribution Shift

arXiv:2605.16411v1 Announce Type: cross

Hallucination remains a fundamental challenge in vision-language models (VLMs): because of likelihood maximization under joint probabilistic modeling, autoregressive generation can produce responses that are linguistically plausible yet physically inconsistent or visually ungrounded. We propose a stage-wise preference optimization framework that reduces hallucination through targeted multimodal data construction.