Self-Rewarding Vision-Language Model via Reasoning Decomposition

arXiv:2508.19652v2 Vision-Language Models (VLMs) often suffer from visual hallucinations, where they generate content that is inconsistent with the visual input, and from language shortcuts, where they skip the visual part and rely only on text priors. These issues arise because most post