Understanding and Mitigating Hallucinations in Multimodal Chain-of-Thought Models

arXiv:2603.27201v1 Announce Type: new Multimodal Chain-of-Thought (MCoT) models have demonstrated impressive capability in complex visual reasoning tasks. Unfortunately, recent studies reveal that they suffer from severe hallucination problems due to diminished visual attention during the generation process. However, visual attention decay is a well-studied problem in Large Vision-Language Models (LVLMs