CAST: Mitigating Object Hallucination in Large Vision-Language Models via Caption-Guided Visual Attention Steering

ArXi:2605.04641v1 Announce Type: new Although Large Vision-Language Models (LVLMs) have nstrated remarkable performance on downstream tasks, they frequently produce contents that deviate from visual information, leading to object hallucination. To tackle this, recent works mostly depend on expensive manual annotations and