ClipTTT: CLIP-Guided Test-Time Training Helps LVLMs See Better

arXiv:2603.26486v1

Large vision-language models (LVLMs) tend to hallucinate, especially when visual inputs are corrupted at test time. We show that such corruptions act as additional distribution shifts, significantly amplifying hallucination rates in real-world applications. To address this, we propose CLIP-guided Test-Time