AI RESEARCH
ClipTTT: CLIP-Guided Test-Time Training Helps LVLMs See Better
arXiv cs.CV
arXiv:2603.26486v1. Large vision-language models (LVLMs) tend to hallucinate, especially when visual inputs are corrupted at test time. We show that such corruptions act as additional distribution shifts, significantly amplifying hallucination rates in real-world applications. To address this, we propose CLIP-guided Test-Time