When Relations Break: Analyzing Relation Hallucination in Vision-Language Model Under Rotation and Noise

arXiv:2605.05045v1 Announce Type: cross Vision-language models (VLMs) achieve strong multimodal performance but remain prone to relation hallucination: they misreport how objects in an image interact, a task that requires accurate reasoning over inter-object relations. We study the impact of visual perturbations, specifically rotation and noise, and show that even mild distortions significantly degrade relational reasoning across models and datasets.
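
To make the two perturbation types concrete, the following is a minimal sketch of how an image could be rotated and corrupted with noise before being passed to a VLM. The abstract does not specify the rotation angles, the noise model, or the evaluation pipeline, so everything here is an assumption for illustration: the function names `rotate_nearest` and `add_gaussian_noise` are hypothetical, the image is a grayscale grid with values in [0, 1], the rotation uses nearest-neighbour sampling, and the noise is additive Gaussian.

```python
import math
import random


def rotate_nearest(img, degrees, fill=0.0):
    """Rotate a 2D grayscale image (list of rows) about its centre.

    Uses inverse mapping with nearest-neighbour sampling; output pixels
    whose source falls outside the image are set to `fill`.
    """
    h, w = len(img), len(img[0])
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    theta = math.radians(degrees)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # For each output pixel, locate the source pixel it comes from.
            dx, dy = x - cx, y - cy
            sx = cos_t * dx + sin_t * dy + cx
            sy = -sin_t * dx + cos_t * dy + cy
            ix, iy = round(sx), round(sy)
            if 0 <= ix < w and 0 <= iy < h:
                out[y][x] = img[iy][ix]
    return out


def add_gaussian_noise(img, sigma, rng):
    """Add zero-mean Gaussian noise (assumed noise model) and clamp to [0, 1]."""
    return [
        [min(1.0, max(0.0, p + rng.gauss(0.0, sigma))) for p in row]
        for row in img
    ]


# Hypothetical "mild" settings; the paper's actual values are not given here.
image = [[0.0, 0.25, 0.5], [0.25, 0.5, 0.75], [0.5, 0.75, 1.0]]
perturbed = add_gaussian_noise(rotate_nearest(image, 10), 0.05, random.Random(0))
```

In an evaluation along these lines, the same relational question would be asked about `image` and `perturbed`, and the change in answer accuracy would indicate how sensitive relational reasoning is to the distortion.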