Beyond Shortcuts: Mitigating Visual Illusions in Frozen VLMs via Qualitative Reasoning

arXiv:2604.26250v1

While Vision-Language Models (VLMs) have achieved state-of-the-art performance on general visual tasks, their perceptual robustness remains remarkably brittle when they are confronted with optical illusions. These failures are often attributed to shortcut heuristics, in which models prioritize linguistic priors and memorized prototypes over direct visual evidence. In this work, we propose Structured Qualitative Inference (SQI), a