CREG: Compass Relational Evidence for Interpreting Spatial Reasoning in Vision-Language Models

arXiv:2603.20475v1 Announce Type: new

Vision-language models (VLMs) perform strongly on spatial reasoning benchmarks, yet how they encode directional relations remains poorly understood. Existing attribution methods such as GradCAM and attention rollout reveal where a model attends, but not which direction it infers between objects. We