Right Predictions, Misleading Explanations: On the Vulnerability of Vision-Language Model Explanations

arXiv:2605.16651v1 Announce Type: cross Explanation mechanisms are increasingly used to promote transparency and trust in vision-language models (VLMs), particularly in settings where model decisions require human oversight. However, the robustness of these explanations remains insufficiently understood. In this work, we investigate whether explanation heatmaps in VLMs, particularly CLIP-based models, faithfully reflect model reasoning under adversarial conditions.