High-Entropy Tokens as Multimodal Failure Points in Vision-Language Models

arXiv:2512.21815v2 Announce Type: replace-cross

Vision-language models (VLMs) achieve remarkable performance but remain vulnerable to adversarial attacks. Entropy, a measure of model uncertainty, is highly correlated with VLM reliability.
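
To make the notion of token-level entropy concrete, the following is a minimal sketch, assuming the standard Shannon entropy of the softmax distribution over next-token logits. The abstract does not state the exact formula the authors use, and the helper names `token_entropy` and `high_entropy_positions`, as well as the `top_fraction` parameter, are hypothetical and chosen only for illustration.

```python
import math


def token_entropy(logits):
    """Shannon entropy (in nats) of softmax(logits) for one decoding step."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    probs = [e / z for e in exps]
    # Skip zero-probability entries, since p * log(p) -> 0 as p -> 0.
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def high_entropy_positions(per_token_logits, top_fraction=0.2):
    """Return indices of the most uncertain decoding steps, plus all entropies.

    top_fraction is an illustrative assumption, not a value from the paper.
    """
    entropies = [token_entropy(step) for step in per_token_logits]
    k = max(1, round(top_fraction * len(entropies)))
    ranked = sorted(range(len(entropies)), key=lambda i: entropies[i], reverse=True)
    return sorted(ranked[:k]), entropies


if __name__ == "__main__":
    # Toy logits over a 4-word vocabulary for three decoding steps.
    steps = [[10.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0], [5.0, 0.0, 0.0, 0.0]]
    positions, entropies = high_entropy_positions(steps, top_fraction=1 / 3)
    print(positions, [round(h, 4) for h in entropies])
```

A uniform distribution over the vocabulary gives the maximum entropy, log |V|, while a sharply peaked distribution gives a value near zero, so ranking decoding steps by this quantity is one plausible way to locate the positions where the model is least certain.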