DEX-AR: A Dynamic Explainability Method for Autoregressive Vision-Language Models

arXiv:2603.06302v1 Announce Type: cross As Vision-Language Models (VLMs) become increasingly sophisticated and widely used, it becomes crucial to understand their decision-making process. Traditional explainability methods, designed for classification tasks, struggle with modern autoregressive VLMs because of their token-by-token generation process and the intricate interactions between visual and textual modalities.