Probing Visual Concepts in Lightweight Vision-Language Models for Automated Driving

arXiv:2603.06054v1 Announce Type: cross The use of Vision-Language Models (VLMs) in automated driving applications is becoming increasingly common, with the aim of leveraging their reasoning and generalisation capabilities to handle long-tail scenarios. However, these models often fail on simple visual questions that are highly relevant to automated driving, and the reasons behind these failures remain poorly understood.