Probing the Reliability of Driving VLMs: From Inconsistent Responses to Grounded Temporal Reasoning

arXiv:2603.09512v1 Announce Type: new
A reliable driving assistant should provide consistent responses based on temporally grounded reasoning derived from observed information. In this work, we investigate whether Vision-Language Models (VLMs), when applied as driving assistants, can respond consistently and understand how present observations shape future outcomes, or whether their outputs merely reflect patterns memorized during