How Well Do Vision-Language Models Understand Sequential Driving Scenes? A Sensitivity Study

arXiv:2604.06750v1 Announce Type: new Vision-Language Models (VLMs) are increasingly proposed for autonomous driving tasks, yet their performance on sequential driving scenes remains poorly characterized, particularly regarding how input configurations affect their capabilities. We