VLM-AutoDrive: Post-Training Vision-Language Models for Safety-Critical Autonomous Driving Events

arXiv:2603.18178v1 Announce Type: cross The rapid growth of ego-centric dashcam footage presents a major challenge for detecting safety-critical events such as collisions and near-collisions, scenarios that are brief, rare, and difficult for generic vision models to capture. While multimodal large language models (MLLMs) demonstrate strong general reasoning ability, they underperform in driving contexts due to domain and temporal misalignment. VLM-AutoDrive offers a scalable recipe for adapting general-purpose VLMs to safety-critical, temporally localized perception tasks.