Structured Labeling Enables Faster Vision-Language Models for End-to-End Autonomous Driving

arXiv:2506.05442v2 Announce Type: replace Vision-Language Models (VLMs) offer a promising approach to end-to-end autonomous driving due to their human-like reasoning capabilities. However, troublesome gaps remain between current VLMs and real-world autonomous driving applications. One major limitation is that existing datasets with loosely formatted language descriptions are not machine-friendly and may