OneDrive: Unified Multi-Paradigm Driving with Vision-Language-Action Models

arXiv:2604.17915v1 Announce Type: new Vision-Language Models (VLMs) excel at autoregressive text generation, yet end-to-end autonomous driving requires multi-task learning with structured outputs and heterogeneous decoding behaviors, such as autoregressive language generation, parallel object detection, and trajectory regression. To accommodate these differences, existing systems typically