UniDriveVLA: Unifying Understanding, Perception, and Action Planning for Autonomous Driving

arXiv:2604.02190v1

Vision-Language-Action (VLA) models have recently emerged in autonomous driving, with the promise of leveraging rich world knowledge to improve the cognitive capabilities of driving systems. However, adapting such models to driving tasks currently faces a critical dilemma between spatial perception and semantic reasoning.