GraphPilot: Grounded Scene Graph Conditioning for Language-Based Autonomous Driving

arXiv:2511.11266v3 Announce Type: replace

Vision-language models have recently emerged as promising planners for autonomous driving, where success hinges on topology-aware reasoning over spatial structure and dynamic interactions from multimodal input. However, existing models are typically trained without supervision that explicitly encodes these relational dependencies, limiting their ability to infer how agents and other traffic entities influence one another from raw sensor data.