Descriptor: Distance-Annotated Traffic Perception Question Answering (DTPQA)

arXiv:2511.13397v2 Announce Type: replace

The remarkable progress of Vision-Language Models (VLMs) on a variety of tasks has raised interest in their application to automated driving. However, for these models to be trusted in such a safety-critical domain, they must first possess robust perception capabilities, i.e., they must be able to understand a traffic scene, which is often highly complex, with many events unfolding simultaneously.