
X-World: Controllable Ego-Centric Multi-Camera World Models for Scalable End-to-End Driving

arXiv CS.AI

arXiv:2603.19979v1 Announce Type: cross

Scalable and reliable evaluation is increasingly critical in the end-to-end era of autonomous driving, where vision-language-action (VLA) policies map raw sensor streams directly to driving actions. Yet current evaluation pipelines still rely heavily on real-world road testing, which is costly, limited in scenario coverage, and difficult to reproduce. These challenges motivate a real-world simulator that can generate realistic future observations under proposed actions while remaining controllable and stable over long horizons.