Learning Rollout from Sampling: An R1-Style Tokenized Traffic Simulation Model

arXiv:2603.24989v1 Announce Type: cross Learning diverse and high-fidelity traffic simulations from human driving demonstrations is crucial for autonomous driving evaluation. The recent next-token prediction (NTP) paradigm, widely adopted in large language models (LLMs), has been applied to traffic simulation and achieves iterative improvements via supervised fine-tuning (SFT). However, such methods limit active exploration of potentially valuable motion tokens, particularly in suboptimal regions.