Toward Consistent World Models with Multi-Token Prediction and Latent Semantic Enhancement

arXiv:2604.06155v1 Announce Type: cross Whether Large Language Models (LLMs) develop coherent internal world models remains a core debate. While conventional Next-Token Prediction (NTP) focuses on one-step-ahead supervision, Multi-Token Prediction (MTP) has shown promise in learning structured representations. In this work, we provide a theoretical perspective analyzing the gradient inductive bias of MTP, supported by empirical evidence, showing that MTP promotes convergence toward internal belief states by inducing representational contractivity via gradient coupling.
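
To make the contrast between one-step-ahead and multi-token supervision concrete, the following is a minimal sketch of one common way an MTP objective is written: position t carries k output heads, and head j is scored against the token j + 1 steps ahead. The abstract does not specify the paper's exact loss, so the function names (`log_softmax`, `mtp_loss`), the `head_logits[t][j]` layout, and the uniform averaging over heads are all assumptions made for illustration. Setting k = 1 recovers the ordinary NTP cross-entropy.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def mtp_loss(head_logits, tokens, k):
    """Average cross-entropy over k future tokens per position.

    Hypothetical layout (an assumption, not taken from the paper):
    head_logits[t][j] holds the logits emitted at position t for the
    token at position t + 1 + j. With k = 1 this is plain NTP.
    """
    total, count = 0.0, 0
    for t in range(len(tokens)):
        for j in range(k):
            target_idx = t + 1 + j
            if target_idx >= len(tokens):
                break  # no target this far ahead near the sequence end
            log_probs = log_softmax(head_logits[t][j])
            total -= log_probs[tokens[target_idx]]
            count += 1
    return total / count if count else 0.0


# Usage: a toy sequence over a 3-token vocabulary with uninformative heads.
tokens = [0, 1, 2, 1]
vocab_size, k = 3, 2
uniform = [[[0.0] * vocab_size for _ in range(k)] for _ in tokens]
ntp = mtp_loss(uniform, tokens, k=1)  # one-step-ahead supervision only
mtp = mtp_loss(uniform, tokens, k=2)  # also supervises two steps ahead
print(ntp, mtp)
```

In this sketch every head reads the same position t, so in a real model with a shared hidden state the gradients of the k per-head terms would all flow into that one representation. That shared dependence is the kind of gradient coupling the abstract refers to, though the paper's actual analysis may be set up differently.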