Beyond Multi-Token Prediction: Pretraining LLMs with Future Summaries

arXiv:2510.14751v2

Next-token prediction (NTP) has driven the success of large language models (LLMs), but it struggles with long-horizon reasoning, planning, and creative writing, with these limitations largely attributed to teacher-forced