TAD: Temporal-Aware Trajectory Self-Distillation for Fast and Accurate Diffusion LLM

arXiv:2605.09536v1

Diffusion large language models (dLLMs) offer a promising paradigm for parallel text generation, but in practice they face an accuracy-parallelism trade-off: increasing the number of tokens per forward pass (TPF) often degrades generation quality. Existing acceleration methods typically gain speed at the cost of accuracy. To address this limitation, we propose TAD, a Temporal-Aware trajectory self-Distillation framework.
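To make the TPF metric concrete, here is a minimal sketch of how it might be computed from a decoding trace. TPF is the total number of generated tokens divided by the number of model forward passes. A higher TPF means fewer forward passes for the same output, which is where the speedup comes from. The `DecodeTrace` structure and `tokens_per_forward` helper are hypothetical illustrations, not part of TAD. The sketch says nothing about generation quality or about how TAD itself works.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DecodeTrace:
    # Hypothetical trace format: number of tokens committed at each forward pass.
    tokens_per_step: List[int]


def tokens_per_forward(trace: DecodeTrace) -> float:
    """Return TPF: total generated tokens divided by the number of forward passes."""
    steps = len(trace.tokens_per_step)
    if steps == 0:
        raise ValueError("trace has no forward passes")
    return sum(trace.tokens_per_step) / steps


# Autoregressive-style decoding: exactly one token per forward pass.
ar = DecodeTrace([1] * 8)
# Parallel decoding of the same 8 tokens in only 2 forward passes.
parallel = DecodeTrace([5, 3])

print(tokens_per_forward(ar))        # 1.0
print(tokens_per_forward(parallel))  # 4.0
```

Both traces produce 8 tokens. The parallel trace needs a quarter of the forward passes, so its TPF is 4.0. The trade-off described above is that pushing TPF higher in this way tends to hurt output quality.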