AI RESEARCH

DyLLM: Efficient Diffusion LLM Inference via Saliency-based Token Selection and Partial Attention

arXiv cs.CL

arXiv:2603.08026v1 Announce Type: new

Masked Diffusion Language Models (MDLMs) enable parallel token decoding, providing a promising alternative to the sequential nature of autoregressive generation. However, their iterative denoising process remains computationally expensive because it repeatedly processes the entire sequence at every step. We observe that across these diffusion steps, most token representations remain stable; only a small subset, which we term salient tokens, contributes meaningfully to the next update. Leveraging this temporal sparsity, we present DyLLM, a