Accelerating Diffusion Large Language Models with SlowFast Sampling: The Three Golden Principles

arXiv:2506.10848v3 Announce Type: replace-cross Diffusion-based language models (dLLMs) have emerged as a promising alternative to traditional autoregressive LLMs by enabling parallel token generation and significantly reducing inference latency. However, existing sampling strategies for dLLMs, such as confidence-based or semi-autoregressive decoding, often suffer from static behavior, leading to suboptimal efficiency and limited flexibility.