DepCap: Adaptive Block-Wise Parallel Decoding for Efficient Diffusion LM Inference

arXiv:2604.15750v1

Diffusion language models (DLMs) have emerged as a promising alternative to autoregressive language generation due to their potential for parallel decoding and global refinement of the entire sequence. To unlock this potential, DLM inference must carefully balance generation quality and decoding speed. Recent block-wise DLM decoding methods improve this trade-off by performing diffusion-based decoding sequentially in blocks.
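
As a rough illustration of the generic block-wise idea (not the DepCap method itself), the sketch below decodes a sequence one block at a time and, within each block, commits several masked positions in parallel per step. Everything here is an assumption made for illustration: `toy_predict` is a hypothetical stand-in for a DLM forward pass, the confidence scores are random, and the fixed `tokens_per_step` rule is a simple placeholder rather than an adaptive schedule.

```python
import random

MASK = None  # marks a position that has not been decoded yet


def toy_predict(seq, pos, rng):
    """Hypothetical stand-in for a DLM forward pass.

    Returns a (token, confidence) pair for one masked position.
    A real model would condition on `seq`; this toy version does not.
    """
    return f"t{pos}", rng.random()


def blockwise_decode(length, block_size, tokens_per_step, seed=0):
    """Decode blocks left to right; unmask positions in parallel inside a block."""
    rng = random.Random(seed)
    seq = [MASK] * length
    steps = 0
    for start in range(0, length, block_size):
        block = range(start, min(start + block_size, length))
        # Keep refining the current block until it has no masked positions.
        while any(seq[i] is MASK for i in block):
            masked = [i for i in block if seq[i] is MASK]
            preds = {i: toy_predict(seq, i, rng) for i in masked}
            # Commit the most confident predictions in this step (assumed rule).
            chosen = sorted(masked, key=lambda i: preds[i][1], reverse=True)
            for i in chosen[:tokens_per_step]:
                seq[i] = preds[i][0]
            steps += 1
    return seq, steps


if __name__ == "__main__":
    tokens, n_steps = blockwise_decode(length=8, block_size=4, tokens_per_step=2)
    print(tokens, n_steps)
```

In this toy setting the speed side of the trade-off is visible directly: committing more tokens per step reduces the number of model calls, while a real DLM would risk lower quality when many interdependent tokens are committed at once.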