PSD: Pushing the Pareto Frontier of Diffusion LLMs via Parallel Speculative Decoding
arXiv CS.CL
arXiv:2605.15609v1
Diffusion large language models (dLLMs) generate text by iteratively denoising masked token sequences. Although dLLMs can predict all masked positions in parallel within each step, the large number of denoising iterations still makes inference expensive. This cost can be reduced spatially by unmasking multiple tokens per step, or temporally by collapsing multiple denoising steps into one verification call. We propose Parallel Speculative Decoding (PSD), a.
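To make the spatial axis concrete, here is a minimal toy sketch of iterative masked denoising in which the number of tokens unmasked per step is a parameter. It is not the PSD method, which this excerpt does not describe, and it does not cover the temporal (verification) axis. The names `toy_predict`, `denoise`, and `tokens_per_step` are hypothetical, and the "model" is a random stub standing in for a real dLLM forward pass.

```python
import random

MASK = None  # placeholder for a still-masked position


def toy_predict(seq, rng):
    """Hypothetical stand-in for one dLLM forward pass.

    Returns a (token, confidence) guess for every masked position,
    mimicking the fact that all masked positions are predicted in parallel.
    """
    return {i: (f"tok{i}", rng.random()) for i, t in enumerate(seq) if t is MASK}


def denoise(length, tokens_per_step, seed=0):
    """Unmask a fully masked sequence, committing the most confident
    `tokens_per_step` predictions at each denoising step."""
    rng = random.Random(seed)
    seq = [MASK] * length
    steps = 0
    while any(t is MASK for t in seq):
        preds = toy_predict(seq, rng)
        # Keep only the highest-confidence predictions for this step.
        best = sorted(preds, key=lambda i: preds[i][1], reverse=True)[:tokens_per_step]
        for i in best:
            seq[i] = preds[i][0]
        steps += 1
    return seq, steps
```

In this toy setting, unmasking one token per step needs as many forward passes as there are positions, while unmasking `k` tokens per step needs roughly `length / k` passes (rounded up). A real dLLM trades that saving against output quality, since less confident tokens get committed earlier.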