AI RESEARCH
Stopping Computation for Converged Tokens in Masked Diffusion-LM Decoding
arXiv cs.LG
arXiv:2602.06412v3 Announce Type: replace-cross Masked Diffusion Language Models generate sequences via iterative sampling that progressively unmasks tokens. However, they still recompute the attention and feed-forward blocks for every token position at every step, even when many already-unmasked tokens have essentially stopped changing, which wastes substantial compute.
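
To make the cost described above concrete, here is a minimal toy sketch of a baseline iterative-unmasking loop that counts how many per-position evaluations land on positions that are already unmasked. It is not the paper's method or model: `toy_denoiser`, `decode`, the confidence-based unmasking order, and the assumption that an unmasked token is never revised are all hypothetical choices made for illustration. The count is only a rough upper bound on skippable work, since the abstract says many, not all, unmasked tokens are essentially fixed.

```python
import random

MASK = None  # illustrative placeholder for a masked position


def toy_denoiser(tokens, rng):
    """Hypothetical stand-in for the model: one (token, confidence) guess per position."""
    return [(rng.randrange(100), rng.random()) for _ in tokens]


def decode(seq_len=16, steps=4, seed=0):
    """Baseline loop: every step evaluates every position, masked or not."""
    rng = random.Random(seed)
    tokens = [MASK] * seq_len
    per_step = seq_len // steps   # positions unmasked at each step
    total_evals = 0               # per-position evaluations performed
    evals_on_unmasked = 0         # of those, spent on already-unmasked positions

    for _ in range(steps):
        # The stand-in model runs over the full sequence at every step.
        preds = toy_denoiser(tokens, rng)
        total_evals += seq_len
        evals_on_unmasked += sum(t is not MASK for t in tokens)

        # Unmask the most confident still-masked positions (assumed schedule).
        masked = [i for i, t in enumerate(tokens) if t is MASK]
        masked.sort(key=lambda i: preds[i][1], reverse=True)
        for i in masked[:per_step]:
            tokens[i] = preds[i][0]

    return tokens, total_evals, evals_on_unmasked


if __name__ == "__main__":
    _, total, on_unmasked = decode()
    print(f"{on_unmasked} of {total} evaluations were on already-unmasked positions")
```

With the default 16 positions and 4 steps, 24 of the 64 per-position evaluations fall on already-unmasked positions, and that share grows as the number of steps increases.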