AI RESEARCH
LoSA: Locality Aware Sparse Attention for Block-Wise Diffusion Language Models
arXiv CS.LG
arXiv:2604.12056v1 Announce Type: cross Block-wise diffusion language models (DLMs) generate multiple tokens in any order, offering a promising alternative to the autoregressive decoding pipeline. However, they remain bottlenecked by memory-bound attention in long-context scenarios. Naive sparse attention fails on DLMs because of a KV Inflation problem: different queries select different prefix positions, so the union of accessed KV pages becomes large.
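The KV Inflation effect can be illustrated with a toy simulation. This is only a sketch under loud assumptions, not the paper's method: page scores are drawn independently at random per query (real attention scores are correlated, so this is close to a worst case), and the names `topk_pages` and `union_of_selected_pages`, the page count, and the budget `k` are all hypothetical.

```python
import random


def topk_pages(scores, k):
    """Return the indices of the k highest-scoring KV pages for one query."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return set(order[:k])


def union_of_selected_pages(num_queries, num_pages, k, seed=0):
    """Union of KV pages touched when every query in a block picks its own top-k."""
    rng = random.Random(seed)
    touched = set()
    for _ in range(num_queries):
        # Assumption: independent random scores stand in for per-query page relevance.
        scores = [rng.random() for _ in range(num_pages)]
        touched |= topk_pages(scores, k)
    return touched


# One query (autoregressive-style decode step) vs. a block of 32 queries.
single = union_of_selected_pages(num_queries=1, num_pages=256, k=16)
block = union_of_selected_pages(num_queries=32, num_pages=256, k=16)
print(f"pages loaded for 1 query:    {len(single)}")
print(f"pages loaded for 32 queries: {len(block)}")
```

Each query stays within its budget of `k` pages, but the pages that must actually be fetched for the block are the union across queries, which can approach the full prefix when selections disagree.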