MaskPro: Linear-Space Probabilistic Learning for Strict (N:M)-Sparsity on LLMs

ArXi:2506.12876v2 Announce Type: replace The rapid scaling of large language models~(LLMs) has made inference efficiency a primary bottleneck in the practical deployment. To address this, semi-structured sparsity offers a promising solution by strategically retaining $N$ elements out of every $M$ weights, thereby enabling hardware-friendly acceleration and reduced memory.