From 2:4 to 8:16 sparsity patterns in LLMs for Outliers and Weights with Variance Correction

arXiv:2507.03052v2 Announce Type: replace As large language models (LLMs) grow in size, efficient compression techniques such as quantization and sparsification become critical. While quantization maintains performance at reduced precision, structured sparsity methods such as N:M sparsification often fall short because of their limited flexibility and sensitivity to outlier weights. We explore semi-structured sparsity, demonstrating its ability to surpass the Performance Threshold, the point at which a compressed model matches the accuracy of its uncompressed or smaller counterpart under equivalent memory constraints.
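
To make the flexibility argument concrete, the following minimal sketch compares a 2:4 mask with an 8:16 mask on the same 16 weights. It assumes plain magnitude-based selection, which is only an illustrative stand-in and not necessarily the selection criterion used in the paper; the names `nm_mask` and `patterns_per_block` and the example weights are hypothetical.

```python
from math import comb


def nm_mask(weights, n, m):
    """Keep the n largest-magnitude entries in each consecutive group of m.

    Assumption: magnitude pruning, used here purely for illustration.
    """
    if len(weights) % m != 0:
        raise ValueError("number of weights must be a multiple of m")
    mask = []
    for start in range(0, len(weights), m):
        group = weights[start:start + m]
        # Indices of the n largest |w| in this group (ties broken by position).
        keep = set(sorted(range(m), key=lambda i: -abs(group[i]))[:n])
        mask.extend(1 if i in keep else 0 for i in range(m))
    return mask


def patterns_per_block(n, m, block=16):
    """Count the distinct masks an n:m pattern allows over `block` weights."""
    return comb(m, n) ** (block // m)


# Hypothetical weights: the four largest values all sit in the first group of 4.
weights = [0.9, -0.8, 0.7, 0.6, 0.01, 0.02, -0.03, 0.04,
           0.5, 0.05, -0.4, 0.06, 0.07, 0.3, 0.08, -0.2]

mask_2_4 = nm_mask(weights, 2, 4)    # 50% sparsity, 2 kept per group of 4
mask_8_16 = nm_mask(weights, 8, 16)  # 50% sparsity, 8 kept per group of 16

print(mask_2_4)   # [1, 1, 0, 0, 0, 0, 1, 1, 1, 0, 1, 0, 0, 1, 0, 1]
print(mask_8_16)  # [1, 1, 1, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1]
print(patterns_per_block(2, 4), patterns_per_block(8, 16))  # 1296 12870
```

Both masks keep 8 of the 16 weights, but 2:4 is forced to drop 0.7 and 0.6 while keeping the near-zero values -0.03 and 0.04, whereas 8:16 keeps the eight largest magnitudes overall. Over a block of 16 weights, 8:16 admits 12870 distinct masks against 1296 for 2:4, which is one way to quantify the added flexibility.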