AI RESEARCH

OSC: Hardware Efficient W4A4 Quantization via Outlier Separation in Channel Dimension

arXiv CS.AI

arXiv:2604.12782v1 Announce Type: cross While 4-bit quantization is essential for high-throughput deployment of Large Language Models, activation outliers often lead to significant accuracy degradation due to the restricted dynamic range of low-bit formats. In this paper, we systematically investigate the spatial distribution of outliers and demonstrate a token-persistent structural clustering effect, where high-magnitude outliers consistently occupy fixed channels across tokens. Building on this insight, we propose OSC, a hardware-efficient framework for outlier suppression.
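
The observation in the abstract can be illustrated with a toy sketch. The Python below is not the paper's OSC implementation, whose details the abstract does not give. It builds synthetic activations in which two channels carry large values in every token, finds channels that are outliers in most tokens, and compares 4-bit quantization error with and without quantizing those channels as a separate group. All names (`find_persistent_outlier_channels`, `quantize_int4`, `demo`), the thresholds `ratio` and `persistence`, and the synthetic data are assumptions made for illustration.

```python
import random


def find_persistent_outlier_channels(acts, ratio=8.0, persistence=0.9):
    """Return channels whose magnitude exceeds `ratio` times the token's
    median magnitude in at least a `persistence` fraction of tokens.
    Both thresholds are illustrative assumptions, not values from the paper."""
    n_tokens, n_channels = len(acts), len(acts[0])
    hits = [0] * n_channels
    for row in acts:
        mags = sorted(abs(v) for v in row)
        median = mags[len(mags) // 2]
        for c, v in enumerate(row):
            if abs(v) > ratio * median:
                hits[c] += 1
    return [c for c in range(n_channels) if hits[c] >= persistence * n_tokens]


def quantize_int4(values):
    """Symmetric 4-bit fake quantization of one group: round to integers
    in [-7, 7] using a single shared scale, then rescale to floats."""
    max_abs = max((abs(v) for v in values), default=0.0)
    if max_abs == 0.0:
        return list(values)
    scale = max_abs / 7.0
    return [max(-7, min(7, round(v / scale))) * scale for v in values]


def mean_sq_error(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def demo(n_tokens=32, n_channels=16, outlier_channels=(3, 11), seed=0):
    """Synthetic data: unit Gaussians, with fixed channels set to about +/-50."""
    rng = random.Random(seed)
    acts = []
    for _ in range(n_tokens):
        row = [rng.gauss(0.0, 1.0) for _ in range(n_channels)]
        for c in outlier_channels:
            row[c] = rng.choice((-1.0, 1.0)) * (50.0 + rng.gauss(0.0, 1.0))
        acts.append(row)

    found = find_persistent_outlier_channels(acts)
    found_set = set(found)
    err_joint = err_split = 0.0
    for row in acts:
        # Baseline: one scale per token, dominated by the outliers.
        joint = quantize_int4(row)
        # Separation: outlier channels and normal channels get separate scales.
        normal = [v for c, v in enumerate(row) if c not in found_set]
        outliers = [v for c, v in enumerate(row) if c in found_set]
        q_normal = iter(quantize_int4(normal))
        q_outlier = iter(quantize_int4(outliers))
        split = [next(q_outlier) if c in found_set else next(q_normal)
                 for c in range(n_channels)]
        err_joint += mean_sq_error(row, joint)
        err_split += mean_sq_error(row, split)
    return found, err_joint / n_tokens, err_split / n_tokens


channels, joint_mse, split_mse = demo()
print("persistent outlier channels:", channels)
print("MSE, one scale per token:   ", joint_mse)
print("MSE, outliers separated:    ", split_mse)
```

Because the outliers sit in the same channels for every token, the channel set can be found once and reused, so the separation is a fixed partition of the channel dimension rather than a per-token search. In this sketch the shared scale is set by the outliers, which pushes most ordinary values to zero in the 4-bit grid; giving the ordinary channels their own scale avoids that.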