AI RESEARCH

OSCAR: Offline Spectral Covariance-Aware Rotation for 2-bit KV Cache Quantization

arXiv cs.LG

arXiv:2605.17757v1 Announce Type: new

INT2 KV-cache quantization is attractive for long-context LLM serving, but it remains difficult to make both accurate and deployable. Simple rotations such as Hadamard transforms reduce outliers, but accuracy still degrades at INT2 because these rotations are not aligned with downstream attention. We propose OSCAR, an ultra-low-bit KV-cache quantization method that estimates attention-aware covariance structures offline and uses them to derive fixed rotations and clipping thresholds for quantization.
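To make the baseline concrete, the following is a minimal sketch of the rotate-then-quantize idea the abstract refers to: a fixed orthonormal Hadamard rotation followed by a symmetric 4-level (INT2) quantizer with a clipping threshold. It is not OSCAR itself, which per the abstract would replace the Hadamard matrix and the threshold with ones derived from offline, attention-aware covariance estimates. The function names, the toy vector, and the max-absolute-value clipping rule are all assumptions made for illustration.

```python
import math

def hadamard_rotate(x):
    """Orthonormal fast Walsh-Hadamard transform (length must be a power of two).
    With this normalization the transform is its own inverse."""
    n = len(x)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    y = list(x)
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = y[j], y[j + h]
                y[j], y[j + h] = a + b, a - b
        h *= 2
    scale = 1.0 / math.sqrt(n)
    return [v * scale for v in y]

def quantize_int2(x, clip):
    """Clamp to [-clip, clip] and map to one of four codes 0..3."""
    step = 2.0 * clip / 3.0
    return [int(round((max(-clip, min(clip, v)) + clip) / step)) for v in x]

def dequantize_int2(codes, clip):
    """Map codes 0..3 back to the four evenly spaced levels in [-clip, clip]."""
    step = 2.0 * clip / 3.0
    return [c * step - clip for c in codes]

def mse(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)

# Toy vector with one large outlier channel (illustrative data, not from the paper).
x = [8.0, 0.1, -0.2, 0.1, 0.0, -0.1, 0.2, 0.1]

# Quantize directly; the clip threshold here is simply max |x| (an assumption).
clip_plain = max(abs(v) for v in x)
x_plain = dequantize_int2(quantize_int2(x, clip_plain), clip_plain)

# Rotate, quantize, dequantize, then rotate back.
r = hadamard_rotate(x)
clip_rot = max(abs(v) for v in r)
codes_rot = quantize_int2(r, clip_rot)
x_rot = hadamard_rotate(dequantize_int2(codes_rot, clip_rot))

err_plain = mse(x, x_plain)
err_rotated = mse(x, x_rot)
print("MSE without rotation:", err_plain)
print("MSE with Hadamard rotation:", err_rotated)
```

On this toy input the rotation spreads the outlier's energy across all channels, so the four quantization levels cover a much narrower range and the reconstruction error drops. Because the rotation is fixed, it can be precomputed once, which is the deployability property the abstract emphasizes for offline-derived rotations.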