AI RESEARCH
OScaR: The Occam's Razor for Extreme KV Cache Quantization in LLMs and Beyond
arXiv CS.LG
arXiv:2605.19660v1 Announce Type: new The rapid advancement toward long-context reasoning and multi-modal intelligence has made the Key-Value (KV) cache a dominant memory bottleneck for efficient deployment. While established per-channel quantization effectively accommodates intrinsic channel-wise outliers in Key tensors, its efficacy diminishes under extreme compression. In this work, we revisit the inherent limitations of the per-channel quantization paradigm from both empirical and theoretical perspectives.
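
To make the per-channel baseline concrete, the following is a minimal sketch of plain asymmetric uniform quantization applied to a toy Key matrix. It is not the OScaR method, which this abstract does not describe. The matrix shape, the choice of channel 3 as the outlier channel, its offset of 40, and all function names are assumptions made for illustration only.

```python
import random


def quantize_groups(groups, bits):
    """Asymmetric uniform quantize-dequantize; each group has its own scale and zero point."""
    levels = (1 << bits) - 1
    out = []
    for g in groups:
        lo, hi = min(g), max(g)
        scale = (hi - lo) / levels
        if scale == 0.0:
            scale = 1.0  # constant group: any scale reproduces it exactly
        out.append([round((x - lo) / scale) * scale + lo for x in g])
    return out


def transpose(matrix):
    return [list(col) for col in zip(*matrix)]


def mse(a, b):
    count = len(a) * len(a[0])
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)) / count


def per_token_error(keys, bits):
    # One quantization group per row (token).
    return mse(keys, quantize_groups(keys, bits))


def per_channel_error(keys, bits):
    # One quantization group per column (channel).
    dequantized = transpose(quantize_groups(transpose(keys), bits))
    return mse(keys, dequantized)


# Toy Key matrix: rows are tokens, columns are channels (hypothetical sizes).
random.seed(0)
TOKENS, CHANNELS = 64, 16
keys = [[random.gauss(0.0, 1.0) for _ in range(CHANNELS)] for _ in range(TOKENS)]
for row in keys:
    row[3] += 40.0  # assumed outlier channel: consistently large across all tokens

for bits in (4, 2):
    print(bits, "bits:",
          "per-token MSE =", per_token_error(keys, bits),
          "per-channel MSE =", per_channel_error(keys, bits))
```

In this toy setting, grouping by channel keeps the outlier channel from stretching the quantization range of every token, which is why per-channel grouping suits Key tensors. Rerunning at 2 bits shows the per-channel error growing substantially, which is the extreme-compression regime the abstract is concerned with.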