AI RESEARCH
HeadQ: Model-Visible Distortion and Score-Space Correction for KV-Cache Quantization
arXiv CS.LG
arXiv:2605.03562v1. KV-cache quantizers usually optimize storage-space reconstruction error, even though attention reads keys through logits and values through an attention-weighted readout. We argue that persistent cache error should be measured in model-visible coordinates. For keys, the visible object is score error modulo constant shifts, since softmax is unchanged when the same constant is added to every logit; this yields HeadQ, a key-side method that stores a low-rank residual side code in a calibration-learned query basis and applies it as an additive logit correction.
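The idea of a score-space correction can be illustrated with a small NumPy sketch. It is not the paper's implementation: the uniform quantizer, the SVD-based basis, the unquantized side code, and every name below (`shift_invariant_error`, `uniform_quantize`, `learn_query_basis`, `side_code`) are assumptions chosen for illustration. The sketch measures logit error after removing the constant shift, projects the key residual onto a basis learned from calibration queries, and adds the resulting low-rank term back to the quantized logits.

```python
import numpy as np

def shift_invariant_error(s_ref, s_hat):
    # Logit error with its mean removed: a constant shift of all
    # logits does not change softmax, so it should not count as error.
    diff = s_hat - s_ref
    return np.linalg.norm(diff - diff.mean())

def uniform_quantize(x, bits=3):
    # Stand-in per-tensor uniform quantizer (an assumption, not HeadQ's).
    lo, hi = x.min(), x.max()
    scale = (hi - lo) / (2**bits - 1)
    return np.round((x - lo) / scale) * scale + lo

def learn_query_basis(q_cal, rank):
    # Top right-singular vectors of the calibration queries:
    # an orthonormal (d, rank) basis for the dominant query directions.
    _, _, vt = np.linalg.svd(q_cal, full_matrices=False)
    return vt[:rank].T

rng = np.random.default_rng(0)
n, d, r = 64, 16, 4

# Toy setup: queries live in an r-dimensional subspace of R^d.
mix = rng.standard_normal((r, d))
q_cal = rng.standard_normal((256, r)) @ mix

K = rng.standard_normal((n, d))      # full-precision keys
K_hat = uniform_quantize(K)          # quantized keys kept in the cache
U = learn_query_basis(q_cal, r)      # calibration-learned query basis
side_code = (K - K_hat) @ U          # low-rank residual code, shape (n, r)

q = rng.standard_normal(r) @ mix     # a new query from the same subspace
s_ref = K @ q                        # reference logits
s_quant = K_hat @ q                  # logits from quantized keys
s_corr = s_quant + side_code @ (U.T @ q)  # additive logit correction

err_quant = shift_invariant_error(s_ref, s_quant)
err_corr = shift_invariant_error(s_ref, s_corr)
print(f"quantized: {err_quant:.4f}  corrected: {err_corr:.2e}")
```

In this toy setup the correction is essentially exact because the test query lies in the span of the learned basis. With real queries, or with a quantized side code, only the part of the residual visible through the basis would be recovered.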