AI RESEARCH
FibQuant: Universal Vector Quantization for Random-Access KV-Cache Compression
arXiv CS.AI
arXiv:2605.11478v1 Announce Type: new Long-context inference is increasingly a memory-traffic problem. The culprit is the key-value (KV) cache: it grows with context length, batch size, layers, and heads, and it is read at every decoding step. Rotation-based scalar codecs meet this systems constraint by storing a norm, applying a shared random rotation, and quantizing one coordinate at a time. They are universal and random-access, but they discard the geometry created by the normalization step.
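
The rotation-based scalar baseline described above can be sketched roughly as follows. This is a minimal illustration, not the paper's method or any particular library's codec: the function names (`random_rotation`, `encode`, `decode`), the Gram-Schmidt construction of the rotation, and the uniform grid over [-1, 1] are all assumptions chosen for brevity. Practical codecs typically use faster structured rotations and scalar quantizers tuned to the coordinate distribution.

```python
import math
import random


def random_rotation(dim, seed=0):
    """Build a dim x dim orthogonal matrix (orthonormal rows) by
    Gram-Schmidt on Gaussian vectors. Hypothetical helper; the seed
    makes the rotation shared between encoder and decoder."""
    rng = random.Random(seed)
    rows = []
    while len(rows) < dim:
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        for r in rows:  # remove components along earlier rows
            dot = sum(a * b for a, b in zip(v, r))
            v = [a - dot * b for a, b in zip(v, r)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > 1e-9:  # skip degenerate draws
            rows.append([a / norm for a in v])
    return rows


def encode(x, rot, bits=4):
    """Store the norm, rotate the unit vector, quantize each coordinate."""
    norm = math.sqrt(sum(a * a for a in x))
    if norm == 0.0:
        return 0.0, [0] * len(x)
    unit = [a / norm for a in x]
    y = [sum(r * u for r, u in zip(row, unit)) for row in rot]
    levels = 2 ** bits
    # Assumed quantizer: uniform grid on [-1, 1], one code per coordinate.
    codes = [max(0, min(levels - 1, int((c + 1.0) / 2.0 * levels))) for c in y]
    return norm, codes


def decode(norm, codes, rot, bits=4):
    """Map codes to cell centers, undo the rotation, rescale by the norm."""
    levels = 2 ** bits
    dim = len(codes)
    y = [(k + 0.5) / levels * 2.0 - 1.0 for k in codes]
    # The inverse of an orthogonal matrix is its transpose.
    unit = [sum(rot[i][j] * y[i] for i in range(dim)) for j in range(dim)]
    return [norm * u for u in unit]
```

Each vector is encoded and decoded using only its own norm, its own codes, and the shared rotation, which is what makes the scheme random-access. The sketch also shows the limitation the abstract points to: after normalization the rotated vector lies on the unit sphere, yet every coordinate is quantized independently on the same grid, so that constraint is never exploited.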