AI RESEARCH

Make Each Token Count: Towards Improving Long-Context Performance with KV Cache Eviction

arXiv cs.LG

arXiv:2605.09649v1 Announce Type: new

The key-value (KV) cache is a major bottleneck in long-context inference, where memory and computation grow with sequence length. Existing KV eviction methods reduce this cost but typically degrade performance relative to full-cache inference. Our key insight is that full-cache attention is not always optimal: in long contexts, irrelevant tokens can dilute attention away from useful evidence, so selective, learnable eviction can improve generation rather than merely approximate the full cache. We.
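The dilution argument can be illustrated with a toy softmax calculation. The sketch below is not the paper's method: the excerpt does not describe how the learnable eviction policy works, so a plain top-k rule over attention logits stands in for it. The function names (`softmax`, `evict_lowest`, `evidence_weight`) and the logit values are assumptions chosen only for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of attention logits."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def evict_lowest(scores, keep):
    """Return indices of the `keep` highest-scoring cache entries.

    Hypothetical stand-in for a learned eviction policy: a real policy
    would not have access to the query's logits ahead of time.
    """
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:keep])


def evidence_weight(scores, kept_indices, evidence_index=0):
    """Attention weight on the evidence token, given the kept cache entries."""
    kept_scores = [scores[i] for i in kept_indices]
    weights = softmax(kept_scores)
    return weights[kept_indices.index(evidence_index)]


# One useful token (logit 2.0) followed by 1000 irrelevant tokens (logit 0.0).
scores = [2.0] + [0.0] * 1000

# Full cache: every token competes for attention mass.
full = evidence_weight(scores, list(range(len(scores))))

# Evicted cache: keep only the 11 highest-scoring entries.
kept = evict_lowest(scores, keep=11)
pruned = evidence_weight(scores, kept)

print(f"full cache:    {full:.4f}")
print(f"evicted cache: {pruned:.4f}")
```

With these assumed logits, the evidence token receives under 1% of the attention mass with the full cache and over 40% after eviction, even though its own logit is unchanged. The many low-scoring tokens collectively absorb most of the softmax mass, which is the effect the abstract describes.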