AI RESEARCH

Rethinking KV Cache Eviction via a Unified Information-Theoretic Objective

arXiv CS.AI

arXiv:2604.25975v1 Announce Type: cross

Key-value (KV) caching is essential for large language model inference, yet its memory overhead poses a critical bottleneck for long-context generation. Existing eviction policies predominantly rely on empirical heuristics, lacking a rigorous theoretical foundation. This work rethinks KV cache eviction through the lens of the Information Bottleneck principle. Under a linear-Gaussian surrogate of attention, we derive a closed-form mutual information objective that characterizes the effective information capacity of a retained KV cache subset.
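
To give a rough sense of what a closed-form mutual information score over a retained subset can look like, the sketch below uses the textbook linear-Gaussian channel: a latent vector x ~ N(0, I) is observed through each retained key k_i as k_i·x plus Gaussian noise, which gives I(x; z_S) = ½ log det(I + Σ_{i∈S} k_i k_iᵀ / σ²). This is an assumption made purely for illustration and is not the objective derived in the paper; the function names, the `noise_var` parameter, and the greedy selection loop are all hypothetical.

```python
import math


def log_det_spd(matrix):
    """Log-determinant of a symmetric positive-definite matrix via Cholesky."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    log_det = 0.0
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - s)
                log_det += 2.0 * math.log(lower[i][j])
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return log_det


def gaussian_mutual_information(keys, subset, noise_var=1.0):
    """Illustrative score: 0.5 * log det(I + sum_{i in subset} k_i k_i^T / noise_var).

    Assumed textbook linear-Gaussian model, NOT the paper's derived objective.
    """
    d = len(keys[0])
    info = [[1.0 if r == c else 0.0 for c in range(d)] for r in range(d)]
    for i in subset:
        k = keys[i]
        for r in range(d):
            for c in range(d):
                info[r][c] += k[r] * k[c] / noise_var
    return 0.5 * log_det_spd(info)


def greedy_retain(keys, budget, noise_var=1.0):
    """Hypothetical greedy eviction: keep the `budget` keys that add the most information."""
    retained = []
    remaining = set(range(len(keys)))
    while len(retained) < budget and remaining:
        best = max(
            sorted(remaining),
            key=lambda i: gaussian_mutual_information(keys, retained + [i], noise_var),
        )
        retained.append(best)
        remaining.remove(best)
    return retained


# Keys 0 and 1 are nearly parallel; key 2 points in a new direction.
keys = [[2.0, 0.0], [1.9, 0.0], [0.0, 1.0]]
print(greedy_retain(keys, budget=2))  # → [0, 2]
```

In this toy example the second-largest key (index 1) is evicted because it is almost redundant with index 0, whereas a magnitude-only heuristic would keep it. That is the kind of behavior an information-based score is meant to capture, under the stated assumptions.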