AI RESEARCH

Beyond Token Eviction: Mixed-Dimension Budget Allocation for Efficient KV Cache Compression

arXiv cs.LG

arXiv:2603.20616v1 Announce Type: new

Key-value (KV) caching is widely used to accelerate transformer inference, but its memory cost grows linearly with input length, which limits long-context deployment. Existing token eviction methods reduce memory by discarding less important tokens. This can be viewed as a coarse form of dimensionality reduction in which each token is assigned either zero dimensions or the full dimension.
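The two claims above can be made concrete with a minimal sketch. It is an illustration under stated assumptions, not the paper's method: the model shape, the importance scores, and the helper names `kv_cache_bytes` and `eviction_dims` are all hypothetical. It shows that cache size scales linearly with sequence length, and that top-k token eviction amounts to giving each token a dimension budget of either 0 or the full head dimension.

```python
def kv_cache_bytes(seq_len, n_layers, n_kv_heads, head_dim, bytes_per_elem=2):
    """Bytes needed to store keys and values for one sequence."""
    # The factor 2 covers the key tensor and the value tensor.
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem


def eviction_dims(scores, keep, head_dim):
    """Token eviction viewed as a per-token dimension budget: 0 or head_dim."""
    # Rank tokens by descending score; ties are broken by position.
    order = sorted(range(len(scores)), key=lambda i: (-scores[i], i))
    kept = set(order[:keep])
    return [head_dim if i in kept else 0 for i in range(len(scores))]


# Hypothetical model shape, chosen only for illustration (fp16 elements).
small = kv_cache_bytes(seq_len=1024, n_layers=32, n_kv_heads=8, head_dim=128)
large = kv_cache_bytes(seq_len=4096, n_layers=32, n_kv_heads=8, head_dim=128)
print(large // small)  # 4x the tokens costs 4x the memory

# Hypothetical importance scores for six tokens; keep the top three.
dims = eviction_dims([0.9, 0.1, 0.4, 0.05, 0.7, 0.2], keep=3, head_dim=128)
print(dims)  # every entry is either 0 or 128, never anything in between
```

In this view, a finer-grained scheme would allow budgets between 0 and `head_dim` per token. How such budgets are chosen is not specified in the text above, so the sketch stops at the binary case.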