AI RESEARCH
Beyond Token Eviction: Mixed-Dimension Budget Allocation for Efficient KV Cache Compression
arXiv cs.LG
arXiv:2603.20616v1 Announce Type: new Key-value (KV) caching is widely used to accelerate transformer inference, but its memory cost grows linearly with input length, limiting long-context deployment. Existing token eviction methods reduce memory by discarding less important tokens, which can be viewed as a coarse form of dimensionality reduction that assigns each token either zero dimensions or the full dimension.
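
The two claims above, linear memory growth and eviction as an all-or-nothing dimension assignment, can be illustrated with a minimal Python sketch. The model shape (32 layers, 8 KV heads, head dimension 128, 2-byte elements), the importance scores, and the helper names `kv_cache_bytes` and `eviction_dims` are all hypothetical choices made for illustration; the excerpt does not specify how importance is scored or which models are used.

```python
def kv_cache_bytes(seq_len, n_layers, n_kv_heads, head_dim, bytes_per_elem=2):
    """Approximate KV cache size in bytes.

    The leading factor 2 accounts for storing both keys and values.
    The result is linear in seq_len.
    """
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem


def eviction_dims(scores, keep, full_dim):
    """Token eviction viewed as per-token dimension allocation.

    The `keep` highest-scoring tokens retain all `full_dim` dimensions;
    every other token is assigned 0 dimensions (it is discarded).
    `scores` are assumed importance scores, one per token.
    """
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    kept = set(ranked[:keep])
    return [full_dim if i in kept else 0 for i in range(len(scores))]


# Hypothetical model shape: doubling the context doubles the cache.
small = kv_cache_bytes(seq_len=4096, n_layers=32, n_kv_heads=8, head_dim=128)
large = kv_cache_bytes(seq_len=8192, n_layers=32, n_kv_heads=8, head_dim=128)
print(small, large)  # 536870912 1073741824

# Four tokens with made-up importance scores; keep the top two.
dims = eviction_dims([0.9, 0.1, 0.5, 0.3], keep=2, full_dim=128)
print(dims)  # [128, 0, 128, 0]
```

Under this view, every entry of `dims` is either 0 or `full_dim`; a mixed-dimension scheme, as the title suggests, would allow intermediate values under the same total budget.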