IceCache: Memory-efficient KV-cache Management for Long-Sequence LLMs

arXiv:2604.10539v1

The key-value (KV) cache plays a crucial role in accelerating inference in large language models (LLMs): it stores intermediate attention states and so avoids redundant computation during autoregressive generation. However, its memory footprint scales linearly with sequence length, often leading to severe memory bottlenecks on resource-constrained hardware.
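To make the linear scaling concrete, here is a rough back-of-the-envelope sketch. It assumes a standard transformer decoder that caches one key and one value vector per token, per layer, per KV head. The function name `kv_cache_bytes` and the example configuration (32 layers, 32 KV heads, head dimension 128, 16-bit elements) are illustrative assumptions and are not taken from the paper.

```python
def kv_cache_bytes(seq_len, num_layers, num_kv_heads, head_dim,
                   bytes_per_elem=2, batch_size=1):
    """Estimate the size of an uncompressed KV cache in bytes.

    Assumes a standard decoder layout: each layer stores one key tensor
    and one value tensor of shape (batch, kv_heads, seq_len, head_dim).
    """
    tensors_per_layer = 2  # one for keys, one for values
    return (tensors_per_layer * batch_size * num_layers * num_kv_heads
            * head_dim * seq_len * bytes_per_elem)


# Hypothetical model configuration, chosen only for illustration.
config = dict(num_layers=32, num_kv_heads=32, head_dim=128, bytes_per_elem=2)

for seq_len in (4096, 32768, 131072):
    gib = kv_cache_bytes(seq_len, **config) / 1024**3
    print(f"{seq_len:>7} tokens: {gib:.1f} GiB")
```

Under these assumptions each token costs 0.5 MiB of cache, so the total grows in direct proportion to sequence length: doubling the context doubles the memory required.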