AI RESEARCH
The Residual Stream Is All You Need: On the Redundancy of the KV Cache in Transformer Inference
arXiv CS.AI
arXiv:2603.19664v1 Announce Type: cross The key-value (KV) cache is widely treated as essential state in transformer inference, and a large body of work engineers policies to compress, evict, or approximate its entries. We prove that this state is entirely redundant: keys and values at every layer are deterministic projections of the residual stream, and recomputing them from a single residual vector per token incurs exactly zero reconstruction error; the result is bit-identical, not merely approximate. We verify this across six models from four architecture families (135M to 4B parameters).
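The central claim, that keys and values are deterministic functions of the per-token residual vector, can be illustrated with a small dependency-free sketch. This is a toy single-head setup, not the paper's implementation: the names `rms_norm`, `project`, and `keys_and_values`, the dimensions, and the random weights are all hypothetical, and positional encodings are omitted. It only shows that storing residuals and re-running the same projection reproduces the cached keys and values exactly.

```python
import math
import random

def rms_norm(x, eps=1e-6):
    # Gain-free RMS normalization applied before the projections (toy assumption).
    scale = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v / scale for v in x]

def project(x, weight):
    # Row vector times matrix; weight has shape (d_model, d_head).
    return [sum(x[i] * weight[i][j] for i in range(len(x)))
            for j in range(len(weight[0]))]

def keys_and_values(residual, w_k, w_v):
    # K and V depend only on this token's residual vector and fixed weights.
    h = rms_norm(residual)
    return project(h, w_k), project(h, w_v)

rng = random.Random(0)
d_model, d_head, n_tokens = 8, 4, 5  # hypothetical toy sizes
w_k = [[rng.gauss(0, 1) for _ in range(d_head)] for _ in range(d_model)]
w_v = [[rng.gauss(0, 1) for _ in range(d_head)] for _ in range(d_model)]
residuals = [[rng.gauss(0, 1) for _ in range(d_model)] for _ in range(n_tokens)]

# Conventional path: compute K/V once per token and keep them in a cache.
kv_cache = [keys_and_values(r, w_k, w_v) for r in residuals]

# Alternative path: keep only the residuals and recompute K/V on demand.
recomputed = [keys_and_values(r, w_k, w_v) for r in residuals]

# Exact float equality, not approximate closeness.
bit_identical = kv_cache == recomputed
print("bit-identical:", bit_identical)
```

The equality holds here because the same operations run in the same order on the same inputs. In a real inference stack, reproducing that condition would presumably require the recomputation to use the same kernels, precision, and reduction order as the original forward pass.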