How Much Cache Does Reasoning Need? Depth-Cache Tradeoffs in KV-Compressed Transformers
arXiv cs.LG
arXiv:2604.17935v1 Announce Type: new The key-value (KV) cache is the dominant memory bottleneck during Transformer inference, yet little is known theoretically about how aggressively it can be compressed before multi-step reasoning degrades. We study this question through $k$-hop pointer chasing on $n$ tokens under a shared KV cache of size $s$, attention dimension $m$, $H$ heads, $p$-bit precision, and a locality-respecting cache controller (a condition satisfied by all standard KV-compression methods). We give three results. (1) Product depth lower bound (conjectured.