AI RESEARCH
The Illusion of Equivalence: Systematic FP16 Divergence in KV-Cached Autoregressive Inference
arXiv CS.AI
•
ArXi:2604.15409v1 Announce Type: cross KV caching is a ubiquitous optimization in autoregressive transformer inference, long presumed to be numerically equivalent to cache-free computation. This assumption fails under standard FP16 precision: cache-ON and cache-OFF execution paths employ different floating-point accumulation orderings which, due to FP16 non-associativity, produce a deterministic divergence in decoded token sequences.