Transactional Attention: Semantic Sponsorship for KV-Cache Retention

arXiv CS.LG

arXiv:2604.11288v1 Announce Type: cross

At K=16 tokens (0.4% of a 4K context), every existing KV-cache compression method achieves 0% on credential retrieval. The failure mode is dormant tokens: credentials, API keys, and configuration values that receive near-zero attention but become essential at generation time. Because these tokens lack the statistical signals that eviction policies rely on, no method based on attention scores, reconstruction loss, or learned retention gates retains them. We.
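The dormant-token failure mode can be illustrated with a toy sketch. This is not the paper's method or benchmark: the score distribution, the credential positions, and the helper `evict_by_attention` are all assumptions made up for illustration. It also assumes "4K" means 4096 tokens. The sketch shows that a policy keeping only the K highest-scoring positions drops tokens whose accumulated attention is near zero, and checks the budget arithmetic (16 / 4096 is roughly 0.4%).

```python
import random


def evict_by_attention(scores, k):
    """Keep the k positions with the highest accumulated attention score.

    Hypothetical stand-in for an attention-score-based eviction policy.
    """
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return set(ranked[:k])


CONTEXT_LEN = 4096  # assumption: "4K context" means 4096 tokens
BUDGET = 16         # K = 16 retained tokens, as in the abstract

rng = random.Random(0)

# Toy accumulated attention: ordinary tokens get a score in [0.1, 1.0].
scores = [rng.uniform(0.1, 1.0) for _ in range(CONTEXT_LEN)]

# Assumed positions of a credential; these "dormant" tokens receive
# near-zero attention until they are needed at generation time.
credential_positions = list(range(1000, 1008))
for p in credential_positions:
    scores[p] = 1e-6

kept = evict_by_attention(scores, BUDGET)
retained = [p for p in credential_positions if p in kept]
budget_fraction = BUDGET / CONTEXT_LEN

print(f"budget: {budget_fraction:.2%} of context")
print(f"credential tokens retained: {len(retained)} of {len(credential_positions)}")
```

In this toy setting the credential is lost entirely, because its scores never compete with those of ordinary tokens. Any policy that ranks purely on such a signal behaves the same way, regardless of how the scores are accumulated.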