AI RESEARCH

Knowledge Packs: Zero-Token Knowledge Delivery via KV Cache Injection

arXiv cs.CL

arXiv:2604.03270v1 Announce Type: new

RAG wastes tokens: retrieved text occupies the prompt on every query. We propose Knowledge Packs: pre-computed KV caches that deliver the same knowledge at zero token cost. For causal transformers, the KV cache from a forward pass on a text F is identical to the cache entries for F's positions in a joint pass on F+q, where q is the query. This follows directly from the causal mask, because no position in F can attend to any token of q. The equivalence is exact but fragile: incorrect chat template formatting causes a degradation of 6-7 percentage points, which we believe explains prior claims of KV outperforming RAG.
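The causal-mask argument can be checked numerically on a toy model. The sketch below is not the authors' implementation: it assumes a two-layer, single-head attention stack with residual connections, random weights, and no positional encoding or chat template, and all names (`causal_attention`, `forward`, `F`, `q`) are illustrative. It compares the per-layer keys and values computed from F alone against the first rows of those computed from F+q.

```python
import numpy as np


def causal_attention(x, wq, wk, wv):
    """Single-head causal self-attention; returns (output, K, V)."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(k.shape[-1])
    n = x.shape[0]
    # Mask out future positions: token i may only attend to tokens 0..i.
    future = np.triu(np.ones((n, n), dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, k, v


def forward(x, layers):
    """Run the toy stack and collect the (K, V) pair of every layer."""
    cache = []
    for wq, wk, wv in layers:
        out, k, v = causal_attention(x, wq, wk, wv)
        cache.append((k, v))
        x = x + out  # residual connection feeding the next layer
    return cache


rng = np.random.default_rng(0)
d = 8            # hidden size (toy value)
n_f, n_q = 5, 3  # lengths of the knowledge text F and the query q
layers = [
    tuple(rng.normal(size=(d, d)) * 0.3 for _ in range(3))
    for _ in range(2)
]
F = rng.normal(size=(n_f, d))  # stand-in embeddings for the text F
q = rng.normal(size=(n_q, d))  # stand-in embeddings for the query q

cache_F = forward(F, layers)
cache_joint = forward(np.vstack([F, q]), layers)

# The cache for F alone should equal the first n_f rows of the joint cache.
prefix_matches = all(
    np.allclose(k_f, k_j[:n_f]) and np.allclose(v_f, v_j[:n_f])
    for (k_f, v_f), (k_j, v_j) in zip(cache_F, cache_joint)
)
print("prefix cache matches joint pass:", prefix_matches)
```

The comparison uses `np.allclose` rather than exact equality because matrix products of different shapes may differ in the last floating-point bits. The sketch says nothing about the template sensitivity reported above, which concerns how F is formatted before the cache is computed.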