3.5× KV cache compression with +0.012 PPL (Mistral 7B, no retraining)
r/LocalLLaMA
•
Generative AI
AI Hardware
Open Source AI
AI Research
3.5× KV cache compression with minimal quality loss (+0.012 PPL on Mistral 7B). PR under review in NVIDIA kvpress - happy to share details or benchmarks. submitted by /u/Spirited-Toe-3988 [link] [comments]