KLD measurements of 8 different llama.cpp KV cache quantizations over several 8-12B models
r/LocalLLaMA
A couple of weeks ago I was wondering about the impact of KV cache quantization, so I looked for PPL or KLD measurements but didn't find anything extensive. I ran some of my own, and these are the results.

Models included: Qwen3.5 9B, Qwen3 VL 8B, Gemma 3 12B, Ministral 3 8B, Irix 12B (Mistral Nemo)

Disclaimers

I am very GPU poor, with a meager 6 GB of VRAM. Therefore, all logits were generated with already quantized models (in this case they're all IQ4_XS) so that I could actually run them.
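For anyone unfamiliar with what a KLD number means here, below is a minimal sketch of the metric itself, not the actual tooling used for these runs. It assumes you have two sets of per-token logits for the same text: a baseline (assumed here to be the same IQ4_XS model with an unquantized KV cache) and a run with a quantized KV cache. The function names and the toy logits are made up for illustration.

```python
import math

def log_softmax(logits):
    # Numerically stable log-softmax over one token position.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def token_kld(base_logits, test_logits):
    # KL(P_base || P_test) in nats for a single token position.
    log_p = log_softmax(base_logits)
    log_q = log_softmax(test_logits)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))

def mean_kld(base_rows, test_rows):
    # Average the per-token KLD over all evaluated positions.
    if len(base_rows) != len(test_rows):
        raise ValueError("both runs must cover the same token positions")
    values = [token_kld(b, t) for b, t in zip(base_rows, test_rows)]
    return sum(values) / len(values)

# Toy example with a 4-entry "vocabulary" (hypothetical numbers).
baseline = [[2.0, 1.0, 0.1, -1.0], [0.5, 0.5, 3.0, 0.0]]
quantized_kv = [[1.9, 1.1, 0.1, -1.0], [0.4, 0.6, 2.8, 0.1]]
print(mean_kld(baseline, quantized_kv))
```

A value of 0 means the quantized-KV run predicts exactly the same token distribution as the baseline; larger values mean it has drifted further. Since the baseline in this sketch is itself a quantized model, the number only isolates the effect of the KV cache, not the weight quantization.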