KLD measurements of 8 different llama.cpp KV cache quantizations over several 8-12B models

r/LocalLLaMA

A couple of weeks ago I was wondering about the impact of KV cache quantization, so I looked for PPL (perplexity) or KLD (KL divergence) measurements but didn't find anything extensive. I ran some of my own, and these are the results.

Models included: Qwen3.5 9B, Qwen3 VL 8B, Gemma 3 12B, Ministral 3 8B, and Irix 12B (Mistral Nemo).

Disclaimers

I am very GPU poor, with a meager 6 GB of VRAM. Therefore, all logits were generated with already-quantized models (in this case they're all IQ4_XS) so that I could actually run them.
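For anyone unfamiliar with KLD as a quality metric, here is a minimal sketch of how a per-token KL divergence between two sets of logits can be computed. It assumes you have, for each evaluated token position, the full-vocabulary logits from a baseline run (for example, the same IQ4_XS model with an unquantized f16 KV cache) and from a test run with a quantized KV cache. The function names (`log_softmax`, `token_kld`, `mean_kld`) are hypothetical helpers for illustration, not part of llama.cpp, and this is not necessarily how the numbers in this post were produced.

```python
import math
from typing import Sequence


def log_softmax(logits: Sequence[float]) -> list[float]:
    """Numerically stable log-softmax over one vocabulary-sized logit vector."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def token_kld(base_logits: Sequence[float], test_logits: Sequence[float]) -> float:
    """KL(P_base || P_test) for a single token position, in nats.

    P_base comes from the baseline run (hypothetically: f16 KV cache),
    P_test from the run with a quantized KV cache.
    """
    if len(base_logits) != len(test_logits):
        raise ValueError("logit vectors must cover the same vocabulary")
    log_p = log_softmax(base_logits)
    log_q = log_softmax(test_logits)
    # sum_i p_i * (log p_i - log q_i)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


def mean_kld(base_runs: Sequence[Sequence[float]],
             test_runs: Sequence[Sequence[float]]) -> float:
    """Average per-token KLD over all evaluated positions."""
    if len(base_runs) != len(test_runs) or not base_runs:
        raise ValueError("need the same, non-zero number of token positions")
    klds = [token_kld(b, t) for b, t in zip(base_runs, test_runs)]
    return sum(klds) / len(klds)


if __name__ == "__main__":
    base = [[1.0, 2.0, 3.0], [0.5, 0.5, 4.0]]
    test = [[1.1, 1.9, 3.0], [0.5, 0.7, 3.8]]
    print(f"mean KLD: {mean_kld(base, test):.6f} nats")
```

KLD is 0 when the quantized-cache run reproduces the baseline distribution exactly and grows as the two distributions diverge. Unlike PPL, it compares against the baseline model's own predictions rather than against the reference text, which makes it more sensitive to small changes like KV cache quantization.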