Here are my KV cache quantization benchmarks: TurboQuant is overrated but saved by TCQ, q5 deserves more attention, and symmetric q8 might be a waste of VRAM

r/LocalLLaMA

Greetings from TurboQuant's former biggest defender, now a middle-sized, niche-aware TurboQuant defender. Today I'm presenting the results of thoroughly exploring perplexity (PPL) and KL divergence (KLD) benchmarks on my single RTX 3090 using BeeLlama v0.1.2, with some backstory about unsuccessfully trying other tests and then re-exploring PPL and KLD even more thoroughly to compensate. Tests were done with Qwen 3.6 27B (Q5_K_S and IQ4_XS) at 64k and 128k context, so a decent model with decent quants at a decent context length.