Qwen3.5-4B GGUF quants comparison (KLD vs speed) - Lunar Lake

r/LocalLLaMA

I wanted to know which type of quant is the best on this laptop (Intel 258V - iGPU 140V 18GB), so I tested all these small quants hoping that it generalizes to bigger models:

Winners in bold (KLD≤0.01)

| Uploader | Quant | tk/s | KLD | GB | KLD/GB* |
|---|---|---|---|---|---|
| mradermacher* | Q4_0 | 28.97 | 0.052659918 | 2.37 | 0.04593 |
| mradermacher_i1 | Q4_0 | 28.89 | 0.059171561 | 2.37 | 0.05162 |
| mradermacher_i1 | IQ3_XXS | 28.59 | 0.177140713 | 1.77 | 0.20736 |
| Unsloth | UD-IQ2_XXS | 28.47 | 0.573673327 | 1.42 | 0.83747 |
| Unsloth | Q4_0 | 28.3 | 0.053431218 | 2.41 | 0.04583 |
| Bartowski | Q4_0 | 28.28 | 0.049796789 | 2.45 | 0.04200 |
| mradermacher | Q4_K_S | 27.74 | 0.050305722 | 2.39 | 0.04350 |
| Unsloth | | | | | |
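If you want to re-slice the numbers with a different cutoff, here is a rough sketch of the "KLD threshold, then fastest first" selection. The names `QuantResult`, `RESULTS`, and `pick_winners` are made up for illustration. Only the complete rows listed above are included, and the KLD/GB* column is left out because its exact definition isn't given here.

```python
from typing import NamedTuple


class QuantResult(NamedTuple):
    uploader: str
    quant: str
    tok_per_s: float  # generation speed, tk/s
    kld: float        # KL divergence as reported in the table
    size_gb: float    # file size in GB


# Complete rows copied from the table; the truncated last row is omitted.
RESULTS = [
    QuantResult("mradermacher*", "Q4_0", 28.97, 0.052659918, 2.37),
    QuantResult("mradermacher_i1", "Q4_0", 28.89, 0.059171561, 2.37),
    QuantResult("mradermacher_i1", "IQ3_XXS", 28.59, 0.177140713, 1.77),
    QuantResult("Unsloth", "UD-IQ2_XXS", 28.47, 0.573673327, 1.42),
    QuantResult("Unsloth", "Q4_0", 28.3, 0.053431218, 2.41),
    QuantResult("Bartowski", "Q4_0", 28.28, 0.049796789, 2.45),
    QuantResult("mradermacher", "Q4_K_S", 27.74, 0.050305722, 2.39),
]


def pick_winners(results, max_kld=0.01):
    """Keep rows with KLD <= max_kld and sort them fastest first."""
    passing = [r for r in results if r.kld <= max_kld]
    return sorted(passing, key=lambda r: r.tok_per_s, reverse=True)


if __name__ == "__main__":
    # 0.06 is an arbitrary looser cutoff, chosen only to show some output.
    for r in pick_winners(RESULTS, max_kld=0.06):
        print(f"{r.uploader:16} {r.quant:10} {r.tok_per_s:6.2f} tk/s  KLD={r.kld:.4f}")
```

Note that none of the rows listed above reaches KLD≤0.01, so `pick_winners(RESULTS)` with the default cutoff returns an empty list for this subset.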