Help, my LLM isn't LLMing

r/LocalLLaMA

Long story short, for some reason Q4 and Q6 seem to be taking the same amount of RAM on my MacBook Air M2 16GB, and they also generate at the same speed. I'm a beginner with little knowledge about this, and I hope some kind souls here can save me.

Here are some stats.

Models: unsloth Qwen3.5 9B UD-Q4_K_XL (5.97 GB) and unsloth Qwen3.5 9B Q6_K (7.46 GB).

Sampling: temp 0.8, top-k 40, top-p 0.95. These, along with all the other settings, are the llama.cpp defaults.

I ran sudo purge every time before switching to the next model, closed all windows except Terminal and Activity Monitor, and made sure there was no swapping.
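
As a rough sanity check on the gap I expected to see, here is a minimal Python sketch that uses only the two file sizes quoted above. It assumes RAM usage roughly tracks the model file size, which is my assumption and may not match how llama.cpp actually loads models. The names `SIZES_GB`, `expected_gap_gb`, and `size_ratio` are made up for illustration.

```python
# File sizes in GB, taken from the two model listings in this post.
SIZES_GB = {"UD-Q4_K_XL": 5.97, "Q6_K": 7.46}


def expected_gap_gb(sizes):
    """Difference between the largest and smallest model file, in GB.

    Assumption: RAM usage roughly follows file size.
    """
    return max(sizes.values()) - min(sizes.values())


def size_ratio(sizes):
    """How many times larger the biggest file is than the smallest."""
    return max(sizes.values()) / min(sizes.values())


if __name__ == "__main__":
    print(f"expected RAM gap: ~{expected_gap_gb(SIZES_GB):.2f} GB")
    print(f"size ratio: ~{size_ratio(SIZES_GB):.2f}x")
```

So under that assumption I would expect Q6_K to use about 1.5 GB more than Q4, which is not what Activity Monitor shows.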