Gemma 4 for 16 GB VRAM

I think the 26B A4B MoE model is the best choice for 16 GB of VRAM. I tested many quantizations, but if you want to keep vision, I think the best one currently is: (I tested the bartowski variants too, but the unsloth one reasons better for its size.)

It does need some parameter tweaking for the best performance, especially for coding:

--temp 0.3 --top-p 0.9 --min-p 0.1 --top-k 20

With temp and top-k kept low and min-p a little high, it performs very well. So far I have had no issues, and it performs very close to the model hosted on AI Studio. For vision, use the mmproj-F16.gguf.
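
If it helps, here is a rough sketch of how those settings could be put together into a launch command. It assumes you are running llama.cpp's llama-server (the post does not say which runtime was used), and the model filename is a placeholder, since the exact quant file is not named above. The helper name build_command is made up for illustration.

```python
import shlex

# Sampling values taken from the post.
SAMPLING = {"--temp": 0.3, "--top-p": 0.9, "--min-p": 0.1, "--top-k": 20}


def build_command(model_path, mmproj_path=None, binary="llama-server"):
    """Return an argument list for the given model (hypothetical helper).

    Assumes llama.cpp's llama-server binary; -m and --mmproj are its
    flags for the model file and the vision projector file.
    """
    cmd = [binary, "-m", model_path]
    if mmproj_path is not None:
        # Only needed if you want vision input.
        cmd += ["--mmproj", mmproj_path]
    for flag, value in SAMPLING.items():
        cmd += [flag, str(value)]
    return cmd


if __name__ == "__main__":
    # "model.gguf" is a placeholder path, not the actual quant file.
    cmd = build_command("model.gguf", "mmproj-F16.gguf")
    print(shlex.join(cmd))
```

Running the script only prints the command so you can check it before launching anything. Drop the second argument if you do not need vision.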