Gemma 4 on Llama.cpp should be stable now

r/LocalLLaMA

With the latest fix merged, all known Gemma 4 issues in Llama.cpp should now be resolved. I've been running Gemma 4 31B on Q5 quants for some time now with no issues.

Runtime hints:

- Remember to run with `--chat-template-file` pointing at the interleaved template Aldehir has prepared (it's in the llama.cpp code under models/templates).
- I strongly encourage running with `--cache-ram 2048 -ctxcp 2` to avoid system RAM problems.
- Running the KV cache with Q5 K and Q4 V has shown no large performance degradation.
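For anyone who wants the hints above in one place, here is a rough sketch of how they might be combined into a single `llama-server` launch. Treat it as an illustration, not a tested recipe: the model filename and the template path are placeholders I made up (check models/templates in your checkout for the real file name), and `q5_0` / `q4_0` are my assumption for what "Q5 K and Q4 V" maps to in `-ctk` / `-ctv`. `--chat-template-file` is the llama.cpp option that takes a path to a template file. The script only prints the command unless you set `RUN=1`.

```shell
#!/bin/sh
# Placeholder paths (hypothetical): point these at your own files.
MODEL="${MODEL:-gemma-4-31b-Q5_K_M.gguf}"
TEMPLATE="${TEMPLATE:-models/templates/CHANGE-ME.jinja}"

# Collect the arguments as positional parameters so quoting is preserved.
set -- \
  -m "$MODEL" \
  --chat-template-file "$TEMPLATE" \
  --cache-ram 2048 \
  -ctxcp 2 \
  -ctk q5_0 \
  -ctv q4_0   # assumed cache types for "Q5 K and Q4 V"

# Print the command for review.
CMD="llama-server $*"
echo "$CMD"

# Only launch when explicitly asked to.
if [ "${RUN:-0}" = "1" ]; then
  exec llama-server "$@"
fi
```

Usage: `MODEL=/path/to/model.gguf TEMPLATE=/path/to/template.jinja RUN=1 sh run-gemma.sh` (the script name is arbitrary). Printing first makes it easy to sanity-check the flags before committing RAM to a 31B model.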