New Bartowski Gemma 4 quants are a lot slower?
r/LocalLLaMA
Bartowski has uploaded new quants for Gemma 4. I've downloaded the 26B and E4B ones. Compared to his original release, I'm getting about half the token generation speed (tg, t/s) on both of them, and about 75% of the prompt processing speed (pp, t/s). Does anyone know what changed? I'm assuming the weights themselves aren't the problem, but maybe the GGUF header now enables a llama.cpp feature that my hardware dislikes? Thanks for any information!

submitted by /u/Top-Rub-4670
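One way to test the header theory would be to diff the metadata key/value pairs of the old and new files. Below is a minimal sketch using only the Python standard library. It assumes the GGUF v2/v3 layout as I understand it from the ggml GGUF spec (little-endian, 4-byte magic, uint32 version, uint64 tensor count, uint64 key/value count, then typed key/value pairs). The helper names (`read_metadata`, `diff_metadata`) are made up for illustration and are not part of llama.cpp or any library.

```python
import struct
import sys
from typing import Any, BinaryIO, Dict, Tuple

# GGUF scalar value type id -> struct format (little-endian assumed).
_SCALAR_FORMATS = {
    0: "<B", 1: "<b", 2: "<H", 3: "<h", 4: "<I", 5: "<i",
    6: "<f", 7: "<?", 10: "<Q", 11: "<q", 12: "<d",
}
_TYPE_STRING = 8
_TYPE_ARRAY = 9


def _read_scalar(f: BinaryIO, fmt: str) -> Any:
    size = struct.calcsize(fmt)
    data = f.read(size)
    if len(data) != size:
        raise ValueError("unexpected end of file while reading metadata")
    return struct.unpack(fmt, data)[0]


def _read_string(f: BinaryIO) -> str:
    # Strings are a uint64 byte length followed by UTF-8 bytes.
    length = _read_scalar(f, "<Q")
    data = f.read(length)
    if len(data) != length:
        raise ValueError("unexpected end of file while reading a string")
    return data.decode("utf-8", errors="replace")


def _read_value(f: BinaryIO, value_type: int) -> Any:
    if value_type in _SCALAR_FORMATS:
        return _read_scalar(f, _SCALAR_FORMATS[value_type])
    if value_type == _TYPE_STRING:
        return _read_string(f)
    if value_type == _TYPE_ARRAY:
        # Arrays are: element type (uint32), count (uint64), then elements.
        element_type = _read_scalar(f, "<I")
        count = _read_scalar(f, "<Q")
        return [_read_value(f, element_type) for _ in range(count)]
    raise ValueError(f"unknown GGUF value type {value_type}")


def read_metadata(f: BinaryIO) -> Dict[str, Any]:
    """Read only the metadata key/value section; tensor data is not touched."""
    if f.read(4) != b"GGUF":
        raise ValueError("not a GGUF file")
    version = _read_scalar(f, "<I")
    if version < 2:
        raise ValueError("GGUF v1 uses 32-bit counts; not handled in this sketch")
    _read_scalar(f, "<Q")  # tensor count, unused here
    kv_count = _read_scalar(f, "<Q")
    metadata: Dict[str, Any] = {}
    for _ in range(kv_count):
        key = _read_string(f)
        value_type = _read_scalar(f, "<I")
        metadata[key] = _read_value(f, value_type)
    return metadata


def diff_metadata(old: Dict[str, Any], new: Dict[str, Any]) -> Dict[str, Tuple[Any, Any]]:
    """Return {key: (old_value, new_value)} for keys that differ; None means absent."""
    changed = {}
    for key in sorted(set(old) | set(new)):
        a, b = old.get(key), new.get(key)
        if a != b:
            changed[key] = (a, b)
    return changed


def _short(value: Any, limit: int = 80) -> str:
    text = repr(value)
    return text if len(text) <= limit else text[:limit] + "..."


if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1], "rb") as fa, open(sys.argv[2], "rb") as fb:
        differences = diff_metadata(read_metadata(fa), read_metadata(fb))
    for key, (a, b) in differences.items():
        print(f"{key}: {_short(a)} -> {_short(b)}")
```

Saved under a name of your choosing, the script takes the old file's path and the new file's path as its two arguments. If nothing relevant differs, the header is probably not the cause. The slowdown could instead come from the per-tensor quantization types, which this sketch does not inspect.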