New Bartowski Gemma 4 quants are a lot slower?

r/LocalLLaMA

Bartowski has uploaded new quants for Gemma 4, and I've downloaded them for 26B and E4B. Compared to his original release, I'm getting about half the text generation speed (tg/s) and about 75% of the prompt processing speed (pp/s) on both models. Does anyone know what changed? I'm assuming the weights aren't the problem, but maybe the GGUF header now enables a llama.cpp feature that my hardware doesn't like? Thanks for any information!

submitted by /u/Top-Rub-4670
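One way to test the "something in the GGUF header changed" theory is to diff the old and new files directly. Below is a minimal sketch, assuming llama.cpp's `gguf` Python package (`pip install gguf`) and its `GGUFReader` class. It lists metadata keys present in only one file and any tensor whose quantization type differs. It does not compare metadata *values*, so a changed value under an existing key would not show up. The helper names (`load_gguf`, `diff_tensor_types`) are hypothetical, and the file paths are placeholders you pass on the command line.

```python
from __future__ import annotations

import sys
from collections import Counter


def diff_tensor_types(old: dict[str, str], new: dict[str, str]) -> dict[str, tuple[str | None, str | None]]:
    """Return {tensor_name: (old_type, new_type)} for tensors whose type
    differs or that exist in only one file (missing side is None)."""
    changed = {}
    for name in sorted(old.keys() | new.keys()):
        a, b = old.get(name), new.get(name)
        if a != b:
            changed[name] = (a, b)
    return changed


def load_gguf(path: str) -> tuple[set[str], dict[str, str]]:
    """Read metadata key names and per-tensor quant types from a GGUF file.
    Assumes the gguf-py package from llama.cpp; imported lazily so the
    pure diff helper above works without it."""
    from gguf import GGUFReader

    reader = GGUFReader(path)
    keys = set(reader.fields.keys())
    types = {t.name: getattr(t.tensor_type, "name", str(t.tensor_type)) for t in reader.tensors}
    return keys, types


def main(old_path: str, new_path: str) -> None:
    old_keys, old_types = load_gguf(old_path)
    new_keys, new_types = load_gguf(new_path)

    # Metadata keys that were added or dropped between the two uploads.
    for k in sorted(new_keys - old_keys):
        print("metadata key only in new:", k)
    for k in sorted(old_keys - new_keys):
        print("metadata key only in old:", k)

    # Tensors whose quantization type changed (e.g. Q4_K -> Q8_0).
    for name, (a, b) in diff_tensor_types(old_types, new_types).items():
        print(f"{name}: {a} -> {b}")

    # Overall mix of quant types in each file.
    print("old type counts:", dict(Counter(old_types.values())))
    print("new type counts:", dict(Counter(new_types.values())))


if __name__ == "__main__" and len(sys.argv) == 3:
    main(sys.argv[1], sys.argv[2])
```

Usage would be something like `python gguf_diff.py old.gguf new.gguf`. If the tensor types match and no keys changed, that would point away from the files and toward the llama.cpp build or runtime flags instead.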