Gemma 4 - lazy model or am I crazy? (bit of a rant)

r/LocalLLaMA

Like it says in the title. Specifically, the 26b MoE. I’ve wanted to like this model so much. Thought it might replace Qwen 3.5 27b. I keep coming back to it and trying it every time there’s an update, hoping it will have improved. I’m running unsloth UD_Q4_K_XL on llama.cpp, on the latest commits from main. I know about --jinja. I know about the interleaved thinking template. I’m not running a low-quant KV cache. This is far from the first model I’ve run. Every time, my tests show the same thing: it is a very lazy model when it comes to using skills or searching the web.