Gemma 4 31B passed 7/8 real-world production tests — including ones I designed to make it fail. Full prompts + outputs.

I've been waiting for a capable free local LLM for a while. I think we're close - the quality is getting there fast, and Gemma 4 is the first open-weight model I've genuinely considered using in production for simple-to-medium tasks. To test that instinct, I ran both Gemma 4 variants (31B Dense and 26B A4B MoE) through 8 real-world tasks - not benchmarks, but actual prompts I'd use at work.