Gemma-4 26B-A4B + Opencode on M5 MacBook is *actually good*

TL;DR: a 32GB M5 MacBook Air can run gemma-4-26B-A4B-it-UD-IQ4_XS at 300 t/s prompt processing (PP) and 12 t/s generation. That's in low power mode, where it uses 8W, making it the first laptop I've used that doesn't get warm and noisy whilst running LLMs. Fast prompt processing + short thinking traces + can actually handle agentic behaviour = Opencode is actually usable from my laptop!

Previously I've been running LLMs off my M1 Max 64GB. Whilst it's been good enough for tinkering and toy use cases, it's never really been great for running anything that requires longer context, i.e.