M5 Max 128GB, 17 models, 23 prompts: Qwen 3.5 122B is still a local king
r/LocalLLaMA
The last Llama (Scout/Maverick) was released a year ago. Since then, US-based releases have been super rare: Granite 3.3, GPT-OSS 20B & 120B, Nemotron 3 Nano / Super, and now Gemma 4. That can't even compare to the solid Chinese open-model output of Qwens, DeepSeeks, Kimis, MiniMaxes, GLMs, MiMos, Seeds, etc. Gemma 4 is like a breath of fresh air. Not just the model itself, but the rollout, the beauty, the innovation: K=V in global attention, Per-Layer Embeddings, tri-modal minis (E4B, E2B), etc.

Most of my local LLM usage used to be via rented GPUs: Google Cloud, AWS, etc.