llama.cpp MTP Boost, New Gemma-4 GGUF, & Qwen 3.6 Local Benchmarks

Dev.to AI

Today's Highlights

The llama.cpp project gets a notable speedup now that Multi-Token Prediction (MTP) support has been merged into master, with up to 11.5% faster generation reported for 27B Qwen models. Meanwhile, a new Gemma-4 finetune optimized for creative writing has been released in GGUF format for Ollama. Qwen 3.6 models also show strong performance on the Terminal-Bench 2.0 leaderboard, outperforming Gemini 2.5 Pro in some local coding tasks.

MTP merged into llama.cpp (r/LocalLLaMA)
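The roundup does not explain how MTP produces its speedup. The sketch below is a rough, hypothetical illustration of greedy draft-and-verify (speculative) decoding. It assumes a cheap predictor, such as an MTP head, proposes a few tokens ahead, and the full model then checks them. This is not llama.cpp's implementation. The toy `target`/`draft` functions and the names `draft_tokens`, `verify`, and `generate` are invented for illustration.

```python
from typing import Callable, List, Tuple

Token = int
# Hypothetical stand-in for a model: given the context, return the greedy next token.
NextToken = Callable[[List[Token]], Token]


def draft_tokens(draft: NextToken, context: List[Token], k: int) -> List[Token]:
    """Propose k tokens autoregressively with the cheap drafter (assumed: an MTP head)."""
    proposal: List[Token] = []
    for _ in range(k):
        proposal.append(draft(context + proposal))
    return proposal


def verify(target: NextToken, context: List[Token], proposal: List[Token]) -> List[Token]:
    """Greedy verification: keep the longest prefix the target model agrees with.

    At the first mismatch, the target's own token replaces the rejected draft token.
    If every draft token is accepted, one bonus target token is appended.
    A real engine gets all of these target predictions from ONE batched
    forward pass. Here they are computed one by one for clarity.
    """
    accepted: List[Token] = []
    for tok in proposal:
        expected = target(context + accepted)
        if tok != expected:
            accepted.append(expected)  # correction token from the target model
            return accepted
        accepted.append(tok)
    accepted.append(target(context + accepted))  # bonus token after full acceptance
    return accepted


def generate(target: NextToken, draft: NextToken, prompt: List[Token],
             n_new: int, k: int = 2) -> Tuple[List[Token], int]:
    """Generate n_new tokens and count how many target verification passes were needed."""
    out = list(prompt)
    passes = 0
    while len(out) - len(prompt) < n_new:
        proposal = draft_tokens(draft, out, k)
        out.extend(verify(target, out, proposal))  # always adds at least one token
        passes += 1
    return out[: len(prompt) + n_new], passes


if __name__ == "__main__":
    # Toy models: the target counts upward mod 10; the drafter is wrong after a 4.
    target = lambda ctx: (ctx[-1] + 1) % 10
    draft = lambda ctx: 0 if ctx[-1] == 4 else (ctx[-1] + 1) % 10
    tokens, passes = generate(target, draft, [0], n_new=8, k=2)
    print(tokens, passes)  # [0, 1, 2, 3, 4, 5, 6, 7, 8] 3
```

In this toy run, 8 new tokens need 3 verification passes instead of 8. Every kept token is checked against the target model, so the output is identical to plain greedy decoding. Any real-world gain therefore depends on how often the drafted tokens are accepted and on how cheaply the verification pass can batch them.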