Strix Halo Llama.cpp MTP Benchmarks: 27B Gets Much Faster, 35B Is Mixed

r/LocalLLaMA

TL;DR: All models were Qwen3.6.

27B-MTP vs Base 27B (15k single-turn): Faster overall
- Total Time (wall): 87.44s → 77.39s (10.05s faster / -11.50%)
- Generation: 7.63 → 16.15 t/s (+111.77% speedup)
- Prompt Processing: 279.75 → 244.90 t/s (-12.46% slowdown)

35B-MTP vs Base 35B (15k single-turn): Slower overall
- Total Time (wall): 20.83s → 23.16s (2.33s slower / +11.17%)
- Generation: 48.18 → 56.12 t/s (+16.47% speedup)
- Prompt Processing: 972.18 → 811.90 t/s (-16.49% slowdown)

27B-MTP vs Base 27B (5-turn chat, ~28.5k context): Massive time savings
- Total Time (wall): 258.65s → 200.55s (58.10s.
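Why the 35B comes out "mixed": MTP speeds up generation but slows prompt processing, so it only wins once enough tokens are generated to pay back the extra prompt time. Below is a minimal Python sketch of that trade-off. It assumes wall time is roughly prompt_tokens / PP rate + generated_tokens / generation rate (ignoring model load, sampling overhead and variation in draft acceptance), and it takes 15,000 prompt tokens as a stand-in for "15k". The post does not report generated-token counts. All function and variable names are hypothetical illustrations, not llama.cpp APIs.

```python
import math

# Assumption: "15k" prompt taken as exactly 15,000 tokens.
PROMPT_TOKENS = 15_000

# Throughput figures (tokens/second) copied from the TL;DR above.
RESULTS = {
    "27B": {"pp_base": 279.75, "pp_mtp": 244.90, "gen_base": 7.63, "gen_mtp": 16.15},
    "35B": {"pp_base": 972.18, "pp_mtp": 811.90, "gen_base": 48.18, "gen_mtp": 56.12},
}


def pct_change(before: float, after: float) -> float:
    """Percent change from `before` to `after`, rounded to two decimals."""
    return round((after - before) / before * 100, 2)


def modeled_wall_time(prompt_tokens: float, gen_tokens: float,
                      pp_rate: float, gen_rate: float) -> float:
    """Simplified model: prompt-processing time plus generation time, in seconds."""
    return prompt_tokens / pp_rate + gen_tokens / gen_rate


def break_even_gen_tokens(prompt_tokens: float, pp_base: float, pp_mtp: float,
                          gen_base: float, gen_mtp: float) -> float:
    """Generated-token count above which MTP is faster under the simplified model."""
    # Extra seconds MTP spends in its slower prompt-processing phase.
    extra_pp_seconds = prompt_tokens * (1 / pp_mtp - 1 / pp_base)
    # Seconds MTP saves on each generated token.
    saved_per_token = 1 / gen_base - 1 / gen_mtp
    if extra_pp_seconds <= 0:
        return 0.0  # MTP is no slower at prompt processing, so it always wins
    if saved_per_token <= 0:
        return math.inf  # MTP never recovers its prompt-processing cost
    return extra_pp_seconds / saved_per_token


if __name__ == "__main__":
    for name, r in RESULTS.items():
        print(f"{name}: PP {pct_change(r['pp_base'], r['pp_mtp'])}%, "
              f"gen {pct_change(r['gen_base'], r['gen_mtp'])}%")
        tokens = break_even_gen_tokens(PROMPT_TOKENS, **r)
        print(f"  MTP breaks even after ~{tokens:.0f} generated tokens")
```

Under this model the 27B recovers its slower prompt processing after roughly 110 generated tokens, while the 35B needs roughly 1,000. That is why a short reply on a long prompt can finish later with MTP on the 35B even though generation is faster. Recomputing percentages from the rounded rates can differ slightly from the post (e.g. +111.66% vs +111.77% for 27B generation), likely because the post used unrounded measurements.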