Qwen 3.6 35B GGUF: NTP vs MTP quantization results across GPUs and CPUs

r/LocalLLaMA

Hey r/LocalLLaMA,

We’ve released our ByteShape Qwen 3.6 35B GGUF quantizations in two families: standard NTP (Next Token Prediction or non-MTP) and MTP.

Blog / Download NTP Models / Download MTP Models

TL;DR

- For NTP, “pick the largest quant that fits” worked surprisingly well. Lower bpw was not automatically better: our largest model was very hard to beat on quality/speed, including prompt processing and token generation.
- MTP gave a real GPU generation-speed boost, usually around 20-40%, but the extra memory footprint can change what fits.
- MTP speedup is heavily workload dependent.
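
To make the “pick the largest quant that fits” rule concrete, here is a minimal Python sketch of it, including how an extra MTP footprint can push you down a size. Everything in it is an assumption for illustration: the quant names, the sizes, the MTP overhead, and the `reserve_gib` headroom for KV cache and runtime buffers are placeholder numbers, not measurements from our release.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Quant:
    name: str                   # hypothetical label, not a real file name
    size_gib: float             # hypothetical model footprint in GiB
    mtp_extra_gib: float = 0.0  # hypothetical extra footprint when MTP is enabled


def pick_largest_that_fits(
    quants: Sequence[Quant],
    budget_gib: float,
    reserve_gib: float = 0.0,
    use_mtp: bool = False,
) -> Optional[Quant]:
    """Return the largest quant whose footprint fits in the usable memory.

    reserve_gib is headroom you set aside yourself (KV cache, buffers, etc.).
    Returns None when nothing fits.
    """
    usable = budget_gib - reserve_gib

    def footprint(q: Quant) -> float:
        return q.size_gib + (q.mtp_extra_gib if use_mtp else 0.0)

    fitting = [q for q in quants if footprint(q) <= usable]
    return max(fitting, key=lambda q: q.size_gib, default=None)


# Placeholder catalogue: all numbers are made up for the example.
QUANTS = [
    Quant("small", 12.0, mtp_extra_gib=1.5),
    Quant("medium", 16.0, mtp_extra_gib=1.5),
    Quant("large", 20.0, mtp_extra_gib=1.5),
]

ntp_choice = pick_largest_that_fits(QUANTS, budget_gib=24.0, reserve_gib=3.0)
mtp_choice = pick_largest_that_fits(QUANTS, budget_gib=24.0, reserve_gib=3.0, use_mtp=True)
print(ntp_choice.name, mtp_choice.name)
```

With these placeholder numbers the NTP pick is the largest quant, while turning MTP on drops the pick one size down because the extra footprint no longer fits. Whether that trade is worth it depends on your workload, since the MTP speedup is not constant.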