Got MTP + TurboQuant running: Qwen3.6-27B at 80+ t/s with 262K context on a single RTX 4090

r/LocalLLaMA

I've been messing around trying to get MTP working alongside TBQ4_0 (TurboQuant's lossless 4.25 bpw KV cache) on Qwen3.6-27B for my own use. After a day of vibecoding I think I have something viable: it went from about 43 t/s when I first got it compiling to 80-87 t/s after optimizing, with MTP draft acceptance around 73%.
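
For anyone who wants to sanity-check those numbers, here is a rough back-of-envelope sketch in Python. It only does arithmetic on the figures quoted above and measures nothing. The FP16 baseline (16 bits per value), the draft length of 1, and the assumption that each draft token is accepted independently are mine, not something taken from the actual build, and the function names are made up for illustration.

```python
# Back-of-envelope arithmetic on the figures quoted in the post.
# Assumptions (not from the actual build): FP16 baseline KV cache,
# independent per-token draft acceptance, hypothetical draft length.


def speedup(before_tps: float, after_tps: float) -> float:
    """Throughput ratio between two tokens-per-second figures."""
    return after_tps / before_tps


def kv_compression(bits_per_value: float, baseline_bits: float = 16.0) -> float:
    """How many times smaller the KV cache is than an assumed baseline."""
    return baseline_bits / bits_per_value


def expected_tokens_per_step(acceptance: float, draft_len: int = 1) -> float:
    """Expected tokens emitted per verification pass.

    One token always comes from the main model; draft token k only counts
    if all k drafts before and including it were accepted, which under the
    independence assumption has probability acceptance ** k.
    """
    return 1.0 + sum(acceptance ** k for k in range(1, draft_len + 1))


if __name__ == "__main__":
    print(f"speedup, low end:  {speedup(43, 80):.2f}x")
    print(f"speedup, high end: {speedup(43, 87):.2f}x")
    print(f"KV cache vs FP16:  {kv_compression(4.25):.2f}x smaller")
    print(f"tokens per step:   {expected_tokens_per_step(0.73):.2f}")
```

The tokens-per-step figure is an upper bound on what MTP can contribute, since it ignores the cost of running the draft head and of verification, so it should not be read as a predicted speedup.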