More Qwen3.6-27B MTP success, this time on dual MI50s

r/LocalLLaMA

TLDR: The hype is real! 1.5x speedup, and up to 2x with tensor parallelism! After reading the PR I immediately hunted for MTP-compatible Q4_1 quants (they offer a small speedup on these older, compute-starved cards) but couldn't find any. Luckily I came across this post, which showed how to graft MTP onto your own quants, so I attached it to the Bartowski quant I already had.