MTP for Qwen3.6-35B-A3B on 6GB VRAM laptop: not worth it

r/LocalLLaMA
Generative AI Open Source AI

I have an Asus gaming laptop from 2021 that I bought used for 500€ last year. I wanted to see if the recently merged MTP in llama.cpp is worth using on such a VRAM constrained device for the Qwen3.6-35B-A3B model. So I did some experiments to measure performance with and without MTP. TL;DR: It's not worth it. The prompt processing is so much slower with MTP that it outweighs the minimal gains in TG speeds. However, I did discover a useful VRAM saving trick: using q4_0 quantization for the draft KV cache works just as well as q8_0 and saves a small bit of