Quantizing MTP KV Cache = free lunch?

r/LocalLLaMA

With the llama.cpp MTP (multi-token prediction) implementation for the Qwen3.6/3.5 models, the MTP layer requires VRAM of its own.
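To get a rough feel for how much the cache type matters for a single extra layer, here is a back-of-the-envelope sketch. The shape values (`N_CTX`, `N_KV_HEADS`, `HEAD_DIM`) are placeholders I made up, not numbers from any Qwen config, and `kv_cache_bytes` is a hypothetical helper, not something from llama.cpp. It also assumes the MTP layer keeps an ordinary K+V cache like a normal attention layer. The bytes-per-element figures follow the ggml block layouts for `f16`, `q8_0` and `q4_0`.

```python
# Approximate storage cost per cached element for three ggml types.
# The quantized types store 32 values per block plus one f16 scale.
BYTES_PER_ELEM = {
    "f16": 2.0,
    "q8_0": 34 / 32,  # 32 int8 values + 2-byte scale
    "q4_0": 18 / 32,  # 32 4-bit values + 2-byte scale
}


def kv_cache_bytes(n_ctx, n_kv_heads, head_dim, cache_type, n_layers=1):
    """Estimate K+V cache size in bytes (hypothetical helper, ignores padding)."""
    elems = 2 * n_layers * n_ctx * n_kv_heads * head_dim  # factor 2 = K and V
    return int(elems * BYTES_PER_ELEM[cache_type])


# Placeholder shape, NOT taken from a real model config.
N_CTX, N_KV_HEADS, HEAD_DIM = 32768, 4, 128

for cache_type in BYTES_PER_ELEM:
    mib = kv_cache_bytes(N_CTX, N_KV_HEADS, HEAD_DIM, cache_type) / 2**20
    print(f"{cache_type}: {mib:.1f} MiB for one layer")
```

Plug in the real head count, head dimension and context length from your model's metadata to get a number you can compare against what llama.cpp reports at load time.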