Quantizing MTP KV Cache = free lunch?
r/LocalLLaMA
With llama.cpp's MTP implementation for the Qwen3.6/3.5 models, the MTP layer needs VRAM of its own.
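For a rough sense of scale, here is a back-of-the-envelope sketch of how much KV cache a single attention layer takes at different cache types. The dimensions (`n_ctx`, `n_kv_heads`, `head_dim`) are made-up placeholders, not values from any Qwen config, and the function name is just for illustration. It assumes the MTP layer keeps an ordinary K and V cache, and it only counts the cache, not the layer's weights. The bytes-per-element figures follow the ggml block layouts (32 elements per block plus an f16 scale).

```python
# Bytes per element for a few ggml cache types (quantized types use blocks of 32).
BYTES_PER_ELEMENT = {
    "f16": 2.0,
    "q8_0": 34 / 32,  # 32 int8 values + one f16 scale per block
    "q4_0": 18 / 32,  # 32 4-bit values + one f16 scale per block
}


def kv_cache_bytes(n_ctx, n_kv_heads, head_dim, cache_type, n_layers=1):
    """Size in bytes of the K and V caches for n_layers attention layers."""
    elements = 2 * n_layers * n_ctx * n_kv_heads * head_dim  # factor 2 = K and V
    return int(elements * BYTES_PER_ELEMENT[cache_type])


# Hypothetical dimensions, chosen only to make the arithmetic easy to follow.
for cache_type in BYTES_PER_ELEMENT:
    mib = kv_cache_bytes(32768, 4, 128, cache_type) / 2**20
    print(f"{cache_type}: {mib:.1f} MiB")
```

With these placeholder numbers, one layer comes out to 64 MiB at f16, 34 MiB at q8_0 and 18 MiB at q4_0. Plug in the real context length and head counts to see what the saving would be for your setup.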