MTP - The proof is in the pudding! Using it with Qwen3.6-27B

r/LocalLLaMA

Been running llama.cpp with MTP (multi-token prediction) on Qwen3.6-27B Q4_K_M as my daily coding assistant, and I got curious about what was actually happening under the hood. So I pulled the metrics from llama-server and charted a full session. A few things stood out: generation speed tanks hard past 85K context (down 30-35% by 95K+), and cold prefills are brutal, but the KV cache slot-save feature is doing serious heavy lifting on the cache hit rate. Config details and observations below; happy to answer questions.

Referring to this post: "Get Faster Qwen3.6 27b".

Submitted by /u/admajic
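For anyone wanting to pull similar numbers, here's a minimal Python sketch. It is not necessarily how this session was charted. It assumes llama-server was started with `--metrics`, so it exposes Prometheus-style counters at `/metrics`. The counter names, the default `127.0.0.1:8080` address, and the helper names (`parse_metrics`, `rate`, `fetch`, `pct_drop`) are assumptions for illustration. Check the names against your own build's `/metrics` output. The sketch diffs two scrapes to get average tokens/sec over a window, then computes the percent slowdown against a baseline. That is the arithmetic behind a "down 30-35%" figure.

```python
from __future__ import annotations

from urllib.request import urlopen

# Cumulative counters as I understand recent llama-server builds expose them
# (assumption: verify against your own /metrics output).
PREDICTED_TOKENS = "llamacpp:tokens_predicted_total"
PREDICTED_SECONDS = "llamacpp:tokens_predicted_seconds_total"
PROMPT_TOKENS = "llamacpp:prompt_tokens_total"
PROMPT_SECONDS = "llamacpp:prompt_seconds_total"


def parse_metrics(text: str) -> dict[str, float]:
    """Parse unlabelled samples from Prometheus text exposition format."""
    samples: dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        if len(parts) < 2:
            continue
        # Format is "name value [timestamp]"; keep name and value only.
        samples[parts[0]] = float(parts[1])
    return samples


def fetch(url: str = "http://127.0.0.1:8080/metrics") -> dict[str, float]:
    """Scrape one snapshot from a running llama-server (hypothetical address)."""
    with urlopen(url, timeout=10) as resp:
        return parse_metrics(resp.read().decode("utf-8"))


def rate(before: dict[str, float], after: dict[str, float],
         tokens_key: str, seconds_key: str) -> float:
    """Average tokens/sec over the window between two snapshots."""
    tokens = after[tokens_key] - before[tokens_key]
    seconds = after[seconds_key] - before[seconds_key]
    # No time spent in the window means no meaningful rate.
    return tokens / seconds if seconds > 0 else 0.0


def pct_drop(baseline: float, current: float) -> float:
    """Percent slowdown of `current` relative to `baseline`."""
    return 100.0 * (baseline - current) / baseline
```

Usage: call `fetch()` once before a request and once after, then pass both snapshots to `rate(before, after, PREDICTED_TOKENS, PREDICTED_SECONDS)` for generation speed. Use the `PROMPT_*` pair instead for prefill speed. Keeping each window short and at a known context length gives a cleaner view of how speed falls off as context grows.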