Llama.cpp MTP with Qwen3.6 27B on Headless RTX 3090
r/LocalLLaMA
I saw some posts saying prompt processing (PP) is slower with MTP, and people were cautious about trying it because of that. Here's a real-world datapoint.

Settings:
- Headless RTX 3090 (24 GB)
- OpenCode
- Model: unsloth's Qwen3.6-27B-MTP-Q4_K_M.gguf
- 128k context
- q8_0 K cache
- --spec-draft-n-max: 3
- --draft-p-min: 0

Use cases:
- Research task that uses ~85,000 tokens
- Coding task that uses ~85,000 tokens
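Here is a rough back-of-envelope for what a draft cap of 3 (`--spec-draft-n-max: 3`) can buy during generation. It assumes each drafted token is accepted independently with some probability `alpha`. That is a simplification, and `alpha` is a hypothetical input, not something measured in this post. It also assumes drafting always runs to the full cap. As I understand llama.cpp's `--draft-p-min`, a value of 0 means drafting is never cut short on low confidence. The sketch ignores the extra cost of running the MTP head and verifying k+1 tokens per pass. It also says nothing about prompt processing, so treat it as a ceiling on decode gains, not a prediction.

```python
def expected_tokens_per_step(alpha: float, k: int) -> float:
    """Expected tokens emitted per target-model verification pass.

    Simplifying assumption: each of the k drafted tokens is accepted
    independently with probability alpha, and drafting always produces
    k tokens. The count includes the accepted draft prefix plus the one
    token the target model always contributes (correction or bonus).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if k < 0:
        raise ValueError("k must be non-negative")
    if alpha == 1.0:
        # Every draft accepted: k drafted tokens + 1 bonus token.
        return float(k + 1)
    # Geometric series 1 + alpha + alpha^2 + ... + alpha^k
    return (1.0 - alpha ** (k + 1)) / (1.0 - alpha)


if __name__ == "__main__":
    k = 3  # draft cap from the settings above
    for alpha in (0.5, 0.7, 0.9):  # hypothetical acceptance rates
        print(f"alpha={alpha}: ~{expected_tokens_per_step(alpha, k):.2f} tokens/step")
```

Even at a high acceptance rate, the gain is capped at k+1 = 4 tokens per pass. Raising the draft cap only pays off if the acceptance rate stays high for the later draft positions.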