Qwen3.6 27b q5_k_M MTP - 256k context - 5090
r/LocalLLaMA
•
Generative AI
Open Source AI
Straight to it: llama-server-mtp \ -m ~/models/Qwen3.6-27B-Q5_K_M-mtp.gguf \ --spec-type mtp \ --spec-draft-n-max 3 \ --cache-type-k q8_0 \ --cache-type- q8_0 \ -np 1 \ -c 262144 \ -ngl 99 \ --host 0.0.0.0 \ --port 8080 Been running this on my desktop 5090 with no issues and no spillover! You will need to install a special version of llamacpp to run Qwen3.6 with MTP: Edit: 65-75 tps submitted by /u/No_Mango7658 [link] [comments]