Get faster Qwen3.6 27B
r/LocalLLaMA
Using 100k context on a 3090 with an MTP GGUF and getting 50 t/s on llama.cpp. Thought I would share what works.

Use the MTP GGUF (Qwen3.6-27B-MTP-Q4_K_M here) and am17an's llama.cpp commit:

    /media/adam/D_DRIVE/LLM/llama-cpp-am17an/build/bin/llama-server -m "/media/Qwen3.6-27B-Q4/Qwen3.6-27B-MTP-Q4_K_M.gguf" \
      --ctx-size 100000 \
      -ngl 99 -fa on \
      --cache-type-k q4_0 --cache-type-v q4_0 \
      --batch-size 2048 --ubatch-size 1024 \
      --spec-type mtp --spec-draft-n-max 2

Note: a spec draft of 3 seemed too much for the 3090 at higher context.

Why 100k context? Beyond that it slows down, and 100k is enough for most tasks. After that, compact and continue.
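If you want to check a figure like the 50 t/s above on your own card, here is a rough sketch that asks a running llama-server for a short completion and works out generation speed from the timings it returns. Assumptions, none of which come from the post: the server is on localhost:8080 (the llama-server default), `jq` is installed, and the build exposes the `/completion` endpoint with `timings.predicted_n` and `timings.predicted_ms` as mainline llama.cpp does. The am17an build may differ. The function names and the prompt are made up for illustration.

```shell
# Tokens per second from a generated-token count and elapsed milliseconds.
tokens_per_second() {
    awk -v n="$1" -v ms="$2" 'BEGIN { printf "%.1f\n", n * 1000 / ms }'
}

# Request a short completion and print the generation speed.
# Assumes mainline llama-server response fields; adjust if your build differs.
measure_speed() {
    url="${1:-http://localhost:8080}"
    curl -s "$url/completion" \
        -H "Content-Type: application/json" \
        -d '{"prompt": "Write a haiku about GPUs.", "n_predict": 128}' |
        jq -r '.timings | "\(.predicted_n) \(.predicted_ms)"' |
        while read -r n ms; do
            tokens_per_second "$n" "$ms"
        done
}
```

With the server up, run `measure_speed` (or `measure_speed http://host:port`). To see the slowdown at long context, repeat it with a much longer prompt and compare the numbers.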