Another shout-out to llama.cpp build b9455 on 2x3090
r/LocalLLaMA
As you guys know, the next-highest quant is Unsloth's Qwen3.6-27B-UD-Q8_K_XL.gguf. With llama.cpp I was previously getting 30-50 tk/s. For months, vLLM was kicking llama.cpp's ass, with its tensor splitting pushing the 2x3090s to 70+ tk/s. But I couldn't find good quants for vLLM and had to settle for some unknown qwen3.6-mtp-8.0, which was also making minor coding mistakes here and there. I finally got around to testing llama.cpp build b9455 with tensor-split, and holy f. Now that I can run Unsloth's UD-Q8_K_XL at 70+ tk/s, its code output is so clean it's like a different beast altogether.
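For anyone who wants the numbers side by side, here is a rough sketch of the speedup implied by the figures above. It assumes "70+" means a floor of 70 tk/s and takes the old 30-50 tk/s range at face value; the function and variable names are just illustrative, and none of this is a real benchmark.

```python
def speedup(new_tps: float, old_tps: float) -> float:
    """Return how many times faster new_tps is compared to old_tps."""
    if old_tps <= 0:
        raise ValueError("old_tps must be positive")
    return new_tps / old_tps


# Figures quoted in the post (assumption: "70+" is treated as a lower bound of 70).
OLD_RANGE_TPS = (30.0, 50.0)  # llama.cpp before, tokens per second
NEW_TPS_FLOOR = 70.0          # llama.cpp with tensor-split, tokens per second

# Compare the new floor against the fast and slow ends of the old range.
worst_case = speedup(NEW_TPS_FLOOR, OLD_RANGE_TPS[1])
best_case = speedup(NEW_TPS_FLOOR, OLD_RANGE_TPS[0])

print(f"at least {worst_case:.2f}x to {best_case:.2f}x faster")
```

So going by these rough figures, the gain is somewhere between about 1.4x and 2.3x, depending on which end of the old range you compare against.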