Qwen3.6-35B-A3B GGUF from Unsloth is quite a bit slower?

r/LocalLLaMA

Hi there. First of all, I want to give a huge thanks to Unsloth for their tireless work producing high-quality GGUFs, and for their friendly interaction with us here. I'm running a CPU-only setup with the latest llama.cpp on Debian 13. For some reason, on my setup the Unsloth GGUFs get about 30% fewer tokens/sec than a similarly sized GGUF from another creator, and follow-up responses take quite a bit longer to process.