Quant Qwen3.6-27B on 16GB VRAM with 100k context length

r/LocalLLaMA

I have been experimenting with running Qwen3.6-27B on my laptop, which has an A5000 16GB GPU. I created my own IQ4_XS GGUF, "qwen3.6-27b-IQ4_XS-pure.gguf", using the Unsloth imatrix, and compared its mean KLD (KL divergence) with that of other quants. As you can see, I also tested different turboquant versions. It looks like the buun-llama-cpp fork does better than the TheTom/llama-cpp-turboquant fork.

If you want to try my version, you can do the following: Download my GGUF from Hugging Face.