Qwen3.6-27B 4.256bpw fully in VRAM on a 5070 Ti with a 50,000-token q4_0 context - not turbo!

r/LocalLLaMA

Hugging Face link here. I've been waiting for sokann to drop his Qwen 3.6 GGUF for 16 GB GPUs, as his Qwen 3.5 quant was my GGUF of choice. I tried cHunter789's Qwen3.6-27B-i1-IQ4_XS-GGUF that was posted yesterday, but could only reach a context window of 30,000 tokens while staying in VRAM. With the same launch settings, I can reach a 50,000-token context window with this GGUF, which is quite the increase. You Linux/headless guys should be able to squeeze some extra context out of it too, since more of your VRAM is free.
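If you want to sanity-check whether a quant plus its KV cache will fit before downloading, here is a rough back-of-the-envelope sketch. It only uses the numbers from the title (27B nominal parameters, 4.256 bpw, 50,000 context, q4_0 cache). The architecture values (`n_kv_layers`, `n_kv_heads`, `head_dim`) are made-up placeholders, not Qwen3.6-27B's real config, so read the actual values from the GGUF metadata. It also ignores compute buffers and whatever your desktop is already using.

```python
# Rough VRAM budget sketch. Not a measurement, just arithmetic.
# q4_0 stores 32 values per block: a 2-byte fp16 scale + 16 bytes of packed 4-bit values.
Q4_0_BYTES_PER_BLOCK = 18
Q4_0_BLOCK_SIZE = 32
GIB = 1024 ** 3


def weights_bytes(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in bytes."""
    return n_params * bits_per_weight / 8


def kv_cache_bytes_q4_0(n_ctx: int, n_kv_layers: int, n_kv_heads: int, head_dim: int) -> float:
    """Approximate size of a q4_0 K+V cache in bytes."""
    # Factor of 2 covers both K and V for every layer that keeps a KV cache.
    elems_per_token = 2 * n_kv_layers * n_kv_heads * head_dim
    return n_ctx * elems_per_token * Q4_0_BYTES_PER_BLOCK / Q4_0_BLOCK_SIZE


# 27e9 is the nominal parameter count from the model name, so this is approximate.
w = weights_bytes(27e9, 4.256)

# PLACEHOLDER architecture values, chosen only for illustration.
kv = kv_cache_bytes_q4_0(50_000, n_kv_layers=16, n_kv_heads=4, head_dim=256)

print(f"weights : {w / GIB:.2f} GiB")
print(f"kv cache: {kv / GIB:.2f} GiB")
print(f"total   : {(w + kv) / GIB:.2f} GiB (before compute buffers)")
```

Swap in the real layer and head counts and compare the total against what your card has free after the OS and display take their share.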