Need a second pair of eyes: this Qwen3.6 27B quant recipe consistently thinks less and is still correct

r/LocalLLaMA

Ok, hear me out. This all started when I was trying to understand why this Qwen3.6 27B INT8 AutoRound recipe was performing so much better than any other Qwen3.6 27B quant I tried. On some personal Rust + Bevy benchmarks, it was consistently outputting better code and games. I then noticed the model did a LOT less thinking. The INT8 model is great, but its VRAM usage under vLLM is higher. And since llama.cpp has MTP (multi-token prediction) support in a PR, I figured I'd try to quant this and add MTP too.
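For anyone who wants to check the "thinks less" part on their own prompts, here is a rough sketch of how thinking length could be compared across quants. It assumes the raw completion wraps its reasoning in `<think>...</think>` tags, as Qwen3-family chat templates do, and it counts whitespace-separated words as a crude proxy for tokens. The function names are made up for illustration and are not from any library.

```python
import re

# Assumption: reasoning is wrapped in <think>...</think> in the raw completion text.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_thinking(output: str) -> tuple[str, str]:
    """Return (thinking, answer) from a raw completion string."""
    thinking = "\n".join(m.strip() for m in THINK_RE.findall(output))
    answer = THINK_RE.sub("", output).strip()
    return thinking, answer


def thinking_words(output: str) -> int:
    """Crude thinking-length proxy: whitespace-separated words, not real tokens."""
    thinking, _ = split_thinking(output)
    return len(thinking.split())


def mean_thinking_words(outputs: list[str]) -> float:
    """Average thinking length over a batch of completions (0.0 if empty)."""
    if not outputs:
        return 0.0
    return sum(thinking_words(o) for o in outputs) / len(outputs)
```

Run the same prompts through each quant, save the raw completions, and compare `mean_thinking_words` per model. For real token counts you would swap the word split for the model's tokenizer.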