Qwen3.6:27B on 16GB VRAM (5080): MTP Quant, Speeds, and Configs

r/LocalLLaMA

For those of you running Qwen3.6:27B on 16GB of VRAM, what quantization did you settle on? My primary use is a Home Assistant (HA) voice assistant, and I've found my ideal target to be >50 t/s token generation (tg) and >800 t/s prompt processing (pp). Qwen3.5:9B runs really fast, but I'm experimenting with a smarter model. I offloaded the vision model to the CPU because I rarely use it.
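For anyone weighing quants against the 16GB limit, here is a rough back-of-envelope sketch of the weight footprint at different bits per weight. Everything in it is an assumption for illustration: the bits-per-weight values are generic round numbers rather than the measured size of any named quant type, the 2 GiB reserve for KV cache, compute buffers, and the desktop is a guess, and `weight_gib` and `fits` are hypothetical helper names. Real GGUF files mix tensor types, so actual sizes will differ.

```python
# Rough VRAM estimate for model weights only (a sketch, not a measurement).

GIB = 2**30  # bytes per GiB


def weight_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights in GiB at a given average bits per weight."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / GIB


def fits(params_billion: float, bits_per_weight: float,
         vram_gib: float = 16.0, reserve_gib: float = 2.0) -> bool:
    """True if weights plus an assumed reserve (KV cache, buffers) fit in VRAM."""
    return weight_gib(params_billion, bits_per_weight) + reserve_gib <= vram_gib


if __name__ == "__main__":
    # Illustrative average bits-per-weight values, not tied to specific quant names.
    for bpw in (3.0, 3.5, 4.0, 4.5):
        size = weight_gib(27, bpw)
        print(f"{bpw:.1f} bpw: ~{size:.1f} GiB weights, fits with reserve: {fits(27, bpw)}")
```

Under these assumptions, a 27B model at about 4 bits per weight leaves a little headroom on a 16GB card, while about 4.5 bits per weight does not once the reserve is counted. Longer context pushes the reserve up, so treat the cutoff as approximate.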