Is there a big gap between Q4 and Q6 on Qwen3.6?

r/LocalLLaMA
Open Source AI

I’ve got one 3090, and thanks to MTP and all, I can do around 65 tok/s on Qwen 3.6 dense 27B. But I’m running at Q4_M so everything fits, and my context isn’t super high: maybe 65k, up to 100k. I’ve thrown around the idea of a second 3090. I do already have some gaming PCs running stuff in parallel alongside my 3090, with smaller 3080 (2x) and 4080S cards. So it seems the real benefit of a second 3090 would be running at a higher quant. For those of you who do run a higher quant, have you noticed a big difference?

submitted by /u/vick2djax
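For a rough sense of why Q4 fits on one 3090 and Q6 gets tight, here is a back-of-the-envelope sketch of weight memory only. The bits-per-weight figures are assumptions (approximate averages for llama.cpp's Q4_K_M and Q6_K, taking the Q4_M above to mean Q4_K_M), and the function name is made up for illustration. Real GGUF files vary a bit, and KV cache and compute buffers come on top of this.

```python
# Rough weight-memory estimate. Not a measurement: real file sizes differ,
# and KV cache / compute buffers need additional VRAM on top of the weights.

PARAMS = 27e9  # 27B dense model, as in the post

# ASSUMPTION: approximate average bits per weight for llama.cpp K-quants.
BITS_PER_WEIGHT = {"Q4_K_M": 4.85, "Q6_K": 6.56}

VRAM_GIB_PER_3090 = 24  # a 3090 has 24 GB of VRAM


def weight_gib(params: float, bpw: float) -> float:
    """Approximate size of the quantized weights in GiB."""
    return params * bpw / 8 / 2**30


for name, bpw in BITS_PER_WEIGHT.items():
    size = weight_gib(PARAMS, bpw)
    one_card = VRAM_GIB_PER_3090 - size
    two_cards = 2 * VRAM_GIB_PER_3090 - size
    print(f"{name}: ~{size:.1f} GiB weights, "
          f"{one_card:+.1f} GiB left on one 3090, "
          f"{two_cards:+.1f} GiB left on two")
```

Under these assumptions, Q6 weights alone take most of a single 24 GB card, leaving little room for long context, which is where a second card would help.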