Qwen 3.6 35B: speeds across different quants?
r/LocalLLaMA
This is on an RTX 3090 with llama.cpp main, on Arch Linux. So what's everybody's experience so far? I've tested a few quants and llama.cpp forks and pretty much ended up right back where I started: I couldn't get better speed or quality than the UD IQ4 quant. I tried the Apex compact i and the tqr3_4Q. Even though on paper they should be faster, I couldn't get past 120-130, so I went back to what I already had. The tqr3_4Q fits nicely since it's really small, but its quality is about on par with Q3_K_M, so there's no point in me running it when I still have around 4 GB of VRAM free even at 260k context.