Qwen3.5 27B | RTX 5090 | 400W
r/LocalLLaMA
Just a quick tip. An RTX 5090 power-limited to 400W at stock clocks runs Qwen3.5 27B at virtually the same speed as at the default power limit, on llama.cpp with the Unsloth Q6_K quant. Normally dense models take a hit when power-limited, but for some reason this model is tremendously efficient, and I haven't figured out why. I tried it on a friend's RTX 5090 and the result is the same. Let me know if this helps.

submitted by /u/Holiday_Purpose_3166
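
If you want to check this on your own card, here is a rough Python sketch of the comparison. It only builds the commands and the percentage math; it is not the exact procedure used for the post. Assumptions: `nvidia-smi` and llama.cpp's `llama-bench` are on your PATH, the GGUF filename is a made-up placeholder, and the helper names (`power_limit_cmd`, `bench_cmd`, `slowdown_pct`) are just for illustration. Setting the power limit normally needs root/admin rights.

```python
import shlex


def power_limit_cmd(watts: int, gpu_index: int = 0) -> list[str]:
    """Build the nvidia-smi call that caps board power for one GPU."""
    if watts <= 0:
        raise ValueError("watts must be positive")
    # -i selects the GPU, -pl sets the power limit in watts.
    return ["nvidia-smi", "-i", str(gpu_index), "-pl", str(watts)]


def bench_cmd(model_path: str, n_prompt: int = 512, n_gen: int = 128) -> list[str]:
    """Build a llama-bench run with all layers offloaded to the GPU."""
    return [
        "llama-bench",
        "-m", model_path,    # GGUF file to benchmark
        "-ngl", "99",        # offload (up to) 99 layers, i.e. everything
        "-p", str(n_prompt), # prompt-processing test length
        "-n", str(n_gen),    # token-generation test length
    ]


def slowdown_pct(baseline_tps: float, limited_tps: float) -> float:
    """Percent of tokens/s lost when going from baseline to the capped run."""
    if baseline_tps <= 0:
        raise ValueError("baseline_tps must be positive")
    return (baseline_tps - limited_tps) / baseline_tps * 100.0


if __name__ == "__main__":
    # Hypothetical filename: point this at your own Q6_K GGUF.
    model = "Qwen3.5-27B-Q6_K.gguf"
    print(shlex.join(power_limit_cmd(400)))
    print(shlex.join(bench_cmd(model)))
```

Run the benchmark once at the default limit and once after applying the 400W cap, then feed the two tokens/s figures from the llama-bench output into `slowdown_pct` to see how much speed the cap actually costs on your setup.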