Qwen3.5 27B | RTX 5090 | 400W

r/LocalLLaMA

Just a quick tip. An RTX 5090 power-limited to 400W at stock clocks runs Qwen3.5 27B at virtually the same speed as at the default power limit, using llama.cpp with the Unsloth Q6_K quant. Normally dense models take a hit from a power cap, but for some reason this model is tremendously efficient at 400W, and I haven't figured out why. I tried it on a friend's RTX 5090 and the result is the same. Let me know if this helps.

Submitted by /u/Holiday_Purpose_3166
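
For anyone who wants to try the same comparison, here is a rough sketch, not the exact procedure used for the post. It assumes `nvidia-smi` and llama.cpp's `llama-bench` are on your PATH. The model filename is a made-up placeholder, and the helper names (`power_limit_cmd`, `bench_cmd`, `compare`) are hypothetical. The script only builds the commands and does the arithmetic; you run the benchmark yourself at each power limit and plug in your own tokens/s and measured wattage.

```python
import shlex


def power_limit_cmd(watts: int, gpu_index: int = 0) -> list[str]:
    """Build the nvidia-smi command that caps board power (usually needs root)."""
    return ["sudo", "nvidia-smi", "-i", str(gpu_index), "-pl", str(watts)]


def bench_cmd(model_path: str) -> list[str]:
    """Build a llama-bench command with all layers offloaded to the GPU."""
    return ["llama-bench", "-m", model_path, "-ngl", "99"]


def compare(tps_stock: float, tps_capped: float,
            watts_stock: float, watts_capped: float) -> dict[str, float]:
    """Compare two runs: relative speed and tokens per joule (tok/s divided by W)."""
    return {
        "speed_ratio": tps_capped / tps_stock,
        "tokens_per_joule_stock": tps_stock / watts_stock,
        "tokens_per_joule_capped": tps_capped / watts_capped,
    }


if __name__ == "__main__":
    # Hypothetical model filename; replace with the path to your own GGUF file.
    model = "Qwen3.5-27B-Q6_K.gguf"
    print(shlex.join(power_limit_cmd(400)))
    print(shlex.join(bench_cmd(model)))
```

Run the benchmark once at the default limit and once at 400W, then pass your own numbers to `compare`. A `speed_ratio` close to 1.0 together with a higher `tokens_per_joule_capped` would match what the post describes.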