Qwen3.6 27B FP8 runs with 200k tokens of BF16 KV cache at 80 TPS on a single RTX 5000 PRO 48GB
r/LocalLLaMA
•
Generative AI
AI Hardware
Open Source AI
----START HUMAN TEXT---- Hi all, I've seen a bunch of posts about squeezing 27B onto a 24GB card and all the quantization tricks involved in doing so. It's all amazing work, but at the end of the day a quantized model with quantized KV will inevitably compound errors faster than non-quantized ones, which noticeably impacts agentic coding. I figured a 48GB GPU offered just enough VRAM to avoid most of the quantization nastiness with genuinely good options, like Blackwell-accelerated FP8. Luckily, Qwen released their own FP8 variant of the 27B model.