Qwen3.5-27b 8 bit vs 16 bit

r/LocalLLaMA

I tested Qwen3.5 27B in vLLM, comparing the original bf16 weights against Qwen's official -fp8 quantization, and an 8-bit KV cache against the original 16-bit cache. The results were practically identical. I attribute the small differences to random noise, since I only ran each configuration once. The tests used the Aider benchmark on an RTX 6000 Pro. My