Qwen3.5-27b 8 bit vs 16 bit
r/LocalLLaMA
I tested Qwen3.5 27B with vLLM, comparing the original bf16 weights against Qwen's own -fp8 quantization, and an 8-bit KV cache against the original 16-bit cache. The results were practically identical. I attribute the small difference to random noise, since I only ran each configuration once. The test used the Aider benchmark on an RTX 6000 Pro. My
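For anyone wanting to sanity-check the "random noise" explanation on their own runs, here is a rough sketch of a binomial noise estimate for a single benchmark run. The task count and pass counts below are hypothetical placeholders, not results from this test, and the function names are made up for illustration. It also treats the two runs as independent, which is a simplification since both configurations see the same tasks.

```python
import math


def pass_rate_stderr(passed: int, total: int) -> float:
    """Binomial standard error of a pass rate measured in one run."""
    p = passed / total
    return math.sqrt(p * (1 - p) / total)


def difference_z_score(passed_a: int, total_a: int,
                       passed_b: int, total_b: int) -> float:
    """z-score of the gap between two pass rates, assuming independent runs."""
    gap = passed_a / total_a - passed_b / total_b
    se = math.sqrt(pass_rate_stderr(passed_a, total_a) ** 2
                   + pass_rate_stderr(passed_b, total_b) ** 2)
    if se == 0:
        # Both rates are exactly 0% or 100%, so there is no spread to compare against.
        return 0.0 if gap == 0 else math.copysign(math.inf, gap)
    return gap / se


# Hypothetical numbers only: 200 tasks, 120 passed on bf16, 117 passed on fp8.
z = difference_z_score(120, 200, 117, 200)
print(f"stderr of one run: {pass_rate_stderr(120, 200):.3f}")
print(f"z-score of the gap: {z:.2f}")
```

As a rule of thumb, a z-score well below about 2 in absolute value means the gap is within what a single run per configuration could produce by chance. Repeating each configuration a few times would give a more direct estimate of run-to-run variation.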