Developers who use local AI - Q4_0 vs Q8_0 KV quant?

r/LocalLLaMA

I'd love to hear from developers who work with large context windows: do you notice a quality difference between Q4_0 and Q8_0 KV cache quantization? Obviously I'd love to cut the KV cache VRAM requirement roughly in half, but I'm worried about quality, especially once we get into 50k+ context territory. I don't need a full study; I'm just curious what people have experienced anecdotally. My current setup: a Docker stack with the llama.cpp server at the helm (Vulkan, since I pay the AMD tax daily), 32GB VRAM, mostly using Qwen 3.6 models for development. I go back and forth between the 27B dense and the 35B MoE.
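For a sense of how much "half" actually saves, here is a rough back-of-the-envelope sketch. It assumes a plain full-attention transformer where every layer stores one K and one V entry per token. It also assumes ggml's block layouts: Q8_0 packs 32 values into 34 bytes (32 int8 values plus an fp16 scale), and Q4_0 packs 32 values into 18 bytes (16 bytes of 4-bit values plus an fp16 scale). The layer, head, and dimension numbers in the usage example are hypothetical placeholders, not the real Qwen 3.6 configuration. Models with sliding-window or hybrid attention layers will need less than this estimate suggests.

```python
# Approximate bytes per cached element for a few ggml cache types (assumed block layouts).
BYTES_PER_ELEMENT = {
    "f16": 2.0,
    "q8_0": 34 / 32,  # 32 int8 values + one fp16 scale per block
    "q4_0": 18 / 32,  # 32 4-bit values (16 bytes) + one fp16 scale per block
}


def kv_cache_bytes(n_layers, n_kv_heads, head_dim, n_ctx, cache_type):
    """Rough KV cache size: one K and one V tensor per full-attention layer."""
    elements = 2 * n_layers * n_kv_heads * head_dim * n_ctx
    return elements * BYTES_PER_ELEMENT[cache_type]


if __name__ == "__main__":
    # Hypothetical model dimensions, chosen only for illustration.
    dims = dict(n_layers=64, n_kv_heads=8, head_dim=128, n_ctx=65536)
    for cache_type in ("f16", "q8_0", "q4_0"):
        gib = kv_cache_bytes(cache_type=cache_type, **dims) / 2**30
        print(f"{cache_type}: {gib:.2f} GiB")
```

With these made-up dimensions at 64k context, this prints 16.00 GiB for f16, 8.50 GiB for q8_0, and 4.50 GiB for q4_0. Going from Q8_0 to Q4_0 keeps about 18/34 ≈ 53% of the memory, so the saving is slightly less than a clean half because of the per-block scales. In llama.cpp the cache types are selected with `--cache-type-k` and `--cache-type-v`.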