Do not use mixed KV cache quantization
r/LocalLLaMA
AI Research
I've seen a few people in the comments here and on the other AI subs suggest mixing quantization for the KV cache to retain higher accuracy while still saving memory. I was running that for a while until I realized how wrong it is.