attn-rot (TurboQuant-like KV cache trick) lands in llama.cpp
r/LocalLLaMA
80% of the benefit of TQ (TurboQuant) with almost no downsides. Q8 KV cache is now ≈ F16.
Submitted by /u/Dany0
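The post does not describe how attn-rot works. TurboQuant-style methods broadly apply an orthogonal rotation to vectors before quantizing them, so that a few outlier channels no longer dominate the quantization scale. Below is a minimal, generic Python sketch of that idea. It assumes a normalized Walsh-Hadamard rotation and Q8_0-style blocks of 32 values per scale. It is NOT llama.cpp's attn-rot implementation. The head dimension, the outlier value, and the helper names (fwht, q8_roundtrip) are hypothetical.

```python
import math
import random

BLOCK = 32  # Q8_0-style block size: one absmax scale per 32 values (assumption)


def fwht(x):
    """Orthonormal fast Walsh-Hadamard transform; len(x) must be a power of two.
    The normalized transform is its own inverse: fwht(fwht(x)) == x (up to rounding)."""
    x = list(x)
    n = len(x)
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = x[j], x[j + h]
                x[j], x[j + h] = a + b, a - b
        h *= 2
    s = 1.0 / math.sqrt(n)
    return [v * s for v in x]


def q8_roundtrip(x):
    """Quantize to int8 with one absmax scale per BLOCK values, then dequantize."""
    out = []
    for i in range(0, len(x), BLOCK):
        block = x[i:i + BLOCK]
        amax = max(abs(v) for v in block)
        scale = amax / 127.0 if amax > 0 else 1.0
        out.extend(max(-127, min(127, round(v / scale))) * scale for v in block)
    return out


def sq_error(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))


def dot(a, b):
    return sum(u * v for u, v in zip(a, b))


random.seed(0)
d = 128  # hypothetical head dimension
k = [random.gauss(0.0, 1.0) for _ in range(d)]
k[5] = 40.0  # hypothetical outlier channel in a key vector
q = [random.gauss(0.0, 1.0) for _ in range(d)]

# Plain Q8: the outlier inflates its block's scale, so its neighbours lose precision.
err_plain = sq_error(k, q8_roundtrip(k))

# Rotate first: the outlier's energy is spread over all d channels, so scales shrink.
# The rotation is orthonormal, so the error measured in rotated space equals the
# error after rotating the dequantized vector back.
rk = fwht(k)
err_rot = sq_error(rk, q8_roundtrip(rk))

# Attention scores are unchanged if q and k get the same rotation: (Hq).(Hk) = q.k
score_gap = abs(dot(fwht(q), rk) - dot(q, k))

print(f"plain Q8 error:   {err_plain:.4f}")
print(f"rotated Q8 error: {err_rot:.4f}")
print(f"q.k change from rotation: {score_gap:.2e}")
```

Because the same orthonormal rotation applied to both queries and keys leaves their dot products intact, a trick of this kind can in principle be applied without changing attention scores. Only the rounding error of the quantized cache changes.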