attn-rot (TurboQuant-like KV cache trick) lands in llama.cpp

r/LocalLLaMA

80% of the benefit of TurboQuant (TQ) with almost no downsides. Q8 is now ≈ F16.

Submitted by /u/Dany0