attn-rot (TurboQuant-like KV cache trick) lands in llama.cpp

r/LocalLLaMA • April 01, 2026

About 80% of the benefit of TurboQuant (TQ) with almost no downsides. Q8 is now ≈ F16.

Submitted by /u/Dany0
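
For anyone wondering why a rotation would help a quantized KV cache at all, here is a toy sketch. The post does not describe the mechanism, so this is an assumption based on the name: it shows only the general rotate-then-quantize idea, not the llama.cpp implementation. The Hadamard rotation, the absmax 8-bit quantizer, the test vector, and all function names are hypothetical choices made for illustration. The point is that an orthogonal rotation spreads a single large outlier across all coordinates, so the quantization scale no longer has to stretch to cover one huge value.

```python
import math

def hadamard_rotate(x):
    """Orthonormal fast Walsh-Hadamard transform (length must be a power of two).
    With the 1/sqrt(n) scaling it is its own inverse."""
    y = list(x)
    n = len(y)
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = y[j], y[j + h]
                y[j], y[j + h] = a + b, a - b
        h *= 2
    s = 1.0 / math.sqrt(n)
    return [v * s for v in y]

def quantize_dequantize(x, bits=8):
    """Symmetric absmax quantization to signed integers, then back to floats.
    Illustrative only; not the Q8 block format used by llama.cpp."""
    qmax = 2 ** (bits - 1) - 1
    amax = max(abs(v) for v in x)
    if amax == 0:
        return list(x)
    scale = amax / qmax
    return [round(v / scale) * scale for v in x]

def rms_error(a, b):
    """Root-mean-square difference between two equal-length vectors."""
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)) / len(a))

# Toy "key" vector: many small values plus one large outlier (made-up data).
x = [0.01 * ((i % 7) - 3) for i in range(64)]
x[5] = 8.0

# Quantize directly: the outlier sets the scale and the small values get crushed.
plain = quantize_dequantize(x)

# Rotate, quantize, rotate back: the outlier's energy is spread out first.
rotated = hadamard_rotate(quantize_dequantize(hadamard_rotate(x)))

plain_err = rms_error(x, plain)
rot_err = rms_error(x, rotated)
print(f"plain 8-bit RMS error:   {plain_err:.5f}")
print(f"rotated 8-bit RMS error: {rot_err:.5f}")
```

In this toy case the rotated path reconstructs the vector with a smaller error than quantizing it directly. Because the rotation is orthogonal, dot products are unchanged when both sides are rotated the same way, which is why this kind of trick can in principle be applied to attention without changing the result beyond quantization noise.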