Tested TurboQuant KV compression with Gemma 4 31B — 5.80x compression, perfect long-context recall, JSON output preserved
r/LocalLLaMA
•
Open Source AI
AI Research
Quick experiment: I implemented Google Research's TurboQuant paper (arXi 2504.19874) as a Python package and tested it with Google's brand new Gemma 4 31B model. The results exceeded the paper's claims.