Tested TurboQuant KV compression with Gemma 4 31B — 5.80x compression, perfect long-context recall, JSON output preserved

r/LocalLLaMA

Quick experiment: I implemented Google Research's TurboQuant paper (arXiv 2504.19874) as a Python package and tested it with Google's brand-new Gemma 4 31B model. The results exceeded the paper's claims.