Gemma 4 31B at 256K Full Context on a Single RTX 5090 — TurboQuant KV Cache Benchmark
r/LocalLLaMA
Just got Gemma 4 31B running at full 256K context on a single RTX 5090 using TurboQuant KV cache compression.
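To see why KV cache compression matters at 256K tokens, here is a rough back-of-the-envelope estimate of KV cache size. It is a minimal sketch that assumes a plain multi-head or grouped-query attention layout. The layer count, KV head count and head dimension below are HYPOTHETICAL placeholders, not Gemma 4 31B's published config. The post also doesn't state which bit width TurboQuant was run at, so the 4-bit and 3-bit rows are only there for comparison. The estimate ignores model weights, activations, paging metadata and any architectural savings the real model may have, such as sliding-window layers. For reference, the RTX 5090 has 32 GB of VRAM.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len, bits_per_value):
    """Bytes needed to store keys and values for one sequence across all layers.

    Assumes every layer caches the full sequence (no sliding window) and
    ignores quantization scales, paging metadata and allocator overhead.
    """
    values = 2 * num_layers * num_kv_heads * head_dim * seq_len  # 2 = K and V
    return values * bits_per_value / 8


# HYPOTHETICAL model shape for illustration only (not Gemma 4 31B's real config)
NUM_LAYERS = 60
NUM_KV_HEADS = 8
HEAD_DIM = 128
SEQ_LEN = 256 * 1024  # "256K" context

GIB = 1024 ** 3

if __name__ == "__main__":
    # Bit widths below 16 are assumed for comparison, not taken from the post
    for label, bits in [("fp16", 16), ("8-bit", 8), ("4-bit", 4), ("3-bit", 3)]:
        size = kv_cache_bytes(NUM_LAYERS, NUM_KV_HEADS, HEAD_DIM, SEQ_LEN, bits)
        print(f"{label:>6}: {size / GIB:6.2f} GiB")
```

With these made-up numbers the fp16 cache alone comes to 60 GiB, well over a single card's memory, while a 3 to 4 bit cache drops to roughly 11 to 15 GiB. That leaves room for quantized weights, which is the general reason KV cache quantization makes long contexts feasible on one GPU.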