Gemma 4 31B at 256K Full Context on a Single RTX 5090 — TurboQuant KV Cache Benchmark

r/LocalLLaMA

Just got Gemma 4 31B running at full 256K context on a single RTX 5090 using TurboQuant KV cache compression.
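For context on why KV cache compression matters at this length, here is a rough back-of-envelope sizing sketch. The standard estimate is: 2 (keys and values) × layers × KV heads × head dim × tokens × bytes per value. The layer, head, and dimension numbers below are hypothetical placeholders, not Gemma 4 31B's actual config. The bit widths are example values only. The sketch also assumes every layer caches the full context (no sliding-window layers) and ignores the scale and metadata overhead that real quantizers such as TurboQuant add. The result has to fit alongside the model weights in the RTX 5090's 32 GB of VRAM.

```python
def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   seq_len: int, bits_per_value: float) -> float:
    """Rough KV cache size in bytes for a single sequence.

    The factor of 2 covers storing both keys and values. Assumes every
    layer caches the full sequence and ignores quantizer metadata.
    """
    values = 2 * num_layers * num_kv_heads * head_dim * seq_len
    return values * bits_per_value / 8


# HYPOTHETICAL model dimensions for illustration only (not the real Gemma 4 31B config).
LAYERS, KV_HEADS, HEAD_DIM = 60, 8, 128
CONTEXT = 256 * 1024  # 256K tokens
GIB = 1024 ** 3

# Compare an fp16 cache against a few example quantized bit widths.
for label, bits in [("fp16", 16), ("8-bit", 8), ("4-bit", 4), ("3.5-bit", 3.5)]:
    size = kv_cache_bytes(LAYERS, KV_HEADS, HEAD_DIM, CONTEXT, bits)
    print(f"{label:>8}: {size / GIB:6.1f} GiB")
```

With these made-up dimensions, an fp16 cache alone would need about 60 GiB, well beyond a single 32 GB card. The memory scales linearly with bits per value, which is why dropping to a few bits per value is what makes full context plausible on one GPU.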