RTX 5090 gemma4-26b TG performance report
r/LocalLLaMA
Nothing exhaustive, but I thought I'd report what I've seen in early testing. I'm running a modified version of vLLM that supports NVFP4 for gemma4-26b. Weights come in at around 15.76 GiB, and the remainder of VRAM goes to KV cache. I'm running full context as well. For a "story telling" prompt with raw output and no thinking, I'm seeing about 150 t/s on token generation (TG). Time to first token (TTFT) in streaming mode is about 80 ms. Quality is good!

submitted by /u/Nice_Cellist_7595
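For anyone who wants to turn those two numbers into something more tangible, here is a rough back-of-the-envelope sketch in Python. It only uses the figures reported above (about 150 t/s and about 80 ms TTFT); the function names are made up for illustration, and it assumes the decode rate stays constant over the whole response, which may not hold at long context.

```python
# Rough latency arithmetic from the reported figures.
# Assumption: decode speed is constant for the whole response.

TG_TOKENS_PER_S = 150.0  # reported token generation rate
TTFT_S = 0.080           # reported time to first token (streaming)


def inter_token_latency_ms(tokens_per_s: float = TG_TOKENS_PER_S) -> float:
    """Average gap between streamed tokens, in milliseconds."""
    return 1000.0 / tokens_per_s


def estimate_stream_time_s(
    n_tokens: int,
    tokens_per_s: float = TG_TOKENS_PER_S,
    ttft_s: float = TTFT_S,
) -> float:
    """Approximate wall-clock time to stream n_tokens.

    The first token arrives after ttft_s; each remaining token
    takes 1 / tokens_per_s seconds on average.
    """
    if n_tokens <= 0:
        return 0.0
    return ttft_s + (n_tokens - 1) / tokens_per_s


if __name__ == "__main__":
    print(f"inter-token latency: {inter_token_latency_ms():.2f} ms")
    for n in (100, 500, 2000):
        print(f"{n} tokens: ~{estimate_stream_time_s(n):.2f} s")
```

At these rates the 80 ms TTFT is a small share of the total for anything longer than a few dozen tokens, so the decode rate dominates how long a response feels.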