Gemma 4 26B Hits 600 Tok/s on One RTX 5090

r/LocalLLaMA

I ran a benchmark to see how much DFlash speculative decoding actually helps in vLLM.