Gemma 4 26B Hits 600 Tok/s on One RTX 5090
r/LocalLLaMA
I ran a benchmark to see how much DFlash speculative decoding actually helps in vLLM.