Speculative Decoding works great for Gemma 4 31B with E2B draft (+29% avg, +50% on code)
r/LocalLLaMA
Following up on my previous Gemma 4 31B benchmark post, I tested speculative decoding with Gemma 4 E2B (4.65B) as the draft model. The results were much better than I expected, so I wanted to share some controlled benchmark numbers.