Speculative Decoding works great for Gemma 4 31B with E2B draft (+29% avg, +50% on code)

r/LocalLLaMA

Following up on my previous Gemma 4 31B benchmark post, I tested speculative decoding with Gemma 4 E2B (4.65B) as the draft model. The results were much better than I expected, so I wanted to share some controlled benchmark numbers.