**Introducing SPEED-Bench: A Unified and Diverse Benchmark for Speculative Decoding**

Speculative Decoding (SD) has emerged as a critical technique for accelerating LLM inference. SD uses a lightweight draft model to speculate multiple future tokens.
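To make the draft-then-verify idea concrete, here is a minimal sketch of *greedy* speculative decoding. It assumes both models are deterministic (argmax) next-token functions. The names `draft_next`, `target_next`, `speculative_step`, and `generate` are hypothetical, and the toy "models" are simple integer rules, not LLMs. Real systems score all drafted positions in a single batched forward pass of the target model. When sampling, they use a rejection-sampling acceptance rule rather than the exact-match check shown here.

```python
from typing import Callable, List, Tuple

# Hypothetical interface: a "model" maps a token context to its greedy next token.
NextToken = Callable[[List[int]], int]


def speculative_step(prefix: List[int], draft_next: NextToken,
                     target_next: NextToken, k: int) -> Tuple[List[int], int]:
    """One draft-then-verify round. Returns (emitted tokens, accepted draft count)."""
    # 1. Draft phase: the cheap model proposes k tokens autoregressively.
    ctx = list(prefix)
    draft = []
    for _ in range(k):
        tok = draft_next(ctx)
        draft.append(tok)
        ctx.append(tok)

    # 2. Verify phase: compare each draft token with the target's greedy choice.
    #    (Done sequentially here for clarity; real systems batch this.)
    ctx = list(prefix)
    emitted = []
    for tok in draft:
        expected = target_next(ctx)
        if tok != expected:
            # First mismatch: keep the target's token, discard the rest of the draft.
            emitted.append(expected)
            return emitted, len(emitted) - 1
        emitted.append(tok)
        ctx.append(tok)

    # 3. All k drafts accepted: the target contributes one extra "bonus" token.
    emitted.append(target_next(ctx))
    return emitted, k


def generate(prefix: List[int], draft_next: NextToken, target_next: NextToken,
             k: int, n_new: int) -> Tuple[List[int], int]:
    """Generate n_new tokens; returns (sequence, number of target verification steps)."""
    out = list(prefix)
    steps = 0
    while len(out) - len(prefix) < n_new:
        emitted, _ = speculative_step(out, draft_next, target_next, k)
        out.extend(emitted)
        steps += 1
    return out[: len(prefix) + n_new], steps


# Toy "models" over the digits 0-9 (purely illustrative).
def target_next(ctx: List[int]) -> int:
    return (ctx[-1] + 1) % 10  # the target always counts up


def draft_next(ctx: List[int]) -> int:
    return 0 if ctx[-1] == 5 else (ctx[-1] + 1) % 10  # the draft errs after a 5


if __name__ == "__main__":
    seq, steps = generate([0], draft_next, target_next, k=4, n_new=12)
    print(seq, steps)  # → [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2] 4
```

Every emitted token is checked against the target's own greedy choice, so the output is identical to target-only greedy decoding. The speedup comes from emitting several tokens per target step. In this toy run, 12 new tokens need 4 verification steps instead of 12. How often the draft agrees with the target (its draft accuracy) is what determines that ratio.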