TokenTiming: A Dynamic Alignment Method for Universal Speculative Decoding Model Pairs

arXiv:2510.15545v4 Announce Type: replace Accelerating the inference of large language models (LLMs) has been a critical challenge in generative AI. Speculative decoding (SD) substantially improves LLM inference efficiency. However, its utility is limited by a fundamental constraint: the draft and target models must share the same vocabulary, which limits the pool of available draft models and often necessitating the