Not-a-Bandit: Provably No-Regret Drafter Selection in Speculative Decoding for LLMs

arXiv:2510.20064v2 Announce Type: replace Speculative decoding is widely used to accelerate large language model (LLM) inference. In this work, we focus on the online draft model selection problem in speculative decoding. We design an algorithm that provably competes with the best draft model in hindsight for each query, in terms of either the token acceptance probability or the expected acceptance length.
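
To make "competes with the best draft model in hindsight" concrete, the following is a minimal sketch of a standard no-regret method, exponential weights (Hedge). It is not the algorithm of this paper. It assumes full-information feedback, meaning an acceptance rate is observed for every drafter in every round, which the abstract does not state. The function name `hedge`, the synthetic acceptance rates, and the choice of loss as one minus the acceptance rate are all illustrative assumptions.

```python
import math


def hedge(acceptance, eta):
    """Exponential weights over drafters (illustrative sketch, not the paper's method).

    acceptance[t][i] is a hypothetical acceptance rate in [0, 1] for drafter i
    at round t. The loss of drafter i at round t is taken as 1 - acceptance[t][i].
    Returns the learner's cumulative expected loss and each drafter's cumulative loss.
    """
    n = len(acceptance[0])
    cum_loss = [0.0] * n
    expected_loss = 0.0
    for rates in acceptance:
        # Weights are proportional to exp(-eta * cumulative loss);
        # subtracting the minimum keeps the exponentials numerically stable.
        m = min(cum_loss)
        w = [math.exp(-eta * (c - m)) for c in cum_loss]
        z = sum(w)
        p = [x / z for x in w]
        losses = [1.0 - r for r in rates]
        # Expected loss when a drafter is sampled from the distribution p.
        expected_loss += sum(pi * li for pi, li in zip(p, losses))
        cum_loss = [c + l for c, l in zip(cum_loss, losses)]
    return expected_loss, cum_loss


# Synthetic data (assumption): three drafters whose acceptance rates alternate.
T = 200
acceptance = [[0.8, 0.5, 0.3] if t % 2 == 0 else [0.6, 0.7, 0.2] for t in range(T)]
eta = math.sqrt(8 * math.log(3) / T)  # standard tuning for losses in [0, 1]
expected_loss, cum_loss = hedge(acceptance, eta)

# Regret: learner's loss minus the loss of the best fixed drafter in hindsight.
regret = expected_loss - min(cum_loss)
# Classical Hedge guarantee for this tuning: regret <= sqrt(T * ln(N) / 2).
bound = math.sqrt(T * math.log(3) / 2)
```

With this tuning, the regret of Hedge grows on the order of the square root of the number of rounds, so the average per-round gap to the best fixed drafter vanishes as rounds accumulate. That is the sense of "no-regret" illustrated here; the paper's own guarantee and feedback model may differ.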