AI RESEARCH

SpecKV: Adaptive Speculative Decoding with Compression-Aware Gamma Selection

arXiv cs.LG

arXiv:2605.02888v1 Announce Type: new

Speculative decoding accelerates large language model (LLM) inference by using a small draft model to propose candidate tokens that a larger target model verifies. A critical hyperparameter in this process is the speculation length $\gamma$, which determines how many tokens the draft model proposes per step. Nearly all existing systems use a fixed $\gamma$ (typically 4), yet empirical evidence suggests that the optimal value varies across task types and, crucially, depends on the compression level applied to the target model.
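To illustrate why a fixed $\gamma$ can be suboptimal, the sketch below uses the standard textbook analysis of speculative decoding, not the SpecKV method itself. It assumes each drafted token is accepted independently with probability `alpha`, and that one draft-model forward pass costs `cost_ratio` times one target-model pass. Both parameters, and the function names, are illustrative assumptions rather than quantities taken from the paper. Under these assumptions the expected number of tokens produced per verification step is $(1 - \alpha^{\gamma+1}) / (1 - \alpha)$, and the best $\gamma$ shifts as the acceptance rate changes, which is one plausible way that compressing the target model could move the optimum.

```python
def expected_tokens_per_step(alpha: float, gamma: int) -> float:
    """Expected tokens emitted per target-model verification step.

    Assumes each drafted token is accepted independently with
    probability alpha (a simplifying assumption, not a measured value).
    """
    if alpha >= 1.0:
        # Every drafted token is accepted, plus one token from the target.
        return float(gamma + 1)
    return (1.0 - alpha ** (gamma + 1)) / (1.0 - alpha)


def expected_speedup(alpha: float, gamma: int, cost_ratio: float) -> float:
    """Idealized speedup over plain autoregressive decoding.

    cost_ratio is the assumed cost of one draft pass relative to one
    target pass; a step costs gamma draft passes plus one target pass.
    """
    return expected_tokens_per_step(alpha, gamma) / (gamma * cost_ratio + 1.0)


def best_gamma(alpha: float, cost_ratio: float, max_gamma: int = 16) -> int:
    """Speculation length in [1, max_gamma] with the highest idealized speedup."""
    return max(
        range(1, max_gamma + 1),
        key=lambda g: expected_speedup(alpha, g, cost_ratio),
    )


if __name__ == "__main__":
    # Hypothetical acceptance rates: a lower rate stands in for a draft
    # model that agrees less often with the target model.
    for alpha in (0.6, 0.8, 0.9):
        g = best_gamma(alpha, cost_ratio=0.05)
        print(f"alpha={alpha:.1f}  best gamma={g}  "
              f"speedup={expected_speedup(alpha, g, 0.05):.2f}")
```

In this toy model the preferred speculation length grows with the acceptance rate, so any change that alters how often the target model agrees with the draft model would also change the best $\gamma$. Real systems violate the independence assumption, so these numbers are indicative only.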