ConFu: Contemplate the Future for Better Speculative Sampling

arXiv:2603.08899v1 Announce Type: new Speculative decoding has emerged as a powerful approach to accelerate large language model (LLM) inference by employing lightweight draft models to propose candidate tokens that are subsequently verified by the target model. The effectiveness of this paradigm critically depends on the quality of the draft model.
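
The draft-then-verify loop described above can be illustrated with a minimal sketch. This is not ConFu's method; it is the simplest greedy (exact-match) variant of speculative decoding, and every name in it (`speculative_step`, `generate`, `toy_target`, `toy_draft`) is hypothetical. The two "models" are toy functions over integer tokens, assumed only for illustration.

```python
from typing import Callable, List

Token = int
# Hypothetical interface: a model is a function mapping a context to its
# greedy next token. Real models return logits; this is a simplification.
NextTokenFn = Callable[[List[Token]], Token]


def speculative_step(prefix: List[Token], draft_next: NextTokenFn,
                     target_next: NextTokenFn, k: int) -> List[Token]:
    """One draft-then-verify round with greedy (exact-match) verification."""
    # Draft phase: the lightweight model proposes k tokens autoregressively.
    proposed: List[Token] = []
    ctx = list(prefix)
    for _ in range(k):
        tok = draft_next(ctx)
        proposed.append(tok)
        ctx.append(tok)

    # Verify phase: the target model checks each proposed token in order.
    # Real systems score all k positions in one batched forward pass;
    # the loop here is only for readability.
    accepted: List[Token] = []
    ctx = list(prefix)
    for tok in proposed:
        expected = target_next(ctx)
        if expected != tok:
            # First mismatch: keep the target's token and stop.
            accepted.append(expected)
            return accepted
        accepted.append(tok)
        ctx.append(tok)

    # Every proposal matched: the target contributes one extra token.
    accepted.append(target_next(ctx))
    return accepted


def generate(prefix: List[Token], draft_next: NextTokenFn,
             target_next: NextTokenFn, max_new: int, k: int) -> List[Token]:
    """Repeat speculative rounds until max_new tokens have been produced."""
    out = list(prefix)
    while len(out) - len(prefix) < max_new:
        out.extend(speculative_step(out, draft_next, target_next, k))
    return out[:len(prefix) + max_new]


def toy_target(ctx: List[Token]) -> Token:
    # Toy target model (assumption): counts upward modulo 10.
    return (ctx[-1] + 1) % 10


def toy_draft(ctx: List[Token]) -> Token:
    # Toy draft model (assumption): agrees with the target except after token 3.
    return 0 if ctx[-1] == 3 else (ctx[-1] + 1) % 10
```

With greedy verification the output is identical to decoding with the target model alone; a better draft model only changes how many tokens are accepted per round, which is why draft quality determines the speedup.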