AI RESEARCH
DIVERSED: Relaxed Speculative Decoding via Dynamic Ensemble Verification
arXiv CS.CL
•
ArXi:2604.07622v1 Announce Type: new Speculative decoding is an effective technique for accelerating large language model inference by drafting multiple tokens in parallel. In practice, its speedup is often bottlenecked by a rigid verification step that strictly enforces the accepted token distribution to exactly match the target model. This constraint leads to the rejection of many plausible tokens, lowering the acceptance rate and limiting overall time speedup.