AI RESEARCH
SpecTr-GBV: Multi-Draft Block Verification Accelerating Speculative Decoding
arXiv CS.CL
arXiv:2604.25925v1 Announce Type: new
Autoregressive language models suffer from high inference latency because they decode sequentially, one token at a time. Speculative decoding (SD) mitigates this by using a lightweight draft model to propose candidate tokens, which a larger target model then verifies, accepting only some of them. Existing methods either adopt multi-draft strategies to raise acceptance rates or use block verification to jointly verify multiple tokens, but they remain limited because they treat these two improvements in isolation.
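For readers new to the mechanism, here is a minimal sketch of the standard single-draft, token-level verification rule from speculative sampling: a drafted token x is accepted with probability min(1, p(x)/q(x)), where q is the draft distribution and p the target distribution; on the first rejection, a replacement is drawn from the normalized residual max(0, p − q) and verification stops. This is the baseline the abstract contrasts against, not SpecTr-GBV's multi-draft block verification, whose details the abstract does not give. The function names `verify` and `residual` and the plain-list representation of distributions are illustrative assumptions.

```python
import random
from typing import List, Sequence


def residual(p: Sequence[float], q: Sequence[float]) -> List[float]:
    """Normalized max(0, p - q): where to resample after a rejected draft token."""
    diff = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
    total = sum(diff)
    if total == 0.0:
        # p == q, so rejection cannot happen in exact arithmetic; fall back to p.
        return list(p)
    return [d / total for d in diff]


def verify(
    draft_tokens: Sequence[int],
    q_dists: Sequence[Sequence[float]],
    p_dists: Sequence[Sequence[float]],
    rng: random.Random,
) -> List[int]:
    """Token-level speculative verification (hypothetical helper, single draft).

    draft_tokens[i] was sampled from q_dists[i], so q_dists[i][tok] > 0.
    p_dists holds len(draft_tokens) + 1 target distributions; the last one
    is used for a bonus token when every draft token is accepted.
    """
    accepted: List[int] = []
    for i, tok in enumerate(draft_tokens):
        p, q = p_dists[i], q_dists[i]
        # Accept with probability min(1, p(x) / q(x)).
        if rng.random() < min(1.0, p[tok] / q[tok]):
            accepted.append(tok)
        else:
            # First rejection: resample from the residual and stop.
            r = residual(p, q)
            accepted.append(rng.choices(range(len(r)), weights=r)[0])
            return accepted
    # All draft tokens accepted: the target contributes one extra token.
    bonus = p_dists[len(draft_tokens)]
    accepted.append(rng.choices(range(len(bonus)), weights=bonus)[0])
    return accepted
```

Because the accept-or-resample step makes each emitted token follow the target distribution p exactly, this rule is lossless; multi-draft strategies raise the chance that some candidate is accepted, and block verification judges a run of tokens jointly rather than stopping at the first per-token rejection.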