AI RESEARCH

Making Every Verified Token Count: Adaptive Verification for MoE Speculative Decoding

arXiv CS.CL

arXiv:2605.00342v1 Announce Type: new Tree-based speculative decoding accelerates autoregressive generation by verifying multiple draft candidates in parallel, but this advantage weakens for sparse Mixture-of-Experts (MoE) models. As the draft tree grows, different branches activate different experts, expanding the union of activated experts and substantially increasing target-side verification cost. We propose EVICT, a
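
To make the expert-union effect described in the abstract concrete, here is a minimal Python sketch. It is not the EVICT method and is not taken from the paper. It assumes, purely for illustration, that each draft token routes to `top_k` of `num_experts` experts uniformly and independently, which real routers do not do; the function names and the 64-expert, top-2 configuration are hypothetical. Under that assumption, the expected number of distinct experts touched by a batch of draft tokens has a closed form, which the simulation checks.

```python
import random


def expected_union(num_experts: int, top_k: int, num_tokens: int) -> float:
    """Expected count of distinct experts activated by num_tokens draft tokens.

    Assumption (illustrative only): every token independently selects top_k
    distinct experts uniformly at random, so a given expert is missed by one
    token with probability 1 - top_k / num_experts.
    """
    miss_all = (1 - top_k / num_experts) ** num_tokens
    return num_experts * (1 - miss_all)


def simulated_union(num_experts: int, top_k: int, num_tokens: int,
                    trials: int = 2000, seed: int = 0) -> float:
    """Monte Carlo estimate of the same quantity under the same assumption."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        active = set()
        for _ in range(num_tokens):
            # Each token contributes its top_k experts to the active set.
            active.update(rng.sample(range(num_experts), top_k))
        total += len(active)
    return total / trials


if __name__ == "__main__":
    # Hypothetical configuration: 64 experts, top-2 routing.
    for n in (1, 4, 16, 64):
        print(n, round(expected_union(64, 2, n), 1),
              round(simulated_union(64, 2, n), 1))
```

In this toy model the active set grows quickly with the number of verified draft tokens and saturates toward the full expert count, which is the cost pressure the abstract attributes to larger draft trees. Real routing is correlated across nearby tokens, so actual growth may be slower than this uniform estimate.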