AI RESEARCH
Mistletoe: Stealthy Acceleration-Collapse Attacks on Speculative Decoding
arXiv CS.LG
arXiv:2605.14005v1 Announce Type: cross Speculative decoding has become a widely adopted technique for accelerating large language model (LLM) inference by drafting multiple candidate tokens and verifying them with a target model in parallel. Its efficiency, however, depends critically on the average accepted length $\tau$, i.e., the number of draft tokens that survive each verification step.
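To make the role of $\tau$ concrete, here is a minimal sketch of how an average accepted length could be measured. It assumes greedy, exact-match verification (a draft token survives only if it equals the target model's token at that position, and everything after the first mismatch is discarded) and counts only surviving draft tokens. Some formulations also count the extra token the target model emits at each step, and sampling-based verification uses a probabilistic acceptance rule instead. The function names and the token IDs are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence, Tuple


def accepted_length(draft: Sequence[int], target: Sequence[int]) -> int:
    """Count leading draft tokens that match the target's tokens.

    Assumption: greedy exact-match verification. The first mismatch
    rejects that token and every draft token after it.
    """
    n = 0
    for d, t in zip(draft, target):
        if d != t:
            break
        n += 1
    return n


def average_accepted_length(
    steps: Sequence[Tuple[Sequence[int], Sequence[int]]],
) -> float:
    """Mean accepted length over all verification steps (an estimate of tau)."""
    lengths: List[int] = [accepted_length(d, t) for d, t in steps]
    return sum(lengths) / len(lengths)


# Illustrative (made-up) token IDs: each pair is (draft tokens, target tokens).
steps = [
    ([5, 7, 9, 2], [5, 7, 9, 2]),  # all 4 draft tokens survive
    ([5, 7, 9, 2], [5, 7, 1, 2]),  # mismatch at position 2, so 2 survive
    ([3, 3, 3, 3], [4, 3, 3, 3]),  # mismatch at position 0, so 0 survive
]

tau = average_accepted_length(steps)
print(f"average accepted length: {tau}")
```

Under these assumptions, a lower value means more drafted tokens are thrown away at each verification step, so more target-model passes are needed to produce the same output and the speedup shrinks.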