AI RESEARCH

Training-Free Loosely Speculative Decoding: Accepting Semantically Correct Drafts Beyond Exact Match

arXiv cs.CL

arXiv:2511.22972v3 Announce Type: replace

Large language models (LLMs) achieve strong performance across diverse tasks but suffer from high inference latency because they generate tokens autoregressively. Speculative Decoding (SPD) mitigates this issue by verifying, in parallel, candidate tokens proposed by a smaller draft model, yet its strict exact-match verification discards many semantically valid continuations. Moreover, existing