AI RESEARCH

When Hidden States Drift: Can KV Caches Rescue Long-Range Speculative Decoding?

arXiv cs.CL

arXiv:2604.26412v1

Speculative decoding accelerates LLM inference, but state-of-the-art hidden-state-based drafters suffer from long-range decay: draft accuracy degrades as the speculative step increases. Existing work attributes this decay to train-inference mismatch and proposes test-time