SpecBlock: Block-Iterative Speculative Decoding with Dynamic Tree Drafting

arXiv:2605.07243v1. Speculative decoding accelerates LLM inference by drafting a tree of candidate continuations and verifying it in a single forward pass of the target model. Existing drafters fall into two camps with opposite weaknesses. Autoregressive drafters such as EAGLE-3 preserve the dependence along each draft path but invoke the drafter once per tree depth, which makes drafting a non-trivial share of per-iteration latency.
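
To make the per-depth cost concrete, here is a minimal toy sketch of autoregressive tree drafting followed by greedy verification. It is not EAGLE-3 or SpecBlock: `toy_next_tokens`, `draft_tree_autoregressive`, `verify_greedy`, and the parameters `k` and `width` are hypothetical names introduced only for illustration, and the "models" are deterministic hash-like functions rather than neural networks. The point is only that the drafter is invoked once per tree level, so the number of drafter calls grows with depth.

```python
from typing import Callable, List, Tuple

Token = int
Path = Tuple[Token, ...]


def toy_next_tokens(prefix: Path, k: int) -> List[Token]:
    """Hypothetical stand-in for a drafter: k distinct candidate tokens for a prefix."""
    base = (sum(prefix) * 31 + len(prefix)) % 50
    return [(base + i) % 50 for i in range(k)]


def draft_tree_autoregressive(
    prefix: Path, depth: int, k: int, width: int
) -> Tuple[List[Path], int]:
    """Grow a draft tree level by level; each level costs one (batched) drafter call."""
    frontier: List[Path] = [()]
    tree: List[Path] = []
    drafter_calls = 0
    for _ in range(depth):
        drafter_calls += 1  # assumption: the whole frontier is expanded in one batched call
        children: List[Path] = []
        for path in frontier:
            for tok in toy_next_tokens(prefix + path, k):
                children.append(path + (tok,))
        # Toy pruning: keep the first `width` children (a real drafter would rank by score).
        frontier = children[:width]
        tree.extend(frontier)
    return tree, drafter_calls


def verify_greedy(
    prefix: Path, tree: List[Path], target_next: Callable[[Path], Token]
) -> Path:
    """Return the longest draft path whose every token equals the target's greedy choice.

    A real system scores all tree nodes in one target forward pass; this loop
    only emulates the accept/reject decision that such a pass would produce.
    """
    best: Path = ()
    for path in tree:
        matches = all(
            target_next(prefix + path[:i]) == path[i] for i in range(len(path))
        )
        if matches and len(path) > len(best):
            best = path
    return best


if __name__ == "__main__":
    prompt: Path = (1, 2, 3)
    # Toy target that always agrees with the drafter's first candidate.
    target = lambda p: toy_next_tokens(p, 1)[0]
    tree, calls = draft_tree_autoregressive(prompt, depth=4, k=2, width=4)
    accepted = verify_greedy(prompt, tree, target)
    print(f"drafter calls: {calls}, tree nodes: {len(tree)}, accepted: {len(accepted)}")
```

In this sketch a depth-4 tree needs four drafter calls before the target verifies anything, which is the latency share the abstract attributes to autoregressive drafters.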