TAPS: Task Aware Proposal Distributions for Speculative Sampling

arXiv:2603.27027v1 Announce Type: cross Speculative decoding accelerates autoregressive generation by letting a lightweight draft model propose future tokens that a larger target model then verifies in parallel. In practice, however, draft models are usually trained on broad generic corpora, which leaves it unclear how much speculative decoding quality depends on the draft