AI RESEARCH

PARD-2: Target-Aligned Parallel Draft Model for Dual-Mode Speculative Decoding

arXiv cs.AI

arXiv:2605.08632v1 Announce Type: cross Speculative decoding accelerates Large Language Model (LLM) inference by using a lightweight draft model to propose candidate tokens that the target model then verifies in parallel. However, existing draft model
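
For readers new to the draft-then-verify loop described in the abstract, here is a minimal sketch of generic greedy speculative decoding. It is not PARD-2's method, which the truncated abstract does not describe. The names `target_next`, `draft_next`, `greedy_decode`, and `speculative_decode` are hypothetical, and both "models" are toy deterministic functions standing in for real networks. The verification step is written as a Python loop, on the assumption that a real system would run it as one batched forward pass of the target model.

```python
from typing import Callable, List, Tuple

NextToken = Callable[[List[int]], int]


def target_next(ctx: List[int]) -> int:
    # Hypothetical "target model": a toy deterministic next-token rule.
    return (sum(ctx) * 31 + len(ctx)) % 50


def draft_next(ctx: List[int]) -> int:
    # Hypothetical "draft model": agrees with the target except when the
    # context length is a multiple of 4, where it deliberately guesses wrong.
    t = target_next(ctx)
    return t if len(ctx) % 4 else (t + 1) % 50


def greedy_decode(model: NextToken, prompt: List[int], n: int) -> List[int]:
    # Baseline: one model call per generated token.
    out = list(prompt)
    for _ in range(n):
        out.append(model(out))
    return out[len(prompt):]


def speculative_decode(
    target: NextToken, draft: NextToken, prompt: List[int], n: int, k: int = 4
) -> Tuple[List[int], int]:
    out = list(prompt)
    verify_passes = 0  # each pass stands in for one parallel target forward
    while len(out) - len(prompt) < n:
        # 1) The draft model proposes k candidate tokens autoregressively.
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(out + proposal))
        # 2) The target scores all k + 1 positions (simulated by a loop here).
        verified = [target(out + proposal[:i]) for i in range(k + 1)]
        verify_passes += 1
        # 3) Accept the longest prefix on which draft and target agree.
        accepted = 0
        while accepted < k and proposal[accepted] == verified[accepted]:
            accepted += 1
        out.extend(proposal[:accepted])
        # 4) Append the target's own token: a correction at the first
        #    mismatch, or a bonus token when every candidate was accepted.
        out.append(verified[accepted])
    return out[len(prompt):len(prompt) + n], verify_passes


if __name__ == "__main__":
    tokens, passes = speculative_decode(target_next, draft_next, [1, 2, 3], 20)
    print(tokens == greedy_decode(target_next, [1, 2, 3], 20), passes)
```

Under greedy verification the output matches what the target model would produce alone. Any speedup comes from needing fewer target passes than generated tokens, so it depends on how often the draft agrees with the target.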