AI RESEARCH
PARD-2: Target-Aligned Parallel Draft Model for Dual-Mode Speculative Decoding
arXiv CS.AI
arXiv:2605.08632v1 Announce Type: cross Speculative decoding accelerates Large Language Model (LLM) inference by using a lightweight draft model to propose candidate tokens that the target model then verifies in parallel. However, existing draft model
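The draft-then-verify loop described in the abstract can be illustrated with a minimal sketch. This is generic greedy speculative decoding, not the PARD-2 method itself. The names `speculative_step`, `generate`, `greedy_decode`, `toy_target`, and `toy_draft` are hypothetical, the "models" are toy functions over integer tokens, and the verification loop stands in for what would be a single batched forward pass of a real target model.

```python
from typing import Callable, List

# A "model" here is any function mapping a token prefix to its greedy next token.
Model = Callable[[List[int]], int]


def speculative_step(prefix: List[int], draft: Model, target: Model, k: int) -> List[int]:
    """Run one draft-then-verify round and return the newly accepted tokens."""
    # 1. The draft model proposes k candidate tokens autoregressively.
    proposed: List[int] = []
    ctx = list(prefix)
    for _ in range(k):
        tok = draft(ctx)
        proposed.append(tok)
        ctx.append(tok)

    # 2. The target model scores every candidate position. A real system
    #    does this in one parallel forward pass; the loop is a stand-in.
    target_next = [target(list(prefix) + proposed[:i]) for i in range(k + 1)]

    # 3. Accept the longest run of candidates that match the target's choice,
    #    then append the target's own token (a correction or a bonus token).
    accepted: List[int] = []
    for i in range(k):
        if proposed[i] != target_next[i]:
            break
        accepted.append(proposed[i])
    accepted.append(target_next[len(accepted)])
    return accepted


def generate(prefix: List[int], draft: Model, target: Model, k: int, max_new: int) -> List[int]:
    """Generate max_new tokens using repeated speculative steps."""
    out = list(prefix)
    while len(out) - len(prefix) < max_new:
        out.extend(speculative_step(out, draft, target, k))
    return out[: len(prefix) + max_new]


def greedy_decode(prefix: List[int], target: Model, max_new: int) -> List[int]:
    """Baseline: plain one-token-at-a-time greedy decoding with the target."""
    out = list(prefix)
    for _ in range(max_new):
        out.append(target(out))
    return out


def toy_target(ctx: List[int]) -> int:
    # Toy target model: counts upward modulo 10.
    return (ctx[-1] + 1) % 10


def toy_draft(ctx: List[int]) -> int:
    # Toy draft model: agrees with the target except after tokens 4 and 9.
    return 0 if ctx[-1] % 5 == 4 else (ctx[-1] + 1) % 10
```

With greedy verification, the output matches what the target model would have produced on its own; the draft model only affects how many tokens are accepted per round, and therefore how many target passes are needed.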