AI RESEARCH

D-PACE: Dynamic Position-Aware Cross-Entropy for Parallel Speculative Drafting

arXiv cs.AI

arXiv:2605.18810v1 Announce Type: cross Speculative decoding accelerates LLM inference by having a small drafter propose tokens that a larger target model verifies in parallel. Recent diffusion-based parallel drafters such as DFlash predict the full B-token block in one forward pass, enabling deeper drafters and longer accepted blocks. However, existing multi-token drafter objectives often use fixed position-dependent weighting schedules, such as head-dependent weights or block-position decays, which do not adapt as the positions limiting acceptance change during