Efficiently Aligning Draft Models via Parameter- and Data-Efficient Adaptation

arXiv:2603.09527v1 Announce Type: cross Speculative decoding accelerates LLM inference but suffers from performance degradation when target models are fine-tuned for specific domains. A naive solution is to retrain draft models for every target model, which is costly and inefficient. To address this, we