SpecForge: A Flexible and Efficient Open-Source Training Framework for Speculative Decoding

arXiv:2603.18567v1 Announce Type: cross Large language models incur high inference latency due to sequential autoregressive decoding. Speculative decoding alleviates this bottleneck by using a lightweight draft model to propose multiple tokens for batched verification. However, its adoption has been limited by the lack of high-quality draft models and scalable