Speculative Decoding Scaling Laws (SDSL): Throughput Optimization Made Simple

arXiv cs.LG

arXiv:2603.11053v1 Announce Type: cross

Speculative decoding is a technique that uses multiple language models to accelerate inference. Previous works have used an experimental approach to optimize the throughput of the inference pipeline, which involves LLM
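The excerpt does not state the SDSL scaling law itself. As a point of reference, here is a minimal sketch of the standard analytical throughput model for speculative decoding from Leviathan et al. (2023). It is not the method of this paper. The model assumes three things. First, a draft model proposes `gamma` tokens per step. Second, each drafted token is accepted independently with probability `alpha`. Third, one draft step costs `c` times as much as one target-model step. All function and parameter names here are illustrative.

```python
def expected_tokens_per_step(alpha: float, gamma: int) -> float:
    """Expected tokens emitted per target-model verification pass.

    Assumes i.i.d. acceptance with probability alpha (Leviathan et al., 2023):
    (1 - alpha**(gamma + 1)) / (1 - alpha).
    """
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must be in [0, 1)")
    if gamma < 0:
        raise ValueError("gamma must be non-negative")
    return (1 - alpha ** (gamma + 1)) / (1 - alpha)


def expected_speedup(alpha: float, gamma: int, c: float) -> float:
    """Expected wall-clock improvement over plain autoregressive decoding.

    c is an assumed cost ratio: time of one draft-model step divided by
    the time of one target-model step. Each step costs gamma * c + 1 units.
    """
    return expected_tokens_per_step(alpha, gamma) / (gamma * c + 1)


def best_gamma(alpha: float, c: float, max_gamma: int = 16) -> int:
    """Brute-force search over draft lengths 1..max_gamma for the best speedup."""
    return max(range(1, max_gamma + 1), key=lambda g: expected_speedup(alpha, g, c))


if __name__ == "__main__":
    # Hypothetical operating point: 80% acceptance, draft model 5% as costly.
    g = best_gamma(alpha=0.8, c=0.05)
    print(g, round(expected_speedup(0.8, g, 0.05), 3))
```

Under this simplified model, a longer draft raises the expected number of accepted tokens but also raises the per-step cost. The throughput-optimal `gamma` is therefore a trade-off between the acceptance rate and the cost ratio. Experimental tuning searches for this kind of trade-off empirically.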