AI RESEARCH
Speculative Decoding Scaling Laws (SDSL): Throughput Optimization Made Simple
arXiv cs.LG
arXiv:2603.11053v1 Announce Type: cross Speculative decoding is a technique that uses multiple language models to accelerate inference. Previous works have used an experimental approach to optimize the throughput of the inference pipeline, which involves LLM