AI RESEARCH

Cross-Family Speculative Prefill: Training-Free Long-Context Compression with Small Draft Models

arXiv cs.CL

arXiv:2603.02631v2 Announce Type: replace

Prompt length is a major bottleneck in agentic large language model (LLM) workloads, where repeated inference steps and multi-call loops incur substantial prefill cost. Recent work on speculative prefill demonstrates that attention-based token importance estimation can enable