
StreamServe: Adaptive Speculative Flows for Low-Latency Disaggregated LLM Serving

arXiv cs.AI

arXiv:2604.09562v1 Announce Type: cross

Efficient LLM serving must balance throughput and latency across diverse, bursty workloads. We