AI RESEARCH
StreamServe: Adaptive Speculative Flows for Low-Latency Disaggregated LLM Serving
arXiv CS.AI
arXiv:2604.09562v1 Announce Type: cross Efficient LLM serving must balance throughput and latency across diverse, bursty workloads. We