AI RESEARCH

Position: LLM Serving Needs Mathematical Optimization and Algorithmic Foundations, Not Just Heuristics

arXiv CS.AI

arXiv:2605.01280v1 Announce Type: cross

This position paper argues that LLM inference serving has outgrown generic heuristics and now demands mathematical optimization and algorithmic foundations. Despite rapid advances in serving systems such as vLLM and SGLang, their algorithmic cores remain largely unchanged from classical distributed computing: request routing uses join-shortest-queue or round-robin, scheduling defaults to FIFO, and KV cache eviction follows