Continuum: Efficient and Robust Multi-Turn LLM Agent Scheduling with KV Cache Time-to-Live

arXiv:2511.02230v4 Announce Type: replace-cross KV cache management is essential for efficient LLM inference. To maximize utilization, existing inference engines evict a finished request's KV cache whenever new requests are waiting. This policy breaks down for agentic workloads, which interleave LLM calls with tool calls. We present Continuum, a serving system that optimizes job completion time for multi-turn agent workloads by
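The following is a minimal Python sketch of time-to-live (TTL) KV cache retention, the idea named in the title. It is not Continuum's implementation. It assumes a fixed TTL, a memory model counted in KV blocks, and the hypothetical names `TTLKVCache`, `start_turn`, `end_turn` and `NoCapacity`. The abstract does not say how the TTL is chosen. Instead of freeing a finished turn's cache, the sketch pins it for `ttl` seconds while the agent waits on a tool. If the next turn arrives within that window, the cached prefix is reused rather than recomputed.

```python
import math
from dataclasses import dataclass


class NoCapacity(Exception):
    """Hypothetical signal that a request must wait for free KV blocks."""


@dataclass
class Entry:
    blocks: int        # KV blocks held for this job's context
    expires_at: float  # math.inf while the job's turn is running


class TTLKVCache:
    """Hypothetical sketch: pin a finished turn's KV cache for a fixed TTL."""

    def __init__(self, capacity: int, ttl: float) -> None:
        self.capacity = capacity  # total KV blocks available
        self.ttl = ttl            # seconds to keep an idle job's cache pinned
        self.entries: dict[str, Entry] = {}

    def used(self) -> int:
        return sum(e.blocks for e in self.entries.values())

    def expire(self, now: float) -> None:
        # Free every pinned cache whose TTL has elapsed.
        self.entries = {j: e for j, e in self.entries.items() if e.expires_at > now}

    def start_turn(self, job: str, blocks: int, now: float) -> bool:
        """Begin a turn needing `blocks` in total; return True on a cache hit."""
        self.expire(now)
        old = self.entries.get(job)
        reused = old.blocks if old else 0
        # On a hit only the growth beyond the cached prefix needs new blocks.
        if self.used() - reused + blocks > self.capacity:
            raise NoCapacity(job)
        self.entries[job] = Entry(blocks, math.inf)
        return old is not None

    def end_turn(self, job: str, now: float) -> None:
        """The turn finished and the agent is calling a tool: pin, don't evict."""
        self.entries[job].expires_at = now + self.ttl


cache = TTLKVCache(capacity=100, ttl=2.0)
print(cache.start_turn("a", 40, now=0.0))  # cold start, no cached prefix
cache.end_turn("a", now=1.0)               # pinned until t=3.0
print(cache.start_turn("a", 50, now=2.5))  # tool returned in time: reuse
```

Setting `ttl=0` makes a pin lapse at the moment its turn ends, which approximates the evict-on-finish policy described above. A longer TTL trades memory held by idle agents for fewer recomputed prefixes.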