AI RESEARCH
Scepsy: Serving Agentic Workflows Using Aggregate LLM Pipelines
arXiv CS.AI
arXiv:2604.15186v1 (announce type: cross)
Agentic workflows carry out complex tasks by orchestrating multiple large language models (LLMs) and tools. Serving such workflows at a target throughput with low latency is challenging for two reasons: they can be defined using arbitrary agentic frameworks, and their execution times are unpredictable, because execution may branch, fan out, or recur in data-dependent ways. In addition, the LLMs in a workflow often outnumber the available GPUs, so executing them leads to GPU oversubscription.