Justitia: Fair and Efficient Scheduling of Task-parallel LLM Agents with Selective Pampering

arXiv:2510.17015v2 Announce Type: replace-cross

LLM agents, which often comprise parallel inference tasks, are widely adopted to solve real-world problems. When such task-parallel LLM agents are served on shared GPU servers, the scheduler should achieve fast agent completion while guaranteeing worst-case performance. To this end, our key insight is to selectively pamper agents based on their completion order under idealized fair sharing. We design Justitia, a scheduler for task-parallel LLM agents that is both fair and efficient.
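To make the insight of pampering agents by their fair-share completion order concrete, here is a minimal Python sketch. It rests on simplifying assumptions that are ours, not the paper's. Each agent is treated as a single job with a known total service demand, and the GPU server is modeled as one divisible unit of capacity. Idealized fair sharing is modeled as processor sharing, where capacity is split equally among active agents. The sketch first simulates fair sharing to get each agent's completion time. It then serves arrived agents non-preemptively, choosing the one with the earliest fair-share completion time each time the server frees up. The names `Agent`, `fair_share_completion`, and `pampered_schedule` are hypothetical. This is not Justitia's actual algorithm, which the abstract does not specify.

```python
from dataclasses import dataclass


@dataclass
class Agent:
    name: str
    arrival: float  # arrival time
    demand: float   # total service needed (assumption: known in advance)


def fair_share_completion(agents, capacity=1.0):
    """Completion times under idealized processor sharing (equal split among active agents)."""
    pending = sorted(agents, key=lambda a: a.arrival)
    remaining, finish = {}, {}
    t, i = 0.0, 0
    while i < len(pending) or remaining:
        if not remaining:  # server idle: jump to the next arrival
            t = max(t, pending[i].arrival)
        while i < len(pending) and pending[i].arrival <= t:  # admit arrivals
            remaining[pending[i].name] = pending[i].demand
            i += 1
        rate = capacity / len(remaining)  # equal share per active agent
        dt_finish = min(remaining.values()) / rate
        dt_arrival = pending[i].arrival - t if i < len(pending) else float("inf")
        dt = min(dt_finish, dt_arrival)  # advance to the next event
        for n in remaining:
            remaining[n] -= rate * dt
        t += dt
        for n in [n for n, r in remaining.items() if r <= 1e-12]:
            finish[n] = t
            del remaining[n]
    return finish


def pampered_schedule(agents, capacity=1.0):
    """Serve arrived agents one at a time, earliest fair-share completion first."""
    fair = fair_share_completion(agents, capacity)
    waiting = sorted(agents, key=lambda a: a.arrival)
    done, t = {}, 0.0
    while waiting:
        ready = [a for a in waiting if a.arrival <= t]
        if not ready:  # nothing has arrived yet: idle until the next arrival
            t = waiting[0].arrival
            continue
        nxt = min(ready, key=lambda a: fair[a.name])
        t += nxt.demand / capacity  # runs exclusively to completion
        done[nxt.name] = t
        waiting.remove(nxt)
    return fair, done


if __name__ == "__main__":
    agents = [Agent("A", 0.0, 4.0), Agent("B", 0.0, 1.0), Agent("C", 0.0, 2.0)]
    fair, pampered = pampered_schedule(agents)
    for a in agents:
        print(a.name, fair[a.name], pampered[a.name])
    # A 7.0 7.0
    # B 3.0 1.0
    # C 5.0 3.0
```

In this toy example, serving agents in fair-share completion order lets B and C finish earlier than under fair sharing, while A finishes at the same time. The sketch only illustrates the ordering idea. It does not by itself establish the worst-case guarantee the abstract refers to.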