Towards Multi-Model LLM Schedulers: Empirical Insights into Offloading and Preemption
arXiv CS.AI
arXiv:2605.19593v1 (Announce Type: new)

Modern deployments of Large Language Models (LLMs) increasingly require serving multiple models with diverse architectures, sizes, and specializations on shared, heterogeneous hardware. This setting