Towards Multi-Model LLM Schedulers: Empirical Insights into Offloading and Preemption

arXiv:2605.19593v1 Announce Type: new Modern deployments of Large Language Models (LLMs) increasingly require serving multiple models with diverse architectures, sizes, and specializations on shared, heterogeneous hardware. This setting