Are there ways to set up llama-swap so that competing model requests are queued?

r/LocalLLaMA

Hello everyone. As the title says, I am looking to provide a 48 GB workstation to students as an API endpoint. I am currently using LiteLLM and want to keep using it, but under the hood I would love to run a llama-swap instance so that I can offer different models and students can just query the one they want. If there is no memory left when a request comes in, I would like the job to be queued. Is there functionality like that? Also, I am running on AMD, does that