Open source load balancer for Ollama instances
r/LocalLLaMA
We (the OpenZiti team) built an OpenAI-compatible gateway that, among other things, distributes requests across multiple Ollama instances with weighted round-robin, background health checks, and automatic failover.

The use case: you have Ollama running on a few different machines. You want a single endpoint that any OpenAI-compatible client (Open WebUI, Continue, scripts, etc.) can hit, with requests distributed across the instances. If one instance goes down, traffic shifts automatically to the others. When it comes back, it rejoins the pool.
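For anyone curious what weighted round-robin with failover looks like in practice, here is a minimal Python sketch. It is not the gateway's actual code. It assumes the "smooth" weighted round-robin variant popularized by nginx, and the names `Backend`, `Pool`, `mark`, and `pick`, along with the example URLs, are made up for illustration. A background health checker is assumed to call `mark()` when an instance stops or resumes responding.

```python
from dataclasses import dataclass


@dataclass
class Backend:
    url: str              # hypothetical Ollama base URL
    weight: int = 1       # relative share of traffic
    healthy: bool = True  # flipped by an (assumed) background health checker
    current: int = 0      # running score used by smooth weighted round-robin


class Pool:
    def __init__(self, backends):
        self.backends = list(backends)

    def mark(self, url, healthy):
        """Record a health-check result; reset the score so a rejoining
        backend starts from a clean state."""
        for b in self.backends:
            if b.url == url:
                b.healthy = healthy
                b.current = 0

    def pick(self):
        """Return the next backend, considering only healthy ones."""
        live = [b for b in self.backends if b.healthy]
        if not live:
            raise RuntimeError("no healthy backends")
        total = sum(b.weight for b in live)
        # Every live backend gains its weight; the highest score wins
        # and is then pushed back by the total weight.
        for b in live:
            b.current += b.weight
        chosen = max(live, key=lambda b: b.current)
        chosen.current -= total
        return chosen


pool = Pool([
    Backend("http://gpu-box:11434", weight=2),
    Backend("http://laptop:11434", weight=1),
])
first_three = [pool.pick().url for _ in range(3)]
```

With weights 2 and 1, the first machine receives two of every three requests. Calling `pool.mark("http://gpu-box:11434", False)` sends everything to the remaining machine, and marking it healthy again puts it back in rotation.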