Multi-Model Failover In Your AI Gateway
Dev.to AI
Think about two common scenarios:

1) You hit a rate limit or run out of tokens, so you have to "downgrade" to a smaller, less powerful model.
2) An LLM provider is down or having intermittent issues.

In both cases, what do you do if your gateway only has one model to route to? In this blog post, you'll learn how to set up failover for your LLMs.

Prerequisites

To follow along with this blog post hands-on, you will need the following:

1) A Kubernetes cluster (local is fine).
2) Agentgateway installed along with the Kubernetes Gateway API CRDs.
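To make the two scenarios from the introduction concrete, here is a minimal Python sketch of what failover means at the application level: try the preferred model first and fall through to the next one when it is rate limited or unavailable. This is not agentgateway's implementation or configuration. The `Backend` type, the `RateLimited` and `ProviderDown` exceptions, and the `large_model` and `small_model` functions are all hypothetical stand-ins for real provider clients.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


class RateLimited(Exception):
    """Hypothetical error: the provider rejected the call (rate or token limit)."""


class ProviderDown(Exception):
    """Hypothetical error: the provider is unreachable or failing intermittently."""


@dataclass
class Backend:
    name: str
    call: Callable[[str], str]  # takes a prompt, returns a completion


def complete_with_failover(prompt: str, backends: Sequence[Backend]) -> tuple[str, str]:
    """Try each backend in priority order and return (backend name, response)."""
    errors = []
    for backend in backends:
        try:
            return backend.name, backend.call(prompt)
        except (RateLimited, ProviderDown) as exc:
            # Record the failure and move on to the next backend in the list.
            errors.append(f"{backend.name}: {type(exc).__name__}")
    raise RuntimeError("all backends failed: " + "; ".join(errors))


# Stand-in providers: the large model is rate limited, the small one answers.
def large_model(prompt: str) -> str:
    raise RateLimited("quota exhausted")


def small_model(prompt: str) -> str:
    return f"small-model reply to: {prompt}"


backends = [Backend("large", large_model), Backend("small", small_model)]
used, reply = complete_with_failover("hello", backends)
print(used, "->", reply)
```

Doing this in every application gets repetitive quickly, which is the argument for pushing the same logic into the gateway, where it can be configured once for every client.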