I Found a Stateless LLM Runtime on GitHub — It Dynamically Loads Models Per Request

Dev.to AI
Generative AI

I was randomly browsing GitHub when I came across a project called Chameleon. At first I thought it was just another LLM wrapper, but it turned out to be something very different: a stateless runtime that loads models dynamically on each request.

🔗 Repo:

Thoughts

I haven’t run it yet, but the architecture alone is interesting. I'm curious to see:

- how it performs under load
- whether cold starts become a bottleneck
- how far the routing system can go

If you’re into AI infra, this is definitely worth a look.