I Replaced OpenAI's API and Cut My Inference Bill by 94%

Dev.to AI
Machine Learning Generative AI Open Source AI

I was paying OpenAI ~$380/month for a RAG pipeline doing ~50K requests/day. Most of them were straightforward: summarize this, extract that, classify this ticket. GPT-4o is great. But $2.50 per million input tokens for classification tasks? That's a tax on laziness. I switched to an OpenAI-compatible API running open-weight models. Same code. Same response format. The bill dropped to ~$22/month. Here's exactly what I did.