Your prompt is getting longer without you knowing it (and it's killing your margins)

I've been looking at LLM billing patterns lately, and there's a silent killer that creeps up on almost every team: prompt inflation. When you first build an AI feature, your prompt is tight. Maybe 500 tokens for the system instructions and 100 for the user query. The math looks great. "This will cost us fractions of a cent per call," you tell the team. Fast forward three months. Someone added conversation history to make the bot "smarter." Another dev added a massive RAG context block because the model hallucinated once.
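To put rough numbers on that, here's a back-of-the-envelope sketch. The price per million input tokens and the sizes of the history and RAG blocks are made-up placeholders, not real figures from any provider or any team, so swap in your own. Only the 500 and 100 token counts come from the example above.

```python
# Hypothetical rate: replace with your provider's actual input-token price.
PRICE_PER_MILLION_INPUT_TOKENS_USD = 3.00


def input_cost_usd(*token_counts: int,
                   price_per_million: float = PRICE_PER_MILLION_INPUT_TOKENS_USD) -> float:
    """Input-side cost of one call, given the token count of each prompt part."""
    return sum(token_counts) * price_per_million / 1_000_000


# Launch day: 500-token system prompt + 100-token user query.
launch = input_cost_usd(500, 100)

# Three months later: same prompt plus assumed 2,000 tokens of conversation
# history and an assumed 4,000-token RAG block (both illustrative numbers).
inflated = input_cost_usd(500, 100, 2_000, 4_000)

print(f"launch:   ${launch:.4f} per call")
print(f"inflated: ${inflated:.4f} per call")
print(f"growth:   {inflated / launch:.0f}x")
```

Under these assumed numbers the per-call cost is still small in absolute terms, which is exactly why nobody notices it multiplying.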