We Built a Service That Catches LLM Drift Before Your Users Do
Dev.to AI
Machine Learning
Generative AI
You shipped your LLM-powered feature. It worked perfectly in testing. Users loved the beta.

Three weeks later, your inbox fills up. Outputs are wrong. The JSON your app parses doesn't look right. The classifier is giving different answers. And you had no idea until users told you.

This Happens More Often Than You Think

In February 2025, developers on r/LLMDevs reported GPT-4o changing behaviour with zero advance notice:

"We caught GPT-4o drifting this week. OpenAI changed GPT-4o in a way that significantly changed our prompt outputs. Zero advance notice."

It's not just OpenAI.