My LLM Started Lying to My App and I Didn't Notice for Three Days
Dev.to AI
•
Generative AI
It started with a Slack message from a user: "Your summaries look weird." Not an error. Not a crash. Just. weird. By the time I'd triaged, reproduced, and traced it back to the root cause, the model had been returning malformed JSON on about 12% of requests for 72 hours. Our error handling was swallowing the parse failures and returning stale cache. Users were getting yesterday's data labeled as today's. Nobody's monitoring caught it because no exception was raised. The model just quietly stopped following the format instructions it had followed reliably for months.