Bulletproofing LLM Structured Output in Python: Healing Retries, Cost Caps, and Drift Detection (Runnable Code)

I shipped a structured-output endpoint to production in March. The schema was clean, JSON mode was on, the model was GPT-4.1, the eval suite was green. Three weeks in, the on-call channel lit up because a downstream billing job had silently skipped 4,200 records over a weekend. The output was valid JSON. It just wasn't the JSON we asked for. That was my last "JSON mode is good enough" deployment. Since then I've shipped four LLM structured-output systems and the failures keep coming from the same places - and JSON mode catches roughly two of them.