I catalogued every way local models break JSON output and built a repair library. Here's what I found across 288 model calls
r/LocalLLaMA
I've been running structured output prompts through a bunch of models on OpenRouter for the past few months (Llama 3, Mistral, Command R, DeepSeek, Qwen, and a number of others) alongside the usual closed-source suspects. 288 calls total. I wanted to know what actually breaks, how often, and whether open models fail differently from the API-only ones. Short answer: not really. The failure modes are almost identical across the board.