llama.cpp chooses to be unstable, or, a mea culpa to Ollama
r/LocalLLaMA
TL;DR: The new parser insists that it must crash the process, but *only* after it has already streamed completely valid output and has nothing valid left to send after the final inference. There are innumerable ways to repro, from Mistral 3.x tool-calling to Llama 3.x requests with tools where the response contains a "{" that isn't a tool call (yes, it's as simple as any Llama 3.x response with a code block :/ ).

For two years, I've been the guy telling people llama.cpp doesn't deserve its reputation.