Qwen3.5 overthinking anxiety duct tape fix

r/LocalLLaMA

A lot of people are complaining that Qwen3.5 overthinks its answers, with "But wait." showing up again and again in the thinking blocks. I've been playing around with Qwen3.5 a lot lately and wanted to share a quick duct tape fix to get the model out of the refining loop (at least in llama.cpp; it probably works for other inference engines too). Add the flags --reasoning-budget and --reasoning-budget-message like so:

    llama-server \
      --reasoning-budget 4096 \
      --reasoning-budget-message ". Okay enough thinking.