Graceful reasoning budget termination for qwen3.5 models in llama.cpp

r/LocalLLaMA

I fixed the issue where the reasoning budget was just a hard cutoff, so the model dropped the mic mid-sentence. This is not the most graceful way to do it, and there may be some performance degradation as well. But if it isn't stopped, the model just reasons for minutes.

I found that if a sentence like "Final Answer:\nBased on my analysis above, " is injected after some budget, the model keeps writing as if it were its own idea and then finishes gracefully with a summary. I implemented this with a prompt injection flag: for example, inject after 300 tokens, then allow a remaining budget for the summary.
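To make the control flow concrete, here is a minimal Python sketch of the idea, assuming a generic `next_token(context)` callback rather than llama.cpp's actual sampling loop. The names `run_with_reasoning_budget`, `budget`, `summary_budget` and `toy_model`, as well as the `</think>` end tag, are illustrative assumptions, not the flag or code from the fix. For simplicity the injected phrase is appended as one context element instead of being tokenized.

```python
from typing import Callable, List

# Hypothetical names throughout; this is not llama.cpp's API.
END_THINK = "</think>"  # assumed end-of-reasoning tag
INJECT = "Final Answer:\nBased on my analysis above, "


def run_with_reasoning_budget(
    next_token: Callable[[List[str]], str],
    budget: int,
    summary_budget: int,
    inject: str = INJECT,
    end_token: str = END_THINK,
) -> List[str]:
    """Soft reasoning budget: inject a steering phrase after `budget` tokens,
    allow `summary_budget` more tokens, then hard-stop."""
    out: List[str] = []
    # Phase 1: free reasoning until the model stops or the budget is spent.
    while len(out) < budget:
        tok = next_token(out)
        out.append(tok)
        if tok == end_token:
            return out  # model finished on its own, nothing to inject
    # Phase 2: append the phrase to the context as if the model wrote it.
    out.append(inject)
    # Phase 3: let the model continue from the phrase for a limited rest budget.
    for _ in range(summary_budget):
        tok = next_token(out)
        out.append(tok)
        if tok == end_token:
            return out
    # Phase 4: the summary overran its budget, so fall back to a hard cutoff.
    out.append(end_token)
    return out


def toy_model(ctx: List[str]) -> str:
    """Stand-in model: rambles until it sees the injected phrase, then writes
    three summary tokens and closes the reasoning block."""
    if INJECT not in ctx:
        return "hmm "
    written = len(ctx) - ctx.index(INJECT) - 1
    return "word " if written < 3 else END_THINK


if __name__ == "__main__":
    print(run_with_reasoning_budget(toy_model, budget=5, summary_budget=10))
```

With `budget=5` the toy model is cut off after five "hmm " tokens, the phrase is injected, and it then closes on its own after three summary tokens. If `summary_budget` is smaller than what the model needs, phase 4 still hard-stops it, so the rest budget bounds the worst case.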