(llama.cpp) Possible to disable reasoning for some requests (while leaving reasoning on by default)?

r/LocalLLaMA
Generative AI Open Source AI

I am running unsloth/gemma-4-26B-A4B-it-GGUF/gemma-4-26B-A4B-it-UD-Q4_K_XL.gguf with llama-server (with reasoning enabled). Is it possible to disable reasoning for some requests only? If yes, how? I want to leave reasoning on by default, but in some other use cases I want it to respond as fast as possible (e.g. chatting bot) submitted by /u/regunakyle [link] [comments]