Gemma 4 thinking system prompt

r/LocalLLaMA
Open Source AI

I like to be able to enable and disable thinking using a system prompt, so that I can control what which prompts generate thinking tokens rather than relying on the model to choose for me. It's one of the reasons I loved Qwen-30b-A3b. I'm having trouble getting this same setup working for the gemma 4 models. Right now playing with the 26b. The model will sometimes respond to a system prompt asking it to skip reasoning, sometimes not. If I put ` ` in the user prompt before my own content, that seems to work well. However that isn't really practical for api calls and the like.