How does Pi coding agent control Qwen's thinking verbosity? (Qwen 35B A3B, llama-server)
r/LocalLLaMA
•
Generative AI
Open Source AI
I'm running Qwen 35B A3B via llama-server with reasoning budget set to -1 (unlimited) for testing. In every client I've tried, the model just thinks endlessly before responding. But with Pi, it does the bare minimum thinking and still responds fairly accurately - which is a stark difference. My first instinct was that it's the system prompt, so I copied Pi's default system prompt into other clients. No change - still runaway thinking.