Pi & Qwen3.5 with llama-cpp doing a lot of prompt re-processing

r/LocalLLaMA
Generative AI Open Source AI

I've noticed an issue when I'm using Pi as a coding agent with llama-cpp, and I'm wondering if there's an issue with Pi or how I have it configured, or if this is just expected behavior. I'm using Qwen3.5 122b with thinking enabled. When doing a bunch of agentic edits, it will do a lot of interleaving thinking and tool calls. This all works fine. But then when it comes to my next turn providing input, I get a whole bunch of the context cache invalidated, because it looks like Pi is no longer sending over the thinking blocks.