llama.cpp constantly reprocessing huge prompts with opencode/pi.dev
r/LocalLLaMA
I’m using llama-swap with llama.cpp, mainly with opencode + pi.dev. I’m seeing frequent, massive prompt reprocessing (prefills) even though the prompts are very similar between requests.

Example behavior:

- context grows to 50k+ tokens
- LCP similarity often shows 0.99+
- but sometimes n_past suddenly falls back to ~4-5k
- llama.cpp then reprocesses 40k+ tokens again
- TTFT jumps to multiple minutes

Example logs:

sim_best = 0.996 red context checkpoint.
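To make the numbers concrete, here is a rough sketch of how a longest-common-prefix (LCP) check can report ~0.996 similarity while the server still reprocesses most of the prompt. This is not llama.cpp's code. The function names are hypothetical, and defining similarity as "common prefix / new prompt length" is an assumption. The checkpoint snapping models one *possible* explanation: if the server can only resume from saved context checkpoints, n_past drops to the last checkpoint at or below the common prefix, no matter how long that prefix is.

```python
from bisect import bisect_right


def common_prefix_len(cached, prompt):
    """Number of leading tokens shared by the cached sequence and the new prompt."""
    n = 0
    for a, b in zip(cached, prompt):
        if a != b:
            break
        n += 1
    return n


def lcp_similarity(cached, prompt):
    """Hypothetical metric: common-prefix length as a fraction of the new prompt length."""
    if not prompt:
        return 0.0
    return common_prefix_len(cached, prompt) / len(prompt)


def resume_position(cached, prompt, checkpoints=None):
    """Token position to resume from (a stand-in for n_past).

    Without checkpoints, resume right after the shared prefix. With a sorted list of
    checkpoint positions (an assumption for illustration), snap down to the last
    checkpoint that does not exceed the shared prefix.
    """
    n = common_prefix_len(cached, prompt)
    if checkpoints is None:
        return n
    i = bisect_right(checkpoints, n)
    return checkpoints[i - 1] if i > 0 else 0


# A 50k-token conversation where only the last 200 tokens changed.
cached = list(range(50_000))
prompt = cached[:49_800] + [-1] * 200

print(lcp_similarity(cached, prompt))                               # 0.996
print(len(prompt) - resume_position(cached, prompt))                # 200 tokens to prefill
print(len(prompt) - resume_position(cached, prompt, [4_096]))       # 45904 tokens to prefill
```

Same similarity score in both cases. The only difference is where the server is allowed to resume, which is why n_past, and not the similarity number, is the thing to watch in the logs.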