PSA: Using Claude Code without Anthropic? How to fix the 60-second local KV cache invalidation issue.
r/LocalLLaMA
TL;DR: Claude Code injects dynamic telemetry headers and git status updates into the system prompt on every single request. If you are running a local inference backend such as llama.cpp's llama-server or LM Studio, that changing text breaks prefix matching: the backend can only reuse the KV cache up to the first token that differs from the previous request, so everything after it is discarded and your hardware has to re-process a 20K+ token system prompt from scratch for every minor tool call. You can fix this in ~/.claude/settings.json.
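To make the prefix-matching point concrete, here is a rough toy sketch in Python. It is not how llama.cpp or LM Studio actually implement their cache. It assumes whitespace splitting as a stand-in for a real tokenizer, and the header text, `request-id` field, and function names are all made up for illustration. It only shows why a changing value near the start of the prompt leaves almost nothing to reuse, while the same change at the end leaves nearly everything reusable.

```python
def common_prefix_len(a, b):
    """Count how many leading tokens two token lists share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def tokenize(text):
    # Assumption: whitespace split stands in for a real tokenizer.
    return text.split()


# Hypothetical 20,000-token static system prompt.
static_body = " ".join(f"tok{i}" for i in range(20000))

# Hypothetical dynamic header that changes on every request.
header_1 = "git-status: clean request-id: 1"
header_2 = "git-status: clean request-id: 2"

# Case A: dynamic text comes first, static prompt follows.
head_1 = tokenize(header_1 + " " + static_body)
head_2 = tokenize(header_2 + " " + static_body)

# Case B: static prompt comes first, dynamic text at the end.
tail_1 = tokenize(static_body + " " + header_1)
tail_2 = tokenize(static_body + " " + header_2)

reused_head = common_prefix_len(head_1, head_2)
reused_tail = common_prefix_len(tail_1, tail_2)

print(f"dynamic text first: {reused_head} of {len(head_2)} tokens reusable")
print(f"dynamic text last:  {reused_tail} of {len(tail_2)} tokens reusable")
```

In case A only the 3 tokens before the changed value match, so the other 20,001 have to be re-processed. In case B 20,003 of the 20,004 tokens match. That is the whole problem in miniature: anything that changes early in the prompt throws away the cached work for everything after it.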