Lessons from building a coding agent for 8k context windows: token budgeting, parallel executors, and per-file isolation

r/LocalLLaMA

Most AI coding tools (Cursor, Aider, Claude Code) assume you have a model with a 200k-token context window. If you're running local LLMs through Ollama or LM Studio, or hitting free-tier cloud APIs like Groq or OpenRouter, you've got around 8k tokens to work with. That doesn't fit a whole project, and it barely fits a single large file. I spent the last few weeks building a CLI coding agent that's designed around the 8k constraint instead of fighting it. I wanted to share what I learned, because some of it surprised me. The core insight: the LLM never needs to see your whole project.
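To make the 8k constraint concrete, here's a rough sketch of the kind of budget check involved. This is not the agent's actual code: the 4-characters-per-token heuristic, the function names, and the amounts reserved for the prompt and the model's reply are all assumptions for illustration. A real tokenizer will give different counts.

```python
# Rough heuristic, NOT a real tokenizer: assume ~4 characters per token.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Approximate token count as ceil(len(text) / CHARS_PER_TOKEN)."""
    return (len(text) + CHARS_PER_TOKEN - 1) // CHARS_PER_TOKEN


def fits_in_budget(
    file_text: str,
    context_window: int = 8192,       # the "around 8k" window
    reserved_for_prompt: int = 1000,  # hypothetical: system prompt + instructions
    reserved_for_output: int = 2000,  # hypothetical: room for the model's reply
) -> bool:
    """Return True if the file plausibly fits in what's left of the window."""
    available = context_window - reserved_for_prompt - reserved_for_output
    return estimate_tokens(file_text) <= available


if __name__ == "__main__":
    small_file = "x" * 4_000    # estimated at 1,000 tokens
    large_file = "x" * 40_000   # estimated at 10,000 tokens
    print(fits_in_budget(small_file))
    print(fits_in_budget(large_file))
```

With these made-up reservations, only about 5k tokens are left for actual file content, which is why a single large file can already blow the budget.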