What's a good context length for a general/personal assistant agent?
r/LocalLLaMA
I've been trying to find a good balance between speed and memory. 64K seems like the sweet spot to me - with qwen3.5:35b-a3b-q4 it all fits in my 7900 XTX - but I'm wondering if I'm overshooting.

This agent is just a personal assistant: taking notes, reminding me of things, doing some light web search. System prompt is under 2K tokens and it only has 2 MCP servers / 3 tools. Nothing crazy.

For those running similar setups, what context length are you actually using? Are you going max and letting it fill up, or keeping it tighter for speed? Curious where people are landing on this.
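
For anyone who wants to sanity-check the memory side of this, here is a rough sketch of how KV cache size grows with context length. It assumes plain attention with grouped-query KV heads, where every layer stores one K and one V vector per KV head per token. The layer count, KV head count, and head dimension below are placeholders I made up for illustration, not the real values for this model, so swap in the numbers from the model's own config. Models that mix in other attention variants will not follow this formula exactly.

```python
def kv_cache_bytes(n_ctx, n_layers, n_kv_heads, head_dim, bytes_per_elem=2):
    """Approximate KV cache size in bytes.

    Each token stores one K and one V vector (hence the factor of 2)
    for every KV head in every layer. bytes_per_elem=2 corresponds to
    a 16-bit cache.
    """
    return 2 * n_layers * n_kv_heads * head_dim * n_ctx * bytes_per_elem


def gib(n_bytes):
    """Convert bytes to GiB."""
    return n_bytes / 2**30


# HYPOTHETICAL architecture values, for illustration only.
# Replace with the values from the actual model config.
HYPOTHETICAL = dict(n_layers=40, n_kv_heads=4, head_dim=128)

if __name__ == "__main__":
    for n_ctx in (8_192, 16_384, 32_768, 65_536):
        size = kv_cache_bytes(n_ctx, **HYPOTHETICAL)
        print(f"{n_ctx:>6} tokens -> {gib(size):.2f} GiB of KV cache")
```

The estimate is linear in context length, so under these assumptions dropping from 64K to 32K halves the cache. The model weights themselves take the same space either way.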