My OpenCode local LLM agent setup — what would you change?

r/LocalLLaMA

I’ve been fine-tuning my OpenCode workflow to balance API costs against local hardware performance. I'm currently running llama.cpp locally, with a focus on high-precision quants.

The Agent Stack

| Agent | Model | Quant | Speed (t/s) |
|---|---|---|---|
| plan | Kimi K2.5 (OpenCode Go) | API | ~45 |
| build / debug | Qwen3 Coder Next | Q8_K_XL | 47 |
| review | Qwen3.5-122B-A10B | Q8_K_XL | 18 |
| security | MiniMax M2.5 | Q4_K_XL | 20 |
| docs / test | GLM-4.7-Flash | Q8_K_XL | 80 |

The Logic

Kimi K2.5: Hits 76.8% on SWE-bench. I’ve prompted it to aggressively delegate tasks to the local agents to keep my remote token usage near zero.

Qwen3 Coder Next: Currently my.