Two local models beat one bigger local model for long-running agents

I've been running OpenClaw locally on a Mac Studio M4 (36GB) with Qwen 3.5 27B (4-bit, oMLX) as a household agent. The thing that finally made it reliable wasn't what I expected. The usual advice is "if your agent is flaky, use a bigger model." I ended up going the other direction: adding a second, smaller model, and it worked way better.

The problem

When Qwen 3.5 27B runs long in OpenClaw, it doesn't get dumb.