Opinion: Qwen 3.6 27b Beats Sonnet 4.6 on Feature Planning

I keep hearing the argument that large models are better for high-level planning and task orchestration, since they have more general knowledge to draw on when making decisions. However, I've been testing Qwen 3.6 27b (Unsloth Q5_K_M) quite a lot since its release, and it's consistently outperforming larger models on attention to detail and foresight. Side-by-side comparison attached: Qwen (running in Pi, a lightweight harness that tends to benefit small models) and Sonnet 4.6 (in Claude Code), both given the same "plan review" task with identical prompts and `Claude.md` files.