Qwen3.6-35B-A3B and 9B are officially on the public Terminal-Bench 2.0 leaderboard!
r/LocalLLaMA
little-coder × Qwen3.6-35B-A3B hit 24.6% (±3.2) and now lands above Gemini 2.5 Pro on Gemini CLI (19.6%) and Qwen3-Coder-480B on Terminus 2 (23.9%). I didn’t expect the scaffold-model gap from Polyglot to hold on a benchmark this hard, but it did! little-coder × Qwen3.5-9B came in at 9.2%, which is modest. Still, it shows again that sub-10B local models are now measurable on a hard agentic benchmark rather than assumed unworthy of a slot.