Serving Qwen3.6-35B-A3B With vLLM and Building a Coding Agent With Tool Calling

Dev.to AI

Alibaba's Qwen team released Qwen3.6-35B-A3B on April 16, 2026 under Apache 2.0. It is a sparse mixture-of-experts (MoE) model with 35B total parameters, of which only about 3B are active per token. It scores 73.4% on SWE-bench Verified and 37.0 on MCPMark, which puts it among the strongest open-weight models for agentic coding at the time of writing. This post walks through serving it locally with vLLM, calling it from Python with the OpenAI SDK, and wiring up tool calling so the model can act as a coding agent.