Qwen Models with Claude Code on 36 GB VRAM - insights

I have tried two local models with Claude Code: Qwen3-Coder-Next 80B-A3B (unsloth gguf: Qwen3-Coder-Next-UD-IQ3_XXS) and Qwen3.5 35B-A3B (unsloth gguf: Qwen3.5-35B-A3B-UD-Q4_K_XL). Both run with ~132k tokens of context in the 36 GB of combined VRAM of my RTX 3090 and RTX 5070. With this much VRAM I could probably have used a 5- or 6-bit quant of the 35B model.

Insights: Qwen3-Coder-Next is superior in all aspects. The biggest issue with Qwen3.5 35B was that it stops in the middle of jobs in Claude Code. I had to spam /execute-plan from Superpowers to get it to keep working.
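For anyone wondering whether a 5- or 6-bit quant of the 35B model would fit, here is a rough back-of-the-envelope sketch. It only estimates weight size as parameters times bits per weight; the bits-per-weight figures are round illustrative assumptions, not the actual averages of the unsloth UD quants, and the KV cache for ~132k context plus runtime overhead still has to fit in whatever is left.

```python
def weight_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GB (1 GB = 10^9 bytes)."""
    return params_billion * bits_per_weight / 8


# RTX 3090 (24 GB) + RTX 5070 (12 GB)
TOTAL_VRAM_GB = 24 + 12

# Assumed, rounded bits-per-weight values for illustration only.
# Real GGUF quants mix precisions per tensor, so actual files differ.
candidates = {
    "80B at ~3 bits": (80, 3.0),
    "35B at ~4 bits": (35, 4.0),
    "35B at ~5 bits": (35, 5.0),
    "35B at ~6 bits": (35, 6.0),
}

for label, (params_billion, bpw) in candidates.items():
    size = weight_size_gb(params_billion, bpw)
    headroom = TOTAL_VRAM_GB - size
    print(f"{label}: ~{size:.1f} GB weights, ~{headroom:.1f} GB left for KV cache and overhead")
```

Treat the headroom numbers as an upper bound: how much of it the context actually needs depends on the model architecture and any KV cache quantization, which this sketch does not model.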