My gripe with Qwen3.5 35B and my first fine-tune fix

r/LocalLLaMA

This is not a Qwen3.5 35B-A3B hater post; I love the model. When I saw the Qwen3.5 release, I was pretty excited because its size seemed perfect for local inference, and the series looked like the first genuinely useful models for that purpose. I was getting 80+ tokens per second on my laptop, but I became very frustrated with the following issues:

- Just saying hello can take up 500-700 reasoning tokens.
- At least some quantized versions get stuck in thinking loops and yield no output for moderate to complex questions.