Qwen 3 32B outscored every Qwen 3.5 model across 11 blind evals; a 3B-active-parameter model won 4

Several people in my SLM results thread asked for Qwen 3.5 numbers, so here they are. I ran 8 Qwen models head-to-head across 11 hard evaluations: survivorship bias, Arrow's impossibility theorem, the Kelly criterion, Simpson's paradox (construct exact numbers), Bayesian probability, an LRU cache with TTL, Node.js 502 debugging, SQL optimization, Go concurrency bugs, distributed lock race conditions, and a baseline string reversal. Same methodology as the SLM batch. Every model sees the same prompt.