Qwen 3.5 397B (180GB) scores 93% on MMLU

r/LocalLLaMA
Open Source AI

On MLX, there is simply no version of Qwen 3.5 397B smaller than the 4-bit, and even the 4-bit is extremely poor at coding and other specifics (I'll have benchmarks for regular MLX by tomorrow). The 4-bit MLX would be closer to 200GB, but I was able to make a 180GB quantized version that scored 93% on a 200-question MMLU run with reasoning on, while retaining the full 38 tokens/s of the M3 Ultra (GGUF on Mac takes a 1/3 speed hit for Qwen 3.5). Does anyone have benchmarks for the Q2 or MLX's 4-bit? It would take me a few hours of leaving it running to get them myself.
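
For anyone who wants to sanity-check the numbers, here is a rough back-of-the-envelope sketch in Python. It makes several assumptions that are not from the post: the parameter count is taken as 397 billion from the model name, 1 GB is treated as 1e9 bytes, quantization overhead (scales, biases, any layers kept at higher precision) is ignored, and the error bar on the MMLU score uses a simple normal approximation. The function names are just illustrative.

```python
import math


def quant_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight size in GB (1 GB = 1e9 bytes), ignoring overhead."""
    return params_billion * bits_per_weight / 8


def effective_bits(params_billion: float, size_gb: float) -> float:
    """Average bits per weight implied by a given file size (assumes no overhead)."""
    return size_gb * 8 / params_billion


def margin_of_error(accuracy: float, n_questions: int, z: float = 1.96) -> float:
    """Approximate 95% margin of error for an accuracy measured on n questions
    (normal approximation to the binomial; an assumption, not the poster's method)."""
    return z * math.sqrt(accuracy * (1 - accuracy) / n_questions)


# Plain 4-bit weights for a 397B-parameter model, no overhead counted.
print(round(quant_size_gb(397, 4), 1))             # 198.5

# Average bits per weight implied by a 180 GB file.
print(round(effective_bits(397, 180), 2))          # 3.63

# Sampling uncertainty on 93% measured over 200 questions, in percentage points.
print(round(margin_of_error(0.93, 200) * 100, 1))  # 3.5
```

Under those assumptions, 180GB works out to roughly 3.6 bits per weight on average, and a 93% score on 200 questions carries about plus or minus 3.5 points of sampling noise, so comparisons against Q2 or MLX 4-bit runs are easier to interpret if they use the same 200 questions.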