DFlash Doubles the T/S Gen Speed of Qwen3.5 27B (BF16) on Mac M5 Max

r/LocalLLaMA

The new DFlash in oMLX 0.3.5 RC1 looks like it more than doubles (!) the generation speed of Qwen3.5 27B (BF16). This is only an initial test, but generation went from 9 T/S to 22 T/S!

Models used (HuggingFace):
Main Model: Jackrong/MLX-Qwopus3.5-27B-v3-bf16
Draft Model: z-lab/Qwen3.5-27B-DFlash

System: M5 Max 128GB

DFlash on GitHub:
oMLX (v0.3.5 RC1):

I'm not affiliated with any of the developers. The Qwen3.5 27B model is so good for its size, and speed is the only thing holding it back, so I thought this might help people deploy it locally at higher quants or full weights.

I've yet to test it with OpenCode or any other harness.
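For anyone who wants to sanity-check the claim, here is a quick sketch of the arithmetic using only the two figures reported above (9 T/S and 22 T/S). The function names are my own and hypothetical; nothing here calls oMLX or DFlash, and the 1000-token reply length is just an illustrative assumption.

```python
def speedup(baseline_tps: float, new_tps: float) -> float:
    """Ratio of new to baseline generation speed (tokens per second)."""
    if baseline_tps <= 0:
        raise ValueError("baseline_tps must be positive")
    return new_tps / baseline_tps


def seconds_for(tokens: int, tps: float) -> float:
    """Wall-clock seconds to generate `tokens` at a steady `tps`."""
    if tps <= 0:
        raise ValueError("tps must be positive")
    return tokens / tps


# Figures reported in the post (single initial test, not a benchmark suite).
BASELINE_TPS = 9.0
DFLASH_TPS = 22.0

# Assumed reply length, purely for illustration.
REPLY_TOKENS = 1000

if __name__ == "__main__":
    print(f"speedup: {speedup(BASELINE_TPS, DFLASH_TPS):.2f}x")
    print(f"baseline: {seconds_for(REPLY_TOKENS, BASELINE_TPS):.1f} s per {REPLY_TOKENS} tokens")
    print(f"DFlash:   {seconds_for(REPLY_TOKENS, DFLASH_TPS):.1f} s per {REPLY_TOKENS} tokens")
```

With those two numbers the ratio works out to about 2.44x, so a 1000-token reply would drop from roughly 111 seconds to roughly 45 seconds, assuming the speed holds steady over the whole generation.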