DFlash Doubles the T/S Gen Speed of Qwen3.5 27B (BF16) on Mac M5 Max

The new DFlash in oMLX 0.3.5 RC1 looks like it doubles (!) the speed of Qwen3.5 27B (BF16). Initial test. Generation T/S went from 9 to 22 T/S! Models used (HuggingFace) Main Model: Jackrong/MLX-Qwopus3.5-27B-v3-bf16 Draft Model: z-lab/Qwen3.5-27B-DFlash System: M5 Max 128GB DFlash on Github: oMLX (v0.3.5 RC1): I'm not affiliated with any of the developers. Since the Qwen3.5 27B model is so good for the size, with speed being the only thing holding it back, I thought that this may help deploy this model locally at higher quants/full weights. I've yet to test with OpenCode or other harness.