DTree on MLX: tiny win over DFlash on Qwen3.5-4B (M2)

r/LocalLLaMA

I ported DTree to MLX and finally got one setting that seems to beat a matched DFlash run locally.

Setup: M2 Max 32GB, Qwen3.5-4B, q4_g64, spec=16, tree_budget=24

- DFlash: 45.07 e2e tok/s
- DTree: 48.31 e2e tok/s

So basically ~1.07x over DFlash. Not massive, but it looks real and repeatable enough to mention.

A lot of the other things I tried were flat or just worse, so my current read is that MLX verifier cost is still the main limiter here.

Has anyone gotten bigger DTree gains on MLX?

submitted by /u/naftalinus
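For anyone checking the ~1.07x figure, here is a minimal sketch of the arithmetic using only the two end-to-end throughput numbers quoted in the post. The function and variable names are illustrative and are not taken from the DTree or DFlash code.

```python
def speedup(baseline_tok_s: float, candidate_tok_s: float) -> float:
    """Return the candidate's throughput as a multiple of the baseline's."""
    if baseline_tok_s <= 0:
        raise ValueError("baseline throughput must be positive")
    return candidate_tok_s / baseline_tok_s


# End-to-end tok/s figures quoted in the post (M2 Max 32GB, Qwen3.5-4B, q4_g64).
dflash_tok_s = 45.07
dtree_tok_s = 48.31

ratio = speedup(dflash_tok_s, dtree_tok_s)
gain_pct = (ratio - 1.0) * 100.0

print(f"{ratio:.2f}x over DFlash ({gain_pct:.1f}% more tok/s)")
# prints: 1.07x over DFlash (7.2% more tok/s)
```

A gain of this size is small enough that run-to-run variance matters, so comparing several repeated runs per setting would be a reasonable sanity check before reading much into it.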