M5 Max vs M3 Max Inference Benchmarks (Qwen3.5, oMLX, 128GB, 40 GPU cores)

Ran identical benchmarks on both 16” MacBook Pros (40 GPU cores, 128GB unified memory each) across three Qwen3.5 models (122B-A10B MoE, 35B-A3B MoE, 27B dense) using oMLX v0.2.23.

Quick numbers at pp1024/tg128 (M5 Max vs M3 Max):

35B-A3B: 134.5 vs 80.3 tg tok/s (1.7x)
122B-A10B: 65.3 vs 46.1 tg tok/s (1.4x)
27B dense: 32.8 vs 23.0 tg tok/s (1.4x)

The gap widens at longer contexts. At 65K, the 27B dense drops to 6.8 tg tok/s on M3 Max vs 19.6 on M5 Max (2.9x). Prefill advantages are even larger, up to 4x at long context, driven by the M5 Max’s GPU Neural Accelerators.
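For anyone who wants to sanity-check the multipliers, here is a minimal Python sketch that recomputes them from the tg tok/s figures quoted above. The dictionary labels and the helper names (`speedup`, `ms_per_token`) are my own for illustration and are not part of oMLX or its benchmark output. The per-token latency is just the reciprocal of throughput, added to show what the 65K numbers feel like in practice.

```python
# Decode (tg) throughput in tok/s as quoted in the post: (M5 Max, M3 Max).
# Labels are illustrative, not oMLX output fields.
TG_TOK_S = {
    "35B-A3B, pp1024/tg128": (134.5, 80.3),
    "122B-A10B, pp1024/tg128": (65.3, 46.1),
    "27B dense, pp1024/tg128": (32.8, 23.0),
    "27B dense, 65K context": (19.6, 6.8),
}


def speedup(m5_tok_s: float, m3_tok_s: float) -> float:
    """Ratio of M5 Max to M3 Max throughput."""
    return m5_tok_s / m3_tok_s


def ms_per_token(tok_s: float) -> float:
    """Average time per generated token in milliseconds."""
    return 1000.0 / tok_s


if __name__ == "__main__":
    for label, (m5, m3) in TG_TOK_S.items():
        print(
            f"{label}: {speedup(m5, m3):.1f}x "
            f"({ms_per_token(m5):.0f} ms/tok on M5 Max vs "
            f"{ms_per_token(m3):.0f} ms/tok on M3 Max)"
        )
```

The ratios are rounded to one decimal place, matching how they are quoted in the post.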