M5 Max vs M3 Max Inference Benchmarks (Qwen3.5, oMLX, 128GB, 40 GPU cores)
r/LocalLLaMA
Ran identical benchmarks on two 16” MacBook Pros (M5 Max and M3 Max), each with 40 GPU cores and 128GB unified memory, across three Qwen3.5 models (122B-A10B MoE, 35B-A3B MoE, 27B dense) using oMLX v0.2.23.

Quick numbers at pp1024/tg128 (M5 Max vs M3 Max):

35B-A3B: 134.5 vs 80.3 tg tok/s (1.7x)
122B-A10B: 65.3 vs 46.1 tg tok/s (1.4x)
27B dense: 32.8 vs 23.0 tg tok/s (1.4x)

The gap widens at longer contexts. At 65K context, the 27B dense model drops to 6.8 tg tok/s on the M3 Max but holds 19.6 tg tok/s on the M5 Max (2.9x). Prefill advantages are even larger, up to 4x at long context, driven by the M5 Max’s GPU Neural Accelerators.
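
If you want to sanity-check the ratios, here is a minimal Python sketch that recomputes them from the tg tok/s figures quoted in this post. The names `RESULTS`, `speedup`, and `SPEEDUPS` are made up for illustration and are not part of oMLX; only the decode numbers given above are included.

```python
# Decode throughput (tg tok/s) as quoted in the post: (M5 Max, M3 Max).
# Keys are (model, setting); the dict layout is an illustrative assumption.
RESULTS = {
    ("35B-A3B", "pp1024/tg128"): (134.5, 80.3),
    ("122B-A10B", "pp1024/tg128"): (65.3, 46.1),
    ("27B dense", "pp1024/tg128"): (32.8, 23.0),
    ("27B dense", "65K context"): (19.6, 6.8),
}


def speedup(m5_tok_s: float, m3_tok_s: float) -> float:
    """Return the M5 Max / M3 Max throughput ratio, rounded to one decimal."""
    return round(m5_tok_s / m3_tok_s, 1)


# Ratio per (model, setting) pair.
SPEEDUPS = {key: speedup(*pair) for key, pair in RESULTS.items()}

for (model, setting), ratio in SPEEDUPS.items():
    print(f"{model:10s} {setting:13s} {ratio:.1f}x")
```

The rounded ratios match the multipliers listed above, so the table is internally consistent.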