Took the 48GB flash-moe benchmark and ran it on a 128GB M5 Max. Here's what happens.
Saw Dan Woods' post about running Qwen3.5-397B locally on a MacBook Pro with 48GB RAM at 4.36 tok/s. I have an M5 Max with 128GB, so I had to try it.

I used the Anemll fork, which adds Metal 4 NAX for M5+ and the `--cache-io-split` flag. I ran the full cache-io-split sweep to find the actual optimal value.

---

## Speed vs baseline

| Config | tok/s |
|---|---|
| M3 Max 48GB, original (Dan Woods) | 4.36 |
| M5 Max 128GB, 4-bit, no split | 12.48 |
| M5 Max 128GB, 4-bit, cache-io-split 4 | **12.99** |

About 3x faster than the original, on a laptop, with no cloud and no Python: just C and Metal shaders.
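
If you want to check the "3x" math yourself, here's a quick sketch that recomputes the ratios from the tok/s numbers in the table. It's just arithmetic on the reported figures, not part of flash-moe or the Anemll fork; the `results` dict and the `speedup` helper are names I made up for illustration.

```python
# tok/s figures copied from the table in this post
results = {
    "M3 Max 48GB, original": 4.36,
    "M5 Max 128GB, 4-bit, no split": 12.48,
    "M5 Max 128GB, 4-bit, cache-io-split 4": 12.99,
}


def speedup(new_tps: float, old_tps: float) -> float:
    """Ratio of two throughput numbers (hypothetical helper, not from the repo)."""
    return new_tps / old_tps


baseline = results["M3 Max 48GB, original"]

# Speedup of every config relative to the original 48GB run
for name, tps in results.items():
    print(f"{name}: {tps:.2f} tok/s, {speedup(tps, baseline):.2f}x vs baseline")

# Relative gain from cache-io-split 4 over no split on the same machine
split_gain = speedup(
    results["M5 Max 128GB, 4-bit, cache-io-split 4"],
    results["M5 Max 128GB, 4-bit, no split"],
) - 1
print(f"cache-io-split 4 vs no split: {split_gain:+.1%}")
```

Worked out, the best config comes to roughly 2.98x the original, and cache-io-split 4 adds about 4% over no split on the same machine. So most of the gain comes from the hardware and the fork, with the split flag as a small bump on top.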