M5 Max Actual Pre-fill performance gains

r/LocalLLaMA
AI Hardware

I think I figured out why Apple claims over 4x the peak GPU compute for AI: they feed the chip a lot of power for a few seconds. So it looks like roughly half of the gain comes from the AI accelerators and the other half from dumping watts in (or the accelerators themselves are what draw the extra watts). Press release: "With a Neural Accelerator in each GPU core and higher unified memory bandwidth, M5 Pro and M5 Max are over 4x the peak GPU compute for AI compared to the previous generation." That's good for short, bursty prompts, but I'd expect the speed gains to diminish on longer ones.
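To put rough numbers on the burst idea, here's a back-of-the-envelope sketch. Everything in it is an assumption for illustration, not a measurement: I'm reading "half and half" as 2x from the accelerators times 2x from the power burst, the 5-second burst window is made up, and `effective_speedup` is just a hypothetical helper name.

```python
def effective_speedup(baseline_s, peak=4.0, sustained=2.0, burst_s=5.0):
    """Estimate pre-fill speedup over the previous generation.

    baseline_s: how long the pre-fill took on the previous chip (seconds).
    peak:       assumed speedup while the power burst lasts (4x claim).
    sustained:  assumed speedup once the burst ends (accelerators only).
    burst_s:    assumed length of the power burst in seconds (made up).
    """
    # Work is measured in "previous-gen seconds".
    # During the burst the new chip clears peak * burst_s of that work.
    burst_work = peak * burst_s
    if baseline_s <= burst_work:
        # The whole prompt finishes inside the burst window.
        new_s = baseline_s / peak
    else:
        # The remainder runs at the lower sustained rate.
        new_s = burst_s + (baseline_s - burst_work) / sustained
    return baseline_s / new_s


if __name__ == "__main__":
    for baseline_s in (8, 20, 60, 600):
        print(f"{baseline_s:>4}s on old chip -> {effective_speedup(baseline_s):.2f}x")
```

Under these made-up numbers, anything that used to take 20 seconds or less gets the full 4x, a 60-second pre-fill lands around 2.4x, and very long prompts drift toward the 2x sustained figure. Swap in real burst durations and sustained rates if anyone measures them.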