These are the benchmark results for Gemma4 E4B tested on my iPhone 16 Pro.
The first photo shows the results on the CPU, and the second shows the GPU. Look at the gap between the Prefill and Decode speeds in my benchmark results: it's almost 10- to 20-fold. The usual explanation is that Prefill is mainly limited by CPU or GPU compute, while memory speed is what really matters during the Decode stage. It seems memory really is the bottleneck in AI inference. It's pretty insane. Of
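For anyone who wants to sanity-check the Prefill vs. Decode reasoning, here is a rough back-of-the-envelope sketch. It assumes the common rule of thumb that decoding one token streams roughly all of the model weights from memory once, and that a forward pass costs about 2 FLOPs per parameter per token. The function names and every number in the example are hypothetical placeholders picked for round arithmetic, not measurements or specs of the iPhone 16 Pro or of Gemma4 E4B.

```python
def decode_upper_bound(weight_bytes: float, bandwidth_bytes_per_s: float) -> float:
    """Rough ceiling on decode speed (tokens/s) if memory bandwidth is the limit.

    Assumption: each generated token requires reading all weights once.
    """
    if weight_bytes <= 0 or bandwidth_bytes_per_s <= 0:
        raise ValueError("inputs must be positive")
    return bandwidth_bytes_per_s / weight_bytes


def prefill_upper_bound(param_count: float, peak_flops_per_s: float) -> float:
    """Rough ceiling on prefill speed (tokens/s) if compute is the limit.

    Assumption: about 2 FLOPs per parameter per token. Prompt tokens are
    processed in a batch, so the weights are reused across many tokens.
    """
    if param_count <= 0 or peak_flops_per_s <= 0:
        raise ValueError("inputs must be positive")
    return peak_flops_per_s / (2.0 * param_count)


if __name__ == "__main__":
    # Hypothetical placeholder values, NOT real device or model figures.
    weight_bytes = 4e9        # assumed size of the weights in memory
    bandwidth = 50e9          # assumed memory bandwidth, bytes/s
    param_count = 4e9         # assumed parameter count
    peak_flops = 1e12         # assumed sustained compute, FLOP/s

    decode = decode_upper_bound(weight_bytes, bandwidth)
    prefill = prefill_upper_bound(param_count, peak_flops)
    print(f"decode ceiling:  {decode:.1f} tok/s")
    print(f"prefill ceiling: {prefill:.1f} tok/s")
    print(f"ratio:           {prefill / decode:.1f}x")
```

The point is only the shape of the estimate: the decode ceiling scales with memory bandwidth, while the prefill ceiling scales with compute, so a large gap between the two is what you would expect when bandwidth is the scarcer resource. Plug in your own device and model numbers to see how close your benchmark gets to each ceiling.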