Running 400B model on iPhone (1 minute read)

TLDR AI
Generative AI

A short clip of an iPhone running Qwen3.5-397B-A17B (a 397B-parameter Mixture-of-Experts model) at 0.6 tokens per second.
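
To put those figures in perspective, here is a rough back-of-the-envelope sketch. It assumes the "A17B" suffix means about 17B parameters are active per token, and the 200-token reply length is a hypothetical value chosen only for illustration.

```python
# Figures taken from the summary above.
TOTAL_PARAMS_B = 397       # total parameters, in billions
TOKENS_PER_SECOND = 0.6    # reported generation speed

# Assumption: "A17B" in the model name denotes ~17B active parameters per token.
ACTIVE_PARAMS_B = 17


def seconds_for(tokens: int, tokens_per_second: float = TOKENS_PER_SECOND) -> float:
    """Return the time in seconds needed to generate `tokens` tokens."""
    return tokens / tokens_per_second


# Share of the model's weights used for any single token.
active_fraction = ACTIVE_PARAMS_B / TOTAL_PARAMS_B

# Hypothetical reply length, for illustration only.
reply_tokens = 200

print(f"Active share per token: {active_fraction:.1%}")
print(f"Seconds per token: {seconds_for(1):.2f}")
print(f"Minutes for a {reply_tokens}-token reply: {seconds_for(reply_tokens) / 60:.1f}")
```

Under these assumptions, only a small share of the weights is touched per token, yet a reply of a few hundred tokens would still take several minutes at the reported speed.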