Running 400B model on iPhone (1 minute read)
TLDR AI • Generative AI
A short clip of an iPhone Pro running Qwen3.5-397B-A17B (a 397B-parameter Mixture-of-Experts model) at 0.6 tokens per second.