397B running in 14GB of RAM via PAGED MoE on a 64GB Mac Studio — here's the engine
r/LocalLLaMA
•
Generative AI
AI Hardware
Hellooo r/LocalLLaMA Qwen3.5-397B-A17B is 209GB on disk. The MoE has 512 experts, top-10 routing per token. The naive load won't open on a M1 64GB Mac. What I (claude) did: keep only K=20 experts resident, lazy-page the rest from SSD when the router selects them, evict on cache pressure. Float16 compute path (faster than ternary on MPS), Apple Silicon native, MLX-based.