Running Qwen3.5 397B on M3 MacBook Pro with 48GB RAM at 5 t/s

r/LocalLLaMA

This guy, Dan Woods, used Karpathy's autoresearch and Apple's "LLM in a Flash" paper to evolve a harness that can run Qwen3.5 397B at 5.7 t/s on only 48GB of RAM. Links: X.com article, GitHub repository and paper. He says the math suggests 18 t/s is possible on his hardware, and that dense models with a predictable weight-access pattern could run even faster.

submitted by /u/jawondo
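For a rough sense of the kind of math involved when weights don't fit in RAM: if uncached weights have to be streamed from the SSD for every token, decode speed is capped by SSD bandwidth divided by the bytes read per token. The sketch below is a generic back-of-envelope model, not the author's calculation, and it does not reproduce his 18 t/s figure. It assumes every active parameter is touched once per token, cached weights are free, and SSD reads are the only bottleneck (no compute time, no overlap of I/O with compute). The function name and parameters are hypothetical.

```python
def estimated_tokens_per_second(active_params: float,
                                bytes_per_param: float,
                                cache_hit_rate: float,
                                ssd_bandwidth_gb_per_s: float) -> float:
    """Hypothetical upper bound on decode speed when uncached weights stream from SSD.

    Assumptions: each token reads every active parameter once, weights already
    in RAM (cache hits) cost nothing, and SSD bandwidth is the only limit.
    """
    if not 0.0 <= cache_hit_rate <= 1.0:
        raise ValueError("cache_hit_rate must be between 0 and 1")
    # Total weight bytes one token needs (e.g. 0.5 bytes/param at 4-bit).
    bytes_per_token = active_params * bytes_per_param
    # Only cache misses have to come off the SSD.
    bytes_from_ssd = bytes_per_token * (1.0 - cache_hit_rate)
    if bytes_from_ssd == 0:
        return float("inf")  # everything cached: SSD is no longer the limit
    return ssd_bandwidth_gb_per_s * 1e9 / bytes_from_ssd


if __name__ == "__main__":
    # Purely illustrative numbers, not measurements of any real machine or model.
    print(estimated_tokens_per_second(1e9, 1.0, 0.5, 1.0))
```

Doubling the cache hit rate's complement (say, going from 50% to 75% hits) halves the bytes read per token and doubles the ceiling, which is why caching and predictable access patterns matter so much in this regime.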