[cupel] M5 Max 128GB: Qwen3.5-397B IQ2 @ 29 tokens per second
r/LocalLLaMA
Open Source AI
A year ago I could only read about models in the 397B league. Today I can run one on my laptop. The combination of an importance matrix (imatrix) with Unsloth's per-model adaptive layer quantization is what makes it all possible. But I didn't start with 397B; I started with 17 smaller models. There was a lot of great feedback in the "M5 Max 128GB, 17 models, 23 prompts: Qwen 3.5 122B is still a local king" discussion.
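For anyone wondering why roughly 2-bit quantization is what makes this fit, here is a back-of-envelope sketch. It assumes the 397B in the model name is the total parameter count, and the bits-per-weight values are illustrative placeholders, not the measured average of the actual quant file (a mixed per-layer quant will land somewhere else). It also ignores KV cache and runtime overhead.

```python
def quantized_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (1e9 bytes).

    Ignores KV cache, activations, and runtime overhead.
    """
    return n_params * bits_per_weight / 8 / 1e9


# Assumption: total parameter count taken from the model name.
N_PARAMS = 397e9

# Illustrative average bits per weight (hypothetical values, not measured).
for bpw in (2.0, 2.5, 16.0):
    print(f"{bpw:>4} bpw -> {quantized_size_gb(N_PARAMS, bpw):.0f} GB")
```

At 16 bits per weight the weights alone are far beyond 128GB, while an average somewhere around 2 to 2.5 bits per weight brings them under it, before counting context and whatever the OS keeps for itself.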