Update on general reasoning for a local 16 GB M4 model server (Qwen3.5, LFM)
r/LocalLLaMA
I benchmarked 331 GGUF models on a Mac Mini M4 (16 GB) so you don't have to. Here are the results.

This continues a past benchmark. Choosing a local model for a 16 GB machine has been mostly vibes, so I automated the entire pipeline and let it run for weeks.

31 out of 331 models are completely unusable on 16 GB. These are the models with a TTFT > 10 seconds or throughput < 0.1 tokens/sec. They technically load but are memory-thrashing. This includes every 27B+ dense model I tested. The worst offender: Qwen3.5-27B-heretic-v2-Q4_K_S, with a 97-second time-to-first-token and 0.007 tok/s.
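
For anyone who wants to apply the same "unusable" cutoff to their own runs, here is a minimal Python sketch of the rule described above (TTFT > 10 s or < 0.1 tok/s). The `BenchResult` type, the constant names, and the second entry in the list are assumptions made up for illustration; they are not taken from the actual pipeline, and only the Qwen3.5-27B numbers come from the post.

```python
from dataclasses import dataclass

# Thresholds as stated in the post; the constant names are hypothetical.
TTFT_LIMIT_S = 10.0     # time-to-first-token above this counts as unusable
MIN_TOK_PER_S = 0.1     # throughput below this counts as unusable


@dataclass
class BenchResult:
    """One benchmark row (hypothetical structure, not the pipeline's own)."""
    model: str
    ttft_s: float
    tok_per_s: float


def is_unusable(r: BenchResult) -> bool:
    # Either condition alone is enough to flag the model.
    return r.ttft_s > TTFT_LIMIT_S or r.tok_per_s < MIN_TOK_PER_S


results = [
    # Figures quoted in the post for the worst offender.
    BenchResult("Qwen3.5-27B-heretic-v2-Q4_K_S", ttft_s=97.0, tok_per_s=0.007),
    # Placeholder entry with invented values, NOT a measured result.
    BenchResult("hypothetical-small-model", ttft_s=0.8, tok_per_s=25.0),
]

unusable = [r.model for r in results if is_unusable(r)]
print(unusable)
```

Both comparisons are strict, so a model sitting exactly at 10 s or 0.1 tok/s is not flagged; adjust if your own cutoff should be inclusive.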