Gemma4 26B A4B runs easily on 16GB Macs
r/LocalLLaMA
AI Hardware
Typically, models in the 26B class are difficult to run on 16GB Macs, because GPU acceleration requires the offloaded layers to fit entirely within wired memory. It's possible with aggressive quants (2-bit, or perhaps a very lightweight IQ3_XXS), but quality degrades significantly when you do so. However, if the model runs entirely on the CPU instead (which is much more feasible with MoE models, since only about 4B of the 26B parameters are active per token), it's possible to run really good quants even when the model ends up being larger than the entire available system
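To put rough numbers on this, here is a back-of-the-envelope sketch in Python. It estimates the weight size of a 26B-total / 4B-active model at a few quant levels and compares that to an assumed GPU wired-memory budget. The bits-per-weight values and the `GPU_FRACTION` share of RAM the GPU may wire are approximate assumptions for illustration, not measured figures. Real GGUF files mix quant types per tensor, and KV cache plus OS overhead are ignored.

```python
# Rough memory arithmetic for a 26B-parameter MoE model with ~4B active
# parameters per token. All bpw figures and the GPU budget fraction are
# approximate assumptions; real files and limits vary.

TOTAL_PARAMS = 26e9    # total parameters (from "26B")
ACTIVE_PARAMS = 4e9    # parameters touched per token (from "A4B")
RAM_GB = 16            # physical memory on the Mac
GPU_FRACTION = 0.67    # assumed share of RAM the GPU may wire by default

# Approximate average bits per weight for some llama.cpp quant types (assumed).
QUANTS = {
    "Q2_K": 2.6,
    "IQ3_XXS": 3.06,
    "Q4_K_M": 4.8,
    "Q8_0": 8.5,
}


def size_gb(params: float, bpw: float) -> float:
    """Weight storage in decimal GB at the given bits per weight."""
    return params * bpw / 8 / 1e9


def report() -> dict:
    """Map quant name -> (total GB, GB read per token, fits GPU budget)."""
    gpu_budget = RAM_GB * GPU_FRACTION
    rows = {}
    for name, bpw in QUANTS.items():
        total = size_gb(TOTAL_PARAMS, bpw)
        per_token = size_gb(ACTIVE_PARAMS, bpw)
        rows[name] = (total, per_token, total <= gpu_budget)
    return rows


if __name__ == "__main__":
    print(f"assumed GPU wired budget: {RAM_GB * GPU_FRACTION:.1f} GB")
    for name, (total, per_token, fits) in report().items():
        print(f"{name:8s} weights ~{total:5.1f} GB, "
              f"read/token ~{per_token:4.2f} GB, fits GPU budget: {fits}")
```

Under these assumptions, only the ~2 to 3 bit quants squeeze under the GPU budget. A Q8_0 file (~27.6 GB) is larger than all 16GB of RAM. Per token, though, the CPU only reads the active experts, roughly 2.4 GB at Q4_K_M. That per-token figure is why CPU-only inference stays usable for an MoE model of this size.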