llama.cpp on $500 MacBook Neo: Prompt: 7.8 t/s / Generation: 3.9 t/s on Qwen3.5 9B Q3_K_M

r/LocalLLaMA
Generative AI AI Hardware Open Source AI AI Tools

Just compiled llama.cpp on MacBook Neo with 8 Gb RAM and 9b Qwen 3.5 and it works (slowly, but anyway) Config used: Build - llama.cpp version: 8294 (76ea1c1c4) Machine - Model: MacBook Neo (Mac17,5) - Chip: Apple A18 Pro - CPU: 6 cores (2 performance + 4 efficiency) - GPU: Apple A18 Pro, 5 cores, Metal ed - Memory: 8 GB unified Model - Hugging Face repo: unsloth/Qwen3.5-9B-GGUF - GGUF file: models/Qwen3.5-9B-Q3_K_M.gguf - File size on disk: 4.4 GB Launch hyperparams./build/bin/llama-cli \ -m models/Qwen3.5-9B-Q3_K_M.gguf \ --device MTL0 \ -ngl all \ -c 4096 \ -b 128 \ -ub 64 \ -ctk q4_0.