Qwen3.5-35B-A3B Benchmark On MacBook Pro(M4 Pro Chip + 48GB Unified Memory)

r/LocalLLaMA
Open Source AI AI Research

Llamacpp command config: --model ~/.lmstudio/models/lmstudio-community/Qwen3.5-35B-A3B-GGUF/Qwen3.5-35B-A3B-Q4_K_M.gguf \ --mmproj ~/.lmstudio/models/lmstudio-community/Qwen3.5-35B-A3B-GGUF/mmproj-Qwen3.5-35B-A3B-BF16.gguf \ --alias "qwen/qwen3.5-35B-A3B" \ --temp 0.6 \ --top-p 0.95 \ --top-k 20 \ --min-p 0.00 \ --jinja -c 0 \ --host 127.0.0.1 \ --port 8001 \ --k-unified \ --cache-type-k q8_0 --cache-type- q8_0 \ --flash-attn on --fit on \ --ctx-size 98304 Current throughput(also in the screenshot): ~35 tok/sec Also, tried with a small draft model.