M5-Max MacBook Pro 128GB RAM - Qwen3-Coder-Next 8-Bit Benchmark
r/LocalLLaMA
Qwen3-Coder-Next 8-Bit Benchmark: MLX vs Ollama

TL;DR: An M5-Max with 128 GB of RAM gets 72 tokens per second from Qwen3-Coder-Next 8-bit using MLX.

Overview

This benchmark compares two local inference backends, MLX (Apple's native ML framework) and Ollama (llama.cpp-based), running the same Qwen3-Coder-Next model in 8-bit quantization on Apple Silicon. The goal is to measure raw throughput (tokens per second), time to first token (TTFT), and overall coding capability across a range of real-world programming tasks.
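Here is a minimal Python sketch of how the two timing metrics can be derived. It assumes the client records when the request is sent and when each streamed token arrives. `RunMetrics`, `metrics_from_timestamps` and `ollama_decode_tps` are hypothetical helpers, not the benchmark's actual harness. The Ollama helper relies on the `eval_count` and `eval_duration` (nanoseconds) fields that Ollama's API includes in its final response.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class RunMetrics:
    ttft_s: float          # seconds from request start to first token
    decode_tps: float      # tokens/s measured after the first token (decode only)
    end_to_end_tps: float  # tokens/s over the whole request, prefill included


def metrics_from_timestamps(request_start: float,
                            token_times: Sequence[float]) -> RunMetrics:
    """Compute TTFT and throughput from per-token arrival times in seconds.

    Assumes token_times are monotonic timestamps (e.g. time.perf_counter())
    on the same clock as request_start, one entry per streamed token.
    """
    if len(token_times) < 2:
        raise ValueError("need at least two tokens to measure decode speed")
    ttft = token_times[0] - request_start
    # N tokens span N - 1 inter-token gaps after the first token arrives.
    decode_window = token_times[-1] - token_times[0]
    decode_tps = (len(token_times) - 1) / decode_window
    total = token_times[-1] - request_start
    end_to_end_tps = len(token_times) / total
    return RunMetrics(ttft, decode_tps, end_to_end_tps)


def ollama_decode_tps(final_response: dict) -> float:
    """Tokens/s from the stats in Ollama's final response.

    eval_count is the number of generated tokens and eval_duration is in
    nanoseconds, so tokens/s = eval_count * 1e9 / eval_duration.
    """
    return final_response["eval_count"] * 1e9 / final_response["eval_duration"]


if __name__ == "__main__":
    # Synthetic run: first token after 0.5 s, then one token every 1/16 s.
    times = [0.5 + i / 16 for i in range(33)]
    print(metrics_from_timestamps(0.0, times))
    print(ollama_decode_tps({"eval_count": 720, "eval_duration": 10_000_000_000}))
```

Keeping TTFT separate from decode throughput stops prompt processing (prefill) from diluting the tokens-per-second figure. At 72 tokens per second, each generated token takes about 1000 / 72 ≈ 13.9 ms.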