MTPLX | 2.24x faster TPS | The native MTP inference engine for Apple Silicon

r/LocalLLaMA
AI Hardware

TLDR: 28 tok/s → 63 tok/s on Qwen3.6-27B on a MacBook Pro M5 Max. 2.24× faster at real temperature 0.6. Works for coding, creative writing, and chat Works on ANY MTP model: No external drafter. No extra memory usage. Uses the model's own built-in MTP heads. Works on any model that ships them. Not greedy: Unlike similar speculative decoding projects, we use mathematically exact temperature sampling with rejection sampling. Adjustable temperatures for any task. Every other speculative decode project on Apple Silicon is greedy-only.