Multi-Token Prediction (MTP) for Qwen3.5 is coming to mlx-lm

r/LocalLLaMA •
Open Source AI

🚀 Big update for the LocalLLaMA community: Multi-Token Prediction (MTP) is coming to mlx-lm for the Qwen3.5 series. (Not my PR, just sharing because this is cool 👇)

Early support for generating multiple tokens per forward pass is in, and the gains already look solid:

• 15.3 → 23.3 tok/s (~1.5x throughput boost)
• ~80.6% acceptance rate

The PR author benchmarked with Qwen3.5-27B 4-bit on an M4 Pro.

Huge kudos to AirRunner for contributing this 🙌

PR:

submitted by /u/be566
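To show how an acceptance rate turns into a throughput gain, here is a minimal Python sketch. It assumes a draft-and-verify scheme like speculative decoding: an MTP head proposes `num_draft` extra tokens, the main model checks them in one forward pass, and the longest matching prefix is kept. The names `expected_tokens_per_step` and `greedy_verify` are hypothetical illustrations, not mlx-lm APIs. The post also doesn't say how many tokens the PR drafts per step or exactly how it defines acceptance rate.

```python
def expected_tokens_per_step(acceptance_rate: float, num_draft: int) -> float:
    """Idealized tokens emitted per verification forward pass.

    Assumption (not from the PR): each drafted token is accepted independently
    with probability `acceptance_rate`, and acceptance stops at the first
    rejection. The verifying model always contributes one token of its own.
    Result is sum_{i=0..k} a^i = (1 - a^(k+1)) / (1 - a).
    """
    a = acceptance_rate
    return sum(a ** i for i in range(num_draft + 1))


def greedy_verify(draft_tokens: list[int], target_tokens: list[int]) -> list[int]:
    """Hypothetical greedy verification step.

    `target_tokens` holds the main model's greedy pick at each draft position,
    plus one extra pick after the last draft token (len(draft) + 1 entries).
    Keeps the matching prefix, then appends the main model's token at the
    first mismatch, or the extra "bonus" token if every draft matched.
    """
    if len(target_tokens) != len(draft_tokens) + 1:
        raise ValueError("target_tokens must have len(draft_tokens) + 1 entries")
    accepted = []
    for d, t in zip(draft_tokens, target_tokens):
        if d != t:
            accepted.append(t)  # replace the first rejected draft with the main model's token
            return accepted
        accepted.append(d)  # draft agreed with the main model
    accepted.append(target_tokens[-1])  # all drafts accepted: keep the bonus token
    return accepted


if __name__ == "__main__":
    # Assuming one drafted token per step and the reported ~80.6% acceptance:
    print(round(expected_tokens_per_step(0.806, 1), 3))  # 1.806
    print(round(23.3 / 15.3, 2))  # 1.52, the measured speedup from the post
    print(greedy_verify([5, 9], [5, 7, 3]))  # [5, 7]
```

If the PR drafts one token per step (an assumption), this idealized model gives about 1.8 tokens per forward pass, while the measured speedup is about 1.52x. A gap like that would be consistent with the extra cost of running the MTP head and verifying additional positions, but the post doesn't break the numbers down.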