New Gemma 4 MTP on MLX?
r/LocalLLaMA
Open Source AI
In case you haven't heard, Google just released Multi-Token Prediction (MTP) drafters for Gemma 4, a speculative decoding approach that pairs the main model with a lightweight drafter. The drafter predicts several tokens ahead, and the main model then verifies them in parallel, speeding up inference by 2-3x.

Has anyone used this with MLX? I tried to, without success. It does not seem to be supported yet.

submitted by /u/purealgo
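For anyone unfamiliar with the draft-then-verify idea described above, here is a minimal toy sketch of greedy speculative decoding in plain Python. It is not the MLX API and not Google's MTP implementation: `toy_target`, `toy_drafter`, `speculative_step`, and `generate` are all hypothetical names made up for illustration, and the "models" are simple functions over integer tokens. In a real system the verification loop would be a single batched forward pass of the main model.

```python
from typing import Callable, List

Token = int
NextToken = Callable[[List[Token]], Token]  # greedy "model": context -> next token


def toy_target(ctx: List[Token]) -> Token:
    # Hypothetical main model: counts upward modulo 10.
    return (ctx[-1] + 1) % 10


def toy_drafter(ctx: List[Token]) -> Token:
    # Hypothetical cheap drafter: agrees with the target except after token 4.
    return 0 if ctx[-1] == 4 else (ctx[-1] + 1) % 10


def speculative_step(target: NextToken, drafter: NextToken,
                     context: List[Token], k: int) -> List[Token]:
    # 1) The drafter proposes k tokens autoregressively.
    draft: List[Token] = []
    for _ in range(k):
        draft.append(drafter(context + draft))

    # 2) The target checks each proposed position. Done sequentially here;
    #    a real implementation would score all k positions in one pass.
    accepted: List[Token] = []
    for i in range(k):
        expected = target(context + draft[:i])
        if expected != draft[i]:
            # First mismatch: keep the target's token and discard the rest.
            accepted.append(expected)
            return accepted
        accepted.append(draft[i])

    # 3) Every draft token was accepted, so the target adds one more token.
    accepted.append(target(context + draft))
    return accepted


def generate(target: NextToken, drafter: NextToken,
             prompt: List[Token], max_new: int, k: int = 4) -> List[Token]:
    out = list(prompt)
    while len(out) - len(prompt) < max_new:
        out.extend(speculative_step(target, drafter, out, k))
    return out[:len(prompt) + max_new]


if __name__ == "__main__":
    print(generate(toy_target, toy_drafter, [0], 12))
```

With greedy verification like this, the output matches what the main model would have produced on its own; the speedup comes only from how many drafted tokens get accepted per verification step.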