Turns out Gemma 4 had MTP (multi token prediction) all along
r/LocalLLaMA
•
Open Source AI
Hey Everyone, While I was trying to utilize Gemma 4 through the LiteRT api in my android app, I noticed that Gemma 4 was throwing errors when loading it on my Google Pixel 9 test device of the "mtp weights being an incompatible tensor shape". I did some digging and found out there's additional MTP prediction heads within the LiteRT files for speculative decoding and much faster outputs. Well turns out I got confirmation today from a Google employee that Gemma 4 DOES INDEED have MTP but it was "removed on purpose" for "ensuring compatibility and broad usability.