Uploaded Unsloth Qwen3.6-35B-A3B UD XL models with MTP grafted, here are the results
r/LocalLLaMA
Following my previous post, a few people asked for the 35B A3B version. The model is up on HuggingFace if anyone wants to check it out. The upload also includes the isolated MTP layers and convert.py. The results are not great, though: Q4 only got a 6% speed increase and Q8 only 2.5%. On the 27B the gain was 2-2.5x, so this could be related to llama.cpp's MTP implementation and the qwen35moe architecture, or it could just be a limitation of the model itself. These results are preliminary and might change in the future. Either way, I wanted to report back for anyone who was wondering.
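To put the percent figures and the "x" figures on the same scale, here is a quick sketch that converts between them. It only uses the numbers quoted in this post; the helper names are made up for illustration and nothing here measures anything.

```python
def percent_to_multiplier(percent_gain: float) -> float:
    """Convert a percent speed increase (e.g. 6.0) to a multiplier (1.06)."""
    return 1.0 + percent_gain / 100.0


def multiplier_to_percent(multiplier: float) -> float:
    """Convert a speed multiplier (e.g. 2.0) to a percent increase (100.0)."""
    return (multiplier - 1.0) * 100.0


# Figures as reported in the post; labels are illustrative only.
reported = {
    "35B-A3B Q4": percent_to_multiplier(6.0),
    "35B-A3B Q8": percent_to_multiplier(2.5),
    "27B (low end)": 2.0,
    "27B (high end)": 2.5,
}

for name, mult in reported.items():
    print(f"{name}: {mult:.3f}x ({multiplier_to_percent(mult):+.1f}%)")
```

So the 35B A3B lands at roughly 1.03-1.06x, while the 27B range of 2-2.5x corresponds to a 100-150% increase.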