Llama.cpp MTP support now in beta!

r/LocalLLaMA

Happy to report that llama.cpp MTP (multi-token prediction) support is now in beta, thanks to Aman (and everyone else who has pushed on the various issues in the meantime). This has the potential to actually get merged soon-ish. It currently contains support for Qwen3.5 MTP, but other models are likely to follow suit. Between this and the maturing tensor-parallel support, expect most performance gaps between llama.cpp and vLLM, at least when it comes to token generation speeds, to be erased.

submitted by /u/ilintar