Llama.cpp MTP support now in beta!
r/LocalLLaMA
Happy to report that llama.cpp MTP (multi-token prediction) support is now in beta, thanks to Aman (and everyone else who has pushed on the various issues in the meantime). This has the potential to actually get merged soon-ish. It currently contains support for Qwen3.5 MTP, but other models are likely to follow suit. Between this and the maturing tensor-parallel support, expect most performance gaps between llama.cpp and vLLM, at least when it comes to token generation speeds, to be erased.

submitted by /u/ilintar