Why can't llama.cpp combine speculative decoding methods?
r/LocalLLaMA
Dicking around with the new MTP (multi-token prediction) speculative decoding with Qwen3.6 27B, and it's great. But for agentic coding I've seen significant improvements from ngram speculation. A decent fraction of the time (e.g. when calling the edit tool), the model is just repeating verbatim a section of code it has already seen. By comparison, ngram can speculate a lot of tokens reeaallly fast.

It'd be great if we could combine them and run both at the same time. But if I add both to the command-line arguments, it looks like only ngram is active.
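To make the idea concrete, here is a toy sketch of one way the two could be combined. It is not llama.cpp's code and doesn't reflect how llama.cpp actually schedules drafters. `ngram_draft` does prompt-lookup style drafting: find an earlier occurrence of the last few tokens and propose whatever followed it, which is exactly the "model is repeating code it already saw" case. `mtp_draft` is a hypothetical stand-in for the MTP head. The policy is an assumption: try ngram first and fall back to MTP when there's no usable verbatim match. `min_ngram` is likewise a made-up threshold. `verify` is standard greedy verification: keep the longest matching prefix plus the target model's own next token.

```python
from typing import Callable, List, Sequence


def ngram_draft(tokens: Sequence[int], n: int = 3, max_draft: int = 8) -> List[int]:
    """Prompt-lookup drafting: find the most recent earlier occurrence of the
    last n tokens and propose up to max_draft tokens that followed it."""
    if len(tokens) < n + 1:
        return []
    key = list(tokens[-n:])
    # Scan backwards so the most recent match wins; skip the suffix itself.
    for start in range(len(tokens) - n - 1, -1, -1):
        if list(tokens[start:start + n]) == key:
            return list(tokens[start + n:start + n + max_draft])
    return []


def combined_draft(
    tokens: Sequence[int],
    mtp_draft: Callable[[Sequence[int]], List[int]],  # hypothetical MTP drafter
    n: int = 3,
    max_draft: int = 8,
    min_ngram: int = 4,  # assumed threshold, not a llama.cpp setting
) -> List[int]:
    """Assumed policy: use the cheap ngram draft when it is long enough,
    otherwise ask the MTP drafter."""
    draft = ngram_draft(tokens, n, max_draft)
    if len(draft) >= min_ngram:
        return draft
    return mtp_draft(tokens)


def verify(draft: Sequence[int], target_next: Sequence[int]) -> List[int]:
    """Greedy verification. target_next[i] is the target model's greedy token
    after accepting draft[:i], so len(target_next) == len(draft) + 1."""
    accepted: List[int] = []
    for d, t in zip(draft, target_next):
        if d != t:
            break
        accepted.append(d)
    # The target's own token at the first mismatch (or after a full accept) is always kept.
    accepted.append(target_next[len(accepted)])
    return accepted


if __name__ == "__main__":
    fake_mtp = lambda toks: [42]  # placeholder for a real MTP head
    history = [1, 2, 3, 4, 5, 9, 1, 2, 3]
    print(combined_draft(history, fake_mtp))   # ngram hit
    print(combined_draft([7, 8, 9], fake_mtp))  # no match, falls back to MTP
    print(verify([4, 5, 9], [4, 5, 7, 8]))
```

Trying ngram first makes sense because a lookup is nearly free, while the MTP head costs extra compute per step. A different policy (e.g. always taking the longer of the two drafts) would also be reasonable.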