llama: avoid copying logits during prompt decode in MTP by am17an · Pull Request #23198 · ggml-org/llama.cpp

r/LocalLLaMA

Time to update your llama.cpp -> improved prompt processing speed

submitted by /u/jacek2023