llama: avoid copying logits during prompt decode in MTP by am17an · Pull Request #23198 · ggml-org/llama.cpp
r/LocalLLaMA
Time to update your llama.cpp -> improved prompt processing speed (submitted by /u/jacek2023)