CUDA: reduce MMQ stream-k overhead by JohannesGaessler · Pull Request #22298 · ggml-org/llama.cpp

r/LocalLLaMA

CUDA prompt processing speedup on MoE models. Check this out.

Submitted by /u/jacek2023