CUDA: reduce MMQ stream-k overhead by JohannesGaessler · Pull Request #22298 · ggml-org/llama.cpp
r/LocalLLaMA
CUDA prompt processing speedup on MoE. Check this. Submitted by /u/jacek2023