Optimize MOE GEMV kernel for BS > 1. by gaugarg-nv · Pull Request #20905 · ggml-org/llama.cpp

r/LocalLLaMA

What's your speedup? (CUDA only)

submitted by /u/jacek2023