Optimize MOE GEMV kernel for BS > 1. by gaugarg-nv · Pull Request #20905 · ggml-org/llama.cpp
r/LocalLLaMA
What's your speedup? (CUDA only)

submitted by /u/jacek2023