Faster MoE Inference with Warp Decode (16 minute read)
TLDR AI • AI Research
Cursor's “warp decode” is a kernel design that reorganizes MoE inference around output neurons instead of experts. It achieves ~1.8x higher throughput and improved numerical accuracy on Blackwell GPUs.
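To make "around output neurons instead of experts" concrete, here is a toy pure-Python sketch of the two loop orders for a single decoded token. It is not Cursor's kernel: it assumes each expert is one small weight matrix and that routing is top-2 with fixed gate weights. The function names and values are hypothetical, chosen only for illustration.

```python
# Toy illustration only: one linear layer per expert, pure Python, no GPU.
# All names here (expert_centric, output_centric, ...) are hypothetical.

def dot(row, x):
    """Inner product of one weight row with the input vector."""
    return sum(w * v for w, v in zip(row, x))

def expert_centric(weights, routed, x):
    """Outer loop over experts: each routed expert computes its full output
    vector, which is then scaled by its gate and accumulated into y."""
    out_dim = len(weights[0])
    y = [0.0] * out_dim
    for expert_id, gate in routed:
        expert_out = [dot(row, x) for row in weights[expert_id]]
        for j in range(out_dim):
            y[j] += gate * expert_out[j]
    return y

def output_centric(weights, routed, x):
    """Outer loop over output neurons: each output element gathers its
    contribution from every routed expert and is written exactly once."""
    out_dim = len(weights[0])
    y = []
    for j in range(out_dim):
        acc = 0.0
        for expert_id, gate in routed:
            acc += gate * dot(weights[expert_id][j], x)
        y.append(acc)
    return y

# Three experts, each a 2x3 weight matrix (out_dim=2, in_dim=3).
weights = [
    [[1.0, 0.0, 2.0], [0.0, 1.0, 0.0]],
    [[0.5, 0.5, 0.5], [2.0, 0.0, 1.0]],
    [[0.0, 3.0, 0.0], [1.0, 1.0, 1.0]],
]
x = [1.0, 2.0, 3.0]
routed = [(0, 0.75), (2, 0.25)]  # assumed top-2 routing: (expert index, gate)

y_expert = expert_centric(weights, routed, x)
y_output = output_centric(weights, routed, x)
print(y_expert, y_output)
```

Both orderings compute the same weighted sum of expert outputs. The sketch does not model GPU scheduling, memory traffic, or reduced-precision arithmetic, so it says nothing about the reported throughput or accuracy figures.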