AI RESEARCH
RaMP: Runtime-Aware Megakernel Polymorphism for Mixture-of-Experts
arXiv CS.AI
arXiv:2604.26039v1 Announce Type: cross

The optimal kernel configuration for Mixture-of-Experts (MoE) inference depends on both batch size and the expert routing distribution, yet production systems dispatch on batch size alone, leaving 10-70% of kernel throughput unrealized. We present RaMP, a routing-aware dispatch framework. A performance-region analysis derives, from hardware constants alone, when each optimization helps; its predictions are correct for all 8 tested architectures, including 3 unseen.
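To illustrate the difference between dispatching on batch size alone and dispatching on batch size plus routing distribution, here is a minimal Python sketch. Everything in it is hypothetical and chosen for illustration only: the kernel configuration names, the batch-size threshold of 64, the imbalance threshold of 2.0, and the max-over-mean imbalance metric are assumptions, not RaMP's actual policy or API. It also assumes top-1 routing, so each token maps to exactly one expert.

```python
from collections import Counter


def routing_imbalance(expert_ids, num_experts):
    """Return max expert load divided by mean load (1.0 means perfectly balanced).

    Hypothetical metric chosen for illustration; not taken from the paper.
    """
    if not expert_ids:
        return 1.0
    counts = Counter(expert_ids)
    mean_load = len(expert_ids) / num_experts
    return max(counts.values()) / mean_load


def dispatch_batch_only(batch_size):
    """Baseline: pick a kernel config from batch size alone."""
    # Threshold and config names are illustrative assumptions.
    return "small_tile" if batch_size < 64 else "large_tile"


def dispatch_routing_aware(expert_ids, num_experts):
    """Pick a kernel config from batch size and the observed routing skew."""
    batch_size = len(expert_ids)  # top-1 routing: one expert id per token
    if batch_size < 64:
        return "small_tile"
    # Imbalance threshold of 2.0 is an illustrative assumption.
    if routing_imbalance(expert_ids, num_experts) > 2.0:
        return "large_tile_skewed"
    return "large_tile"


# Two batches of the same size with very different routing distributions.
balanced = [i % 8 for i in range(128)]
skewed = [0] * 100 + [i % 8 for i in range(28)]

print(dispatch_batch_only(len(balanced)), dispatch_batch_only(len(skewed)))
print(dispatch_routing_aware(balanced, 8), dispatch_routing_aware(skewed, 8))
```

Both batches contain 128 tokens, so the batch-size-only dispatcher cannot tell them apart, while the routing-aware one selects a different configuration for the skewed batch. This is the kind of gap the abstract attributes to dispatching from batch size alone.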