SonicMoE: Accelerating MoE with IO and Tile-aware Optimizations

arXiv:2512.14080v2 Announce Type: replace-cross Mixture of Experts (MoE) models have emerged as the de facto architecture for scaling up language models without significantly increasing the computational cost. Recent MoE models demonstrate a clear trend towards higher expert granularity (a smaller expert intermediate dimension) and higher sparsity (a constant number of activated experts with a larger number of total experts), both of which improve model quality per FLOP.
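
To make the two terms concrete, here is a minimal sketch of the arithmetic behind them. It is not taken from the paper: the `MoEConfig` class, its field names, the two-matrix expert layout, and the example sizes are all assumptions chosen for illustration. Granularity is modelled as the ratio of model width to expert intermediate width, and sparsity is reported as the fraction of experts activated per token (a lower fraction means a sparser model).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MoEConfig:
    """Hypothetical MoE layer description (illustrative, not from the paper)."""
    d_model: int      # hidden size of the model
    d_expert: int     # expert intermediate dimension
    num_experts: int  # total experts in the layer
    top_k: int        # experts activated per token

    def params_per_expert(self) -> int:
        # Assumes a plain two-matrix expert (up and down projection), no biases.
        return 2 * self.d_model * self.d_expert

    def total_expert_params(self) -> int:
        return self.num_experts * self.params_per_expert()

    def active_flops_per_token(self) -> int:
        # 2 FLOPs per multiply-accumulate, counted over activated experts only.
        return 2 * self.top_k * self.params_per_expert()

    def granularity(self) -> float:
        # Assumed definition: smaller d_expert gives higher granularity.
        return self.d_model / self.d_expert

    def activation_ratio(self) -> float:
        # Fraction of experts used per token; lower means sparser.
        return self.top_k / self.num_experts


# Illustrative sizes only: a coarse layer and a finer, sparser one.
coarse = MoEConfig(d_model=4096, d_expert=4096, num_experts=8, top_k=2)
fine = MoEConfig(d_model=4096, d_expert=1024, num_experts=128, top_k=8)

for name, cfg in (("coarse", coarse), ("fine", fine)):
    print(
        name,
        f"granularity={cfg.granularity():.1f}",
        f"activation_ratio={cfg.activation_ratio():.4f}",
        f"active_flops={cfg.active_flops_per_token()}",
        f"total_params={cfg.total_expert_params()}",
    )
```

In this sketch the finer configuration spends the same FLOPs per token as the coarse one while holding four times as many expert parameters. Note that `top_k` is raised here only to keep per-token FLOPs equal as the experts shrink; that is a choice of the illustration, not a claim about any specific model.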