Why MoE models keep converging on ~10B active parameters
r/LocalLLaMA
Interesting pattern: despite wildly different total sizes, many recent MoE models land around 10B active params. Qwen 3.5 122B activates 10B. MiniMax M2.7 runs 230B total with 10B active via top-2 routing.
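
To put numbers on that, here's a quick sketch of the active-to-total ratio for the two models mentioned. The parameter counts are just the ones quoted in this post (in billions); `active_fraction` is a made-up helper name, not something from any library.

```python
# Total and active parameter counts in billions, as quoted in the post.
# These are the post's figures, not independently verified numbers.
models = {
    "Qwen 3.5 122B": (122, 10),
    "MiniMax M2.7": (230, 10),
}


def active_fraction(total_b: float, active_b: float) -> float:
    """Share of the model's parameters that are used for each token."""
    return active_b / total_b


for name, (total_b, active_b) in models.items():
    frac = active_fraction(total_b, active_b)
    print(f"{name}: {active_b}B of {total_b}B active ({frac:.1%})")
```

So the same ~10B active budget works out to roughly 8% of the weights in one case and roughly 4% in the other, which is the point: the active count stays put while the total swings by almost 2x.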