Optimal Transport Aggregation for Distributed Mixture-of-Experts
arXiv cs.AI
arXiv:2312.09877v2 Announce Type: replace-cross

Mixture-of-experts (MoE) models provide a flexible statistical framework for modeling heterogeneity and nonlinear relationships. In many modern applications, however, datasets are naturally distributed across multiple machines due to storage, computational, or governance constraints. We consider a distributed model aggregation setting in which local MoE models are trained independently on decentralized datasets and subsequently combined into a global estimator.
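The snippet below is a toy illustration of why combining independently trained local MoE models is not a simple averaging problem. It is not the paper's estimator. It assumes that each local model is summarized by the same number K of experts, each described by a small parameter vector (here a hypothetical pair: gating weight and one regression coefficient); that expert labels are arbitrary on each machine; and that aggregation averages matched parameters weighted by local sample size. With uniform weights on two equal-size sets of experts, optimal transport reduces to an assignment problem, which is solved here by brute force. The names `transport_cost`, `best_matching`, and `aggregate` are hypothetical.

```python
from itertools import permutations

# Toy setting (assumption): each local model is a list of K experts,
# each expert is a tuple of parameters (gating weight, coefficient).
# Expert labels are arbitrary per machine (label switching).


def transport_cost(a, b):
    """Squared Euclidean distance between two expert parameter vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def best_matching(reference, other):
    """Match experts of `other` to `reference`.

    Uniform-weight optimal transport between two equal-size discrete sets
    has an optimal solution that is a permutation, so we search all
    permutations (feasible only for small K).
    """
    k = len(reference)
    best_perm, best_cost = None, float("inf")
    for perm in permutations(range(k)):
        cost = sum(transport_cost(reference[i], other[perm[i]]) for i in range(k))
        if cost < best_cost:
            best_perm, best_cost = perm, cost
    return best_perm


def aggregate(local_models, sample_sizes):
    """Align every local model to the first one, then take a
    sample-size-weighted average of the matched expert parameters."""
    reference = local_models[0]
    k, dim = len(reference), len(reference[0])
    total = sum(sample_sizes)
    agg = [[0.0] * dim for _ in range(k)]
    for model, n in zip(local_models, sample_sizes):
        perm = best_matching(reference, model)
        for i in range(k):
            for d in range(dim):
                agg[i][d] += (n / total) * model[perm[i]][d]
    return agg


if __name__ == "__main__":
    local_a = [(0.3, -2.0), (0.7, 3.0)]
    local_b = [(0.7, 3.2), (0.3, -1.8)]  # same experts, labels swapped
    global_experts = aggregate([local_a, local_b], [100, 100])
    print([[round(x, 6) for x in expert] for expert in global_experts])
    # → [[0.3, -1.9], [0.7, 3.1]]
```

Averaging these two local models index by index would instead collapse both experts to roughly (0.5, 0.6), which is the failure that a matching step avoids. For more than a handful of experts, the brute-force search would be replaced by a polynomial-time assignment or optimal transport solver.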