AI RESEARCH

ExFusion: Efficient Transformer Training via Multi-Experts Fusion

arXiv CS.CV

arXiv:2603.27965v1. Mixture-of-Experts (MoE) models substantially improve performance by increasing the capacity of dense architectures. However, directly