AI RESEARCH

Preserving Long-Tailed Expert Information in Mixture-of-Experts Tuning

arXiv CS.LG

arXiv:2604.23036v1 Announce Type: new

Although Mixture-of-Experts (MoE) models lead many benchmarks, supervised fine-tuning (SFT) of MoE architectures remains difficult because their router layers are fragile. Methods such as DenseMixer and ESFT mitigate router collapse with dense mixing or auxiliary load-balancing losses, but these