Preserving Long-Tailed Expert Information in Mixture-of-Experts Tuning
arXiv CS.LG
arXiv:2604.23036v1 Announce Type: new
Although MoE models lead many benchmarks, supervised fine-tuning (SFT) of MoE architectures remains difficult because their router layers are fragile. Methods such as DenseMixer and ESFT mitigate router collapse with dense mixing or auxiliary load-balancing losses, but these
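To make "auxiliary load-balancing loss" concrete, here is a minimal sketch of one common form, the Switch Transformer loss N * sum_i f_i * P_i, where f_i is the fraction of tokens routed (top-1) to expert i and P_i is the mean router probability for expert i. This is an illustration under that assumption only: the abstract does not give the losses DenseMixer or ESFT actually use, and the function names below are hypothetical. The loss equals 1.0 when routing is perfectly balanced and grows as tokens collapse onto fewer experts, which is the failure the abstract calls router collapse.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over one token's expert logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def load_balancing_loss(router_logits: List[List[float]]) -> float:
    """Hypothetical helper: Switch-style auxiliary loss N * sum_i f_i * P_i.

    router_logits holds one row of per-expert logits per token.
    f_i = fraction of tokens whose top-1 expert is i.
    P_i = mean softmax probability the router assigns to expert i.
    """
    num_tokens = len(router_logits)
    num_experts = len(router_logits[0])
    counts = [0] * num_experts
    prob_sums = [0.0] * num_experts
    for logits in router_logits:
        probs = softmax(logits)
        # Top-1 routing decision (ties go to the lowest index).
        top1 = max(range(num_experts), key=lambda i: probs[i])
        counts[top1] += 1
        for i, p in enumerate(probs):
            prob_sums[i] += p
    f = [c / num_tokens for c in counts]
    p_mean = [s / num_tokens for s in prob_sums]
    return num_experts * sum(fi * pi for fi, pi in zip(f, p_mean))
```

For example, `load_balancing_loss([[1.0, 0.0], [0.0, 1.0]])` (two tokens split across two experts) gives about 1.0, while `load_balancing_loss([[1.0, 0.0], [1.0, 0.0]])` (both tokens sent to expert 0) gives about 1.46. Adding a scaled copy of this term to the SFT loss penalizes the collapsed case.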