Expert Upcycling: Growing MoE capacity mid-training without increasing inference cost (7B→13B, ~32% GPU hours saved)
r/LocalLLaMA
Author here, sharing a preprint we recently released. We're actively looking for feedback from this community before we revise.

Motivation