AI RESEARCH

Spectral Condition for $\mu$P under Width-Depth Scaling

arXiv cs.LG

arXiv:2603.00541v2

Generative foundation models are increasingly scaled in both width and depth, posing significant challenges for stable feature learning and reliable hyperparameter (HP) transfer across model sizes. While maximal update parameterization ($\mu$P) has provided a principled solution to both problems for width scaling, existing extensions to the joint width-depth scaling regime remain fragmented, architecture- and optimizer-specific, and often rely on technically involved theories.