AI RESEARCH
Rethinking Language Model Scaling under Transferable Hypersphere Optimization
arXiv cs.LG
arXiv:2603.28743v1 Announce Type: new Scaling laws for large language models depend critically on the optimizer and parameterization. Existing hyperparameter transfer laws are mainly developed for first-order optimizers, and they do not structurally prevent