AI RESEARCH

Rethinking Language Model Scaling under Transferable Hypersphere Optimization

arXiv cs.LG

arXiv:2603.28743v1 Announce Type: new

Scaling laws for large language models depend critically on the optimizer and parameterization. Existing hyperparameter transfer laws are mainly developed for first-order optimizers, and they do not structurally prevent