AI RESEARCH
Beyond Muon: MUD (MomentUm Decorrelation) for Faster Transformer Training
arXiv cs.LG
arXiv:2603.17970v1 Announce Type: new Orthogonalized-momentum optimizers such as Muon improve transformer