Beyond Muon: MUD (MomentUm Decorrelation) for Faster Transformer Training

arXiv cs.LG

arXiv:2603.17970v1

Orthogonalized-momentum optimizers such as Muon improve transformer