AI RESEARCH
On the Convergence Analysis of Muon
arXiv cs.LG
arXiv:2505.23737v2 Announce Type: replace-cross
The majority of parameters in neural networks are naturally represented as matrices. However, most commonly used optimizers treat these matrix parameters as flattened vectors during optimization, potentially overlooking their inherent structural properties. Recently, an optimizer called Muon has been proposed, specifically designed to optimize matrix-structured parameters. Extensive empirical evidence shows that Muon can significantly outperform traditional optimizers when