AI RESEARCH
Muon Does Not Converge on Convex Lipschitz Functions
arXiv CS.LG
arXiv:2605.08980v1
Muon and its variants have shown strong empirical performance across a variety of deep learning tasks. Existing convergence analyses of Muon rely on smoothness assumptions, even though arguably the most successful function class for developing deep learning methods (such as AdaGrad, Shampoo, and Schedule-Free, among others) has been the class of convex Lipschitz functions. In this paper we ask whether the classical convex Lipschitz model is a useful one for understanding Muon. Our answer is no.