AI RESEARCH

MiMuon: Mixed Muon Optimizer with Improved Generalization for Large Models

arXiv cs.AI

arXiv:2605.19619v1 Announce Type: cross Matrix-structured parameters appear frequently in artificial intelligence models such as large language models. Recently, the efficient Muon optimizer was designed for the matrix parameters of large-scale models, and it shows markedly faster convergence than vector-wise algorithms. Although some works have begun to study the convergence properties (i.e., optimization error) of the Muon optimizer, its generalization properties (i.e., generalization error) have not yet been established.