
From SGD to Muon: Adaptive Optimization via Schatten-p Norms

arXiv CS.AI

arXiv:2605.19781v1 Announce Type: new Modern optimizers such as Muon impose matrix-wise geometric constraints on their updates. These constraints can be unified under Linear Minimization Oracle (LMO) theory. However, all current methods fix the LMO geometry of the update rule in advance, chosen either by design or empirically, and that choice is not necessarily optimal for the geometry of the problem. We
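
To make the LMO view concrete, here is a minimal sketch of the standard closed-form LMO over a unit Schatten-p norm ball, computed with an exact SVD. It is an illustration under stated assumptions, not the method proposed in the paper: the function name `schatten_lmo` and its signature are hypothetical, and the exponent `p` is fixed by the caller rather than adapted.

```python
import numpy as np


def schatten_lmo(grad, p):
    """Return argmin of <grad, X> over {X : ||X||_{S_p} <= 1}, for p in [1, inf].

    Hypothetical helper for illustration only. With grad = U diag(s) V^T,
    the minimizer is -U diag(t) V^T, where t = s^(q-1) / ||s||_q^(q-1)
    and q = p / (p - 1) is the dual exponent.
    """
    u, s, vt = np.linalg.svd(grad, full_matrices=False)
    if s[0] == 0.0:
        # Zero gradient: every feasible point is optimal, so return zero.
        return np.zeros_like(grad)
    if np.isinf(p):
        # Spectral-norm ball (q = 1): all singular values are set to 1.
        t = np.ones_like(s)
    elif p == 1:
        # Nuclear-norm ball (q = inf): keep only the top singular direction.
        t = np.zeros_like(s)
        t[0] = 1.0
    else:
        q = p / (p - 1.0)
        r = s / s[0]  # rescale by the largest singular value for stability
        t = r ** (q - 1.0)
        t /= np.linalg.norm(r, ord=q) ** (q - 1.0)
    return -(u * t) @ vt


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    g = rng.standard_normal((4, 3))
    for p in (1, 2, 3, np.inf):
        x = schatten_lmo(g, p)
        # <g, x> equals minus the dual (Schatten-q) norm of g.
        print(p, float(np.sum(g * x)))
```

With `p = 2` the result is the Frobenius-normalized negative gradient, which is a normalized SGD direction; with `p = np.inf` it is the orthogonalized direction `-U V^T` associated with Muon. Practical Muon implementations approximate this with Newton-Schulz iterations rather than an exact SVD, so the SVD here is a simplification chosen for clarity.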