Enhancing LLM Training via Spectral Clipping

arXiv:2603.14315v1

While spectral-based optimizers like Muon operate directly on the spectrum of updates, standard adaptive methods such as AdamW do not account for the global spectral structure of weights and gradients, leaving them vulnerable to two empirical issues in large language model