AI RESEARCH

On the Provable Suboptimality of Momentum SGD in Nonstationary Stochastic Optimization

arXiv CS.LG

arXiv:2601.12238v4 Announce Type: replace-cross

In this paper, we provide a comprehensive theoretical analysis of Stochastic Gradient Descent (SGD) and its momentum variants (Polyak Heavy-Ball and Nesterov) for tracking time-varying optima under strong convexity and smoothness. Our finite-time bounds reveal a sharp decomposition of tracking error into transient, noise-induced, and drift-induced components.
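
To make the tracking setting concrete, the following is a minimal sketch, not the paper's experimental setup. It assumes a one-dimensional quadratic loss f_t(x) = 0.5 (x - x*_t)^2 whose minimizer x*_t follows a Gaussian random walk, with additive Gaussian gradient noise. The function name `tracking_error` and all parameter values (`lr`, `beta`, `drift`, `noise`) are hypothetical choices for illustration only.

```python
import random


def tracking_error(method, steps=2000, lr=0.1, beta=0.9,
                   drift=0.01, noise=0.1, seed=0):
    """Mean squared tracking error (x_t - x*_t)^2 over a run.

    Assumed toy model (not taken from the paper):
    f_t(x) = 0.5 * (x - x*_t)^2, where x*_t is a Gaussian random walk.
    """
    if method not in ("sgd", "heavy_ball", "nesterov"):
        raise ValueError(f"unknown method: {method}")
    rng = random.Random(seed)
    x, v, opt = 0.0, 0.0, 0.0  # iterate, velocity, current optimum
    total = 0.0
    for _ in range(steps):
        # Drift: the optimum moves before each update.
        opt += drift * rng.gauss(0.0, 1.0)
        # Nesterov evaluates the gradient at the look-ahead point.
        point = x + beta * v if method == "nesterov" else x
        # Stochastic gradient of the quadratic, with additive noise.
        grad = (point - opt) + noise * rng.gauss(0.0, 1.0)
        if method == "sgd":
            x -= lr * grad
        else:
            # Heavy-ball and Nesterov share the velocity update;
            # they differ only in where the gradient is evaluated.
            v = beta * v - lr * grad
            x += v
        total += (x - opt) ** 2
    return total / steps


if __name__ == "__main__":
    for name in ("sgd", "heavy_ball", "nesterov"):
        print(name, tracking_error(name))
```

Setting `drift=0` or `noise=0` isolates one error source at a time, which loosely mirrors the noise-induced and drift-induced components named in the abstract. The numbers this toy model prints depend on the assumed parameters and should not be read as evidence for or against the paper's bounds.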