Gradient Descent's Last Iterate is Often (slightly) Suboptimal

arXiv:2604.13870v1 Announce Type: cross We consider the well-studied setting of minimizing a convex Lipschitz function using either gradient descent (GD) or its stochastic variant (SGD), and examine last-iterate convergence. It is by now known that standard stepsize choices lead to a last-iterate convergence rate of $\log T/\sqrt{T}$ after $T$ steps. A breakthrough result of Jain recovered the optimal $1/\sqrt{T}$ rate by constructing a non-standard stepsize sequence.
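To make the setting concrete, here is a minimal sketch of subgradient descent with the standard $c/\sqrt{t}$ stepsize, reporting both the last iterate and the uniform average of the iterates. The objective $f(x) = |x|$, the starting point, the constant `c`, and all function names are illustrative assumptions, not taken from the paper. This one-dimensional toy problem is too simple to exhibit the $\log T$ gap; it only shows which two quantities are being compared.

```python
import math

def f(x):
    """Toy convex, 1-Lipschitz objective (an assumption for illustration)."""
    return abs(x)

def subgradient(x):
    """Return one valid subgradient of |x| at x."""
    if x > 0:
        return 1.0
    if x < 0:
        return -1.0
    return 0.0

def subgradient_descent(x0, T, c=1.0):
    """Run T steps of x <- x - (c / sqrt(t)) * g, where g is a subgradient at x.

    Returns (last_iterate, average_iterate), where the average is taken
    uniformly over the starting point and all T updated iterates.
    """
    x = x0
    iterates = [x0]
    for t in range(1, T + 1):
        x = x - (c / math.sqrt(t)) * subgradient(x)
        iterates.append(x)
    return iterates[-1], sum(iterates) / len(iterates)

T = 10_000
last, avg = subgradient_descent(x0=0.5, T=T)
print("f(last iterate):   ", f(last))
print("f(average iterate):", f(avg))
```

Replacing `c / math.sqrt(t)` with a different schedule is the only change needed to experiment with other stepsize sequences; the non-standard sequence mentioned above is not reproduced here.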