What is the long-run distribution of stochastic gradient descent? A large deviations analysis

arXiv:2406.09241v3. In this paper, we examine the long-run distribution of stochastic gradient descent (SGD) in general, non-convex problems. Specifically, we seek to understand which regions of the problem's state space are more likely to be visited by SGD, and by how much.
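To make the question concrete, the following toy simulation may help. It is a minimal sketch and is not taken from the paper: the one-dimensional objective `f`, the additive Gaussian gradient noise, the step size `eta`, the noise level `sigma`, and the helper name `occupation_fractions` are all assumptions chosen for illustration. It runs SGD on a tilted double-well function and records the fraction of iterations spent on each side of x = 0, used here as a rough proxy for the barrier between the two wells.

```python
import random


def grad_f(x):
    # Gradient of the hypothetical objective f(x) = (x^2 - 1)^2 + 0.3 * x,
    # a double well whose minimum near x = -1 is lower than the one near x = +1.
    return 4.0 * x * (x * x - 1.0) + 0.3


def occupation_fractions(steps=200_000, eta=0.05, sigma=3.0, seed=0):
    """Run noisy gradient steps and return the fraction of iterates
    with x < 0 and with x >= 0 (assumed noise model: additive Gaussian)."""
    rng = random.Random(seed)
    x = 1.0  # start in the shallower well on purpose
    left = 0
    for _ in range(steps):
        noisy_grad = grad_f(x) + sigma * rng.gauss(0.0, 1.0)
        x -= eta * noisy_grad
        if x < 0.0:
            left += 1
    return left / steps, (steps - left) / steps


left, right = occupation_fractions()
print(f"fraction of time with x < 0:  {left:.3f}")
print(f"fraction of time with x >= 0: {right:.3f}")
```

Under these assumed settings the iterates hop between the two wells many times, so the two fractions are an empirical stand-in for the long-run distribution restricted to each region. Changing `eta` or `sigma` changes how strongly the deeper well is favored.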