Rescaled Asynchronous SGD: Optimal Distributed Optimization under Data and System Heterogeneity

arXiv:2605.13434v1 Announce Type: new Asynchronous stochastic gradient descent (ASGD) is a standard way to exploit heterogeneous compute resources in distributed learning: instead of forcing fast workers to wait for slow ones, the server updates the model whenever a gradient arrives. Vanilla ASGD applies each arriving gradient with the same weight.
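
The vanilla ASGD update described above can be illustrated with a small event-driven simulation. This is a minimal sketch under stated assumptions, not the paper's method or experimental setup: the function name `vanilla_asgd`, the toy objective f(x) = 0.5 (x - target)^2, the noiseless gradients, and the fixed per-worker compute times are all hypothetical choices made for illustration.

```python
import heapq


def vanilla_asgd(compute_times, num_updates, lr=0.1, x0=0.0, target=3.0):
    """Simulate vanilla ASGD on the toy objective f(x) = 0.5 * (x - target)**2.

    compute_times[w] is the (assumed, fixed) time worker w needs per gradient.
    Returns the final model and the number of updates contributed by each worker.
    """
    x = x0
    # Each event is (finish_time, worker_id, model_snapshot_the_worker_read).
    events = [(t, w, x) for w, t in enumerate(compute_times)]
    heapq.heapify(events)
    updates_per_worker = [0] * len(compute_times)

    for _ in range(num_updates):
        # The server processes gradients in arrival order; nobody waits.
        now, w, stale_x = heapq.heappop(events)
        grad = stale_x - target  # gradient evaluated at the (possibly stale) snapshot
        x -= lr * grad           # same weight for every arriving gradient
        updates_per_worker[w] += 1
        # The worker reads the fresh model and starts its next gradient.
        heapq.heappush(events, (now + compute_times[w], w, x))

    return x, updates_per_worker


if __name__ == "__main__":
    # Worker 0 is four times faster than worker 1.
    x, counts = vanilla_asgd([1.0, 4.0], num_updates=10)
    print(x, counts)
```

In this sketch the fast worker contributes four times as many updates as the slow one, because every arriving gradient is applied with the same step size regardless of which worker produced it.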