CompressedScaffnew: The First Theoretical Double Acceleration of Communication from Local Training and Compression in Distributed Optimization

arXiv:2210.13277v4

In distributed optimization, a large number of machines alternate between local computations and communication with a coordinating server. Communication, which can be slow and costly, is the main bottleneck in this setting. To reduce this burden and therefore accelerate distributed gradient descent, two strategies are popular: 1) communicate less frequently, that is, perform several iterations of local computations between communication rounds; and 2) communicate compressed information instead of full-dimensional vectors.
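
To make the two strategies concrete, here is a minimal sketch in plain Python. It is a naive illustration and not the CompressedScaffnew algorithm: it simply combines local gradient steps with an unbiased rand-k sparsifier and carries none of the paper's guarantees. Everything in it is an assumption chosen for illustration: the toy objective f_i(x) = 0.5 * ||x - b_i||^2, the function names `rand_k`, `local_gd`, and `run`, and the parameters `local_steps`, `k`, and `lr`.

```python
import random


def rand_k(vec, k, rng):
    """Unbiased rand-k sparsifier (illustrative choice of compressor).

    Keeps k coordinates chosen uniformly at random, scales them by d/k,
    and zeroes the rest, so the expectation equals the input vector.
    """
    d = len(vec)
    kept = set(rng.sample(range(d), k))
    return [v * d / k if i in kept else 0.0 for i, v in enumerate(vec)]


def local_gd(x, target, steps, lr):
    """Strategy 1: several local gradient steps without communication.

    Assumed toy objective: f_i(x) = 0.5 * ||x - target||^2,
    whose gradient is x - target.
    """
    x = list(x)
    for _ in range(steps):
        x = [xi - lr * (xi - ti) for xi, ti in zip(x, target)]
    return x


def run(targets, rounds, local_steps, k, lr=0.1, seed=0):
    """Alternate local training with compressed communication.

    Returns the final server model and the number of nonzero floats
    uploaded by all workers (a rough proxy for communication cost).
    """
    rng = random.Random(seed)
    d = len(targets[0])
    x = [0.0] * d
    floats_sent = 0
    for _ in range(rounds):
        updates = []
        for target in targets:  # one target vector per worker
            x_local = local_gd(x, target, local_steps, lr)
            delta = [a - b for a, b in zip(x_local, x)]
            # Strategy 2: upload a sparsified update, not the full vector.
            updates.append(rand_k(delta, k, rng))
            floats_sent += k
        avg = [sum(u[j] for u in updates) / len(updates) for j in range(d)]
        x = [xi + ai for xi, ai in zip(x, avg)]
    return x, floats_sent
```

With `k` equal to the dimension there is no compression, and on this toy problem the server model approaches the average of the workers' targets. With a smaller `k`, each round uploads fewer floats, but the sparsification noise means this naive scheme only reaches a neighborhood of the solution.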