Distributed SGD with flexible gradient compression

TT Phuong - IEEE Access, 2020 - ieeexplore.ieee.org
We design and evaluate a new algorithm called FlexCompressSGD for training deep neural networks over distributed datasets via multiple workers and a central server. In FlexCompressSGD, all gradients transmitted between the workers and the server are compressed, and each worker is allowed to flexibly choose a compression method different from that of the server. This flexibility significantly reduces the communication cost from each worker to the server. We mathematically prove that FlexCompressSGD converges at rate 1/√(MT), where M is the number of distributed workers and T is the number of training iterations. We experimentally demonstrate that FlexCompressSGD obtains competitive top-1 testing accuracy on the ImageNet dataset while reducing the communication cost from each worker to the server by more than 70% compared with the state-of-the-art.
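Below is a minimal, self-contained sketch of the general idea described in the abstract: every gradient exchanged between the workers and the server is compressed, and the server's compressor may differ from the workers'. The specific compressor choices (top-k at the workers, 1-bit sign compression at the server), the function names, and the toy quadratic objectives are illustrative assumptions for this sketch, not the paper's FlexCompressSGD specification.

```python
import numpy as np

# Illustrative compressors; the scheme only requires that workers and server
# may each pick their own compression operator (assumption: these two choices
# are examples, not the ones used in the paper).

def topk_compress(g, k):
    """Keep the k largest-magnitude entries of g, zero out the rest."""
    out = np.zeros_like(g)
    idx = np.argpartition(np.abs(g), -k)[-k:]
    out[idx] = g[idx]
    return out

def sign_compress(g):
    """1-bit sign compression scaled by the mean absolute value."""
    return np.sign(g) * np.mean(np.abs(g))

def flexible_compressed_sgd(grad_fns, x0, lr=0.1, iters=200, k=2):
    """Simulated distributed SGD where every transmitted gradient is compressed.

    grad_fns: one stochastic-gradient callable per worker (its local dataset).
    Workers compress their uplink messages with top-k; the server compresses
    the aggregated direction with a different operator before broadcasting.
    """
    x = x0.copy()
    for _ in range(iters):
        # Each worker compresses its stochastic gradient before sending it.
        uplink = [topk_compress(g(x), k) for g in grad_fns]
        # The server averages the compressed gradients ...
        avg = np.mean(uplink, axis=0)
        # ... and compresses the result with its own (different) operator
        # before broadcasting the update direction back to the workers.
        downlink = sign_compress(avg)
        x -= lr * downlink
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy setup: 4 workers, each with a noisy quadratic objective around a target.
    targets = [rng.normal(size=5) for _ in range(4)]
    grad_fns = [lambda x, t=t: x - t + 0.01 * rng.normal(size=5) for t in targets]
    print(flexible_compressed_sgd(grad_fns, x0=np.zeros(5)))
```

In a real implementation the compressed messages would be encoded sparsely (indices plus values for top-k, a single scale plus a bitmap for sign compression); that encoding, applied on the worker-to-server link, is where the reported communication savings would come from.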