We propose a class of robust stochastic methods for distributed learning from heterogeneous datasets in the presence of an unknown number of Byzantine workers. The key is a regularization term added to the cost function that robustifies the learning task. In contrast to most existing algorithms, the resulting Byzantine-Robust Stochastic Aggregation (RSA) methods do not rely on the assumption that the data are i.i.d. across the workers. We show that the RSA methods converge to a near-optimal solution, with a learning error that depends on the number of Byzantine workers.
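The regularization idea above can be sketched in a toy simulation. The following is a minimal, illustrative sketch (not the talk's actual experiments): it uses an l1 penalty coupling each worker variable to the server variable, so every message a worker sends enters the server update only through a bounded sign() term. The problem setup, step sizes, penalty weight, and Byzantine behavior below are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy heterogeneous setup: each honest worker holds its own regression data.
d, n_workers, n_byz = 5, 10, 2            # assume 2 of the 10 workers are Byzantine
x_true = rng.normal(size=d)
data = []
for i in range(n_workers):
    A = rng.normal(size=(20, d))
    b = A @ x_true + 0.1 * rng.normal(size=20)
    data.append((A, b))

def stoch_grad(i, x):
    """Single-sample stochastic gradient of worker i's least-squares loss."""
    A, b = data[i]
    j = rng.integers(len(b))
    return A[j] * (A[j] @ x - b[j])

lam, alpha, T = 0.2, 0.01, 2000           # illustrative penalty weight and step size
x0 = np.zeros(d)                          # server variable
xs = [np.zeros(d) for _ in range(n_workers)]  # worker variables

for t in range(T):
    msgs = []
    for i in range(n_workers):
        if i < n_byz:
            msgs.append(rng.normal(scale=10.0, size=d))  # Byzantine: arbitrary junk
        else:
            xs[i] -= alpha * (stoch_grad(i, xs[i]) + lam * np.sign(xs[i] - x0))
            msgs.append(xs[i])
    # The l1 penalty turns each received message into a bounded sign() term,
    # so any single Byzantine worker shifts the server update by at most alpha*lam
    # per coordinate, regardless of what it sends.
    x0 -= alpha * lam * np.sum(np.sign(x0 - np.array(msgs)), axis=0)

print(np.linalg.norm(x0 - x_true))
```

Note the contrast with plain gradient averaging, where a single worker sending an unbounded vector can move the server arbitrarily far; here the server lands near the true model despite the two malicious workers, with a residual error reflecting their number, as the abstract describes.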
Joint work with Haoran Sun.
We consider a distributed non-convex optimization problem in which a number of agents collectively minimize a sum of smooth local objective functions. We address two questions: what is the fastest rate that any distributed algorithm can achieve, and how can that rate be attained? We develop a lower-bound analysis and show that, in the worst case, any first-order algorithm requires $O(1/(\epsilon\sqrt{k}))$ iterations to reach an $\epsilon$-solution, where $k$ is the spectral gap of the network. We also develop a rate-optimal distributed method whose rate matches this lower bound. The algorithm combines ideas from distributed consensus with classical fast solvers for linear systems.
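The network quantity entering the rate can be made concrete. As a standalone illustration (not the talk's algorithm), the sketch below computes the normalized spectral gap of a communication graph from its Laplacian; a poorly connected topology such as a path has a much smaller gap than a complete graph, which through the $1/\sqrt{k}$ factor translates into slower worst-case convergence.

```python
import numpy as np

def laplacian(edges, n):
    """Graph Laplacian L = D - A for an undirected graph on n nodes."""
    L = np.zeros((n, n))
    for i, j in edges:
        L[i, i] += 1; L[j, j] += 1
        L[i, j] -= 1; L[j, i] -= 1
    return L

def spectral_gap(L):
    """Ratio of second-smallest to largest Laplacian eigenvalue, in (0, 1]."""
    ev = np.linalg.eigvalsh(L)
    return ev[1] / ev[-1]

n = 20
path = [(i, i + 1) for i in range(n - 1)]                    # chain of 20 agents
complete = [(i, j) for i in range(n) for j in range(i + 1, n)]  # all-to-all

k_path = spectral_gap(laplacian(path, n))
k_complete = spectral_gap(laplacian(complete, n))
print(k_path, k_complete)   # the path graph has a far smaller gap
```

For the complete graph the normalized gap is exactly 1, while for the 20-node path it is below 0.01, so the worst-case iteration bound differs by more than an order of magnitude between the two topologies.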
In Lazily Aggregated Gradient (LAG), the nodes of a distributed or decentralized algorithm can choose to reuse outdated gradient information and thereby reduce communication; this is also called gradient censoring. Surprisingly, the original convergence rates remain the same in the strongly convex, convex, and nonconvex cases, even though communication is reduced. When the local Lipschitz constants are heterogeneous, communication is reduced significantly. Numerical experiments with synthetic and real data demonstrate this significant communication reduction. This talk combines several recent works.
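The censoring mechanism can be sketched as follows. This is a simplified variant of the lazy rule, with illustrative constants (window length D, trigger parameter xi) and a toy quadratic problem: each worker uploads a fresh gradient only when it has changed noticeably relative to how much the iterate has been moving lately, and the server aggregates the (possibly stale) gradients it holds.

```python
from collections import deque
import numpy as np

rng = np.random.default_rng(1)

# Toy problem: minimize sum_m f_m(x) with heterogeneous smoothness constants.
d, M, D = 4, 5, 10
Ls = (1.0, 2.0, 4.0, 8.0, 16.0)                  # very different local curvatures
Hs = [np.diag(rng.uniform(0.1 * L, L, size=d)) for L in Ls]
cs = [rng.normal(size=d) for _ in range(M)]

def grad(m, x):
    return Hs[m] @ x - cs[m]

alpha, xi = 0.05, 0.01                           # step size; censoring aggressiveness
x = np.ones(d)
last_grad = [grad(m, x) for m in range(M)]       # gradients the server currently holds
moves = deque(maxlen=D)                          # squared norms of the last D iterate moves
comms = M                                        # count the initial uploads

for k in range(500):
    budget = xi / (alpha**2 * D) * sum(moves) if moves else 0.0
    for m in range(M):
        g = grad(m, x)
        # Censoring rule (simplified): upload only if the gradient has changed
        # noticeably compared with the recent progress of the iterate.
        if np.linalg.norm(g - last_grad[m])**2 > budget:
            last_grad[m] = g
            comms += 1
    x_new = x - alpha * sum(last_grad)           # server uses possibly stale gradients
    moves.append(np.linalg.norm(x_new - x)**2)
    x = x_new

print(comms, "uploads instead of", M * 500)
```

In this sketch the smoothest workers skip most uploads, since their gradients barely change between iterations, while the iterate still converges to the minimizer of the sum; this mirrors the abstract's point that heterogeneous Lipschitz constants are where censoring pays off most.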