Thu.1 09:00–10:15 | H 0105 | BIG

Distributed and Decentralized Optimization: Learning, Lower-Complexity Bounds, and Communication-Efficiency

Chair: Kaizhao Sun | Organizers: Andy Sun, Wotao Yin
9:00

Qing Ling

RSA: Byzantine-Robust Stochastic Aggregation Methods for Distributed Learning from Heterogeneous Datasets

We propose a class of robust stochastic methods for distributed learning from heterogeneous datasets in the presence of an unknown number of Byzantine workers. The key is a regularization term incorporated into the cost function to robustify the learning task. In contrast to most existing algorithms, the resulting Byzantine-Robust Stochastic Aggregation (RSA) methods do not rely on the assumption that the data are i.i.d. across the workers. We show that the RSA methods converge to a near-optimal solution, with a learning error that depends on the number of Byzantine workers.
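To illustrate the idea of a consensus-regularized update, the following is a minimal sketch, assuming an l1-norm penalty that pulls each worker model toward the server model; the variable names and the synchronous structure are illustrative, not the authors' implementation.

```python
import numpy as np

def rsa_step(x0, xs, grad_f0, grads, alpha, lam):
    """One synchronous step of an RSA-style update with an l1 consensus
    regularizer (a sketch of the idea only).

    x0      : server model, shape (d,)
    xs      : list of worker models, each shape (d,)
    grad_f0 : gradient of the server-side regularization term at x0
    grads   : list of stochastic gradients of the local losses at each x_i
    alpha   : step size
    lam     : robustness parameter lambda
    """
    # Workers: descend the local loss plus the l1 penalty pulling x_i toward x0.
    new_xs = [x - alpha * (g + lam * np.sign(x - x0)) for x, g in zip(xs, grads)]
    # Server: each worker contributes only a bounded sign vector, which is what
    # limits the damage any single Byzantine worker can do to the aggregate.
    x0_new = x0 - alpha * (grad_f0 + lam * sum(np.sign(x0 - x) for x in xs))
    return x0_new, new_xs
```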

9:25

Mingyi Hong

joint work with Haoran Sun

Distributed Non-Convex First-Order Optimization and Information Processing: Lower Complexity Bounds and Rate Optimal Algorithms

We consider a distributed non-convex optimization problem in which a number of agents collectively optimize a sum of smooth local objective functions. We address the question: what is the fastest rate that any distributed algorithm can achieve, and how can those rates be attained? We develop a lower-bound analysis and show that, in the worst case, any first-order algorithm needs $O(1/(\epsilon\sqrt{k}))$ iterations to reach an $\epsilon$-solution, where $k$ is the spectral gap of the network. We also develop a rate-optimal distributed method whose rate matches the lower bound. The algorithm combines ideas from distributed consensus with classical fast solvers for linear systems.
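The following is only a structural sketch of the pattern described here (local first-order steps interleaved with several communication rounds over a mixing matrix); the rate-optimal method in the talk replaces the plain mixing loop with an accelerated, Chebyshev-type solve, and all names and parameters below are assumptions for illustration.

```python
import numpy as np

def decentralized_step(X, grads, W, alpha, consensus_rounds=3):
    """One generic iteration: local gradient steps followed by several
    consensus (mixing) rounds. Structural sketch only.

    X     : (n_agents, d) matrix of local iterates
    grads : (n_agents, d) matrix of local gradients evaluated at X
    W     : (n_agents, n_agents) doubly stochastic mixing matrix of the network
    alpha : step size
    """
    Y = X - alpha * grads      # local first-order step at every agent
    for _ in range(consensus_rounds):
        Y = W @ Y              # one communication round per multiplication
    return Y
```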

9:50

Wotao Yin

LAG: Lazily Aggregated Gradient is Communication-Efficient in Distributed and Decentralized Optimization

In Lazily Aggregated Gradient (LAG), the nodes in a distributed or decentralized algorithm can choose to reuse outdated gradient information and thereby reduce communication. This is also called gradient censoring. Surprisingly, the original convergence rates are preserved in the strongly convex, convex, and nonconvex cases while communication is reduced. When the local Lipschitz constants are heterogeneous, communication is reduced significantly. Numerical experiments with synthetic and real data will demonstrate this significant communication reduction. This talk draws on several recent works.
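A minimal sketch of the censoring rule, under assumed notation: a worker uploads its fresh gradient only when it differs enough from the last gradient the server has on record; in the LAG papers the threshold is tied to how much the server iterate has moved recently, whereas here it is left as a free parameter.

```python
import numpy as np

def maybe_upload(grad_new, grad_last_sent, threshold):
    """LAG-style gradient censoring (illustrative sketch).

    grad_new       : the worker's freshly computed local gradient
    grad_last_sent : the last gradient this worker uploaded to the server
    threshold      : censoring threshold (assumed fixed here for simplicity)
    """
    if np.linalg.norm(grad_new - grad_last_sent) ** 2 > threshold:
        return grad_new, True     # communicate: server receives the fresh gradient
    return grad_last_sent, False  # censor: server reuses the outdated gradient
```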