We consider decentralized stochastic optimization, as in the training of machine learning and deep learning models, where the training data remains separated across many user devices. We propose a new communication-efficient decentralized algorithm based on gradient compression (sparsification and quantization) for SGD, as in https://arxiv.org/abs/1902.00340, while also providing faster consensus algorithms under communication compression. Finally, we discuss flexible decentralized learning of generalized linear models as in https://arxiv.org/abs/1808.0488
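As a rough illustration of the compression step such schemes rely on (not the paper's implementation), the sketch below shows two common gradient-compression operators, top-k sparsification and unbiased stochastic quantization; the function names and defaults are placeholders of my own choosing.

```python
# Illustrative sketch only: two standard gradient-compression operators of the
# kind used in communication-efficient decentralized SGD. Not the paper's code.
import numpy as np

def top_k(grad, k):
    """Keep the k largest-magnitude entries of a flat gradient, zero the rest."""
    out = np.zeros_like(grad)
    idx = np.argpartition(np.abs(grad), -k)[-k:]
    out[idx] = grad[idx]
    return out

def stochastic_quantize(grad, levels=16, rng=None):
    """Unbiased stochastic quantization onto `levels` uniform levels (QSGD-style)."""
    rng = rng or np.random.default_rng()
    norm = np.linalg.norm(grad)
    if norm == 0.0:
        return grad.copy()
    scaled = np.abs(grad) / norm * (levels - 1)
    lower = np.floor(scaled)
    prob = scaled - lower                       # round up with this probability
    rounded = lower + (rng.random(grad.shape) < prob)
    return np.sign(grad) * rounded * norm / (levels - 1)

# Example: each node would compress its update before exchanging it with neighbors.
g = np.random.default_rng(0).normal(size=100)
compressed = top_k(stochastic_quantize(g), k=10)
```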
Wouldn't it be great if there were an automatic way of setting the optimal mini-batch size? In this talk I will present a rather general approach for choosing the optimal mini-batch size based on how smooth the mini-batch function is. For this I will introduce the notion of expected smoothness, and show how we use this notion to choose the mini-batch size and stepsize in SVRG, SGD, and SAG. I will most likely focus on SVRG.
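To make the setting concrete, here is a minimal mini-batch SVRG sketch, assuming the expected-smoothness constant L_b (and the batch size b derived from it) are supplied by the caller; the step-size rule 1/(2*L_b) and the inner-loop length are illustrative assumptions, not the talk's actual formulas.

```python
# Minimal mini-batch SVRG sketch (illustration only). How L_b and the optimal
# batch size b are computed is the subject of the talk and is not reproduced here.
import numpy as np

def minibatch_svrg(grad_avg, w0, n, b, L_b, outer_iters=10, inner_iters=None,
                   rng=np.random.default_rng(0)):
    """grad_avg(w, idx) returns the average gradient over the samples in `idx`."""
    w = w0.copy()
    m = inner_iters or n // b          # inner-loop length (a common default)
    step = 1.0 / (2.0 * L_b)           # assumed step-size rule, for illustration
    for _ in range(outer_iters):
        w_snap = w.copy()
        full_grad = grad_avg(w_snap, np.arange(n))   # full gradient at the snapshot
        for _ in range(m):
            batch = rng.choice(n, size=b, replace=False)
            g = grad_avg(w, batch) - grad_avg(w_snap, batch) + full_grad
            w = w - step * g
    return w
```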
Joint work with Olivier Fercoq, Ion Necoara, and Volkan Cevher.
We propose a stochastic gradient framework for solving stochastic composite convex optimization problems with a (possibly) infinite number of linear inclusion constraints that need to be satisfied almost surely. We use smoothing and homotopy techniques to handle the constraints without the need for matrix-valued projections. We show that our stochastic gradient algorithm achieves rates that are optimal up to logarithmic factors, matching those of the unconstrained setting, for both general convex and restricted strongly convex problems. We demonstrate the performance of our algorithm with numerical experiments.
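As a toy sketch of the general idea (not the proposed algorithm), the snippet below handles a sampled half-space constraint a^T x <= b, required almost surely, through a smoothed squared-distance penalty whose smoothing parameter is decreased along a homotopy schedule, so no matrix-valued projection is needed; the step-size and homotopy schedules are illustrative assumptions.

```python
# Toy sketch: SGD on f(x) plus a smoothed penalty dist(a^T x - b, R_-)^2 / (2*beta_t),
# with the smoothing parameter beta_t driven to zero along a homotopy schedule.
import numpy as np

def smoothed_homotopy_sgd(grad_f, sample_constraint, x0, iters=1000,
                          step0=1.0, beta0=1.0, rng=np.random.default_rng(0)):
    x = x0.copy()
    for t in range(1, iters + 1):
        step = step0 / np.sqrt(t)        # illustrative diminishing step size
        beta = beta0 / np.sqrt(t)        # illustrative homotopy schedule for smoothing
        a, b = sample_constraint(rng)    # draw one linear constraint a^T x <= b
        violation = max(a @ x - b, 0.0)  # distance to the half-space {a^T x <= b}
        g = grad_f(x, rng) + (violation / beta) * a   # gradient of f + smoothed penalty
        x = x - step * g
    return x
```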