We consider decentralized stochastic optimization, as in the training of machine learning and deep learning models, where the training data remains separated across many user devices. We propose a new communication-efficient decentralized algorithm based on gradient compression (sparsification and quantization) for SGD, as in https://arxiv.org/abs/1902.00340, while also providing faster consensus algorithms under communication compression. Finally, we discuss flexible decentralized learning of generalized linear models as in https://arxiv.org/abs/1808.0488
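As a rough illustration of the compression step such schemes rely on (not the paper's implementation), the sketch below shows two common gradient-compression operators, top-k sparsification and unbiased stochastic quantization; the function names and defaults are placeholders of my own choosing.

```python
# Illustrative sketch only: two standard gradient-compression operators of the
# kind used in communication-efficient decentralized SGD. Not the paper's code.
import numpy as np

def top_k(grad, k):
    """Keep the k largest-magnitude entries of a flat gradient, zero the rest."""
    out = np.zeros_like(grad)
    idx = np.argpartition(np.abs(grad), -k)[-k:]
    out[idx] = grad[idx]
    return out

def stochastic_quantize(grad, levels=16, rng=None):
    """Unbiased stochastic quantization onto `levels` uniform levels (QSGD-style)."""
    rng = rng or np.random.default_rng()
    norm = np.linalg.norm(grad)
    if norm == 0.0:
        return grad.copy()
    scaled = np.abs(grad) / norm * (levels - 1)
    lower = np.floor(scaled)
    prob = scaled - lower                       # round up with this probability
    rounded = lower + (rng.random(grad.shape) < prob)
    return np.sign(grad) * rounded * norm / (levels - 1)

# Example: each node would compress its update before exchanging it with neighbors.
g = np.random.default_rng(0).normal(size=100)
compressed = top_k(stochastic_quantize(g), k=10)
```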
Wouldn't it be great if there were an automatic way of setting the optimal mini-batch size? In this talk I will present a rather general approach for choosing the optimal mini-batch size based on how smooth the mini-batch function is. For this I will introduce the notion of expected smoothness, and show how we use this notion to choose the mini-batch size and stepsize in SVRG, SGD, and SAG. I will most likely focus on SVRG.
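To make the setting concrete, here is a minimal mini-batch SVRG sketch, assuming the expected-smoothness constant L_b (and the batch size b derived from it) are supplied by the caller; the step-size rule 1/(2*L_b) and the inner-loop length are illustrative assumptions, not the talk's actual formulas.

```python
# Minimal mini-batch SVRG sketch (illustration only). How L_b and the optimal
# batch size b are computed is the subject of the talk and is not reproduced here.
import numpy as np

def minibatch_svrg(grad_avg, w0, n, b, L_b, outer_iters=10, inner_iters=None,
                   rng=np.random.default_rng(0)):
    """grad_avg(w, idx) returns the average gradient over the samples in `idx`."""
    w = w0.copy()
    m = inner_iters or n // b          # inner-loop length (a common default)
    step = 1.0 / (2.0 * L_b)           # assumed step-size rule, for illustration
    for _ in range(outer_iters):
        w_snap = w.copy()
        full_grad = grad_avg(w_snap, np.arange(n))   # full gradient at the snapshot
        for _ in range(m):
            batch = rng.choice(n, size=b, replace=False)
            g = grad_avg(w, batch) - grad_avg(w_snap, batch) + full_grad
            w = w - step * g
    return w
```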
Joint work with Olivier Fercoq, Ion Necoara, and Volkan Cevher.
We propose a stochastic gradient framework for solving stochastic composite convex optimization problems with a (possibly) infinite number of linear inclusion constraints that need to be satisfied almost surely. We use smoothing and homotopy techniques to handle the constraints without the need for matrix-valued projections. We show that our stochastic gradient algorithm achieves rates that are optimal up to logarithmic factors, matching those of the unconstrained setting, for both general convex and restricted strongly convex problems. We demonstrate the performance of our algorithm with numerical experiments.
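As a toy sketch of the general idea (not the proposed algorithm), the snippet below handles a sampled half-space constraint a^T x <= b, required almost surely, through a smoothed squared-distance penalty whose smoothing parameter is decreased along a homotopy schedule, so no matrix-valued projection is needed; the step-size and homotopy schedules are illustrative assumptions.

```python
# Toy sketch: SGD on f(x) plus a smoothed penalty dist(a^T x - b, R_-)^2 / (2*beta_t),
# with the smoothing parameter beta_t driven to zero along a homotopy schedule.
import numpy as np

def smoothed_homotopy_sgd(grad_f, sample_constraint, x0, iters=1000,
                          step0=1.0, beta0=1.0, rng=np.random.default_rng(0)):
    x = x0.copy()
    for t in range(1, iters + 1):
        step = step0 / np.sqrt(t)        # illustrative diminishing step size
        beta = beta0 / np.sqrt(t)        # illustrative homotopy schedule for smoothing
        a, b = sample_constraint(rng)    # draw one linear constraint a^T x <= b
        violation = max(a @ x - b, 0.0)  # distance to the half-space {a^T x <= b}
        g = grad_f(x, rng) + (violation / beta) * a   # gradient of f + smoothed penalty
        x = x - step * g
    return x
```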