Tue.3 14:45–16:00 | H 2033 | STO

Large-Scale Stochastic First-Order Optimization (2/2)

Chair: Samuel Horvath
Organizers: Samuel Horvath, Filip Hanzely
14:45

Nidham Gazagnadou

joint work with Robert Gower, Joseph Salmon

Optimal mini-batch and step sizes for SAGA

Recently, it has been shown that the step sizes of a family of stochastic variance-reduced gradient methods, called the JacSketch methods, depend on the expected smoothness constant. We provide closed-form expressions for this constant, leading to larger step sizes and thus faster convergence, along with numerical experiments verifying these bounds. We suggest a new practice for setting the mini-batch and step sizes of SAGA, which is part of the JacSketch family. Furthermore, we can now show that the total complexity of the SAGA algorithm decreases linearly in the mini-batch size up to an optimal value.
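
As a point of reference, the following is a minimal sketch of mini-batch SAGA for L2-regularized least squares. The step size shown is only the classical conservative choice; the closed-form expressions based on the expected smoothness constant derived in the talk are not reproduced here, and all names and defaults are illustrative assumptions.

```python
# Minimal sketch of mini-batch SAGA on f(x) = 1/(2n)||Ax - b||^2 + lam/2 ||x||^2.
# The default step size is the classical conservative 1/(3 L_max) rule; the talk
# derives larger, provably safe step sizes from the expected smoothness constant.
import numpy as np

def minibatch_saga(A, b, lam=0.1, batch_size=8, step_size=None, n_iter=2000, seed=0):
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)
    # Table of the most recent per-example gradients (the "Jacobian" estimate).
    J = np.zeros((n, d))
    J_mean = J.mean(axis=0)
    if step_size is None:
        # Placeholder choice only: 1/(3 L_max) with L_max the largest per-example
        # smoothness constant of the regularized losses.
        L_max = np.max(np.sum(A**2, axis=1)) + lam
        step_size = 1.0 / (3.0 * L_max)
    for _ in range(n_iter):
        B = rng.choice(n, size=batch_size, replace=False)
        # Fresh per-example gradients on the sampled mini-batch.
        grads_B = (A[B] @ x - b[B])[:, None] * A[B] + lam * x
        # SAGA gradient estimate: fresh batch gradient minus stored batch
        # gradient, plus the running average of all stored gradients.
        g = grads_B.mean(axis=0) - J[B].mean(axis=0) + J_mean
        # Update the stored gradients and their running mean.
        J_mean += (grads_B.sum(axis=0) - J[B].sum(axis=0)) / n
        J[B] = grads_B
        x -= step_size * g
    return x
```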

15:10

[canceled] Hamed Sadeghi

joint work with Pontus Giselsson

Acceleration of reduced variance stochastic gradient method

We propose a convergence acceleration scheme for optimization problems in stochastic settings. We use Anderson acceleration in combination with a variance-reduced stochastic gradient method and show that the iterates are stochastically quasi-Fejér monotone. This allows us to prove almost sure convergence of the method. We apply the acceleration technique to the variance-reduced stochastic gradient algorithms SVRG and SAGA to demonstrate its performance gains.
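
For context, below is a minimal sketch of standard Anderson acceleration applied to the gradient-descent fixed-point map g(x) = x - gamma * grad(x). In the talk the full gradient is replaced by a variance-reduced stochastic estimator (SVRG or SAGA), and the safeguards behind the stochastic quasi-Fejér analysis are not shown; the memory size and other parameters are illustrative assumptions.

```python
# Minimal sketch of Anderson acceleration on the fixed-point map of gradient
# descent. Replacing grad() by a variance-reduced stochastic estimator, and the
# safeguards needed for the stochastic analysis, are omitted here.
import numpy as np

def anderson_accelerated_gd(grad, x0, gamma, memory=5, n_iter=200):
    g = lambda x: x - gamma * grad(x)          # fixed-point map of gradient descent
    x = x0.copy()
    X_hist, R_hist = [], []                    # past map outputs and residuals
    for _ in range(n_iter):
        gx = g(x)
        r = gx - x                             # fixed-point residual
        X_hist.append(gx)
        R_hist.append(r)
        if len(R_hist) > memory:
            X_hist.pop(0)
            R_hist.pop(0)
        m = len(R_hist)
        if m == 1:
            x = gx                             # plain fixed-point step
            continue
        R = np.stack(R_hist, axis=1)           # d x m matrix of residuals
        # Solve min_alpha ||R alpha|| s.t. sum(alpha) = 1 via the standard
        # unconstrained reformulation with residual differences.
        dR = R[:, 1:] - R[:, :-1]
        coef, *_ = np.linalg.lstsq(dR, R[:, -1], rcond=None)
        alpha = np.zeros(m)
        alpha[0] = coef[0]
        alpha[1:-1] = coef[1:] - coef[:-1]
        alpha[-1] = 1.0 - coef[-1]
        x = np.stack(X_hist, axis=1) @ alpha   # extrapolated iterate
    return x
```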

15:35

Samuel Horvath

Stochastic Distributed Learning with Gradient Quantization and Variance Reduction

We consider distributed optimization where the objective function is spread among different devices, each sending incremental model updates to a central server. To alleviate the communication bottleneck, recent work has proposed various schemes to compress (e.g. quantize or sparsify) the gradients, thereby introducing additional variance that might slow down convergence. We provide a unified analysis of these approaches in strongly convex, weakly convex, and non-convex settings, and give the first methods that achieve linear convergence for arbitrary quantized updates.
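
To make the communication model concrete, here is a minimal sketch of one aggregation round with an unbiased random-dithering quantizer, where each worker compresses only the difference between its gradient and a locally maintained shift (a DIANA-style control variate). This is an illustrative construction with assumed names and parameters, not a reproduction of the talk's algorithms or their convergence guarantees.

```python
# Minimal sketch: unbiased random-dithering quantization of worker updates,
# with each worker compressing only the difference to a local shift.
import numpy as np

def random_dithering(v, levels=4, rng=None):
    """Unbiased stochastic quantizer: keeps the norm, randomizes the levels."""
    rng = np.random.default_rng() if rng is None else rng
    norm = np.linalg.norm(v)
    if norm == 0.0:
        return v
    scaled = np.abs(v) / norm * levels
    lower = np.floor(scaled)
    # Round up with probability equal to the fractional part (unbiasedness).
    q = lower + (rng.random(v.shape) < scaled - lower)
    return norm * np.sign(v) * q / levels

def distributed_step(x, worker_grads, shifts, step_size=0.1, shift_lr=0.1, rng=None):
    """One round: workers send quantized (gradient - shift), server averages."""
    compressed = []
    for g, h in zip(worker_grads, shifts):
        delta = random_dithering(g - h, rng=rng)   # only the difference is compressed
        compressed.append(h + delta)               # server's estimate of the gradient
        h += shift_lr * delta                      # worker updates its shift in place
    g_hat = np.mean(compressed, axis=0)
    return x - step_size * g_hat
```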