Wed.2 14:15–15:30 | H 0104 | BIG

Recent Advancements in Optimization Methods for Machine Learning (3/4)

Chair: Martin Takáč | Organizers: Albert Berahas, Martin Takáč
14:15

Peter Richtárik

joint work with Konstantin Mishchenko, Filip Hanzely

SEGA: Variance Reduction via Gradient Sketching

We propose a randomized first-order optimization method, SEGA (SkEtched GrAdient method), which progressively builds, throughout its iterations, a variance-reduced estimate of the gradient from random linear measurements (sketches) of the gradient obtained from an oracle. In each iteration, SEGA updates its current gradient estimate via a sketch-and-project operation using the information provided by the latest sketch, and then uses this estimate to compute an unbiased estimate of the true gradient through a random relaxation procedure.
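
For intuition, here is a minimal sketch of the coordinate-sketch special case of this scheme, in which each sketch reveals a single partial derivative; the function names, step size, and quadratic test problem are illustrative assumptions, not details taken from the paper.

import numpy as np

def sega_coordinate(grad_i, x0, n, alpha=0.01, iters=5000, rng=None):
    """Minimal SEGA sketch with single-coordinate sketches.

    grad_i(x, i) returns the i-th partial derivative of f at x,
    i.e. one random linear measurement of the gradient from the oracle.
    """
    rng = np.random.default_rng(rng)
    x = x0.copy()
    h = np.zeros(n)                  # running variance-reduced gradient estimate
    for _ in range(iters):
        i = rng.integers(n)          # random coordinate sketch
        delta = grad_i(x, i) - h[i]
        g = h.copy()
        g[i] += n * delta            # unbiased estimate: E[g] = grad f(x)
        h[i] += delta                # sketch-and-project update of h
        x -= alpha * g               # gradient step (prox step omitted here)
    return x

# Illustrative usage on a quadratic f(x) = 0.5 * ||A x - b||^2
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, -1.0])
grad_i = lambda x, i: A[:, i] @ (A @ x - b)
x_approx = sega_coordinate(grad_i, np.zeros(2), n=2, alpha=0.02)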

14:40

[canceled] Aaron Defazio

Optimization, Initialization and Preconditioning of Modern ReLU Networks

In this talk I will discuss a number of issues encountered when optimizing modern ReLU-based deep neural networks such as the ResNet-50 model, and suggest some solutions. Topics include:
- Improving the conditioning of the Hessian using a careful initialization scheme (see the sketch below).
- Guiding the design of networks by following a scaling rule that improves the conditioning of the Hessian, resulting in less guesswork and faster optimization.
- Avoiding instability at the beginning of learning by using balanced ReLUs.
- Variance reduction for deep learning.
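
The abstract does not spell out the initialization scheme; purely as context for the first topic, the following is the standard He (fan-in) scaling commonly used to keep pre-activation variance stable across ReLU layers, not necessarily the scheme proposed in the talk.

import numpy as np

def he_init(fan_in, fan_out, rng=None):
    # Fan-in scaling for ReLU layers: the factor 2/fan_in compensates for
    # ReLU zeroing out roughly half of the inputs, keeping pre-activation
    # variance roughly constant with depth. Generic baseline, not the talk's scheme.
    rng = np.random.default_rng(rng)
    std = np.sqrt(2.0 / fan_in)
    return rng.normal(0.0, std, size=(fan_out, fan_in))

# Illustrative usage for a small two-layer network
W1 = he_init(784, 256)
W2 = he_init(256, 10)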

15:05

Aurelien Lucchi

Continuous-time Models for Accelerated Stochastic Optimization Algorithms

We focus our discussion on accelerated methods for non-convex and stochastic optimization problems. The choice of how information is memorized to build the momentum has a significant effect on the convergence properties of an accelerated method. We first derive a general continuous-time model that can incorporate arbitrary types of memory. We then demonstrate how to discretize such a process while matching the convergence rate of the continuous-time model. We will also discuss modern optimization techniques used in machine learning, such as Adam and RMSprop.
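
As a simple illustration of the continuous-time viewpoint (not the specific memory model derived in the talk), classical heavy-ball momentum can be read as a semi-implicit Euler discretization of a damped second-order ODE; the damping coefficient, step size, and test problem below are arbitrary assumptions for the sketch.

import numpy as np

# Heavy-ball momentum viewed as a discretization of the ODE
#   x''(t) + a * x'(t) + grad_f(x(t)) = 0,
# using a semi-implicit Euler step of size h.
def momentum_from_ode(grad_f, x0, a=3.0, h=0.05, iters=2000):
    x, v = x0.copy(), np.zeros_like(x0)
    for _ in range(iters):
        v = v - h * (a * v + grad_f(x))   # velocity update from damping and force terms
        x = x + h * v                     # advance the position with the new velocity
    return x

# Illustrative usage on a quadratic f(x) = 0.5 * x^T Q x
Q = np.diag([1.0, 10.0])
x_approx = momentum_from_ode(lambda x: Q @ x, np.array([5.0, 5.0]))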