Thu.2 10:45–12:00 | H 3007 | BIG

Big Data and Machine Learning – Contributed Session 2/3

Chair: Rachael Tappenden
10:45

Lei Zhao

joint work with Daoli Zhu

Linear Convergence of Variable Bregman Stochastic Coordinate Descent Method for Nonsmooth Nonconvex Optimization by Level-set Variational Analysis

In this paper, we develop a new variational approach on level sets aimed at establishing linear convergence of a variable Bregman stochastic coordinate descent (VBSCD) method for a broad class of nonsmooth and nonconvex optimization problems. Along the way, we not only prove convergence of VBSCD, namely that any accumulation point of the sequence it generates is a critical point, but also show that $\{\mathbb{E}_{\xi_{k-1}}F(x^k)\}$ converges $Q$-linearly and that $\{\mathbb{E}_{\xi_{k-1}}x^k\}$ converges $R$-linearly almost surely.
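
For orientation only, here is a minimal Python sketch of the generic stochastic coordinate descent template that methods of this kind build on. It is not the authors' VBSCD: the Bregman kernel is taken to be the Euclidean one, so the Bregman proximal step reduces to a plain coordinate gradient step, and no variable-kernel or nonsmooth machinery is modelled.

import numpy as np

# Illustrative sketch, not the authors' VBSCD: stochastic coordinate descent
# with the Euclidean Bregman kernel h(x) = 0.5*||x||^2, for which the Bregman
# proximal update is an ordinary coordinate gradient step.

def stochastic_coordinate_descent(grad, x0, step=0.1, iters=1000, rng=None):
    """Minimise a smooth f, given its gradient, one random coordinate at a time."""
    rng = rng or np.random.default_rng(0)
    x = x0.astype(float)
    for _ in range(iters):
        i = rng.integers(x.size)        # sample a coordinate uniformly at random
        g = grad(x)[i]                  # partial derivative along coordinate i
        x[i] -= step * g                # coordinate step (Euclidean Bregman kernel)
    return x

# Toy example: f(x) = 0.5*x^T A x - b^T x with a small SPD matrix A.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
x_hat = stochastic_coordinate_descent(lambda x: A @ x - b, np.zeros(2))
print(x_hat, np.linalg.solve(A, b))     # the two vectors should nearly agree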

11:10

Sarit Khirirat

joint work with Sindri Magnússon, Mikael Johansson

Compressed Gradient Methods with Memory-Based Error Compensation

The sheer scale of modern data necessitates information compression in distributed optimization for machine learning. Gradient compression schemes using memory-based error compensation have displayed superior performance in practice. We show how these approaches can be improved by using Hessian information in the compression. Explicit expressions for the accuracy gains on strongly convex and non-convex optimization problems are derived. The superiority of our approach in terms of convergence speed and solution accuracy is illustrated in numerical examples with 1-bit compression.
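
As a rough illustration of memory-based error compensation (error feedback), here is a hypothetical Python sketch of gradient descent with 1-bit (sign) compression and a memory term that re-injects the compression error at the next iteration. It is not the authors' scheme and uses no Hessian information.

import numpy as np

def sign_compress(v):
    # 1-bit compression: keep only the signs, rescaled by the mean magnitude.
    return np.sign(v) * np.mean(np.abs(v))

def compressed_gd_with_memory(grad, x0, step=0.05, iters=500):
    # Gradient descent where each gradient is compressed to 1 bit per entry;
    # the compression error is stored in a memory term and added back before
    # the next compression (error compensation).
    x = x0.astype(float)
    memory = np.zeros_like(x)
    for _ in range(iters):
        g = grad(x)
        corrected = g + memory          # compensate with the accumulated error
        c = sign_compress(corrected)    # only this compressed vector is "sent"
        memory = corrected - c          # remember what the compressor discarded
        x -= step * c
    return x

# Toy quadratic: f(x) = 0.5*x^T A x - b^T x.
A = np.diag([10.0, 1.0])
b = np.array([1.0, -2.0])
print(compressed_gd_with_memory(lambda x: A @ x - b, np.zeros(2)))
print(np.linalg.solve(A, b))            # reference solution for comparison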

11:35

Rachael Tappenden

joint work with Chenxin Ma, Naga Venkata C. Gudapati, Majid Jahani, Martin Takáč

Underestimate Sequences and Quadratic Averaging

This talk discusses several first-order methods for minimizing strongly convex functions in both the smooth and composite cases. The algorithms, based on efficiently updating lower bounds on the objective function, have natural stopping conditions that provide the user with a certificate of optimality. Convergence of all algorithms is guaranteed via a new Underestimate Sequence framework, and the algorithms converge linearly, with the accelerated variants converging at the optimal linear rate.
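
As a simple illustration of the lower-bound/certificate idea (not the Underestimate Sequence framework itself), the following Python sketch runs plain gradient descent on a mu-strongly convex function while tracking the global lower bound on the optimal value implied by strong convexity, and stops once the certified gap falls below a tolerance.

import numpy as np

def gd_with_certificate(f, grad, x0, mu, step, tol=1e-8, max_iter=10_000):
    # Strong convexity gives, at any point x,
    #   f(y) >= f(x) + <grad f(x), y - x> + (mu/2)*||y - x||^2   for all y,
    # and minimising the right-hand side over y yields the global lower bound
    #   f(x) - ||grad f(x)||^2 / (2*mu) <= f*.
    # The gap between f(x_k) and the best lower bound seen so far is a
    # certificate of suboptimality and doubles as the stopping condition.
    x = x0.astype(float)
    best_lower = -np.inf
    for _ in range(max_iter):
        g = grad(x)
        best_lower = max(best_lower, f(x) - g @ g / (2 * mu))
        if f(x) - best_lower <= tol:    # certified: f(x) - f* <= tol
            break
        x -= step * g
    return x, f(x) - best_lower

# Toy strongly convex quadratic with mu = 1 (smallest eigenvalue of A).
A = np.diag([5.0, 1.0])
b = np.array([1.0, 1.0])
f = lambda x: 0.5 * x @ A @ x - b @ x
grad = lambda x: A @ x - b
x_hat, gap = gd_with_certificate(f, grad, np.zeros(2), mu=1.0, step=0.2)
print(x_hat, gap)                       # gap is a certified bound on f(x_hat) - f*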