Wed.1 11:30–12:45 | H 2038 | SPA

Complexity of Sparse Optimization for Subspace Identification

Chair: Julien Mairal | Organizer: Julien Mairal
11:30

Guillaume Garrigos

joint work with Jalal Fadili, Jerome Malick, Gabriel Peyré

Model Consistency for Learning with low-complexity priors

We consider supervised learning problems in which a low-complexity prior is placed on the coefficients of the estimator. An important question in this setting is model consistency: recovering the correct structure by minimizing a regularized empirical risk. Under some conditions, deterministic algorithms are known to recover the correct structure, while a simple stochastic gradient method fails to do so. In this talk, we provide the theoretical underpinning of this behavior using the notion of mirror-stratifiable regularizers.
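
To fix ideas (the abstract does not spell out a formula; the notation below is a generic sketch), the regularized empirical risk estimator takes the form

    \hat{\beta} \in \arg\min_{\beta \in \mathbb{R}^p} \ \frac{1}{n} \sum_{i=1}^{n} \ell\big( y_i, \langle x_i, \beta \rangle \big) + \lambda R(\beta),

where R is a low-complexity regularizer such as the l1 norm (promoting sparsity) or the nuclear norm (promoting low rank), both standard examples of mirror-stratifiable functions. Model consistency then asks that the minimizer \hat{\beta} lie on the same low-dimensional stratum as the true parameter, e.g. share its support or its rank.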

11:55

Yifan Sun

joint work with Halyun Jeong, Julie Nutini, Mark Schmidt

Are we there yet? Manifold identification of gradient-related proximal methods

In machine learning, models that generalize better often generate outputs that lie on a low-dimensional manifold. Recently, several works have shown finite-time manifold identification by proximal methods. In this work, we provide a unified view by giving a simple condition under which any proximal method using a constant step size can achieve finite-iteration manifold detection. We provide iteration bounds for several key methods, including FISTA, DRS, ADMM, SVRG, SAGA, and RDA.
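
As a plain illustration of the phenomenon (not the paper's unified condition or bounds), the sketch below runs the proximal gradient method with a constant step size on a toy Lasso problem and records the last iteration at which the support, i.e. the active manifold of the l1 norm, changes. The data, step size, and iteration budget are arbitrary choices.

    import numpy as np

    # Toy Lasso problem: min_x 0.5*||A x - b||^2 + lam*||x||_1.
    # For the l1 norm, the active "manifold" of a point is its support; a proximal
    # method with a constant step size identifies it after finitely many iterations.
    rng = np.random.default_rng(0)
    n, p, k = 50, 100, 5
    A = rng.standard_normal((n, p))
    x_true = np.zeros(p)
    x_true[rng.choice(p, k, replace=False)] = rng.standard_normal(k)
    b = A @ x_true + 0.01 * rng.standard_normal(n)
    lam = 0.1 * np.max(np.abs(A.T @ b))

    def soft(z, t):
        # Soft-thresholding: proximal operator of t * ||.||_1.
        return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

    step = 1.0 / np.linalg.norm(A, 2) ** 2    # constant step size 1/L

    x = np.zeros(p)
    prev_support, last_change = None, None
    for it in range(2000):
        x = soft(x - step * (A.T @ (A @ x - b)), step * lam)  # proximal gradient step
        support = tuple(np.flatnonzero(x))
        if support != prev_support:
            last_change = it                  # support still changing at this iteration
        prev_support = support

    print("support size at the end:", len(prev_support))
    print("support last changed at iteration:", last_change)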

12:20

Julien Mairal

joint work with Andrei Kulunchakov

Estimate Sequences for Stochastic Composite Optimization: Variance Reduction, Acceleration, and Robustness to Noise

We propose a unified view, based on estimate sequences, of gradient-based algorithms for stochastic convex composite optimization. This point of view covers stochastic gradient descent (SGD), the variance-reduction approaches SAGA, SVRG, and MISO, and their proximal variants, and it has several advantages: (i) we provide a generic proof of convergence for the aforementioned methods; (ii) we obtain new algorithms with the same guarantees; (iii) we derive generic strategies to make these algorithms robust to stochastic noise; and (iv) we obtain new accelerated algorithms.
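
For concreteness, here is a minimal sketch of one method of the kind covered by such a framework, proximal SVRG applied to a toy l1-regularized least-squares problem. The problem instance, step size, and epoch count are illustrative choices and are not taken from the paper.

    import numpy as np

    # Toy composite problem: F(x) = (1/n) * sum_i 0.5*(a_i^T x - b_i)^2 + lam*||x||_1.
    rng = np.random.default_rng(1)
    n, p = 200, 50
    A = rng.standard_normal((n, p))
    x_star = np.zeros(p)
    x_star[:5] = 1.0
    b = A @ x_star + 0.1 * rng.standard_normal(n)
    lam = 0.05

    def prox_l1(z, t):
        # Proximal operator of t * ||.||_1 (soft-thresholding).
        return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

    L = np.max(np.sum(A ** 2, axis=1))   # bound on the smoothness constants of the f_i
    eta = 0.1 / L                        # constant step size (illustrative choice)

    x = np.zeros(p)
    for epoch in range(30):
        x_ref = x.copy()
        mu = A.T @ (A @ x_ref - b) / n   # full gradient at the snapshot point
        for _ in range(n):
            i = rng.integers(n)
            # Variance-reduced stochastic gradient: unbiased estimate of the full
            # gradient whose variance shrinks as x and x_ref approach the solution.
            g = (A[i] @ x - b[i]) * A[i] - (A[i] @ x_ref - b[i]) * A[i] + mu
            x = prox_l1(x - eta * g, eta * lam)   # proximal (composite) step

    obj = 0.5 * np.mean((A @ x - b) ** 2) + lam * np.sum(np.abs(x))
    print("objective:", round(obj, 4), "nonzeros:", np.count_nonzero(x))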