joint work with Jalal Fadili, Jerome Malick, Gabriel Peyré
We consider supervised learning problems where the prior on the coefficients of the estimator is an assumption of low complexity. An important question in this setting is model consistency: whether minimizing a regularized empirical risk recovers the correct low-complexity structure. Under some conditions, it is known that deterministic algorithms succeed in recovering the correct structure, while a simple stochastic gradient method fails to do so. In this talk, we provide the theoretical underpinning of this behavior using the notion of mirror-stratifiable regularizers.
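The contrast described above can be illustrated on a toy lasso problem (this sketch is mine, not the authors'; the data, regularization weight, and iteration counts are illustrative assumptions): proximal gradient descent, a deterministic algorithm, produces iterates whose support matches the sparse structure exactly, whereas plain stochastic subgradient descent yields dense iterates that only approach zero coordinates without ever identifying them.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 10
x_true = np.zeros(p)
x_true[:2] = [1.0, -1.0]            # low-complexity (sparse) ground truth
A = rng.standard_normal((n, p))
b = A @ x_true + 0.01 * rng.standard_normal(n)
lam = 0.1                           # illustrative regularization weight

def soft(v, t):
    # proximal operator of t * ||.||_1 (soft-thresholding)
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

step = 1.0 / np.linalg.norm(A, 2) ** 2

# Deterministic proximal gradient: thresholding sets coordinates exactly to 0,
# so the iterate eventually lands on the correct sparse model.
x = np.zeros(p)
for _ in range(2000):
    x = soft(x - step * A.T @ (A @ x - b), step * lam)

# Plain stochastic subgradient method on the same objective: the noisy
# updates never set coordinates exactly to zero, so the structure is missed.
y = np.zeros(p)
for k in range(2000):
    i = rng.integers(n)
    g = (A[i] @ y - b[i]) * A[i] + lam * np.sign(y)
    y -= (step / np.sqrt(k + 1)) * g

print("prox-gradient nonzeros:", np.count_nonzero(x))
print("SGD nonzeros:          ", np.count_nonzero(y))
```

The proximal iterate is exactly sparse, while the stochastic iterate remains fully dense, which is the model-consistency gap the talk analyzes.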
joint work with Halyun Jeong, Julie Nutini, Mark Schmidt
In machine learning, models that generalize better often generate outputs that lie on a low-dimensional manifold. Recently, several works have shown finite-time manifold identification by proximal methods. In this work, we provide a unified view by giving a simple condition under which any proximal method using a constant step size can achieve finite-iteration manifold identification. We provide iteration bounds for several key methods, including FISTA, DRS, ADMM, SVRG, SAGA, and RDA.
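The finite-iteration identification phenomenon can be observed numerically (a sketch of mine under illustrative assumptions, using proximal gradient with a constant step on an l1-regularized least-squares problem, where the manifold is the set of vectors with a fixed support): after finitely many iterations, the support of the iterates freezes and never changes again.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 40, 8
x_true = np.zeros(p)
x_true[[1, 4]] = [2.0, -1.5]        # manifold = vectors supported on {1, 4}
A = rng.standard_normal((n, p))
b = A @ x_true + 0.01 * rng.standard_normal(n)
lam = 0.2                           # illustrative regularization weight

def prox_l1(v, t):
    # proximal operator of t * ||.||_1
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

step = 1.0 / np.linalg.norm(A, 2) ** 2   # constant step size

x = np.zeros(p)
supports = []
for _ in range(1500):
    x = prox_l1(x - step * A.T @ (A @ x - b), step * lam)
    supports.append(tuple(np.flatnonzero(x)))

# First iteration after which the support never changes: the identification time.
final = supports[-1]
hit = next(k for k in range(len(supports))
           if all(s == final for s in supports[k:]))
print("support identified at iteration", hit, "-> support", final)
```

Past iteration `hit`, every iterate lies on the same low-dimensional manifold, which is exactly the behavior the iteration bounds in the talk quantify.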
joint work with Andrei Kulunchakov
We propose a unified view of gradient-based algorithms for stochastic convex composite optimization based on estimate sequences. This point of view covers stochastic gradient descent (SGD), the variance-reduction approaches SAGA, SVRG, and MISO, and their proximal variants, and has several advantages: (i) we provide a generic proof of convergence for the aforementioned methods; (ii) we obtain new algorithms with the same guarantees; (iii) we derive generic strategies to make these algorithms robust to stochastic noise; and (iv) we obtain new accelerated algorithms.
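As a concrete instance of the variance-reduction family covered by this framework, here is a minimal SVRG sketch on a least-squares problem (my illustrative example, not the authors' construction; problem sizes and step size are assumptions): each epoch anchors a full gradient at a reference point, and inner stochastic updates correct each sampled gradient with that anchor, driving the update variance to zero.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 100, 5
A = rng.standard_normal((n, p))
x_star = rng.standard_normal(p)
b = A @ x_star                      # noiseless, so x_star is the minimizer

# Conservative constant step based on the largest component smoothness.
step = 0.3 / np.max(np.sum(A * A, axis=1))

def full_grad(x):
    # gradient of f(x) = (1/n) * sum_i 0.5 * (a_i^T x - b_i)^2
    return A.T @ (A @ x - b) / n

x = np.zeros(p)
for epoch in range(30):
    x_ref = x.copy()
    g_ref = full_grad(x_ref)        # full gradient at the anchor, once per epoch
    for _ in range(n):
        i = rng.integers(n)
        gi = (A[i] @ x - b[i]) * A[i]          # stochastic gradient at x
        gi_ref = (A[i] @ x_ref - b[i]) * A[i]  # same sample at the anchor
        x -= step * (gi - gi_ref + g_ref)      # variance-reduced update

print("distance to minimizer:", np.linalg.norm(x - x_star))
```

Unlike plain SGD with a constant step, the corrected direction `gi - gi_ref + g_ref` has variance that vanishes as both `x` and `x_ref` approach the solution, giving linear convergence on this strongly convex instance; estimate sequences give one proof template covering this and the other listed methods.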