Tue.3 14:45–16:00 | H 2013 | NON

Stochastic Methods for Nonsmooth Optimization

Chair: Konstantin Mishchenko | Organizer: Konstantin Mishchenko
14:45

Tianyi Lin

Improved oracle complexity for stochastic compositional variance reduced gradient

We propose an accelerated stochastic compositional variance reduced gradient method for optimizing the sum of a composition function and a convex nonsmooth function. We provide an incremental first-order oracle (IFO) complexity analysis for the proposed algorithm and show that it is provably faster than all existing methods. In particular, we show that our method achieves an asymptotic IFO complexity of $O((m+n)\log(1/\epsilon) + 1/\epsilon^3)$, where $m$ and $n$ are the numbers of inner and outer component functions, respectively, improving on the best-known result of $O(m+n+(m+n)^{2/3}/\epsilon^2)$ for the convex composition problem.
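
The accelerated algorithm and its analysis are the subject of the talk and are not reproduced here; as a rough illustration of the problem class, the following sketch runs a plain (non-accelerated, non-variance-reduced) stochastic compositional proximal gradient step on a toy instance. The affine inner maps, the quadratic outer losses, the $\ell_1$ nonsmooth term, and the step size are all illustrative assumptions, not part of the abstract:

  import numpy as np

  # Toy compositional problem (illustrative only, not the talk's setting):
  #   min_x  (1/n) sum_i f_i( (1/m) sum_j g_j(x) ) + h(x)
  # with affine inner maps g_j, quadratic outer losses f_i, and h = lam*||.||_1.
  rng = np.random.default_rng(0)
  d, p, m, n = 10, 5, 20, 30
  A = rng.standard_normal((m, p, d))       # Jacobians of the affine maps g_j
  b = rng.standard_normal((m, p))
  C = rng.standard_normal((n, p))          # data defining the outer losses f_i
  lam = 0.1

  def g(x, j):                             # inner component g_j(x) = A_j x + b_j
      return A[j] @ x + b[j]

  def grad_f(y, i):                        # gradient of f_i(y) = 0.5*||y - C_i||^2
      return y - C[i]

  def prox_l1(x, t):                       # prox of t * lam * ||.||_1
      return np.sign(x) * np.maximum(np.abs(x) - t * lam, 0.0)

  x, step = np.zeros(d), 0.05
  for k in range(1000):
      j, i = rng.integers(m), rng.integers(n)   # one inner, one outer sample
      grad = A[j].T @ grad_f(g(x, j), i)        # chain rule with sampled Jacobian
      x = prox_l1(x - step * grad, step)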

15:10

Adil Salim

On the Convergence of some Stochastic Primal-Dual Algorithms

Primal-dual algorithms, such as the Chambolle-Pock and Vu-Condat algorithms, are well suited to nonsmooth optimization problems involving linear constraints. In this talk, we will consider stochastic versions of these algorithms in which every function used to define the problem, including the constraints, is written as an expectation. Asymptotic and non-asymptotic convergence results involving the expected primal-dual gap will be provided. These results cover some recent findings regarding stochastic proximal (gradient) algorithms.
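
For readers unfamiliar with the primal-dual templates mentioned above, the sketch below runs a Vu-Condat-style iteration in which the smooth gradient is replaced by a single-sample estimate. It is a generic illustration only, not the algorithms or step-size rules analyzed in the talk; the least-squares terms, the composite $\ell_1$ term, and the constant step sizes are assumptions made for the example:

  import numpy as np

  # Illustrative instance (not the talk's general setting):
  #   min_x  (1/n) sum_i 0.5*(b_i^T x - c_i)^2  +  ||K x||_1
  # solved with a primal-dual iteration whose smooth gradient is sampled.
  rng = np.random.default_rng(1)
  d, n, q = 20, 50, 8
  B = rng.standard_normal((n, d))
  c = rng.standard_normal(n)
  K = rng.standard_normal((q, d))

  def stoch_grad(x, i):                # gradient of the sampled smooth term
      return B[i] * (B[i] @ x - c[i])

  def prox_conj_l1(y):                 # prox of the conjugate of ||.||_1:
      return np.clip(y, -1.0, 1.0)     # projection onto the l_inf unit ball

  x, y = np.zeros(d), np.zeros(q)
  tau, sigma = 0.01, 0.01              # illustrative constant steps; the talk's
  for k in range(5000):                # guarantees rest on proper step-size rules
      i = rng.integers(n)
      x_new = x - tau * (stoch_grad(x, i) + K.T @ y)     # stochastic primal step
      y = prox_conj_l1(y + sigma * K @ (2 * x_new - x))  # dual update
      x = x_new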

15:35

Konstantin Mishchenko

joint work with Peter Richtárik

Variance Reduction for Sums with Smooth and Nonsmooth Components with Linear Convergence

We present a stochastic variance reduction method for convex sums with smooth and nonsmooth entries. In the case where the smooth part is strongly convex and the nonsmooth entries are of the form $g_j(A_j x)$, the method achieves linear convergence with rate $O(\max(n, \kappa, \lambda_{\min}(A)) \cdot \log(1/\epsilon))$, where $n$ is the number of smooth terms, $\kappa$ is the condition number, and $\lambda_{\min}(A)$ is the smallest eigenvalue of the matrix constructed from the rows of $A_1$, $A_2$, etc. Without these assumptions, the method has a rate of $O(1/t)$, which improves to $O(1/t^2)$ under strong convexity.
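
The linearly convergent method of the talk is not reproduced here; as a point of reference, a standard proximal SVRG loop for a sum of smooth terms plus a single prox-friendly nonsmooth term is sketched below. The least-squares components, the $\ell_1$ regularizer, and the step size are illustrative assumptions; handling several nonsmooth pieces $g_j(A_j x)$ simultaneously is what the presented method adds beyond this baseline:

  import numpy as np

  # Standard prox-SVRG baseline (illustrative, not the talk's method):
  #   min_x  (1/n) sum_i 0.5*(a_i^T x - b_i)^2  +  lam * ||x||_1
  rng = np.random.default_rng(2)
  n, d, lam = 100, 20, 0.05
  A = rng.standard_normal((n, d))
  b = rng.standard_normal(n)

  def grad_i(x, i):                    # gradient of one smooth component
      return A[i] * (A[i] @ x - b[i])

  def full_grad(x):                    # full gradient, recomputed once per epoch
      return A.T @ (A @ x - b) / n

  def prox_l1(x, t):                   # prox of t * lam * ||.||_1
      return np.sign(x) * np.maximum(np.abs(x) - t * lam, 0.0)

  x = np.zeros(d)
  step = 1.0 / (10 * np.max(np.sum(A ** 2, axis=1)))   # roughly 1/(10 L_max)
  for epoch in range(30):
      x_ref = x.copy()
      mu = full_grad(x_ref)            # reference (anchor) gradient
      for t in range(n):
          i = rng.integers(n)
          v = grad_i(x, i) - grad_i(x_ref, i) + mu     # variance-reduced estimate
          x = prox_l1(x - step * v, step)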