joint work with El houcine Bergou, Youssef Diouane, Clément Royer
Optimization of a large finite sum of nonconvex functions, such as arises in the training of deep neural network architectures, suffers from two main challenges: 1) slow progress near saddle points and 2) the need for a priori knowledge of problem constants to define a learning rate. In this talk we present a stochastic line search method for finite sum optimization, in which first- and second-order information is sampled to define the step. We show theoretically that the method enjoys a favorable complexity rate, and numerical tests validate its performance.
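To illustrate the line-search idea behind the talk, below is a minimal generic sketch of one stochastic backtracking step: a gradient is estimated on a sampled minibatch and the stepsize is shrunk until an Armijo-type sufficient-decrease condition holds on the same sample. The helpers sample_grad and sample_loss are hypothetical stand-ins for minibatch oracles; this is only an illustration of a stochastic line search, not the specific first/second-order method presented in the talk.

```python
import numpy as np

def stochastic_backtracking_step(x, sample_grad, sample_loss,
                                 alpha0=1.0, c=1e-4, tau=0.5, max_backtracks=20):
    """One stochastic backtracking line-search step (generic sketch).

    sample_grad and sample_loss are assumed minibatch oracles: they return a
    gradient estimate and a loss estimate at a given point.
    """
    g = sample_grad(x)                   # minibatch gradient estimate
    d = -g                               # here: plain negative-gradient direction
    f0 = sample_loss(x)
    alpha = alpha0
    for _ in range(max_backtracks):
        # Armijo-type sufficient-decrease test on the sampled loss
        if sample_loss(x + alpha * d) <= f0 + c * alpha * (g @ d):
            return x + alpha * d, alpha  # accept the step
        alpha *= tau                     # otherwise shrink the stepsize
    return x, 0.0                        # no acceptable step found
```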
joint work with Eduard Gorbunov, Peter Richtárik
In this paper we consider the unconstrained minimization of a smooth function in a setting where only function evaluations are possible. We design a novel randomized derivative-free algorithm, the stochastic three points (STP) method, and analyze its iteration complexity. Given a current iterate x, STP compares the objective function at three points: x, x+αs and x-αs, where α>0 is a stepsize parameter and s is a random search direction. The best of these three points becomes the next iterate. We analyze STP under several stepsize selection schemes.
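A minimal sketch of the three-point comparison described above is given below. The fixed stepsize and the uniform-on-the-sphere sampling of s are assumptions for illustration; the talk considers several stepsize selection schemes.

```python
import numpy as np

def stp(f, x0, alpha=0.1, iters=1000, seed=0):
    """Stochastic three points (STP) sketch: at each iteration, draw a random
    direction s and keep the best of f(x), f(x + alpha*s), f(x - alpha*s)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    fx = f(x)
    for _ in range(iters):
        s = rng.standard_normal(x.shape)
        s /= np.linalg.norm(s)           # e.g. direction uniform on the unit sphere
        for cand in (x + alpha * s, x - alpha * s):
            fc = f(cand)
            if fc < fx:                  # keep the best of the three points
                x, fx = cand, fc
    return x, fx

# Example: minimize a simple quadratic
x_best, f_best = stp(lambda v: float(np.sum(v**2)), np.ones(10))
```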
After the randomized Kaczmarz algorithm of Thomas Strohmer and Roman Vershynin was introduced, many research papers have sought to improve the algorithm further. What exactly is the role of randomness in all this effort? We will first give a brief survey of some recent works and point out that in some cases, randomness is not needed at all. Then we will present a new modification of the algorithm that can help accelerate convergence in the cases where randomness is needed.
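For reference, here is a minimal sketch of the classical Strohmer-Vershynin randomized Kaczmarz iteration for a consistent system Ax = b: a row i is sampled with probability proportional to its squared norm and the current iterate is projected onto the hyperplane a_i^T x = b_i. This shows the baseline algorithm only, not the modification presented in the talk.

```python
import numpy as np

def randomized_kaczmarz(A, b, iters=1000, seed=0):
    """Randomized Kaczmarz sketch: sample row i with probability
    proportional to ||a_i||^2 and project x onto {z : a_i^T z = b_i}."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    row_norms_sq = np.einsum('ij,ij->i', A, A)   # squared row norms
    probs = row_norms_sq / row_norms_sq.sum()
    x = np.zeros(n)
    for _ in range(iters):
        i = rng.choice(m, p=probs)
        a_i = A[i]
        # Orthogonal projection onto the hyperplane defined by row i
        x = x + (b[i] - a_i @ x) / row_norms_sq[i] * a_i
    return x
```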