We study the problem of minimizing a strongly convex and smooth function when we have noisy estimates of its gradient. We first provide a framework that can systematically trade off acceleration and robustness to gradient errors. Building on our framework, we propose a novel multistage accelerated algorithm that is universally optimal in the sense that it achieves the optimal rate in both the deterministic and stochastic settings, and it operates without knowledge of the noise characteristics.
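As a rough illustration of the multistage idea (a minimal sketch, not the paper's exact scheme), the code below runs Nesterov's accelerated method in stages, halving the step size between stages so that early stages accelerate while later stages suppress gradient noise. The stage count, stage lengths, halving schedule, and the oracle name `noisy_grad` are all illustrative assumptions.

```python
import numpy as np

def multistage_agd(grad, x0, mu, L, n_stages=4, iters_per_stage=500):
    """Sketch: stagewise Nesterov acceleration for a mu-strongly convex,
    L-smooth objective, where `grad` returns a (possibly noisy) gradient
    estimate. The step size shrinks across stages (illustrative schedule)."""
    x = np.asarray(x0, dtype=float)
    y = x.copy()
    for s in range(n_stages):
        alpha = 1.0 / (L * 2 ** s)                   # halve the step size each stage
        q = mu * alpha                               # inverse condition number at this step
        beta = (1 - np.sqrt(q)) / (1 + np.sqrt(q))   # momentum for the strongly convex case
        for _ in range(iters_per_stage):
            x_new = y - alpha * grad(y)              # gradient step from extrapolated point
            y = x_new + beta * (x_new - x)           # momentum extrapolation
            x = x_new
        y = x.copy()                                 # restart momentum at the stage boundary
    return x

# Example: noisy quadratic f(x) = 0.5 * x^T A x with additive gradient noise
rng = np.random.default_rng(0)
A = np.diag(np.linspace(0.1, 10.0, 20))
noisy_grad = lambda x: A @ x + 0.01 * rng.standard_normal(x.shape)
x_hat = multistage_agd(noisy_grad, np.ones(20), mu=0.1, L=10.0)
```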
joint work with John Duchi
Standard stochastic optimization methods are brittle: they are sensitive to algorithmic parameters and unstable outside of well-behaved families of objectives. To address these challenges, we investigate models that exhibit better robustness to problem families and algorithmic parameters. With appropriately accurate models, which we call the aProx family, stochastic methods can be made stable, provably convergent, and asymptotically optimal. We highlight the importance of robustness and accurate modeling with a careful experimental evaluation of convergence time and algorithm sensitivity.
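To make the modeling idea concrete, here is a minimal sketch of one member of the aProx family: the truncated model for nonnegative losses, whose proximal update admits a closed form (a clipped, Polyak-type step). The function name `aprox_truncated_step` and the least-squares example are our own illustrative choices.

```python
import numpy as np

def aprox_truncated_step(x, f_val, g, alpha):
    """One update with the truncated model for a nonnegative loss:
    minimize max(f(x) + <g, y - x>, 0) + ||y - x||^2 / (2 * alpha).
    The minimizer is a gradient step whose length is capped by the
    truncation, which keeps the update stable for large alpha."""
    step = min(alpha, f_val / (g @ g + 1e-12))  # truncation caps the step length
    return x - step * g

# Example: stochastic pass over least-squares losses f_i(x) = 0.5 * (a_i @ x - b_i)^2
rng = np.random.default_rng(0)
A, b = rng.standard_normal((100, 5)), rng.standard_normal(100)
x = np.zeros(5)
for k in rng.permutation(100):
    r = A[k] @ x - b[k]
    x = aprox_truncated_step(x, 0.5 * r * r, r * A[k], alpha=1.0)
```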
joint work with Alexandre d'Aspremont, Raghu Bollapragada
We describe a convergence acceleration scheme for multistep optimization algorithms. The extrapolated solution is written as a nonlinear average of the iterates produced by the original optimization algorithm. Our scheme handles algorithms with momentum terms, such as Nesterov's accelerated method, as well as primal-dual methods. The weights are computed via a simple linear system, and we analyze performance in both online and offline modes. We use Crouzeix's conjecture to show that acceleration performance is controlled by the solution of a Chebyshev problem.
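The sketch below shows the basic extrapolation step in the style of regularized nonlinear acceleration, assuming plain (non-momentum) iterates: the weights come from a small regularized linear system and sum to one, and the output is the corresponding nonlinear average. The paper's handling of momentum and primal-dual iterates requires a modified construction; the regularization value and the gradient-descent example here are illustrative assumptions.

```python
import numpy as np

def nonlinear_acceleration(X, reg=1e-8):
    """Sketch: given iterates X = [x_0, ..., x_k] stored as columns, find
    weights c with sum(c) = 1 minimizing ||R c||^2 + reg * ||c||^2, where
    R[:, i] = x_{i+1} - x_i, and return the nonlinear average of iterates."""
    R = np.diff(X, axis=1)                                # residual matrix, k columns
    M = R.T @ R
    M = M / np.linalg.norm(M) + reg * np.eye(M.shape[0])  # normalize, then regularize
    z = np.linalg.solve(M, np.ones(M.shape[0]))           # solve the simple linear system
    c = z / z.sum()                                       # enforce weights summing to one
    return X[:, 1:] @ c                                   # extrapolated solution

# Example: extrapolate from a few plain gradient-descent iterates on a quadratic
A = np.diag(np.linspace(0.1, 10.0, 20))
x, iters = np.ones(20), []
for _ in range(10):
    iters.append(x.copy())
    x = x - 0.1 * (A @ x)                                 # gradient step, step size 1/L
x_extrapolated = nonlinear_acceleration(np.column_stack(iters + [x]))
```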