Tue.3 14:45–16:00 | H 0105 | BIG

Recent Advancements in Optimization Methods for Machine Learning (1/4)

Chair: Albert Berahas | Organizer: Paul Grigas
14:45

Alexandre d'Aspremont

joint work with Thomas Kerdreux, Sebastian Pokutta

Restarting Frank-Wolfe

The vanilla Frank-Wolfe algorithm only ensures a worst-case complexity of O(1/ε). Various recent results have shown that for strongly convex functions, the method can be slightly modified to achieve linear convergence. However, this still leaves a large gap between sublinear O(1/ε) convergence and linear O(log(1/ε)) convergence. Here we present a new variant of Conditional Gradients that can dynamically adapt to the function's geometric properties using restarts and that smoothly interpolates between the sublinear and linear regimes.
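
For illustration only, the sketch below runs a generic Frank-Wolfe loop with a simple restart heuristic (restart the step-size schedule whenever the duality gap has halved) on a toy quadratic over the probability simplex. The restart criterion, the open-loop step size, and the objective are assumptions made here for concreteness, not the precise scheme analyzed in the talk.

```python
import numpy as np

def lmo_simplex(grad):
    """Linear minimization oracle over the probability simplex:
    returns the vertex e_i minimizing <grad, s>."""
    s = np.zeros_like(grad)
    s[np.argmin(grad)] = 1.0
    return s

def restarted_frank_wolfe(grad_f, x0, outer=10, tol_factor=0.5, max_inner=500):
    """Generic Frank-Wolfe with restarts: each outer round runs plain FW
    until the duality gap drops below a fraction of its starting value,
    then restarts the step-size schedule from the current iterate."""
    x = x0
    for _ in range(outer):
        gap0 = None
        for t in range(max_inner):
            g = grad_f(x)
            s = lmo_simplex(g)
            gap = g @ (x - s)            # Frank-Wolfe duality gap
            if gap0 is None:
                gap0 = gap
            if gap <= tol_factor * gap0:
                break                    # restart criterion met
            gamma = 2.0 / (t + 2.0)      # standard open-loop step size
            x = x + gamma * (s - x)
    return x

# Toy usage: minimize 0.5*||Ax - b||^2 over the simplex.
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5))
b = rng.standard_normal(20)
f = lambda x: 0.5 * np.sum((A @ x - b) ** 2)
grad_f = lambda x: A.T @ (A @ x - b)
x0 = np.ones(5) / 5
x_hat = restarted_frank_wolfe(grad_f, x0)
print(f(x_hat))
```

Restarting the step-size schedule each time the gap criterion fires is what lets such schemes adapt to the local geometry without knowing the corresponding constants in advance.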

15:10

Heyuan Liu

joint work with Paul Grigas

[moved] New Methods for Regularization Path Optimization via Differential Equations

We develop and analyze several different second order algorithms for computing an approximately optimal solution/regularization path of a parameterized convex optimization problem with smooth Hessian. Our algorithms are inspired by a differential equations perspective on the parametric solution path and do not rely on the specific structure of the regularizer. We present computational guarantees that bound the oracle complexity to achieve an approximately optimal solution path under different sets of smoothness assumptions and that also hold in the presence of approximate subproblem solutions.
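
As a concrete (and deliberately narrow) illustration of the differential-equations viewpoint, the sketch below traces the ridge-regression solution path by taking Euler predictor steps on the path ODE H(x, λ) dx/dλ = -∂λ∇f(x, λ) and one Newton corrector step per grid point. The ridge objective, the grid, and the single corrector step are assumptions made here; the paper's algorithms do not rely on this specific regularizer.

```python
import numpy as np

def ridge_path_ode(A, b, lam_grid):
    """Trace the ridge-regression solution path x(lam) by integrating the
    path ODE with Euler predictor steps and one Newton corrector step
    per grid point."""
    H0 = A.T @ A
    n = A.shape[1]
    # Exact solution at the first grid point to initialize the path.
    lam = lam_grid[0]
    x = np.linalg.solve(H0 + lam * np.eye(n), A.T @ b)
    path = [x.copy()]
    for lam_next in lam_grid[1:]:
        dlam = lam_next - lam
        H = H0 + lam * np.eye(n)
        # Predictor: for ridge, d(grad)/d(lam) = x, so dx/dlam = -H^{-1} x.
        x = x - dlam * np.linalg.solve(H, x)
        # Corrector: one Newton step at the new parameter value.
        lam = lam_next
        H = H0 + lam * np.eye(n)
        grad = A.T @ (A @ x - b) + lam * x
        x = x - np.linalg.solve(H, grad)
        path.append(x.copy())
    return np.array(path)

# Toy usage: follow the path over 50 logarithmically spaced values.
rng = np.random.default_rng(1)
A = rng.standard_normal((30, 8))
b = rng.standard_normal(30)
lams = np.logspace(1, -2, 50)
path = ridge_path_ode(A, b, lams)
print(path.shape)  # (50, 8)
```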

15:35

Lin Xiao

joint work with Junyu Zhang

A Composite Randomized Incremental Gradient Method

We consider the problem of minimizing the composition of a smooth function (which can be nonconvex) and a smooth vector mapping, both of which can be expressed as the average of a large number of components. We propose a composite randomized incremental gradient method based on a SAGA-type gradient estimator. The gradient sample complexity of our method matches that of several recently developed methods based on SVRG in the general case. However, for structured problems where linear convergence rates can be obtained, our method has much better complexity for ill-conditioned problems.
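
A schematic sketch of the kind of estimator involved, assuming f(y) = (1/n) Σ f_i(y) and g(x) = (1/m) Σ g_j(x): SAGA-style tables are kept for the inner values, inner Jacobians, and outer gradients, and the corrected estimates are combined through the chain rule. The table layout, sampling scheme, and constant step size are illustrative choices, not the method or analysis from the talk.

```python
import numpy as np

def composite_saga(x0, f_grads, g_vals, g_jacs, n_iters=2000, eta=0.01, seed=0):
    """Schematic SAGA-style loop for min_x f(g(x)) with
    f(y) = (1/n) sum_i f_i(y) and g(x) = (1/m) sum_j g_j(x)."""
    rng = np.random.default_rng(seed)
    n, m = len(f_grads), len(g_vals)
    x = x0.copy()
    G = np.array([gj(x) for gj in g_vals])     # (m, p) inner values
    J = np.array([Jj(x) for Jj in g_jacs])     # (m, p, d) inner Jacobians
    y = G.mean(axis=0)
    F = np.array([fi(y) for fi in f_grads])    # (n, p) outer gradients
    for _ in range(n_iters):
        j = rng.integers(m)
        i = rng.integers(n)
        # SAGA-corrected estimates of g(x), its Jacobian, and grad f.
        gj_new, Jj_new = g_vals[j](x), g_jacs[j](x)
        y_est = gj_new - G[j] + G.mean(axis=0)
        J_est = Jj_new - J[j] + J.mean(axis=0)
        fi_new = f_grads[i](y_est)
        w_est = fi_new - F[i] + F.mean(axis=0)
        # Chain-rule gradient estimate and step.
        x = x - eta * (J_est.T @ w_est)
        # Refresh the sampled table entries.
        G[j], J[j], F[i] = gj_new, Jj_new, fi_new
    return x

# Toy usage: f_i(y) = 0.5*||y - c_i||^2, g_j(x) = B_j @ x.
rng = np.random.default_rng(1)
d, p, n, m = 5, 3, 4, 6
Bs = [rng.standard_normal((p, d)) for _ in range(m)]
cs = [rng.standard_normal(p) for _ in range(n)]
f_grads = [lambda y, c=c: y - c for c in cs]
g_vals = [lambda x, B=B: B @ x for B in Bs]
g_jacs = [lambda x, B=B: B for B in Bs]
x_hat = composite_saga(np.zeros(d), f_grads, g_vals, g_jacs)
print(x_hat)
```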