Thu.2 10:45–12:00 | H 0105 | BIG

Nonconvex Optimization in Machine Learning (1/2)

Chair: Ethan Fang | Organizer: Ethan Fang
10:45

Yuxin Chen

Noisy Matrix Completion: Bridging Convex and Nonconvex Optimization

This talk is about noisy matrix completion. Arguably one of the most popular paradigms is convex relaxation, which achieves remarkable efficacy in practice. However, the theoretical support for this approach is still far from optimal. We make progress towards demystifying the practical efficacy of convex relaxation vis-à-vis random noise. When the rank of the matrix is O(1), we demonstrate that convex programming achieves near-optimal estimation errors. All of this is enabled by bridging convex relaxation with nonconvex optimization, a seemingly distinct algorithmic paradigm that is robust against noise.
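For orientation, the nonconvex side of this bridge typically runs gradient descent on a low-rank factorization of the observed entries. The sketch below is a generic illustration of that idea, not the specific algorithm or analysis presented in the talk; the step size, initialization, and function name are ad-hoc assumptions.

```python
import numpy as np

def complete_matrix(M_obs, mask, r, steps=500, lr=0.01):
    """Generic nonconvex baseline: fit M ~ X @ Y.T on the observed entries.

    M_obs: observed matrix (zeros where unobserved); mask: 1 where observed.
    """
    n1, n2 = M_obs.shape
    rng = np.random.default_rng(0)
    X = rng.standard_normal((n1, r)) / np.sqrt(r)
    Y = rng.standard_normal((n2, r)) / np.sqrt(r)
    for _ in range(steps):
        R = mask * (X @ Y.T - M_obs)   # residual on observed entries
        gX, gY = R @ Y, R.T @ X        # gradients of 0.5 * ||R||_F^2
        X, Y = X - lr * gX, Y - lr * gY
    return X @ Y.T
```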

11:10

Zhuoran Yang

Variational Transport: A Convergent Particle-Based Algorithm for Distributional Optimization

We consider the problem of minimizing a functional defined over a family of probability distributions, where the functional is assumed to have a variational form. For this problem, we propose the variational transport algorithm, which approximately performs functional gradient descent over the manifold of probability distributions by iteratively pushing a set of particles. By characterizing both the statistical error of estimating the functional gradient and the progress of the optimization algorithm, we show that the proposed algorithm enjoys global convergence with high probability.
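To make the particle-pushing step concrete, the toy sketch below moves a set of particles along a descent direction for the simple linear functional F(p) = E_p[V(x)], whose Wasserstein gradient flow transports each particle along -∇V. The full variational transport algorithm instead estimates the gradient of a general variational functional from samples; the function name, step size, and the choice V(x) = 0.5*||x||^2 in the example are assumptions made purely for illustration.

```python
import numpy as np

def push_particles(particles, grad_V, step=0.1, iters=100):
    """particles: (n, d) array; grad_V: callable returning an (n, d) array of gradients."""
    x = particles.copy()
    for _ in range(iters):
        x -= step * grad_V(x)   # transport each particle along the descent direction
    return x

# Example: V(x) = 0.5 * ||x||^2, so grad_V(x) = x and the particles contract toward the origin.
pts = np.random.default_rng(1).standard_normal((256, 2))
pts_out = push_particles(pts, lambda x: x)
```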

11:35

Alfonso Lobos

joint work with Paul Grigas, Nathan Vermeersch

Stochastic In-Face Frank-Wolfe Methods for Non-Convex Optimization and Sparse Neural Network Training

The Frank-Wolfe (FW) method and its extensions are well-suited for delivering solutions with desirable structural properties, such as sparsity or low-rank structure. We adapt the methodology of in-face directions within the FW method to the setting of non-convex optimization. We are particularly motivated by applying stochastic versions of FW and its in-face extensions to the training of neural networks with sparse and/or low-rank properties. We develop theoretical computational guarantees, and we complement these results with extensive numerical experiments.
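As background on why FW-type methods yield sparse solutions, the sketch below shows a plain stochastic Frank-Wolfe step over an L1 ball, whose linear minimization oracle returns a single-coordinate vertex. It omits the in-face direction logic and the non-convex step-size rules from the talk; grad_estimate is a hypothetical stochastic gradient oracle and the step-size schedule is the classic 2/(t+2) rule, both assumptions for illustration.

```python
import numpy as np

def stochastic_fw(w0, grad_estimate, radius=1.0, iters=200):
    """Plain stochastic Frank-Wolfe over the L1 ball of the given radius."""
    w = w0.copy()
    for t in range(iters):
        g = grad_estimate(w)             # stochastic gradient sample
        i = np.argmax(np.abs(g))         # LMO over the L1 ball picks one coordinate,
        v = np.zeros_like(w)             # giving the vertex -radius * sign(g_i) * e_i
        v[i] = -radius * np.sign(g[i])
        gamma = 2.0 / (t + 2)            # classic FW step size
        w = (1 - gamma) * w + gamma * v  # convex combination keeps the iterate feasible
    return w
```

Because each iterate is a convex combination of at most t vertices, the iterates have at most t nonzero coordinates, which is the sparsity property the abstract refers to.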