Mon.1 11:00–12:15 | H 3004 | BIG

Non-Convex Optimization for Neural Networks

Chair: Mingyi Hong | Organizers: Ruoyu Sun, Mingyi Hong
11:00

Ruoyu Sun

Landscape of Over-parameterized Networks

Non-convexity of neural networks may cause bad local minima, but a recent conjecture is that over-parameterization can smooth the landscape. We prove that for any continuous activation function, the loss function has no bad strict local minimum, both in the regular sense and in the sense of sets. This result holds for any convex and continuous loss function; the data samples are only required to be distinct in at least one dimension. Furthermore, we show that bad local minima do exist for a class of activation functions.
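
As a hypothetical illustration of the setting (not part of the talk), the sketch below builds the empirical loss of an over-parameterized one-hidden-layer network with a continuous activation (tanh) and a convex, continuous loss (squared error); the result described above concerns the landscape of such losses in general, not this particular instance.

import numpy as np

# Empirical loss L(W, v) = 0.5 * sum_i (v^T tanh(W x_i) - y_i)^2
# for a one-hidden-layer network; "over-parameterized" here means the
# hidden width m is at least the number of samples n (an assumption of
# this sketch, chosen for illustration).
def empirical_loss(W, v, X, y, activation=np.tanh):
    preds = activation(X @ W.T) @ v        # network outputs, shape (n,)
    return 0.5 * np.sum((preds - y) ** 2)  # convex loss of the outputs

n, d, m = 5, 3, 8                          # m >= n: over-parameterized width
rng = np.random.default_rng(0)
X = rng.normal(size=(n, d))                # samples distinct in at least one dimension
y = rng.normal(size=n)
W, v = rng.normal(size=(m, d)), rng.normal(size=m)
print(empirical_loss(W, v, X, y))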

11:25

Sijia Liu

joint work with Mingyi Hong

On the Convergence of a Class of Adam-Type Algorithms for Non-Convex Optimization

We study a class of adaptive gradient-based momentum algorithms, called "Adam-type" algorithms, that update the search direction and the learning rate simultaneously using past gradients. The convergence of these algorithms for non-convex problems remains an open question. We provide a set of mild sufficient conditions that guarantee the convergence of Adam-type methods. We prove that under our derived conditions, these methods achieve a convergence rate of O(log(T)/sqrt(T)) for non-convex optimization. We also show that the conditions are essential, in the sense that violating them may cause divergence.
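
For illustration only, the following sketch shows a generic Adam-type update in which the search direction and the per-coordinate learning rate are both built from past gradients; the coefficients and the sufficient conditions analyzed in the talk are not encoded here.

import numpy as np

# One generic "Adam-type" step (hypothetical parameters, not the talk's exact scheme):
# m accumulates past gradients into a search direction, v accumulates their squares
# into an adaptive, per-coordinate step size.
def adam_type_step(theta, grad, m, v, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    m = beta1 * m + (1 - beta1) * grad            # momentum / search direction
    v = beta2 * v + (1 - beta2) * grad ** 2       # second-moment estimate of past gradients
    theta = theta - lr * m / (np.sqrt(v) + eps)   # adaptive step
    return theta, m, v

# Toy usage on a non-convex scalar objective f(x) = x^4 - 3x^2.
theta, m, v = np.array([2.0]), np.zeros(1), np.zeros(1)
for _ in range(2000):
    grad = 4 * theta ** 3 - 6 * theta
    theta, m, v = adam_type_step(theta, grad, m, v)
print(theta)  # ends near a stationary point of the toy objective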

11:50

Alex Schwing

Optimization for GANs via Sliced Distance

In the Wasserstein GAN, the Wasserstein distance is commonly transformed to its dual via Kantorovich-Rubinstein duality and then optimized. However, the dual problem requires a Lipschitz condition that is hard to impose. One idea is to solve the primal problem directly, but this is computationally expensive and has exponential sample complexity. We propose instead to solve the problem using the sliced and max-sliced Wasserstein distances, which enjoy polynomial sample complexity. Our analysis shows that this approach is easy to train, requires little tuning, and generates appealing results.
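
As a rough, hypothetical sketch (not the talk's implementation), the sliced Wasserstein distance between two equally sized sample sets can be estimated by projecting the samples onto random directions and solving each resulting one-dimensional transport problem by sorting; a crude max-sliced variant simply takes the worst direction among the sampled candidates, whereas in practice that direction would be optimized.

import numpy as np

def sliced_wasserstein(X, Y, n_proj=100, seed=None):
    # Monte-Carlo estimate of the sliced W1 distance between two empirical
    # distributions X, Y (arrays of shape (n, d) with the same n, an
    # assumption of this sketch). Each random direction reduces the problem
    # to a 1-D transport, solved exactly by sorting.
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    dists = []
    for _ in range(n_proj):
        theta = rng.normal(size=d)
        theta /= np.linalg.norm(theta)                  # random direction on the sphere
        x_proj, y_proj = np.sort(X @ theta), np.sort(Y @ theta)
        dists.append(np.mean(np.abs(x_proj - y_proj)))  # 1-D W1 via order statistics
    return np.mean(dists)

def max_sliced_wasserstein(X, Y, n_proj=1000, seed=None):
    # Crude max-sliced variant: worst direction among random candidates.
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    best = 0.0
    for _ in range(n_proj):
        theta = rng.normal(size=d)
        theta /= np.linalg.norm(theta)
        x_proj, y_proj = np.sort(X @ theta), np.sort(Y @ theta)
        best = max(best, np.mean(np.abs(x_proj - y_proj)))
    return best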