We consider the problem of learning a concave function defined on a compact set, using linear combinations of “bump-like” components (neurons). The resulting empirical risk minimization problem is highly non-convex. We prove that, in the limit in which the number of neurons diverges, the evolution of gradient descent converges to a Wasserstein gradient flow in the space of probability distributions, and that this gradient flow can be described by a viscous porous medium equation. Using a displacement convexity property of this flow, we obtain an exponential, dimension-free convergence rate for gradient descent.
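As a concrete illustration of this setup (a minimal numerical sketch, not the paper's code), the following fits a concave target on $[0,1]$ with an equal-weight mixture of Gaussian bumps, training only the bump centers by plain gradient descent; the target function, bump width, step size, and sample sizes are illustrative choices. The empirical distribution of the centers is the object whose large-$N$ evolution the Wasserstein gradient flow describes.

```python
# Minimal sketch: learn a concave target on [0, 1] with an equal-weight mixture of
# Gaussian "bumps", optimizing only the bump centers by gradient descent.
# All numerical choices below are illustrative, not taken from the paper.
import numpy as np

rng = np.random.default_rng(0)
N, n, delta, lr, steps = 200, 400, 0.05, 0.02, 5000

x = rng.uniform(0.0, 1.0, size=n)        # training inputs on the compact set [0, 1]
y = 6.0 * x * (1.0 - x)                  # concave target (integrates to 1 on [0, 1])
w = rng.uniform(0.0, 1.0, size=N)        # bump centers: the "particles" / neurons

def bumps(x, w):
    # Normalized Gaussian bump of width delta centered at each w_i, evaluated at each x_j.
    c = 1.0 / (delta * np.sqrt(2.0 * np.pi))
    return c * np.exp(-((x[:, None] - w[None, :]) ** 2) / (2.0 * delta ** 2))

for _ in range(steps):
    K = bumps(x, w)                      # shape (n, N)
    resid = K.mean(axis=1) - y           # residual of the equal-weight mixture fit
    # Gradient of the empirical squared loss with respect to each center w_i.
    grad_w = (resid[:, None] * K * (x[:, None] - w[None, :]) / delta ** 2).mean(axis=0) / N
    w -= lr * N * grad_w                 # mean-field scaling: step size proportional to N

print("final mean squared error:", np.mean((bumps(x, w).mean(axis=1) - y) ** 2))
```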
An Euler discretization of the Langevin diffusion is known to converge to the global minimizers of certain convex and non-convex optimization problems. We show that this property holds for any suitably smooth diffusion and that different diffusions are suitable for optimizing different classes of convex and non-convex functions. This allows us to design diffusions suitable for globally optimizing convex and non-convex functions not covered by the existing Langevin theory. Our non-asymptotic analysis delivers computable optimization and integration error bounds.
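For concreteness, here is a minimal sketch of the Euler discretization of the overdamped Langevin diffusion, $x_{k+1} = x_k - \eta \nabla f(x_k) + \sqrt{2\eta/\beta}\,\xi_k$ with $\xi_k \sim N(0, I)$, applied to a toy non-convex objective; the double-well objective, step size $\eta$, and inverse temperature $\beta$ below are illustrative choices, not values from the abstract.

```python
# Minimal sketch: unadjusted Langevin algorithm (Euler discretization of the
# overdamped Langevin diffusion) on a toy double-well objective. The injected
# Gaussian noise lets the iterates escape the local minimum near x = +1 and
# reach the global minimum near x = -1. All constants are illustrative.
import numpy as np

rng = np.random.default_rng(1)

def f(x):
    # Double-well potential; the tilt 0.3*x makes the left well the global minimum.
    return (x ** 2 - 1.0) ** 2 + 0.3 * x

def grad_f(x):
    return 4.0 * x * (x ** 2 - 1.0) + 0.3

eta, beta, steps = 1e-3, 4.0, 200_000
x = 1.0                                   # start in the basin of the *local* minimum
best = x

for _ in range(steps):
    x = x - eta * grad_f(x) + np.sqrt(2.0 * eta / beta) * rng.standard_normal()
    if f(x) < f(best):
        best = x

print("best iterate:", best, "f(best):", f(best))
```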
Joint work with Murat Erdogdu, Asuman Ozdaglar, and Pablo Parrilo.
Semidefinite programs (SDPs) with diagonal constraints arise in many optimization and machine learning problems, such as Max-Cut, community detection, and group synchronization. Although SDPs can be solved to arbitrary precision in polynomial time, generic convex solvers do not scale well with the dimension of the problem. To address this issue, Burer and Monteiro \cite{burer2003nonlinear} proposed reducing the dimension of the problem via a low-rank factorization and solving the resulting non-convex problem instead. In this paper, we present coordinate-ascent-based methods for solving this non-convex problem with provable convergence guarantees. In particular, we prove that the block-coordinate maximization algorithm applied to the non-convex Burer-Monteiro formulation converges globally to a first-order stationary point at a sublinear rate, without any assumptions on the problem. We also show that the block-coordinate maximization algorithm is locally linearly convergent to a local maximum under a local weak strong concavity assumption, and we establish that this assumption generically holds when the rank of the factorization is sufficiently large. Furthermore, we propose an algorithm based on the block-coordinate maximization and Lanczos methods that is guaranteed to return a solution providing a $1-O(1/r)$ approximation to the original SDP, where $r$ is the rank of the factorization, and we quantify the number of iterations required to obtain such a solution. This approximation ratio is known to be optimal under the unique games conjecture.
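For concreteness, the following minimal sketch (not the paper's implementation) runs block-coordinate maximization on a diagonally constrained SDP in Burer-Monteiro form, $\max \langle C, VV^\top\rangle$ subject to unit-norm rows of the $n \times r$ factor $V$; fixing all rows but $v_i$, the block subproblem is linear in $v_i$, so its exact maximizer is $v_i = g_i/\|g_i\|$ with $g_i = \sum_{j \neq i} C_{ij} v_j$. The random cost matrix, dimension $n$, and rank $r$ are illustrative choices.

```python
# Minimal sketch: block-coordinate maximization for a diagonally constrained SDP
# in Burer-Monteiro form, maximize <C, V V^T> subject to ||v_i|| = 1 for each row.
# Each row update solves its block subproblem exactly, so the objective never decreases.
import numpy as np

rng = np.random.default_rng(2)
n, r, sweeps = 50, 8, 100

C = rng.standard_normal((n, n))
C = (C + C.T) / 2.0                       # symmetric cost matrix
np.fill_diagonal(C, 0.0)                  # diagonal terms are constant under the constraints

V = rng.standard_normal((n, r))
V /= np.linalg.norm(V, axis=1, keepdims=True)   # feasible start: unit-norm rows

for _ in range(sweeps):
    for i in range(n):
        g = C[i] @ V                      # g_i = sum_{j != i} C_ij v_j  (C_ii = 0)
        norm = np.linalg.norm(g)
        if norm > 1e-12:                  # skip the degenerate case g_i = 0
            V[i] = g / norm               # exact maximizer of the block subproblem

print("objective <C, V V^T>:", np.trace(C @ V @ V.T))
```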