Mon.1 11:00–12:15 | H 0104 | BIG

Optimization for Data Science (1/2)

Chair: Jong-Shi Pang
Organizers: Ying Cui, Jong-Shi Pang
11:00

Jong-Shi Pang

joint work with Ying Cui, Ziyu He

Multi-composite Nonconvex Optimization for Learning Deep Neural Networks

We present in this paper a novel deterministic algorithmic framework that enables the computation of a directional stationary solution of the empirical deep neural network training problem, formulated as a multi-composite optimization problem with coupled nonconvexity and nondifferentiability. To our knowledge, this is the first time that such a sharp kind of stationary solution has been shown to be provably computable for a nonsmooth deep neural network. Numerical results from a Matlab implementation demonstrate the effectiveness of the framework for solving reasonably sized networks.
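
To fix ideas, one hedged way to write such a multi-composite training problem and the stationarity notion invoked above (an illustrative formulation, not necessarily the authors' exact one) is, in LaTeX,

  \min_{W=(W_1,\dots,W_L)} \; F(W) \;:=\; \frac{1}{N}\sum_{i=1}^{N} \varphi\big( f_L(\cdots f_1(x_i; W_1) \cdots ; W_L),\, y_i \big) \;+\; \lambda\, P(W),

where each layer map f_\ell is piecewise smooth (e.g., built from ReLU activations), \varphi is the training loss, and P is a possibly nonsmooth regularizer. A point W^* is directionally (d-)stationary if the directional derivative satisfies F'(W^*; d) \ge 0 for every direction d, a sharper requirement than Clarke stationarity for such nonsmooth nonconvex objectives.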

11:25

Defeng Sun

A majorized proximal point dual Newton algorithm for nonconvex statistical optimization problems

We consider high-dimensional nonconvex statistical optimization problems, such as the regularized square-root Lasso problem, in which the loss terms in the objective may be nonsmooth or non-Lipschitzian and the regularization terms are differences of convex functions. We introduce a majorized proximal point dual Newton algorithm (mPPDNA) for solving these large-scale problems. We construct proper majorized proximal terms that are conducive to the design of an efficient dual-based semismooth Newton method with low computational complexity. Extensive numerical results are provided.
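
As a rough illustration of the outer majorize-then-proximal-point structure described above, the Python sketch below applies it to a simplified problem: a smooth least-squares loss (standing in for the nonsmooth losses of the talk) with an l1-minus-quadratic difference-of-convex regularizer chosen purely for illustration, and with the convex subproblems solved by plain proximal gradient rather than by the dual semismooth Newton method of mPPDNA.

import numpy as np

def soft_threshold(v, t):
    # proximal operator of t*||.||_1
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def majorized_prox_point_sketch(A, b, lam, sigma=1.0, outer_iters=50, inner_iters=200):
    # Regularizer: lam*||x||_1 - h(x) with h(x) = 0.5*mu*||x||^2 (illustrative DC choice).
    # Outer loop: linearize the concave part -h at the current iterate and add a
    # proximal term; inner loop: solve the resulting strongly convex subproblem by
    # proximal gradient (NOT the dual semismooth Newton solver of the talk).
    n = A.shape[1]
    mu = 0.1 * lam
    x = np.zeros(n)
    L = np.linalg.norm(A, 2) ** 2 + sigma   # Lipschitz constant of the smooth part
    for _ in range(outer_iters):
        grad_h = mu * x                      # gradient of h at the outer iterate
        z = x.copy()
        for _ in range(inner_iters):
            # gradient of 0.5*||Az-b||^2 - grad_h'z + (sigma/2)*||z-x||^2
            g = A.T @ (A @ z - b) - grad_h + sigma * (z - x)
            z = soft_threshold(z - g / L, lam / L)
        x = z
    return x

For example, majorized_prox_point_sketch(np.random.randn(200, 500), np.random.randn(200), lam=0.1) returns a sparse estimate; the point of the construction is only that each outer subproblem is convex and strongly convex, which is what makes a fast Newton-type inner solver attractive at scale.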

11:50

Gesualdo Scutari

joint work with Ying Sun, Ye Tian, Guang Cheng

Distributed inference over networks: geometrically convergent algorithms and statistical guarantees

We study distributed high-dimensional M-estimation problems over networks, modeled as arbitrary graphs (with no centralized node). We design the first class of distributed algorithms with convergence guarantees for this class of Empirical Risk Minimization (ERM) problems. We prove that, under mild technical assumptions, the proposed algorithm converges at a linear rate to a solution of the ERM problem that is statistically optimal. Furthermore, for a fixed network topology, this linear rate is invariant when the problem dimension scales properly with respect to the sample size.
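
To make the setting concrete, the Python sketch below runs a generic decentralized gradient-tracking scheme for a toy least-squares ERM over a ring graph; the graph, the Metropolis-style mixing matrix, and the quadratic local losses are illustrative stand-ins, and this is not claimed to be the exact algorithm analyzed in the talk.

import numpy as np

def metropolis_weights_ring(n_agents):
    # Doubly stochastic mixing matrix for a ring graph (each node has two neighbors).
    W = np.zeros((n_agents, n_agents))
    for i in range(n_agents):
        for j in ((i - 1) % n_agents, (i + 1) % n_agents):
            W[i, j] = 1.0 / 3.0
        W[i, i] = 1.0 - W[i].sum()
    return W

def gradient_tracking(A_list, b_list, W, alpha=0.01, iters=1000):
    # Each agent i holds f_i(x) = 0.5*||A_i x - b_i||^2 and communicates only with neighbors.
    n_agents, d = len(A_list), A_list[0].shape[1]
    grad = lambda i, x: A_list[i].T @ (A_list[i] @ x - b_list[i])
    X = np.zeros((n_agents, d))                                # local copies of x
    G = np.vstack([grad(i, X[i]) for i in range(n_agents)])    # local gradients
    Y = G.copy()                                               # gradient trackers
    for _ in range(iters):
        X = W @ X - alpha * Y                                  # mix with neighbors, step along tracked gradient
        G_new = np.vstack([grad(i, X[i]) for i in range(n_agents)])
        Y = W @ Y + G_new - G                                  # track the network-average gradient
        G = G_new
    return X.mean(axis=0)

For example, with A_list = [np.random.randn(20, 5) for _ in range(10)], b_list = [A @ np.ones(5) for A in A_list], and W = metropolis_weights_ring(10), gradient_tracking(A_list, b_list, W) approximately recovers the all-ones vector; gradient tracking of this kind is one standard route to the geometric (linear) convergence rates that the abstract refers to.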