Thu.1 09:00–10:15 | H 0104 | NON

Recent Advances of First-Order Methods for Nonlinear Problems (1/2)

Chair: Shiqian Ma
Organizers: Yangyang Xu, Shiqian Ma
9:00

Haihao Lu

Q-Stochastic Gradient Descent: a New Stochastic First-Order Method for Ordered Empirical Loss Minimization

We propose a new approach to stochastic first-order methods for tackling empirical loss minimization problems such as those that arise in machine learning. We develop a theory and a computationally simple way to construct a gradient estimate that is purposely biased. On the theory side, we show that the proposed algorithm is guaranteed to converge at a sublinear rate to the global optimum of the modified objective function. On the computational side, we present numerical experiments that confirm the usefulness of the new method compared with standard SGD in different settings.
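The abstract does not spell out how the biased estimate is built; the Python sketch below shows one generic way a purposely biased gradient estimate can arise in an SGD step, namely by sampling a minibatch and averaging gradients only over the examples with the largest current losses. The helpers loss_val and loss_grad and the parameters batch_size, top_q, and lr are assumptions for illustration only, not the authors' construction.

import numpy as np

def biased_sgd_step(w, X, y, loss_val, loss_grad, batch_size=32, top_q=8, lr=0.1):
    # Sample a minibatch uniformly at random.
    idx = np.random.choice(X.shape[0], size=batch_size, replace=False)
    # Evaluate per-example losses and keep the top_q examples with the largest losses.
    losses = np.array([loss_val(w, X[i], y[i]) for i in idx])
    hardest = idx[np.argsort(losses)[-top_q:]]
    # Average gradients over the hardest examples only: a purposely biased estimate.
    g = np.mean([loss_grad(w, X[i], y[i]) for i in hardest], axis=0)
    # Plain descent step with the biased estimate (illustrative sketch only).
    return w - lr * g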

9:25

Tianbao Yang

joint work with Yi Xu, Qi Qi, Qihang Lin, Rong Jin

Stochastic Optimization for DC Functions and Non-smooth Non-convex Regularizers with Non-asymptotic Convergence

We propose new stochastic optimization algorithms and establish first-order convergence theory for minimizing a broad family of DC (difference-of-convex) functions. We improve on existing algorithms and theory for stochastic optimization of DC functions from both practical and theoretical perspectives. Moreover, we extend the proposed stochastic algorithms to problems with a general non-convex, non-differentiable regularizer, which does not necessarily admit a DC decomposition but enjoys an efficient proximal mapping.
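For intuition, a DC objective can be written as $g(x)-h(x)$ with $g,h$ convex, and a standard stochastic DC-type step linearizes $h$ at the current point, takes a stochastic gradient step on $g$, and applies the proximal mapping of the regularizer. The sketch below is a generic step under these assumptions; the callables grad_g_stoch, grad_h, and prox_r and the step size lr are illustrative, not the authors' exact algorithms.

def stochastic_dc_step(x, grad_g_stoch, grad_h, prox_r, lr=0.1):
    # (Sub)gradient of the concave part -h, obtained by linearizing h at x.
    v = grad_h(x)
    # Stochastic gradient of the convex part g at x.
    g = grad_g_stoch(x)
    # Forward step on the convex majorant, then the proximal mapping of the
    # (possibly non-convex, non-differentiable) regularizer r.
    return prox_r(x - lr * (g - v), lr)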

9:50

N. Serhat Aybat

joint work with Erfan Y. Hamedani, Afrooz Jalilzadeh, Uday Shanbhag

A Doubly-Randomized Block-Coordinate Primal-Dual Method for Large-scale Convex-Concave Saddle Point Problems: Acceleration via Variance-Reduction

We consider computing a saddle point $(x^*,y^*)$ of convex-concave $L(x,y)=\sum_{n=1}^N \Phi_n (x,y)$. This problem class includes convex optimization with a large number of constraints. To contend with the challenges in computing full gradients, we employ a primal-dual scheme in which randomly selected primal and dual variable blocks, $x_i$ and $y_j$, are updated at every iteration. When $N$ is large, we can utilize an increasing batch size of the gradients of $\Phi_n$ to achieve the optimal rate of $O(1/k)$ in terms of $L(x^k,y^*)-L(x^*,y^k)$. Preliminary results on QCQPs with many constraints look promising.
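As a rough illustration of the doubly-randomized structure (not the authors' exact scheme, which also involves acceleration and variance reduction), the Python step below samples a batch of components $\Phi_n$, picks one primal block and one dual block at random, and takes a gradient descent step in the primal block and an ascent step in the dual block. The callables grad_x_phi and grad_y_phi, the block representation, and the step sizes tau and sigma are hypothetical.

import numpy as np

def doubly_randomized_pd_step(x_blocks, y_blocks, grad_x_phi, grad_y_phi,
                              N, batch=8, tau=0.01, sigma=0.01):
    # Sample a batch of component indices n; in the talk, growing this batch
    # over the iterations is what drives the variance reduction.
    S = np.random.choice(N, size=batch, replace=False)
    # Pick one primal block i and one dual block j uniformly at random.
    i = np.random.randint(len(x_blocks))
    j = np.random.randint(len(y_blocks))
    # Sampled partial gradients with respect to the selected blocks.
    gx = np.mean([grad_x_phi(n, x_blocks, y_blocks, i) for n in S], axis=0)
    gy = np.mean([grad_y_phi(n, x_blocks, y_blocks, j) for n in S], axis=0)
    # Primal descent on block x_i, dual ascent on block y_j.
    x_blocks[i] = x_blocks[i] - tau * gx
    y_blocks[j] = y_blocks[j] + sigma * gy
    return x_blocks, y_blocks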