We propose a new stochastic first-order method for tackling empirical loss minimization problems such as those arising in machine learning. We develop the theory behind, and a computationally simple way to construct, a gradient estimate that is purposely biased. On the theory side, we show that the proposed algorithm is guaranteed to converge at a sublinear rate to the global optimum of the modified objective function. On the computational side, we present numerical experiments that confirm the usefulness of the new method compared with standard SGD in different settings.
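The abstract does not specify how the bias is constructed, so the following is only a minimal sketch of the general idea: a per-sample gradient that is deliberately truncated, making it biased for the original least-squares objective but an unbiased gradient of a modified (Huber-like) objective, whose global optimum the iterates then approach. All names and the truncation rule are illustrative assumptions, not the authors' construction.

```python
import numpy as np

def sgd_biased(X, y, w0, step, n_iters, tau=5.0, seed=0):
    """SGD with a purposely biased gradient estimate (illustrative sketch).

    Clipping the residual at tau biases the estimator relative to the
    original squared loss, but makes it an unbiased stochastic gradient
    of a modified (Huber-type) objective, which is still convex.
    """
    rng = np.random.default_rng(seed)
    w = w0.copy()
    n = X.shape[0]
    for k in range(n_iters):
        i = rng.integers(n)                    # sample one data point
        r = X[i] @ w - y[i]                    # residual on that sample
        r_mod = np.clip(r, -tau, tau)          # truncation introduces the bias
        w -= step / np.sqrt(k + 1) * r_mod * X[i]   # sublinear step-size decay
    return w
```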
joint work with Yi Xu, Qi Qi, Qihang Lin, Rong Jin
We propose new stochastic optimization algorithms and study their first-order convergence theory for minimizing a broad family of difference-of-convex (DC) functions. We improve the existing algorithms and theory of stochastic optimization for DC functions from both practical and theoretical perspectives. Moreover, we extend the proposed stochastic algorithms to problems with a general non-convex, non-differentiable regularizer that does not necessarily admit a DC decomposition but enjoys an efficient proximal mapping.
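As a rough illustration of the algorithmic template (not the authors' specific method), a stochastic DC scheme for f(x) = g(x) - h(x) + r(x) typically linearizes the concave part -h at the current iterate and takes a stochastic proximal-gradient step on the resulting convex model; the regularizer r enters only through its proximal mapping. The helper names below are hypothetical.

```python
import numpy as np

def stochastic_dca(grad_g_i, subgrad_h, prox_r, x0, n_samples, step, n_iters, seed=0):
    """Sketch of a stochastic DC algorithm for f(x) = g(x) - h(x) + r(x).

    g is an average of n_samples convex components, h is convex (so -h is
    concave and gets linearized), and r is handled via its proximal map.
    """
    rng = np.random.default_rng(seed)
    x = x0.copy()
    for k in range(n_iters):
        i = rng.integers(n_samples)            # sample a component of g
        v = grad_g_i(x, i) - subgrad_h(x)      # stochastic gradient of the convex model
        eta = step / np.sqrt(k + 1)
        x = prox_r(x - eta * v, eta)           # proximal step handles r
    return x

# Hypothetical usage: least squares plus a capped-ell_1 regularizer, written as
# the DC pair r(x) = lam*||x||_1 and h(x) = lam * sum_i max(|x_i| - theta, 0).
X, y = np.random.randn(200, 10), np.random.randn(200)
lam, theta = 0.1, 1.0
grad_g_i = lambda x, i: (X[i] @ x - y[i]) * X[i]
subgrad_h = lambda x: lam * np.sign(x) * (np.abs(x) > theta)
prox_r = lambda z, eta: np.sign(z) * np.maximum(np.abs(z) - eta * lam, 0.0)
x_hat = stochastic_dca(grad_g_i, subgrad_h, prox_r, np.zeros(10), 200, 0.5, 5000)
```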
joint work with Erfan Y. Hamedani, Afrooz Jalilzadeh, Uday Shanbhag
We consider computing a saddle point $(x^*,y^*)$ of a convex-concave function $L(x,y)=\sum_{n=1}^N \Phi_n(x,y)$. This problem class includes convex optimization with a large number of constraints. To contend with the challenge of computing full gradients, we employ a primal-dual scheme in which randomly selected primal and dual variable blocks, $x_i$ and $y_j$, are updated at every iteration. When $N$ is large, we can utilize an increasing batch size for the gradients of $\Phi_n$ to achieve the optimal rate of $O(1/k)$ in terms of the gap $L(x^k,y^*)-L(x^*,y^k)$. Preliminary results on QCQPs with many constraints are promising.
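A schematic of the scheme described above, under stated assumptions: one randomly chosen primal block takes a descent step and one dual block takes an ascent step, each using a mini-batch gradient over sampled $\Phi_n$ whose size grows with the iteration count. Step sizes, averaging, and extrapolation details from the actual method are omitted; the function names and the specific batch schedule are illustrative.

```python
import numpy as np

def rand_block_primal_dual(grad_x, grad_y, x0, y0, N, n_blocks_x, n_blocks_y,
                           tau, sigma, n_iters, seed=0):
    """Sketch: randomized block primal-dual steps for L(x,y) = sum_n Phi_n(x,y).

    grad_x(x, y, n) and grad_y(x, y, n) return the partial gradients of Phi_n;
    the batch of sampled components grows with k (increasing batch size).
    """
    rng = np.random.default_rng(seed)
    x, y = x0.copy(), y0.copy()
    blocks_x = np.array_split(np.arange(x.size), n_blocks_x)
    blocks_y = np.array_split(np.arange(y.size), n_blocks_y)
    for k in range(n_iters):
        m = min(N, k + 1)                      # increasing batch-size schedule
        batch = rng.choice(N, size=m, replace=False)
        i = rng.integers(n_blocks_x)           # random primal block
        j = rng.integers(n_blocks_y)           # random dual block
        gx = np.mean([grad_x(x, y, n) for n in batch], axis=0)
        gy = np.mean([grad_y(x, y, n) for n in batch], axis=0)
        x[blocks_x[i]] -= tau * gx[blocks_x[i]]    # descent on the primal block
        y[blocks_y[j]] += sigma * gy[blocks_y[j]]  # ascent on the dual block
    return x, y
```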