Wed.2 14:15–15:30 | H 0110 | BIG

Recent Advances in Convex and Non-Convex Optimization and Their Applications in Machine Learning (1/2)

Chair: Qihang Lin
Organizers: Tianbao Yang, Qihang Lin
14:15

Yiming Ying

Stochastic Optimization for AUC Maximization in Machine Learning

In this talk, I will present our recent work on developing novel SGD-type algorithms for AUC maximization. Our new algorithms accommodate general loss functions and penalty terms, a flexibility achieved through an innovative interplay between machine learning and applied mathematics. In contrast to previous approaches, which require high storage and per-iteration costs, our algorithms have space and per-iteration costs of a single datum while achieving optimal convergence rates.
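As a point of reference for the abstract above, the following is a minimal sketch of the standard pairwise surrogate formulation of AUC maximization; the surrogate loss \ell and the scoring function f_w are illustrative notation, not the specific formulation used in the talk:

    \min_{w}\; \mathbb{E}\big[\, \ell\big(f_w(x) - f_w(x')\big) \;\big|\; y = +1,\; y' = -1 \,\big]

Because the objective couples pairs of examples, a naive stochastic gradient step would require storing or revisiting past data; reformulations such as the min-max (saddle-point) form known for the squared surrogate are what make updates with the space and per-iteration cost of a single datum possible.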

14:40

Niao He

Sample Complexity of Conditional Stochastic Optimization

We study a class of compositional stochastic optimization problems involving conditional expectations, which find a wide spectrum of applications, particularly in reinforcement learning and robust learning. We establish the sample complexities of a modified Sample Average Approximation (SAA) and of a biased Stochastic Approximation (SA) for such problems under various structural assumptions, such as smoothness and quadratic growth conditions. Our analysis shows that both the modified SAA and the biased SA achieve the same complexity bound, and our experiments further demonstrate that these bounds are tight.
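For context, conditional stochastic optimization problems of the kind described above are commonly written in the following nested form, where \xi is an outer random variable and \eta is sampled conditionally on \xi; the notation is illustrative rather than taken from the talk:

    \min_{x \in \mathcal{X}}\; F(x) \;=\; \mathbb{E}_{\xi}\Big[\, f_\xi\big( \mathbb{E}_{\eta \mid \xi}\big[\, g_\eta(x, \xi) \,\big] \big) \,\Big]

The inner conditional expectation sits inside the (generally nonlinear) outer function f_\xi, so a single sample of \eta yields a biased gradient estimate; this is why the abstract speaks of a modified SAA and a biased SA rather than their standard counterparts.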

15:05

Javier Pena

A unified framework for Bregman proximal methods

We propose a unified framework for analyzing the convergence of Bregman proximal first-order algorithms for convex minimization. Our framework hinges on properties of the convex conjugate and gives novel proofs of the convergence rates of the Bregman proximal subgradient, Bregman proximal gradient, and a new accelerated Bregman proximal gradient algorithm under fairly general and mild assumptions. Our accelerated Bregman proximal gradient algorithm attains the best-known accelerated rate of convergence when suitable relative smoothness and triangle scaling assumptions hold.
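For readers less familiar with the setting, here is a minimal sketch of the (non-accelerated) Bregman proximal gradient step for minimizing f + g, with f relatively smooth with respect to a reference function h; the symbols below are standard, but the step-size rule and the accelerated variant from the talk are not reproduced here:

    D_h(x, y) \;=\; h(x) - h(y) - \langle \nabla h(y),\, x - y \rangle

    x_{k+1} \;\in\; \arg\min_{x}\; \Big\{ \langle \nabla f(x_k),\, x \rangle + g(x) + \tfrac{1}{t_k}\, D_h(x, x_k) \Big\}

Choosing h(x) = \tfrac{1}{2}\|x\|^2 recovers the usual proximal gradient method, since D_h then reduces to the squared Euclidean distance.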