joint work with Daniel Robinson, Rene Vidal
A promising direction to study accelerated methods has been emerging through connections with continuous dynamical systems. However, it is unclear whether the main properties of the dynamical system are preserved by such algorithms. We will show that the classical momentum method is a conformal symplectic integrator, i.e., it preserves the symplectic structure of the dynamical system. On the other hand, Nesterov's accelerated gradient is not a structure-preserving discretization. We will comment on generalizations of these methods and introduce a symplectic alternative to Nesterov's method.
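For concreteness, the contrast between the two updates can be seen in a minimal Python sketch (assuming an illustrative quadratic objective and hand-picked step size and momentum, not the setting analyzed in the talk): classical momentum evaluates the gradient at the current iterate, while Nesterov's method evaluates it at a look-ahead point.

```python
import numpy as np

def grad_f(x):
    # Gradient of a simple quadratic f(x) = 0.5 * x^T A x (illustrative choice).
    A = np.diag([1.0, 10.0])
    return A @ x

def heavy_ball(x0, eta=0.05, mu=0.9, iters=200):
    # Classical momentum (Polyak heavy ball): the gradient is taken
    # at the current iterate x_k.
    x, v = x0.copy(), np.zeros_like(x0)
    for _ in range(iters):
        v = mu * v - eta * grad_f(x)
        x = x + v
    return x

def nesterov(x0, eta=0.05, mu=0.9, iters=200):
    # Nesterov's accelerated gradient: the gradient is taken at the
    # look-ahead point x_k + mu * v_k.
    x, v = x0.copy(), np.zeros_like(x0)
    for _ in range(iters):
        v = mu * v - eta * grad_f(x + mu * v)
        x = x + v
    return x

x0 = np.array([3.0, -2.0])
print("heavy ball:", heavy_ball(x0))
print("Nesterov:  ", nesterov(x0))
```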
joint work with Manolis Tsakiris, Daniel Robinson, Rene Vidal
We consider the minimization of a general nonsmooth function over the unit sphere. We show that if the objective satisfies a certain Riemannian regularity condition relative to a set B, a Riemannian subgradient method with an appropriate initialization and a geometrically diminishing step size converges at a linear rate to the set B. We also show that this regularity condition holds for orthogonal dictionary learning (ODL) and Dual Principal Component Pursuit (DPCP) under suitable choices of B. This leads to significant improvements over the known convergence results for both ODL and DPCP.
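The following Python sketch illustrates a Riemannian subgradient step on the unit sphere with a geometrically diminishing step size; the step-size schedule, the toy data matrix Y, and the DPCP-style objective $\|Y^\top x\|_1$ are illustrative assumptions, not the initialization or regime analyzed in the talk.

```python
import numpy as np

def riemannian_subgradient_sphere(subgrad, x0, mu0=1e-1, beta=0.9, iters=100):
    # Riemannian subgradient method on the unit sphere with a geometrically
    # diminishing step size mu_k = mu0 * beta**k (illustrative schedule).
    x = x0 / np.linalg.norm(x0)
    mu = mu0
    for _ in range(iters):
        g = subgrad(x)
        # Project the Euclidean subgradient onto the tangent space at x.
        g_tan = g - (x @ g) * x
        # Take a step and retract back to the sphere by normalization.
        x = (x - mu * g_tan)
        x = x / np.linalg.norm(x)
        mu *= beta
    return x

# Toy DPCP-style objective f(x) = ||Y^T x||_1 with random data (hypothetical).
rng = np.random.default_rng(0)
Y = rng.standard_normal((5, 200))
subgrad = lambda x: Y @ np.sign(Y.T @ x)
x_hat = riemannian_subgradient_sphere(subgrad, rng.standard_normal(5))
print(x_hat)
```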
joint work with Alexander Gasnikov, Vladislav Elsukov
Alternating minimization (AM) optimization algorithms have been known for a long time. These algorithms assume that the decision variable is divided into several blocks and that minimization in each block can be done explicitly. AM algorithms have a number of applications in machine learning problems. In this paper, we are mostly motivated by optimal transport applications, which have become important for the machine learning community. The ubiquitous Sinkhorn algorithm can be seen as an alternating minimization algorithm for the dual of the entropy-regularized optimal transport problem. A sublinear $1/k$ convergence rate has been proved for AM algorithms. Despite having the same convergence rate as gradient descent, AM algorithms typically converge faster in practice, as they do not require a choice of step size and are adaptive to the local smoothness of the problem. At the same time, there are accelerated gradient methods which use a momentum term to achieve a faster convergence rate of $1/k^2$. We introduce an accelerated alternating minimization method with a $1/k^2$ convergence rate, where $k$ is the iteration counter. We also show that the proposed method is primal-dual, meaning that if we apply it to a dual problem, we can reconstruct the solution of the primal problem with the same convergence rate. We apply our method to the entropy-regularized optimal transport problem and show experimentally that it outperforms Sinkhorn's algorithm.
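As a reference point, here is a minimal Python sketch of Sinkhorn's algorithm viewed as alternating minimization over the two dual blocks of the entropy-regularized problem: each update solves one block exactly while the other is held fixed. The regularization parameter, marginals, and cost matrix are toy assumptions, and the accelerated method proposed in the talk is not implemented here.

```python
import numpy as np

def sinkhorn(a, b, C, gamma=0.1, iters=500):
    # Sinkhorn's algorithm as block-wise alternating minimization on the dual
    # of the entropy-regularized optimal transport problem.
    K = np.exp(-C / gamma)            # Gibbs kernel
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(iters):
        u = a / (K @ v)               # exact minimization in the first dual block
        v = b / (K.T @ u)             # exact minimization in the second dual block
    return u[:, None] * K * v[None, :]   # transport plan

# Toy marginals and cost matrix (illustrative only).
n = 4
a = np.full(n, 1.0 / n)
b = np.full(n, 1.0 / n)
C = np.abs(np.subtract.outer(np.arange(n), np.arange(n))).astype(float)
P = sinkhorn(a, b, C)
print(P.sum(axis=1), P.sum(axis=0))   # should approximately recover a and b
```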