We show that factored gradient descent with spectral initialization converges linearly for noisy and 1-bit low-rank matrix completion, in both the Frobenius and infinity norms, without the need to explicitly promote incoherence via regularization, at near-optimal sample and computational complexities. This is achieved by establishing an "implicit regularization" phenomenon of the algorithm via a leave-one-out perturbation argument.
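As a rough illustration of the algorithm named in the abstract (not code from the talk), the sketch below runs factored gradient descent with spectral initialization on a synthetic noiseless completion instance. All problem sizes, the observation probability, and the step size are assumptions chosen for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)
n, r, p = 50, 3, 0.8  # matrix size, rank, observation probability (illustrative choices)

# Ground-truth rank-r matrix and a Bernoulli(p) observation mask
M = rng.standard_normal((n, r)) @ rng.standard_normal((r, n))
mask = rng.random((n, n)) < p

# Spectral initialization: top-r SVD of the rescaled partially observed matrix
U, s, Vt = np.linalg.svd(mask * M / p)
X = U[:, :r] * np.sqrt(s[:r])
Y = Vt[:r, :].T * np.sqrt(s[:r])

eta = 0.3 / s[0]  # step size on the order of 1 / sigma_1, an assumed heuristic
for _ in range(1000):
    R = mask * (X @ Y.T - M)                    # residual on observed entries
    X, Y = X - eta * R @ Y, Y - eta * R.T @ X   # simultaneous factored GD step

err = np.linalg.norm(X @ Y.T - M) / np.linalg.norm(M)
```

Note that the loop applies plain gradient steps to the factors, with no explicit incoherence regularizer or projection, matching the abstract's point that none is needed.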
joint work with Chi Jin, Michael Jordan
Minimax optimization, especially in its general nonconvex-nonconcave formulation, has found extensive applications in modern machine learning frameworks. While gradient descent ascent (GDA) is widely used in practice to solve these problems, its theoretical behavior has been considered highly undesirable due to the possibility of convergence to points that are not local Nash equilibria. In this talk, we introduce the notion of local minimax as a more suitable alternative to the notion of local Nash equilibrium and show that, up to some degenerate points, the stable limit points of GDA are exactly the local minimax points.
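A minimal sketch of the GDA iteration discussed above, on a toy objective of my own choosing (not an example from the talk): f(x, y) = x²y − y²/2, whose point (0, 0) is a local minimax point since max_y f(x, y) = x⁴/2 is minimized at x = 0. The step sizes are assumptions.

```python
# Gradient descent on x, gradient ascent on y, for f(x, y) = x**2 * y - y**2 / 2
def gda(x, y, eta_x=0.01, eta_y=0.1, steps=5000):
    for _ in range(steps):
        gx = 2 * x * y       # df/dx
        gy = x**2 - y        # df/dy
        x, y = x - eta_x * gx, y + eta_y * gy  # simultaneous update
    return x, y

x_final, y_final = gda(0.5, 0.5)  # iterates approach the local minimax point (0, 0)
```

Here the ascent variable y quickly tracks its best response x², after which x slowly descends the envelope x⁴/2, so the iterates settle near the local minimax point.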
joint work with Mikhail Belkin
I will talk about recent advances in understanding Stochastic Gradient Descent (SGD) and its effectiveness in modern machine learning. I will discuss linear convergence of SGD for a broad class of convex as well as non-convex problems in the interpolated (over-parametrized) regime. I will also discuss some practical and theoretical implications for training neural networks.
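To make the interpolation setting concrete, here is a small sketch (my own illustration, not from the talk) of SGD on an over-parametrized least-squares problem, where more parameters than samples guarantee a zero-loss interpolating solution and single-sample SGD with a constant step size drives the loss to zero. Sizes and the step-size rule are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 20, 50                        # fewer samples than parameters: interpolation regime
A = rng.standard_normal((n, d))
w_star = rng.standard_normal(d)
b = A @ w_star                       # noiseless labels, so a zero-loss solution exists

w = np.zeros(d)
eta = 0.5 / np.max(np.sum(A**2, axis=1))  # constant step based on the largest row norm
for t in range(20000):
    i = rng.integers(n)                       # sample one data point
    w -= eta * (A[i] @ w - b[i]) * A[i]       # SGD step on (a_i @ w - b_i)**2 / 2

loss = np.mean((A @ w - b) ** 2)
```

With noise in the labels, the same constant-step SGD would only reach a noise floor; it is interpolation that makes the constant step size compatible with convergence to zero loss.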