Mon.2 13:45–15:00 | H 0104 | BIG

The Interface of Generalization and Optimization in Machine Learning (1/2)

Chair: Mahdi Soltanolkotabi
Organizers: Benjamin Recht, Mahdi Soltanolkotabi
13:45

Benjamin Recht

Training on the Test Set and Other Heresies

Conventional wisdom in machine learning taboos training on the test set, interpolating the training data, and optimizing to high precision. This talk will present evidence demonstrating that this conventional wisdom is wrong. I will additionally highlight commonly overlooked phenomena that imperil the reliability of current learning systems: surprising sensitivity to how data is generated and significant diminishing returns in model accuracy given increased compute resources. New best practices to mitigate these effects are critical for truly robust and reliable machine learning.

14:10

[canceled] Suriya Gunasekar

Characterizing Optimization Bias in Terms of Optimization Geometry

In the modern practice of machine learning, especially deep learning, many successful models have far more trainable parameters than training examples. Consequently, the optimization objective for such models has multiple minimizers that perfectly fit the training data, but most such minimizers will simply overfit or memorize the training data and perform poorly on new examples. When minimizing the training objective for such ill-posed problems, the implicit inductive bias of optimization algorithms like (S)GD plays a crucial role in learning. In this talk, I will specifically focus on how the characterization of this optimization bias depends on the update geometry of two families of local search algorithms: mirror descent w.r.t. strongly convex potentials and steepest descent w.r.t. general norms.
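
As a minimal sketch of one such characterization, consider (as an assumed illustrative setting, not necessarily the one used in the talk) underdetermined linear regression with loss L(w) = ||Xw - y||^2 and many interpolating solutions. Mirror descent with a strongly convex potential psi updates through the mirror map, and when its iterates converge to an interpolating solution, that limit is the Bregman projection of the initialization onto the set of interpolators:

\documentclass{article}
\usepackage{amsmath}
\DeclareMathOperator*{\argmin}{arg\,min}
\begin{document}
% Mirror descent update on L(w) = \|Xw - y\|^2 (illustrative underdetermined
% setting; potential \psi strongly convex, step sizes \eta_t).
\[
  w_{t+1} \;=\; (\nabla\psi)^{-1}\!\bigl(\nabla\psi(w_t) - \eta_t \nabla L(w_t)\bigr).
\]
% If the iterates converge to a zero-loss solution, the limit is the
% interpolator closest to the initialization w_0 in the Bregman divergence
% induced by \psi:
\[
  w_\infty \;=\; \argmin_{w \,:\, Xw = y} D_\psi(w, w_0),
  \qquad
  D_\psi(w, w') \;=\; \psi(w) - \psi(w') - \langle \nabla\psi(w'), w - w' \rangle.
\]
\end{document}

Steepest descent w.r.t. a general norm admits an analogous geometry-dependent characterization.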

14:35

Tengyuan Liang

New Thoughts on Adaptivity, Generalization and Interpolation

In the absence of explicit regularization, Neural Networks (NN) and Kernel Ridgeless Regression (KRR) have the potential to fit the training data perfectly. It has been observed empirically, however, that such interpolated solutions can still generalize well on test data. We show that training a NN with gradient flow simultaneously learns an adaptive RKHS representation and performs the global least-squares projection onto that adaptive RKHS. We then isolate a phenomenon of implicit regularization for minimum-norm interpolated solutions of KRR that can generalize well in the high-dimensional setting.
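
As a minimal sketch of the interpolating estimator in question, and assuming the kernel matrix K(X, X) is invertible, the minimum-norm interpolant is the ridgeless limit of kernel ridge regression:

\documentclass{article}
\usepackage{amsmath}
\DeclareMathOperator*{\argmin}{arg\,min}
\begin{document}
% Minimum-RKHS-norm interpolation (kernel "ridgeless" regression), assuming
% the kernel matrix K(X, X) on the training inputs is invertible.
\[
  \hat f \;=\; \argmin_{f \in \mathcal{H}} \;\|f\|_{\mathcal{H}}
  \quad \text{subject to} \quad f(x_i) = y_i, \; i = 1, \dots, n.
\]
% Closed form, equal to the limit of kernel ridge regression as the
% regularization parameter \lambda tends to zero:
\[
  \hat f(x) \;=\; K(x, X)\, K(X, X)^{-1} y
  \;=\; \lim_{\lambda \to 0^{+}} K(x, X)\,\bigl(K(X, X) + \lambda I\bigr)^{-1} y.
\]
\end{document}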