An Empirical Investigation of Training Speed and Generalization

Supervisors

Suitable for

MSc in Advanced Computer Science

Abstract

Predicting and understanding generalization in deep neural networks remains an important open problem in the field. Recent work suggests that it is possible to leverage properties of a neural network's optimization trajectory both to predict generalization performance and to construct generalization bounds. This line of inquiry also promises to shed light on the connection between training speed and generalization. The focus of this project will be to investigate an estimator of generalization error based on properties of minibatch stochastic gradient descent updates. Depending on the interests of the student, this may lead to new generalization bounds based on optimization trajectories, or to a principled early stopping criterion for stochastic gradient descent.
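To make the idea concrete, the sketch below shows one way such a quantity could be tracked during training. It is a minimal illustration only, not the project's actual estimator: it assumes the estimator aggregates per-minibatch losses evaluated before each SGD update (a simple training-speed signal), and the model, data loader, and hyperparameters are placeholders.

```python
# Minimal sketch: record the loss on each minibatch *before* the SGD update,
# and sum these "pre-update" losses per epoch as a crude training-speed signal.
import torch.nn as nn
import torch.optim as optim


def train_with_speed_estimate(model, loader, epochs=10, lr=0.1):
    """Train with SGD and return a per-epoch sum of pre-update minibatch losses."""
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.SGD(model.parameters(), lr=lr)
    epoch_speed_estimates = []

    for _ in range(epochs):
        pre_update_losses = []
        for inputs, targets in loader:
            optimizer.zero_grad()
            loss = criterion(model(inputs), targets)  # loss before this update is applied
            pre_update_losses.append(loss.item())
            loss.backward()
            optimizer.step()
        # A lower cumulative pre-update loss indicates faster progress on
        # freshly drawn minibatches; this is the kind of trajectory statistic
        # one might compare against held-out generalization error.
        epoch_speed_estimates.append(sum(pre_update_losses))

    return epoch_speed_estimates
```

Such a per-epoch statistic could, for instance, be correlated with validation error across training runs, or monitored as a candidate early stopping signal.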

Prerequisites: a strong background in Python (ideally with experience of a deep learning framework such as TensorFlow, PyTorch, or JAX), a strong background in probability, and familiarity with learning theory.