Hitting two birds with one stone: the optimizer for learning both weights and hyperparameters

Supervisors

Suitable for

MSc in Advanced Computer Science

Abstract

The performance of deep learning models depends heavily on their training hyperparameters (e.g. learning rate, weight decay). Tuning hyperparameters often requires laborious trial and error and/or strong expertise. Although query-based search strategies such as Bayesian optimization or evolutionary algorithms have been used to automate hyperparameter tuning, these methods typically require training the network from scratch at each query, and a substantial number of expensive queries is needed to achieve satisfactory performance. In this project, we aim to develop an optimizer that learns the hyperparameters and weights of deep neural networks in an online fashion during a single round of training. Specifically, we will draw inspiration from Metropolis-Hastings algorithms to evolve a good subset of hyperparameter samples, modify the conventional training procedure without introducing much overhead, and study various performance estimators for efficiently assessing hyperparameter performance with minimal additional training.
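To make the idea concrete, the sketch below shows one possible shape of such an optimizer, assuming a standard PyTorch setup: during a single training run, the learning rate is perturbed periodically and the new value is accepted or rejected with a Metropolis-Hastings-style rule based on a cheap validation-loss estimate, rolling the weights back on rejection. The function and helper names (`train_with_mh_lr`, `val_loss_estimate`, `endless_batches`), the choice of the learning rate as the single adapted hyperparameter, and the block/temperature settings are illustrative assumptions, not part of the project description.

```python
import copy
import math
import random

import torch
import torch.nn as nn


def endless_batches(loader):
    """Yield minibatches forever by cycling through the DataLoader."""
    while True:
        for batch in loader:
            yield batch


def val_loss_estimate(model, val_batch, loss_fn):
    """Cheap performance estimator: loss on a single held-out batch."""
    model.eval()
    with torch.no_grad():
        x, y = val_batch
        loss = loss_fn(model(x), y).item()
    model.train()
    return loss


def train_with_mh_lr(model, train_loader, val_batch,
                     init_lr=1e-2, num_blocks=20, steps_per_block=50,
                     proposal_scale=0.3, temperature=0.05):
    """Single training run in which the learning rate is adapted online via
    Metropolis-Hastings-style accept/reject moves over blocks of SGD steps
    (an illustrative sketch, not the project's prescribed algorithm)."""
    loss_fn = nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=init_lr)
    batches = endless_batches(train_loader)

    lr = init_lr
    current_score = val_loss_estimate(model, val_batch, loss_fn)

    for _ in range(num_blocks):
        # Snapshot the weights so a rejected proposal can be rolled back.
        snapshot = copy.deepcopy(model.state_dict())

        # Propose a perturbed learning rate via a log-normal random walk.
        proposed_lr = lr * math.exp(random.gauss(0.0, proposal_scale))
        for group in optimizer.param_groups:
            group["lr"] = proposed_lr

        # Train a short block of steps under the proposed hyperparameter.
        for _ in range(steps_per_block):
            x, y = next(batches)
            optimizer.zero_grad()
            loss_fn(model(x), y).backward()
            optimizer.step()

        # Accept or reject using the cheap validation estimate: a lower loss
        # is always accepted, a worse loss only with small probability.
        proposed_score = val_loss_estimate(model, val_batch, loss_fn)
        accept_prob = min(1.0, math.exp((current_score - proposed_score) / temperature))
        if random.random() < accept_prob:
            lr, current_score = proposed_lr, proposed_score
        else:
            model.load_state_dict(snapshot)
            for group in optimizer.param_groups:
                group["lr"] = lr
    return model
```

A design point this sketch highlights is the trade-off the project studies: longer blocks give a more reliable performance estimate per proposal but fewer proposals per training run, while cheaper estimators (here, a single held-out batch) keep the overhead of hyperparameter adaptation small.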

Prerequisites

Experience with deep neural network optimization and Bayesian inference methods; strong Python coding skills (preferably with PyTorch experience).