Improve differentiable neural architecture search via marginal likelihood estimator

Supervisors

Suitable for

MSc in Advanced Computer Science
Mathematics and Computer Science, Part C
Computer Science and Philosophy, Part C
Computer Science, Part C

Abstract

Neural architecture search (NAS) aims to automate the design of good neural network architectures for a given task and has achieved impressive performance, outperforming human expert designs on a variety of applications. One popular class of NAS approaches is the differentiable methods (e.g. [1]), which apply a continuous relaxation to architecture-related variables and alternate between updating the architecture parameters and the network weights via their respective gradients in a bi-level optimization. Differentiable NAS methods often enjoy low computational costs and are thus practically appealing. However, their search performance tends to be suboptimal because the metric used to assess an architecture's quality during training (often the mini-batch validation loss) correlates poorly with the architecture's true generalization performance. On the other hand, a recently proposed metric, the sum-over-training-losses (SoTL), has been theoretically shown to approximate the Bayesian marginal likelihood in the linear setting [2] and, empirically, to correlate well with generalization performance in the non-linear setting [2, 3]. In this project, we aim to develop a SoTL-based metric for updating the architecture parameters to improve the search quality of differentiable NAS methods. We would start by revisiting the theoretical connection between SoTL and the marginal likelihood, and then investigate possible ways to incorporate SoTL into the differentiable NAS framework and to compute its gradient for optimizing the architecture parameters.
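To make the two ingredients concrete, the following is a minimal PyTorch sketch of (a) a DARTS-style continuous relaxation, where a mixed operation weights candidate operations by a softmax over architecture parameters, and (b) a SoTL score accumulated over the weight-training steps. The names (MixedOp, train_and_score_sotl) and the training setup are purely illustrative assumptions, not the method of [1–3].

```python
import torch
import torch.nn as nn


class MixedOp(nn.Module):
    """Continuous relaxation: a softmax-weighted sum of candidate operations."""

    def __init__(self, ops):
        super().__init__()
        self.ops = nn.ModuleList(ops)
        # Architecture parameters (the "alpha" of differentiable NAS).
        self.alpha = nn.Parameter(torch.zeros(len(ops)))

    def forward(self, x):
        weights = torch.softmax(self.alpha, dim=-1)
        return sum(w * op(x) for w, op in zip(weights, self.ops))


def train_and_score_sotl(model, loader, criterion, epochs=1, lr=0.01):
    """Train the network weights and return the sum of training losses (SoTL).

    SoTL accumulates the mini-batch training losses observed over the course
    of weight training; lower SoTL is taken as evidence of better
    generalization, by analogy with the log marginal likelihood.
    """
    # Optimize only the network weights here, not the architecture parameters.
    weight_params = (p for n, p in model.named_parameters() if "alpha" not in n)
    optimizer = torch.optim.SGD(weight_params, lr=lr)

    sotl = 0.0
    for _ in range(epochs):
        for x, y in loader:
            optimizer.zero_grad()
            loss = criterion(model(x), y)
            loss.backward()
            optimizer.step()
            sotl += loss.item()
    return sotl
```

In a standard differentiable NAS loop, the architecture parameters would be updated separately using gradients of a validation loss; the question this project would study is how a SoTL-style objective such as the one sketched above can instead be differentiated with respect to the architecture parameters and used for that update.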

Prerequisites: Experience with deep neural networks and Bayesian inference; strong Python coding skills (preferably with experience in PyTorch).