Learning compatible invariant representations in mixture of experts

Supervisors

Suitable for

MSc in Advanced Computer Science

Abstract

Background:

Factored deep mixtures of experts (FDMs) can represent a potentially exponential number of network configurations. However, symmetries in the networks' architectures, such as permutations of the weights, can yield features that are incompatible with later layers. Incorporating symmetry breaking, or compatibility blocks that rotate features between layers, may overcome this limitation.
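For concreteness, below is a minimal sketch of one FDM layer, assuming a PyTorch-style implementation; the class and argument names (FDMoELayer, n_experts, and so on) are illustrative and not taken from the referenced paper.

import torch
import torch.nn as nn

class FDMoELayer(nn.Module):
    """One FDM layer: a per-layer gating network mixes the outputs of several
    expert sub-networks. Stacking L such layers with E experts each yields
    E**L possible expert combinations (hence the exponential number of
    network configurations)."""

    def __init__(self, d_in: int, d_out: int, n_experts: int):
        super().__init__()
        self.experts = nn.ModuleList([nn.Linear(d_in, d_out) for _ in range(n_experts)])
        self.gate = nn.Linear(d_in, n_experts)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        g = torch.softmax(self.gate(x), dim=-1)               # (B, E) gating weights
        h = torch.stack([e(x) for e in self.experts], dim=1)  # (B, E, d_out) expert outputs
        return torch.einsum("be,bed->bd", g, h)               # gating-weighted mixture

# Two stacked layers already give n_experts**2 implicit expert pairings.
model = nn.Sequential(
    FDMoELayer(32, 64, n_experts=4),
    nn.ReLU(),
    FDMoELayer(64, 10, n_experts=4),
)
out = model(torch.randn(8, 32))  # shape (8, 10)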

Focus:

Research question – how can factored deep mixtures of experts be made to work in practice, specifically with regard to training?

Expected contribution – methods for training FDMs more effectively by accommodating symmetries in their representations. The work is mostly empirical, consisting of implementing and testing simple architectures, with scope for formalising the approach depending on how independent the student is.

Method:

The student would build upon https://arxiv.org/pdf/1312.4314.pdf, drawing on recent work that explores symmetries in neural architectures, such as https://arxiv.org/pdf/2301.12780.pdf, https://arxiv.org/pdf/2106.07682.pdf, and https://arxiv.org/pdf/2209.04836.pdf.
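To make the permutation-symmetry issue concrete, the sketch below aligns the hidden features of two experts by finding the unit permutation that maximises their cross-correlation, in the spirit of the activation/weight matching used in the works above. It assumes NumPy and SciPy's Hungarian solver; the function and variable names are illustrative, not part of the proposal.

import numpy as np
from scipy.optimize import linear_sum_assignment

def match_units(feats_a: np.ndarray, feats_b: np.ndarray) -> np.ndarray:
    """Return a permutation p such that feats_b[:, p] best matches feats_a.

    feats_a, feats_b: (n_samples, n_units) hidden activations of two experts
    (or two independently trained copies) on the same inputs.
    """
    a = (feats_a - feats_a.mean(0)) / (feats_a.std(0) + 1e-8)
    b = (feats_b - feats_b.mean(0)) / (feats_b.std(0) + 1e-8)
    corr = a.T @ b / len(a)                # unit-by-unit cross-correlation matrix
    _, col = linear_sum_assignment(-corr)  # Hungarian matching: maximise total correlation
    return col

# Usage: permute expert B's features (equivalently, its outgoing weights)
# so that the next layer sees them in an order compatible with expert A.
feats_a = np.random.randn(1000, 64)
feats_b = feats_a[:, np.random.permutation(64)] + 0.1 * np.random.randn(1000, 64)
aligned_b = feats_b[:, match_units(feats_a, feats_b)]

A compatibility block, as described in the Background, could learn such an alignment (or a more general rotation) end-to-end rather than computing it post hoc.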

Goals:

  • Better training of FDMs by addressing their shortcomings.
  • Empirical evidence of their utility on larger datasets and models.

Pre-requisites: a deep learning course, plus an additional applied course (e.g. CV/RL/NLP/GDL/etc.).