
Explainable Neural Networks


Suitable for

MSc in Computer Science


Despite the increasing success of deep neural models, their general lack of interpretability remains a major drawback, with far-reaching consequences in safety-critical domains such as healthcare and law. Several approaches to explaining neural models have recently been introduced, such as feature-based explanations and natural language explanations. However, there are still several major open questions, such as:

Do explanations faithfully describe the decision-making processes of the models that they aim to explain?

Can explanations for the ground-truth label that are provided during training increase model robustness and generalization capabilities?

Can we do few-shot learning of natural language explanations?

What are the advantages and disadvantages of each of the multiple types of explanations (e.g., feature-based, example-based, natural language, surrogate models)?

Students will be able to pick one of these open questions or propose their own. The projects will also be co-supervised by Oana-Maria Camburu, a postdoctoral researcher with a strong background and contributions in this area.

Prerequisites: strong coding skills (preferably in deep learning platforms, such as PyTorch or TensorFlow) and deep learning knowledge.
