Publications

Real Time Image Saliency for Black Box Classifiers

In this work we develop a fast saliency detection method that can be applied to any differentiable image classifier. We train a masking model to manipulate the scores of the classifier by masking salient parts of the input image. Our model generalises well to unseen images and requires a single forward pass to perform saliency detection, making it suitable for use in real-time systems. We test our approach on the CIFAR-10 and ImageNet datasets and show that the produced saliency maps are easily interpretable, sharp, and free of artifacts. We suggest a new metric for saliency and test our method on the ImageNet object localisation task, achieving results that outperform other weakly supervised methods.
Piotr Dabkowski, Yarin Gal
arXiv, 2017
[arXiv] [BibTex]
NIPS, 2017
[Paper] [BibTex]
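
A minimal sketch of the masking idea: salient regions are suppressed by blending the image with an uninformative background (e.g. a blurred copy of itself) under the mask produced in a single forward pass. This is a hypothetical illustration under assumed array shapes, not the authors' implementation:

```python
import numpy as np

def mask_image(image, mask, background):
    """Blend the image with an uninformative background (e.g. a blurred
    copy) under a saliency mask in [0, 1]; mask = 1 keeps a pixel,
    mask = 0 replaces it with the background."""
    return mask * image + (1.0 - mask) * background
```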

Concrete Dropout

Dropout is used as a practical tool to obtain uncertainty estimates in large vision models and reinforcement learning (RL) tasks. But to obtain well-calibrated uncertainty estimates, a grid-search over the dropout probabilities is necessary - a prohibitive operation with large models, and an impossible one with RL. We propose a new dropout variant which gives improved performance and better-calibrated uncertainties. Relying on recent developments in Bayesian deep learning, we use a continuous relaxation of dropout's discrete masks. Together with a principled optimisation objective, this allows for automatic tuning of the dropout probability in large models, and as a result faster experimentation cycles. In RL this allows the agent to adapt its uncertainty dynamically as more data is observed. We analyse the proposed variant extensively on a range of tasks, and give insights into common practice in the field where larger dropout probabilities are often used in deeper model layers.
Yarin Gal, Jiri Hron, Alex Kendall
arXiv, 2017
[arXiv] [Software] [BibTex]
NIPS, 2017
[Paper] [BibTex]
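
The continuous relaxation at the heart of the method can be sketched in a few lines: a soft "drop" mask replaces the hard Bernoulli mask, so the dropout probability p becomes differentiable and can be tuned by gradient descent. A minimal NumPy sketch, not the released implementation (see [Software] for that):

```python
import numpy as np

def concrete_dropout(x, p, temperature=0.1, eps=1e-7, rng=np.random):
    """Relax the hard Bernoulli dropout mask to a differentiable one:
    z = sigmoid((log p - log(1-p) + log u - log(1-u)) / t) is a soft
    "drop" indicator, so gradients can flow to the dropout rate p."""
    u = rng.uniform(size=x.shape)
    drop_logit = (np.log(p + eps) - np.log(1.0 - p + eps)
                  + np.log(u + eps) - np.log(1.0 - u + eps))
    z = 1.0 / (1.0 + np.exp(-drop_logit / temperature))  # soft drop mask
    return x * (1.0 - z) / (1.0 - p)  # rescale to preserve the expectation
```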

What Uncertainties Do We Need in Bayesian Deep Learning for Computer Vision?

There are two major types of uncertainty one can model. Aleatoric uncertainty captures noise inherent in the observations. Epistemic uncertainty, on the other hand, accounts for uncertainty in the model -- uncertainty which can be explained away given enough data. Traditionally it has been difficult to model epistemic uncertainty in computer vision, but with new Bayesian deep learning tools this is now possible. We study the benefits of modelling epistemic vs. aleatoric uncertainty in Bayesian deep learning models for vision tasks. For this we present a Bayesian deep learning framework combining input-dependent aleatoric uncertainty with epistemic uncertainty. We study models under this framework on per-pixel semantic segmentation and depth regression tasks. Further, our explicit uncertainty formulation leads to new loss functions for these tasks, which can be interpreted as learned attenuation. This makes the loss more robust to noisy data, while also giving new state-of-the-art results on segmentation and depth regression benchmarks.
Alex Kendall, Yarin Gal
arXiv, 2017
[arXiv] [BibTex]
NIPS, 2017
[Paper] [BibTex]
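
For regression, the learned loss attenuation has a simple form: the network predicts a log-variance alongside each output, which down-weights noisy points. A sketch under assumed array shapes, not the authors' code:

```python
import numpy as np

def heteroscedastic_regression_loss(y, mu, log_var):
    """Aleatoric (data) uncertainty as learned loss attenuation: residuals
    are scaled by the predicted precision exp(-log_var), while the log_var
    term stops the model from predicting infinite variance everywhere."""
    return np.mean(0.5 * np.exp(-log_var) * (y - mu) ** 2 + 0.5 * log_var)
```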

Multi-Task Learning Using Uncertainty to Weigh Losses for Scene Geometry and Semantics

Numerous deep learning applications benefit from multi-task learning with multiple regression and classification objectives. In this paper we make the observation that the performance of such systems is strongly dependent on the relative weighting between each task's loss. Tuning these weights by hand is a difficult and expensive process, making multi-task learning prohibitive in practice. We propose a principled approach to multi-task deep learning which weighs multiple loss functions by considering the homoscedastic uncertainty of each task. This allows us to simultaneously learn various quantities with different units or scales in both classification and regression settings. We demonstrate our model learning per-pixel depth regression, semantic and instance segmentation from a monocular input image. Perhaps surprisingly, we show our model can learn multi-task weightings and outperform separate models trained individually on each task.
Alex Kendall, Yarin Gal, Roberto Cipolla
In Submission, 2017
[arXiv] [Software] [BibTex]
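
In a commonly used simplified form, the homoscedastic task weighting amounts to learning one log-variance per task. A sketch with hypothetical names, not the paper's exact formulation:

```python
import numpy as np

def uncertainty_weighted_loss(task_losses, log_vars):
    """Weigh each task loss by its learned homoscedastic uncertainty:
    L = sum_i exp(-s_i) * L_i + s_i, where s_i = log sigma_i^2 is a free
    parameter; high-uncertainty tasks are automatically down-weighted,
    and the +s_i term penalises claiming infinite uncertainty."""
    return sum(np.exp(-s) * L + s for L, s in zip(task_losses, log_vars))
```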

Dropout Inference in Bayesian Neural Networks with Alpha-divergences

To obtain uncertainty estimates with real-world Bayesian deep learning models, practical inference approximations are needed. Dropout variational inference (VI), for example, has been used for machine vision and medical applications, but VI can severely underestimate model uncertainty. Alpha-divergences are alternative divergences to VI’s KL objective, which are able to avoid VI’s uncertainty underestimation. But these are hard to use in practice: existing techniques can only use Gaussian approximating distributions and require existing models to be changed radically, and thus are of limited use for practitioners. We propose a re-parametrisation of the alpha-divergence objectives, deriving a simple inference technique which, together with dropout, can easily be implemented with existing models by simply changing the model’s loss. We demonstrate improved uncertainty estimates and accuracy compared to VI in dropout networks. We study our model’s epistemic uncertainty far away from the data using adversarial images, showing that these can be distinguished from non-adversarial images by examining our model’s uncertainty.
Yingzhen Li, Yarin Gal
ICML, 2017
[Paper] [arXiv] [BibTex]
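
The re-parametrised objective can be sketched in a few lines: K stochastic dropout passes are combined through an alpha-scaled log-sum-exp rather than the plain average used by standard VI. A sketch under assumed inputs, not the paper's reference implementation:

```python
import numpy as np
from scipy.special import logsumexp

def alpha_dropout_objective(neg_log_liks, alpha=0.5):
    """neg_log_liks: array [K, N] of per-point negative log-likelihoods
    from K dropout forward passes. The alpha-scaled log-sum-exp
    interpolates between VI-like behaviour (alpha -> 0) and EP-like
    behaviour (alpha = 1); minimise this plus a weight regulariser."""
    K = neg_log_liks.shape[0]
    per_point = logsumexp(-alpha * neg_log_liks, axis=0) - np.log(K)
    return -per_point.sum() / alpha
```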

Deep Bayesian Active Learning with Image Data

Even though active learning forms an important pillar of machine learning, deep learning tools are not prevalent within it. Relying on Bayesian approaches to deep learning, in this paper we combine recent advances in Bayesian deep learning into the active learning framework in a practical way. We develop an active learning framework for high dimensional data, a task which has been extremely challenging so far, with very sparse existing literature, and demonstrate it in melanoma (skin cancer) diagnosis.
Yarin Gal, Riashat Islam, Zoubin Ghahramani
Bayesian Deep Learning workshop, NIPS, 2016
[PDF] [Poster] [Code] [BibTex]
ICML, 2017
[Paper] [arXiv] [BibTex]
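
One of the acquisition functions explored in the paper, BALD (the mutual information between predictions and model parameters), is easy to estimate with dropout samples. A sketch assuming softmax outputs from T dropout passes:

```python
import numpy as np

def bald_scores(probs, eps=1e-12):
    """probs: [T, N, C] softmax outputs from T dropout forward passes.
    BALD = H[mean prediction] - mean per-pass entropy; points where the
    passes disagree (high epistemic uncertainty) score highest and are
    the ones selected for labelling."""
    mean_p = probs.mean(axis=0)
    entropy_of_mean = -(mean_p * np.log(mean_p + eps)).sum(axis=-1)
    mean_entropy = -(probs * np.log(probs + eps)).sum(axis=-1).mean(axis=0)
    return entropy_of_mean - mean_entropy
```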

Thesis: Uncertainty in Deep Learning

So I finally submitted my PhD thesis. In it I organised the already published results on how to obtain uncertainty in deep learning, and collected lots of bits and pieces of new research I had lying around (which I hadn't had the time to publish yet).
Yarin Gal
PhD Thesis, 2016
[PDF] [Blog post] [BibTex]

A Theoretically Grounded Application of Dropout in Recurrent Neural Networks

We present a new technique for recurrent neural network regularisation, relying on recent results at the intersection of Bayesian modelling and deep learning. Our RNN dropout variant is theoretically motivated and its effectiveness is demonstrated empirically, with the new approach improving on the single model state-of-the-art in language modelling with the Penn Treebank (73.4 test perplexity). This extends our arsenal of variational tools in deep learning.
Yarin Gal, Zoubin Ghahramani
arXiv, 2015
[arXiv] [Software] [BibTex]
Data-Efficient Machine Learning workshop, ICML, 2016
[Paper] [Poster]
NIPS, 2016
[Paper] [BibTex]
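
The key practical difference from naive RNN dropout is that the variational variant samples one dropout mask per sequence and reuses it at every time step. A minimal sketch under assumed shapes, not the released implementation:

```python
import numpy as np

def variational_sequence_dropout(xs, p=0.25, rng=np.random):
    """xs: [T, D] sequence of inputs. Sample a single inverted-scaled
    Bernoulli mask and apply the *same* mask at every time step, as the
    variational interpretation requires (naive dropout resamples the
    mask at each step, losing the Bayesian justification)."""
    keep = rng.binomial(1, 1.0 - p, size=xs.shape[1]) / (1.0 - p)
    return xs * keep  # the mask broadcasts over the time dimension
```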

Improving PILCO with Bayesian Neural Network Dynamics Models

We attempt to address PILCO's shortcomings by replacing its Gaussian process with a Bayesian deep dynamics model, while maintaining the framework's probabilistic nature and its data-efficiency benefits. This task poses several interesting difficulties. First, we have to handle small data, and neural networks are notoriously prone to overfitting. Furthermore, we must retain PILCO's ability to capture 1) dynamics model output uncertainty and 2) input uncertainty.
Yarin Gal, Rowan McAllister, Carl E. Rasmussen
Data-Efficient Machine Learning workshop, ICML, 2016
[Paper] [Abstract] [Poster] [BibTex]

On Modern Deep Learning and Variational Inference

Bayesian modelling and variational inference are rooted in Bayesian statistics, and easily benefit from the vast literature in the field. In contrast, deep learning lacks a solid mathematical grounding. Instead, empirical developments in deep learning are often justified by metaphors, evading the unexplained principles at play. In this paper we extend previous results casting modern deep learning models as performing approximate variational inference in a Bayesian setting, and survey open problems for future research.
Yarin Gal, Zoubin Ghahramani
Advances in Approximate Bayesian Inference workshop, NIPS, 2015
[PDF] [Poster] [BibTex]
We thank the workshop organisers for the travel award.

Rapid Prototyping of Probabilistic Models: Emerging Challenges in Variational Inference

Perhaps ironically, the deep learning community is far closer to our vision of "automated modelling" than the probabilistic modelling community. Many complex models in deep learning can be easily implemented and tested, while variational inference (VI) techniques require specialised knowledge and long development cycles, making them extremely challenging for non-experts. We discuss a possible solution lifted from manufacturing. Similar ideas in deep learning have led to rapid development in model complexity, speeding up the innovation cycle.
Yarin Gal
Advances in Approximate Bayesian Inference workshop, NIPS, 2015
[PDF] [Poster] [BibTex]
We thank the workshop organisers for the travel award.

Bayesian Convolutional Neural Networks with Bernoulli Approximate Variational Inference

We present an efficient Bayesian convolutional neural network (convnet). The model offers better robustness to over-fitting on small data and achieves a considerable improvement in classification accuracy compared to previous approaches. We give state-of-the-art results on CIFAR-10 following our insights.
Yarin Gal, Zoubin Ghahramani
arXiv, 2015
[arXiv] [Software] [BibTex]
ICLR workshop, 2016
[CMT Reviews] [OpenReview] [BibTex]

Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning

We show that dropout in multilayer perceptron models (MLPs) can be interpreted as a Bayesian approximation. This gives us tools to model uncertainty with dropout MLPs, extracting information from existing models that has so far been thrown away. This mitigates the problem of representing uncertainty in deep learning without sacrificing computational performance or test accuracy.
Yarin Gal, Zoubin Ghahramani
arXiv, 2015
[arXiv] [BibTex] [Appendix] [BibTex] [Software]
Invited for presentation at the First Deep Learning Symposium at NIPS 2015.
ICML, 2016
[Paper] [Presentation] [Poster] [BibTex]
We thank ICML for the travel award.
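
In practice the uncertainty estimate is obtained by leaving dropout switched on at test time and averaging stochastic forward passes. A minimal sketch assuming a hypothetical `model(x, training=True)` callable that keeps dropout active:

```python
import numpy as np

def mc_dropout_predict(model, x, T=50):
    """Monte Carlo dropout: T stochastic forward passes with dropout kept
    active approximate the predictive distribution; the sample mean is
    the prediction and the sample variance estimates model uncertainty."""
    samples = np.stack([model(x, training=True) for _ in range(T)])
    return samples.mean(axis=0), samples.var(axis=0)
```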

Dropout as a Bayesian Approximation: Insights and Applications

Deep learning techniques lack the ability to reason about uncertainty over the features. We show that a multilayer perceptron (MLP) with arbitrary depth and non-linearities, with dropout applied after every weight layer, is mathematically equivalent to an approximation to a well-known Bayesian model. This paper is a short version of the appendix of "Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning".
Yarin Gal, Zoubin Ghahramani
Deep Learning Workshop, ICML, 2015
[PDF] [Poster] [BibTex]

An Infinite Product of Sparse Chinese Restaurant Processes

We define a new process that gives a natural generalisation of the Indian buffet process (used for binary feature allocation) to categorical latent features. For this we take advantage of different limit parametrisations of the Dirichlet process and its generalisation, the Pitman–Yor process.
Yarin Gal, Tomoharu Iwata, Zoubin Ghahramani
10th Conference on Bayesian Nonparametrics (BNP), 2015
[Presentation] [BibTex]
We thank BNP for the travel award.

Improving the Gaussian Process Sparse Spectrum Approximation by Representing Uncertainty in Frequency Inputs

Standard sparse pseudo-input approximations to the Gaussian process (GP) cannot handle complex functions well. Sparse spectrum alternatives attempt to address this but are known to over-fit. We use variational inference for the sparse spectrum approximation to avoid both issues. We extend the approximate inference to the distributed and stochastic domains.
Yarin Gal, Richard Turner
ICML, 2015
[PDF] [Presentation] [Poster] [Software] [BibTex]
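
The sparse spectrum family approximates a stationary GP kernel with trigonometric basis functions of sampled frequencies; the paper's contribution is to keep a variational distribution over those frequencies rather than optimising them directly (which over-fits). A sketch of the underlying feature map, under assumed shapes:

```python
import numpy as np

def sparse_spectrum_features(X, W, b):
    """X: [N, D] inputs; W: [M, D] spectral frequencies; b: [M] phases.
    phi(x) = sqrt(2/M) cos(W x + b) gives a finite-rank approximation to
    a stationary GP kernel; in the paper W would be a random variable
    with a variational posterior rather than a fixed point estimate."""
    M = W.shape[0]
    return np.sqrt(2.0 / M) * np.cos(X @ W.T + b)
```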

Latent Gaussian Processes for Distribution Estimation of Multivariate Categorical Data

Multivariate categorical data occur in many applications of machine learning. One of the main difficulties with these vectors of categorical variables is sparsity: the number of possible observations grows exponentially with vector length, but dataset diversity might be poor in comparison. Recent models have achieved significant improvements on supervised tasks with such data by embedding observations in a continuous space to capture similarities between them. Building on these ideas we propose a Bayesian model for the unsupervised task of distribution estimation of multivariate categorical data.
Yarin Gal, Yutian Chen, Zoubin Ghahramani
Workshop on Advances in Variational Inference, NIPS, 2014
[PDF] [Poster] [Presentation] [BibTex]
ICML, 2015
[PDF] [Presentation] [Poster] [Software] [BibTex]
We thank Google DeepMind for the travel award.

Distributed Variational Inference in Sparse Gaussian Process Regression and Latent Variable Models

We develop parallel inference for sparse Gaussian process regression and latent variable models. These models are used to model functions in a principled way and for non-linear dimensionality reduction, in linear time complexity. Using parallel inference we allow the models to work on much larger datasets than before.
Yarin Gal, Mark van der Wilk, Carl E. Rasmussen
Workshop on New Learning Models and Frameworks for Big Data, ICML, 2014
[arXiv] [Presentation] [Software] [BibTex]
NIPS, 2014
[PDF] [BibTex]
We thank NIPS for the travel award.

Feature Partitions and Multi-View Clusterings

We define a new combinatorial structure that unifies Kingman's random partitions and Broderick, Pitman, and Jordan's feature frequency models. This structure underlies non-parametric multi-view clustering models, where data points are simultaneously clustered into different possible clusterings. The de Finetti measure is a product of paintbox constructions. Studying the properties of feature partitions allows us to understand the relations between the models they underlie and share algorithmic insights between them.
Yarin Gal, Zoubin Ghahramani
International Society for Bayesian Analysis (ISBA), 2014
[Link] [Poster]
We thank ISBA for the travel award.

Dirichlet Fragmentation Processes

We introduce a new class of models over trees based on the theory of fragmentation processes. The Dirichlet Fragmentation Process Mixture Model is an example model derived from this new class. This model has efficient and simple inference, and significantly outperforms existing approaches for hierarchical clustering and density modelling.
Hong Ge, Yarin Gal, Zoubin Ghahramani
In Submission, 2014
[PDF] [BibTex]

Pitfalls in the use of Parallel Inference for the Dirichlet Process

We show that the recently suggested parallel inference for the Dirichlet process is conceptually invalid. The Dirichlet process is important for many fields such as natural language processing. However, the suggested inference would not work in most real-world applications.
Yarin Gal, Zoubin Ghahramani
Workshop on Big Learning, NIPS, 2013
[PDF] [Presentation] [BibTex]
ICML, 2014
[PDF] [Talk] [Presentation] [Poster] [BibTex]

Variational Inference in the Gaussian Process Latent Variable Model and Sparse GP Regression – a Gentle Tutorial

We present an in-depth and self-contained tutorial for sparse Gaussian process (GP) regression. We also explain GP latent variable models, a tool for non-linear dimensionality reduction. The sparse approximation reduces the time complexity of the models from cubic to linear, but its development is scattered across the literature; we collect the various results here.
Yarin Gal, Mark van der Wilk
Tutorial, 2014
[arXiv] [BibTex]

Semantics, Modelling, and the Problem of Representation of Meaning – a Brief Survey of Recent Literature

Over the past 50 years many have debated what representation should be used to capture the meaning of natural language utterances. Recently, new requirements for such representations have been raised in research. Here I survey some of the interesting representations suggested to meet these new needs.
Yarin Gal
Literature survey, 2013
[arXiv] [BibTex]

A Systematic Bayesian Treatment of the IBM Alignment Models

We used a non-parametric process — the hierarchical Pitman–Yor process — in models that align words between pairs of sentences. These alignment models are used at the core of all machine translation systems. We obtained a significant improvement in translation using the process.
Yarin Gal, Phil Blunsom
North American Chapter of the Association for Computational Linguistics (NAACL), 2013
[PDF] [Presentation] [BibTex]

Relaxing HMM Alignment Model Assumptions for Machine Translation Using a Bayesian Approach

We used a non-parametric process — the hierarchical Pitman–Yor process — to relax some of the restricting assumptions often used in machine translation. When a long history of word alignments is not available, the process falls back on shorter histories in a principled way.
Yarin Gal
Master's Dissertation, 2012
[PDF] [BibTex]

Overcoming Alpha-Beta Limitations Using Evolved Artificial Neural Networks

We trained a feed-forward neural network to play checkers. The network acts as both the value function for a min-max algorithm and a heuristic for pruning tree branches in a reinforcement learning setting. We used no supervised signal for training: a set of networks was assessed by playing against each other, and the winning networks' weights were adapted following an evolution strategies (ES) algorithm.
Yarin Gal, Mireille Avigal
Machine Learning and Applications (IEEE), 2010
[Paper] [BibTex]

Contact me

Email

yarin@cs.ox.ac.uk

Post

Computer Science Department
University of Oxford
Oxford, OX1 3QD
United Kingdom