Advanced Topics in Machine Learning: 2019-2020
Lecturers  Dr Tom Rainforth, Dr Alejo Nevado-Holgado (with guest lectures by Dr Atılım Güneş Baydin and Alex Davies)
Degrees  Schedule C1 (CS&P) — Computer Science and Philosophy; Schedule C1 — Computer Science
Term  Hilary Term 2020
Overview
This is an advanced course on machine learning, focusing on recent advances in deep learning with neural networks and on Bayesian approaches to machine learning. The course will also have a significant component on natural language processing (NLP) applications, including analysing latent dimensions in text, translating between languages, and answering questions. Recent statistical techniques, particularly those based on neural networks, have achieved remarkable progress in these fields, leading to a great deal of commercial and academic interest. The course will introduce the definitions of the relevant machine learning models, discuss their mathematical underpinnings, and demonstrate effective numerical methods for training them.

The coursework will be based on the reproduction/extension of a recent machine learning paper, with students working in teams to accomplish this. Each team will tackle a separate paper, with available topics including gradient-based Bayesian inference methods, deep generative models, and NLP applications.
The lectures for this course are not going to be recorded in Hilary Term 2020.
PLEASE NOTE: We will be happy to accept attendees to the lectures if there is space in the lecture theatre. However, we will not permit anyone not taking the course for credit to attend the practicals or undertake the assignment, as we do not have the resources to support this. Therefore, please feel free to come to the lectures as a listener, although if the lecture theatre becomes overcrowded we may have to ask you not to attend in person.
Learning outcomes
After studying this course, students will:
 Have knowledge of the different paradigms for performing machine learning and appreciate when different approaches will be more or less appropriate.
 Understand the definition of a range of neural network models.
 Be able to derive and implement optimisation algorithms for these models.
 Understand neural implementations of attention mechanisms and sequence embedding models and how these modular components can be combined to build state-of-the-art NLP systems.
 Be able to implement and evaluate common neural network models for language.
 Understand the foundations of the Bayesian approach to machine learning.
 Be able to construct Bayesian models for data and apply computational techniques to draw inferences from them.
 Have an understanding of how to choose a model to describe a particular type of data.
 Know how to evaluate a learned model in practice.
 Understand the mathematics necessary for constructing novel machine learning solutions.
 Be able to design and implement various machine learning algorithms in a range of real-world applications.
Prerequisites
Required background knowledge includes probability theory, linear algebra, continuous mathematics, multivariate calculus and multivariate probability theory, as well as good programming skills. Students are required to have taken the Machine Learning course. The programming environment used in the lecture examples and practicals will be Python/TensorFlow.
Synopsis
Note that lecture numbers correspond to hour slots. Thus, for example, the 2-hour Friday lecture will comprise Lectures 2 and 3.
Bayesian Machine Learning (Lectures 1-6), Dr Tom Rainforth. Course notes are available here. Lectures:
 Lecture 1 (Week 1, Wednesday 22 January, 12:00-13:00) Machine Learning Paradigms: After giving an overview of the course, we will discuss different types of machine learning approaches, delineating between supervised and unsupervised learning, and between discriminative and generative approaches. We will introduce the Bayesian paradigm and show why it is an important part of the machine learning arsenal.
 Lecture 2 (Week 1, Friday 24 January, 11:00-12:00) Bayesian Modelling (1): We will discuss the basic assumptions and processes involved in constructing a Bayesian model and introduce some common examples. After providing insight into how Bayesian models work, we will consider what makes a good model and how to compare models, before finishing with the concept of Bayesian model averaging.
 Lecture 3 (Week 1, Friday 24 January, 12:00-13:00) Bayesian Modelling (2): After establishing the importance of dependency relationships in Bayesian models, we will introduce some of the key methods for constructing and reasoning about generative models. Namely, we will introduce graphical models and probabilistic programming.
 Lecture 4 (Week 2, Wednesday 29 January, 12:00-13:00) Bayesian Inference (1): We will discuss approaches for estimating Bayesian posteriors, marginal likelihoods, and expectations. We will introduce Monte Carlo sampling along with some basic Monte Carlo inference approaches like importance sampling (a small illustrative sketch appears after this lecture list).
 Lecture 5 (Week 2, Friday 31 January, 11:00-12:00) Bayesian Inference (2): We will introduce more advanced and scalable inference approaches, namely Markov chain Monte Carlo (MCMC) sampling and variational inference.
 Lecture 6 (Week 2, Friday 31 January, 12:00-13:00) Variational Auto-Encoders: We will combine a number of ideas from the previous lectures to introduce variational auto-encoders and show how they can be used to learn deep generative models from data.
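To give a flavour of the Monte Carlo methods covered in Lectures 4 and 5, the short sketch below estimates a marginal likelihood and a posterior mean by importance sampling. It is purely illustrative and not part of the official course materials: the Gaussian model, the proposal distribution, and the synthetic data are all assumptions made for the example.

```python
import numpy as np
from scipy import stats

# Toy setup (assumed for illustration, not taken from the course notes):
# prior      theta ~ N(0, 1)
# likelihood y_i | theta ~ N(theta, 1)
rng = np.random.default_rng(0)
y = rng.normal(loc=1.5, scale=1.0, size=20)  # synthetic "observed" data

def log_joint(theta):
    """log p(y, theta) = log prior + log likelihood, vectorised over theta."""
    log_prior = stats.norm.logpdf(theta, loc=0.0, scale=1.0)
    log_lik = stats.norm.logpdf(y[:, None], loc=theta, scale=1.0).sum(axis=0)
    return log_prior + log_lik

# Importance sampling with a broad Gaussian proposal q(theta)
n_samples = 10_000
proposal = stats.norm(loc=0.0, scale=3.0)
theta = proposal.rvs(size=n_samples, random_state=1)

log_w = log_joint(theta) - proposal.logpdf(theta)  # log importance weights

# Marginal likelihood estimate Z ~= mean(w), computed with log-sum-exp for stability
log_Z = np.logaddexp.reduce(log_w) - np.log(n_samples)

# Self-normalised weights also give posterior expectations, e.g. E[theta | y]
w = np.exp(log_w - log_w.max())
posterior_mean = np.sum(w * theta) / np.sum(w)
print(f"estimated log Z = {log_Z:.3f}, estimated posterior mean = {posterior_mean:.3f}")
```

Working in log space and using the log-sum-exp trick avoids numerical underflow when individual likelihood terms are very small, a standard precaution in practical Bayesian inference.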
Guest Lectures: Automatic Differentiation (Lectures 7-8), Dr Atılım Güneş Baydin
 Lecture 7 (Week 3, Wednesday 5 February, 12:00-13:00; note change of time and day)
 Lecture 8 (Week 4, Wednesday 12 February, 12:00-13:00; note change of time and day)
Natural Language Processing (Lectures 9-16), Dr Alejo Nevado-Holgado:
 Lecture 9 (video) (Week 4, Friday 14 February, 11:00-12:00) Intro and embeddings 1. We first explain the challenge that Natural Language Processing (NLP) attempts to solve, why it is hard, and why every step towards solving it is extremely useful for industry and research. Then we show how the meaning of words can be represented as multi-dimensional vectors called embeddings.
 Lecture 10 (video) (Week 4, Friday 14 February, 12:00-13:00) Embeddings 2. We describe the different standard methods used to create embeddings, the advantages and disadvantages of each, and currently open (and fast-progressing!) lines of research that attempt to improve them further.
 Lecture 11 (video) (Week 5, Friday 21 February, 11:00-12:00) Classification and neural networks. We first present classification as one of the core tasks of machine learning and describe how it often arises in NLP problems. We then describe how neural networks (NNs) provide a very versatile and general mechanism for solving this task.
 Lecture 12 (video) (Week 5, Friday 21 February, 12:00-13:00) Language models and vanilla RNNs. We now present another typical NLP task called 'language modelling', which consists of capturing the probabilities of all possible patterns of speech. Solving this task can assist with many other NLP problems. We then describe how simple 'vanilla' RNNs partially solve this problem.
 Lecture 13 (video) (Week 6, Friday 28 February, 11:00-12:00) Vanishing gradients and fancy RNNs. We present the vanishing gradient phenomenon, which is one of the core technical difficulties that kept deep NNs from succeeding in the past. Then we show how more modern, complex RNNs and some extra tricks mostly solve this problem.
 Lecture 14 (video) (Week 6, Friday 28 February, 12:00-13:00) Machine Translation, Seq2seq, and Attention. We now present another typical NLP task called 'machine translation', and how the so-called seq2seq architectures tackle it. We further show an architectural concept called 'attention', which greatly improves performance in NLP and general NNs (a small illustrative sketch of attention appears after the lecture list below).
 Lecture 15 (Week 7, Friday 6 March, 11:00-12:00) Question answering, coreference resolution and CNNs. We present the final two typical NLP tasks of this course, 'question answering' and 'coreference resolution'. We then present the convolutional neural network (CNN) in the framework of NLP, and the situations where it might be advantageous.
 Lecture 16 (Week 7, Friday 6 March, 12:00-13:00) Transformers. We finally present the transformer model, a specialised architectural module that has greatly improved the performance of NNs across NLP tasks.
 Lectures 17 and 18: Alex Davies, DeepMind
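As a rough illustration of the attention mechanism appearing in Lectures 14 and 16, the sketch below implements scaled dot-product attention (in the sense of “Attention Is All You Need”) with plain Python/NumPy. It is a minimal example rather than course material; the array shapes and variable names are assumptions made for the demonstration.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Minimal scaled dot-product attention.

    Q: (n_queries, d_k), K: (n_keys, d_k), V: (n_keys, d_v).
    Returns an (n_queries, d_v) matrix of attention-weighted values.
    """
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)               # similarity of each query to each key
    scores -= scores.max(axis=-1, keepdims=True)  # shift for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over the keys
    return weights @ V

# Toy usage with random vectors (illustrative sizes only)
rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 8))    # 3 queries of dimension 8
K = rng.normal(size=(5, 8))    # 5 keys of dimension 8
V = rng.normal(size=(5, 16))   # 5 values of dimension 16
print(scaled_dot_product_attention(Q, K, V).shape)  # -> (3, 16)
```

In real models the queries, keys, and values are learned linear projections of the input sequence, and frameworks such as TensorFlow provide optimised, batched implementations of this operation.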
Location of the lectures: Lecture Theatre A in the basement of the Computer Science Department's Wolfson Building. If you don't know the layout of the building, Reception (on the corner of Keble Road and Parks Road) should be able to direct you to Lecture Theatre A.
Assignment Papers based on Bayesian Machine Learning (each group chooses 1):
 Li, Y., & Turner, R. E. (2016). Rényi divergence variational inference. In Advances in Neural Information Processing Systems (pp. 1073-1081). https://arxiv.org/abs/1602.02311
 Rainforth, T., Kosiorek, A., Le, T. A., Maddison, C., Igl, M., Wood, F., & Teh, Y. W. (2018, July). Tighter Variational Bounds are Not Necessarily Better. In International Conference on Machine Learning (pp. 4277-4285). https://arxiv.org/abs/1802.04537
 Kucukelbir, A., Ranganath, R., Gelman, A., & Blei, D. (2015). Automatic variational inference in Stan. In Advances in Neural Information Processing Systems (pp. 568-576). https://arxiv.org/abs/1506.03431
 Tran, D., Hoffman, M. D., Saurous, R. A., Brevdo, E., Murphy, K., & Blei, D. M. (2017). Deep probabilistic programming. International Conference on Learning Representations. https://arxiv.org/abs/1701.03757
 Cremer, C., Li, X., & Duvenaud, D. (2018, July). Inference Suboptimality in Variational Autoencoders. In International Conference on Machine Learning (pp. 1086-1094). https://arxiv.org/abs/1801.03558
 Higgins, I., Matthey, L., Pal, A., Burgess, C., Glorot, X., Botvinick, M., ... & Lerchner, A. (2016). beta-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework. International Conference on Learning Representations. https://openreview.net/forum?id=Sy2fzU9gl
 Chen, T., Fox, E., & Guestrin, C. (2014, January). Stochastic Gradient Hamiltonian Monte Carlo. In International Conference on Machine Learning (pp. 1683-1691). https://arxiv.org/abs/1402.4102
Assignment Papers based on Natural Language Processing:
 Bahdanau, Cho, and Bengio. “Neural Machine Translation by Jointly Learning to Align and Translate”. arXiv. 2014. https://arxiv.org/abs/1409.0473
 Kalchbrenner, Espeholt, Simonyan, van den Oord, Graves, and Kavukcuoglu. “Neural Machine Translation in Linear Time”. arXiv. 2016. https://arxiv.org/abs/1610.10099
 Vaswani, Shazeer, Parmar, Uszkoreit, Jones, Gomez, Kaiser, and Polosukhin. “Attention Is All You Need”. NIPS. 2017. https://arxiv.org/abs/1706.03762
 Xiong, Zhong, and Socher. “Dynamic Coattention Networks For Question Answering”. ICLR. 2017. https://arxiv.org/abs/1611.01604
 Clark and Manning. “Improving Coreference Resolution by Learning Entity-Level Distributed Representations”. ACL. 2016. https://arxiv.org/abs/1606.01323
 McCann, Bradbury, Xiong, and Socher. “Learned in Translation: Contextualized Word Vectors”. NIPS. 2017. https://arxiv.org/abs/1708.00107
Syllabus
Mathematics of machine learning. Overview of supervised, unsupervised, and multi-task techniques. The Bayesian paradigm and its use in machine learning. Advanced machine learning topics: generative models, Bayesian inference, Monte Carlo methods, variational inference, probabilistic programming, model selection and learning, amortized inference, deep generative models, variational autoencoders. Applications of machine learning in natural language processing: recurrent neural networks, backpropagation through time, long short-term memory, attention networks, memory networks, neural Turing machines, machine translation, question answering, speech recognition, syntactic and semantic parsing, GPU optimisation for neural networks.
Reading list
 Kevin P. Murphy. Machine Learning: A Probabilistic Perspective. MIT Press 2012
 Christopher M. Bishop. Pattern Recognition and Machine Learning. Springer 2006
 Ian Goodfellow, Yoshua Bengio and Aaron Courville. Deep Learning. MIT Press 2016