Home

About

This course will provide a rigorous introduction to online/sequential decision-making with applications to various problems in machine learning and beyond, and introduce students to the multi-armed bandit problem and its variants. The emphasis of the course will be on fundamental algorithms and their detailed analysis. This mini-course will prepare students for the subsequent course on reinforcement learning. A list of topics that will be covered includes

The stochastic multi-armed bandit problem
The ε-greedy algorithm, the UCB algorithm
Lower bounds on the regret for bandit problems
Thompson Sampling (time-permitting)
Online-decision making and the experts problem
Non-stochastic multi-armed bandits (time-permitting)

Prerequisites

Machine learning is a mathematical discipline and it is necessary to have a good background in linear algebra, calculus, probability and algorithms. You should have taken courses on linear algebra, probability, statistics and multivariate calculus. If you need to brush up on these concepts please refer to the links provided on the resources tab.

(BC) S. Bubeck and N. Cesa-Bianchi. Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems. [link]
(CL) N. Cesa-Bianchi and G. Lugosi. Prediction, Learning and Games. Cambridge University Press 2006.
(LS) T. Lattimore and C. Szepesvári. Bandit Algorithms. [link]
(Sli) Alex Slivkins. Introduction to Multi-Armed Bandits. [link]

About

Prerequisites

Recommended Textbooks