This course will provide a rigorous introduction to online/sequential decision-making with applications to various problems in machine learning and beyond, and introduce students to the multi-armed bandit problem and its variants. The emphasis of the course will be on fundamental algorithms and their detailed analysis. This mini-course will prepare students for the subsequent course on reinforcement learning. A list of topics that will be covered includes

  • The stochastic multi-armed bandit problem
  • The ε-greedy algorithm, the UCB algorithm
  • Lower bounds on the regret for bandit problems
  • Thompson Sampling (time-permitting)
  • Online-decision making and the experts problem
  • Non-stochastic multi-armed bandits (time-permitting)


Machine learning is a mathematical discipline and it is necessary to have a good background in linear algebra, calculus, probability and algorithms. You should have taken courses on linear algebra, probability, statistics and multivariate calculus. If you need to brush up on these concepts please refer to the links provided on the resources tab.

Recommended Textbooks

There is no specific textbook recommended for this course. However, having access to at least one the following will be helpful.

  • (BC) S. Bubeck and N. Cesa-Bianchi. Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems. [link]
  • (CL) N. Cesa-Bianchi and G. Lugosi. Prediction, Learning and Games. Cambridge University Press 2006.
  • (LS) T. Lattimore and C. Szepesvári. Bandit Algorithms. [link]
  • (Sli) Alex Slivkins. Introduction to Multi-Armed Bandits. [link]