University of Oxford - Software Engineering
CML

Classical Machine Learning

Machine learning is an increasingly important aspect of software engineering as ML-based products and systems enter widespread use across industries and sectors.

Establishing a broad introductory understanding of ML techniques is important for several reasons. There are many cases in which classical ML techniques are still preferable to Deep Neural Network (DNN) based methods. In some cases, classical ML techniques still outperform DNNs or perform on par with them. More commonly, classical methods are preferred because they are easier to interpret and validate, which is critical for many kinds of important applications, including safety-critical systems and those supporting high-stakes decisions.

This course will introduce the core mathematical language used in this domain to the extent that it is necessary to approach the supporting resources. However, the course does not require a deep mathematical background as a pre-requisite, nor will there be a significant mathematical treatment of algorithms, the properties of classifiers and so on.

Course dates

9th September 2024, Oxford University Department of Computer Science - Held in the Department. 0 places remaining.
27th January 2025, Oxford University Department of Computer Science - Held in the Department. 0 places remaining.

Objectives

The CML course has three fundamental goals:

  1. To introduce students new to machine learning to a curated selection of simple methods that both represent a diversity of fundamental approaches and are in common use today.
  2. To give students hands-on experience with the practice of machine learning, through popular libraries and frameworks used today. The use of simpler models will allow students to fully explore these learning tools and frameworks with real datasets, and to train, test, interrogate and compare models and their relative strengths and weaknesses.
  3. To provide practical methods that will enable students to match the capabilities of Machine Learning with high-value, real-world problems in science, industry and business, and then execute valuable projects on this basis.

The objective of the course is to give a taste of the theory and approach of each method in a way that is suited to a general audience (i.e. without requiring a deep mathematical background), and to spend a significant amount of time working through practical ML experiments and examples with these methods applied to real datasets. This hands-on experience will give students plenty of opportunity to encounter the challenges of applying ML in practice, including incorrect or missing data, the need to think deeply about representations and transformations for creating features, and significant challenges around over-fitting and generalisation.
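As an illustration of what over-fitting looks like in practice, the following is a minimal sketch in plain Python rather than material taken from the course practicals. The synthetic dataset, the 20% label noise and the helper names (`make_data`, `nearest_neighbour_predict`, `accuracy`) are all assumptions made for this example. A nearest-neighbour classifier with k = 1 memorises its training data, so its training accuracy says little about how it will behave on unseen data.

```python
import random

def nearest_neighbour_predict(train, x, k):
    """Predict a 0/1 label for x by majority vote among the k closest training points."""
    neighbours = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    votes = sum(label for _, label in neighbours)
    return 1 if 2 * votes > k else 0

def accuracy(train, data, k):
    """Fraction of points in `data` labelled correctly by a k-NN model built on `train`."""
    correct = sum(nearest_neighbour_predict(train, x, k) == y for x, y in data)
    return correct / len(data)

def make_data(n, rng, noise=0.2):
    """Synthetic 1-D data: the true label is 1 when x > 0.5, flipped with probability `noise`."""
    data = []
    for _ in range(n):
        x = rng.random()
        label = 1 if x > 0.5 else 0
        if rng.random() < noise:
            label = 1 - label  # simulate an incorrectly recorded label
        data.append((x, label))
    return data

rng = random.Random(0)        # fixed seed so the run is repeatable
train = make_data(200, rng)   # data the model is built from
test = make_data(200, rng)    # held-out data the model never sees

for k in (1, 15):
    # Compare performance on the training data with performance on held-out data.
    print(k, accuracy(train, train, k), accuracy(train, test, k))
```

With k = 1 every training point is its own nearest neighbour, so the training accuracy is perfect even though a fifth of the labels are noise. Comparing that figure with the held-out accuracy, and with the figures for a larger k, is the kind of train-versus-test check that exposes over-fitting.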

Because of their importance to current ML practice, all practicals will use Jupyter Notebook with Python 3, scikit-learn, Pandas and associated tooling.

At the conclusion of this course students should understand:

  • The principles and approaches used within classical machine learning.
  • The main variants of classical machine learning, e.g. classification, regression, supervised and unsupervised learning, and reinforcement learning.
  • The key concepts, issues and practices when modelling with these techniques; as well as hands-on experience in using frameworks for this purpose.
  • How to identify valuable problems that can be addressed by ML, and then execute projects accordingly.
  • How classical machine learning fits within the context of other machine learning approaches, and which tasks it is considered well suited, and not well suited, to perform.

Contents

Below we briefly list the core topics that will be covered in the class: first the concepts, then the specific learning methods, and finally what will be covered in the practicals.

  • Fundamental concepts:
    • Supervised, Unsupervised, Semi-supervised, Reinforcement Learning
    • Classification, Multi-Class Classification and Regression
    • Model performance: Accuracy, Generalisation, and Over-fitting
    • Training vs Test Performance, Loss and reward functions
    • Features: Selection and representation
    • Bayes' theorem
  • Learning methods:
    • Linear and polynomial regression, logistic regression
    • Clustering, including k-Nearest Neighbours and Hierarchical Clustering
    • Decision Trees, Random Forests
    • Support Vector Machines
    • Gaussian Mixture Models
    • Reinforcement Learning using the Bellman update equation
    • Dimensionality reduction with Principal Component Analysis
    • Hyper-parameter tuning
    • Regularisation and model complexity reduction
  • Practical Applications:
    • Matching ML capability to operational needs in science, industry and business
    • Introduction to planning and ML project execution
  • Core Practicals:
    • Python refresh (Optional)
    • Matrices and linear algebra
    • Feature engineering (data cleaning and preparation)
    • End-to-end supervised learning
    • Clustering using k-means and other algorithms
    • Implementing gradient descent
    • Polynomial regression and ROC
  • Optional (Bonus) practicals:
    • Trees and Ensemble Methods
    • Dimensionality Reduction
    • Hyper-parameter Tuning
    • Natural Language Processing
    • Gaussian Mixture Models
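
To give a flavour of the "Implementing gradient descent" practical listed above, here is a minimal sketch in plain Python. It is not the course's own exercise: the function name `fit_line`, the learning rate, the step count and the toy data are all assumptions chosen for illustration. It fits a straight line by repeatedly stepping the parameters against the gradient of the mean squared error.

```python
def fit_line(xs, ys, learning_rate=0.05, steps=2000):
    """Fit y = w * x + b by batch gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Residuals of the current model on every training point.
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        # Gradients of the mean squared error with respect to w and b.
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2.0 / n * sum(errors)
        # Move each parameter a small step against its gradient.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

# Toy data generated from y = 2x + 1 (an assumption for this sketch).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

w, b = fit_line(xs, ys)
print(round(w, 3), round(b, 3))
```

On this noise-free data the fitted slope and intercept should come out very close to 2 and 1. The learning rate matters: a value that is too large for the scale of the inputs makes the updates diverge, which is one reason feature scaling appears alongside gradient-based methods.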

Assignment

The assignment will test students’ theoretical and practical understanding of the topics covered in the course, through a mixture of dataset exploration and analysis, as well as explanations of different machine learning concepts. It will test students’ ability to provide high-quality documentation, rationale and explanation of methods, tools and algorithms. Finally, it may test students’ ability to identify valuable opportunities and problems and to explain how these can be addressed by Machine Learning.

Teaching methods

The main teaching method will be lectures (around 3 hours daily, broken into two 1.5-hour lectures), presented using highly visual slides and a few whiteboard explanations. Hands-on practicals will make up the rest of the work; these will consist of a series of exercises applying the methods covered to both real and synthetic datasets, using Jupyter Notebook and Python with supporting libraries.

Requirements

CML is intended as an introductory course in Machine Learning that will be accessible to all PRO students. The course is less appropriate for students who already have significant experience of Machine Learning methods and tools.  

In terms of mathematics, some familiarity with probability, introductory calculus and linear algebra will be helpful. Resources covering these topics will be provided in the pre-study material and in the early part of the course. (We will not be doing rigorous mathematical derivations of classifier properties or proofs of convergence, nor delving into the details of training algorithms.)

The MSc contains two modules relating to Machine Learning: CML (this module) and ‘Deep Neural Networks’ (DNN). Whereas DNN focusses exclusively on neural network methods, CML provides a broader and more fundamental view of Machine Learning.  As such, a natural ordering for these two courses would be for students to complete first CML and then DNN. Note that CML is not a formal pre-requisite for DNN.