# Computer Vision: 2023-2024

Lecturer: Christian Rupprecht

Term: Hilary Term 2024 (20 lectures)

## Overview

This is an advanced course in modern computer vision and machine learning. It covers fundamental concepts from classical computer vision: filtering, matching, indexing and 3D computer vision. A large portion of the course then focuses on current computer vision methodologies and problems that build on deep learning techniques: detection, segmentation, generation, and vision and language models. The course introduces the fundamental mathematical concepts behind these tasks and shows how they can be integrated into modern machine-learning models. The taught material and assessment include both theoretical derivations and applied implementations, and students are expected to be proficient with both.

## Learning outcomes

After studying this course, students will:

• Understand a wide range of computer vision concepts.
• Be able to use, implement and evaluate common models for vision tasks.
• Have a solid understanding of core computer vision tasks: matching, segmentation, detection, tracking, and generation.
• Be able to visualise features and decision boundaries to understand trained vision models.
• Gain the ability to derive and understand the mathematics behind 3D/geometric/multi-view computer vision.
• Understand the ethical and privacy-related implications of large datasets and models.
• Be able to design and implement various computer vision algorithms for a wide variety of tasks.

## Prerequisites

Required background knowledge includes machine learning, linear algebra, continuous mathematics, and multivariate calculus, as well as good programming skills in Python. Students are required to have taken the Machine Learning course. The programming environment used in the lecture examples and practicals will be Python/PyTorch.

Undergraduate students are required to have taken the following courses:
• Machine Learning

## Synopsis

1. Introduction
a) Computer Vision Overview
b) Historical Context

2. Image enhancement: image basics and terminology
a) point operations
b) spatial filters
c) matched filters

3. 2D Fourier transforms and applications
a) spatial frequencies
b) convolution theorem
c) aliasing

4. Image Restoration
a) inverse and Wiener filters
b) applications to defocus and motion deblurring
c) MAP estimation

5. Matching, indexing, and search
a) detectors
b) descriptors
c) bag of words

6. Image Classification
a) Data-driven approaches
b) k-NN
c) learning

7. Convolutional Networks
a) history
b) convolution

8. Transformer Networks for Images
a) attention
b) discretization

9. Visualization and Understanding
a) feature visualisation methods
b) explainability
c) transformers

10. Object Detection
a) two-stage approaches
b) single-stage approaches

11. Image Segmentation
a) semantic segmentation
b) instance segmentation
c) panoptic segmentation

12. Videos
a) classification
b) detection
c) temporal regularization

13. Tracking
a) points
b) templates
c) optical flow

14. Generative Models I
a) autoencoder
b) VAE
c) auto-regressive models

15. Generative Models II
a) GANs
b) diffusion models
c) Deep equilibrium models

16. Camera models and triangulation
a) fundamental matrix
b) homogeneous coordinates
c) triangulation

17. 3D Reconstruction
a) Multiple-view geometry
b) Learning-based methods

18. Representation Learning
b) learning from negatives
c) distillation

19. Unsupervised Computer Vision
a) Priors
b) learning signals

20. Vision and Language
a) Captioning
c) LVMs

21. Ethics and Privacy
a) Ethics
b) Privacy
c) Who is harmed?