Database Systems Implementation: 2019-2020

Lecturer

Amir Shaikhha

Degrees

Schedule C1 (CS&P) — Computer Science and Philosophy

Schedule C1 — Computer Science

Schedule C1 — Mathematics and Computer Science

Schedule C — MSc in Advanced Computer Science

Term

Hilary Term 2020 (22 lectures)

Overview

This course examines the data structures and algorithms underlying database management systems such as Oracle or PostgreSQL. It covers techniques from both research literature and commercial systems.

The lectures for this course are not going to be recorded in Hilary Term 2019.

Learning outcomes

At the end of this course, students should

have a good insight into how DBMSs function internally
understand how to analyse the performance of data-intensive systems
be familiar with a variety of programming techniques for large-scale data manipulation
apply the insights achieved to build the major components of a mini-DBMS

Prerequisites

This course assumes a reasonable understanding of the following topics:

developing applications on relational DBMSs (SQL, relational algebra - completed an introductory course on Databases)
sorting/searching techniques (quick/merge sorts, binary trees, hash tables - course on Design and Analysis of Algorithms)

This course will use C and C++. The practicals will have increasing difficulty, to allow students to learn the languages.

Synopsis

Hardware: Secondary-storage devices; disk access time; Input/Output model of computation; optimized disk access;
File and System Structure: page layout and access; buffer management; file organizations (heap, sorted, clustered); row stores versus column stores;
Indexes: Tree-structured (ISAM, B+tree); hash-based (static, extendible, linear); multi-dimensional (UB-tree, k-d-b tree, R-tree)
External Sorting: external n-way merge sort; sorting based on B+trees;
Query Evaluation: Selection (index-based, hash-based, arbitrary selection predicates); projection (duplicate elimination; hash-based, sorting-based); joins (nested-loops, index nested, block nested, sort-merge, hash joins); set operations; aggregation; impact of buffering, pipelining, blocking; evaluation techniques in existing systems;
Query Optimization: cardinality estimation for all query operators, histograms ; equivalences of relational algebra; query plans; cost estimation; nested queries; join optimization algorithms (dynamic programming and greedy join enumeration approaches); optimization techniques in existing systems;
Transaction Management: ACID properties; concurrency control (serializability criteria); locking (two-phase locking, index locking, multiple granularity locks, intention locks); deadlock detection; isolation levels; concurrency control in existing systems;

Syllabus

The internals of a relational database management system; data storage; buffer management; indexes; sorting; query evaluation and optimization; transaction management; concurrency control; comparisons of existing commercial systems; use cases: PostgreSQL and IBM DB2.

Reading list

We will follow closely chapters from these two books

Ramakrishan and Gehrke. Database Management Systems, McGraw-Hill, 2002 (3rd ed).
Garcia-Molina, Ullman, Widom. Database Systems: The Complete Book. Prentice Hall, 2002 (or 2008, 2nd ed).

We will also discuss (or refer to) relevant research papers published in recent years in top conferences and journals in database systems and data management.

Taking our courses

This form is not to be used by students studying for a degree in the Department of Computer Science, or for Visiting Students who are registered for Computer Science courses

Other matriculated University of Oxford students who are interested in taking this, or other, courses in the Department of Computer Science, must complete this online form by 17.00 on Friday of 0th week of term in which the course is taught. Late requests, and requests sent by email, will not be considered. All requests must be approved by the relevant Computer Science departmental committee and can only be submitted using this form.