Database Systems Implementation: 2010-2011
This course examines the data structures and algorithms underlying database management systems such as Oracle or PostgreSQL. It covers techniques from both research literature and commercial systems.
At the end of this course, students should
- have a good insight into how DBMSs function internally
- understand how to analyse the performance of data-intensive systems
- be familiar with a variety of programming techniques for large-scale data manipulation
- apply the insights achieved to build the major components of a mini-DBMS.
This course assumes a reasonable understanding of the following topics:
- developing applications on relational DBMSs (SQL, relational algebra - completed an introductory course on Databases)
- sorting/searching techniques (quick/merge sorts, binary trees, hash tables - course on Design and Analysis of Algorithms)
This course will use C and C++. The practicals will have increasing difficulty, to allow students to learn the languages.
- Hardware: Secondary-storage devices; disk access time; Input/Output model of computation; optimized disk access;
- File and System Structure: page layout and access; buffer management; file organizations (heap, sorted, clustered); row stores versus column stores;
- Indexes: Tree-structured (ISAM, B+tree); hash-based (static, extendible, linear); multi-dimensional (UB-tree, k-d-b tree, R-tree)
- External Sorting: external n-way merge sort; sorting based on B+trees;
- Query Evaluation: Selection (index-based, hash-based, arbitrary selection predicates); projection (duplicate elimination; hash-based, sorting-based); joins (nested-loops, index nested, block nested, sort-merge, hash joins); set operations; aggregation; impact of buffering, pipelining, blocking; evaluation techniques in existing systems;
- Query Optimization: cardinality estimation for all query operators, histograms ; equivalences of relational algebra; query plans; cost estimation; nested queries; join optimization algorithms (dynamic programming and greedy join enumeration approaches); optimization techniques in existing systems;
- Transaction Management: ACID properties; concurrency control (serializability criteria); locking (two-phase locking, index locking, multiple granularity locks, intention locks); deadlock detection; isolation levels; concurrency control in existing systems;
SyllabusThe internals of a relational database management system; data storage; buffer management; indexes; sorting; query evaluation and optimization; transaction management; concurrency control; comparisons of existing commercial systems; use cases: PostgreSQL and IBM DB2.
We will follow closely chapters from these two books
- Ramakrishan and Gehrke. Database Management Systems, McGraw-Hill, 2002 (3rd ed).
- Garcia-Molina, Ullman, Widom. Database Systems: The Complete Book. Prentice Hall, 2002 (or 2008, 2nd ed).
We will also discuss (or refer to) relevant research papers published in recent years in top conferences and journals in database systems and data management.