FDB: Factorised Databases
Factorised databases are relational databases that use compact factorised representations at the physical layer to reduce
data redundancy and boost query performance. They are based on algebraic factorisation using distributivity of relational
product over relational union and commutativity of product and union.
The goal of the project is to understand the benefit
that factorised representations can bring to query evaluation in relational databases, and in particular build a relational
data management system that uses factorisations at the physical layer and relations at the logical layer.
Our initial investigation quantifies the gap between sizes of relations and of their factorised representations. In an ICDT 2012 paper we give two characterisations of conjunctive queries based on factorisations of their results whose nesting structure is defined by so-called factorisation trees. The first characterisation concerns sizes of factorised representations. For any query, we derive a size bound that is asymptotically tight within our class of factorisations. We also characterise the queries by tight bounds on the readability of the provenance of result tuples and define syntactically the class of queries with bounded readability. Future work is to investigate the space of possible compact representation systems for relational data and quantify the trade-off between their succinctness and tractability of query evaluation.
We are currently building FDB, an in-memory query engine for select-project-join queries on factorised databases. Key components of FDB are novel algorithms for query optimisation and evaluation that exploit the succinctness brought by data factorisation. Preliminary experiments show that for data sets with many-to-many relationships FDB can outperform relational engines by orders of magnitude. A first report of FDB is published in a VLDB 2012 paper. FDB was also demonstrated at VLDB 2012. Future work is to extend FDB to evaluate queries with aggregates and order-by clauses, as well as to investigate cost-based optimisation in the context of factorised databases.
News
- September 2012: Tomáš Kočiský was awarded the Gloucester Research Project Prize for best 4th year Maths&CS project 2012. His project investigates the evaluation problem of queries with ORDER-BY and GROUP-BY clauses on factorised databases.
- June 2012: Jakub Závodný will do an internship over the summer of 2012 at Google Zurich.
System prototype
We are currently working on a second, improved version of FDB to be released later in the year. If you want to try out the current version, please contact a member of our team.
Talks
- Factorised Relational Databases
Given by Jakub in Constraints Seminar, Oxford, March 2012.
Given by Dan in: Systems Seminar, EPFL, Feb 2012, Lausanne; Birkbeck Computer Science Departmental Seminar, March 2012, London.
- Factorised Representations of Query Results
Given by Jakub in Information Systems Seminar, Oxford, November 2011.
Given by Dan in: Database Seminar, University of Edinburgh, March 2011, Edinburgh; Dagstuhl Seminar on "Foundations of Distributed Data Management", Oct 2011, Dagstuhl.
Theses
- Szymon Wyleżoł: Cost-based Query Optimisation for Factorised Databases.
MSc in Maths and CS, Oxford 2012.
- Tomáš Kočiský: Queries with Aggregates and Order-By on Factorised Databases.
MSc in Maths and CS, Oxford 2012.
- Nurzhan Bakibayev: A Query Engine for Factorised Databases
MSc in CS, Oxford 2011.
Publications
- Demonstration of the FDB Query Engine for Factorised Databases. [pdf;
poster]
Nurzhan Bakibayev and Dan Olteanu and Jakub Závodný.
System demonstration. In Very Large Data Bases (PVLDB), 5(12), 2012. Istanbul, 2012.
- FDB: A Query Engine for Factorised Relational Databases. [pdf]
Nurzhan Bakibayev and Dan Olteanu and Jakub Závodný.
In Very Large Data Bases (PVLDB), 5(12), 2012. Istanbul, 2012.
Prior version in arXiv technical report abs/1203.2672.
- Factorised Representations of Query Results: Size Bounds and Readability. [pdf,
slides]
Dan Olteanu and Jakub Závodný.
In Int Conf on Database Theory (ICDT), Berlin, 2012.
Prior version (strict subset of results) in arXiv technical report 1104.0867, April 2011.
- On Factorisation of Provenance Polynomials. [pdf,
poster]
Dan Olteanu and Jakub Závodný.
In 3rd USENIX Workshop on the Theory and Practice of Provenance (TaPP), June 2011, Heraklion, Crete.
Current Team
- Dan Olteanu
- Jakub Závodný
Alumni: Nurzhan Bakibayev (MSc in CS, 2011); Szymon Wyleżoł (4th year Maths&CS, 2012); Tomáš Kočiský (4th year Maths&CS, 2011-2012);
Acknowledgments
Jakub Závodný is supported by an EPSRC DTA Grant EP/P505216/1. From June 2011 to April 2012, Jakub was partially supported by the Future and Emerging Technologies (FET) programme within the Seventh Framework Programme for Research of the European Commission, under the FETOpen grant agreement FOX, number FP7-ICT-233599.
info
|
Duration |
January 2010 to October 2013 |
|
People |
|
|
Activities |
|
|
Themes |