Uncertain Database Management Systems
Supervisor
Suitable for
Abstract
Not available in 2013/14
Today, uncertainty is commonplace in data management scenarios involving data integration, sensor readings, information extraction from unstructured sources, or manually entered information, which is prone to inaccuracy or incompleteness. In these scenarios, uncertainty arises, respectively, from alternative schema mappings between sources or alternative ways of resolving non-identical duplicate records, from different interpretations of sensor data, from multiple possible extractions from unstructured data, and from several possible readings of manually filled-in forms.
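The alternatives listed above can be turned into a set of possible worlds. The following is a minimal Python sketch, assuming a hypothetical representation in which each field of a manually entered record lists its alternative readings (the records, field names, and the `possible_worlds` helper are illustrative, not part of any system named here). Each combination of choices yields one possible world; a query answer is possible if it holds in some world and certain if it holds in all of them.

```python
from itertools import product

# Hypothetical uncertain records: each field maps to its alternative readings.
# A field with a single alternative is deterministic.
records = [
    {"name": ["Smith"], "age": [17, 71]},       # ambiguous handwriting for the age
    {"name": ["Jones", "Janes"], "age": [42]},  # two possible readings of the name
]

def possible_worlds(records):
    """Yield each possible world as a list of fully determined records."""
    per_record = []
    for rec in records:
        fields = list(rec)
        # All combinations of alternatives for this record's fields.
        choices = [dict(zip(fields, combo))
                   for combo in product(*(rec[f] for f in fields))]
        per_record.append(choices)
    # A world picks one choice for every record.
    for world in product(*per_record):
        yield list(world)

worlds = list(possible_worlds(records))
print(len(worlds))  # 4

# Boolean query "is anyone older than 50?", evaluated in every world.
older_than_50 = [any(r["age"] > 50 for r in w) for w in worlds]
print(any(older_than_50), all(older_than_50))  # True False
```

The answer is possible but not certain: it holds only in the worlds where the age is read as 71. Since the number of worlds grows exponentially (2^n for n records with two alternatives each), enumerating them explicitly quickly becomes infeasible, which motivates compact representations.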
To accommodate uncertainty, current data management technology needs a paradigm shift from deterministic to possible-worlds semantics, in which the basic data management problems are addressed anew. Projects on this topic should support this shift by investigating some of the following directions:
- compact representation systems for large sets of possible worlds,
- techniques for processing queries and constraints on succinct representations of possible worlds,
- uncertainty-aware query languages beyond relational algebra.
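As a minimal sketch of query processing on a succinct representation, the Python code below assumes the simplest probabilistic model, a tuple-independent relation in which each tuple is present independently with a given probability. This model is far less expressive than the representations used in MayBMS, and the relation, values, and function names are hypothetical. Three tuples stand for 2^3 = 8 possible worlds: `prob_by_enumeration` sums world probabilities explicitly, while `prob_compact` computes the probability of the Boolean query "some sensor reads hot" directly from the tuples, in the spirit of the independent-project rule used in the safe plans of Dalvi and Suciu.

```python
from itertools import product
from math import prod

# Hypothetical tuple-independent relation: each (sensor, reading) tuple
# is present independently with the given probability.
sensor = [
    (("s1", "hot"), 0.9),
    (("s2", "hot"), 0.5),
    (("s3", "cold"), 0.8),
]

def prob_by_enumeration(rel, pred):
    """Sum the probabilities of all 2^n worlds in which some tuple satisfies pred."""
    total = 0.0
    for mask in product([False, True], repeat=len(rel)):
        p_world = 1.0
        present = []
        for (tup, p), keep in zip(rel, mask):
            p_world *= p if keep else 1 - p
            if keep:
                present.append(tup)
        if any(pred(t) for t in present):
            total += p_world
    return total

def prob_compact(rel, pred):
    """Same probability computed on the n-tuple representation, assuming
    tuple independence: P(exists t: pred(t)) = 1 - prod(1 - p) over matching tuples."""
    return 1 - prod(1 - p for tup, p in rel if pred(tup))

def is_hot(t):
    return t[1] == "hot"

print(round(prob_by_enumeration(sensor, is_hot), 6))  # 0.95
print(round(prob_compact(sensor, is_hot), 6))         # 0.95
```

Both functions agree, but only the second avoids enumerating worlds, so its cost is linear in the number of tuples rather than exponential. The closed form relies on tuple independence and does not carry over to correlated data without further work.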
Each of these directions can lead to theoretical as well as practical (implementation-oriented) projects. Anyone interested in a project on one of these topics is encouraged to get in touch with Dan Olteanu to explore specific ideas, such as:
- Query evaluation: tractability and efficient algorithms
- Approximate and incremental view maintenance in probabilistic databases
- Synthesising query mappings for input and output probabilistic data
- View materialization for query optimization in probabilistic databases
- Modelling and processing streams of uncertain sensor data
- Algebraic optimizations for the MayBMS query language
Prerequisites: All projects within this framework require prior exposure to databases, although some may require knowledge of only database theory or only database systems. Projects of the latter kind require strong C/C++ skills (proficiency in another general-purpose programming language is nonetheless a good start). Students with very good marks in the Database Systems Implementation course are preferred.
Suggested reading
- Latest publications on SPROUT and MayBMS.
- Lyublena Antova, Christoph Koch, and Dan Olteanu. From Complete to Incomplete Information and Back. SIGMOD 2007.
- Lyublena Antova, Thomas Jansen, Christoph Koch, and Dan Olteanu. Fast and Simple Relational Processing of Uncertain Data. ICDE 2008.
- Nilesh Dalvi and Dan Suciu. Efficient query evaluation on probabilistic databases. VLDB 2004.