# Leveraging the Structure of Uncertain Data

- 11:30 5th March 2019 ( week 8, Hilary Term 2019 )051

Modern data management techniques must deal with data that is

inherently noisy or uncertain: this is the case, e.g., of the results of

automated processes (information extraction, machine learning, etc.),

and of untrustworthy information (e.g., Web pages or crowdsourced

datasets). To address this challenge, the field of probabilistic data

management has studied how to model uncertain data in a principled way,

e.g., using relational databases with probabilistic annotations, that

concisely represents a probability distribution. Probabilistic data

management also studies the computational complexity of query evaluation

on probabilistic data, e.g., determining in which cases we can

efficiently compute the probability of a query result.

In this talk, I will present how probabilistic data management can

leverage the structure of uncertain data, using graph-theoretic measures

such as treewidth. The general approach is to decompose the input

database instance to a tree-shaped encoding, and use Courcelle's theorem

to rewrite the query to a tree automaton that runs on this encoding. I

will present how these tools can be extended to probabilistic data,

following the approach known as knowledge compilation: we compute a

circuit representation of the uncertain query result, we show that it

belongs to a restricted class of circuits, and we use this to perform

probability computations on the circuit. The talk will also present

other ways to use these circuit representations to enumerate query

results or to describe the provenance of the results.