Skip to main content

Leveraging the Structure of Uncertain Data

Antoine Amarilli ( Telecom ParisTech )

Modern data management techniques must deal with data that is
inherently noisy or uncertain: this is the case, e.g., of the results of
automated processes (information extraction, machine learning, etc.),
and of untrustworthy information (e.g., Web pages or crowdsourced
datasets). To address this challenge, the field of probabilistic data
management has studied how to model uncertain data in a principled way,
e.g., using relational databases with probabilistic annotations, that
concisely represents a probability distribution. Probabilistic data
management also studies the computational complexity of query evaluation
on probabilistic data, e.g., determining in which cases we can
efficiently compute the probability of a query result.

In this talk, I will present how probabilistic data management can
leverage the structure of uncertain data, using graph-theoretic measures
such as treewidth. The general approach is to decompose the input
database instance to a tree-shaped encoding, and use Courcelle's theorem
to rewrite the query to a tree automaton that runs on this encoding. I
will present how these tools can be extended to probabilistic data,
following the approach known as knowledge compilation: we compute a
circuit representation of the uncertain query result, we show that it
belongs to a restricted class of circuits, and we use this to perform
probability computations on the circuit. The talk will also present
other ways to use these circuit representations to enumerate query
results or to describe the provenance of the results.

 

 

Share this: