Skip to main content

Query Compilation: the View from the Database Side

Dan Suciu ( University of Washington )

We study knowledge compilation for Boolean formulas that are given as groundings of First Order formulas. This problem is motivated by probabilistic databases, where each record in the database is an independent probabilistic event, and the query is given by a SQL expression or, equivalently, a First Order formula. The query's probability can be computed in linear time in the size of the compilation representation, hence the interest in studying the size of such a representation. We consider the "data complexity" setting, where the query is fixed, and the input to the problem consists only of the database instance. We consider several compilation targets, of increasing expressive power: OBDDs, FBDDs, and decision-DNNFs (a subclass of d-DNNFs). For the case of OBDDs we establish a dichotomy theorem for queries in restricted languages FO(\exists, \wedge, \vee) and FO(\forall, \wedge, \vee): for each such query the OBDD is either linear in the size of the database, or grows exponentially, and the complexity can be determined through a simple analysis of the query expression. For the other targets we describe a class of queries for which (a) the decision-DNNF is exponentially large in the size of the database, and (b) the probability of the query can be computed in polynomial time in the size of the database. This suggests that the compilation target decision-DNNF is too weak to capture all tractable cases of probabilistic inference. Our lower bound for decision-DNNF's relies on a translation into FBDD's, which is of independent interest. Joint work with Paul Beame, Abhay Jha, Jerry Li, and Sudeepa Roy.

Speaker bio

Dan Suciu is a Professor in Computer Science at the University of Washington. He received his Ph.D. from the University of Pennsylvania in 1995, was a principal member of the technical staff at AT&T Labs and joined the University of Washington in 2000. Suciu is conducting research in data management, with an emphasis on topics related to Big Data and data sharing, such as probabilistic data, data pricing, parallel data processing, data security. He is a co-author of two books Data on the Web: from Relations to Semistructured Data and XML, 1999, and Probabilistic Databases, 2011. He is a Fellow of the ACM, holds twelve US patents, received the best paper award in SIGMOD 2000 and ICDT 2013, the ACM PODS Alberto Mendelzon Test of Time Award in 2010 and in 2012, the 10 Year Most Influential Paper Award in ICDE 2013, the VLDB Ten Year Best Paper Award in 2014, and is a recipient of the NSF Career Award and of an Alfred P. Sloan Fellowship. Suciu serves on the VLDB Board of Trustees, and is an associate editor for the VLDB Journal, ACM TWEB, and Information Systems and is a past associate editor for ACM TODS and ACM TOIS. Suciu's PhD students Gerome Miklau and Christopher Re received the ACM SIGMOD Best Dissertation Award in 2006 and 2010 respectively, and Nilesh Dalvi was a runner up in 2008.

 

 

Share this: