Query Rewriting for Expressive Ontology Languages
24th September 2010 to 31st December 2013
Background and Motivation:
Critical decisions in industry, science, government, and healthcareare are increasingly based on information derived from data sources whose number, size and heterogeneity continue to grow at a phenomenal rate. Improved access to and exploitation of such data is thus becoming critical to both our future competitiveness and our quality of life. One such example in the telecoms industry relates to the organisation and management of global communications networks, which has become highly complex, and involves the adjustment in real time of numerous parameters. Decisions on how to adjust these parameters can have a huge impact on performance and on profitability. These decisions are based on information derived from a large number of data sources, ranging from historical data to data streaming from monitoring equipment.
In order to meet the challenges presented by this kind of application, a new generation of Ontology-Based Information Systems (OBISs) is beginning to emerge. These systems aim to offer high performance and large storage capacities, while at the same time exploiting rich knowledge about the application domain captured in ontologies; this knowledge is used to present a unified view over the data sources, provide a domain-centric vocabulary for use in queries, deal with incomplete and semi-structured data, and enrich query answers with implicit information. The hope is that OBISs can be realised via a principled synthesis of ontological reasoning and database management techniques in a flexible architecture that will enable them to access data directly from different kinds of data repository, exploit different kinds of query answering technology, and adapt to a wide range of different application settings, while still providing formal guarantees about the quality of query answers.
Currently, all practical approaches to building OBISs rely on query answering via query rewriting: answers to a query q over an ontology O and a database instance DB are computed by first rewriting q (using O) into another query q' and then evaluating q' over DB. Query rewriting means that query evaluation can be delegated to existing information systems and thus exploit the capabilities that they provide, including robust scalability and the ability to handle rapidly changing data. However, existing query rewriting techniques have focused on reducing the expressivity of the ontology language in order to obtain formal tractability guarantees; this seriously restricts the modelling capabilities of the system, while not necessarily delivering robust scalability in practice.
The goal of the project is to design practically effective query answering algorithms for more expressive ontology languages with features such as transitivity and equality, and demonstrating that OBISs based on such languages can be effectively deployed in applications. In achieving these goals we will work closely with Alcatel-Lucent Bell Labs, who have extensive technical expertise in this area, as well as motivating applications in the telecoms domain.
Due to the fundamental trade-off between ontology language expressivity and query answering complexity, the currently prevalent view is that Ontology-Based Information Systems (OBISs) should use ontology languages that allow for first-order reducibility of query answering, with the low data complexity of the resulting algorithm often being given as a supporting argument. Reasoning with individual equality, transitivity, recursive axioms, or any kind of disjunctive information is LogSpace-hard in data complexity, which prevents first-order reducibility, so these features were excluded from OBIS languages such as OWL 2 QL. Such features, however, are often needed in practice; for example, transitivity is needed to describe part-whole relationships, and equality is critical in information integration. Furthermore, even with very restricted ontology languages, the existing query rewriting algorithms produce rewritings whose size is of the order ((|O| . |q|)^|q|), where |q| and |O| are the sizes of the original query and the ontology, respectively. Data complexity is useful only when the size of the query is negligible w.r.t. the size of the data; but then, since the rewritings can be large, the low data complexity of query answering via rewriting does not provide a realistic estimate of performance in practice. Thus, while seriously restricting the ontology language, first-order reducibility does not provide a practical scalability guarantee.
The technical challenge of this project is to develop a new approach to query answering in OBISs. On the one hand, in order to accommodate the expressivity required in practice, more expressive languages should be considered as targets for query rewriting. On the other hand, finer-grained complexity measures are needed in order to identify assumptions about the content and usage of the OBIS under which scalability can still be guaranteed.
Regarding more expressive targets for query rewriting, first steps in this direction have already been taken: if an ontology O is expressed in the DL ELHIO, a conjunctive query q over O can be rewritten into a Datalog query q'. The polynomial data complexity of Datalog may be too high in general; however, little is known about using languages with data complexity between AC0 and PTime as targets for query rewriting. Regular path queries (RPQs) are a promising candidate since they provide a limited form of recursion. RPQs were used as the basis for the query languages in semi-structured databases, so a rewriting approach based on RPQs should allow for ontology-based access to semi-structured data.
Regarding finer-grained complexity measures, in related fields such as propositional satisfiability, constraint satisfaction and conjunctive query answering, measures based on (hyper)treewidth have proved very effective. Adapting such measures to OBISs, and developing query answering algorithms that exploit them, is thus an important research problem. It is currently unknown whether assumptions about the structure of data (rather than on the structure of the ontology and/or the query) may be used to obtain practically effective query answering algorithms. Furthermore, determining classes of ontology languages and/or queries that admit `small' rewritings is an important and practically relevant open problem.
Some usage scenarios may also allow alternative architectures for OBISs to be considered. For example, if the ontology and the data change relatively infrequently, ontology-based ISs could be based on the "combined" query answering techniques, where some entailed facts are materialised in the data. It is, however, currently unclear whether these can be extended to expressive languages such as Horn-SHIQ and the related dependency classes. Furthermore, if the data changes more frequently, the cost of recomputing the preprocessed database instance may be prohibitive; in such cases, however, an incremental maintenance approach may be efficacious.