A Massively Scalable Intelligent Information Infrastructure
Ontology-based Data Management Systems (ODMSs) are a new kind of data
management system specifically designed to manage large semi-structured data
sets needed to power modern intelligent applications. ODMS data is typically
expressed using formalisms such as the Resource Description Framework (RDF),
the Web Ontology Language (OWL), and the Semantic Web Rule Language (SWRL).
The main task of an ODMS is to answer queries over the given ontology and data
set, with the queries commonly being expressed in the SPARQL language.
Reasoning plays a key role in ODMSs, and modern intelligent applications
commonly require an integration of taxonomic, spatio-temporal, mereological,
and other kinds of reasoning.
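To make "answering queries over the ontology and data set" concrete, the following is a minimal illustrative sketch, not the approach proposed in this project. It assumes a toy in-memory set of triples, models only two RDFS entailment rules (subclass transitivity and type inheritance) as a simple form of taxonomic reasoning, and answers a single triple pattern instead of a full SPARQL query. All function names and data are hypothetical. The naive fixpoint loop recomputes every inference in each round, which hints at why computing such consequences over very large data sets is costly.

```python
from itertools import product

# Hypothetical abbreviated IRIs; only two RDFS rules are modelled here.
SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"


def materialise(triples):
    """Naively apply rdfs11 (subClassOf transitivity) and rdfs9
    (type inheritance) until no new triples are derived."""
    closed = set(triples)
    while True:
        new = set()
        sub = [(s, o) for (s, p, o) in closed if p == SUBCLASS]
        # rdfs11: A subClassOf B, B subClassOf C  =>  A subClassOf C
        for (a, b), (b2, c) in product(sub, sub):
            if b == b2:
                new.add((a, SUBCLASS, c))
        # rdfs9: x type A, A subClassOf B  =>  x type B
        for (x, p, a) in closed:
            if p == TYPE:
                for (a2, b) in sub:
                    if a == a2:
                        new.add((x, TYPE, b))
        if new <= closed:  # fixpoint reached: nothing new was derived
            return closed
        closed |= new


def match(triples, pattern):
    """Match one triple pattern; terms starting with '?' are variables.
    Returns a list of variable bindings (dicts)."""
    results = []
    for triple in triples:
        binding = {}
        ok = True
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A repeated variable must bind to the same value.
                if binding.get(term, value) != value:
                    ok = False
                    break
                binding[term] = value
            elif term != value:
                ok = False
                break
        if ok:
            results.append(binding)
    return results


# Hypothetical ontology (first two triples) and data (last two triples).
data = {
    (":Professor", SUBCLASS, ":Academic"),
    (":Academic", SUBCLASS, ":Person"),
    (":alice", TYPE, ":Professor"),
    (":bob", TYPE, ":Person"),
}
closed = materialise(data)
people = sorted(b["?x"] for b in match(closed, ("?x", TYPE, ":Person")))
print(people)  # [':alice', ':bob']
```

Without reasoning, the query would return only :bob; the ontology is what makes :alice an answer.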
ODMSs can and do exploit implementation techniques described in the database
literature. The computational problems that such systems need to solve,
however, are very hard, so developing robustly scalable systems is extremely
challenging, usually requiring a combination of heuristics and careful
engineering. Although significant progress has been made and state-of-the-art
ODMSs can now deal with nontrivial data sets, their performance still falls
far short of what is required by modern 'data-hungry' applications. This is
partly due to the sheer size of the data sets that need to be processed, but
also partly due to the complexity of the computational tasks that need to be
solved.
The main hypothesis of this project is that the robust scalability required by
modern ODMS applications can only be achieved through the principled
application of techniques that provide provable performance and/or
tractability guarantees. The use of such techniques will not only allow for
better and more consistent performance, but will also help ODMS users to
better understand and thus avoid performance bottlenecks. This is to be
achieved by a synthesis of techniques from three distinct fields:
knowledge representation will provide the necessary reasoning algorithms,
databases will provide the techniques for scalable data storage and analysis
of the query structure, and mathematical network theory will provide the
techniques for describing the statistical properties of ontology data.
Combining all of these techniques with insightful engineering and extensive
optimisation will enable the implementation of a new ODMS with scalability
surpassing that of existing systems by several orders of magnitude. This
project thus aims to lay both the theoretical and the practical foundations
for a massively scalable intelligent information infrastructure capable of
powering modern data-intensive applications.