Data and Knowledge Group

ConCur: Constructing and Curating Knowledge Graphs

A D&KG project

Description

Advanced applications relying on intelligent management of loosely structured and large-scale datasets play a key role in domains such as healthcare, business and government. Ontology-based technologies lie at the core of many such applications. In a nutshell, an ontology-based data management system (ODMS) enables intelligent information processing by providing means for representing background knowledge about the application in an ontology, and exploiting automated reasoning techniques to infer information that is implicit in the data and the ontology.
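To make the idea of inferring implicit information concrete, the following is a minimal sketch in Python rather than a description of any actual ODMS. It assumes a toy ontology consisting only of subclass axioms, and the class names (Pump, Equipment, Asset), the individual p1, and the function infer_types are all hypothetical.

```python
def infer_types(facts, subclass_of):
    """Close a set of (individual, class) facts under subclass axioms."""
    inferred = set(facts)
    frontier = list(facts)
    while frontier:
        individual, cls = frontier.pop()
        # Every member of a class is also a member of its superclasses.
        for parent in subclass_of.get(cls, ()):
            new_fact = (individual, parent)
            if new_fact not in inferred:
                inferred.add(new_fact)
                frontier.append(new_fact)
    return inferred


# Hypothetical background knowledge: each class maps to its direct superclasses.
ontology = {"Pump": ["Equipment"], "Equipment": ["Asset"]}
# Explicit data: the individual p1 is stated only to be a Pump.
data = {("p1", "Pump")}

closure = infer_types(data, ontology)
```

Here the data states only that p1 is a Pump; the facts that p1 is also an Equipment and an Asset are implicit and are made explicit by reasoning. Real ontology languages are far more expressive than this sketch suggests.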

State-of-the-art ODMSs are, however, not well suited to applications that require real-time analysis of rapidly changing data. For instance, oil and gas companies continuously monitor sensor readings to detect equipment malfunction and predict maintenance needs; network providers analyse flow data to identify traffic anomalies and Denial of Service attacks; knowledge graphs such as GDELT are continuously updated with information about news media to monitor breaking developments anywhere on the planet; and Internet of Things (IoT) applications such as Smart Cities require real-time analysis of data stemming from multiple types of device.

ODMSs often borrow implementation techniques from the database literature, where real-time analysis of rapidly changing data has been tackled using two main approaches.

(1) In a stream processing system, the input data is conceptually seen as an unbounded sequence of time-stamped tuples that flow through the system; data is only available for processing in a single pass and information stored by the system is inherently incomplete. Streaming jobs are long-running: queries are deployed once and continue to produce results until removed. State-of-the-art systems, such as Apache Storm, Apache Spark Streaming, Google's MillWheel, LinkedIn's Samza, and Apache Flink, achieve sub-second latencies by distributing the streaming workload in a cluster, which requires sophisticated scheduling and fault-tolerance techniques.
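The single-pass, long-running character of a streaming query can be illustrated with a small sketch. This is a single-process toy in Python, not how any of the systems named above are implemented; the function windowed_average, the sample readings, and the assumption that timestamps arrive in non-decreasing order are all illustrative.

```python
from collections import deque


def windowed_average(stream, window_seconds):
    """Continuously emit the average of the values seen in the last
    `window_seconds` seconds, reading each tuple exactly once."""
    window = deque()  # only the tuples still inside the window are kept
    total = 0.0
    for timestamp, value in stream:
        window.append((timestamp, value))
        total += value
        # Evict tuples that have fallen out of the time window.
        while window[0][0] <= timestamp - window_seconds:
            _, expired = window.popleft()
            total -= expired
        yield timestamp, total / len(window)


# Hypothetical sensor readings as (timestamp in seconds, value) tuples.
readings = [(0, 10.0), (1, 20.0), (5, 30.0)]
results = list(windowed_average(iter(readings), window_seconds=3))
```

The generator never revisits earlier input and retains only the current window, so its state is an incomplete view of the data; on an unbounded input it would keep producing results for as long as tuples arrive.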

(2) In a real-time database, the data is seen as a finite collection of records that is continuously evolving. This traditional concept of a finite and persistent collection is ubiquitous in the database world and is well suited to applications requiring a consistent and complete view of the data. The key feature that distinguishes real-time from traditional databases is that, similarly to streaming systems, they allow clients to subscribe to long-running continuous queries that instantaneously push incremental updates.
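The subscription model can be sketched as follows. This is an in-memory toy in Python under stated assumptions, not the API of any real-time database: the class LiveCollection, its subscribe and upsert methods, and the event names "added", "changed" and "removed" are all hypothetical.

```python
class LiveCollection:
    """A finite, evolving collection of records with continuous queries."""

    def __init__(self):
        self._records = {}
        self._subscriptions = []

    def subscribe(self, predicate, callback):
        """Register a continuous query and return its current result."""
        self._subscriptions.append((predicate, callback))
        return {k: r for k, r in self._records.items() if predicate(r)}

    def upsert(self, key, record):
        """Insert or overwrite a record, pushing only the resulting change."""
        old = self._records.get(key)
        self._records[key] = record
        for predicate, callback in self._subscriptions:
            matched_before = old is not None and predicate(old)
            matches_now = predicate(record)
            if matches_now and not matched_before:
                callback("added", key, record)
            elif matches_now and matched_before:
                callback("changed", key, record)
            elif matched_before and not matches_now:
                callback("removed", key, record)


events = []
db = LiveCollection()
db.upsert("s1", {"temp": 70})
# Continuous query: sensors whose temperature exceeds 80.
initial = db.subscribe(lambda r: r["temp"] > 80, lambda *e: events.append(e))
db.upsert("s1", {"temp": 95})  # enters the query result
db.upsert("s2", {"temp": 60})  # never matches, so nothing is pushed
db.upsert("s1", {"temp": 75})  # leaves the query result
```

Unlike the streaming case, the collection always holds a complete, consistent view of the current records, and subscribers receive only the difference in the query result caused by each write rather than re-running the query.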

Many theoretical and practical difficulties arise, however, when adapting these approaches to ODMSs. In the OASIS project, we will address these difficulties and lay the foundations for a new generation of ODMSs capable of ingesting and processing rapidly changing data in real time. Such systems will support the aforementioned applications by enabling fast execution of complex analytics pipelines supporting intelligent decisions. Moreover, we will exploit the resulting insights to implement a prototype and test it in real-life deployments.

Support

ConCur is sponsored by the UK Engineering and Physical Sciences Research Council (EPSRC).

Project Summary

Duration

September 2021 until August 2024

Principal Investigator

Ian Horrocks

Researcher Co-Investigator

Jiaoyan Chen

Other Investigators

Bernardo Cuenca Grau and Boris Motik

Researchers

TBD

Project Partners

Tencent and Samsung Electronics Research Institute

Sponsors

UK Engineering and Physical Sciences Research Council