Data and Knowledge Group

― Knowledge Representation and Reasoning

ConCur: Constructing and Curating Knowledge Graphs

A D&KG project

Description

Knowledge graphs are graph-structured knowledge resources, often expressed as sets of triples such as ("UK", "hasCapital", "London") and ("London", "instanceOf", "City"). As well as such basic "facts", knowledge graphs often include structural knowledge about the domain, typically based on a hierarchy of entity types (also known as classes or concepts); e.g., ("City", "subClassOf", "HumanSettlement"). A knowledge graph that consists largely or wholly of structural knowledge is often called an ontology.
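To make the triple representation concrete, the following minimal Python sketch (an illustration only, not ConCur code) stores the example triples above as plain tuples and follows "subClassOf" links transitively. As a result, London, which is asserted to be a City, is also inferred to be a HumanSettlement. The functions `superclasses` and `types_of` are hypothetical names. Real knowledge graphs are usually stored in RDF/OWL and processed with dedicated reasoners that handle far more expressive knowledge.

```python
from collections import deque

# A (subject, predicate, object) triple.
Triple = tuple[str, str, str]

# A toy knowledge graph built from the example triples in the text.
kg: set[Triple] = {
    ("UK", "hasCapital", "London"),             # a basic fact
    ("London", "instanceOf", "City"),           # entity typing
    ("City", "subClassOf", "HumanSettlement"),  # structural knowledge
}


def superclasses(graph: set[Triple], cls: str) -> set[str]:
    """Return every class reachable from `cls` via subClassOf (transitive closure)."""
    found: set[str] = set()
    queue = deque([cls])
    while queue:
        current = queue.popleft()
        for s, p, o in graph:
            if s == current and p == "subClassOf" and o not in found:
                found.add(o)
                queue.append(o)
    return found


def types_of(graph: set[Triple], entity: str) -> set[str]:
    """Return the asserted types of `entity` together with all their superclasses."""
    direct = {o for s, p, o in graph if s == entity and p == "instanceOf"}
    inferred = set(direct)
    for cls in direct:
        inferred |= superclasses(graph, cls)
    return inferred


print(sorted(types_of(kg, "London")))  # ['City', 'HumanSettlement']
```

Scanning every triple on each step keeps the sketch short; a graph of realistic size would need an index from subjects to their triples.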

Some knowledge graphs are general-purpose, such as Wikidata and the Google knowledge graph, while others are developed for specific domains such as medicine. They are rapidly gaining in importance and are playing a key role in many applications. For example, Google uses its knowledge graph for search, question answering and Google Assistant, while Amazon and Apple also use knowledge graphs to power their personal assistants Alexa and Siri, respectively. Knowledge graphs are widely used in the domain of health and wellbeing, e.g., for organising and exchanging information and for powering clinical artificial intelligence (AI). One example is FoodOn, an ontology representing food knowledge such as fine-grained food product categorisation, nutrition and allergens, as well as related activities such as agriculture.

Knowledge graph construction and maintenance are, however, very challenging and may require considerable human effort. Despite the high cost of creating them, knowledge graphs are often still biased, incomplete or too coarse-grained. Take HeLis, an ontology for health and lifestyle, as an example. Its food knowledge is quite simple and often represents many different variants with a single entity (e.g., "Banana" for all kinds and derivatives of bananas), and its health knowledge is highly incomplete compared with dedicated biomedical ontologies. In addition, errors such as incorrect facts and categorisations are hard to avoid in knowledge graphs; e.g., FoodOn categorises soy milk as a kind of milk, but not as a kind of soy product. Such errors may be inherited from the information source or be introduced by the construction procedure. These issues significantly reduce the usefulness of knowledge graphs and the reliability of the systems that use them; e.g., the categorisation of soy milk could be dangerous if the knowledge graph were used in a food allergen alert system, as such a system might fail to warn users with a soy allergy.

Effective techniques for knowledge graph construction and curation are therefore urgently needed and will play a critical role in exploiting the full value of knowledge graphs. As many knowledge resources are now available, one possible approach is to use multiple sources to address both coverage and quality issues, e.g., via integration and cross-checking. For example, integrating HeLis with FoodOn would combine FoodOn's fine-grained categorisation of food products (including bananas) with HeLis's lifestyle knowledge. Moreover, cross-checking FoodOn against HeLis would reveal the problem with soy milk, which HeLis correctly categorises as a soy product. Automating the integration of knowledge resources is challenging, but combining semantic and learning-based techniques seems to be a very promising approach, and we have already obtained some encouraging preliminary results in this direction.
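The cross-checking idea can be sketched in a few lines of Python under strong simplifying assumptions. The two graphs below are hypothetical toy fragments, not actual FoodOn or HeLis content (the real ontologies use their own identifiers and richer axioms). Entity names are also assumed to be already aligned across the two sources, whereas matching entities between real knowledge graphs is itself a hard problem. The hypothetical function `candidate_additions` lists the subClassOf triples from one graph that concern classes the other graph also contains but that the other graph does not entail. These triples are flagged for review rather than accepted automatically.

```python
from collections import deque

Triple = tuple[str, str, str]

# Hypothetical toy fragments illustrating the soy milk example from the text.
foodon_toy: set[Triple] = {("SoyMilk", "subClassOf", "Milk")}
helis_toy: set[Triple] = {("SoyMilk", "subClassOf", "SoyProduct")}


def classes(graph: set[Triple]) -> set[str]:
    """All names occurring as subject or object of a subClassOf triple."""
    return {x for s, p, o in graph if p == "subClassOf" for x in (s, o)}


def superclasses(graph: set[Triple], cls: str) -> set[str]:
    """All classes reachable from `cls` via subClassOf (transitive closure)."""
    found: set[str] = set()
    queue = deque([cls])
    while queue:
        current = queue.popleft()
        for s, p, o in graph:
            if s == current and p == "subClassOf" and o not in found:
                found.add(o)
                queue.append(o)
    return found


def candidate_additions(source: set[Triple], target: set[Triple]) -> set[Triple]:
    """subClassOf triples in `source` whose subject also occurs in `target`,
    but whose superclass `target` does not entail: candidates for curation."""
    shared = classes(source) & classes(target)
    return {
        (s, p, o)
        for s, p, o in source
        if p == "subClassOf" and s in shared and o not in superclasses(target, s)
    }


print(candidate_additions(helis_toy, foodon_toy))
# {('SoyMilk', 'subClassOf', 'SoyProduct')}

merged = foodon_toy | helis_toy  # naive integration by set union
print("SoyProduct" in superclasses(foodon_toy, "SoyMilk"))  # False
print("SoyProduct" in superclasses(merged, "SoyMilk"))      # True
```

In this toy setting, merging the fragments makes the soy product categorisation derivable, which is what an allergen alert system would need. A naive union can, however, also import errors from the other source, so flagging candidates for review is the more cautious choice.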

The ConCur project will therefore study a range of semantic and machine learning techniques, and how to combine them to support knowledge graph construction and curation. Beyond this application, the research will also contribute to the development of new neural-symbolic theories, paradigms and methods, such as deep semantic embedding for learning representations of expressive knowledge, and knowledge-guided learning for addressing sample shortage problems. These techniques promise to revolutionise many AI and big data technologies.

Support

ConCur is sponsored by the UK Engineering and Physical Sciences Research Council (EPSRC).

Project Summary

Duration

September 2021 until August 2024

Principal Investigator

Ian Horrocks

Researcher Co-Investigator

Jiaoyan Chen

Other Investigators

Bernardo Cuenca Grau and Boris Motik

Researchers

TBD

Project Partners

Tencent and Samsung Electronics Research Institute

Sponsors

UK Engineering and Physical Sciences Research Council