Knowledge Extraction from Natural Language

Supervisor

Suitable for

MSc in Advanced Computer Science
Computer Science, Part C

Abstract

Knowledge Graphs (KGs) provide a structured representation of entities and their relationships, enabling powerful semantic querying and reasoning across diverse domains such as enterprise data integration, bioinformatics, and the semantic web. However, constructing high-quality KGs from unstructured sources remains a significant challenge. Enterprise textual data stored in web pages, manuals and documentation often contain rich information that must be accurately extracted, normalized, and linked to ontologies to ensure consistency and usability.

The goal of this project is to design and implement a system for the automated extraction of RDF Knowledge Graphs from textual data. The work will explore state-of-the-art techniques in natural language processing and entity recognition, leveraging Large Language Models and semantic indexing for semantic understanding and disambiguation.
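To make the target output concrete, the following is a minimal sketch of a text-to-RDF pipeline, not the project's intended design. It assumes a toy rule-based extractor in place of the entity recognition and Large Language Model components, and it serializes the result as N-Triples. The namespace `http://example.org/`, the patterns, the predicate names, and the sample sentences are all hypothetical and chosen purely for illustration.

```python
import re
from urllib.parse import quote

# Hypothetical namespace; a real system would link to an existing ontology.
BASE = "http://example.org/"

# Hypothetical surface patterns standing in for an NER/LLM extraction step.
PATTERNS = [
    (re.compile(r"^(?P<s>.+?) is developed by (?P<o>.+?)\.?$"), "developedBy"),
    (re.compile(r"^(?P<s>.+?) is located in (?P<o>.+?)\.?$"), "locatedIn"),
]


def to_iri(label):
    """Normalize a surface form into an IRI under BASE."""
    slug = "_".join(label.strip().split())
    return f"<{BASE}{quote(slug)}>"


def extract_triples(text):
    """Return (subject, predicate, object) tuples found in the text."""
    triples = []
    # Naive sentence split on whitespace that follows a full stop.
    for sentence in re.split(r"(?<=\.)\s+", text.strip()):
        for pattern, predicate in PATTERNS:
            match = pattern.match(sentence)
            if match:
                triples.append((match.group("s"), predicate, match.group("o")))
                break
    return triples


def to_ntriples(triples):
    """Serialize triples as N-Triples, one statement per line."""
    return "\n".join(
        f"{to_iri(s)} {to_iri(p)} {to_iri(o)} ." for s, p, o in triples
    )


if __name__ == "__main__":
    sample = (
        "Acme Router is developed by Acme Corp. "
        "Acme Corp is located in Springfield."
    )
    print(to_ntriples(extract_triples(sample)))
```

In this sketch the normalization step is only whitespace replacement and percent-encoding. The disambiguation and ontology linking described above would replace `to_iri` with a lookup against known entities.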

This project offers the opportunity to work with Oxford Semantic Technologies, one of the Computer Science department's spinout companies and success stories. As well as helping candidates build their CVs, it gives strong candidates the opportunity to take up summer internships with the company.

Background reading

Aidan Hogan et al.: Knowledge Graphs. Synthesis Lectures on Data, Semantics, and Knowledge, Morgan & Claypool Publishers 2021, ISBN 978-3-031-00790-3, pp. 1-257

Heiko Paulheim: Knowledge graph refinement: A survey of approaches and evaluation methods. Semantic Web 8(3): 489-508 (2017)

Igor Melnyk, Pierre L. Dognin, Payel Das: Knowledge Graph Generation From Text. EMNLP (Findings) 2022: 1610-1622

Belinda Mo, Kyssen Yu, Joshua Kazdan, Proud Mpala, Lisa Yu, Chris Cundy, Charilaos I. Kanatsoulis, Sanmi Koyejo: KGGen: Extracting Knowledge Graphs from Plain Text with Language Models. CoRR abs/2502.09956 (2025)