Knowledge Extraction from Natural Language

Supervisor

Suitable for

Abstract

Knowledge Graphs (KGs) provide a structured representation of entities and their relationships, enabling powerful semantic querying and reasoning across diverse domains such as enterprise data integration, bioinformatics, and the semantic web. However, constructing high-quality KGs from unstructured sources remains a significant challenge. Enterprise textual data stored in web pages, manuals and documentation often contain rich information that must be accurately extracted, normalized, and linked to ontologies to ensure consistency and usability.

The goal of this project is to design and implement a system for the automated extraction of RDF Knowledge Graphs from textual data. The work will explore state-of-the-art techniques in natural language processing and entity recognition, leveraging Large Language Models and semantic indexing for semantic understanding and disambiguation.

This project offers the opportunity to work with Oxford Semantic Technologies, one of the Computer Science department’s spinout companies and success stories. As well as helping candidates build their CVs, it gives strong candidates the opportunity of a summer internship with Oxford Semantic Technologies.

Background reading

Aidan Hogan et al. Knowledge Graphs. Synthesis Lectures on Data, Semantics, and Knowledge, Morgan & Claypool Publishers 2021, ISBN 978-3-031-00790-3, pp. 1-257

Heiko Paulheim. Knowledge graph refinement: A survey of approaches and evaluation methods. Semantic Web 8(3): 489-508 (2017)

Igor Melnyk, Pierre L. Dognin, Payel Das: Knowledge Graph Generation From Text. EMNLP (Findings) 2022: 1610-1622

Belinda Mo, Kyssen Yu, Joshua Kazdan, Proud Mpala, Lisa Yu, Chris Cundy, Charilaos I. Kanatsoulis, Sanmi Koyejo: KGGen: Extracting Knowledge Graphs from Plain Text with Language Models. CoRR abs/2502.09956 (2025)
