Knowledge Extraction from Natural Language
Supervisor
Suitable for
Abstract
Knowledge Graphs (KGs) provide a structured representation of entities and their relationships, enabling powerful semantic querying and reasoning across diverse domains such as enterprise data integration, bioinformatics, and the semantic web. However, constructing high-quality KGs from unstructured sources remains a significant challenge. Enterprise textual data stored in web pages, manuals, and documentation often contains rich information that must be accurately extracted, normalized, and linked to ontologies to ensure consistency and usability.
The goal of this project is to design and implement a system for the automated extraction of RDF Knowledge Graphs from textual data. The work will explore state-of-the-art techniques in natural language processing and entity recognition, leveraging Large Language Models and semantic indexing for semantic understanding and disambiguation.
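To make the target output concrete, the following is a minimal sketch of the text-to-RDF step, not the project's intended design. It assumes a toy regular expression stands in for the entity recognition and LLM components, and the `http://example.org/` namespace and all function names are hypothetical. It turns sentences of the form "X is a Y" into `rdf:type` triples and prints them as N-Triples.

```python
import re

# Hypothetical namespace used only for illustration; a real system would
# link entities to IRIs from an existing ontology instead.
EX = "http://example.org/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Toy pattern standing in for a real entity/relation extractor:
# matches "<Capitalised subject> is a/an <class>" up to a full stop or comma.
PATTERN = re.compile(r"([A-Z][\w ]*?) is an? ([\w ]+?)[.,]")


def to_iri(label):
    """Normalize a surface form into an IRI in the hypothetical namespace."""
    slug = re.sub(r"[^A-Za-z0-9]+", "_", label.strip()).strip("_")
    return f"<{EX}{slug}>"


def extract_triples(text):
    """Return (subject, predicate, object) IRI triples found in the text."""
    triples = []
    for subject, cls in PATTERN.findall(text):
        triples.append((to_iri(subject), f"<{RDF_TYPE}>", to_iri(cls)))
    return triples


def to_ntriples(triples):
    """Serialize triples as N-Triples, one statement per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


if __name__ == "__main__":
    sample = "Oxford is a city. Berlin is a capital city."
    print(to_ntriples(extract_triples(sample)))
```

The normalization in `to_iri` is deliberately naive: two different surface forms of the same entity would yield different IRIs, which is exactly the disambiguation problem the project sets out to address.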
This project presents the opportunity to work with Oxford Semantic Technologies, one of the Computer Science department’s spinout companies and success stories. As well as helping candidates build their CVs, the project offers strong candidates the opportunity of a summer internship with Oxford Semantics.
Background reading
Aidan Hogan et al. Knowledge Graphs. Synthesis Lectures on Data, Semantics, and Knowledge, Morgan & Claypool Publishers 2021, ISBN 978-3-031-00790-3, pp. 1-257
Heiko Paulheim. Knowledge graph refinement: A survey of approaches and evaluation methods. Semantic Web 8(3): 489-508 (2017)
Igor Melnyk, Pierre L. Dognin, Payel Das: Knowledge Graph Generation From Text. EMNLP (Findings) 2022: 1610-1622
Belinda Mo, Kyssen Yu, Joshua Kazdan, Proud Mpala, Lisa Yu, Chris Cundy, Charilaos I. Kanatsoulis, Sanmi Koyejo: KGGen: Extracting Knowledge Graphs from Plain Text with Language Models. CoRR abs/2502.09956 (2025)