Realistic Benchmarks for Explainable GNN-Based Models

Supervisors

Suitable for

Mathematics and Computer Science, Part C
Computer Science and Philosophy, Part C
Computer Science, Part C
Computer Science, Part B

Abstract

Graph Neural Networks (GNNs) are a family of machine learning models that have been successfully applied to many real-world tasks involving graph-structured data. The predictions made by GNNs, however, are often difficult to explain because the models themselves are opaque. This is especially problematic in safety-critical applications and in contexts subject to explainability requirements. The Data & Knowledge Group has recently implemented a GNN-based system in which every prediction made by the model can be explained by a rule expressed in Datalog, a logical language with well-defined semantics.

For example, suppose we use our system to recommend novels to users based on bibliographic information about the novels and previous user-novel interactions, and imagine that the system recommends the novel "Pride and Prejudice" to user Alex. This prediction can be explained by the Datalog rule "Recommend(x, y) :- RomanticGenre(x) & Liked(y, z) & RomanticGenre(z)" (recommend a romantic novel to a user who has already liked a romantic novel) whenever the input contains facts such as RomanticGenre(PrideAndPrejudice), RomanticGenre(TheNotebook), and Liked(Alex, TheNotebook).

The first aim of this project is to prepare a new suite of benchmarks for our system by finding and pre-processing relevant data sources. The student will then train and test our GNN-based system on these benchmarks and evaluate the quality of the computed explanations. If time permits, the student will also investigate how to adjust the system's rule extraction algorithm to produce more useful explanations.
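To make the semantics of the example rule concrete, the following Python sketch evaluates it over the example facts by naive enumeration. It is only an illustration of how the rule derives the recommendation from the input facts, not the group's actual system, which extracts such rules from a trained GNN rather than evaluating hand-written ones; all names are taken from the example above.

    # Example facts from the abstract
    romantic_genre = {"PrideAndPrejudice", "TheNotebook"}   # RomanticGenre(x) facts
    liked = {("Alex", "TheNotebook")}                       # Liked(user, novel) facts

    # Rule: Recommend(x, y) :- RomanticGenre(x) & Liked(y, z) & RomanticGenre(z)
    recommend = {
        (x, y)
        for x in romantic_genre        # x is a romantic novel
        for (y, z) in liked            # user y liked novel z
        if z in romantic_genre         # and z is itself a romantic novel
    }

    print(recommend)
    # Derives Recommend(PrideAndPrejudice, Alex), as in the example
    # (and also Recommend(TheNotebook, Alex), since the rule as stated
    # does not exclude novels the user has already liked)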

Pre-requisites: Experience with Python and familiarity with first-order logic are recommended. Experience with Knowledge Graphs is not required, but desirable.