AI Scientist for Scientific Discovery with LLM

Supervisors

Suitable for

MSc in Advanced Computer Science

Abstract

Prerequisites:

  1. E: familiar with the basics of LLMs; experience with reasoning-aware training (SFT/RL) is a big plus
  2. D: familiar with the theory or applications of causal inference/discovery

Background

  • Large Language Models (LLMs) have demonstrated strong capabilities in understanding and summarizing scientific literature. However, most existing systems remain correlation-driven and lack the ability to generate causal, testable scientific hypotheses. In contrast, scientific discovery requires reasoning about mechanisms, variables, and cause–effect relationships, rather than surface-level patterns. This project aims to explore how LLMs can be used as AI scientists: systems that read scientific literature, reason over structured evidence, and propose executable hypotheses that can be evaluated against data. The project focuses on causal discovery as a central challenge in modern AI-driven scientific research.

Focus

The project focuses on developing an AI scientist that uses large language models to support scientific discovery by generating and evaluating causal hypotheses from scientific literature. Key questions include: how LLMs can reason over scientific texts, how causal hypotheses can be represented in a structured and executable form, and how such hypotheses can be evaluated against data.
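One way to picture the "structured and executable form" mentioned above is to treat each LLM-proposed hypothesis as a small program object rather than free text. The sketch below is purely illustrative: the class name `CausalHypothesis`, its fields, and the association check are assumptions, not part of the project specification.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

@dataclass
class CausalHypothesis:
    cause: str        # name of the proposed causal variable
    effect: str       # name of the proposed affected variable
    mechanism: str    # short natural-language rationale from the LLM
    # executable check that can be run against observed data
    check: Callable[[Sequence[float], Sequence[float]], bool]

def positive_association(xs, ys):
    """Crude executable test: sample covariance of (x, y) is positive."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    return cov > 0

# Example: "higher fertilizer dose increases crop yield" (hypothetical domain)
h = CausalHypothesis(
    cause="fertilizer_dose",
    effect="crop_yield",
    mechanism="nitrogen availability promotes growth",
    check=positive_association,
)

print(h.check([1, 2, 3, 4], [2.1, 3.9, 6.2, 8.0]))  # True: association holds
```

A real system would of course use richer checks (conditional independence tests, interventional predictions), but the key design choice is the same: a hypothesis the pipeline can execute and falsify, not just read.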

Method

The project follows a modular pipeline:

  1. Literature grounding: scientific documents are ingested and structured using retrieval-augmented methods (e.g., entity extraction and evidence linking).
  2. Hypothesis generation: an LLM proposes candidate hypotheses represented as constrained, executable programs that encode causal mechanisms.
  3. Evaluation: hypotheses are executed on data (simulated or real) and scored on explanatory power and complexity, following principles similar to causal discovery and model selection.

This approach connects natural language understanding with causal reasoning and algorithmic evaluation.
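The evaluation step (scoring hypotheses by explanatory power versus complexity) can be sketched in the spirit of penalized model selection, e.g., a BIC-style score. Everything below is a minimal illustration under assumed forms: simulated linear data, a Gaussian likelihood, and a per-parameter penalty.

```python
import math
import random

random.seed(0)
# Simulated ground truth: x causes y with additive noise.
xs = [random.gauss(0, 1) for _ in range(200)]
ys = [2.0 * x + random.gauss(0, 0.5) for x in xs]

def residual_variance(xs, ys, slope):
    """Variance of residuals under the candidate model y = slope * x."""
    res = [y - slope * x for x, y in zip(xs, ys)]
    m = sum(res) / len(res)
    return sum((r - m) ** 2 for r in res) / len(res)

def score(xs, ys, slope, n_params):
    """Gaussian log-likelihood of residuals minus a BIC-style complexity penalty."""
    n = len(xs)
    var = residual_variance(xs, ys, slope)
    loglik = -0.5 * n * (math.log(2 * math.pi * var) + 1)
    return loglik - 0.5 * n_params * math.log(n)

# Hypothesis A: x causally drives y (fit a slope). Hypothesis B: no effect.
fit_slope = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
score_a = score(xs, ys, fit_slope, n_params=1)
score_b = score(xs, ys, 0.0, n_params=0)
print(score_a > score_b)  # the causal model explains the data far better
```

The trade-off encoded here (fit minus parameter count) is the same principle that would let the pipeline prefer a simple mechanistic hypothesis over an overfit one; the actual project could replace this toy score with proper causal-discovery criteria from the references below.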

 

[1] Peters, Jonas, Dominik Janzing, and Bernhard Schölkopf. Elements of Causal Inference: Foundations and Learning Algorithms. The MIT Press, 2017.

[2] Kiciman, Emre, et al. "Causal reasoning and large language models: Opening a new frontier for causality." Transactions on Machine Learning Research (2023).

[3] Karimi Mamaghan, Amir Mohammad, et al. "Challenges and Considerations in the Evaluation of Bayesian Causal Discovery." arXiv preprint (2024).

[4] Narayanan, Siddharth M., et al. "Training a Scientific Reasoning Model for Chemistry." arXiv preprint arXiv:2506.17238 (2025).