
Understanding and mitigating hallucinations in LLM distillation

Supervisors

Suitable for

MSc in Advanced Computer Science

Abstract

Prerequisites:
1. Essential (E): familiarity with the basics of LLMs
2. Desirable (D): familiarity with model distillation and LLM hallucination detection

Background
Large language models (LLMs) have demonstrated remarkable capabilities across reasoning, generation, and knowledge-intensive tasks, yet their immense computational and memory demands limit accessibility and real-world deployment. Model distillation offers a promising solution: compressing the knowledge of powerful teacher models into smaller, faster students while maintaining much of their performance. This process enables efficient, cost-effective, and on-device language intelligence. However, distilled models often inherit or even amplify the hallucination tendencies of their teachers, posing risks to factual reliability and trustworthiness. Understanding and mitigating hallucination during knowledge distillation is therefore both scientifically important and practically valuable. Our project explores methods to reduce hallucination in distilled models, contributing to the broader effort to make compact LLMs not only efficient but also faithful and dependable.
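To make the distillation setup concrete, the sketch below shows the standard logit-matching objective used in much of this literature, where the student is trained to match the teacher's temperature-softened token distribution. This is only an illustration: the PyTorch framing, function names, and temperature value are assumptions, not part of the project specification.

```python
# Minimal sketch of a logit-based knowledge distillation objective:
# the student matches the teacher's softened next-token distribution via a
# temperature-scaled KL divergence. Temperature and names are illustrative.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      temperature: float = 2.0) -> torch.Tensor:
    """KL(teacher || student) over temperature-softened token distributions."""
    t = temperature
    student_log_probs = F.log_softmax(student_logits / t, dim=-1)
    teacher_probs = F.softmax(teacher_logits / t, dim=-1)
    # 'batchmean' averages over the batch; scaling by t^2 keeps gradient
    # magnitudes comparable across temperatures (standard KD practice).
    return F.kl_div(student_log_probs, teacher_probs,
                    reduction="batchmean") * (t * t)
```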

Focus
This project focuses on understanding how student models inherit hallucinations from teacher models, and on developing methods that reduce hallucination in distilled models by intervening in the distillation process.

Method
Students could begin by reviewing recent advances in large language model distillation [1, 2] and hallucination detection, such as [3, 4]. The project will first conduct an empirical study to examine how student models inherit hallucinations from their teacher models. Building on these insights, the project will develop a method that uses hallucination detectors to identify and filter hallucinated content in the teacher's outputs before distillation, aiming to reduce hallucinations in the resulting student models.
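As a rough illustration of this filtering idea, the sketch below scores each teacher generation with a hallucination detector and keeps only low-risk examples for the distillation set. The detector interface, helper names, and threshold are placeholders (one might plug in semantic entropy [3] or an entity-level detector [4]); this is a sketch of the idea, not a prescribed implementation.

```python
# Sketch of the proposed filtering step: score each teacher generation with a
# hallucination detector and keep only low-risk samples for distillation.
# `hallucination_score` is a placeholder for any detector; the threshold of
# 0.5 is purely illustrative.
from typing import Callable, List, Tuple

def build_distillation_set(
    teacher_outputs: List[Tuple[str, str]],             # (prompt, teacher response) pairs
    hallucination_score: Callable[[str, str], float],   # higher = more likely hallucinated
    threshold: float = 0.5,
) -> List[Tuple[str, str]]:
    """Drop teacher responses whose hallucination score exceeds the threshold."""
    filtered = []
    for prompt, response in teacher_outputs:
        if hallucination_score(prompt, response) <= threshold:
            filtered.append((prompt, response))
    return filtered
```

The retained pairs would then be used as the training data for the student, so that the student is distilled only from teacher outputs the detector judges to be faithful.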
[1] Zhu, Xunyu, et al. "A survey on model compression for large language models." Transactions of the Association for Computational Linguistics 12 (2024): 1556-1577.
[2] Xu, Xiaohan, et al. "A survey on knowledge distillation of large language models." arXiv preprint arXiv:2402.13116 (2024).
[3] Farquhar, Sebastian, et al. "Detecting hallucinations in large language models using semantic entropy." Nature 630.8017 (2024): 625-630.
[4] Obeso, Oscar, et al. "Real-time detection of hallucinated entities in long-form generation." arXiv preprint arXiv:2509.03531 (2025).