Understanding and mitigating hallucinations in LLM distillation
Supervisors
Suitable for
Abstract
Prerequisites:
1. E: familiar with the basics of LLMs
2. D: familiar with model distillation and LLM hallucination detection
Background
Large language models (LLMs) have demonstrated remarkable capabilities across reasoning, generation, and knowledge-intensive tasks, yet their immense computational and memory demands limit accessibility and real-world deployment. Model distillation offers a promising solution: compressing the knowledge of powerful teacher models into smaller, faster students while maintaining much of their performance. This process enables efficient, cost-effective, and on-device language intelligence.

However, distilled models often inherit, or even amplify, the hallucination tendencies of their teachers, posing risks to factual reliability and trustworthiness. Understanding and mitigating hallucination during knowledge distillation is therefore both scientifically important and practically valuable. This project explores methods to reduce hallucination in distilled models, contributing to the broader effort to make compact LLMs not only efficient but also faithful and dependable.
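For concreteness, the sketch below shows one standard logit-based distillation objective: a soft-target KL loss between temperature-softened teacher and student token distributions. It is a minimal illustration only, assuming the teacher and student share a vocabulary and that teacher logits are available; the function name and temperature value are illustrative, not taken from the cited surveys.

    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, temperature=2.0):
        # student_logits, teacher_logits: (batch, seq_len, vocab_size) tensors
        # produced by the student and teacher on the same input tokens.
        vocab_size = student_logits.size(-1)
        # Flatten to (num_tokens, vocab_size) so the loss averages over every token position.
        s = student_logits.reshape(-1, vocab_size) / temperature
        t = teacher_logits.reshape(-1, vocab_size) / temperature
        # KL(teacher || student) on the temperature-softened distributions.
        loss = F.kl_div(F.log_softmax(s, dim=-1), F.softmax(t, dim=-1), reduction="batchmean")
        # Scale by T^2 so gradient magnitudes stay comparable across temperatures.
        return loss * temperature ** 2

In practice this term is usually combined with a standard cross-entropy loss on ground-truth or teacher-generated tokens; the student thereby imitates the teacher's output distribution, which is also how hallucinated content can be passed on.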
Focus
This project focuses on understanding how student models inherit hallucinations from teacher models, and on methods for reducing hallucination in distilled models by intervening in the distillation process.
Method
Students could begin by reviewing recent advances in large language model distillation [1, 2] and hallucination detection [3, 4]. The project will first conduct an empirical study to examine how student models inherit hallucinations from their teacher models. Building on these insights, the project will develop a method that uses hallucination detectors to identify and filter hallucinated content in the teacher's outputs before distillation, aiming to reduce hallucinations in the resulting student models. A sketch of this filtering step is given after the references below.
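As a rough sketch of the filtering step described in the Method (all names here are hypothetical; neither the detector interface nor the threshold comes from the cited work), the idea is to score each teacher response with a hallucination detector and keep only low-risk prompt-response pairs for the student's training set:

    def build_filtered_distillation_set(prompts, teacher_generate,
                                        hallucination_score, threshold=0.5):
        # prompts: iterable of input prompts for the teacher.
        # teacher_generate(prompt) -> str: produces the teacher's response (hypothetical callable).
        # hallucination_score(prompt, response) -> float: higher means more likely hallucinated;
        # this stands in for any detector, e.g. a semantic-entropy estimate [3] or a probe over
        # the model's hidden states [4], and is not a real library call.
        kept = []
        for prompt in prompts:
            response = teacher_generate(prompt)
            if hallucination_score(prompt, response) < threshold:
                kept.append({"prompt": prompt, "response": response})
        return kept

The filtered pairs could then serve as the distillation corpus, and the accompanying empirical study could compare student hallucination rates with and without this filtering.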
[1] Zhu, Xunyu, et al. "A survey on model compression for large language models." Transactions of the Association for Computational Linguistics 12 (2024): 1556-1577.
[2] Xu, Xiaohan, et al. "A survey on knowledge distillation of large language models." arXiv preprint arXiv:2402.13116 (2024).
[3] Farquhar, Sebastian, et al. "Detecting hallucinations in large language models using semantic entropy." Nature 630.8017 (2024): 625-630.
[4] Obeso, Oscar, et al. "Real-time detection of hallucinated entities in long-form generation." arXiv preprint arXiv:2509.03531 (2025).