KNOW How to Make Up Your Mind! Adversarially Detecting and Remedying Inconsistencies in Natural Language Explanations

Myeongjun Jang, Bodhisattwa Prasad Majumder, Julian McAuley, Thomas Lukasiewicz and Oana-Maria Camburu


While recent works have considerably improved the quality of natural language explanations (NLEs) generated by a model to justify its predictions, there is very limited research on detecting and remedying inconsistencies among generated NLEs. In this work, we leverage external knowledge bases to significantly improve on an existing adversarial framework for detecting inconsistent NLEs. We apply our framework to high-performing NLE models and show that models with higher NLE quality do not necessarily generate fewer inconsistencies. Moreover, we propose an off-the-shelf remedy that alleviates NLE inconsistency by injecting external background knowledge into the model. Our remedy decreases the inconsistencies of previous high-performing NLE models as detected by our framework.

Book Title
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics, ACL 2023, Toronto, Canada, July 9–14, 2023
Association for Computational Linguistics