Make Up Your Mind! Adversarial Generation of Inconsistent Natural Language Explanations

Oana-Maria Camburu, Brendan Shillingford, Pasquale Minervini, Thomas Lukasiewicz and Phil Blunsom


To increase trust in artificial intelligence systems, a growing number of works are enhancing these systems with the ability to produce natural language explanations that support their predictions. In this work, we show that such appealing frameworks are nonetheless prone to generating inconsistent explanations, such as “A dog is an animal.” and “A dog is not an animal.”, which are likely to decrease users’ trust in these systems. To detect such inconsistencies, we introduce a simple yet effective adversarial framework for generating a complete target sequence, a scenario that has not been addressed so far. Finally, we instantiate our framework on a state-of-the-art neural model that provides natural language explanations on SNLI, and we show that this model is capable of generating a significant number of inconsistencies.
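
The example pair in the abstract can be made concrete with a toy check. The sketch below assumes that explanations are short declarative sentences and that two explanations are inconsistent when one is the direct "is" / "is not" negation of the other. The helpers `negate`, `are_inconsistent` and `find_inconsistent_pairs` are hypothetical and are not the paper's adversarial framework. They only compare explanations that have already been generated and do not search for inputs that cause a model to produce them.

```python
from itertools import combinations


def negate(sentence):
    """Hypothetical toy negation: toggle the first ' is ' <-> ' is not '.

    Returns None when no rule applies, so unrelated sentences are never
    treated as negations of each other.
    """
    s = sentence.strip().rstrip(".")
    if " is not " in s:  # check the negated form first, since it also contains ' is '
        return s.replace(" is not ", " is ", 1)
    if " is " in s:
        return s.replace(" is ", " is not ", 1)
    return None


def normalize(sentence):
    """Lowercase, drop the trailing period and collapse whitespace."""
    return " ".join(sentence.lower().strip().rstrip(".").split())


def are_inconsistent(e1, e2):
    """True if e2 is the toy negation of e1 (up to case and punctuation)."""
    neg = negate(e1)
    return neg is not None and normalize(neg) == normalize(e2)


def find_inconsistent_pairs(explanations):
    """Return every pair of explanations flagged as mutually inconsistent."""
    return [
        (a, b)
        for a, b in combinations(explanations, 2)
        if are_inconsistent(a, b) or are_inconsistent(b, a)
    ]


if __name__ == "__main__":
    explanations = [
        "A dog is an animal.",
        "A cat is an animal.",
        "A dog is not an animal.",
    ]
    print(find_inconsistent_pairs(explanations))
```

A rule this simple misses paraphrased contradictions such as "A dog is not a mammal.", which is one reason a detector for real model outputs needs a broader notion of inconsistency than string-level negation.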

Book Title
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, ACL 2020, Seattle, Washington, USA, July 5–10, 2020
Joyce Chai, Natalie Schluter and Joel Tetreault
Association for Computational Linguistics