Skip to main content

Make Up Your Mind! Adversarial Generation of Inconsistent Natural Language Explanations

Oana−Maria Camburu‚ Brendan Shillingford‚ Pasquale Minervini‚ Thomas Lukasiewicz and Phil Blunsom


To increase trust in artificial intelligence systems, a growing amount of works are enhancing these systems with the capability of producing natural language explanations that support their predictions. In this work, we show that such appealing frameworks are nonetheless prone to generating inconsistent explanations, such as "A dog is an animal" and "A dog is not an animal", which are likely to decrease users' trust in these systems. To detect such inconsistencies, we introduce a simple but effective adversarial framework for generating a complete target sequence, a scenario that has not been addressed so far. Finally, we apply our framework to a state-of-the-art neural model that provides natural language explanations on SNLI, and we show that this model is capable of generating a significant amount of inconsistencies.