Learning Proof Path Selection Policies in Neural Theorem Proving
Matthew Morris‚ Pasquale Minervini and Phil Blunsom
Neural Theorem Provers (NTPs) are neural relaxations of the backward-chaining logic reasoning algorithm. They can learn continuous representations for predicates and constants, induce interpretable rules, can provide logic explanations for their predictions, and show strong systematic generalisation properties. However, since they enumerate all possible proof paths for proving a goal, they suffer from high computational complexity, and are thus unsuitable for complex reasoning tasks. Conditional Theorem Provers (CTPs) try to overcome this issue by generating relevant rules on-the-fly based on the goal, rather than considering all possible rules. Nonetheless, CTPs suffer from similar computational constraints, as they still have to consider multiple proof paths while reasoning. We propose Adaptive CTPs (ACTPs), where CTPs are augmented with a learned policy to dynamically select the most promising proof paths. This allows the model designer to specify the number of proof paths to consider, to conform to the computational constraints of their use case, while retaining all of the benefits of CTPs. By evaluating on the CLUTRR dataset, we provide evidence for the computational issues in existing CTP models, show that ACTPs alleviate these issues, and demonstrate that, in certain scenarios, the accuracy achieved by ACTPs is higher than CTPs while retaining the same computational complexity.