Gaussian Transformer: A Lightweight Approach for Natural Language Inference
Natural Language Inference (NLI) is an active research area, where numerous approaches based on recurrent neural networks (RNNs), convolutional neural networks (CNNs), and self-attention networks (SANs) has been proposed. Although obtaining impressive performance, previous recurrent approaches are hard to train in parallel; convolutional models tend to cost more parameters, while self-attention networks are not good at capturing local dependency of texts. To address this problem, we introduce a Gaussian prior to selfattention mechanism, for better modeling the local structure of sentences. Then we propose an efficient RNN/CNN-free architecture named Gaussian Transformer for NLI, which consists of encoding blocks modeling both local and global dependency, high-order interaction blocks collecting the evidence of multi-step inference, and a lightweight comparison block saving lots of parameters. Experiments show that our model achieves new state-of-the-art performance on both SNLI and MultiNLI benchmarks with significantly fewer parameters and considerably less training time. Besides, evaluation using the Hard NLI datasets demonstrates that our approach is less affected by the undesirable annotation artifacts.