Table 1.
Training and test accuracy on the natural language inference task. d is the word embedding size and |θ|M is the number of model parameters.
| Model | d | \|θ\|M | Train (%) | Test (%) |
|---|---|---|---|---|
| Classifier with handcrafted features (Bowman et al., 2015a) | - | - | 99.7 | 78.2 |
| LSTM encoders (Bowman et al., 2015a) | 300 | 3.0M | 83.9 | 80.6 |
| Dependency Tree CNN encoders (Mou et al., 2016) | 300 | 3.5M | 83.3 | 82.1 |
| NTI-SLSTM (Ours) | 300 | 3.3M | 83.9 | 82.4 |
| SPINN-PI encoders (Bowman et al., 2016) | 300 | 3.7M | 89.2 | 83.2 |
| NTI-SLSTM-LSTM (Ours) | 300 | 4.0M | 82.5 | 83.4 |
| LSTM attention (Rocktäschel et al., 2016) | 100 | 242K | 85.4 | 82.3 |
| LSTM word-by-word attention (Rocktäschel et al., 2016) | 100 | 250K | 85.3 | 83.5 |
| NTI-SLSTM node-by-node global attention (Ours) | 300 | 3.5M | 85.0 | 84.2 |
| NTI-SLSTM node-by-node tree attention (Ours) | 300 | 3.5M | 86.0 | 84.3 |
| NTI-SLSTM-LSTM node-by-node tree attention (Ours) | 300 | 4.2M | 88.1 | 85.7 |
| NTI-SLSTM-LSTM node-by-node global attention (Ours) | 300 | 4.2M | 87.6 | 85.9 |
| mLSTM word-by-word attention (Wang and Jiang, 2016) | 300 | 1.9M | 92.0 | 86.1 |
| LSTMN with deep attention fusion (Cheng et al., 2016) | 450 | 3.4M | 88.5 | 86.3 |
| Tree matching NTI-SLSTM-LSTM tree attention (Ours) | 300 | 3.2M | 87.3 | 86.4 |
| Decomposable Attention Model (Parikh et al., 2016) | 200 | 580K | 90.5 | 86.8 |
| Tree matching NTI-SLSTM-LSTM global attention (Ours) | 300 | 3.2M | 87.6 | 87.1 |
| Full tree matching NTI-SLSTM-LSTM global attention (Ours) | 300 | 3.2M | 88.5 | 87.3 |