
Table 1.

Training and test accuracy (%) on the natural language inference task. d is the word embedding size and |θ|M is the number of model parameters.

| Model | d | \|θ\|M | Train (%) | Test (%) |
| --- | --- | --- | --- | --- |
| Classifier with handcrafted features (Bowman et al., 2015a) | - | - | 99.7 | 78.2 |
| LSTM encoders (Bowman et al., 2015a) | 300 | 3.0M | 83.9 | 80.6 |
| Dependency Tree CNN encoders (Mou et al., 2016) | 300 | 3.5M | 83.3 | 82.1 |
| NTI-SLSTM (Ours) | 300 | 3.3M | 83.9 | 82.4 |
| SPINN-PI encoders (Bowman et al., 2016) | 300 | 3.7M | 89.2 | 83.2 |
| NTI-SLSTM-LSTM (Ours) | 300 | 4.0M | 82.5 | 83.4 |
| LSTM attention (Rocktäschel et al., 2016) | 100 | 242K | 85.4 | 82.3 |
| LSTM word-by-word attention (Rocktäschel et al., 2016) | 100 | 250K | 85.3 | 83.5 |
| NTI-SLSTM node-by-node global attention (Ours) | 300 | 3.5M | 85.0 | 84.2 |
| NTI-SLSTM node-by-node tree attention (Ours) | 300 | 3.5M | 86.0 | 84.3 |
| NTI-SLSTM-LSTM node-by-node tree attention (Ours) | 300 | 4.2M | 88.1 | 85.7 |
| NTI-SLSTM-LSTM node-by-node global attention (Ours) | 300 | 4.2M | 87.6 | 85.9 |
| mLSTM word-by-word attention (Wang and Jiang, 2016) | 300 | 1.9M | 92.0 | 86.1 |
| LSTMN with deep attention fusion (Cheng et al., 2016) | 450 | 3.4M | 88.5 | 86.3 |
| Tree matching NTI-SLSTM-LSTM tree attention (Ours) | 300 | 3.2M | 87.3 | 86.4 |
| Decomposable Attention Model (Parikh et al., 2016) | 200 | 580K | 90.5 | 86.8 |
| Tree matching NTI-SLSTM-LSTM global attention (Ours) | 300 | 3.2M | 87.6 | 87.1 |
| Full tree matching NTI-SLSTM-LSTM global attention (Ours) | 300 | 3.2M | 88.5 | 87.3 |
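The |θ|M column reports each model's number of trainable parameters. As a rough illustration only (not taken from the paper; the LSTM encoder below is a hypothetical stand-in for the compared architectures, not the NTI model), such a count can be obtained in PyTorch by summing the sizes of a model's trainable tensors:

```python
# Minimal sketch: counting trainable parameters (the |θ|M quantity in Table 1)
# for a PyTorch module. The LSTM encoder is illustrative, not the paper's model.
import torch.nn as nn

def count_parameters(model: nn.Module) -> int:
    """Return the total number of trainable parameters in a model."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

# Example: a 300-dimensional LSTM sentence encoder (hypothetical configuration).
encoder = nn.LSTM(input_size=300, hidden_size=300, batch_first=True)
print(f"Trainable parameters: {count_parameters(encoder):,}")
```

Note that the embedding layer is often excluded from such counts when pretrained word vectors are kept fixed, which is one reason reported parameter counts for comparable models can differ.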