Table 1.
Training and test accuracy on the natural language inference task. d is the word embedding size and |θ|M is the number of model parameters.
| Model | d | \|θ\|M | Train (%) | Test (%) |
|---|---|---|---|---|
| Classifier with handcrafted features (Bowman et al., 2015a) | - | - | 99.7 | 78.2 |
| LSTM encoders (Bowman et al., 2015a) | 300 | 3.0M | 83.9 | 80.6 |
| Dependency Tree CNN encoders (Mou et al., 2016) | 300 | 3.5M | 83.3 | 82.1 |
| NTI-SLSTM (Ours) | 300 | 3.3M | 83.9 | 82.4 |
| SPINN-PI encoders (Bowman et al., 2016) | 300 | 3.7M | 89.2 | 83.2 |
| NTI-SLSTM-LSTM (Ours) | 300 | 4.0M | 82.5 | 83.4 |
| LSTM attention (Rocktäschel et al., 2016) | 100 | 242K | 85.4 | 82.3 |
| LSTM word-by-word attention (Rocktäschel et al., 2016) | 100 | 250K | 85.3 | 83.5 |
| NTI-SLSTM node-by-node global attention (Ours) | 300 | 3.5M | 85.0 | 84.2 |
| NTI-SLSTM node-by-node tree attention (Ours) | 300 | 3.5M | 86.0 | 84.3 |
| NTI-SLSTM-LSTM node-by-node tree attention (Ours) | 300 | 4.2M | 88.1 | 85.7 |
| NTI-SLSTM-LSTM node-by-node global attention (Ours) | 300 | 4.2M | 87.6 | 85.9 |
| mLSTM word-by-word attention (Wang and Jiang, 2016) | 300 | 1.9M | 92.0 | 86.1 |
| LSTMN with deep attention fusion (Cheng et al., 2016) | 450 | 3.4M | 88.5 | 86.3 |
| Tree matching NTI-SLSTM-LSTM tree attention (Ours) | 300 | 3.2M | 87.3 | 86.4 |
| Decomposable Attention Model (Parikh et al., 2016) | 200 | 580K | 90.5 | 86.8 |
| Tree matching NTI-SLSTM-LSTM global attention (Ours) | 300 | 3.2M | 87.6 | 87.1 |
| Full tree matching NTI-SLSTM-LSTM global attention (Ours) | 300 | 3.2M | 88.5 | 87.3 |