Table 5.
Theta scores and area-under-the-curve percentiles for an LSTM trained on SNLI and tested on GSIRT. We also report the accuracy of the same LSTM on all SNLI quality control items (see Section 3.1). All results are for binary classification on each label.
Item Set | Theta Score | Percentile | Test Acc. |
---|---|---|---|
5GS | |||
Entailment | −0.133 | 44.83% | 96.5% |
Contradiction | 1.539 | 93.82% | 87.9% |
Neutral | 0.423 | 66.28% | 88% |
4GS | |||
Contradiction | 1.777 | 96.25% | 78.9% |
Neutral | 0.441 | 67% | 83% |
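
The Percentile column can be read as the area under an ability distribution to the left of the theta score. As a minimal sketch, assuming that distribution is the standard normal (the usual IRT convention, not something the caption states explicitly), the conversion is the normal CDF. The function name `theta_to_percentile` and the dictionary `table5_thetas` are illustrative names, not taken from the paper.

```python
import math


def theta_to_percentile(theta: float) -> float:
    """Percent of a standard normal distribution lying below theta.

    Assumption: ability (theta) is standard normal, so the percentile
    is 100 * Phi(theta), with Phi the standard normal CDF.
    """
    return 100.0 * 0.5 * (1.0 + math.erf(theta / math.sqrt(2.0)))


# Theta scores copied from Table 5, keyed by (item set, label).
table5_thetas = {
    ("5GS", "Entailment"): -0.133,
    ("5GS", "Contradiction"): 1.539,
    ("5GS", "Neutral"): 0.423,
    ("4GS", "Contradiction"): 1.777,
    ("4GS", "Neutral"): 0.441,
}

for (item_set, label), theta in table5_thetas.items():
    # Rounding theta to two decimals mimics a z-table lookup (an assumption
    # about how the reported percentiles were obtained).
    pct = theta_to_percentile(round(theta, 2))
    print(f"{item_set} {label}: theta={theta:+.3f} -> {pct:.2f}%")
```

Under this assumption, the reported percentiles appear consistent with looking up theta rounded to two decimals; using the unrounded theta gives slightly different values, so the rounding step should be treated as a guess about the original procedure.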