2020 Mar 4;2019:794–803.

Table 3: Performance of machine learning models (average scores) on the test sets without balancing

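| Model | Feature set* | Sensitivity (p-value)# | Specificity (p-value)# | PPV | NPV | F0.5 (p-value)# |
|---|---|---|---|---|---|---|
| Random Forest | NO NLP | 0.70 | 0.95 | 0.46 | 0.98 | 0.47 |
| Random Forest | ND BIN | 0.58 (<0.001) | 0.97 (<0.001) | 0.54 | 0.98 | 0.54 (<0.001) |
| Random Forest | ND FRE | 0.59 (<0.001) | 0.97 (<0.001) | 0.53 | 0.98 | 0.53 (<0.001) |
| Random Forest | ND NOR | 0.59 (<0.001) | 0.97 (<0.001) | 0.53 | 0.98 | 0.52 (<0.001) |
| Random Forest | ND ALL | 0.57 (<0.001) | 0.97 (<0.001) | 0.53 | 0.98 | 0.52 (<0.001) |
| Random Forest | NM BIN | 0.59 (<0.001) | 0.96 (<0.001) | 0.51 | 0.98 | 0.50 (0.99) |
| Random Forest | NM FRE | 0.70 (0.56) | 0.94 (0.02) | 0.45 | 0.98 | 0.46 (0.08) |
| Random Forest | NM NOR | 0.69 (0.65) | 0.95 (0.81) | 0.46 | 0.98 | 0.47 (0.93) |
| Random Forest | NM ALL | 0.58 (<0.001) | 0.96 (<0.001) | 0.51 | 0.98 | 0.50 (0.73) |
| Random Forest | ND NM | 0.59 (<0.001) | 0.96 (<0.001) | 0.52 | 0.98 | 0.51 (0.14) |
| Random Forest | KW BIN | 0.58 (<0.001) | 0.96 (<0.001) | 0.51 | 0.98 | 0.50 (0.63) |
| Random Forest | KW FRE | 0.70 (0.32) | 0.95 (0.29) | 0.45 | 0.98 | 0.47 (0.57) |
| Random Forest | KW NOR | 0.60 (<0.001) | 0.96 (<0.001) | 0.52 | 0.98 | 0.51 (0.06) |
| Random Forest | KW ALL | 0.59 (<0.001) | 0.96 (<0.001) | 0.51 | 0.98 | 0.50 (0.84) |
| SVM | NO NLP | 0.80 | 0.91 | 0.32 | 0.99 | 0.37 |
| SVM | ND BIN | 0.80 (0.48) | 0.91 (0.61) | 0.33 | 0.99 | 0.37 (0.63) |
| SVM | ND FRE | 0.79 (0.65) | 0.91 (0.04) | 0.33 | 0.99 | 0.38 (0.11) |
| SVM | ND NOR | 0.80 (1.00) | 0.91 (0.09) | 0.32 | 0.99 | 0.36 (1.00) |
| SVM | ND ALL | 0.80 (1.00) | 0.91 (0.68) | 0.32 | 0.99 | 0.37 (0.01) |
| SVM | NM BIN | 0.81 (0.53) | 0.91 (0.60) | 0.33 | 0.99 | 0.37 (0.01) |
| SVM | NM FRE | 0.78 (0.06) | 0.91 (0.10) | 0.33 | 0.99 | 0.37 (0.15) |
| SVM | NM NOR | 0.80 (1.00) | 0.92 (<0.001) | 0.34 | 0.99 | 0.38 (1.00) |
| SVM | NM ALL | 0.81 (0.39) | 0.92 (0.16) | 0.34 | 0.99 | 0.38 (0.01) |
| SVM | ND NM | 0.81 (0.49) | 0.91 (0.21) | 0.34 | 0.99 | 0.38 (0.01) |
| SVM | KW BIN | 0.80 (0.81) | 0.91 (0.48) | 0.32 | 0.99 | 0.36 (0.62) |
| SVM | KW FRE | 0.79 (0.26) | 0.92 (<0.001) | 0.35 | 0.99 | 0.39 (0.01) |
| SVM | KW NOR | 0.79 (0.26) | 0.91 (0.05) | 0.33 | 0.99 | 0.38 (0.34) |
| SVM | KW ALL | 0.80 (1.00) | 0.91 (0.86) | 0.32 | 0.99 | 0.36 (0.86) |
| Logistic Regression | NO NLP | 0.77 | 0.95 | 0.43 | 0.99 | 0.48 |
| Logistic Regression | ND BIN | 0.78 (0.18) | 0.95 (1.00) | 0.44 | 0.99 | 0.48 (0.92) |
| Logistic Regression | ND FRE | 0.77 (0.32) | 0.95 (0.04) | 0.45 | 0.99 | 0.49 (0.15) |
| Logistic Regression | ND NOR | 0.77 (1.00) | 0.95 (1.00) | 0.43 | 0.99 | 0.48 (0.15) |
| Logistic Regression | ND ALL | 0.79 (0.10) | 0.94 (<0.001) | 0.40 | 0.99 | 0.45 (0.70) |
| Logistic Regression | NM BIN | 0.79 (0.18) | 0.94 (<0.001) | 0.40 | 0.99 | 0.44 (0.47) |
| Logistic Regression | NM FRE | 0.78 (0.65) | 0.95 (<0.001) | 0.45 | 0.99 | 0.49 (0.60) |
| Logistic Regression | NM NOR | 0.77 (1.00) | 0.95 (1.00) | 0.43 | 0.99 | 0.48 (0.01) |
| Logistic Regression | NM ALL | 0.79 (0.18) | 0.94 (<0.001) | 0.40 | 0.99 | 0.44 (0.11) |
| Logistic Regression | ND NM | 0.79 (0.11) | 0.94 (<0.001) | 0.40 | 0.99 | 0.44 (0.17) |
| Logistic Regression | KW BIN | 0.79 (0.17) | 0.93 (<0.001) | 0.39 | 0.99 | 0.43 (<0.001) |
| Logistic Regression | KW FRE | 0.77 (1.00) | 0.95 (1.00) | 0.43 | 0.99 | 0.48 (1.00) |
| Logistic Regression | KW NOR | 0.77 (1.00) | 0.95 (1.00) | 0.43 | 0.99 | 0.48 (1.00) |
| Logistic Regression | KW ALL | 0.79 (0.17) | 0.93 (<0.001) | 0.39 | 0.99 | 0.43 (<0.001) |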

* Feature-set abbreviations are explained in Table 2: NO NLP, structured data features alone; ND, NLP document-level features; NM, NLP mention-level features; BIN, binary features; FRE, frequency features; NOR, normalized frequency features; ALL, all document-level or mention-level features.

# p-values are computed against the same model type using structured data alone (NO NLP).
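The table reports five standard binary-classification metrics. The Python below is a minimal sketch of how they are usually defined, not the authors' code. It computes sensitivity, specificity, PPV, NPV, and F0.5 from the confusion counts of one test set. It then averages each metric over several test sets. F0.5 is the F-beta score with beta = 0.5, (1 + 0.25) × PPV × Sensitivity / (0.25 × PPV + Sensitivity), which weights PPV more heavily than sensitivity.

The names `ConfusionCounts`, `binary_metrics`, and `average_metrics` are hypothetical. The counts in the usage lines are invented for illustration. Averaging per-test-set scores is an assumption about what "average scores" means here.

That assumption has one visible consequence. An averaged F0.5 need not equal F0.5 recomputed from the averaged PPV and sensitivity. For Random Forest with NO NLP, 1.25 × 0.46 × 0.70 / (0.25 × 0.46 + 0.70) ≈ 0.49, whereas the table lists 0.47.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class ConfusionCounts:
    """Hypothetical confusion counts for one binary test set."""
    tp: int  # true positives
    fp: int  # false positives
    tn: int  # true negatives
    fn: int  # false negatives


def _ratio(num: int, den: int) -> float:
    # Return 0.0 instead of raising ZeroDivisionError on an empty denominator.
    return num / den if den else 0.0


def binary_metrics(c: ConfusionCounts, beta: float = 0.5) -> dict[str, float]:
    """Sensitivity, specificity, PPV, NPV and F-beta for a single test set."""
    sensitivity = _ratio(c.tp, c.tp + c.fn)  # recall
    specificity = _ratio(c.tn, c.tn + c.fp)
    ppv = _ratio(c.tp, c.tp + c.fp)          # precision
    npv = _ratio(c.tn, c.tn + c.fn)
    b2 = beta ** 2
    denom = b2 * ppv + sensitivity
    # beta < 1 weights PPV more heavily than sensitivity; beta = 1 gives F1.
    f_beta = (1 + b2) * ppv * sensitivity / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "f_beta": f_beta,
    }


def average_metrics(test_sets: list[ConfusionCounts], beta: float = 0.5) -> dict[str, float]:
    """Mean of each per-test-set metric (assumed averaging scheme)."""
    per_set = [binary_metrics(c, beta) for c in test_sets]
    return {key: mean(m[key] for m in per_set) for key in per_set[0]}


if __name__ == "__main__":
    set_a = ConfusionCounts(tp=70, fp=80, tn=1820, fn=30)  # invented counts
    set_b = ConfusionCounts(tp=50, fp=20, tn=1880, fn=50)  # invented counts
    print(round(binary_metrics(set_a)["f_beta"], 3))  # 0.5

    avg = average_metrics([set_a, set_b])
    # F0.5 recomputed from the averaged PPV and sensitivity, for comparison.
    b2 = 0.25
    recomputed = (1 + b2) * avg["ppv"] * avg["sensitivity"] / (b2 * avg["ppv"] + avg["sensitivity"])
    print(round(avg["f_beta"], 3), round(recomputed, 3))  # 0.579 0.592
```

The two printed F0.5 values differ even though they come from the same test sets. A gap of a few hundredths between a listed F0.5 and one recomputed from the listed PPV and sensitivity is therefore not, on its own, evidence of an error in the table.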