
Table 5.

Macro-averaged model performance: 5-fold cross-validation on 80% of our annotated data (top), and models trained on that 80% training set and evaluated on the held-out 20% test set (bottom).

Cross validation

| Features | Precision | Recall | Macro-F1 | Accuracy | AUC |
| --- | --- | --- | --- | --- | --- |
| N-Gram L + D | 0.82 (±0.11) | 0.81 (±0.09) | 0.81 (±0.09) | 0.81 (±0.09) | 0.91 (±0.12) |
| N-Gram L + D | 0.82 (±0.10) | 0.81 (±0.09) | 0.81 (±0.08) | 0.81 (±0.09) | 0.89 (±0.12) |
| TFIDF L + D | 0.84 (±0.04) | 0.83 (±0.04) | 0.82 (±0.05) | 0.83 (±0.04) | 0.92 (±0.06) |
| BERT | 0.82 (±0.05) | 0.78 (±0.06) | 0.78 (±0.06) | 0.81 (±0.05) | 0.87 (±0.04) |
| Baseline | 0.75 (±0.08) | 0.75 (±0.09) | 0.71 (±0.10) | 0.72 (±0.10) | 0.86 (±0.09) |



Test set

| Features | Precision | Recall | Macro-F1 | Accuracy | AUC |
| --- | --- | --- | --- | --- | --- |
| N-Gram L + D | 0.73 | 0.76 | 0.74 | 0.76 | 0.81 |
| N-Gram L + D | 0.71 | 0.73 | 0.72 | 0.74 | 0.78 |
| TFIDF L + D | 0.72 | 0.74 | 0.73 | 0.76 | 0.79 |
| BERT | 0.76 | 0.79 | 0.76 | 0.77 | 0.83 |
| Baseline | 0.64 | 0.66 | 0.61 | 0.62 | 0.66 |

LR — logistic regression, SVM — linear support vector machine, RF — random forest, LSTM NN — long short-term memory neural network.

L + D — lemmatized and debiased (see the Supplement section “Debiasing” for details).
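
For context, a minimal sketch (not the authors' code) of how metrics like those above could be produced with scikit-learn: an 80/20 train/test split, 5-fold cross-validation on the training portion with macro-averaged scoring, and a single evaluation on the held-out test set. The loader load_annotated_messages(), the TF-IDF plus logistic regression pipeline, and all hyperparameters are illustrative assumptions; the lemmatization and debiasing preprocessing described in the Supplement, as well as the BERT and LSTM models, are omitted.

```python
# Illustrative sketch only: one classical feature/model combination
# (TF-IDF features with logistic regression) evaluated the same two ways
# as in Table 5: 5-fold CV on 80% of the data and a held-out 20% test set.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, roc_auc_score
from sklearn.model_selection import cross_validate, train_test_split
from sklearn.pipeline import make_pipeline

# Hypothetical loader for the annotated texts and labels (not in the paper).
texts, labels = load_annotated_messages()

# 80/20 split: 80% used for cross-validation and training, 20% held out.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.2, stratify=labels, random_state=42
)

model = make_pipeline(
    TfidfVectorizer(),                  # TF-IDF features (lemmatization/debiasing omitted)
    LogisticRegression(max_iter=1000),
)

# 5-fold cross-validation on the 80% portion with macro-averaged metrics.
# "roc_auc_ovr" assumes a multi-class label set; use "roc_auc" for binary labels.
scoring = {
    "precision": "precision_macro",
    "recall": "recall_macro",
    "macro_f1": "f1_macro",
    "accuracy": "accuracy",
    "auc": "roc_auc_ovr",
}
cv = cross_validate(model, X_train, y_train, cv=5, scoring=scoring)
for name in scoring:
    scores = cv[f"test_{name}"]
    print(f"{name}: {scores.mean():.2f} (±{scores.std():.2f})")

# Fit on the full training portion and evaluate once on the held-out test set.
model.fit(X_train, y_train)
print(classification_report(y_test, model.predict(X_test), digits=2))
print("AUC:", roc_auc_score(y_test, model.predict_proba(X_test),
                            multi_class="ovr", average="macro"))
```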