. 2023 May 9;195(8):713–719. doi: 10.1055/a-2061-6562

Table 1. Overview of AUC values for various feature extraction methods used to train different ML algorithms, evaluated with 10-fold cross-validation on the training dataset. Abbreviations: BOW, bag of words; LDA, latent Dirichlet allocation; LR, logistic regression; NMF, non-negative matrix factorization; NN, neural network; PCA, principal component analysis; SVM, support vector machine; TF-IDF, term frequency-inverse document frequency.

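| Feature extraction | NN   | SVM  | LR   | Average AUC |
|--------------------|------|------|------|-------------|
| Dummy              | –    | –    | –    | 0.5         |
| BOW                | 0.99 | 0.97 | 0.97 | 0.977       |
| TF-IDF             | 0.99 | 0.96 | 0.96 | 0.970       |
| NMF                | 0.98 | 0.90 | 0.90 | 0.927       |
| PCA                | 0.95 | 0.91 | 0.90 | 0.920       |
| LDA                | 0.94 | 0.89 | 0.88 | 0.903       |
| Doc2Vec            | 0.94 | 0.90 | 0.85 | 0.897       |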
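The sketch below illustrates the arithmetic behind the table. It is not the authors' code. It assumes the reported AUC is the ROC AUC, which is computed here in its rank-based form: the probability that a randomly chosen positive example scores higher than a randomly chosen negative one, with ties counted as one half. Under that assumption, a dummy baseline that gives every example the same score lands at exactly 0.5. The "Average AUC" column appears to be the unweighted mean of the NN, SVM, and LR values rounded to three decimals, and the sketch reproduces it from the per-classifier numbers. The names `roc_auc`, `average_auc`, and `TABLE_1` are hypothetical.

```python
from statistics import mean


def roc_auc(labels, scores):
    """Rank-based ROC AUC (assumed metric): fraction of (positive, negative)
    pairs in which the positive scores higher; ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # tie: neither ordering is preferred
    return wins / (len(pos) * len(neg))


# Per-classifier AUCs copied from Table 1, in column order (NN, SVM, LR).
TABLE_1 = {
    "BOW": (0.99, 0.97, 0.97),
    "TF-IDF": (0.99, 0.96, 0.96),
    "NMF": (0.98, 0.90, 0.90),
    "PCA": (0.95, 0.91, 0.90),
    "LDA": (0.94, 0.89, 0.88),
    "Doc2Vec": (0.94, 0.90, 0.85),
}


def average_auc(table):
    """Unweighted mean AUC across classifiers per method, rounded to 3 decimals."""
    return {method: round(mean(aucs), 3) for method, aucs in table.items()}


if __name__ == "__main__":
    labels = [0, 0, 1, 1]
    # A constant score for every example, as a hypothetical dummy baseline would give.
    print(roc_auc(labels, [0.5, 0.5, 0.5, 0.5]))
    # Three of the four positive/negative pairs are ordered correctly.
    print(roc_auc(labels, [0.1, 0.4, 0.35, 0.8]))
    # Rank feature extraction methods by their average AUC.
    for method, avg in sorted(average_auc(TABLE_1).items(), key=lambda kv: -kv[1]):
        print(f"{method:8s} {avg:.3f}")
```

Running the script prints 0.5 for the constant-score case and 0.75 for the second case. It then lists the methods in the same order as the table, from BOW (0.977) down to Doc2Vec (0.897). The pairwise loop is O(n_pos · n_neg), which is fine for illustration. A sort-based rank computation would scale better to large test folds.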