JAMIA Open. 2022 Jun 11;5(2):ooac045. doi: 10.1093/jamiaopen/ooac045

Table 3. Evaluation of different classifiers

| Model type | Classifier | Recall | Precision | F1 | AUROC (95% CI) | Accuracy, % (95% CI) |
|---|---|---|---|---|---|---|
| **AHI** | | | | | | |
| Bag-of-words models | LR | 0.4819 | 0.8383 | 0.612 | 0.9093 (0.8932–0.9254) | 87.41 (83.57–91.26) |
| | LASSO (L1) | 0.4819 | 0.8889 | 0.625 | 0.9169 (0.9014–0.9325) | 89.16 (85.56–92.76) |
| | Ridge (L2) | 0.4802 | 0.8429 | 0.6118 | 0.9176 (0.9021–0.9331) | 87.41 (83.57–91.26) |
| | SVM | 0.6093 | 0.9752 | 0.75 | 0.9050 (0.8886–0.9215) | 93.01 (90.05–95.96) |
| | kNN | 0.6713 | 0.8534 | 0.7514 | 0.8644 (0.8454–0.8834) | 93.57 (90.36–96.78) |
| | NaiveBayes | 0.5577 | 0.4367 | 0.4898 | 0.9179 (0.9024–0.9334) | 75.87 (70.92–80.83) |
| | Random Forest | 0.6299 | 0.9865 | 0.7689 | 0.9476 (0.9350–0.9603) | 93.71 (90.89–96.52) |
| Sequence models | BiLSTM | 0.6454 | 0.9843 | 0.7796 | 0.9637 (0.9530–0.9743) | 94.06 (91.32–96.80) |
| | BERT | 0.747 | 0.8803 | 0.8082 | 0.9705 (0.9609–0.9802) | **95.10 (92.60–97.61)** |
| | ClinicalBERT | 0.7315 | 0.914 | **0.8126** | **0.9743 (0.9652–0.9833)** | 94.76 (92.17–97.34) |
| **SaO2** | | | | | | |
| Bag-of-words models | LR | 0.567 | 0.4914 | 0.5265 | 0.9153 (0.8992–0.9314) | 82.87 (78.50–87.23) |
| | LASSO (L1) | 0.538 | 0.5103 | 0.5238 | 0.9151 (0.8990–0.9312) | 84.62 (80.43–88.80) |
| | Ridge (L2) | 0.5543 | 0.4904 | 0.5204 | 0.9143 (0.8981–0.9305) | 83.22 (78.89–87.55) |
| | SVM | 0.6105 | 0.9133 | 0.7318 | 0.8860 (0.8678–0.9042) | 87.76 (83.96–91.56) |
| | kNN | 0.587 | 0.8663 | 0.6998 | 0.8429 (0.8223–0.8634) | 87.86 (83.84–91.88) |
| | NaiveBayes | 0.6322 | 0.2705 | 0.3789 | 0.9082 (0.8915–0.9248) | 51.75 (45.96–57.54) |
| | Random Forest | 0.6087 | 0.9307 | 0.736 | 0.9264 (0.9113–0.9415) | 89.51 (85.96–93.06) |
| Sequence models | BiLSTM | 0.6739 | 0.9051 | 0.7726 | 0.9274 (0.9123–0.9424) | **91.61 (88.40–94.82)** |
| | BERT | 0.7319 | 0.8651 | **0.7929** | 0.9358 (0.9215–0.9500) | **91.61 (88.40–94.82)** |
| | ClinicalBERT | 0.683 | 0.8871 | 0.7718 | **0.9523 (0.9398–0.9647)** | **91.61 (88.40–94.82)** |

Note: Recall, precision, and F1 are segment-level; AUROC and accuracy are document-level. Logistic regression (LR) applies no penalty; LASSO uses an L1 penalty (λ = 0.01); ridge uses an L2 penalty (λ = 0.01). The support vector machine (SVM) uses a polynomial kernel; kNN uses k = 3; the naive Bayes classifier uses α = 0.5. The BiLSTM uses 100-dimensional Word2Vec embeddings pretrained on the training set with CBOW. BERT and ClinicalBERT are fine-tuned for 100 epochs with a sequence length of 32 and a batch size of 64. The highest F1, AUROC, and accuracy in each panel are shown in bold.
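The F1 column can be reproduced from the reported recall and precision, since F1 is their harmonic mean. A quick consistency check in Python, using two rows from the table:

```python
def f1(recall: float, precision: float) -> float:
    """F1 score: harmonic mean of recall and precision."""
    return 2 * recall * precision / (recall + precision)

# ClinicalBERT on AHI: recall 0.7315, precision 0.914 -> reported F1 0.8126
print(round(f1(0.7315, 0.914), 4))   # → 0.8126

# BERT on SaO2: recall 0.7319, precision 0.8651 -> reported F1 0.7929
print(round(f1(0.7319, 0.8651), 4))  # → 0.7929
```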
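The bag-of-words configurations described in the note could be set up as follows. This is a minimal sketch using scikit-learn, which the paper does not name; the hyperparameter mappings are assumptions (in particular, scikit-learn's `LogisticRegression` regularizes via `C ≈ 1/λ`, so λ = 0.01 is approximated as `C = 100`, and a very large `C` stands in for "no penalty"). The toy corpus is purely illustrative.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Assumed mapping of the note's settings onto scikit-learn estimators.
classifiers = {
    # Very large C approximates an unpenalized logistic regression.
    "LR": LogisticRegression(C=1e6, max_iter=1000),
    "LASSO (L1)": LogisticRegression(penalty="l1", C=100, solver="liblinear"),
    "Ridge (L2)": LogisticRegression(penalty="l2", C=100, max_iter=1000),
    "SVM": SVC(kernel="poly"),                      # polynomial kernel
    "kNN": KNeighborsClassifier(n_neighbors=3),     # k = 3
    "NaiveBayes": MultinomialNB(alpha=0.5),         # alpha = 0.5
    "Random Forest": RandomForestClassifier(random_state=0),
}

# Toy stand-in for labeled report segments (hypothetical data).
texts = ["ahi was markedly elevated", "oxygen saturation normal",
         "severe apnea events noted", "no respiratory events observed"]
labels = [1, 0, 1, 0]

# Bag-of-words features feed each classifier via a shared pipeline shape.
pipelines = {name: make_pipeline(CountVectorizer(), clf).fit(texts, labels)
             for name, clf in classifiers.items()}
preds = {name: pipe.predict(["apnea was elevated"])[0]
         for name, pipe in pipelines.items()}
```

Each fitted pipeline then yields a 0/1 label for an unseen segment; the paper's segment-level recall and precision would be computed over such predictions.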