Table 3. Evaluation of different classifiers

Segment-level metrics: Recall, Precision, F1. Document-level metrics: AUROC, Accuracy (%).

**AHI**

| Model family | Classifier | Recall | Precision | F1 | AUROC (95% CI) | Accuracy, % (95% CI) |
|---|---|---|---|---|---|---|
| Bag-of-word models | LR | 0.4819 | 0.8383 | 0.6120 | 0.9093 (0.8932–0.9254) | 87.41 (83.57–91.26) |
| | LASSO (L1) | 0.4819 | 0.8889 | 0.6250 | 0.9169 (0.9014–0.9325) | 89.16 (85.56–92.76) |
| | Ridge (L2) | 0.4802 | 0.8429 | 0.6118 | 0.9176 (0.9021–0.9331) | 87.41 (83.57–91.26) |
| | SVM | 0.6093 | 0.9752 | 0.7500 | 0.9050 (0.8886–0.9215) | 93.01 (90.05–95.96) |
| | kNN | 0.6713 | 0.8534 | 0.7514 | 0.8644 (0.8454–0.8834) | 93.57 (90.36–96.78) |
| | NaiveBayes | 0.5577 | 0.4367 | 0.4898 | 0.9179 (0.9024–0.9334) | 75.87 (70.92–80.83) |
| | Random Forest | 0.6299 | 0.9865 | 0.7689 | 0.9476 (0.9350–0.9603) | 93.71 (90.89–96.52) |
| Sequence models | BiLSTM | 0.6454 | 0.9843 | 0.7796 | 0.9637 (0.9530–0.9743) | 94.06 (91.32–96.80) |
| | BERT | 0.7470 | 0.8803 | 0.8082 | 0.9705 (0.9609–0.9802) | **95.10 (92.60–97.61)** |
| | ClinicalBERT | 0.7315 | 0.9140 | **0.8126** | **0.9743 (0.9652–0.9833)** | 94.76 (92.17–97.34) |

**SaO2**

| Model family | Classifier | Recall | Precision | F1 | AUROC (95% CI) | Accuracy, % (95% CI) |
|---|---|---|---|---|---|---|
| Bag-of-word models | LR | 0.5670 | 0.4914 | 0.5265 | 0.9153 (0.8992–0.9314) | 82.87 (78.50–87.23) |
| | LASSO (L1) | 0.5380 | 0.5103 | 0.5238 | 0.9151 (0.8990–0.9312) | 84.62 (80.43–88.80) |
| | Ridge (L2) | 0.5543 | 0.4904 | 0.5204 | 0.9143 (0.8981–0.9305) | 83.22 (78.89–87.55) |
| | SVM | 0.6105 | 0.9133 | 0.7318 | 0.8860 (0.8678–0.9042) | 87.76 (83.96–91.56) |
| | kNN | 0.5870 | 0.8663 | 0.6998 | 0.8429 (0.8223–0.8634) | 87.86 (83.84–91.88) |
| | NaiveBayes | 0.6322 | 0.2705 | 0.3789 | 0.9082 (0.8915–0.9248) | 51.75 (45.96–57.54) |
| | Random Forest | 0.6087 | 0.9307 | 0.7360 | 0.9264 (0.9113–0.9415) | 89.51 (85.96–93.06) |
| Sequence models | BiLSTM | 0.6739 | 0.9051 | 0.7726 | 0.9274 (0.9123–0.9424) | **91.61 (88.40–94.82)** |
| | BERT | 0.7319 | 0.8651 | **0.7929** | 0.9358 (0.9215–0.9500) | **91.61 (88.40–94.82)** |
| | ClinicalBERT | 0.6830 | 0.8871 | 0.7718 | **0.9523 (0.9398–0.9647)** | **91.61 (88.40–94.82)** |
Note: Logistic regression (LR) applies no penalty; LASSO uses an L1 penalty (λ = 0.01); Ridge uses an L2 penalty (λ = 0.01). The support vector machine (SVM) uses a polynomial kernel; kNN uses k = 3; the NaiveBayes classifier uses α = 0.5. BiLSTM uses Word2Vec embeddings (CBOW, 100-dimensional input vectors) pretrained on the training set. BERT and ClinicalBERT are fine-tuned for 100 epochs with sequence length 32 and batch size 64. The highest F1, AUROC, and accuracy are highlighted in bold.
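The segment-level F1 column is the harmonic mean of the recall and precision columns, so each entry can be reproduced directly from the two reported values. A minimal check in plain Python, using figures taken from the table above:

```python
def f1_score(recall: float, precision: float) -> float:
    # F1 is the harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)

# Spot-check a few segment-level rows from Table 3.
print(round(f1_score(0.7315, 0.9140), 4))  # ClinicalBERT, AHI  -> 0.8126
print(round(f1_score(0.6093, 0.9752), 4))  # SVM, AHI           -> 0.75
print(round(f1_score(0.7319, 0.8651), 4))  # BERT, SaO2         -> 0.7929
```

The rounded results match the tabulated F1 values, which is a quick consistency check on the reported metrics.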