Table 2.
| Task | Dataset | Precision* | Recall* | F1 score* | AUC* |
|---|---|---|---|---|---|
| Binary classification of statin use | 10-fold cross-validation (N = 1,393) | 0.88 (0.86–0.90) | 0.82 (0.77–0.87) | 0.85 (0.83–0.87) | 0.94 (0.93–0.95) |
| | Test set (N = 349) | 0.87 (0.82–0.91) | 0.82 (0.76–0.88) | 0.84 (0.81–0.88) | 0.94 (0.93–0.96) |
| Two-step classifier* for statin nonuse reasons | 10-fold cross-validation (N = 800) | 0.63 (0.59–0.65) | 0.62 (0.54–0.72) | 0.62 (0.59–0.64) | 0.84 (0.81–0.85) |
| | Test set (N = 200) | 0.68 (0.63–0.75) | 0.69 (0.60–0.79) | 0.68 (0.62–0.75) | 0.88 (0.86–0.91) |
| Multilabel classification of statin nonuse reasons (simple multilabel model) | 10-fold cross-validation (N = 800) | 0.60 (0.58–0.64) | 0.61 (0.56–0.66) | 0.59 (0.56–0.63) | 0.85 (0.83–0.87) |
| | Test set (N = 200) | 0.64 (0.61–0.70) | 0.66 (0.60–0.73) | 0.64 (0.58–0.71) | 0.86 (0.82–0.89) |
*The two-step classifier represents the predicted probabilities of multiple classifiers (each reason for statin nonuse versus others) reconciled by a Random Forest.
ASCVD atherosclerotic cardiovascular disease, NLP natural language processing.
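
As a reading aid for the F1 score column, the following minimal Python sketch recomputes F1 as the harmonic mean of precision and recall for the two binary statin-use rows. The function name `f1_from_precision_recall` is hypothetical and chosen for illustration; the input values are the point estimates from the table. This is only a consistency check on the rounded figures, not the authors' evaluation code. For the multi-class nonuse-reason rows, how the metrics were averaged across classes is not stated here, so the same identity need not reproduce the reported F1 exactly.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Point estimates copied from the binary statin-use rows of Table 2.
binary_rows = {
    "10-fold cross-validation": (0.88, 0.82),
    "Test set": (0.87, 0.82),
}

for dataset, (precision, recall) in binary_rows.items():
    f1 = f1_from_precision_recall(precision, recall)
    print(f"{dataset}: F1 = {f1:.2f}")
```

Rounded to two decimals, the recomputed values are 0.85 and 0.84, which agree with the F1 point estimates reported for those two rows.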