2022 Jul 15;2:88. doi: 10.1038/s43856-022-00157-w

Table 2.

Performance of deep learning NLP models to characterize statin nonuse from unstructured clinical notes in persons with ASCVD.

| Task | Dataset | Precision* | Recall* | F1 score* | AUC* |
|---|---|---|---|---|---|
| Binary classification of statin use | 10-fold cross-validation (N = 1,393) | 0.88 (0.86–0.90) | 0.82 (0.77–0.87) | 0.85 (0.83–0.87) | 0.94 (0.93–0.95) |
| | Test set (N = 349) | 0.87 (0.82–0.91) | 0.82 (0.76–0.88) | 0.84 (0.81–0.88) | 0.94 (0.93–0.96) |
| Two-step classifier* for statin nonuse reasons | 10-fold cross-validation (N = 800) | 0.63 (0.59–0.65) | 0.62 (0.54–0.72) | 0.62 (0.59–0.64) | 0.84 (0.81–0.85) |
| | Test set (N = 200) | 0.68 (0.63–0.75) | 0.69 (0.60–0.79) | 0.68 (0.62–0.75) | 0.88 (0.86–0.91) |
| Multilabel classification of statin nonuse reasons (simple multilabel model) | 10-fold cross-validation (N = 800) | 0.60 (0.58–0.64) | 0.61 (0.56–0.66) | 0.59 (0.56–0.63) | 0.85 (0.83–0.87) |
| | Test set (N = 200) | 0.64 (0.61–0.70) | 0.66 (0.60–0.73) | 0.64 (0.58–0.71) | 0.86 (0.82–0.89) |

*The two-step classifier represents the predicted probabilities of multiple classifiers (each reason for statin nonuse versus others) reconciled by a Random Forest.

ASCVD atherosclerotic cardiovascular disease, NLP natural language processing.
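
As a reading aid for the table, the F1 score is the harmonic mean of precision and recall. The minimal sketch below recomputes it from the point estimates in the two binary statin-use rows; the function name `f1_score` and the dictionary `binary_rows` are illustrative and not taken from the paper. For the nonuse-reason rows, the reported F1 need not equal the harmonic mean of the reported precision and recall, presumably because those metrics are averaged across classes or folds (an assumption; the table does not state the averaging scheme).

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Point estimates copied from the binary statin-use rows of Table 2.
binary_rows = {
    "10-fold cross-validation": (0.88, 0.82),
    "Test set": (0.87, 0.82),
}

for dataset, (precision, recall) in binary_rows.items():
    print(f"{dataset}: F1 = {f1_score(precision, recall):.2f}")
```

Rounded to two decimals, these values agree with the F1 point estimates reported for the binary task (0.85 and 0.84).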