2023 Mar 28;12(7):e028120. doi: 10.1161/JAHA.122.028120

Table 2.

Performance of Deep Learning NLP Models to Characterize Statin Nonuse From Unstructured Clinical Notes

Task: Binary classification of statin nonuse
Dataset: Training, N=800 documents; Test, N=200 documents

| Model | Precision | Recall | F1‐score | AUC |
| --- | --- | --- | --- | --- |
| word2vec+CNN | Unable to do better than a constant classifier (labels everything as the majority class of the training set) | | | |
| BaseBERT | 0.92 (0.85–0.98) | 0.90 (0.82–0.97) | 0.91 (0.85–0.96) | 0.96 (0.93–1.00) |
| BioBERT | 0.87 (0.77–0.95) | 0.90 (0.85–0.93) | 0.88 (0.82–0.94) | 0.98 (0.96–1.00) |
| ClinicalBERT | 0.92 (0.85–0.99) | 0.92 (0.86–0.98) | 0.92 (0.87–0.96) | 0.99 (0.98–1.00) |

Task: Multilabel classification of reasons for statin nonuse
Dataset: Training, N=600 documents; Test, N=151 documents

| Model | Precision | Recall | F1‐score | AUC |
| --- | --- | --- | --- | --- |
| word2vec+CNN | 0.14 (0.11–0.19) | 0.38 (0.15–0.44) | 0.21 (0.17–0.27) | 0.45 (0.40–0.52) |
| BaseBERT | 0.59 (0.51–0.66) | 0.60 (0.50–0.69) | 0.59 (0.52–0.66) | 0.83 (0.79–0.87) |
| BioBERT | 0.66 (0.60–0.73) | 0.66 (0.59–0.73) | 0.66 (0.59–0.72) | 0.87 (0.83–0.91) |
| ClinicalBERT | 0.68 (0.62–0.77) | 0.68 (0.61–0.76) | 0.68 (0.61–0.76) | 0.90 (0.85–0.93) |

AUC indicates area under the curve; BERT, Bidirectional Encoder Representations from Transformers; and NLP, natural language processing.
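
As a reading aid, the F1‐score column can be related to the Precision and Recall columns through the standard harmonic-mean definition. The sketch below is illustrative only: the function name `f1_from_precision_recall` and the dictionary layout are assumptions, not part of the study's code, and the point estimates are copied from the binary classification rows of the table. For the multilabel rows, the table does not state how scores were averaged across labels, so the harmonic mean of the reported precision and recall need not reproduce the reported F1 exactly.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (the standard F1 definition)."""
    if precision + recall == 0:
        return 0.0  # convention: F1 is 0 when both precision and recall are 0
    return 2 * precision * recall / (precision + recall)


# Point estimates (precision, recall, reported F1) copied from the binary task rows.
binary_rows = {
    "BaseBERT": (0.92, 0.90, 0.91),
    "BioBERT": (0.87, 0.90, 0.88),
    "ClinicalBERT": (0.92, 0.92, 0.92),
}

for model, (precision, recall, reported_f1) in binary_rows.items():
    recomputed = round(f1_from_precision_recall(precision, recall), 2)
    print(f"{model}: recomputed F1 = {recomputed:.2f}, reported F1 = {reported_f1:.2f}")
```

For the three binary rows the recomputed values agree with the reported F1 to two decimals. Applying the same function to the multilabel word2vec+CNN row (precision 0.14, recall 0.38) gives 0.20 rather than the reported 0.21, which is consistent with rounding of the inputs or with per-label averaging.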