Table 2.
| Task | Dataset | Model | Precision | Recall | F1‐score | AUC |
|---|---|---|---|---|---|---|
| Binary classification of statin nonuse | Training: N=800 documents; Test: N=200 documents | word2vec+CNN | Unable to do better than a constant classifier (labels everything as the majority class of the training set) | | | |
| | | BaseBERT | 0.92 (0.85–0.98) | 0.90 (0.82–0.97) | 0.91 (0.85–0.96) | 0.96 (0.93–1.00) |
| | | BioBERT | 0.87 (0.77–0.95) | 0.90 (0.85–0.93) | 0.88 (0.82–0.94) | 0.98 (0.96–1.00) |
| | | ClinicalBERT | 0.92 (0.85–0.99) | 0.92 (0.86–0.98) | 0.92 (0.87–0.96) | 0.99 (0.98–1.00) |
| Multilabel classification of reasons for statin nonuse | Training: N=600 documents; Test: N=151 documents | word2vec+CNN | 0.14 (0.11–0.19) | 0.38 (0.15–0.44) | 0.21 (0.17–0.27) | 0.45 (0.40–0.52) |
| | | BaseBERT | 0.59 (0.51–0.66) | 0.60 (0.50–0.69) | 0.59 (0.52–0.66) | 0.83 (0.79–0.87) |
| | | BioBERT | 0.66 (0.60–0.73) | 0.66 (0.59–0.73) | 0.66 (0.59–0.72) | 0.87 (0.83–0.91) |
| | | ClinicalBERT | 0.68 (0.62–0.77) | 0.68 (0.61–0.76) | 0.68 (0.61–0.76) | 0.90 (0.85–0.93) |

AUC indicates area under the curve; BERT, Bidirectional Encoder Representations from Transformers; and NLP, natural language processing.