2022 May 17;45(5):535–548. doi: 10.1007/s40264-022-01153-8

Table 2.

Model comparison metrics, using the prediction threshold maximising F-measure

| Validation | Task | Models | AUC | F-measure | PPV | Sensitivity | Specificity | TN | FP | FN | TP |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Internal validation | ADR identification | TF-IDF + LGBM | 0.97 (0.96–0.97) | 0.80 (0.78–0.81) | 0.85 (0.83–0.87) | 0.75 (0.73–0.78) | 1 (1–1) | 353,938 (345,353–360,293) | 703 (612–842) | 1329 (1194–1438) | 4028 (3857–4271) |
| Internal validation | ADR identification | XLM | 0.97 (0.96–0.97) | 0.78 (0.76–0.79) | 0.84 (0.82–0.86) | 0.73 (0.70–0.75) | 1 (1–1) | 353,854 (353,563–354,099) | 736 (609–883) | 1469 (1314–1592) | 3916 (3702–4131) |
| Internal validation | Seriousness assessment | FastText + LGBM | 0.85 (0.82–0.87) | 0.63 (0.59–0.68) | 0.58 (0.52–0.69) | 0.69 (0.60–0.82) | 0.85 (0.77–0.91) | 629 (559–682) | 110 (62–166) | 69 (43–94) | 156 (129–194) |
| Internal validation | Seriousness assessment | CamemBERT + LGBM | 0.84 (0.81–0.87) | 0.63 (0.57–0.67) | 0.56 (0.49–0.65) | 0.71 (0.57–0.81) | 0.83 (0.75–0.90) | 615 (542–672) | 126 (72–183) | 65 (44–96) | 160 (125–192) |
| External validation | ADR identification | TF-IDF + LGBM | 0.97 (0.97–0.97) | 0.82 (0.81–0.82) | 0.88 (0.86–0.89) | 0.76 (0.75–0.78) | 1 (1–1) | 287,770 (287,640–287,896) | 502 (444–573) | 1128 (1054–1198) | 3631 (3530–3751) |
| External validation | ADR identification | XLM | 0.97 (0.97–0.97) | 0.80 (0.79–0.80) | 0.87 (0.86–0.88) | 0.74 (0.73–0.75) | 1 (1–1) | 288,717 (288,604–288,837) | 530 (476–558) | 1256 (1208–1310) | 3527 (3434–3602) |
| External validation | Seriousness assessment | FastText + LGBM | 0.87 (0.85–0.89) | 0.65 (0.60–0.70) | 0.58 (0.49–0.69) | 0.77 (0.60–0.88) | 0.85 (0.75–0.92) | 274 (244–299) | 50 (25–80) | 21 (11–36) | 69 (54–79) |
| External validation | Seriousness assessment | CamemBERT + LGBM | 0.86 (0.83–0.89) | 0.63 (0.59–0.68) | 0.56 (0.49–0.67) | 0.74 (0.59–0.84) | 0.84 (0.76–0.91) | 271 (246–294) | 53 (30–78) | 23 (14–37) | 67 (53–76) |

ADR adverse drug reactions, AUC area under the curve, FP false positive, FN false negative, LGBM Light Gradient Boosted Machine, PPV positive predictive value, TF-IDF Term Frequency-Inverse Document Frequency, TN true negative, TP true positive, XLM Cross-lingual Language Model
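
The rate columns of Table 2 follow from the four confusion-matrix counts (TN, FP, FN, TP) by the standard definitions, and the caption states that the prediction threshold was chosen to maximise the F-measure. The sketch below illustrates both ideas under stated assumptions: it uses point estimates only (not the bracketed intervals), the names `metrics_from_counts` and `best_f_threshold` are hypothetical and do not come from the paper, and the exhaustive threshold search is one simple way to maximise F, not necessarily the authors' procedure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metrics:
    ppv: float
    sensitivity: float
    specificity: float
    f_measure: float


def metrics_from_counts(tn: int, fp: int, fn: int, tp: int) -> Metrics:
    """Derive PPV, sensitivity, specificity and F-measure from counts."""
    ppv = tp / (tp + fp) if tp + fp else 0.0
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    denom = ppv + sensitivity
    # F-measure is the harmonic mean of PPV (precision) and sensitivity (recall).
    f_measure = 2 * ppv * sensitivity / denom if denom else 0.0
    return Metrics(ppv, sensitivity, specificity, f_measure)


def best_f_threshold(scores, labels):
    """Return (threshold, F-measure) maximising F over the observed scores.

    Assumption: a case is predicted positive when score >= threshold.
    Ties keep the lowest threshold that reaches the best F.
    """
    best_t, best_f = None, -1.0
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        fn = sum(1 for s, y in zip(scores, labels) if s < t and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < t and y == 0)
        f = metrics_from_counts(tn, fp, fn, tp).f_measure
        if f > best_f:
            best_t, best_f = t, f
    return best_t, best_f


# Point estimates from the internal-validation TF-IDF + LGBM row of Table 2.
row = metrics_from_counts(tn=353_938, fp=703, fn=1329, tp=4028)
print(f"PPV={row.ppv:.2f} sensitivity={row.sensitivity:.2f} "
      f"F={row.f_measure:.2f}")

# Toy scores and labels (made up for illustration, not study data).
toy_threshold, toy_f = best_f_threshold([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
```

With the counts from that row, the computed PPV, sensitivity, and F-measure round to 0.85, 0.75, and 0.80, which agrees with the reported point estimates. The specificity rounds to 1 because true negatives vastly outnumber false positives in the ADR identification task.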