Table 2.
| Validation | Task | Model | AUC | F-measure | PPV | Sensitivity | Specificity | TN | FP | FN | TP |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Internal validation | ADR identification | TF-IDF + LGBM | 0.97 (0.96–0.97) | 0.80 (0.78–0.81) | 0.85 (0.83–0.87) | 0.75 (0.73–0.78) | 1 (1–1) | 353,938 (345,353–360,293) | 703 (612–842) | 1329 (1194–1438) | 4028 (3857–4271) |
| Internal validation | ADR identification | XLM | 0.97 (0.96–0.97) | 0.78 (0.76–0.79) | 0.84 (0.82–0.86) | 0.73 (0.70–0.75) | 1 (1–1) | 353,854 (353,563–354,099) | 736 (609–883) | 1469 (1314–1592) | 3916 (3702–4131) |
| Internal validation | Seriousness assessment | FastText + LGBM | 0.85 (0.82–0.87) | 0.63 (0.59–0.68) | 0.58 (0.52–0.69) | 0.69 (0.60–0.82) | 0.85 (0.77–0.91) | 629 (559–682) | 110 (62–166) | 69 (43–94) | 156 (129–194) |
| Internal validation | Seriousness assessment | CamemBERT + LGBM | 0.84 (0.81–0.87) | 0.63 (0.57–0.67) | 0.56 (0.49–0.65) | 0.71 (0.57–0.81) | 0.83 (0.75–0.90) | 615 (542–672) | 126 (72–183) | 65 (44–96) | 160 (125–192) |
| External validation | ADR identification | TF-IDF + LGBM | 0.97 (0.97–0.97) | 0.82 (0.81–0.82) | 0.88 (0.86–0.89) | 0.76 (0.75–0.78) | 1 (1–1) | 287,770 (287,640–287,896) | 502 (444–573) | 1128 (1054–1198) | 3631 (3530–3751) |
| External validation | ADR identification | XLM | 0.97 (0.97–0.97) | 0.80 (0.79–0.80) | 0.87 (0.86–0.88) | 0.74 (0.73–0.75) | 1 (1–1) | 288,717 (288,604–288,837) | 530 (476–558) | 1256 (1208–1310) | 3527 (3434–3602) |
| External validation | Seriousness assessment | FastText + LGBM | 0.87 (0.85–0.89) | 0.65 (0.60–0.70) | 0.58 (0.49–0.69) | 0.77 (0.60–0.88) | 0.85 (0.75–0.92) | 274 (244–299) | 50 (25–80) | 21 (11–36) | 69 (54–79) |
| External validation | Seriousness assessment | CamemBERT + LGBM | 0.86 (0.83–0.89) | 0.63 (0.59–0.68) | 0.56 (0.49–0.67) | 0.74 (0.59–0.84) | 0.84 (0.76–0.91) | 271 (246–294) | 53 (30–78) | 23 (14–37) | 67 (53–76) |
ADR adverse drug reactions, AUC area under the curve, FP false positive, FN false negative, LGBM Light Gradient Boosted Machine, PPV positive predictive value, TF-IDF Term Frequency-Inverse Document Frequency, TN true negative, TP true positive, XLM Cross-lingual Language Model
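
The PPV, sensitivity, specificity and F-measure columns can be related to the four confusion-matrix counts (TN, FP, FN, TP) by the standard definitions. The following is a minimal sketch of that arithmetic, not the authors' code: the names `ConfusionCounts` and `metrics_from_counts` are hypothetical, the F-measure is assumed to be the balanced F1 score, and all denominators are assumed to be non-zero. It uses the point estimates reported for TF-IDF + LGBM on ADR identification in internal validation. The tabulated values may have been aggregated differently (for example across resamples), so recomputing other rows from their point counts can differ slightly in the last digit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Hypothetical container for the four confusion-matrix counts."""
    tn: int
    fp: int
    fn: int
    tp: int


def metrics_from_counts(c: ConfusionCounts) -> dict:
    """Standard definitions; assumes every denominator is non-zero."""
    ppv = c.tp / (c.tp + c.fp)            # positive predictive value (precision)
    sensitivity = c.tp / (c.tp + c.fn)    # true positive rate (recall)
    specificity = c.tn / (c.tn + c.fp)    # true negative rate
    # Assumption: "F-measure" means F1, the harmonic mean of PPV and sensitivity.
    f_measure = 2 * c.tp / (2 * c.tp + c.fp + c.fn)
    return {
        "ppv": ppv,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f_measure": f_measure,
    }


# Point estimates from the table: internal validation, ADR identification, TF-IDF + LGBM.
counts = ConfusionCounts(tn=353_938, fp=703, fn=1_329, tp=4_028)
m = metrics_from_counts(counts)
print({name: round(value, 2) for name, value in m.items()})
# {'ppv': 0.85, 'sensitivity': 0.75, 'specificity': 1.0, 'f_measure': 0.8}
```

Because true negatives outnumber every other cell by roughly two orders of magnitude in the ADR identification task, specificity rounds to 1 even with several hundred false positives, which is why PPV and sensitivity are the more informative columns for that task.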