Table 2.
Model metrics for the internal and external validation datasets
Dataset | AUC, % (95% CI) | True positive rate (sensitivity), % (95% CI) | False positive rate (1 − specificity), % (95% CI) | Number needed to screen (95% CI) |
---|---|---|---|---|
Internal validation (600 articles, ~15% CRTs; number needed to read: 6.8ᵃ) | | | | |
Convolutional neural network—Word2Vec | 98.2 (96.9, 99.5) | 96.6 (92.0, 100) | 13.9 (10.7, 17.0) | 1.8 (1.6, 2.1) |
Convolutional neural network—FastText | 98.4 (97.3, 99.5) | 89.8 (83.0, 96.6) | 3.5 (2.0, 5.1) | 1.2 (1.1, 1.3) |
Support vector machines | 97.2 (95.7, 98.8) | 97.7 (94.3, 100) | 19.9 (16.4, 23.2) | 2.2 (1.9, 2.6) |
Ensemble | 98.6 (97.8, 99.4) | 97.7 (94.3, 100) | 15.0 (11.9, 18.2) | 1.9 (1.7, 2.2) |
External validation (1916 articles, ~35% CRTs; number needed to read: 2.9ᵃ) | | | | |
Convolutional neural network—Word2Vec | 97.9 (97.2, 98.6) | 97.0 (95.6, 98.2) | 20.8 (18.5, 23.0) | 1.4 (1.3, 1.5) |
Convolutional neural network—FastText | 97.7 (97.0, 98.4) | 91.7 (89.8, 93.8) | 4.8 (3.7, 6.0) | 1.1 (1.1, 1.1) |
Support vector machines | 96.8 (96.0, 97.6) | 97.3 (96.1, 98.5) | 32.2 (29.7, 34.9) | 1.6 (1.6, 1.7) |
Ensemble | 97.8 (97.0, 98.5) | 97.6 (96.4, 98.6) | 21.8 (19.6, 24.1) | 1.4 (1.4, 1.5) |
ᵃ The number needed to read was calculated as one divided by the proportion of articles that are CRTs.
AUC: area under the receiver operating characteristic curve; CI: confidence interval.
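
The footnote's calculation can be illustrated with a minimal Python sketch. The function names are hypothetical. `number_needed_to_read` follows the footnote directly. `number_needed_to_screen` rests on an assumption the table does not state: that the number needed to screen is the reciprocal of precision, that is, flagged articles per true CRT. That reading is consistent with the reported values but is not confirmed by the source.

```python
def number_needed_to_read(prevalence: float) -> float:
    """Articles read per CRT found without a classifier (1 / prevalence)."""
    if not 0 < prevalence <= 1:
        raise ValueError("prevalence must be in (0, 1]")
    return 1.0 / prevalence


def number_needed_to_screen(sensitivity: float,
                            false_positive_rate: float,
                            prevalence: float) -> float:
    """ASSUMPTION: flagged articles per true CRT, i.e. 1 / precision.

    All three inputs are proportions in [0, 1], not percentages.
    """
    true_pos = sensitivity * prevalence                    # share of articles correctly flagged
    false_pos = false_positive_rate * (1.0 - prevalence)   # share of articles wrongly flagged
    if true_pos <= 0:
        raise ValueError("sensitivity and prevalence must be positive")
    return (true_pos + false_pos) / true_pos


if __name__ == "__main__":
    # Rounded prevalences from the table (~15% internal, ~35% external).
    for name, prev in [("internal", 0.15), ("external", 0.35)]:
        print(name, f"{number_needed_to_read(prev):.1f}")
    # Word2Vec CNN, internal validation: sensitivity 96.6%, false positive rate 13.9%.
    print(f"{number_needed_to_screen(0.966, 0.139, 0.15):.1f}")
```

With the rounded 15% prevalence, the first function gives about 6.7 rather than the reported 6.8. The table value presumably reflects the unrounded proportion of CRTs.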