Table 2.
Reference | Model type | Sensitivity, % (95% CI) | Specificity, % (95% CI) | PPV, % (95% CI) | NPV, % (95% CI) | Calculated performance measures, accuracy, % (95% CI) | Internal validation (size and performance) | External validation (size and performance) | Distribution of development and validation set |
---|---|---|---|---|---|---|---|---|---|
Xu, 2019 (7) | CART | 94.2 (90.1 to 98.4) | 98.3 (97.2 to 99.5) | 93.4 (89.1 to 97.8) | 98.5 (97.4 to 99.6) | 97.5 (95.8 to 98.5)a | Yes (—) | — | 60-40 |
Rasmussen, 2019 (15) | Rules | 97.3 (93 to 99) | 97.2 (95 to 99) | 94.2 | 98.7 | 97.2 (95.3 to 98.4)a | — | — | — |
Cronin-Fenton, 2018 (12) | Rules | 88.1 (75.9 to 95.3) | 87.6 (80.6 to 92.7) | 72.5 (59.3 to 83.3) | 95.2 (89.8 to 98.1) | 87.7 (81.6 to 92.1)a | — | — | — |
Ritzwoller, 2018 (16) | Logistic regression | 80.5 (77.5 to 87.7) | 97.3 (91.9 to 98.1) | 70.0 (44.2 to 77.7) | 98.5 (98.2 to 99.0) | 96.1 (95.4 to 96.7) | Yes (3370; AUROC of validation set: 0.96 [0.94 to 0.97]) | Yes (3961; AUROC of validation set: 0.90 [0.87 to 0.93]) | 50-50 |
Chubak, 2012 (17) | CART | 94 (90 to 97) | 92 (91 to 94) | 58 (52 to 63) | 99 (99 to 100) | 92.2 (90.9 to 93.3) | Yes (—) | — | 60-40 |
Chubak, 2017 (18) | CART | 94 (90 to 97) | 92 (91 to 94) | 58 (52 to 63) | 99 (99 to 100) | 92.2 (90.9 to 93.3) | — | — | — |
Kroenke, 2016 (5) | CART | LACE cohort: 88.5; WHI cohort: 86.7 | LACE cohort: 94.6; WHI cohort: 92.3 | LACE cohort: 76.4; WHI cohort: 63.4 | LACE cohort: 97.7; WHI cohort: 97.8 | LACE cohort: 93.6 (92.3 to 94.7); WHI cohort: 91.6 (87.1 to 94.6) | — | Validates algorithms for a previous dataset published by Chubak et al. (18) | — |
Haque, 2015 (13) | Rules | 96.8 (87.6 to 99.4) | 93.0 (85.1 to 96.1) | 88.2 (75.9 to 93.4) | 98.1 (92.8 to 99.7) | 94.3 (89.7 to 97.0) | Yes (500; sensitivity: 96.9% [88.4% to 99.5%]; specificity: 92.4% [89.4% to 94.6%]; PPV: 65.6% [55.2% to 74.8%]; NPV: 99.5% [98.0% to 99.9%]) | — | — |
Liede, 2015 (6) | Rules | 77.5 (73.9 to 81.1) | 98.1 (97.8 to 98.4) | 72.1 (68.4 to 75.8) | 98.6 (98.3 to 98.8) | 96.9 (96.5 to 97.2) | — | — | — |
Lamont, 2006 (19) | Rules | 92 (66 to 100) | 94 (82 to 99) | — | — | 93.3 (81.5 to 98.4) | — | — | — |
Nordstrom, 2012 (8) | CART and random forestsb | 62 | 97 | 75 | 95 | 92.6 (91.1 to 93.9)a | Yes (—) | — | 60-40 |
Nordstrom, 2015 (9) | Random forestsb | 47.1 | 95.9 | 28.6 | 98.1 | 94.2 (91.8 to 96.0) | Yes (—) | — | — |
Whyte, 2015 (11) | Rules | 77.19 | 79.54 | 30.91 | 97.71 | 79.3 (78.1 to 80.5)a | — | Validates 28 different algorithms used to identify metastatic cancers (algorithms not published) | — |
Chawla, 2014 (20) | Rules | 43.9 | 98.6 | 93.8 | 78.5 | 80.8 (80.3 to 81.3) | — | — | — |
Hassett, 2014 (21) | Rules | 81 (67 to 90) | 78 (76 to 80) | 20 | — | 78.1 (76.4 to 79.6) | — | — | — |
Sathiakumar, 2017 (10) | Rules | 96.8 (83.8 to 99.4) | 98.6 (95.9 to 99.5) | 90.9 (76.4 to 96.9) | 99.5 (97.3 to 99.9) | 98.3 (95.6 to 99.5)a | — | — | — |
McClish, 2003 (14) | Logistic regression | — | — | — | — | —a | Yes (—) | — | — |
Studies reporting performance measures: Xu et al. (2019): accuracy = 97.5% (96.2%–98.7%); Ritzwoller et al. (2019): AUROC = 0.96 (0.94–0.97); Rasmussen et al. (2019): kappa = 0.94 (0.90–0.97); Nordstrom et al. (2012): AUROC = 0.82; Whyte et al. (2015): accuracy = 79.3%; Sathiakumar et al. (2017): kappa = 0.93 (0.80–1.05); McClish et al. (2003): AUROC = 0.90. — = not reported. AUROC = area under the receiver operating characteristic curve; CART = classification and regression tree; CI = confidence interval; LACE = Life After Cancer Epidemiology; NPV = negative predictive value; PPV = positive predictive value; WHI = Women's Health Initiative.
CART is a single decision tree built on a dataset, whereas a random forest is a collection of decision trees, each built using randomly selected variables. CART and random forests are categorized together as model-based approaches.
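
The table does not state how the values in the calculated accuracy column were obtained. As a minimal sketch of one possible derivation, assuming sensitivity, specificity and PPV were all measured on the same sample, the prevalence implied by the three values can be recovered and then used to weight sensitivity and specificity: accuracy = sensitivity × prevalence + specificity × (1 − prevalence). The function names below are hypothetical, and the example uses the point estimates reported for Liede, 2015 (6).

```python
def implied_prevalence(sens: float, spec: float, ppv: float) -> float:
    """Prevalence implied by sensitivity, specificity and PPV (all as fractions).

    Rearranges PPV = sens*p / (sens*p + (1 - spec)*(1 - p)) for p.
    Assumption: all three measures come from the same sample.
    """
    odds = ppv * (1.0 - spec) / (sens * (1.0 - ppv))
    return odds / (1.0 + odds)


def accuracy(sens: float, spec: float, prevalence: float) -> float:
    """Overall accuracy as a prevalence-weighted mean of sensitivity and specificity."""
    return sens * prevalence + spec * (1.0 - prevalence)


# Point estimates from the Liede, 2015 row of Table 2, converted to fractions.
sens, spec, ppv = 0.775, 0.981, 0.721

prev = implied_prevalence(sens, spec, ppv)
acc = accuracy(sens, spec, prev)
print(f"accuracy: {acc * 100:.1f}%")  # prints "accuracy: 96.9%"
```

With these inputs the sketch reproduces the 96.9% shown for that row. It does not reproduce the confidence intervals, which depend on the sample sizes, and it cannot be applied to rows where PPV or specificity is not reported.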