2020 Nov 11;3:148. doi: 10.1038/s41746-020-00354-8

Table 6.

Predictive performance of the reference model and four machine-learning-based prediction models for HNHC patients, each using a subset of the clinical and claims data.

| Model | C-statistic | P-value^b | Sensitivity | Specificity | PPV | NPV | PLR | NLR |
|---|---|---|---|---|---|---|---|---|
| The prediction model using only clinical data collected from the screening program | | | | | | | | |
| Reference model^a | 0.71 (0.70–0.72) | [Reference] | 0.56 (0.54–0.58) | 0.75 (0.75–0.75) | 0.11 (0.10–0.11) | 0.97 (0.97–0.97) | 2.2 (2.1–2.3) | 0.59 (0.56–0.62) |
| Logistic regression with Lasso regularization | 0.71 (0.70–0.72) | 0.99 | 0.54 (0.52–0.56) | 0.77 (0.77–0.78) | 0.11 (0.10–0.12) | 0.97 (0.97–0.97) | 2.4 (2.3–2.5) | 0.60 (0.57–0.63) |
| Random forest | 0.74 (0.73–0.75) | 0.001 | 0.64 (0.62–0.66) | 0.71 (0.71–0.72) | 0.11 (0.10–0.11) | 0.97 (0.97–0.98) | 2.2 (2.1–2.3) | 0.51 (0.48–0.54) |
| Gradient-boosted decision tree | 0.72 (0.70–0.73) | 0.41 | 0.62 (0.60–0.65) | 0.70 (0.69–0.70) | 0.10 (0.09–0.10) | 0.97 (0.97–0.97) | 2.1 (2.0–2.1) | 0.54 (0.51–0.57) |
| Deep neural network | 0.72 (0.70–0.73) | 0.39 | 0.53 (0.51–0.56) | 0.79 (0.79–0.80) | 0.12 (0.11–0.13) | 0.97 (0.97–0.97) | 2.6 (2.4–2.7) | 0.59 (0.56–0.62) |
| The prediction model using only patient age, gender, and healthcare cost data from claims data | | | | | | | | |
| Reference model^a | 0.82 (0.81–0.83) | [Reference] | 0.68 (0.66–0.70) | 0.84 (0.84–0.84) | 0.18 (0.17–0.19) | 0.98 (0.98–0.98) | 4.2 (4.1–4.4) | 0.38 (0.36–0.41) |
| Logistic regression with Lasso regularization | 0.82 (0.81–0.83) | 0.99 | 0.68 (0.66–0.70) | 0.84 (0.84–0.85) | 0.18 (0.18–0.19) | 0.98 (0.98–0.98) | 4.3 (4.1–4.5) | 0.38 (0.36–0.41) |
| Random forest | 0.82 (0.80–0.83) | 0.53 | 0.63 (0.61–0.65) | 0.88 (0.87–0.88) | 0.21 (0.20–0.22) | 0.98 (0.98–0.98) | 5.1 (4.9–5.4) | 0.42 (0.40–0.45) |
| Gradient-boosted decision tree | 0.84 (0.83–0.85) | 0.02 | 0.67 (0.64–0.69) | 0.89 (0.89–0.89) | 0.24 (0.23–0.25) | 0.98 (0.98–0.98) | 6.0 (5.7–6.2) | 0.38 (0.35–0.40) |
| Deep neural network | 0.84 (0.83–0.85) | 0.02 | 0.69 (0.67–0.72) | 0.86 (0.86–0.87) | 0.21 (0.20–0.22) | 0.98 (0.98–0.98) | 5.1 (4.9–5.3) | 0.35 (0.33–0.38) |

PPV, positive predictive value; NPV, negative predictive value; PLR, positive likelihood ratio; NLR, negative likelihood ratio.
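The two likelihood ratios follow directly from sensitivity and specificity by their standard definitions, PLR = sensitivity / (1 − specificity) and NLR = (1 − sensitivity) / specificity. The short sketch below illustrates this with the rounded point estimates of the reference model that uses only clinical data. The function name `likelihood_ratios` is illustrative and not taken from the article, and small differences from the tabulated values can arise because the inputs are already rounded to two decimals.

```python
def likelihood_ratios(sensitivity: float, specificity: float) -> tuple[float, float]:
    """Return (PLR, NLR) from sensitivity and specificity, both in (0, 1)."""
    if not (0.0 < sensitivity < 1.0 and 0.0 < specificity < 1.0):
        raise ValueError("sensitivity and specificity must lie strictly between 0 and 1")
    plr = sensitivity / (1.0 - specificity)   # true-positive rate / false-positive rate
    nlr = (1.0 - sensitivity) / specificity   # false-negative rate / true-negative rate
    return plr, nlr


# Reference model, clinical data only: sensitivity 0.56, specificity 0.75
plr, nlr = likelihood_ratios(0.56, 0.75)
print(round(plr, 1), round(nlr, 2))  # 2.2 0.59
```

PPV and NPV, unlike the likelihood ratios, also depend on the prevalence of the outcome in the evaluated cohort, so they cannot be recomputed from sensitivity and specificity alone.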

^a We used a non-penalized logistic regression model as the reference model.

^b We compared the area under the curve of each machine-learning-based prediction model with that of the logistic regression model (the reference model) using DeLong's test.
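For readers unfamiliar with DeLong's test, the following is a minimal sketch of the standard two-sided test for two correlated AUCs, computed from the same labels and two sets of predicted scores. It is not the authors' implementation. The function name `delong_test` and its arguments are hypothetical, and the direct pairwise computation uses memory proportional to (number of positives) × (number of negatives), so it only suits modestly sized samples.

```python
import math
import numpy as np


def delong_test(y_true, scores_a, scores_b):
    """Two-sided DeLong test comparing the AUCs of two models on the same cases.

    Returns (auc_a, auc_b, z, p_value). z and p_value are NaN when the
    estimated variance of the AUC difference is zero.
    """
    y = np.asarray(y_true).astype(bool)
    scores = np.vstack([scores_a, scores_b]).astype(float)  # shape (2, N)
    pos, neg = scores[:, y], scores[:, ~y]                  # (2, m), (2, n)
    m, n = pos.shape[1], neg.shape[1]
    if m < 2 or n < 2:
        raise ValueError("need at least two positive and two negative cases")

    # psi[k, i, j] = 1 if positive i outranks negative j under model k, 0.5 on ties
    diff = pos[:, :, None] - neg[:, None, :]
    psi = (diff > 0) + 0.5 * (diff == 0)

    v10 = psi.mean(axis=2)   # per-positive structural components, shape (2, m)
    v01 = psi.mean(axis=1)   # per-negative structural components, shape (2, n)
    aucs = v10.mean(axis=1)  # AUC of each model

    # 2x2 covariance matrix of the two AUC estimates
    cov = np.cov(v10) / m + np.cov(v01) / n
    var_diff = cov[0, 0] + cov[1, 1] - 2.0 * cov[0, 1]
    if var_diff <= 0:
        return float(aucs[0]), float(aucs[1]), float("nan"), float("nan")

    z = (aucs[0] - aucs[1]) / math.sqrt(var_diff)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return float(aucs[0]), float(aucs[1]), float(z), float(p_value)


# Toy example with made-up labels and scores (not data from the study)
y = [0, 0, 1, 1]
auc_a, auc_b, z, p = delong_test(y, [0.1, 0.4, 0.35, 0.8], [0.2, 0.3, 0.6, 0.9])
print(auc_a, auc_b, round(p, 3))
```

Because both models are scored on the same patients, the covariance term matters: treating the two AUCs as independent would generally misstate the variance of their difference.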