Table 6.
Predictive performance of the reference model and four machine-learning-based prediction models for HNHC patients, each using a subset of the clinical and claims data.
Outcome | c-statistic | P-valueb | Sensitivity | Specificity | PPV | NPV | PLR | NLR |
---|---|---|---|---|---|---|---|---|
Prediction models using only clinical data collected from the screening program | ||||||||
Reference modela | 0.71 (0.70–0.72) | [Reference] | 0.56 (0.54–0.58) | 0.75 (0.75–0.75) | 0.11 (0.10–0.11) | 0.97 (0.97–0.97) | 2.2 (2.1–2.3) | 0.59 (0.56–0.62) |
Logistic regression with Lasso regularization | 0.71 (0.70–0.72) | 0.99 | 0.54 (0.52–0.56) | 0.77 (0.77–0.78) | 0.11 (0.10–0.12) | 0.97 (0.97–0.97) | 2.4 (2.3–2.5) | 0.60 (0.57–0.63) |
Random forest | 0.74 (0.73–0.75) | 0.001 | 0.64 (0.62–0.66) | 0.71 (0.71–0.72) | 0.11 (0.10–0.11) | 0.97 (0.97–0.98) | 2.2 (2.1–2.3) | 0.51 (0.48–0.54) |
Gradient-boosted decision tree | 0.72 (0.70–0.73) | 0.41 | 0.62 (0.60–0.65) | 0.70 (0.69–0.70) | 0.10 (0.09–0.10) | 0.97 (0.97–0.97) | 2.1 (2.0–2.1) | 0.54 (0.51–0.57) |
Deep neural network | 0.72 (0.70–0.73) | 0.39 | 0.53 (0.51–0.56) | 0.79 (0.79–0.80) | 0.12 (0.11–0.13) | 0.97 (0.97–0.97) | 2.6 (2.4–2.7) | 0.59 (0.56–0.62) |
Prediction models using only patient age, gender, and healthcare cost data from the claims data | ||||||||
Reference modela | 0.82 (0.81–0.83) | [Reference] | 0.68 (0.66–0.70) | 0.84 (0.84–0.84) | 0.18 (0.17–0.19) | 0.98 (0.98–0.98) | 4.2 (4.1–4.4) | 0.38 (0.36–0.41) |
Logistic regression with Lasso regularization | 0.82 (0.81–0.83) | 0.99 | 0.68 (0.66–0.70) | 0.84 (0.84–0.85) | 0.18 (0.18–0.19) | 0.98 (0.98–0.98) | 4.3 (4.1–4.5) | 0.38 (0.36–0.41) |
Random forest | 0.82 (0.80–0.83) | 0.53 | 0.63 (0.61–0.65) | 0.88 (0.87–0.88) | 0.21 (0.20–0.22) | 0.98 (0.98–0.98) | 5.1 (4.9–5.4) | 0.42 (0.40–0.45) |
Gradient-boosted decision tree | 0.84 (0.83–0.85) | 0.02 | 0.67 (0.64–0.69) | 0.89 (0.89–0.89) | 0.24 (0.23–0.25) | 0.98 (0.98–0.98) | 6.0 (5.7–6.2) | 0.38 (0.35–0.40) |
Deep neural network | 0.84 (0.83–0.85) | 0.02 | 0.69 (0.67–0.72) | 0.86 (0.86–0.87) | 0.21 (0.20–0.22) | 0.98 (0.98–0.98) | 5.1 (4.9–5.3) | 0.35 (0.33–0.38) |
PPV, positive predictive value; NPV, negative predictive value; PLR, positive likelihood ratio; NLR, negative likelihood ratio.
a We used a non-penalized logistic regression model as the reference model.
b We compared the area under the curve of each machine-learning-based prediction model with that of the logistic regression model (the reference model) using DeLong's test.
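
The likelihood ratios in the table follow directly from sensitivity and specificity through the standard definitions PLR = sensitivity / (1 − specificity) and NLR = (1 − sensitivity) / specificity. The Python sketch below is illustrative only and is not the authors' analysis code; the function names are hypothetical. It reproduces the PLR and NLR of the first reference-model row from that row's rounded sensitivity and specificity. The predictive values additionally depend on outcome prevalence, which the table does not report, so the `prevalence` argument is an assumed input and any value passed to it is a placeholder.

```python
def likelihood_ratios(sensitivity: float, specificity: float) -> tuple[float, float]:
    """Return (PLR, NLR) computed from sensitivity and specificity."""
    if not (0.0 <= sensitivity <= 1.0 and 0.0 < specificity < 1.0):
        raise ValueError("sensitivity must be in [0, 1] and specificity in (0, 1)")
    plr = sensitivity / (1.0 - specificity)   # positive likelihood ratio
    nlr = (1.0 - sensitivity) / specificity   # negative likelihood ratio
    return plr, nlr


def predictive_values(sensitivity: float, specificity: float,
                      prevalence: float) -> tuple[float, float]:
    """Return (PPV, NPV) for an assumed outcome prevalence (not given in the table)."""
    tp = sensitivity * prevalence                   # true-positive fraction
    fp = (1.0 - specificity) * (1.0 - prevalence)   # false-positive fraction
    fn = (1.0 - sensitivity) * prevalence           # false-negative fraction
    tn = specificity * (1.0 - prevalence)           # true-negative fraction
    return tp / (tp + fp), tn / (tn + fn)


# Reference model, clinical data only: sensitivity 0.56, specificity 0.75.
plr, nlr = likelihood_ratios(0.56, 0.75)
print(round(plr, 1), round(nlr, 2))  # 2.2 0.59, matching the table row
```

Because the inputs are the rounded values printed in the table, results recomputed for other rows may differ from the published figures in the last digit.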