2020 Nov 11;3:148. doi: 10.1038/s41746-020-00354-8

Table 3.

Prediction ability of the reference and four machine-learning-based prediction models for top 1% or 10% healthcare cost spenders.

| Outcome | c-statistics | P-value^b | Sensitivity | Specificity | PPV | NPV | PLR | NLR |
|---|---|---|---|---|---|---|---|---|
| The prediction model for top 1% healthcare cost spenders | | | | | | | | |
| Reference model^a | 0.85 (0.82–0.87) | [Reference] | 0.71 (0.67–0.76) | 0.84 (0.83–0.84) | 0.04 (0.04–0.05) | 0.99 (0.99–0.99) | 4.4 (4.1–4.7) | 0.35 (0.29–0.41) |
| Logistic regression with Lasso regularization | 0.86 (0.84–0.88) | 0.42 | 0.78 (0.73–0.82) | 0.78 (0.78–0.79) | 0.04 (0.03–0.04) | 0.99 (0.99–0.99) | 3.6 (3.4–3.8) | 0.29 (0.24–0.35) |
| Random forest | 0.83 (0.80–0.85) | 0.26 | 0.66 (0.61–0.71) | 0.88 (0.87–0.88) | 0.05 (0.05–0.06) | 0.99 (0.99–0.99) | 5.4 (4.9–5.8) | 0.39 (0.34–0.45) |
| Gradient-boosted decision tree | 0.85 (0.83–0.88) | 0.69 | 0.70 (0.65–0.74) | 0.87 (0.87–0.88) | 0.05 (0.05–0.06) | 0.99 (0.99–0.99) | 5.4 (5.0–5.8) | 0.35 (0.30–0.41) |
| Deep neural network | 0.85 (0.82–0.87) | 0.91 | 0.74 (0.69–0.78) | 0.80 (0.80–0.80) | 0.04 (0.03–0.04) | 0.99 (0.99–0.99) | 3.7 (3.4–3.9) | 0.33 (0.28–0.39) |
| The prediction model for top 10% healthcare cost spenders | | | | | | | | |
| Reference model^a | 0.85 (0.85–0.86) | [Reference] | 0.74 (0.73–0.76) | 0.83 (0.83–0.84) | 0.33 (0.32–0.34) | 0.97 (0.97–0.97) | 4.4 (4.3–4.6) | 0.31 (0.29–0.33) |
| Logistic regression with Lasso regularization | 0.85 (0.85–0.86) | 0.99 | 0.74 (0.73–0.76) | 0.83 (0.83–0.84) | 0.33 (0.32–0.34) | 0.97 (0.97–0.97) | 4.5 (4.3–4.6) | 0.31 (0.30–0.33) |
| Random forest | 0.87 (0.86–0.88) | <0.001 | 0.75 (0.73–0.76) | 0.87 (0.87–0.88) | 0.39 (0.38–0.41) | 0.97 (0.97–0.97) | 5.8 (5.7–6.1) | 0.29 (0.27–0.31) |
| Gradient-boosted decision tree | 0.88 (0.87–0.88) | <0.001 | 0.76 (0.75–0.77) | 0.87 (0.86–0.87) | 0.38 (0.37–0.40) | 0.97 (0.97–0.97) | 5.6 (5.4–5.8) | 0.28 (0.26–0.30) |
| Deep neural network | 0.88 (0.87–0.88) | <0.001 | 0.75 (0.74–0.77) | 0.87 (0.87–0.88) | 0.39 (0.38–0.41) | 0.97 (0.97–0.97) | 5.8 (5.6–6.0) | 0.28 (0.27–0.30) |

PPV, positive predictive value; NPV, negative predictive value; PLR, positive likelihood ratio; NLR, negative likelihood ratio.
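As a reading aid, the likelihood ratios in the table follow from sensitivity and specificity through the standard definitions PLR = sensitivity / (1 − specificity) and NLR = (1 − sensitivity) / specificity. The sketch below applies them to the rounded point estimates reported for the reference model (top 1% outcome). The function names are illustrative and not taken from the paper. The PPV helper additionally assumes that the outcome prevalence equals the nominal 10% cut-off, which the table does not state, so that result is only an approximate consistency check.

```python
def positive_likelihood_ratio(sensitivity: float, specificity: float) -> float:
    """PLR = sensitivity / (1 - specificity)."""
    return sensitivity / (1.0 - specificity)


def negative_likelihood_ratio(sensitivity: float, specificity: float) -> float:
    """NLR = (1 - sensitivity) / specificity."""
    return (1.0 - sensitivity) / specificity


def ppv_from_prevalence(sensitivity: float, specificity: float,
                        prevalence: float) -> float:
    """PPV via Bayes' rule; the prevalence value is an assumption of this sketch."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


if __name__ == "__main__":
    # Reference model, top 1% spenders: sensitivity 0.71, specificity 0.84
    print(round(positive_likelihood_ratio(0.71, 0.84), 1))
    print(round(negative_likelihood_ratio(0.71, 0.84), 2))
    # Reference model, top 10% spenders, assuming 10% prevalence
    print(round(ppv_from_prevalence(0.74, 0.83, 0.10), 2))
```

Because the inputs are already rounded to two decimals, recomputed values can differ from the published ones in the last digit.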

^a We used a non-penalized logistic regression model as the reference model.

^b We compared the area under the curve between each machine-learning-based prediction model and the logistic regression model (the reference model) using DeLong's test.