. 2020 Nov 11;3:148. doi: 10.1038/s41746-020-00354-8

Table 6.

Predictive ability of the reference model and four machine-learning-based prediction models for HNHC patients, using subsets of the clinical and claims data.

Outcome	c-statistics	P-value^b	Sensitivity	Specificity	PPV	NPV	PLR	NLR
Prediction models using only clinical data collected from the screening program
Reference model^a	0.71 (0.70–0.72)	[Reference]	0.56 (0.54–0.58)	0.75 (0.75–0.75)	0.11 (0.10–0.11)	0.97 (0.97–0.97)	2.2 (2.1–2.3)	0.59 (0.56–0.62)
Logistic regression with Lasso regularization	0.71 (0.70–0.72)	0.99	0.54 (0.52–0.56)	0.77 (0.77–0.78)	0.11 (0.10–0.12)	0.97 (0.97–0.97)	2.4 (2.3–2.5)	0.60 (0.57–0.63)
Random forest	0.74 (0.73–0.75)	0.001	0.64 (0.62–0.66)	0.71 (0.71–0.72)	0.11 (0.10–0.11)	0.97 (0.97–0.98)	2.2 (2.1–2.3)	0.51 (0.48–0.54)
Gradient-boosted decision tree	0.72 (0.70–0.73)	0.41	0.62 (0.60–0.65)	0.70 (0.69–0.70)	0.10 (0.09–0.10)	0.97 (0.97–0.97)	2.1 (2.0–2.1)	0.54 (0.51–0.57)
Deep neural network	0.72 (0.70–0.73)	0.39	0.53 (0.51–0.56)	0.79 (0.79–0.80)	0.12 (0.11–0.13)	0.97 (0.97–0.97)	2.6 (2.4–2.7)	0.59 (0.56–0.62)
Prediction models using only patient age, gender, and healthcare cost data from claims data
Reference model^a	0.82 (0.81–0.83)	[Reference]	0.68 (0.66–0.70)	0.84 (0.84–0.84)	0.18 (0.17–0.19)	0.98 (0.98–0.98)	4.2 (4.1–4.4)	0.38 (0.36–0.41)
Logistic regression with Lasso regularization	0.82 (0.81–0.83)	0.99	0.68 (0.66–0.70)	0.84 (0.84–0.85)	0.18 (0.18–0.19)	0.98 (0.98–0.98)	4.3 (4.1–4.5)	0.38 (0.36–0.41)
Random forest	0.82 (0.80–0.83)	0.53	0.63 (0.61–0.65)	0.88 (0.87–0.88)	0.21 (0.20–0.22)	0.98 (0.98–0.98)	5.1 (4.9–5.4)	0.42 (0.40–0.45)
Gradient-boosted decision tree	0.84 (0.83–0.85)	0.02	0.67 (0.64–0.69)	0.89 (0.89–0.89)	0.24 (0.23–0.25)	0.98 (0.98–0.98)	6.0 (5.7–6.2)	0.38 (0.35–0.40)
Deep neural network	0.84 (0.83–0.85)	0.02	0.69 (0.67–0.72)	0.86 (0.86–0.87)	0.21 (0.20–0.22)	0.98 (0.98–0.98)	5.1 (4.9–5.3)	0.35 (0.33–0.38)

PPV positive predictive value, NPV negative predictive value, PLR positive likelihood ratio, NLR negative likelihood ratio.

^aWe used a non-penalized logistic regression model as the reference model.

^bWe compared the area under the curve of each machine-learning-based prediction model with that of the logistic regression model (the reference model) using DeLong's test.
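
The PLR and NLR columns follow from sensitivity and specificity through the standard definitions PLR = sensitivity / (1 − specificity) and NLR = (1 − sensitivity) / specificity. The sketch below is only an illustration of that arithmetic, not the authors' code: the function name `likelihood_ratios` is hypothetical, and the inputs are the rounded point estimates from the reference-model row of the clinical-data panel. Because the table was presumably computed from unrounded values, other rows may differ in the last digit when recomputed this way.

```python
def likelihood_ratios(sensitivity: float, specificity: float) -> tuple[float, float]:
    """Return (PLR, NLR) computed from sensitivity and specificity.

    Hypothetical helper for illustration; it is not taken from the paper.
    """
    if not (0.0 <= sensitivity <= 1.0 and 0.0 < specificity < 1.0):
        raise ValueError("sensitivity must be in [0, 1] and specificity in (0, 1)")
    plr = sensitivity / (1.0 - specificity)   # positive likelihood ratio
    nlr = (1.0 - sensitivity) / specificity   # negative likelihood ratio
    return plr, nlr


# Rounded point estimates from the reference-model row (clinical data only).
plr, nlr = likelihood_ratios(sensitivity=0.56, specificity=0.75)
print(f"PLR = {plr:.1f}, NLR = {nlr:.2f}")  # PLR = 2.2, NLR = 0.59
```

For this row the recomputed values agree with the tabulated 2.2 and 0.59.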