2023 Feb 4;13:9. doi: 10.1186/s13561-023-00422-1

Table 3.

Sensitivity analysis for various high-cost user thresholds: predictive model performance

| Prediction models | 30% high-cost user prevalence: Sensitivity^a | 30% high-cost user prevalence: F1^d | 20% prevalence (base case): Sensitivity^a | 20% prevalence (base case): F1^d | 10% prevalence: Sensitivity^a | 10% prevalence: F1^d | 5% prevalence: Sensitivity^a | 5% prevalence: F1^d |
|---|---|---|---|---|---|---|---|---|
| Traditional regression models | | | | | | | | |
| All conventional variables (TRM1)^e | 17.9% | 26.4% | 4.9% | 9.1% | * | * | * | * |
| As per TRM1 but no ethnicity variables (TRM2) | 16.5% | 25.8% | 4.9% | 9.0% | * | * | * | * |
| As per TRM2 but no smoking variables (TRM3) | 16.3% | 25.6% | 4.6% | 8.6% | * | * | * | * |
| Machine learning models^f | | | | | | | | |
| Random forest | 45.2% | 49.3% | 37.8% | 41.2% | 29.9% | 32.6% | 25.6% | 28.5% |
| KNN | 45.7% | 46.5% | 38.0% | 39.0% | 29.2% | 30.1% | 25.2% | 26.0% |
| L1-regularised logistic regression | 75.2% | 50.9% | 78.9% | 34.5% | 72.5% | 21.0% | 76.2% | 25.0% |
| Classification trees | 46.1% | 55.3% | 19.5% | 30.6% | 11.4% | 19.8% | 10.9% | 19.5% |

Note: * Results produced from the model were unstable because of the small number of CVD events relative to the total number of observations

a, b, c, d, e, f: see Table 2

Results for the traditional regression model specified as per TRM3 but without chronic condition variables are not reported because this model had very poor predictive power
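
Table 3 reports sensitivity and F1 but not precision. As a reading aid, the sketch below back-calculates the precision implied by a sensitivity/F1 pair. It assumes F1 is the standard harmonic mean of precision and sensitivity (the paper's own definition is in the Table 2 footnotes, which are not reproduced here), and the function names are illustrative, not taken from the source.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Standard F1: harmonic mean of precision and recall (sensitivity)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def implied_precision(recall: float, f1: float) -> float:
    """Solve F1 = 2PR / (P + R) for P, given recall R and F1.

    Rearranging gives P = F1 * R / (2R - F1). Inputs are proportions
    in (0, 1], e.g. 0.789 for 78.9%.
    """
    denominator = 2 * recall - f1
    if recall <= 0 or f1 <= 0 or denominator <= 0:
        raise ValueError("no valid precision for this recall/F1 pair")
    precision = f1 * recall / denominator
    if precision > 1:
        raise ValueError("recall/F1 pair implies a precision above 1")
    return precision


if __name__ == "__main__":
    # Base-case (20% prevalence) values copied from Table 3, as proportions.
    rows = {
        "Random forest": (0.378, 0.412),
        "L1-regularised logistic regression": (0.789, 0.345),
    }
    for model, (sensitivity, f1) in rows.items():
        p = implied_precision(sensitivity, f1)
        print(f"{model}: implied precision = {p:.3f}")
```

For the L1-regularised logistic regression row at 20% prevalence, this gives an implied precision of roughly 0.22, which would be consistent with a high sensitivity paired with a modest F1. This is a back-calculation from rounded table values under the stated assumption, not a figure reported in the source.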