2018 Jul 23;190(29):E871–E882. doi: 10.1503/cmaj.170914

Table 3:

Summary statistics showing goodness of fit for the Cardiovascular Disease Population Risk Tool in the initial development model, the validation model, the final model derived from the combined data and the parsimonious model after applying the step-down procedure*

Variable	Development	Validation	Combined	Reduced
Male model
Discrimination
C-statistic (95% CI)	0.82 (0.81–0.83)	0.79 (0.76–0.81)	0.82 (0.81–0.83)	0.82 (0.81–0.83)
Ratio of 95 to 5 risk percentile	298.2 (0.0963/0.0003)	468.7 (0.0770/0.0002)	345.3 (0.0914/0.0003)	337.8 (0.0913/0.0003)
Calibration
Observed v. predicted, %	0.08	1.38	0.28	0.28
5-year cumulative incidence (observed) (95% CI)	0.027 (0.026–0.029)	0.023 (0.020–0.025)	0.026 (0.025–0.028)	0.026 (0.025–0.028)
5-year risk (predicted)	0.027	0.022	0.026	0.026
Overall performance
Brier_scaled score	0.025	0.022	0.024	0.024
Nagelkerke R²	0.096	0.086	0.089	0.089
Female model
Discrimination
C-statistic (95% CI)	0.87 (0.86–0.88)	0.85 (0.83–0.87)	0.86 (0.85–0.87)	0.86 (0.85–0.87)
Ratio of 95 to 5 risk percentile	645.0 (0.0811/0.0001)	810.5 (0.0709/0.0001)	482.3 (0.0794/0.0002)	477.5 (0.0794/0.0002)
Calibration
Observed v. predicted, %	0.30	7.13	0.39	0.38
5-year cumulative incidence (observed) (95% CI)	0.018 (0.017–0.019)	0.017 (0.015–0.019)	0.018 (0.017–0.019)	0.018 (0.017–0.019)
5-year risk (predicted)	0.018	0.016	0.018	0.018
Overall performance
Brier_scaled score	0.017	0.016	0.017	0.017
Nagelkerke R²	0.124	0.126	0.117	0.117

Note: CI = confidence interval.

*Three types of performance tests were examined:²⁸

1) Discrimination is the ability of a prediction model to differentiate between those who do and do not develop the outcome of interest. The C-statistic is a rank-order statistic for predictions against true outcomes.¹⁸,²⁹ It ranges from 0 to 1: a value of 0.5 indicates the model is no better than random prediction, and a value of 1 indicates the model perfectly separates those who will develop the outcome of interest from those who will not. The ratio of 95 to 5 risk percentiles is also a test of discrimination; a higher ratio indicates a more discriminating algorithm. For example, a ratio of 100 indicates that the absolute risk is 100 times higher for a person in the 95th percentile than for a person in the 5th percentile. The ratio can be used to gauge the potential absolute benefit of treatment for different individuals in the development and validation cohorts. For an intervention with the same relative benefit, a risk ratio of 100 indicates that the person in the 95th percentile will have 100 times the absolute benefit of the person in the 5th percentile.

2) Calibration (or accuracy) describes how well the predicted probability of disease agrees with the observed outcome. Observed versus predicted (O v. P) is the relative difference between the observed incidence and predicted risk. A 1% difference in O v. P indicates 1% more cardiovascular events were observed than predicted. This table shows overall O v. P; Appendices 6 and 7, available at www.cmaj.ca/lookup/suppl/doi:10.1503/cmaj.170914/-/DC1, show O v. P for specific subgroups. This table also presents an absolute measure of O v. P as the observed 5-year cumulative incidence and the predicted 5-year risk. A graphical assessment of calibration (calibration plots) is presented in Appendix 8, available at www.cmaj.ca/lookup/suppl/doi:10.1503/cmaj.170914/-/DC1.

3) Overall performance measures. The Brier_scaled score is a measure of overall agreement between observed and predicted risk, with values between 0 and 1.³⁰ This scaled Brier score happens to be very similar to the Pearson R² statistic.³¹ Nagelkerke R² is a measure of how much of the variation in risk between respondents in the development or validation data is explained by the model, with values from 0 to 1.³²,³³ Larger R² values indicate that more of the variation is explained by the model, to a maximum of 1.
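
The measures described in this footnote can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes a simple binary outcome (event or no event within the follow-up window) and so ignores censoring and competing risks, which a survival-based analysis would need to handle. All function names are hypothetical. The O v. P calculation assumes the predicted risk is the denominator, the Brier scaling assumes the commonly used maximum of incidence × (1 − incidence), and Nagelkerke R² is computed from the binary log-likelihood against an incidence-only model.

```python
import math


def c_statistic(risks, outcomes):
    """Share of (event, non-event) pairs where the event has the higher
    predicted risk; tied risks count as half. Binary outcomes assumed."""
    events = [r for r, y in zip(risks, outcomes) if y == 1]
    non_events = [r for r, y in zip(risks, outcomes) if y == 0]
    score = 0.0
    for e in events:
        for n in non_events:
            if e > n:
                score += 1.0
            elif e == n:
                score += 0.5
    return score / (len(events) * len(non_events))


def percentile(values, q):
    """q-th percentile (0-100), linear interpolation between order statistics."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lower, upper = math.floor(pos), math.ceil(pos)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (pos - lower)


def risk_percentile_ratio(risks, high=95, low=5):
    """Ratio of predicted risk at the 95th percentile to that at the 5th."""
    return percentile(risks, high) / percentile(risks, low)


def observed_vs_predicted_pct(observed, predicted):
    """Relative difference in percent; positive means more events were
    observed than predicted (assumed denominator: predicted risk)."""
    return 100.0 * (observed - predicted) / predicted


def scaled_brier(risks, outcomes):
    """1 - Brier / Brier_max, with Brier_max taken from the observed incidence."""
    n = len(risks)
    brier = sum((p - y) ** 2 for p, y in zip(risks, outcomes)) / n
    incidence = sum(outcomes) / n
    return 1.0 - brier / (incidence * (1.0 - incidence))


def _log_likelihood(risks, outcomes):
    # Bernoulli log-likelihood; every risk must lie strictly between 0 and 1.
    return sum(math.log(p) if y == 1 else math.log(1.0 - p)
               for p, y in zip(risks, outcomes))


def nagelkerke_r2(risks, outcomes):
    """Cox-Snell R² divided by its maximum attainable value."""
    n = len(risks)
    incidence = sum(outcomes) / n
    ll_model = _log_likelihood(risks, outcomes)
    ll_null = _log_likelihood([incidence] * n, outcomes)
    cox_snell = 1.0 - math.exp(2.0 * (ll_null - ll_model) / n)
    return cox_snell / (1.0 - math.exp(2.0 * ll_null / n))


if __name__ == "__main__":
    # Made-up predicted risks and outcomes, for illustration only.
    risks = [0.01, 0.02, 0.05, 0.10, 0.20, 0.40]
    outcomes = [0, 0, 0, 1, 0, 1]
    print("C-statistic:", c_statistic(risks, outcomes))
    print("95/5 ratio:", risk_percentile_ratio(risks))
    print("O v. P, %:", observed_vs_predicted_pct(sum(outcomes) / 6, sum(risks) / 6))
    print("Scaled Brier:", scaled_brier(risks, outcomes))
    print("Nagelkerke R2:", nagelkerke_r2(risks, outcomes))
```

In this sketch a model that assigns everyone the observed incidence scores 0 on both overall performance measures, which matches the footnote's reading of larger values as more variation explained. The pairwise C-statistic loop is quadratic in cohort size and is meant for illustration, not for cohorts of the size used in the study.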