Clin Interv Aging. 2014 Sep 19;9:1581–1593. doi: 10.2147/CIA.S65475

Table S2. Criteria to consider when evaluating the quality of risk prediction models

Standard criteria^a are listed below, each with an explanation and a published example.

Study design
Explanation: Prospective designs allow optimal collection of potential candidate variables, but often generate smaller datasets. Retrospective designs enable the use of large, previously collected datasets, but the quality of candidate variable data may be compromised by missing data, which rarely occur at random.
Example: Prospective study design, n=690; all exclusions were for appropriate reasons.1 Retrospective study design, n=5,936; unknown number of exclusions due to missing data.2

Participant recruitment
Explanation: Inclusion and exclusion criteria should be clearly described to allow full assessment of the patient population studied. Any systematic variation in the recruitment of patients should be viewed with caution because of the risk of sampling bias. There is no predetermined satisfactory level of loss to follow-up; however, it should be kept in mind that missing data reduce the statistical power of the study.
Example: Interview data were collected for only half of the patients during the development phase; patients not wishing to participate in the interview may systematically differ.3

Candidate predictor variables
Explanation: Variables and their measurement should be clearly defined to allow for replication. Investigators should be blind to the outcome to reduce the risk of bias. Continuous variables should be assessed for conformity to a linear gradient (this is not necessary for dichotomous variables); dichotomization of continuous variables is not recommended, as it reduces the statistical power of the study. Correlation between risk variables (a test for collinearity) should be examined and reported (a collinearity sketch appears after the table).
Example: It is unclear how key variables, eg, liver disease, were defined. To replicate the study, investigators would be required to apply their own definition, which may affect reproducibility.2

Outcome
Explanation: The method of measuring the outcome must be reproducible and, where assessment scales are applied, these should be validated to increase the accuracy and reproducibility of the measurement. Dichotomization of continuous outcomes is not recommended, as it can reduce statistical power.
Example: Investigators generated their own causality assessment of unknown validity.4 Applied a widely used, validated causality assessment (the Naranjo algorithm).2

Statistical power
Explanation: Sample size is calculated based on the number of outcome events per variable (EPV); ten events per variable is often recommended. A high number of variables combined with a rare outcome can result in over-fitting of the model, causing poor generalizability.
Example: Reported 86 ADRs in a sample of 690 patients while assessing 34 candidate predictor variables, resulting in only 2.5 events per variable (the worked calculation appears after the table).1

Selection of variables
Explanation: The selection of independent variables should be described clearly; it can be based on the literature and/or on statistical association with the outcome variable as determined by univariate analysis. Selection based on univariate analysis alone increases the likelihood of developing an over-fitted model. Restricting inclusion to variables applicable to over 5% of the population may help exclude artifact variables. The fitting procedure (entering of variables into the model) should be explicitly stated, including removal criteria (a sketch of this two-stage procedure appears after the table).
Example: Variables were entered into multivariate analysis if P<0.05 after univariate analysis, or if P<0.25 for variables identified from other studies; liver disease was removed as it applied to <5% of the population. Backward elimination and forward selection were used with a removal criterion of P=0.10.1

Model performance
Explanation: In both the development and validation phases, assessment of discrimination and calibration should be reported, to determine how well the model distinguishes those who experience an ADR from those who do not, and how close the predictions are to the observed outcomes for each risk group. An AUROC >0.7 is often deemed acceptable, but this alone is not sufficient to determine the clinical usefulness of the model.6 Assessment of the generalizability of the model is important to determine the accuracy of its predictions in another population, and is recommended prior to routine clinical application. Internal validation, by methods such as bootstrapping (data resampling) or split-sample validation, assesses how well the predictors correspond to the outcome, but leads to optimistic estimates of model performance (discrimination, calibration, and bootstrap sketches appear after the table). External validation is more rigorous and enables assessment of accuracy when the model is applied by investigators not involved in its development.
Example: Discrimination (AUROC) and calibration (Hosmer-Lemeshow) were reported in both the development and validation phases.1 Trivalle applied bootstrapping.5 Onder applied external validation, whereby the model was applied by investigators not involved in its development and in a different geographical location.7

Note:
^a Criteria derived from the published literature.8–11

Abbreviations: ADR, adverse drug reaction; AUROC, area under the receiver operating characteristic curve.
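
The collinearity check recommended under "Candidate predictor variables" can be illustrated with variance inflation factors (VIFs). This is a minimal sketch on invented data: the variable names, values, and the 5-10 rule of thumb are assumptions for illustration, not criteria taken from the table.

```python
# Illustrative sketch: collinearity screening of candidate predictors
# via variance inflation factors. All names and values are hypothetical.
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor
from statsmodels.tools.tools import add_constant

# Hypothetical candidate predictors for an ADR risk model
df = pd.DataFrame({
    "age":        [71, 80, 68, 75, 83, 77, 69, 90],
    "n_drugs":    [4, 9, 3, 7, 11, 8, 5, 12],
    "creatinine": [88, 130, 75, 110, 150, 120, 90, 160],
})

X = add_constant(df)  # VIF calculations assume an intercept column
for i, col in enumerate(X.columns):
    if col != "const":
        print(f"{col}: VIF = {variance_inflation_factor(X.values, i):.2f}")
# As a common rule of thumb, VIF values above roughly 5-10 flag
# collinearity that should be reported and investigated.
```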
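
The 2.5 events per variable quoted under "Statistical power" follows directly from the reported counts:

\[
\text{EPV} = \frac{\text{outcome events}}{\text{candidate predictor variables}} = \frac{86}{34} \approx 2.5
\]

This is well below the ten events per variable often recommended, which is why over-fitting is a concern for that model.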
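
The two-stage procedure described under "Selection of variables" (univariate screening, then backward elimination) can be sketched as follows. The P<0.05 entry and P=0.10 removal thresholds come from the table; the logistic model, simulated data, and statsmodels implementation are assumptions for illustration.

```python
# Illustrative sketch: univariate screening followed by backward
# elimination, on simulated data.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(2)
X = pd.DataFrame(rng.normal(size=(500, 6)),
                 columns=[f"x{i}" for i in range(6)])
y = rng.binomial(1, 1 / (1 + np.exp(-(X["x0"] + 0.5 * X["x1"]))))

# Stage 1: keep variables univariately associated with the outcome (P<0.05)
candidates = []
for col in X.columns:
    fit = sm.Logit(y, sm.add_constant(X[[col]])).fit(disp=0)
    if fit.pvalues[col] < 0.05:
        candidates.append(col)

# Stage 2: backward elimination with a removal criterion of P=0.10
while candidates:
    fit = sm.Logit(y, sm.add_constant(X[candidates])).fit(disp=0)
    worst = fit.pvalues.drop("const").idxmax()   # least significant variable
    if fit.pvalues[worst] <= 0.10:
        break                                    # all remaining variables kept
    candidates.remove(worst)

print("selected variables:", candidates)
```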
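
For "Model performance", the sketch below computes discrimination (AUROC) and a Hosmer-Lemeshow-style calibration statistic. The predicted risks and outcomes are simulated, and the ten risk groups and chi-squared reference with g-2 degrees of freedom are conventional choices, not figures from the cited studies.

```python
# Illustrative sketch: discrimination (AUROC) and calibration
# (Hosmer-Lemeshow) on simulated predicted risks and outcomes.
import numpy as np
from scipy.stats import chi2
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
p = rng.uniform(0.05, 0.60, 500)   # simulated predicted ADR risks
y = rng.binomial(1, p)             # simulated observed outcomes

print(f"AUROC = {roc_auc_score(y, p):.3f}")  # >0.7 often deemed acceptable

# Hosmer-Lemeshow: compare observed and expected events across g risk groups
g = 10
edges = np.percentile(p, np.linspace(0, 100, g + 1))[1:-1]
group = np.digitize(p, edges)      # assign each patient to a risk decile
hl = 0.0
for k in range(g):
    mask = group == k
    n, obs, exp = mask.sum(), y[mask].sum(), p[mask].sum()
    hl += (obs - exp) ** 2 / (exp * (1 - exp / n))
print(f"Hosmer-Lemeshow chi2 = {hl:.2f}, P = {chi2.sf(hl, g - 2):.3f}")
```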
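
Internal validation by bootstrapping (noted above for Trivalle5) commonly takes the form of optimism correction: the model is refit on each bootstrap resample, and the average drop in performance between the resample and the original data is subtracted from the apparent performance. The sketch below assumes a logistic model, simulated data, and 200 resamples; it illustrates the general technique, not the cited study's analysis.

```python
# Illustrative sketch: optimism-corrected AUROC via bootstrap resampling.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)
X = rng.normal(size=(400, 5))                      # simulated predictors
y = rng.binomial(1, 1 / (1 + np.exp(-(X[:, 0] + 0.5 * X[:, 1]))))

model = LogisticRegression().fit(X, y)
apparent = roc_auc_score(y, model.predict_proba(X)[:, 1])

optimism = []
for _ in range(200):                               # 200 bootstrap resamples
    idx = rng.integers(0, len(y), len(y))          # resample with replacement
    m = LogisticRegression().fit(X[idx], y[idx])
    auc_boot = roc_auc_score(y[idx], m.predict_proba(X[idx])[:, 1])
    auc_orig = roc_auc_score(y, m.predict_proba(X)[:, 1])
    optimism.append(auc_boot - auc_orig)           # per-resample optimism

print(f"apparent AUROC           = {apparent:.3f}")
print(f"optimism-corrected AUROC = {apparent - np.mean(optimism):.3f}")
```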