This editorial refers to ‘Real-time imputation of missing predictor values in clinical practice’, by S.W.J. Nijman et al., on page 154.
The use of prediction models in clinical practice has become common over the past decades. Examples in the cardiovascular domain are SCORE,1 estimating the 10-year risk of fatal cardiovascular (CV) disease (CVD), the GRACE models,2 estimating the risk of CV mortality and morbidity in various populations of patients with coronary artery disease (CAD), and CHA2DS2-VASc for the prediction of stroke in patients with atrial fibrillation.3 The application of prediction models in clinical practice is strongly recommended by the European Society of Cardiology (ESC),4 to stimulate evidence-based cardiology and to harmonize treatment in the European region. These models, among others, share transparency and simplicity. Risk estimates are based on a limited number of variables, and the end users have clear insight into the relation between the value of a variable in an individual patient and the level of estimated risk, which clearly promotes use of the models.
The creation of prediction models is an art. In an article worth reading,5 Steyerberg and Vergouwe described a systematic method for model development and validation. Among other points, they note that many datasets used for model development are incomplete with respect to the values of potentially relevant predictor variables. Excluding patients with missing values from the analysis (a complete case analysis) is inefficient, since the available information on their other predictors is then also lost and statistical power is reduced. More worrisome, however, is the scenario in which missingness is associated with the magnitude of the unobserved variable or with the presence of the outcome of interest. A complete case analysis might then result in biased estimates of the predictor-outcome relation, with obvious consequences for the quality of the resulting prediction model.6 Therefore, ‘imputation’ of missing data is recommended when developing prediction models, and several statistical approaches for this have been described.5 Best estimates of the missing values are thus obtained, based on relations between variables in the dataset.7
An appropriate application of published prediction models in routine clinical practice is an art as well. In particular, one needs to understand that, while the predictors used in a model might be population invariant, the actual level of risk might vary considerably between populations. Hence, prediction models developed in another population must be recalibrated before they are used for risk estimation and for the clinical decisions based on it. We refer to the study by Harrison et al.8 for an example. Another relevant aspect is the (in)completeness of data on the predictors that compose a published (and validated) prediction model. Indeed, data on certain predictors might be missing in a particular patient presenting at the outpatient clinic, so that model-based outcome estimates are inaccurate or cannot be obtained at all. An obvious solution, which might work in most scenarios, is to determine the value of the predictor in question at the moment of the patient visit. However, this sometimes takes time, while risk assessment is needed on the spot. In such cases, more complex solutions are required.
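As an aside on the recalibration step mentioned above, the following is a minimal sketch of one simple form, often called recalibration-in-the-large: the model intercept is shifted on the logit scale until the mean predicted risk matches the event rate observed in the new population. It is an illustration only, not the procedure of any cited study; the function names, the predicted risks, and the observed event rate are hypothetical.

```python
import math

def logit(p):
    """Log-odds of a probability p (0 < p < 1)."""
    return math.log(p / (1.0 - p))

def expit(z):
    """Inverse of the logit: maps a linear predictor back to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def intercept_shift(predicted_risks, observed_rate, tol=1e-10):
    """Find the constant added to every linear predictor so that the mean
    recalibrated risk equals the observed event rate (bisection search;
    the mean risk increases monotonically with the shift)."""
    linear_predictors = [logit(p) for p in predicted_risks]
    lo, hi = -20.0, 20.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        mean_risk = sum(expit(lp + mid) for lp in linear_predictors) / len(linear_predictors)
        if mean_risk < observed_rate:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

# Hypothetical risks from a published model, applied in a new population
original = [0.05, 0.10, 0.20, 0.40]
observed_rate = 0.10  # hypothetical event rate in the new population

shift = intercept_shift(original, observed_rate)
recalibrated = [expit(logit(p) + shift) for p in original]
```

Because only the intercept changes, the ranking of patients by risk is preserved; the predictor effects themselves are left as published.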
Nijman et al.9 describe two methods for real-time handling of missing predictor values when a prediction model is applied to an individual patient presenting at the outpatient clinic. Both methods impute missing values based on available data from other individuals. With the mean imputation (M-imp) method, a missing value is replaced by the mean value in a representative sample of the target population. The method of joint modelling imputation (JMI) allows personalized imputation by adjusting for the observed characteristics of the individual patient, based on regression analyses on multiple (representative) datasets. In a simulation study using data from the Utrecht Cardiovascular Cohorts, JMI showed better performance and calibration than M-imp for models aiming to predict onset of CVD or coronary death. The authors also demonstrated that performance of JMI greatly improved when imputations were based on all observed patient data and not restricted to only the predictors that were in the prediction model. Moreover, real-time imputations of missing predictors were most accurate when the imputation method relied on characteristics that were directly estimated from a sample from the target population, rather than from an external though related dataset.
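The difference between the two approaches can be sketched in a few lines. The example below assumes, for illustration only, that the joint distribution of the predictors is multivariate normal, so that a personalized imputation is the conditional mean of the missing predictor given the observed ones. It is not the authors' implementation; the function names and the population means and covariances are hypothetical.

```python
import numpy as np

def mean_impute(x, mu):
    """M-imp style: replace each missing entry (NaN) with the population mean."""
    x = np.asarray(x, dtype=float)
    return np.where(np.isnan(x), mu, x)

def conditional_impute(x, mu, sigma):
    """Joint-model style: replace missing entries with their expected value
    given the observed entries, assuming a multivariate normal distribution
    with mean vector mu and covariance matrix sigma."""
    x = np.asarray(x, dtype=float)
    miss = np.isnan(x)
    if not miss.any() or miss.all():
        # Nothing to impute, or nothing observed to condition on
        return mean_impute(x, mu)
    obs = ~miss
    s_oo = sigma[np.ix_(obs, obs)]    # covariance among observed predictors
    s_mo = sigma[np.ix_(miss, obs)]   # covariance of missing with observed
    filled = x.copy()
    filled[miss] = mu[miss] + s_mo @ np.linalg.solve(s_oo, x[obs] - mu[obs])
    return filled

# Hypothetical target population:
# age (years), systolic blood pressure (mmHg), total cholesterol (mmol/L)
mu = np.array([60.0, 140.0, 5.5])
sigma = np.array([[100.0,  80.0, 2.0],
                  [ 80.0, 400.0, 4.0],
                  [  2.0,   4.0, 1.0]])

# A 75-year-old patient whose blood pressure has not been measured
patient = np.array([75.0, np.nan, 6.0])

m_imp = mean_impute(patient, mu)                # blood pressure -> 140.0
jmi = conditional_impute(patient, mu, sigma)    # blood pressure -> 152.5
```

In this sketch the mean imputation ignores that the patient is older and has higher cholesterol than average, whereas the conditional imputation shifts the estimate accordingly. In practice, `mu` and `sigma` would be estimated from a representative sample of the target population, for example with `np.mean` and `np.cov`.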
Nijman et al.9 are to be complimented for putting the somewhat overlooked topic of ‘real-time handling of missing predictors’ on the research agenda. The poorer performance of the M-imp method will not have surprised the authors, as one of the co-authors already stated in 2006 that ‘… overall mean imputation … almost always results in biased estimates’. Further research into the potential of JMI is warranted, and the method might be extended to joint modelling with multiple imputation (JMMI), as multiple imputation is known to result in better estimates of standard errors (and confidence intervals).10 The argument that this approach is ‘computationally expensive’9 is not convincing now that desktop computers with high computing power are available. For now, we concur with the lesson learned: if it is not possible to determine the value of a missing predictor on the spot, then use all observed data, including auxiliary variables, from representative samples of the target population to determine its value. To facilitate such an approach, patients, clinicians, researchers, and regulators should be encouraged to work together to foster secure data sharing.
Conflict of interest: none declared.
The opinions expressed in this article are not necessarily those of the Editors of the European Heart Journal – Digital Health or of the European Society of Cardiology.
References
1. Conroy RM, Pyorala K, Fitzgerald AP, Sans S, Menotti A, De Backer G, DeBacquer D, Ducimetiere P, Jousilahti P, Keil U, Njolstad I, Oganov RG, Thomsen T, Tunstall-Pedoe H, Tverdal A, Wedel H, Whincup P, Wilhelmsen L, Graham IM. Estimation of ten-year risk of fatal cardiovascular disease in Europe: the SCORE project. Eur Heart J 2003;24:987–1003.
2. Fox KA, Eagle KA, Gore JM, Steg PG, Anderson FA; GRACE and GRACE2 Investigators. The Global Registry of Acute Coronary Events, 1999 to 2009 - GRACE. Heart 2010;96:1095–1101.
3. Lip GY, Nieuwlaat R, Pisters R, Lane DA, Crijns HJ. Refining clinical risk stratification for predicting stroke and thromboembolism in atrial fibrillation using a novel risk factor-based approach: the Euro Heart Survey on atrial fibrillation. Chest 2010;137:263–272.
4. https://www.escardio.org/Guidelines/Clinical-Practice-Guidelines (last accessed 13-2-2021).
5. Steyerberg EW, Vergouwe Y. Towards better clinical prediction models: seven steps for development and an ABCD for validation. Eur Heart J 2014;35:1925–1931.
6. Janssen KJM, Donders ART, Harrell FE, Vergouwe Y, Chen Q, Grobbee DE, Moons KGM. Missing covariate data in medical research: to impute is better than to ignore. J Clin Epidemiol 2010;63:721–727.
7. Altman DG, Bland JM. Missing data. BMJ 2007;334:424.
8. Harrison DA, Brady AR, Parry GJ, Carpenter JR, Rowan K. Recalibration of risk prediction models in a large multicenter cohort of admissions to adult, general critical care units in the United Kingdom. Crit Care Med 2006;34:1378–1388.
9. Nijman SWJ, Hoogland J, Groenhof TKJ, Brandjes M, Jacobs JJL, Bots ML, Asselbergs FW, Moons KGM, Debray TPA. Real-time imputation of missing predictor values in clinical practice. Eur Heart J Dig Health 2021;2:154–164.
10. Donders ART, Van der Heijden GJMG, Stijnen T, Moons KGM. Review: a gentle introduction to imputation of missing values. J Clin Epidemiol 2006;59:1087–1091.