Prediction models hold tremendous promise as a way to improve patient outcomes and healthcare quality more generally by efficiently targeting interventions to those patients most likely to benefit. Most aspects of healthcare (e.g., tests, medications and procedures) have associated risks and burdens. Accurate prediction of the patients at higher risk for adverse outcomes allows for better targeting of interventions, so that only patients at high risk (for whom the benefits of the intervention outweigh the risks) are provided the intervention. Conversely, accurate prediction of patients at low risk for adverse outcomes (for whom the risks of the intervention outweigh the benefits) can help those patients avoid unnecessary and potentially harmful interventions. Thus, accurate prediction models play a pivotal role in the vision of personalized medicine, where all clinical decisions are tailored to each individual’s unique risk profile.1
In this issue of BMJ Quality & Safety, McAlister and van Walraven expand our knowledge of prediction for common, adverse hospitalization outcomes (prolonged hospitalization, 30-day mortality and readmission) in older adults.2 They use data from the province of Ontario from 2004–2010 to compare how well two previously published prediction models (Hospital Frailty Risk Score or HFRS3 and Hospital-patient One-year Mortality Risk or HOMR)4 predict these outcomes. They found that the HFRS more accurately predicted prolonged hospitalization while the HOMR more accurately predicted 30-day mortality. Both the HFRS and HOMR were poor predictors of 30-day readmissions to the hospital.
The authors should be commended for conducting a methodologically rigorous external validation of previously developed prediction models. Previous reviews have found that many prediction models are developed, but few are externally validated.5 Without external validation, providers face substantial uncertainty about whether a model is accurate enough to guide clinical decisions. Thus, methodologically rigorous validation studies such as this one constitute a critical but often-ignored component in the chain of evidence that starts with prediction model development and ends with prediction models being used to inform clinical care.
As the authors themselves note, their primary finding, that the HFRS better predicts prolonged hospitalization while the HOMR better predicts 30-day mortality, comes as no surprise. The HFRS was developed and optimized to predict frailty and prolonged hospitalization. In contrast, the HOMR was developed and optimized to predict 1-year mortality. This study’s results show that while adverse outcomes such as prolonged hospitalization and mortality often cluster together, they are distinct: A prediction model optimized for 1-year mortality predicts 30-day mortality better than a prediction model optimized to predict frailty and prolonged hospitalization. An optimistic interpretation of these results is that prediction models have reached a level of sophistication where related outcomes such as mortality and prolonged hospitalization can be distinguished and distinct prediction models are needed for these related (same-same) but distinct (different) outcomes.
Three additional factors should be considered when interpreting the results of this study2: 1) differences between the UK and Ontario, 2) the importance of calibration as well as discrimination in the validation of prediction models and 3) the importance of physical function in the prediction of outcomes for hospitalized older adults.
UK/Ontario Differences
One striking result of this study concerns the differences between the UK and Ontario populations of hospitalized older adults. At baseline, hospitalization was much more common in the UK, with 40.1% of UK patients experiencing multiple prior admissions in the previous 2 years compared to only 6.3% in Ontario. This profound difference in hospitalization exposure likely led to increased rates of ICD10 codes in the UK, resulting in the UK cohort having higher Charlson scores (2.9 in UK, 2.0 in Ontario) and a higher proportion of frailty (58% in UK, 26% in Ontario). These baseline differences in hospitalization exposure may also have contributed to some of the unexpected results of this study. For example, patients with higher frailty risk scores (HFRS) in the UK were more likely to be readmitted; however, patients in Ontario with higher frailty risk scores were less likely to be readmitted. Future studies should explore potential sources of the differences in hospitalization rates in the UK and Ontario, including the differential use of geriatric day hospitals or different coding practices. This information could be very helpful in interpreting some of these intriguing differences in both the rate of hospitalization and the apparent differential effect of frailty on subsequent readmissions.
Calibration as well as Discrimination
Future studies should focus on both discrimination and calibration as co-equal components of prediction model validation.6 Historically, discrimination (usually measured by the c-statistic) has overshadowed calibration in prediction model validation. This is unfortunate. Calibration is as important as discrimination (if not more important) for prediction models that are going to be used to inform clinical decisions.7,8 Discrimination measures how well a model stratifies or orders patients by risk. However, patients and providers are less interested in whether a patient is at higher (or lower) risk than others. The information that is most helpful in clinical decision making is absolute predicted risk, which is evaluated through calibration. For example, knowing that a patient is in the highest quintile of risk is often less important than knowing that the patient’s 1-year risk of the outcome is 50%. Thus, prediction model validation studies should prominently display calibration (predicted vs observed outcome rates) across risk groups so that readers can evaluate model calibration and calculate the predicted risk for an individual patient with a specific set of predictors.
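To make the distinction concrete, the following is a minimal Python sketch, not the method used in the validation study. It computes a c-statistic by comparing every event/non-event pair and tabulates mean predicted risk against observed outcome rate within risk groups. The function names (`c_statistic`, `calibration_table`) and the eight-patient data set are hypothetical and chosen purely for illustration.

```python
from typing import List, Tuple


def c_statistic(predicted: List[float], observed: List[int]) -> float:
    """Discrimination: share of (event, non-event) pairs in which the
    patient with the event had the higher predicted risk (ties count 0.5)."""
    events = [p for p, y in zip(predicted, observed) if y == 1]
    non_events = [p for p, y in zip(predicted, observed) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for p_event in events:
        for p_non_event in non_events:
            if p_event > p_non_event:
                concordant += 1.0
            elif p_event == p_non_event:
                concordant += 0.5
    return concordant / (len(events) * len(non_events))


def calibration_table(
    predicted: List[float], observed: List[int], n_groups: int = 5
) -> List[Tuple[float, float, int]]:
    """Calibration: sort patients by predicted risk, split them into
    n_groups of roughly equal size, and return one row per group as
    (mean predicted risk, observed outcome rate, group size)."""
    pairs = sorted(zip(predicted, observed))
    n = len(pairs)
    rows = []
    for g in range(n_groups):
        chunk = pairs[g * n // n_groups:(g + 1) * n // n_groups]
        if not chunk:
            continue  # more groups than patients
        mean_predicted = sum(p for p, _ in chunk) / len(chunk)
        observed_rate = sum(y for _, y in chunk) / len(chunk)
        rows.append((mean_predicted, observed_rate, len(chunk)))
    return rows


# Hypothetical data: eight patients, predicted risk and observed outcome.
predicted = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
observed = [0, 0, 0, 1, 0, 1, 1, 1]

# A second hypothetical model that orders patients identically but
# reports half the risk for everyone.
halved = [p / 2 for p in predicted]

for name, model in (("original", predicted), ("halved", halved)):
    print(name, "c-statistic:", c_statistic(model, observed))
    for mean_pred, obs_rate, size in calibration_table(model, observed, n_groups=2):
        print(f"  predicted {mean_pred:.3f}  observed {obs_rate:.3f}  n={size}")
```

In this toy example the two models rank patients identically, so their c-statistics are the same, yet the halved model understates the observed outcome rate in both risk groups. This is the sense in which a model can discriminate well and still give misleading absolute risks.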
Physical Function
There is an extensive literature highlighting the importance of physical function as a predictor of outcomes for hospitalized older adults.9 However, neither the HFRS nor the HOMR considers functional status predictors. This omission likely reflects the fact that functional data are often unavailable in electronic and administrative databases. Thus, although the HFRS, HOMR and the current validation study are methodologically sound, major advances in prediction of hospitalization outcomes for older adults will likely remain out of reach until we can access additional data on factors such as physical function. Given the intrinsic importance of physical function to patients and its additional value in predicting outcomes such as readmission and nursing home placement, I urge regional and national data systems to follow the lead of the US Department of Veterans Affairs in routinely collecting functional data.10
Conclusion
This is an exciting time for clinical prediction with several trends coalescing to make prediction easier, faster and more accurate. First, the increasingly widespread use of electronic medical records means that more and more data is becoming readily accessible for clinical prediction. Instead of relying solely on administrative data such as age and ICD10 codes, clinical data such as laboratory results, radiology results and pharmacy data are increasingly being used to identify which patients are at highest risk.11 Second, there is increasing acceptance by clinicians and the public that “big data” and “predictive analytics” can make many things, including health care, better. While previous generations of clinicians viewed sophisticated “black-box” prediction models with skepticism, newer generations of clinicians, growing up in an age where Google correctly guesses your search phrase after 3 letters and Netflix recommends a show that ends up being your favorite, are more comfortable using predictions from sophisticated models to inform clinical decisions. These trends suggest that clinical prediction models will play a larger role in healthcare in the future. Studies such as this one will be a critical component of the evidence base that ensures that clinical prediction models fulfill their promise of better, safer care.
Acknowledgements
Dr. Lee’s work on this manuscript was supported by the National Institute on Aging (R01AG047897 and R01AG057751), VA HSR&D (IIR 15-434) and VA Office of Academic Affiliations (VA Quality Scholars Program, Grant #AF-3Q-09-2019-C).
REFERENCES
- 1. Collins FS, Varmus H. A New Initiative on Precision Medicine. N Engl J Med. 2015. February 26;372(9):793–5.
- 2. McAlister et al. External validation of the Hospital Frailty Risk Score and comparison to the Hospital-patient One-year Mortality Risk score to predict outcomes in elderly hospitalized patients: A retrospective cohort study. BMJ Quality & Safety 2018. bmjqs-2018–008661.
- 3. Gilbert T, Neuberger J, Kraindler J, Keeble E, Smith P, Ariti C, Arora S, Street A, Parker S, Roberts HC, Bardsley M, Conroy S. Development and validation of a Hospital Frailty Risk Score focusing on older people in acute care settings using electronic hospital records: An observational study. Lancet. 2018. May 5;391(10132):1775–82.
- 4. van Walraven C. The Hospital-patient One-year Mortality Risk score accurately predicted long-term death risk in hospitalized patients. J Clin Epidemiol. 2014. September;67(9):1025–34.
- 5. Yourman LC, Lee SJ, Schonberg MA, Widera EW, Smith AK. Prognostic indices for older adults: A systematic review. JAMA. 2012. January 11;307(2):182–92.
- 6. Alba AC, Agoritsas T, Walsh M, Hanna S, Iorio A, Devereaux PJ, McGinn T, Guyatt G. Discrimination and Calibration of Clinical Prediction Models: Users’ Guides to the Medical Literature. JAMA. 2017. October 10;318(14):1377–1384.
- 7. Cook NR. Statistical evaluation of prognostic versus diagnostic models: Beyond the ROC curve. Clin Chem. 2008. January;54(1):17–23.
- 8. Steyerberg EW, Vickers AJ, Cook NR, Gerds T, Gonen M, Obuchowski N, Pencina MJ, Kattan MW. Assessing the performance of prediction models: A framework for traditional and novel measures. Epidemiology. 2010. January;21(1):128–38.
- 9. Covinsky KE, Pierluissi E, Johnston CB. Hospitalization-associated disability: “She was probably able to ambulate, but I’m not sure”. JAMA. 2011. October 26;306(16):1782–93.
- 10. Brown RT, Komaiko KD, Shi Y, Fung KZ, Boscardin WJ, Au-Yeung A, Tarasovsky G, Jacob R, Steinman MA. Bringing functional status into a big data world: Validation of national Veterans Affairs functional status data. PLoS One. 2017. June 1;12(6):e0178726.
- 11. Wang L, Porter B, Maynard C, Evans G, Bryson C, Sun H, Gupta I, Lowy E, McDonell M, Frisbee K, Nielson C, Kirkland F, Fihn SD. Predicting risk of hospitalization or death among patients receiving primary care in the Veterans Health Administration. Med Care. 2013. April;51(4):368–73.
