The number of predictive models based on electronic health record (EHR) data is expanding, revealing challenges not only in model development and validation (bias, cross-site and cross-setting differences) but also in implementation in practice, including monitoring of performance changes that may occur due to dataset shifts in clinical environments. In this editorial, I highlight a systematic review, three research papers, and a perspective that contribute to advancing the science of addressing these challenges.
Chen et al. conducted a systematic review of artificial intelligence (AI) models developed using EHR data to identify key biases, strategies for detecting and mitigating bias throughout model development, and metrics for bias assessment.1 Twenty of 450 retrieved articles met inclusion criteria, with most models developed for predictive tasks. No models in the review had been deployed in real-world settings at the time of the review. Twenty-five percent of the studies focused on detecting biases through fairness metrics such as statistical parity, equal opportunity, and predictive equity. The remainder proposed strategies for mitigating biases, predominantly data collection and preprocessing techniques such as resampling and reweighting. This review highlights the importance of bias detection and mitigation strategies so that predictive models advance health equity rather than exacerbate existing disparities.
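To make such fairness metrics concrete, the sketch below computes two of them from binary predictions and a binary sensitive attribute. This is a minimal illustration, not the review's methodology; the function names and toy data are hypothetical, and real assessments would use the richer formulations and tooling the review surveys.

```python
import numpy as np

def statistical_parity_diff(y_pred, group):
    """Difference in positive prediction rates between two groups.
    A value near 0 suggests the model flags both groups at similar rates."""
    y_pred, group = np.asarray(y_pred), np.asarray(group)
    return y_pred[group == 1].mean() - y_pred[group == 0].mean()

def equal_opportunity_diff(y_true, y_pred, group):
    """Difference in true-positive rates between two groups, computed
    only among individuals whose true label is positive."""
    y_true, y_pred, group = map(np.asarray, (y_true, y_pred, group))
    pos = y_true == 1
    return (y_pred[pos & (group == 1)].mean()
            - y_pred[pos & (group == 0)].mean())

# Toy example: predictions for 8 patients across two groups
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]
group  = [1, 1, 1, 1, 0, 0, 0, 0]
print(statistical_parity_diff(y_pred, group))          # gap in alert rates
print(equal_opportunity_diff(y_true, y_pred, group))   # gap in sensitivity
```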
Although the number of distributed research networks is growing, sharing patient-level data can be logistically and legally challenging. To address this issue, meta-analysis is commonly used to synthesize results from distributed research networks, but it may be subject to bias, especially when the event of interest is rare. Federated algorithms have been proposed to address this bias but incur communication costs across sites. Zhang et al.2 address these issues in the context of multiple clinical conditions within a time-to-event analysis framework by developing two novel one-shot distributed algorithms for competing risk models (ODACoR) for post-acute sequelae of SARS-CoV-2 infection, which comprises multiple symptoms and conditions: a distributed surrogate likelihood-based algorithm (ODACoR-S) and a distributed one-step Newton-Raphson-based algorithm (ODACoR-O). They applied the distributed algorithms to multi-site data extracted from PEDSnet, a national pediatric learning health system (n = 53 394 hospitalized patients who were tested for SARS-CoV-2). Both simulation and real-world data studies demonstrated that ODACoR-O and ODACoR-S had smaller relative bias than the meta-analysis estimator in estimating the log sub-distribution hazard ratio from competing risk models. Study findings suggest that the novel algorithms are communication-efficient, highly accurate, and suitable for characterizing the complex interplay among multiple clinical conditions.
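The one-shot communication pattern can be illustrated outside the competing-risk setting. In the sketch below, each site transmits only a gradient and a Hessian evaluated at a shared initial estimate, and the coordinating site takes a single Newton step. This mirrors the spirit of a one-step Newton-Raphson algorithm such as ODACoR-O, but it uses a simplified logistic model with simulated data, not the authors' competing risk formulation; all names and values are illustrative.

```python
import numpy as np

def local_logistic_grad_hess(X, y, beta):
    """One site's contribution: gradient and Hessian of the logistic
    log-likelihood at a shared parameter value beta. Only these summary
    matrices leave the site, never patient-level data."""
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    grad = X.T @ (y - p)
    hess = -(X * (p * (1 - p))[:, None]).T @ X
    return grad, hess

def one_shot_newton(sites, beta_init):
    """Aggregate one round of summaries and take a single Newton step:
    one transmission per site, no iterative back-and-forth."""
    grads, hesses = zip(*(local_logistic_grad_hess(X, y, beta_init)
                          for X, y in sites))
    return beta_init - np.linalg.solve(sum(hesses), sum(grads))

rng = np.random.default_rng(0)
beta_true = np.array([0.5, -1.0])
sites = []
for _ in range(3):  # three hospitals with locally held data
    X = rng.normal(size=(200, 2))
    y = rng.binomial(1, 1 / (1 + np.exp(-X @ beta_true)))
    sites.append((X, y))

beta0 = np.zeros(2)  # e.g., a crude initial estimate from a lead site
print(one_shot_newton(sites, beta0))
```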
Using the example of post-surgery prolonged opioid use, Naderalvojoud and co-authors designed an innovative methodology for evaluating site-specific and cross-site features during the predictive model development and validation phases.3 They mapped EHR data from 4 countries (United States, United Kingdom, Finland, and Korea) to the OMOP Common Data Model, developed models using Observational Health Data Sciences and Informatics (OHDSI) tools (n = 41 929 patients), and externally validated the models on separate patient cohorts (n = 66 261). The top-performing model, lasso logistic regression, achieved an area under the receiver operating characteristic curve (AUROC) of 0.75 during local validation and an average of 0.69 in external validation. Models trained with cross-site feature selection significantly outperformed those using only features from the development site. Study findings emphasize the importance of incorporating diverse feature sets from various clinical settings to enhance the generalizability and utility of predictive models.
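The development-versus-external-validation contrast is easy to demonstrate in miniature. The sketch below trains an L1-penalized (lasso) logistic regression on a simulated "development site" and evaluates its AUROC on a distribution-shifted "external site." The data, features, and regularization strength are hypothetical stand-ins for the study's OMOP-mapped cohorts and OHDSI tooling.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)

def make_site(n, shift=0.0):
    """Simulate one site's features and outcome; `shift` mimics
    cross-site differences in case mix (a stand-in for real EHR data)."""
    X = rng.normal(loc=shift, size=(n, 20))
    logits = X[:, :5].sum(axis=1) - 2.0
    y = rng.binomial(1, 1 / (1 + np.exp(-logits)))
    return X, y

X_dev, y_dev = make_site(4000)              # development site
X_ext, y_ext = make_site(2000, shift=0.3)   # external validation site

# Lasso (L1-penalized) logistic regression, the study's top performer
model = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
model.fit(X_dev, y_dev)

# Compare local versus external discrimination
print("local AUROC:   ", roc_auc_score(y_dev, model.predict_proba(X_dev)[:, 1]))
print("external AUROC:", roc_auc_score(y_ext, model.predict_proba(X_ext)[:, 1]))
```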
While patient falls are a safety concern across healthcare settings, most falls prediction models have focused on the acute care setting. To address this gap, Wabe et al. conducted a longitudinal cohort study using electronic data from 27 residential aged care facilities in Sydney, Australia to develop and internally validate dynamic fall risk prediction models and to create point-based scoring systems for residents with and without dementia.4 They tracked residents for 60 months, using monthly landmarks with one-month prediction windows. The models identified 15 independent predictors of falls in the dementia cohort and 12 in the non-dementia cohort, with falls history the key predictor of subsequent falls in both. The authors note the need to embed these models within EHRs to facilitate targeted falls prevention interventions.
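A landmarking design of this kind can be sketched in a few lines: at each monthly landmark, the features known at that point are paired with the outcome in the following one-month window. The toy data and column names below are hypothetical, and the actual models involve far more predictors plus a point-based scoring layer.

```python
import pandas as pd

# Toy longitudinal record: one row per resident per month, with the
# features known at that month and whether a fall occurred that month
obs = pd.DataFrame({
    "resident":        [1, 1, 1, 2, 2, 2],
    "month":           [1, 2, 3, 1, 2, 3],
    "prior_falls":     [0, 1, 1, 0, 0, 0],
    "fell_this_month": [1, 0, 1, 0, 0, 1],
})

def landmark_dataset(df):
    """Build landmark rows: pair the features known at each monthly
    landmark with the outcome in the next one-month window."""
    df = df.sort_values(["resident", "month"]).copy()
    df["fall_next_month"] = df.groupby("resident")["fell_this_month"].shift(-1)
    return df.dropna(subset=["fall_next_month"])

lm = landmark_dataset(obs)
print(lm[["resident", "month", "prior_falls", "fall_next_month"]])
# Each row is a (resident, landmark) observation; a model fit across all
# landmarks yields dynamic, monthly-updated fall risk predictions.
```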
Due to complex dataset shifts in clinical environments, the accuracy and utility of AI models may deteriorate over time. Davis, Embi, and Matheny describe an algorithmovigilance framework that encompasses a 360° continuum of approaches to AI model monitoring and maintenance: (1) preventive (stability-focused design), (2) preemptive (technical oversight), (3) responsive (data-driven oversight), and (4) reactive (end-user reporting).5 They present strategies for each approach and describe its advantages and limitations. The authors argue that “comprehensive algorithmovigilance programs leveraging preventive, preemptive, responsive, and reactive tactics in coordination can sustain clinical AI models, minimize user disruptions, and reliably support patient care.”
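As one illustration of the responsive (data-driven oversight) tactic, the sketch below recomputes AUROC over sliding windows of recent predictions and flags windows that fall below a performance floor. The window size and threshold are hypothetical and would in practice be set per model and per site under the framework's broader governance; this is a sketch of the general monitoring idea, not the authors' implementation.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def rolling_auroc_alerts(y_true, y_score, window=500, floor=0.70):
    """Recompute AUROC over consecutive windows of recent predictions
    and flag windows below a pre-specified performance floor."""
    alerts = []
    for start in range(0, len(y_true) - window + 1, window):
        auc = roc_auc_score(y_true[start:start + window],
                            y_score[start:start + window])
        if auc < floor:
            alerts.append((start, auc))
    return alerts

rng = np.random.default_rng(2)
y = rng.binomial(1, 0.2, size=3000)
score = np.where(y == 1, rng.normal(0.7, 0.3, 3000),
                         rng.normal(0.3, 0.3, 3000))
score[2000:] = rng.normal(0.5, 0.3, 1000)  # simulate a dataset shift
print(rolling_auroc_alerts(y, score))      # later windows trip the alert
```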
In a 2019 editorial, The Science of Informatics and Predictive Analytics, JAMIA Associate Editor Leslie Lenert pointed out multiple directions for the science of informatics beyond the mechanics of model development and validation.6 The manuscripts in this issue demonstrate an evolution toward integrating the science of informatics with predictive models and achieving their promise to advance healthcare quality and health equity.
Funding
None declared.
Conflicts of interest
None declared.
Data availability
No data were used in the preparation of this editorial.
References
- 1. Chen F, Wang L, Hong J, Jiang J, Zhou L. Unmasking bias in artificial intelligence: a systematic review of bias detection and mitigation strategies in electronic health record-based models. J Am Med Inform Assoc. 2024;31(5).
- 2. Zhang D, Tong J, Jing N, et al. Learning competing risks across multiple hospitals: one-shot distributed algorithms. J Am Med Inform Assoc. 2024;31(5).
- 3. Naderalvojoud B, Curtin CM, Yanover C, et al. Towards global model generalizability: independent cross-site feature evaluation for patient-level risk prediction models using the OHDSI network. J Am Med Inform Assoc. 2024;31(5).
- 4. Wabe N, Meulenbroeks I, Huang G, et al. Development and internal validation of a dynamic fall risk prediction and monitoring tool in aged care (FRIPAC) using routinely collected electronic health data: a landmarking approach. J Am Med Inform Assoc. 2024;31(5).
- 5. Davis SE, Embí PJ, Matheny ME. Sustainable deployment of clinical prediction tools-a 360° approach to model maintenance. J Am Med Inform Assoc. 2024;31(5).
- 6. Lenert L. The science of informatics and predictive analytics. J Am Med Inform Assoc. 2019;26(12):1425-1426.
