Annals of the American Thoracic Society
Editorial
2021 Mar 30;18(7):1116–1117. doi: 10.1513/AnnalsATS.202103-332ED

A Bold First Toe into the Uncharted Waters of Evaluating Proprietary Clinical Prediction Models

Gary E Weissman 1
PMCID: PMC8328356  PMID: 34242149


Would a clinician prescribe a new medication in the absence of any data about its efficacy or safety? Of course not. Regulatory authorities like the Food and Drug Administration (FDA) and good clinical judgment would prevent such a blunder. Then why would a health system deploy a clinical prediction model, designed to inform high-stakes decisions for patients at risk for critical illness, without any evidence of efficacy or safety?

Although the FDA’s regulatory strategy for clinical prediction models continues to mature and expand to include guidance around equity, transparency, and safety, significant gaps and uncertainties in oversight remain (1). For example, there are currently no federal regulatory standards for predictive clinical decision support (CDS) systems developed locally by hospitals (2). Those developed by private-sector companies for sale on the market may, in some cases, require FDA approval if they meet certain criteria (3). However, some of these criteria remain vague, and models released before these criteria were published have an uncertain fate.

The Epic Deterioration Index (EDI) is one such CDS system that may meet criteria for FDA regulation as a medical device and is reportedly in use in “hundreds of hospitals in the United States” (4). The EDI is a commercially available predictive CDS system built by Epic Systems to identify patients at risk of clinical deterioration; it was developed prior to the coronavirus disease (COVID-19) pandemic and uses predictor variables such as patient age (but not race or sex), vital signs, nursing assessments, and laboratory values. However, the EDI has neither been approved by the FDA nor had its performance, safety, or other important characteristics reported in any peer-reviewed journal until now.

In this issue of AnnalsATS, Singh and colleagues (pp. 1129–1137) provided a public service by performing the first published evaluation of the EDI (5). Notably, none of the authors are affiliated with the FDA, and none disclosed any relationship to Epic. The authors released a preprint of this study almost 1 year prior to this publication, thereby allowing substantial time for public comment and review (6). They studied the EDI’s ability to predict a composite outcome of transfer to the intensive care unit, need for mechanical ventilation, or in-hospital death among ward patients with COVID-19 admitted to the University of Michigan’s health system during the initial months of the COVID-19 pandemic. This is a particularly important population in which to study the EDI because the pandemic caused significant strain on many hospital wards, which may impair important care processes (7). Thus, under such strain, clinicians may rely more heavily on CDS systems, a scenario in which their efficacy, safety, and fairness become increasingly important.

This paper has several strengths that offer useful information to hospitals trying to decide if and how the EDI might be deployed. First, the authors found that among 392 patients who met inclusion criteria for the study, the area under the curve of the receiver operating characteristic was 0.79 (95% confidence interval, 0.74–0.84). In plain English, this means that if two randomly selected patients, one who did not experience the outcome and one who did, were compared with each other, the model would appropriately assign a higher risk to the latter patient 79% of the time. Figure 2 in Singh and colleagues’ article offers further insights into the lead time during which clinicians might respond to an alert based on the EDI’s predictions. This information permits an assessment of whether there is sufficient time (here, a median of 24 hours) to respond to an alert, a judgment that may vary by hospital depending on available resources.
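To make this pairwise interpretation concrete, here is a minimal sketch (with invented risk scores, not data from the study) showing how the empirical area under the curve reduces to counting concordant pairs:

```python
import itertools

# Hypothetical EDI-style risk scores (0-100), invented for illustration;
# these are not data from Singh and colleagues' study.
scores_deteriorated = [72.0, 55.0, 81.0]   # patients who met the outcome
scores_stable = [40.0, 60.0, 25.0]         # patients who did not

# The AUROC equals the probability that a randomly chosen patient who
# deteriorated is scored higher than a randomly chosen patient who did not
# (ties counted as half a concordant pair).
pairs = list(itertools.product(scores_deteriorated, scores_stable))
concordant = sum(
    1.0 if pos > neg else 0.5 if pos == neg else 0.0 for pos, neg in pairs
)
print(f"Empirical AUROC: {concordant / len(pairs):.2f}")
# An AUROC of 0.79 means 79% of such pairs are ranked correctly.
```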

Second, the authors identified clinically relevant classification thresholds corresponding to actual bedside care decisions that the EDI might inform. This is an insightful framing because many evaluations of clinical prediction models lack specific use cases, which precludes a necessary and pragmatic assessment. For example, at the high-risk threshold of an EDI score of 68.8, the positive predictive value was 75%, much higher than that of many early warning scores, and the number needed to evaluate was a very efficient 1.4. However, at this threshold, the model identified only 39% of patients with the composite outcome.
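The arithmetic behind these threshold metrics is straightforward; below is a minimal sketch using invented confusion-matrix counts (not the study’s data) that illustrates the relationships among positive predictive value, sensitivity, and number needed to evaluate:

```python
# Invented counts at a hypothetical high-risk threshold (EDI >= 68.8);
# chosen only to illustrate the formulas, not taken from the study.
tp = 30  # flagged high risk and experienced the composite outcome
fp = 10  # flagged high risk but did not
fn = 47  # experienced the outcome but was never flagged

ppv = tp / (tp + fp)          # positive predictive value: 0.75 here
sensitivity = tp / (tp + fn)  # ~0.39, i.e., 39% of outcomes identified
nne = 1 / ppv                 # number needed to evaluate: alerts reviewed
                              # per true deterioration caught (~1.3 here;
                              # the reported 1.4 is consistent with an
                              # unrounded PPV slightly below 75%)

print(f"PPV = {ppv:.0%}, sensitivity = {sensitivity:.0%}, NNE = {nne:.1f}")
```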

Third, the authors provide some insight into the potential harms of the EDI while noting the disproportionate effects of COVID-19 on Black people. CDS systems such as the EDI risk reinforcing existing inequities because they focus resources on one patient in need, which may divert attention from other patients on the same ward (8). Thus, algorithmic equity—equivalent model performance across demographic subgroups—requires evaluation. No differences in the area under the curve of the receiver operating characteristic were detected between patients of different ages, genders, or races. However, the study may have been underpowered to detect such differences.
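One common way to operationalize such an equity check is to estimate discrimination separately within each subgroup. A minimal sketch follows, assuming scikit-learn and hypothetical column names (score, outcome, race) with synthetic data; it is not the authors’ analysis code:

```python
import numpy as np
import pandas as pd
from sklearn.metrics import roc_auc_score

# Synthetic data for illustration only; column names are assumptions.
rng = np.random.default_rng(0)
n = 300
df = pd.DataFrame({
    "score": rng.uniform(0, 100, n),
    "outcome": rng.integers(0, 2, n),
    "race": rng.choice(["Black", "White", "Other"], n),
})

# Equivalent AUROCs across subgroups are necessary (though not sufficient)
# for algorithmic equity; small subgroup sizes yield wide confidence
# intervals, which is why such comparisons are easily underpowered.
for group, sub in df.groupby("race"):
    auroc = roc_auc_score(sub["outcome"], sub["score"])
    print(f"{group}: AUROC = {auroc:.2f} (n = {len(sub)})")
```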

Fourth, Singh and colleagues chose to evaluate the model against a very reasonable and potentially actionable outcome to capture clinical deterioration. An early alert from the EDI might prompt expedited evaluation and attention for a patient in need. An inherent limitation of this choice, though through no fault of the authors, is that Epic has never revealed the predicted outcome used to train the EDI in the first place. This evaluation is therefore limited in the inferences that might be drawn about the “true” performance of the EDI model. At the same time, the authors’ evaluation is pragmatic and appropriate and highlights the bizarre practice of selling and deploying clinical prediction models without explaining or understanding them.

The study should be interpreted in light of several additional limitations. First, Singh and colleagues reported the in silico performance of the model but not its direct effects on clinician decision-making or patient outcomes. The latter two outcomes would be best evaluated using a prospective randomized design, which is outside the scope of this study but necessary for understanding how the EDI affects patient care. Second, the EDI is recalculated every 15 minutes when deployed, and the authors observed large fluctuations in its value over time. However, the authors reported performance measures using aggregations of the EDI at the hospitalization level. Although this practice is not uncommon in the reporting of clinical prediction models, it likely overestimates the true performance of the model and provides a less-than-real-world picture of how the model is used in practice. Notably, in a sensitivity analysis using a prediction-level evaluation, the positive predictive values were more modest, ranging from 5.5% to 24% over different time horizons.
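The difference between these two evaluation units can be made concrete. Here is a minimal sketch with invented scores and hypothetical column names, contrasting hospitalization-level aggregation with a prediction-level evaluation (the study’s prediction-level analysis also applied time horizons to the outcome; this toy example reuses the encounter-level outcome for brevity):

```python
import pandas as pd

# Invented 15-minute EDI scores for two encounters; illustration only.
preds = pd.DataFrame({
    "encounter_id": [1, 1, 1, 2, 2, 2],
    "edi_score":    [35, 70, 40, 20, 71, 72],
    "deteriorated": [1, 1, 1, 0, 0, 0],   # encounter-level outcome
})
threshold = 68.8

# Hospitalization-level: one observation per encounter, scored by its
# maximum EDI, so the model gets credit if any score ever crossed the line.
enc = preds.groupby("encounter_id").agg(
    max_score=("edi_score", "max"), deteriorated=("deteriorated", "max")
)
ppv_hosp = enc.loc[enc["max_score"] >= threshold, "deteriorated"].mean()

# Prediction-level: every 15-minute score is evaluated on its own, which
# mirrors how alerts actually fire at the bedside.
flagged = preds[preds["edi_score"] >= threshold]
ppv_pred = flagged["deteriorated"].mean()

print(f"Hospitalization-level PPV: {ppv_hosp:.0%}")  # 50% in this toy data
print(f"Prediction-level PPV: {ppv_pred:.0%}")       # 33%: same model, lower PPV
```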

We still don’t know enough to evaluate the claim in the title of Epic’s news item on its own web page, “Artificial Intelligence Triggers Fast, Lifesaving Care.” But to Singh and colleagues, a debt of gratitude is owed by the FDA, Epic, “hundreds of hospitals,” and the wider community of researchers and data scientists working to advance the field of clinical prediction models. Hospital leaders currently face a barrage of incentives to roll out new predictive CDS systems. At the same time, hospitals that wouldn’t approve of their clinicians prescribing new medications with no data behind them shouldn’t themselves take up the same practice by deploying unvalidated clinical prediction models. If regulatory authorities don’t step in, hospitals and independent researchers like Singh and colleagues will have to keep diving in to pick up the slack.

Footnotes

Supported by National Institutes of Health grant K23HL141639.

Author disclosures are available with the text of this article at www.atsjournals.org.
