2021 Feb 10:ocab018. doi: 10.1093/jamia/ocab018

Validation of an Internationally Derived Patient Severity Phenotype to Support COVID-19 Analytics from Electronic Health Record Data

Jeffrey G Klann 1, Griffin M Weber 2, Hossein Estiri 1, Bertrand Moal 3, Paul Avillach 4, Chuan Hong 4, Victor Castro 5, Thomas Maulhardt 6, Amelia L M Tan 4, Alon Geva 7, Brett K Beaulieu-Jones 4, Alberto Malovini 8, Andrew M South 9, Shyam Visweswaran 10, Gilbert S Omenn 11, Kee Yuan Ngiam 12, Kenneth D Mandl 13, Martin Boeker 6, Karen L Olson 13, Danielle L Mowery 14, Michele Morris 10, Robert W Follett 15, David A Hanauer 16, Riccardo Bellazzi 17, Jason H Moore 14, Ne-Hooi Will Loh 18, Douglas S Bell 15, Kavishwar B Wagholikar 19, Luca Chiovato 20, Valentina Tibollo 8, Siegbert Rieg 21, Anthony L L J Li 22, Vianney Jouhet 23, Emily Schriver 24, Malarkodi J Samayamuthu 10, Zongqi Xia 25, Meghan Hutch 26, Yuan Luo 26; The Consortium for Clinical Characterization of COVID-19 by EHR (4CE) (CONSORTIA AUTHOR), Isaac S Kohane 4, Gabriel A Brat 4, Shawn N Murphy 27
PMCID: PMC7928835  PMID: 33566082

Abstract

Introduction

The Consortium for Clinical Characterization of COVID-19 by EHR (4CE) is an international collaboration addressing COVID-19 with federated analyses of electronic health record (EHR) data.

Objective

We sought to develop and validate a computable phenotype for COVID-19 severity.

Methods

Twelve 4CE sites participated. First, we developed an EHR-based severity phenotype consisting of six code classes and validated it on patient hospitalization data from the 12 sites against the outcomes of ICU admission and/or death. We also piloted an alternative machine-learning approach at one site and compared its selected predictors of severity to the 4CE phenotype.
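To make the validation step concrete, the following is a minimal sketch of how a code-class phenotype could be scored against the combined outcome at a single site. It is illustrative only and is not the 4CE implementation: the code-class labels (`class_a`, `class_b`), the `Patient` record, the rule that any matching class flags a patient as severe, and the toy cohort are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Iterable, Set, Tuple


@dataclass
class Patient:
    # Hypothetical record: code classes seen during the hospitalization
    # and the reference outcome (ICU admission and/or death).
    codes: Set[str]
    icu_or_death: bool


def is_severe(patient: Patient, severity_classes: Set[str]) -> bool:
    """Assumed rule: flag a patient if any severity code class is present."""
    return bool(patient.codes & severity_classes)


def sensitivity_specificity(
    patients: Iterable[Patient], severity_classes: Set[str]
) -> Tuple[float, float]:
    """Score the phenotype flag against the reference outcome."""
    tp = fp = tn = fn = 0
    for p in patients:
        flagged = is_severe(p, severity_classes)
        if flagged and p.icu_or_death:
            tp += 1
        elif flagged and not p.icu_or_death:
            fp += 1
        elif not flagged and p.icu_or_death:
            fn += 1
        else:
            tn += 1
    # Return NaN when a denominator is empty rather than dividing by zero.
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    return sens, spec


# Placeholder labels; the real phenotype uses six code classes.
SEVERITY_CLASSES = {"class_a", "class_b"}

# Toy cohort, invented for illustration only.
toy_cohort = [
    Patient({"class_a"}, True),    # flagged, had outcome
    Patient(set(), True),          # not flagged, had outcome
    Patient({"class_b"}, False),   # flagged, no outcome
    Patient(set(), False),         # not flagged, no outcome
    Patient({"other"}, False),     # unrelated code only, no outcome
]

sens, spec = sensitivity_specificity(toy_cohort, SEVERITY_CLASSES)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```

In a federated setting, each site would run a computation of this kind locally and share only aggregate counts or statistics. How those per-site results are pooled is not specified in this abstract and is not shown here.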

Results

The full 4CE severity phenotype had a pooled sensitivity of 0.73 and a specificity of 0.83 for the combined outcome of ICU admission and/or death. The sensitivity of individual code categories for acuity varied widely across sites, by up to 0.65. At one pilot site, the expert-derived phenotype had a mean AUC of 0.903 (95% CI: 0.886, 0.921), compared to an AUC of 0.956 (95% CI: 0.952, 0.959) for the machine-learning approach. Billing codes were poor proxies of ICU admission, with precision and recall as low as 49% compared to chart review.

Discussion

We developed a severity phenotype using six code classes that proved resilient to coding variability across international institutions. In contrast, machine-learning approaches may overfit to hospital-specific orders. Manual chart review revealed discrepancies even in the gold-standard outcomes, possibly due to heterogeneous pandemic conditions.

Conclusion

We developed an EHR-based severity phenotype for COVID-19 in hospitalized patients and validated it at 12 international sites.

Keywords: novel coronavirus, disease severity, computable phenotype, medical informatics, data networking, data interoperability


Articles from Journal of the American Medical Informatics Association : JAMIA are provided here courtesy of Oxford University Press
