
This is a preprint.

It has not yet been peer reviewed by a journal.

The National Library of Medicine is running a pilot to include preprints that result from research funded by NIH in PMC and PubMed.

medRxiv [Preprint]. 2022 Jul 20:2022.05.24.22275398. Originally published 2022 May 25. [Version 2] doi: 10.1101/2022.05.24.22275398

Generalizable Long COVID Subtypes: Findings from the NIH N3C and RECOVER Programs

Justin T Reese, Hannah Blau, Timothy Bergquist, Johanna J Loomba, Tiffany Callahan, Bryan Laraway, Corneliu Antonescu, Elena Casiraghi, Ben Coleman, Michael Gargano, Kenneth J Wilkins, Luca Cappelletti, Tommaso Fontana, Nariman Ammar, Blessy Antony, T M Murali, Guy Karlebach, Julie A McMurry, Andrew Williams, Richard Moffitt, Jineta Banerjee, Anthony E Solomonides, Hannah Davis, Kristin Kostka, Giorgio Valentini, David Sahner, Christopher G Chute, Charisse Madlock-Brown, Melissa A Haendel, Peter N Robinson; the RECOVER Consortium
PMCID: PMC9164456  PMID: 35665012

Abstract

Accurate stratification of patients with post-acute sequelae of SARS-CoV-2 infection (PASC, or long COVID) would allow precision clinical management strategies. However, the natural history of long COVID is incompletely understood and characterized by an extremely wide range of manifestations that are difficult to analyze computationally. In addition, the generalizability of machine learning classification of COVID-19 clinical outcomes has rarely been tested. We present a method for computationally modeling PASC phenotype data based on electronic healthcare records (EHRs) and for assessing pairwise phenotypic similarity between patients using semantic similarity. Our approach defines a nonlinear similarity function that maps from a feature space of phenotypic abnormalities to a matrix of pairwise patient similarity that can be clustered using unsupervised machine learning procedures. Using k-means clustering of this similarity matrix, we found six distinct clusters of PASC patients, each with distinct profiles of phenotypic abnormalities. There was a significant association of cluster membership with a range of pre-existing conditions and with measures of severity during acute COVID-19. Two of the clusters were associated with severe manifestations and displayed increased mortality. We assigned new patients from other healthcare centers to one of the six clusters on the basis of maximum semantic similarity to the original patients. We show that the identified clusters were generalizable across different hospital systems and that the increased mortality rate was consistently observed in two of the clusters. Semantic phenotypic clustering can provide a foundation for assigning patients to stratified subgroups for natural history or therapy studies on PASC.
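The pipeline the abstract describes (cluster a pairwise patient similarity matrix with k-means, then place new patients by their similarity to the original ones) can be illustrated with a minimal sketch. This is not the authors' implementation: the semantic similarity function itself is not reproduced, the similarity values are invented toy data, the function names `kmeans` and `assign_new_patient` are hypothetical, and assigning a new patient to the cluster with the highest mean similarity is only one assumed reading of "maximum semantic similarity to the original patients".

```python
import numpy as np


def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's k-means on the rows of X; returns one label per row."""
    rng = np.random.default_rng(seed)
    # Start from k distinct rows chosen at random.
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Squared Euclidean distance from every row to every center.
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        # Recompute each center; keep the old one if a cluster is empty.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    # Final assignment against the final centers.
    dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return dist.argmin(axis=1)


def assign_new_patient(sim_to_original, labels, k):
    """Assumed rule: pick the cluster whose members are, on average,
    most similar to the new patient."""
    mean_sim = [
        sim_to_original[labels == j].mean() if np.any(labels == j) else -np.inf
        for j in range(k)
    ]
    return int(np.argmax(mean_sim))


# Toy similarity matrix (invented values): patients 0-2 resemble each
# other, as do patients 3-5; cross-group similarity is low.
n, k = 6, 2
S = np.full((n, n), 0.1)
S[:3, :3] = 0.9
S[3:, 3:] = 0.9
np.fill_diagonal(S, 1.0)

# Each patient is represented by their row of similarities to all others.
labels = kmeans(S, k)

# A new patient from another site, described only by similarity to the
# six original patients (again invented values).
new_patient_sim = np.array([0.8, 0.85, 0.9, 0.1, 0.2, 0.1])
new_label = assign_new_patient(new_patient_sim, labels, k)
```

In this toy setup the new patient lands in the same cluster as patients 0-2. The design point is that new patients never need to be re-clustered: they only need a similarity score against the already clustered cohort, which is what makes it possible to test the clusters at other healthcare centers.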

Full Text

The Full Text of this preprint is available as a PDF (989.0 KB). The Web version will be available soon.


Articles from medRxiv are provided here courtesy of Cold Spring Harbor Laboratory Preprints
