PLOS ONE. 2021 Mar 31;16(3):e0248956. doi: 10.1371/journal.pone.0248956

Characterizing COVID-19 clinical phenotypes and associated comorbidities and complication profiles

Elizabeth R Lusczek 1,#, Nicholas E Ingraham 2,*,#, Basil S Karam 3, Jennifer Proper 4, Lianne Siegel 4, Erika S Helgeson 4, Sahar Lotfi-Emran 2, Emily J Zolfaghari 5, Emma Jones 1, Michael G Usher 6, Jeffrey G Chipman 1, R Adams Dudley 2,7, Bradley Benson 6, Genevieve B Melton 1,7, Anthony Charles 8,9, Monica I Lupei 10, Christopher J Tignanelli 1,7,11
Editor: Chiara Lazzeri 12
PMCID: PMC8011766  PMID: 33788884

Abstract

Purpose

Heterogeneity has been observed in outcomes of hospitalized patients with coronavirus disease 2019 (COVID-19). Identification of clinical phenotypes may facilitate tailored therapy and improve outcomes. The purpose of this study is to identify specific clinical phenotypes across COVID-19 patients and compare admission characteristics and outcomes.

Methods

This is a retrospective analysis of COVID-19 patients from March 7, 2020 to August 25, 2020 at 14 U.S. hospitals. Ensemble clustering was performed on 33 variables collected within 72 hours of admission. Principal component analysis was performed to visualize variable contributions to clustering. Multinomial regression models were fit to compare patient comorbidities across phenotypes. Multivariable models were fit to estimate associations between phenotype and in-hospital complications and clinical outcomes.

Results

The database included 1,022 hospitalized patients with COVID-19. Three clinical phenotypes were identified (I, II, III), with 236 [23.1%] patients in phenotype I, 613 [60%] patients in phenotype II, and 173 [16.9%] patients in phenotype III. Patients with respiratory comorbidities were most commonly phenotype III (p = 0.002), while patients with hematologic, renal, and cardiac (all p<0.001) comorbidities were most commonly phenotype I. Adjusted odds of respiratory, renal, hepatic, metabolic (all p<0.001), and hematological (p = 0.02) complications were highest for phenotype I. Phenotypes I and II were associated with 7.30-fold (HR:7.30, 95% CI:(3.11–17.17), p<0.001) and 2.57-fold (HR:2.57, 95% CI:(1.10–6.00), p = 0.03) increases in hazard of death relative to phenotype III.

Conclusion

We identified three clinical COVID-19 phenotypes, reflecting patient populations with different comorbidities, complications, and clinical outcomes. Future research is needed to determine the utility of these phenotypes in clinical practice and trial design.

Introduction

Coronavirus disease 2019 (COVID-19), a disease caused by the severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2), has infected over 18 million people and led to over 700,000 deaths since first appearing in late 2019 [1]. Researchers are rapidly attempting to understand the natural history of and immune response to COVID-19 [2]. Despite intense research since the arrival of this novel coronavirus [3], only one pharmaco-therapeutic agent, dexamethasone, has been associated with reduced mortality in at-risk individuals [4]. COVID-19 results in a constellation of symptoms, laboratory derangement, immune dysregulation, and clinical complications [5].

Emergency department presentation varies widely, suggesting distinct clinical phenotypes exist and, importantly, it is likely these distinct phenotypes respond differently to treatment. To illustrate, two early phenotypes of respiratory failure likely exist in COVID-19. A classic ARDS phenotype exists with poorly compliant lungs and poor gas exchange; however, a phenotype with normal lung compliance also exists in COVID-19 and is hypothesized to be driven by shunting secondary to pulmonary microthrombi [6, 7]. An intricate, multidimensional view is required to adequately understand the disease and account for the variation in clinical outcomes. Furthermore, patients could benefit from phenotype-specific medical care, which may differ from established standards of care.

Despite this need, few studies have characterized COVID-19 clinical phenotypes and evaluated their association with complications and clinical outcomes. The aim of this study was to characterize clinical phenotypes in COVID-19 according to disease-system factors using electronic health record (EHR) data pooled from 14 U.S. Midwest hospitals between March 7, 2020 and August 25, 2020.

Materials and methods

Data collection

The data source for this study included EHR reports from 14 U.S. Midwest hospitals and 60 primary care clinics across Minnesota. The healthcare system includes an academic quaternary center along with community hospitals, all capable of providing critical care. Patient and hospital-level data were available for 7,538 patients with PCR-confirmed COVID-19. Of these, 1,022 required hospital admission and were included in this analysis. The database included all comorbidities reported for each patient since March 29, 1997 and prior to their COVID-19 diagnosis. The database also included home medications, laboratory values, clinic visits, social history, and patient demographics (age, gender, race/ethnicity, language spoken, zip code, socioeconomic status indicators). Race and ethnicity were self-reported. For each COVID-19 hospitalization the database included all laboratory values, vitals, orders, medications, complications, length of stay, and hospital disposition. State death certificate data were linked with the database to enable capture of out-of-hospital death. Additionally, the database allowed linkage across the 14 hospitals, facilitating the tracking of transfers.

The study was approved by all hospitals within the MHealth Fairview system, which includes ethical approval by the University of Minnesota institutional review board. All patients have the option to opt out of research upon establishing care within the MHealth Fairview healthcare system. Data are aggregated through the University of Minnesota’s centralized informatics center and de-identified prior to analysis. Data were pooled across different electronic health records (EHRs) utilizing a unique patient identifier to account for health care encounters across systems. This study was approved by the University of Minnesota institutional review board (STUDY00001489), which provided a waiver of consent for this study.

Participants

Patient-level data were obtained from the COVID-19 database from March 7, 2020 to August 25, 2020. The inclusion criterion was as follows: PCR-positive COVID-19 test requiring inpatient hospital admission to one of the 14 hospitals providing data. No hospitalized patients were excluded in this analysis to maximize generalizability. Follow-up data were available for a minimum of two weeks following admission for all patients.

Clinical variables for phenotyping

We selected 33 variables for clustering based on their association with COVID-19 mortality, known COVID-19 pathophysiology, and presence in the database (no more than 50% missingness) [8–11]. The following variables were included: age, body mass index (BMI), heart rate, respiratory rate, oxygen saturation, pulse pressure, systolic blood pressure, total protein, red cell distribution width, mean corpuscular volume, alkaline phosphatase, calcium, anion gap, bicarbonate, hematocrit, aspartate aminotransferase, glucose, absolute monocyte count, absolute neutrophil count, absolute lymphocyte count, white blood cell count, platelet count, albumin, bilirubin, international normalized ratio (INR), lactate dehydrogenase, potassium, sodium, D-dimer, hemoglobin, C-reactive protein (CRP), creatinine, and gamma gap. For each variable we selected the first recorded value within the first 72 hours of the emergency department (ED) presentation that ultimately resulted in their hospitalization.
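The selection rule described above (first recorded value per variable within 72 hours of ED presentation) can be illustrated with a minimal Python sketch. This is not the study's extraction code, which is not given in the text; the function name, the record layout, and the example values are hypothetical, and the inclusive handling of the window boundaries is an assumption.

```python
from datetime import datetime, timedelta


def first_values_within_window(ed_arrival, observations, window_hours=72):
    """Return the earliest value per variable recorded within the window.

    observations: iterable of (variable_name, timestamp, value) tuples.
    Variables with no measurement inside the window are omitted, which
    corresponds to a missing value for that patient.
    """
    window_end = ed_arrival + timedelta(hours=window_hours)
    earliest = {}  # variable name -> (timestamp, value)
    for name, timestamp, value in observations:
        # Assumption: both window boundaries are treated as inclusive.
        if not (ed_arrival <= timestamp <= window_end):
            continue
        if name not in earliest or timestamp < earliest[name][0]:
            earliest[name] = (timestamp, value)
    return {name: value for name, (_, value) in earliest.items()}


# Hypothetical patient record; all values are invented for illustration.
arrival = datetime(2020, 4, 1, 8, 0)
records = [
    ("crp", datetime(2020, 4, 1, 9, 30), 120.0),
    ("crp", datetime(2020, 4, 2, 7, 0), 150.0),     # later repeat, ignored
    ("d_dimer", datetime(2020, 4, 5, 10, 0), 2.1),  # after 72 h, ignored
    ("ldh", datetime(2020, 4, 3, 12, 0), 410.0),
]
admission_values = first_values_within_window(arrival, records)
```

In this example D-dimer has no value inside the window, so it would be one of the missing entries later handled by imputation.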

Comorbidities

We selected 68 comorbidities documented for each patient from March 29, 1997 preceding their COVID-19 hospital admission in their electronic health record (S1 Table). All comorbidities were identified based on ICD-9, ICD-10, or problem list documentation within the electronic health record. An indicator variable was created for each comorbidity to denote the presence of the selected ICD-9, ICD-10, or problem list documentation at any time in the medical record. To facilitate analysis, comorbidities were grouped by organ system into the following categories: cardiac, respiratory, hematologic, metabolic, renal, hepatic, autoimmune, cancer, and cerebrovascular disease.
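As a rough illustration of the indicator-variable construction described above, the following Python sketch maps diagnosis codes to organ-system flags. The prefix map is a small hypothetical stand-in and is not the S1 Table; the function name and the example codes are likewise assumptions made for illustration only.

```python
# Illustrative ICD-10 prefix map; a hypothetical stand-in, NOT the S1 Table.
SYSTEM_BY_ICD10_PREFIX = {
    "I50": "cardiac",      # heart failure
    "J44": "respiratory",  # chronic obstructive pulmonary disease
    "N18": "renal",        # chronic kidney disease
    "E11": "metabolic",    # type 2 diabetes mellitus
}
SYSTEMS = sorted(set(SYSTEM_BY_ICD10_PREFIX.values()))


def comorbidity_indicators(icd10_codes):
    """Return a 0/1 flag per organ system for one patient's code history."""
    indicators = {system: 0 for system in SYSTEMS}
    for code in icd10_codes:
        normalized = code.strip().upper()
        for prefix, system in SYSTEM_BY_ICD10_PREFIX.items():
            if normalized.startswith(prefix):
                # A code at any time in the record sets the flag.
                indicators[system] = 1
    return indicators


# Hypothetical patient: heart failure, chronic kidney disease, routine exam.
example = comorbidity_indicators(["I50.9", "n18.3", "Z00.00"])
```

A code that matches no prefix (here the routine examination code) leaves every flag unchanged.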

Complications and clinical outcomes

We selected 30 in-hospital complications measured during each patient’s hospital stay for COVID-19 categorized into the following systems: cardiovascular, respiratory, hematologic, renal, hepatic, metabolic, and infectious (S2 Table). If applicable, complications could span multiple organ system variables. For example, ventilator associated pneumonia was included in both infectious and respiratory complications. Additional clinical outcomes included hospital length of stay (LOS), need for intensive care unit (ICU) admission, need for mechanical ventilation, and mortality. Mortality was defined as any in-hospital or out-of-hospital death based on death certificate data. All complications and outcomes were followed for a minimum of 2 weeks following hospital admission.

Statistical analysis

The overall rate of missingness of the 33 variables used for phenotyping, which included the first vitals and labs recorded for each inpatient within 72 hours of admission, was 19% (range 0% - 50%). We imputed missing values using multivariate imputations by chained equations implemented with the mice package (v.3.10.0) [12, 13]. Data were log-transformed before imputing missing values with predictive mean matching. A total of 40 imputed datasets were generated. The diceR package (v.1.0.0) [14] was used to perform k-means-based consensus clustering on each imputed dataset using 80% subsamples and 1,000 iterations. We considered grouping patients into 2–7 phenotypes and determined the optimal number was 3 by evaluating the consensus cumulative distribution function (CDF) plot, the delta area plot, and the consensus matrix heatmap. These figures were generated using the consensus clustering results for each imputed dataset, and all figures were qualitatively similar across datasets. For visualization purposes, these images are provided for a randomly selected imputed dataset in S1–S4 Figs. The final assignment of each patient into one of the three phenotypes was determined by majority voting across the 40 consensus clustering results. Principal component analysis (PCA) was performed on the average covariance matrix to visualize the relationships among the three phenotypes and assess variable contributions [15].
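Two steps in this pipeline, building a consensus matrix from subsampled clustering runs and taking a majority vote across imputed datasets, can be sketched in a few lines of Python. This is a simplified illustration and not the diceR implementation used in the study: the clustering runs are supplied as precomputed labels rather than produced by k-means, the function names are hypothetical, and the vote assumes cluster labels have already been aligned across imputed datasets.

```python
from collections import Counter
from itertools import combinations


def consensus_matrix(n_items, runs):
    """Fraction of runs in which each pair of items was co-clustered.

    runs: list of dicts mapping item index -> cluster label for one
    subsample. Only runs in which both items were sampled count toward
    the denominator for that pair.
    """
    together = [[0] * n_items for _ in range(n_items)]
    sampled = [[0] * n_items for _ in range(n_items)]
    for labels in runs:
        for i, j in combinations(sorted(labels), 2):
            sampled[i][j] += 1
            sampled[j][i] += 1
            if labels[i] == labels[j]:
                together[i][j] += 1
                together[j][i] += 1
    matrix = [[0.0] * n_items for _ in range(n_items)]
    for i in range(n_items):
        for j in range(n_items):
            if i == j:
                matrix[i][j] = 1.0
            elif sampled[i][j]:
                matrix[i][j] = together[i][j] / sampled[i][j]
    return matrix


def majority_vote(assignments):
    """Most frequent label per patient across imputed datasets.

    assignments: one list of labels per imputed dataset, aligned by patient.
    Assumption: labels mean the same phenotype in every dataset. Ties go to
    the label seen first.
    """
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*assignments)]


# Toy example: 4 patients, 3 subsampled runs with arbitrary label names.
runs = [
    {0: "a", 1: "a", 2: "b"},
    {0: "x", 1: "x", 3: "y"},
    {1: "p", 2: "q", 3: "q"},
]
matrix = consensus_matrix(4, runs)

# Toy example: 3 patients, 3 imputed datasets.
votes = [["I", "II", "III"], ["I", "II", "II"], ["I", "III", "III"]]
final = majority_vote(votes)
```

Consensus values near 0 or 1 for most pairs indicate stable clusters, which is the property the CDF and delta area plots summarize when choosing the number of phenotypes.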

Continuous variables were summarized using the median and interquartile range (IQR) and compared across phenotypes using a Kruskal-Wallis test. Categorical characteristics and outcomes were summarized using counts and proportions and compared across phenotypes using a Pearson’s chi-squared test or Fisher’s exact test. Multinomial regression models were fit to further compare patient comorbidities across phenotype classification.
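For readers who want to see how a Pearson's chi-squared comparison across phenotypes works, the following Python sketch computes the statistic for the mechanical ventilation counts reported in Table 1. It is a hand-rolled illustration rather than the R or Stata routine used in the study, and the function name is hypothetical. The threshold in the final comment is the standard critical value for 2 degrees of freedom.

```python
def pearson_chi_squared(table):
    """Pearson chi-squared statistic and degrees of freedom for a 2-D table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of phenotype and outcome.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# Mechanical ventilation counts from Table 1: (ventilated, not ventilated).
ventilation = [
    [98, 236 - 98],  # phenotype I
    [88, 613 - 88],  # phenotype II
    [4, 173 - 4],    # phenotype III
]
chi2, dof = pearson_chi_squared(ventilation)

# With 2 degrees of freedom, a statistic above about 13.82 corresponds
# to p < 0.001.
exceeds_threshold = chi2 > 13.82
```

The statistic is far above the threshold, which is consistent with the p < 0.001 reported for mechanical ventilation in Table 1.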

We next evaluated the relationship between phenotype and subsequent outcomes using both unadjusted and adjusted models. The adjusted models included sex [16, 17], race and ethnicity (white, Black, Asian, Hispanic, other, not reported) [18], and Elixhauser Comorbidity Index [19], since these are known risk factors for the outcomes of interest and were not included in the clustering analysis. The associations between phenotype and complications, ICU admission and need for mechanical ventilation, were estimated using logistic regression models. Mortality was compared across phenotypes using Cox proportional hazard models and patients were censored at the last date of data collection, August 25, 2020. Hospital length of stay was compared across phenotypes using negative binomial regression models. The primary negative binomial model included individuals who died during hospitalization for whom length of stay was defined as the number of days until death. We performed a sensitivity analysis to assess the impact of mortality as a competing risk by refitting the length of stay model after removing the 127 patients who died. Two-sided p-values < 0.05 were considered statistically significant. P-values were not adjusted for multiple comparisons. Visualizations of comorbidities, complications, and outcomes by clinical phenotype were performed using the circlize package for R [20]. Comorbidities and complications were grouped into separate organ systems and the prevalence of each complication/comorbidity type was calculated as a percentage for each phenotype. All analyses were conducted using R version 3.6.3 [21] and Stata version 16.1 (StataCorp).

Results

The database included 1,022 patients requiring hospital admission with COVID-19. Among these patients, the median age was 62.1 [IQR: 45.9, 75.8] years; 481 [48.6%] male, 412 [40.3%] required ICU admission. Additionally, 437 [46.7%] were white, 188 [20.1%] were Black, 159 [17.0%] were Asian, 103 [11.0%] were Hispanic, 20 [2.1%] reported other race, and 28 [2.9%] did not report. Three clinical phenotypes were identified (I, II, III); 236 [23.1%] patients had phenotype I, 613 [60%] patients had phenotype II, and 173 [16.9%] patients had phenotype III.

Variable contributions to clustering

The first two principal components (PCs) from PCA were used to visualize the relationship between phenotypes. PC1 and PC2 captured approximately 11% and 9% of the variance in the clustering variables, respectively. Thirteen components were needed to explain 70% of the variance (S5 Fig). While phenotypes II and III overlap substantially, phenotype I is more clearly defined in the right-hand side of the score plot of the first two principal components (Fig 1). Notably, this figure shows that distinctions between phenotypes are primarily driven by variation in PC1 as opposed to PC2. The variable contributions to PC1 (S6 Fig) demonstrate that the largest contributors to the variation in PC1 are lactate dehydrogenase (LDH), absolute neutrophil count, and D-dimer. These variables therefore prominently contribute to separating the three phenotypes as shown in the biplot (Fig 2). Univariate tests showed that LDH, D-dimer, and neutrophil count are highest in phenotype I. Other variables influential to phenotype clustering are white cell count (highest in I), C-reactive protein (highest in I), albumin (highest in III), aspartate aminotransferase (highest in I), bilirubin (highest in I), and oxygen saturation (highest in III).

Fig 1. Score plot: PC2 vs. PC1.

The principal component scores for PC1 and PC2 are plotted. Each point represents a patient in the dataset. Colors represent the cluster (phenotype) that the patient was assigned to by consensus clustering. Ellipses around each cluster/phenotype specify 95% confidence intervals, assuming a bivariate normal distribution. Abbreviations: PC1 (principal component 1); PC2 (principal component 2).

Fig 2. PCA biplot: PC2 vs. PC1.

The scores (points) and loadings (arrows) of PC1 and PC2 are plotted for each patient and variable in the model. 95% confidence ellipses for the scores are shown. The biplot facilitates interpretation of the scores and loadings, assigning context to the variables which prominently contribute to the phenotypes. Abbreviations: PC1 (principal component 1); PC2 (principal component 2); PCA (principal component analysis); Abs_Nphil_Ct (absolute neutrophil count); LDH (lactate dehydrogenase); CRP (C-reactive protein); WBC (white blood cell count); HCT (hematocrit); HGB (hemoglobin); Tbili (total bilirubin); RDW (red cell distribution width); AST (aspartate aminotransferase); Alk_phos (alkaline phosphatase); RR (respiratory rate); CA (calcium); TP (total protein); INR (international normalized ratio of prothrombin time); CO2 (carbon dioxide); K (potassium); O2SAT (oxygen saturation); BMI (body mass index); PLT (platelet); PP (pulse pressure); Na (sodium); SBP (systolic blood pressure); Abs_mono_ct (absolute monocyte count); MCV (mean corpuscular volume).

Phenotype characteristics

Differences across phenotypes with respect to patient demographics, admission vitals and labs, complications, comorbidities, and clinical outcomes are presented in Table 1. Patients with phenotype I were older than patients in phenotypes II and III (67.2 [52.9, 79.0] years vs. 60.9 [45.9, 75.4] and 58.6 [34.8, 71.3] years respectively, p < 0.001). Patients with phenotype III were more often female than patients with phenotype I or II (57.6% vs. 41.6% and 53.4%, respectively, p = 0.002). Patients with phenotype I were less likely to be white (38.8% vs. 45.6% vs. 60.7%, respectively, p = 0.002) and more likely to be non-English speaking (47.9% vs. 39.2% vs. 23.7%, respectively, p <0.001). There were no statistically significant differences in BMI or socioeconomic status, as measured using the area deprivation index, between phenotypes (Table 1). Patients that presented with phenotype III had a more frequent history of smoking, alcohol abuse, and neutropenia. Patients that presented with phenotype II had a less frequent history of hepatic disease than phenotypes I or III (Table 1).

Table 1. Baseline demographics, comorbidities, and clinical outcomes of hospitalized COVID-19 patients with clinical phenotypes I, II, and III.

Phenotype I Phenotype II Phenotype III P-value
N = 236 N = 613 N = 173
Demographics
Age (years) 67.2 (52.9–79.0) 60.9 (45.9–75.4) 58.6 (34.8–71.3) <0.001
Male 132 (58.4%) 277 (46.6%) 72 (42.4%) 0.002
Race / Ethnicity 0.002
    White 81 (38.8%) 257 (45.6%) 99 (60.7%)
    Black 53 (25.4%) 105 (18.7%) 30 (18.4%)
    Asian 39 (18.7%) 101 (17.9%) 19 (11.7%)
    Hispanic 26 (12.4%) 66 (11.7%) 11 (6.7%)
    Declined 3 (1.4%) 22 (3.9%) 3 (1.8%)
    Other 7 (3.3%) 12 (2.1%) 1 (0.6%)
Non-English Speaking 113 (47.9%) 240 (39.2%) 41 (23.7%) <0.001
National ADI 44.5 (25.0–56.0) 43.0 (25.0–56.0) 37.0 (26.0–62.0) 0.76
BMI (kg/m2), mean (SD) 29.5 (8.9) 30.8 (8.2) 30.4 (13.4) 0.21
Smoker 9 (3.8%) 44 (7.2%) 18 (10.4%) 0.03
Alcohol abuse 14 (5.9%) 47 (7.7%) 28 (16.2%) <0.001
Comorbidities
Elixhauser Comorbidity Index 7.0 (4.0–10.0) 5.0 (3.0–9.0) 5.0 (2.0–8.0) <0.001
Cardiac 194 (82.2%) 428 (69.8%) 110 (63.6%) <0.001
Respiratory 55 (23.3%) 198 (32.3%) 68 (39.3%) 0.002
Hematologic 127 (53.8%) 220 (35.9%) 53 (30.6%) <0.001
Metabolic 175 (74.2%) 477 (77.8%) 121 (69.9%) 0.08
Renal 92 (39.0%) 170 (27.7%) 37 (21.4%) <0.001
Hepatic 46 (19.5%) 82 (13.4%) 25 (14.5%) 0.08
Autoimmune 40 (16.9%) 126 (20.6%) 23 (13.3%) 0.07
Cancer 29 (12.3%) 73 (11.9%) 16 (9.2%) 0.58
Cerebrovascular disease 52 (22.0%) 106 (17.3%) 33 (19.1%) 0.28
Blood Type O 72 (42.4%) 158 (39.0%) 39 (37.5%) 0.67
In-hospital Complications
Cardiovascular 16 (6.8%) 46 (7.5%) 13 (7.5%) 0.93
Respiratory 49 (20.8%) 104 (17.0%) 14 (8.1%) 0.002
Hematologic 27 (11.4%) 35 (5.7%) 10 (5.8%) 0.01
Renal 54 (22.9%) 60 (9.8%) 7 (4.0%) <0.001
Metabolic 85 (36.0%) 141 (23.0%) 18 (10.4%) <0.001
Hepatic 21 (8.9%) 4 (0.7%) 2 (1.2%) <0.001
Infectious 76 (32.2%) 134 (21.9%) 27 (15.6%) <0.001
Clinical Outcomes
ICU Admission 158 (66.9%) 220 (35.9%) 34 (19.7%) <0.001
Mechanical Ventilation 98 (41.5%) 88 (14.4%) 4 (2.3%) <0.001
Hospital Readmission 6 (2.5%) 29 (4.7%) 14 (8.1%) 0.03
ECMO 7 (3.0%) 1 (0.2%) 0 (0.0%) <0.001
In- or Out of hospital mortality 63 (26.7%) 57 (9.3%) 7 (4.0%) <0.001
Admission Vitals and Labs Phenotype I Phenotype II Phenotype III P value
Heart rate (mean (SD)) 96.17 (20.82) 93.93 (19.35) 90.16 (22.3) 0.01
Respiratory rate 22.0 (18.0–28.0) 20.0 (18.0–23.0) 18.0 (16.0–20.0) <0.001
Oxygen saturation 94.0 (89.0–97.0) 95.0 (92.0–97.0) 97.0 (95.0–99.0) <0.001
Pulse pressure 55.0 (43.5–70.5) 53.0 (43.0–68.0) 51.0 (40.0–62.0) 0.02
SBP (mean (SD)) 133.29 (27.14) 132.46 (23.54) 134.10 (26.26) 0.72
Total protein 6.5 (5.9–7.0) 6.7 (6.20–7.2) 6.6 (6.2–7.1) 0.01
Red cell distribution width 14.1 (13.2–15.4) 13.5 (12.9–14.7) 13.5 (12.8–14.6) <0.001
Mean corpuscular volume 90.0 (86.0–94.0) 89.0 (85.0–93.0) 92.0 (88.0–95.3) <0.001
Alkaline phosphatase 88.0 (67.5–129.0) 71.0 (55.5–92.0) 72.0 (58.-88.0) <0.001
Calcium 8.10 (7.6–8.5) 8.30 (8.0–8.7) 8.40 (8.1–8.9) <0.001
Anion gap 9.0 (7.0–12.0) 8.0 (6.0–10.0) 7.0 (6.0–9.0) <0.001
CO2 23.25 (21.0–26.0) 24.0 (22.0–27.0) 25.0 (23.0–27.8) <0.001
Hematocrit 36.40 (32.3–40.2) 37.60 (33.6–41.1) 38.45 (35.7–41.5) <0.001
Aspartate aminotransferase 55.0 (38.0–95.0) 35.0 (24.0–53.0) 29.0 (20.0–44.0) <0.001
Glucose 122.0 (101.0–165.0) 112.0 (96.0–149.5) 104.0 (91.0–126.5) <0.001
Absolute monocyte count 0.40 (0.3–0.8) 0.40 (0.3–0.6) 0.50 (0.3–0.7) <0.001
Platelets 206.0 (160.0–290.0) 190.0 (149.0–243.0) 196.0 (142.5–247.5) 0.01
Albumin 2.40 (2.0–2.7) 2.80 (2.5–3.1) 3.10 (2.8–3.4) <0.001
Bilirubin 0.70 (0.4–1.1) 0.40 (0.3–0.6) 0.40 (0.3–0.6) <0.001
INR 1.11 (1.03–1.28) 1.06 (0.99–1.17) 1.08 (0.98–1.21) 0.001
Lactate dehydrogenase 460.5 (380.0–562.8) 308.0 (249.0–394.0) 231.0 (180.0–293.5) <0.001
Potassium 4.0 (3.6–4.3) 3.80 (3.6–4.2) 3.80 (3.6–4.2) 0.101
Sodium 137.5 (134.0–141.0) 137.0 (135.0–139.0) 138.0 (136.0–140.0) 0.003
D-dimer 3.08 (1.71–5.57) 0.87 (0.59–1.27) 0.60 (0.36–1.05) <0.001
Hemoglobin 11.90 (10.5–13.1) 12.20 (10.7–13.5) 12.40 (11.3–13.7) 0.01
C-reactive protein 157.0 (102.0–244.0) 89.0 (55.0–134.8) 12.0 (5.0–20.0) <0.001
Creatinine 1.06 (0.77–1.62) 0.84 (0.69–1.13) 0.80 (0.68–1.03) <0.001
Absolute neutrophil count 8.05 (5.75–11.42) 4.20 (3.0–6.0) 2.90 (1.8–4.3) <0.001
Absolute lymphocyte count 0.90 (0.6–1.3) 0.90 (0.7–1.3) 1.30 (0.9–1.7) <0.001
WBC 8.74 (5.68–15.42) 4.50 (3.0–6.71) 2.36 (1.31–3.77) <0.001
Gamma Gap 9.80 (7.2–13.2) 5.90 (4.3–7.6) 4.90 (3.9–7.3) <0.001

Table 1 presents summary statistics of patient demographics, comorbidities, in-hospital complications, clinical outcomes, and admission vitals and labs for each clinical phenotype (I, II, III). Admissions vitals and labs were used to create the phenotypes. Categorical variables are presented as count (%). Continuous variables are presented as median (interquartile range) unless otherwise specified.

Abbreviations: ADI, area deprivation index; BMI, body mass index; INR, international normalized ratio of prothrombin time; ECMO, extracorporeal membrane oxygenation; ICU, intensive care unit
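One derived quantity that can be read off Table 1 is the ratio of absolute neutrophil count to absolute lymphocyte count. The Python sketch below divides the phenotype-level medians reported in the table. This is only an approximation: a ratio of medians is not the median of patient-level ratios, which would require the underlying data. The dictionary and function names are hypothetical.

```python
# Median admission values copied from Table 1.
TABLE1_MEDIANS = {
    "I": {"anc": 8.05, "alc": 0.90},
    "II": {"anc": 4.20, "alc": 0.90},
    "III": {"anc": 2.90, "alc": 1.30},
}


def ratio_of_medians(medians):
    """Neutrophil-to-lymphocyte ratio computed from group medians."""
    return {
        phenotype: values["anc"] / values["alc"]
        for phenotype, values in medians.items()
    }


ratios = ratio_of_medians(TABLE1_MEDIANS)
# Phenotypes ordered from highest to lowest ratio.
ranked = sorted(ratios, key=ratios.get, reverse=True)
```

Under this approximation the ratio is highest for phenotype I and lowest for phenotype III.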

When grouping comorbidities by organ system, cardiac (p <0.001), respiratory (p = 0.002), hematologic (p <0.001), and renal (p <0.001) comorbidities were found to be significantly associated with phenotype. Cancer, hepatic, autoimmune, cerebrovascular, and metabolic comorbidities were not significantly associated with phenotype (Table 1, S7 Fig). Based on the estimated relative risk ratios, patients with renal (RRR 2.35; 95% CI: 1.5–3.67; p <0.001), hematologic (RRR 2.64; 95% CI: 1.75–3.98; p <0.001), and cardiac comorbidities (RRR 2.65; 95% CI: 1.68–4.17; p <0.001) were more likely to have phenotype I vs. III (Fig 3). Patients with respiratory comorbidities were 0.47 (95% CI: 0.31–0.72; p <0.001) times as likely to have phenotype I vs. III and 0.74 (95% CI: 0.52–1.04; p = 0.09) times as likely to have phenotype II vs. III (Fig 3).

Fig 3. Relative risk ratio of comorbidities to clinical phenotypes.

Relative Risk ratios of comorbidities of phenotypes I and II compared to the reference group phenotype III.

Association between phenotype and clinical outcomes

Clinical phenotypes I and II were associated with increased odds of respiratory (I: OR: 2.98, 95% CI 1.58–5.59; II: OR: 2.32, 95% CI: 1.29–4.17; p<0.001), renal (I: OR: 7.04, 95% CI 3.11–15.9; II: OR: 2.57, 95% CI: 1.15–5.74; p <0.001), and metabolic (I: OR: 4.85, 95% CI: 2.78–8.45; II: OR: 2.57, 95% CI: 1.52–4.34; p <0.001) complications compared to phenotype III after adjusting for sex, race, and Elixhauser Comorbidity Index (S3 Table). There was a trend towards increased odds of hematologic complications among patients with phenotype I (OR: 2.11, 95% CI: 0.99–4.48, p = 0.05) compared to III. Phenotype was associated with hepatic complications (p <0.001); however, while phenotype I was associated with an 8.35-fold (OR: 8.35, 95% CI: 1.93–36.11, p < 0.001) increase in the odds of hepatic complication, phenotype II did not differ significantly from phenotype III (OR: 0.56, 95% CI: 0.10–3.09, p = 0.51). This is not surprising since only 4 individuals in phenotype II and 2 in phenotype III experienced hepatic complications during hospitalization (Table 1). Phenotype was also significantly associated with the rate of infectious complications (p <0.001); the association was significant for phenotype I (OR 2.57, 95% CI 1.57–4.21; p <0.001) but did not reach statistical significance for phenotype II (OR 1.51, 95% CI 0.96–2.38; p = 0.07) (S3 Table and S8 Fig).

Clinical phenotypes differed in odds of ICU admission (p <0.001) and mechanical ventilation (p <0.001), hospital LOS (p <0.001), and risk of mortality (p <0.001) on adjusted analysis which accounted for sex, race, and Elixhauser Comorbidity Index (Table 2, S9 Fig). Controlling for these risk factors and compared to phenotype III, phenotypes I and II were associated with 7.88-fold (OR: 7.88, 95% CI: 4.65–13.37) and 2.32-fold (OR: 2.32, 95% CI: 1.46–3.68) increases in the odds of ICU admission, respectively. Phenotypes I and II were associated with 25.59-fold (OR: 25.59, 95% CI: 7.69–85.17) and 7.45-fold (OR: 7.45, 95% CI: 2.27–24.43) increases in the odds of requiring mechanical ventilation. Phenotypes I and II were associated with 1.74-fold (IRR: 1.74, 95% CI: 1.45–2.10, p<0.001) and 1.22-fold (IRR: 1.22, 95% CI: 1.05–1.43, p = 0.01) increases in hospital LOS. Phenotype I was associated with a 7.30-fold (HR: 7.30, 95% CI: 3.11–17.17, p <0.001) increase in the hazard of death, and phenotype II with a 2.57-fold (HR: 2.57, 95% CI: 1.10–6.00, p = 0.03) increase, compared to phenotype III. We performed a sensitivity analysis to assess the impact of mortality as a competing risk by fitting the LOS model before and after removing the 127 patients who died. The estimated effect sizes were similar between these two models (S4 Table). Table 2 includes the LOS model with only survivors. The home medications and Day 5 labs of the three identified phenotypes are shown in S5 Table.

Table 2. Association of clinical phenotype with clinical outcome.

In- and Out- of Hospital Mortality (Cox PH) HR 95% CI P value
Mortality <0.001 (LR test)
    Phenotype I 7.30 3.11–17.17 <0.001
    Phenotype II 2.57 1.10–6.00 0.03
Binary Outcomes (Logistic Regression) OR 95% CI P value
ICU Admission <0.001 (LR test)
    Phenotype I 7.88 4.65–13.37 <0.001
    Phenotype II 2.32 1.46–3.68 <0.001
Mechanical Ventilation <0.001 (LR test)
    Phenotype I 25.59 7.69–85.17 <0.001
    Phenotype II 7.45 2.27–24.43 <0.001
Count Outcome (Negative Binomial Regression) IRR 95% CI P value
Hospital LOS* <0.001 (LR test)
    Phenotype I 1.74 1.45–2.10 <0.001
    Phenotype II 1.22 1.05–1.43 0.01

Abbreviations: PH, proportional hazards; HR, hazard ratio; CI, confidence interval; OR, odds ratio; ICU, intensive care unit; IRR, incidence rate ratio; LOS, length of stay; LR, likelihood ratio.

Reference group for all models is Phenotype III. All models adjusted for sex, race/ethnicity, and Elixhauser Comorbidity Index.

* LOS model only included patients that survived.
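To give a sense of how the adjusted odds ratios in Table 2 relate to the raw counts, the following Python sketch computes a crude (unadjusted) odds ratio with a Wald 95% confidence interval for ICU admission, phenotype I versus phenotype III, from the counts in Table 1. It is not the adjusted logistic regression model reported above, so the result is expected to differ from the adjusted estimate. The function name is hypothetical.

```python
import math


def odds_ratio_wald(events_a, n_a, events_b, n_b, z=1.96):
    """Crude odds ratio (group A vs. group B) with a Wald confidence interval."""
    a, b = events_a, n_a - events_a  # group A: events, non-events
    c, d = events_b, n_b - events_b  # group B: events, non-events
    odds_ratio = (a / b) / (c / d)
    # Standard error of the log odds ratio.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    log_or = math.log(odds_ratio)
    return odds_ratio, math.exp(log_or - z * se), math.exp(log_or + z * se)


# ICU admission counts from Table 1: 158 of 236 (phenotype I) vs.
# 34 of 173 (phenotype III).
crude_or, lower, upper = odds_ratio_wald(158, 236, 34, 173)
```

The crude estimate is about 8.3, close to the adjusted odds ratio of 7.88 in Table 2, which suggests that sex, race/ethnicity, and Elixhauser Comorbidity Index account for only a small part of the difference in ICU admission between these two phenotypes.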

Discussion

This is one of the first studies to report on clinical phenotypes associated with COVID-19. We identified three clinical phenotypes for patients with COVID-19 on hospital presentation. Most patients presented with phenotype II, which is associated with a moderate course and an approximately 10% mortality. A subset of patients presented with the more severe phenotype I, which is associated with a staggering 27% mortality. Patients with cardiac, hematologic, and renal comorbidities were most likely to be characterized by phenotype I. Surprisingly, respiratory comorbidities appeared less related to phenotypes I or II and were most associated with phenotype III, which had the most indolent course. Despite this indolent course, patients with phenotype III had the highest rate of readmission, which is likely in part due to the high survival rate. This also suggests patients with pre-existing respiratory comorbidities, while not at highest risk for mortality, may be at highest risk for long-term sequelae following COVID-19. Patients that presented with phenotype I were most associated with the development of respiratory, hematologic, renal, metabolic, hepatic, and infectious complications. Surprisingly, cardiovascular complications did not significantly differ between phenotypes.

Elucidating patient risk factors and severe COVID-19 disease markers may allow early treatment implementation that may improve the patient’s outcome. Multiple studies have documented COVID-19 risk factors; however, most have done so from a homogeneous lens. For example, a prospective cohort study from New York City identified that the most considerable risks for hospital admission were age, male sex, heart failure, chronic kidney disease, and high BMI [22]. A large observational study conducted in the UK reported that increasing age, male gender, and comorbidities such as cardiac disease, chronic lung disease, chronic kidney disease, and obesity were associated with higher mortality in COVID-19 positive patients admitted to the hospital.14 A study from China found that increased odds of in-hospital death due to COVID-19 were associated with older age, higher sequential organ failure assessment (SOFA) score, and D-dimers > 1.0 μg/mL on admission [23]. Another retrospective study reported that patients with severe COVID-19 disease and diabetes had increased leukocyte and neutrophil counts and increased C-reactive protein (CRP), D-dimer, and fibrinogen levels [24]. A systematic review and meta-analysis found that the biomarkers associated with increased mortality include higher CRP, higher D-dimers, increased creatinine, and lower albumin levels [25]. However, it is well known that patients do not have a singular natural history of disease. Multiple studies, including this study, found that only half of patients suffer a primarily respiratory disease [26, 27]. Patients suffer a constellation of cardiovascular, hematologic, renal, or hepatic progression of disease following COVID-19. It is likely that patient baseline risk factors related to the virus [28], home medications [16, 29], genetic predisposition [30], race/ethnicity [18], and other factors predispose patients to one of the various clinical manifestations and natural histories of COVID-19.

Treatment of hospitalized patients should be tailored based on the clinical courses most likely for a patient given their a priori risk. For example, phenotypes with a higher risk of thrombotic events may benefit from more aggressive anticoagulation. Phenotypes more prone to infectious complications may benefit from more targeted immunomodulation instead of broad and systemic steroid therapy. A key first step to evaluate these treatment decisions is to characterize and describe clinical phenotypes requiring hospitalization. In this analysis we identified three clinical phenotypes for patients that required hospitalization for COVID-19. Few studies to date have attempted to elucidate clinical phenotypes. One study attempted to characterize clinical phenotypes at ICU admission using a dataset of 85 critically ill patients [31]. Similar to our analysis, they identified three distinct clinical phenotypes. Their low mortality cluster, which they called cluster 1, was very similar to our phenotype III, with a predominance of females, a lower mortality rate, and lower D-dimer and CRP levels. Similarly, their high mortality cluster was predominantly male, with elevated inflammation markers on ICU presentation. In this study, we not only characterized three clinical phenotypes, but extended findings outside of the ICU by characterizing the association of comorbidities with clinical phenotype and the association of clinical phenotypes with in-hospital complications and clinical outcomes.

Phenotype I can be termed the “Adverse phenotype” and was associated with the worst clinical outcomes. Lactate dehydrogenase (LDH), absolute neutrophil count, D-dimer, aspartate aminotransferase (AST), and C-reactive protein (CRP) were most influential in phenotype I determination. The strong association of red cell distribution width (RDW) with phenotype I was interesting. RDW was strongly associated with genetic age, which is hypothesized to be a risk factor in COVID-19 [30]. As people age, variability in red blood cell volumes increases. Similarly, Gamma Gap, a marker of immunoglobulin levels, was elevated in all three phenotypes (median > 3.5) [32]. However, patients with clinical phenotype I were noted to have the largest increase in Gamma Gap. In this scenario, elevated Gamma Gap was likely an indicator of systemic inflammation; it has been associated with prognosis in other inflammatory disease processes. Other groups have previously reported on the importance of the ratio of absolute neutrophil count to absolute lymphocyte count (ANC/ALC); here we noted that ANC/ALC was lowest for phenotype III and highest for phenotype I, in line with previous reports. Patients with cardiac, hematologic, and renal comorbidities were most prone to develop phenotype I. Phenotype I was associated with numerous complications (hematologic, hepatic, metabolic, renal, respiratory, and infectious) when compared to other phenotypes. It is interesting to note that, despite a higher rate of baseline cardiac comorbidities, phenotype I was not associated with increased cardiac complications. Beyond the pathophysiologic differences, it is important to note the higher proportion of non-White and non-English speaking patients in phenotype I. Moreover, socioeconomic status, which has been proposed to be a driver of disparate outcomes in healthcare, was similar across all phenotypes. These findings are consistent with a recent study conducted across this population of patients, which found COVID-19 severity to be associated with minority populations and non-English speaking patients, independent of socioeconomic status. Given that race/ethnicity and primary language spoken are social constructs and traits, respectively, which are not biologically grounded, these results require further investigation, through mediation analysis of external factors, into why these populations are at higher risk of developing phenotype I (as opposed to these populations being an isolated cause of developing an unfavorable phenotype).
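The two derived laboratory markers discussed for phenotype I, Gamma Gap and ANC/ALC, are simple arithmetic on routine admission labs. The following is a minimal sketch, not the study's code: it assumes the conventional definition of gamma gap as serum total protein minus albumin (both in g/dL), reuses the 3.5 cut-off quoted above as the "elevated" threshold, and uses made-up lab values and hypothetical function names.

```python
# Illustrative sketch only; function names and lab values are hypothetical.

ELEVATED_GAMMA_GAP = 3.5  # cut-off quoted in the text; units assumed to be g/dL


def gamma_gap(total_protein_g_dl, albumin_g_dl):
    """Gamma gap: serum total protein minus albumin (both in g/dL)."""
    return total_protein_g_dl - albumin_g_dl


def anc_alc_ratio(abs_neutrophil_count, abs_lymphocyte_count):
    """Ratio of absolute neutrophil count to absolute lymphocyte count."""
    if abs_lymphocyte_count <= 0:
        raise ValueError("absolute lymphocyte count must be positive")
    return abs_neutrophil_count / abs_lymphocyte_count


# Made-up admission labs for a single patient (not study data).
labs = {"total_protein": 7.4, "albumin": 3.2, "anc": 6.0, "alc": 0.8}

gap = gamma_gap(labs["total_protein"], labs["albumin"])
ratio = anc_alc_ratio(labs["anc"], labs["alc"])
gap_is_elevated = gap > ELEVATED_GAMMA_GAP

print(f"gamma gap = {gap:.1f} g/dL (elevated: {gap_is_elevated})")
print(f"ANC/ALC   = {ratio:.1f}")
```

Both inputs to each marker must come from the same blood draw for the derived value to be meaningful.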

Phenotype III was associated with the best clinical outcomes and can be termed the “Favorable Phenotype”. Surprisingly, patients with phenotype III had a very high rate of respiratory comorbidities and the best clinical outcomes. What is most surprising is that, despite the lowest complication rate and mortality, this phenotype was associated with a greater than 10% rate of hospital readmission. Long-term sequelae in the critically ill remain an important target for patient-centered improvements in care given the increasing loss of functional status among ICU patients predating the pandemic. It is possible that patients’ pre-existing respiratory comorbidities predisposed them to longer-term sequelae, which may have resulted in this readmission rate, although additional studies are needed to better elucidate these findings, specifically controlling for differences in survival. Patients with respiratory comorbidities such as asthma and COPD routinely use medications that may be protective in SARS-CoV-2 pathogenesis, which may explain this protective effect. For example, our group has previously identified reduced mortality in COVID-19 for patients with asthma treated with beta2-agonists [16]. Patients with phenotype III were more likely to use inhaled steroids, nasal fluticasone, albuterol, and antihistamines.

Clinical phenotypes are critical during a pandemic when time and resources are scarce. Phenotypes not only enable the identification of risk factors; they also provide essential insight into high-yield follow-up investigations. For example, while the respiratory associations with phenotype III (favorable phenotype) are interesting, the more beneficial takeaway is the need for further investigation into how these underlying conditions and/or their medications may mitigate illness severity. Lastly, by phenotyping patients affected by COVID-19, we set the foundation for assessing whether these phenotypes are unique to SARS-CoV-2 or whether similarities exist elsewhere.

As the attention paid to personalized medicine accelerates, these studies are just the beginning. Future work will expand upon these phenotypes with the hope that they can assist in 1) identifying those at risk of poor outcomes, 2) precisely treating each phenotype (which may not be uniform across all phenotypes), and 3) preventing further complications in those phenotypes at higher risk. In addition, a deeper investigation into clinical phenotypes and associated genomic, transcriptomic, and proteomic data is needed. The ability to classify patients into clinical phenotypes can facilitate the linkage of -omics data to better understand SARS-CoV-2 pathogenesis and natural history. Work is already being done to identify genetic host factors that may play a role in determining not only susceptibility to the virus, but also the clinical trajectory when infection does occur. Understanding COVID-19 severity, its biomarkers, and risk factors is paramount during the COVID-19 pandemic.

Our study has several limitations. First, this is a retrospective study, and therefore results may be biased or subject to residual confounding. Second, patients were followed for variable lengths of time. Patients who were admitted in March 2020 had approximately 5 months of follow-up, whereas patients admitted in late August had limited follow-up time. We accounted for this by conducting a Cox proportional hazards analysis when analyzing in- and out-of-hospital mortality. Additionally, when the data were pulled, only 54 patients (5%) remained hospitalized. While most patients developed complications within their first 2 weeks of hospital admission, it is possible that these patients may still develop clinical complications that are not reflected in this analysis. Furthermore, our analysis was completed on hospitalized patients. It is important to recognize that our results are restricted to those who required hospitalization. Our data cannot be extrapolated to those with mild COVID-19 (i.e. not requiring hospitalization).
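To illustrate why a time-to-event analysis copes with variable follow-up, the sketch below computes a Kaplan-Meier (product-limit) survival estimate on a made-up cohort. This is deliberately not the Cox proportional hazards model used in the study; it is a simpler estimator swapped in to show the shared idea of censoring: a patient who is alive at the end of a short follow-up still counts in the at-risk denominator for every day they were observed. All names and data are hypothetical.

```python
# Illustrative sketch only: Kaplan-Meier, not the study's Cox model.
# Each record is (days_followed, died); died=False means censored,
# i.e. the patient was alive when follow-up ended.


def kaplan_meier(records):
    """Return [(time, survival_probability)] at each distinct death time."""
    death_times = sorted({days for days, died in records if died})
    survival = 1.0
    curve = []
    for t in death_times:
        # Everyone still under observation at time t is at risk,
        # including patients who are censored later.
        at_risk = sum(1 for days, _ in records if days >= t)
        deaths = sum(1 for days, died in records if days == t and died)
        survival *= 1 - deaths / at_risk
        curve.append((t, survival))
    return curve


# Made-up cohort: a long-followed early admission and several short-followed
# late admissions (not study data).
cohort = [(10, True), (150, False), (20, True), (5, False), (7, False), (30, False)]

km_curve = kaplan_meier(cohort)
for time, probability in km_curve:
    print(f"day {time}: estimated survival {probability:.2f}")
```

The patients followed for only 5 and 7 days leave the at-risk set before the first death, so they neither inflate nor deflate the later estimates, which is the property that makes unequal follow-up tolerable.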

Conclusion

In this retrospective analysis of patients with COVID-19, three clinical phenotypes were identified reflecting adverse, moderate, and favorable outcomes. Patients from each phenotype presented with different comorbidities and developed different complications. Our results suggest that phenotype-specific medical care of COVID-19 may improve outcomes. Future research is urgently needed to determine the utility of these phenotypes in clinical practice and trial design.

Supporting information

S1 Fig. Consensus cumulative distribution functions.

Cumulative distribution functions (CDF) for a randomly selected imputed dataset are shown. A range of phenotypes (2–7) was considered, and the optimal number of phenotypes is 3.

(TIF)

S2 Fig. Delta area.

The relative change in area under the cumulative distribution function (delta area) is shown for the range of phenotypes (k = 2–7) for a randomly selected imputed dataset. The optimal number of phenotypes is 3. Abbreviations: CDF (cumulative distribution function).

(TIF)

S3 Fig. Consensus matrix with 3 clusters.

A consensus matrix heatmap is shown for a randomly selected imputed dataset clustered into 3 phenotypes. The heatmap allows visualization of consensus cluster assignments to evaluate cluster stability. Darker shades of green indicate higher stability.

(TIF)

S4 Fig. Consensus matrix with 4 clusters.

A consensus matrix heatmap is shown for a randomly selected imputed dataset clustered into 4 phenotypes. The heatmap allows visualization of consensus cluster assignments to evaluate cluster stability. Darker shades of green indicate higher stability. The choice of 4 clusters shows less stability than 3 clusters (see S3 Fig).

(TIF)

S5 Fig. Cumulative proportion of variance explained.

The proportion of variance explained by each principal component is summed over all principal components. For example, PC1 and PC2 cumulatively explain 20% of the variation in the dataset. Abbreviations: PC1 (principal component 1); PC2 (principal component 2).

(TIF)
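The quantity plotted in S5 Fig can be sketched as a running sum of per-component variances divided by the total variance. The eigenvalues below are made up for illustration and are not the study's values; the function name is hypothetical.

```python
# Illustrative sketch only; eigenvalues are invented, not study results.
from itertools import accumulate


def cumulative_variance_explained(eigenvalues):
    """Cumulative proportion of variance for components sorted by eigenvalue."""
    total = sum(eigenvalues)
    return [running / total for running in accumulate(eigenvalues)]


# Hypothetical PCA eigenvalues (variance captured by PC1, PC2, ...).
eigenvalues = [4.0, 2.0, 1.0, 1.0, 1.0, 1.0]

cumulative = cumulative_variance_explained(eigenvalues)
for index, proportion in enumerate(cumulative, start=1):
    print(f"PC1..PC{index}: {proportion:.0%}")
```

The last entry is always 1 because all components together account for the full variance.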

S6 Fig. Contribution of variables to PC1.

The contributions of each of the 33 variables used in the clustering to principal component 1 are shown. The red line marks the expected average contribution of each variable if the contributions of the variables were uniform across the dataset. Variables contributing most to the observed pattern in PC1 are D-dimer and albumin. Abbreviations: PC1 (principal component 1); Abs_Nphil_Ct (absolute neutrophil count); LDH (lactate dehydrogenase); CRP (C-reactive protein); WBC (white blood cell count); HCT (hematocrit); HGB (hemoglobin); Tbili (total bilirubin); RDW (red cell distribution width); AST (aspartate aminotransferase); Alk_phos (alkaline phosphatase); RR (respiratory rate); CA (calcium); TP (total protein); INR (international normalized ratio of prothrombin time); CO2 (carbon dioxide); K (potassium); O2SAT (oxygen saturation); BMI (body mass index); PLT (platelet); PP (pulse pressure); Na (sodium); SBP (systolic blood pressure); Abs_mono_ct (absolute monocyte count); MCV (mean corpuscular volume).

(TIF)
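The reference line described for S6 Fig can be sketched as follows. The sketch assumes the common convention that a variable's contribution to a principal component is its squared loading expressed as a percentage of the component's total squared loadings, so that a uniform split across p variables gives 100/p percent each (about 3.03% for 33 variables). The loadings and function names are hypothetical, and only four variables are used to keep the example short.

```python
# Illustrative sketch only; loadings are invented, not study results.


def pc_contributions(loadings):
    """Percent contribution of each variable to one principal component."""
    total = sum(weight * weight for weight in loadings.values())
    return {name: 100 * weight * weight / total for name, weight in loadings.items()}


def expected_uniform_contribution(n_variables):
    """Contribution (%) each variable would have if all contributed equally."""
    return 100 / n_variables


# Hypothetical PC1 loadings for four variables.
loadings = {"D-dimer": 0.6, "albumin": -0.6, "BMI": 0.2, "Na": 0.1}

contributions = pc_contributions(loadings)
threshold = expected_uniform_contribution(len(loadings))
above_line = [name for name, pct in contributions.items() if pct > threshold]

print(f"reference line: {threshold:.1f}%")
print("above the line:", above_line)
```

The sign of a loading does not affect its contribution, which is why a strongly negative loading can rank alongside a strongly positive one.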

S7 Fig. Comorbidities by phenotype.

Chord diagram illustrates the prevalence of comorbidities (% observed) for the three clinical phenotypes.

(TIF)

S8 Fig. Complications by phenotype.

Chord diagram illustrates the prevalence of complications (% observed) for the three clinical phenotypes.

(TIF)

S9 Fig. Clinical outcomes by phenotype.

Chord diagram illustrates the prevalence of clinical outcomes (% observed) for the three clinical phenotypes. Abbreviations: ICU (intensive care unit); Vent (mechanical ventilation); Readmit (readmission to hospital or ICU); ECMO (extracorporeal membrane oxygenation).

(TIF)

S1 Table. Categories of comorbidities and ICD 10 codes used.

(PDF)

S2 Table. List of complications contributing to each complication category.

(PDF)

S3 Table. Association of clinical phenotype with in-hospital complications.

(PDF)

S4 Table

(PDF)

S5 Table. Home medications and hospital day 5 laboratory values of hospitalized COVID-19 patients with clinical phenotypes I, II, and III.

(PDF)

S1 File

(XLSX)

Data Availability

The complete data cannot be shared publicly because they contain private electronic health record data from M Health Fairview and are subject to HIPAA laws, which restrict sharing of data without data use agreements in place. In accordance with HIPAA regulations, a de-identified dataset has been included with this publication.

Funding Statement

1. NIH National Heart, Lung, and Blood Institute T32HL07741 (NEI). 2. This research was supported by the Agency for Healthcare Research and Quality (AHRQ) and Patient-Centered Outcomes Research Institute (PCORI), grant K12HS026379 (CJT) and the National Institutes of Health’s National Center for Advancing Translational Sciences, grant UL1TR002494. 3. NIH National Heart, Lung, and Blood Institute T32HL129956 (JP, LS). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • 1. Wu Z, McGoogan JM, (2020) Characteristics of and Important Lessons From the Coronavirus Disease 2019 (COVID-19) Outbreak in China: Summary of a Report of 72314 Cases From the Chinese Center for Disease Control and Prevention. JAMA
  • 2. Ingraham NE, Lotfi-Emran S, Thielen BK, Techar K, Morris RS, Holtan SG, et al. (2020) Immunomodulation in COVID-19. Lancet Respir Med 8: 544–546. doi: 10.1016/S2213-2600(20)30226-5
  • 3. Ingraham NE, Tignanelli CJ, (2020) Fact Versus Science Fiction: Fighting Coronavirus Disease 2019 Requires the Wisdom to Know the Difference. Crit Care Explor 2: e0108. doi: 10.1097/CCE.0000000000000108
  • 4. Group RC, Horby P, Lim WS, Emberson JR, Mafham M, Bell JL, et al. (2020) Dexamethasone in Hospitalized Patients with Covid-19—Preliminary Report. The New England journal of medicine
  • 5. Wiersinga WJ, Rhodes A, Cheng AC, Peacock SJ, Prescott HC, (2020) Pathophysiology, Transmission, Diagnosis, and Treatment of Coronavirus Disease 2019 (COVID-19): A Review. JAMA
  • 6. Li X, Ma X, (2020) Acute respiratory failure in COVID-19: is it "typical" ARDS? Crit Care 24: 198. doi: 10.1186/s13054-020-02911-9
  • 7. Diehl JL, Peron N, Chocron R, Debuc B, Guerot E, Hauw-Berlemont C, et al. (2020) Respiratory mechanics and gas exchanges in the early course of COVID-19 ARDS: a hypothesis-generating study. Ann Intensive Care 10: 95. doi: 10.1186/s13613-020-00716-1
  • 8. Arentz M, Yim E, Klaff L, Lokhandwala S, Riedo FX, Chong M, et al. (2020) Characteristics and outcomes of 21 critically ill patients with COVID-19 in Washington State. JAMA 323: 1612–1614. doi: 10.1001/jama.2020.4326
  • 9. Qin C, Zhou L, Hu Z, Zhang S, Yang S, Tao Y, et al. (2020) Dysregulation of immune response in patients with COVID-19 in Wuhan, China. Clinical Infectious Diseases. doi: 10.1093/cid/ciaa248
  • 10. Wu C, Chen X, Cai Y, Zhou X, Xu S, Huang H, et al. (2020) Risk factors associated with acute respiratory distress syndrome and death in patients with coronavirus disease 2019 pneumonia in Wuhan, China. JAMA internal medicine. doi: 10.1001/jamainternmed.2020.0994
  • 11. Huang C, Wang Y, Li X, Ren L, Zhao J, Hu Y, et al. (2020) Clinical features of patients infected with 2019 novel coronavirus in Wuhan, China. The lancet 395: 497–506
  • 12. Little RJA, Rubin DB. Statistical analysis with missing data. New York: Wiley; 1987. xiv, 278
  • 13. Stef van Buuren Karin Groothuis-Oudshoorn (2011). mice: Multivariate Imputation by Chained Equations in R. Journal of Statistical Software, 45(3), 1–67. URL https://www.jstatsoft.org/v45/i03/.
  • 14. Chiu Derek and Talhouk Aline (2020). diceR: Diverse Cluster Ensemble in R. R package version 1.0.0. https://CRAN.R-project.org/package=diceR
  • 15. Nassiri V, Lovik A, Molenberghs G, Verbeke G, (2018) On using multiple imputation for exploratory factor analysis of incomplete data. Behav Res Methods 50: 501–517. doi: 10.3758/s13428-017-1013-4
  • 16. Bramante C, Ingraham N, Murray T, Marmor S, Hoversten S, Gronski J, et al. (2020) Observational Study of Metformin and Risk of Mortality in Patients Hospitalized with Covid-19. medRxiv: 2020.2006.2019.20135095. doi: 10.1101/2020.06.19.20135095
  • 17. Jin JM, Bai P, He W, Wu F, Liu XF, Han DM, et al. (2020) Gender Differences in Patients With COVID-19: Focus on Severity and Mortality. Front Public Health 8: 152. doi: 10.3389/fpubh.2020.00152
  • 18. Ingraham NE, Purcell LN, Karam BS, Dudley RA, Usher MG, Warlick CA, et al., (2020) Racial/Ethnic Disparities in Hospital Admissions from COVID-19 and Determining the Impact of Neighborhood Deprivation and Primary Language. medRxiv: 2020.2009.2002.20185983. doi: 10.1101/2020.09.02.20185983
  • 19. Elixhauser A, Steiner C, Harris DR, Coffey RM, (1998) Comorbidity measures for use with administrative data. Medical care: 8–27. doi: 10.1097/00005650-199801000-00004
  • 20. Gu Z, Gu L, Eils R, Schlesner M, Brors B, (2014) circlize implements and enhances circular visualization in R. Bioinformatics 30: 2811–2812. doi: 10.1093/bioinformatics/btu393
  • 21. R Core Team (2020). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/.
  • 22. Petrilli CM, Jones SA, Yang J, Rajagopalan H, O’Donnell L, Chernyak Y, et al. (2020) Factors associated with hospital admission and critical illness among 5279 people with coronavirus disease 2019 in New York City: prospective cohort study. BMJ 369: m1966. doi: 10.1136/bmj.m1966
  • 23. Zhou F, Yu T, Du R, Fan G, Liu Y, Liu Z, et al. (2020) Clinical course and risk factors for mortality of adult inpatients with COVID-19 in Wuhan, China: a retrospective cohort study. Lancet 395: 1054–1062. doi: 10.1016/S0140-6736(20)30566-3
  • 24. Ye W, Chen G, Li X, Lan X, Ji C, Hou M, et al. (2020) Dynamic changes of D-dimer and neutrophil-lymphocyte count ratio as prognostic biomarkers in COVID-19. Respir Res 21: 169. doi: 10.1186/s12931-020-01428-7
  • 25. Tian W, Jiang W, Yao J, Nicholson CJ, Li RH, Sigurslid HH, et al. (2020) Predictors of mortality in hospitalized COVID-19 patients: A systematic review and meta-analysis. J Med Virol. doi: 10.1002/jmv.26050
  • 26. Shi S, Qin M, Shen B, Cai Y, Liu T, Yang F, et al. (2020) Association of Cardiac Injury With Mortality in Hospitalized Patients With COVID-19 in Wuhan, China. JAMA Cardiol
  • 27. Yang X, Yu Y, Xu J, Shu H, Xia J, Liu H, et al. (2020) Clinical course and outcomes of critically ill patients with SARS-CoV-2 pneumonia in Wuhan, China: a single-centered, retrospective, observational study. Lancet Respir Med 8: 475–481. doi: 10.1016/S2213-2600(20)30079-5
  • 28. Ingraham NE, Barakat AG, Reilkoff R, Bezdicek T, Schacker T, Chipman JG, et al. (2020) Understanding the Renin-Angiotensin-Aldosterone-SARS-CoV-Axis: A Comprehensive Review. Eur Respir J: 2000912. doi: 10.1183/13993003.00912-2020
  • 29. Tignanelli CJ, Ingraham NE, Sparks MA, Reilkoff R, Bezdicek T, Benson B, et al. (2020) Antihypertensive drugs and risk of COVID-19? Lancet Respir Med 8: e30–e31. doi: 10.1016/S2213-2600(20)30153-3
  • 30. Kuo CP L. Atkins JC, et al., (2020) COVID-19 severity is predicted by earlier evidence of accelerated aging. Medrxiv
  • 31. Azoulay E, Zafrani L, Mirouse A, Lengline E, Darmon M, Chevret S, (2020) Clinical phenotypes of critically ill COVID-19 patients. Intensive Care Med 46: 1651–1652. doi: 10.1007/s00134-020-06120-4
  • 32. Yang M, Xie L, Liu X, Hao Q, Jiang J, Dong B, (2018) The gamma gap predicts 4-year all-cause mortality among nonagenarians and centenarians. Sci Rep 8: 1046. doi: 10.1038/s41598-018-19534-4

Decision Letter 0

Chiara Lazzeri

Transfer Alert

This paper was transferred from another journal. As a result, its full editorial history (including decision letters, peer reviews and author responses) may not be present.

31 Dec 2020

PONE-D-20-32255

Characterizing COVID-19 Clinical Phenotypes and Associated Comorbidities and Complication Profiles

PLOS ONE

Dear Dr. Ingraham,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Feb 13 2021 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols

We look forward to receiving your revised manuscript.

Kind regards,

Chiara Lazzeri

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. In the ethics statement, please provide further clarification whether the IRB of all participating hospitals provided ethical approval.

3. Please note that all PLOS journals ask authors to adhere to our policies for sharing of data and materials: https://journals.plos.org/plosone/s/data-availability. According to PLOS ONE’s Data Availability policy, we require that the minimal dataset underlying results reported in the submission must be made immediately and freely available at the time of publication. As such, please remove any instances of 'unpublished data' or 'data not shown' in your manuscript and replace these with either the relevant data (in the form of additional figures, tables or descriptive text, as appropriate), a citation to where the data can be found, or remove altogether any statements supported by data not presented in the manuscript.

4. We noted in your submission details that a portion of your manuscript may have been presented or published elsewhere.

"It has been published as a preprint: https://www.medrxiv.org/content/10.1101/2020.09.12.20193391v1"

Please clarify whether this publication was peer-reviewed and formally published. If this work was previously peer-reviewed and published, in the cover letter please provide the reason that this work does not constitute dual publication and should be included in the current manuscript.

5. In your Data Availability statement, you have not specified where the minimal data set underlying the results described in your manuscript can be found. PLOS defines a study's minimal data set as the underlying data used to reach the conclusions drawn in the manuscript and any additional data required to replicate the reported study findings in their entirety. All PLOS journals require that the minimal data set be made fully available. For more information about our data policy, please see http://journals.plos.org/plosone/s/data-availability.

Upon re-submitting your revised manuscript, please upload your study’s minimal underlying data set as either Supporting Information files or to a stable, public repository and include the relevant URLs, DOIs, or accession numbers within your revised cover letter. For a list of acceptable repositories, please see http://journals.plos.org/plosone/s/data-availability#loc-recommended-repositories. Any potentially identifying patient information must be fully anonymized.

Important: If there are ethical or legal restrictions to sharing your data publicly, please explain these restrictions in detail. Please see our guidelines for more information on what we consider unacceptable restrictions to publicly sharing data: http://journals.plos.org/plosone/s/data-availability#loc-unacceptable-data-access-restrictions. Note that it is not acceptable for the authors to be the sole named individuals responsible for ensuring data access.

We will update your Data Availability statement to reflect the information you provide in your cover letter.

6. We note that you have included the phrase “data not shown” in your manuscript. Unfortunately, this does not meet our data sharing requirements. PLOS does not permit references to inaccessible data. We require that authors provide all relevant data within the paper, Supporting Information files, or in an acceptable, public repository. Please add a citation to support this phrase or upload the data that corresponds with these findings to a stable repository (such as Figshare or Dryad) and provide any URLs, DOIs, or accession numbers that may be used to access these data. Or, if the data are not a core part of the research being presented in your study, we ask that you remove the phrase that refers to these data.

7. Your ethics statement should only appear in the Methods section of your manuscript. If your ethics statement is written in any section besides the Methods, please delete it from any other section.

8. Please include captions for your Supporting Information files at the end of your manuscript, and update any in-text citations to match accordingly. Please see our Supporting Information guidelines for more information: http://journals.plos.org/plosone/s/supporting-information.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: I Don't Know

Reviewer #2: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: Follow the tips in the attachment.

Complete the description of the tables.

Follow the journal format.

Mention the study interval.

You need to explain the practical purpose further.

Identify the importance of the studied phenotypes and intermediate phenotypes.

Reviewer #2: The Authors report on a retrospective study aimed at assessing whether distinct phenotypes can be identified within the COVID-19 spectrum of clinical presentation. A large series of patients admitted to 14 hospitals was ascertained.

As the Authors sensibly state, a multidimensional approach is needed to better understand COVID-19 and interpret the variation in clinical outcomes. The article may provide novel pieces of evidence in order to establish a reliable stratification of patients.

The Authors reported that the data set includes all consecutive patients – this strategy limits the ascertainment bias and is an element of strength of the study.

A few issues could be addressed to improve the overall quality of the manuscript.

The study cohort comprised patients admitted to inpatient clinics (n=1022, out of 7538 patients). Therefore, the wide spectrum of phenotypes caused by SARS-CoV-2 infection which was exhibited by the majority of individuals could not be accounted for. I understand that the recruitment setting is constrained by the study design. However, the generalisability of results should be discussed accounting for this limitation.

To put data into the health care context, a brief description of the hospital setting could be helpful – e.g. geographical distribution, dimension of the hospital, type of unit [if not ICU], etc., including the population served.

It is noteworthy that phenotype I was found associated with being non-white and non-English speaking. Though the socioeconomic status was not differently distributed, this finding should be discussed.

To this regard, it should be reported how race/ethnicity was ascertained.

Dissecting the role for constitutional risk factors, and particularly genetic risk factors, is of paramount importance to design effective health care strategies for COVID-19. To this purpose, a clear-cut, evidence-based characterisation of phenotypes is a fundamental step. With this perspective, the implications of the present study deserve to be properly addressed.

Conversely, the Authors outlined the impact of the study in a very simplistic way.

As far as concerns genetic predisposition, accelerated aging is far from being a pivotal reference [page 10 first paragraph, ref. 30]; the term ‘exome data’ [page 11, last paragraph of Discussion] is inappropriate. The Authors should be aware that there is a line of research focussing on the role for host genetic factors in determining variable susceptibility to develop the phenotypes associated to SARS-CoV-2 infection. A large body of literature has been published – see for instance ‘Genetic variants of the human host influencing the coronavirus-associated phenotypes (SARS, MERS and COVID-19): rapid systematic review and field synopsis’, Human Genomics 2020, which also addresses the quality of methodological approaches; Beck and Aksentijevich, Science 2020, and citations therein; and the recently published genome-wide association studies.

Minor issues:

- Some references are incomplete.

- A few acronyms should be defined [e.g. SOFA, RDW].

**********

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

Attachment

Submitted filename: PONE.docx

PLoS One. 2021 Mar 31;16(3):e0248956. doi: 10.1371/journal.pone.0248956.r002

Author response to Decision Letter 0


19 Feb 2021

The response to reviewers has also been uploaded in Word format with our manuscript. The responses are also included below.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

01.Response: The manuscript has been appropriately formatted to match PLOS ONE’s style requirements.

2. In the ethics statement, please provide further clarification whether the IRB of all participating hospitals provided ethical approval.

02.Response: The IRB protocol was approved for all hospitals within the M Health Fairview system, which includes ethical approval. This has been updated in the manuscript.

3. Please note that all PLOS journals ask authors to adhere to our policies for sharing of data and materials: https://journals.plos.org/plosone/s/data-availability. According to PLOS ONE’s Data Availability policy, we require that the minimal dataset underlying results reported in the submission must be made immediately and freely available at the time of publication. As such, please remove any instances of 'unpublished data' or 'data not shown' in your manuscript and replace these with either the relevant data (in the form of additional figures, tables or descriptive text, as appropriate), a citation to where the data can be found, or remove altogether any statements supported by data not presented in the manuscript.

03.Response: Thank you for pointing this out. We have added a table that includes the data from the sensitivity analysis. We have also updated the numbering of our supplemental figures to the appropriate order.

4. We noted in your submission details that a portion of your manuscript may have been presented or published elsewhere.

"It has been published as a preprint: https://www.medrxiv.org/content/10.1101/2020.09.12.20193391v1"

Please clarify whether this publication was peer-reviewed and formally published. If this work was previously peer-reviewed and published, in the cover letter please provide the reason that this work does not constitute dual publication and should be included in the current manuscript.

04.Response: This publication was not peer-reviewed, and the cover letter has been updated to convey this as well.

5. In your Data Availability statement, you have not specified where the minimal data set underlying the results described in your manuscript can be found. PLOS defines a study's minimal data set as the underlying data used to reach the conclusions drawn in the manuscript and any additional data required to replicate the reported study findings in their entirety. All PLOS journals require that the minimal data set be made fully available. For more information about our data policy, please see http://journals.plos.org/plosone/s/data-availability.

Upon re-submitting your revised manuscript, please upload your study’s minimal underlying data set as either Supporting Information files or to a stable, public repository and include the relevant URLs, DOIs, or accession numbers within your revised cover letter. For a list of acceptable repositories, please see http://journals.plos.org/plosone/s/data-availability#loc-recommended-repositories. Any potentially identifying patient information must be fully anonymized.

Important: If there are ethical or legal restrictions to sharing your data publicly, please explain these restrictions in detail. Please see our guidelines for more information on what we consider unacceptable restrictions to publicly sharing data: http://journals.plos.org/plosone/s/data-availability#loc-unacceptable-data-access-restrictions. Note that it is not acceptable for the authors to be the sole named individuals responsible for ensuring data access.

We will update your Data Availability statement to reflect the information you provide in your cover letter.

05.Response: We have put together a minimal database to be included with our manuscript to support the public data sharing.

6. We note that you have included the phrase “data not shown” in your manuscript. Unfortunately, this does not meet our data sharing requirements. PLOS does not permit references to inaccessible data. We require that authors provide all relevant data within the paper, Supporting Information files, or in an acceptable, public repository. Please add a citation to support this phrase or upload the data that corresponds with these findings to a stable repository (such as Figshare or Dryad) and provide any URLs, DOIs, or accession numbers that may be used to access these data. Or, if the data are not a core part of the research being presented in your study, we ask that you remove the phrase that refers to these data.

06.Response: This has been addressed in Response #3. In brief, we have added a figure to ensure all of our data is included in the manuscript.

7. Your ethics statement should only appear in the Methods section of your manuscript. If your ethics statement is written in any section besides the Methods, please delete it from any other section.

07.Response: This has been corrected.

8. Please include captions for your Supporting Information files at the end of your manuscript, and update any in-text citations to match accordingly. Please see our Supporting Information guidelines for more information: http://journals.plos.org/plosone/s/supporting-information.

08.Response: This has been corrected.

Reviewers' comments:

Reviewer's Responses to Questions

________________________________________

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1:

Follow the tips in the attachment.

1) Complete the description of the tables.

09.Response: Thank you for this feedback. This has been corrected in the manuscript.

2) Follow the journal format.

10.Response: The manuscript has been updated to the appropriate format for PLOS ONE.

3) Mention the study interval.

11.Response: Thank you for this comment. The study interval is listed in the methods section of the manuscript; however, we have also added it to the abstract (March 7, 2020 to August 25, 2020).

4) You need to explain the practical purpose further. Identify the importance of the studied phenotypes and intermediate phenotypes.

12.Response: This is a great point and we thank you for the comment. We have expanded upon our discussion regarding the utility of these phenotypes, clinically and academically. (Last paragraph of page 14 through top of page 15)

Reviewer #2:

The Authors report on a retrospective study aimed at assessing whether distinct phenotypes can be identified within the COVID-19 spectrum of clinical presentation. A large series of patients admitted to 14 hospitals was ascertained.

As the Authors sensibly state, a multidimensional approach is needed to better understand COVID-19 and interpret the variation in clinical outcomes. The article may provide novel pieces of evidence in order to establish a reliable stratification of patients.

The Authors reported that the data set includes all consecutive patients – this strategy limits the ascertainment bias and is an element of strength of the study.

A few issues could be addressed to improve the overall quality of the manuscript.

1) The study cohort comprised patients admitted to inpatient clinics (n=1022, out of 7538 patients). Therefore, the wide spectrum of phenotypes caused by SARS-CoV-2 infection which was exhibited by the majority of individuals could not be accounted for. I understand that the recruitment setting is constrained by the study design. However, the generalisability of results should be discussed accounting for this limitation.

13.Response: We appreciate your feedback. This is an important point, and we have updated our limitations section to address it.

To put data into the health care context, a brief description of the hospital setting could be helpful – e.g. geographical distribution, dimension of the hospital, type of unit [if not ICU], etc., including the population served.

It is noteworthy that phenotype I was found associated with being non-white and non-English speaking. Though the socioeconomic status was not differently distributed, this finding should be discussed.

To this regard, it should be reported how race/ethnicity was ascertained.

14.Response: This is a great point. We have expanded upon the hospital system characteristics. Regarding the association with race/ethnicity and primary language, we very much appreciate this point. We are currently looking into this and have a manuscript under review which further supports these findings (race/ethnicity and non-English-speaking status are associated with a higher rate of severe COVID-19 disease, independent of socioeconomic status). We have added more to the paragraph regarding phenotype I and emphasized that this association should be interpreted with care, given that race and ethnicity are social constructs.

Dissecting the role for constitutional risk factors, and particularly genetic risk factors, is of paramount importance to design effective health care strategies for COVID-19. To this purpose, a clear-cut, evidence-based characterisation of phenotypes is a fundamental step. With this perspective, the implications of the present study deserve to be properly addressed.

Conversely, the Authors outlined the impact of the study in a very simplistic way.

As far as concerns genetic predisposition, accelerated aging is far from being a pivotal reference [page 10 first paragraph, ref. 30]; the term ‘exome data’ [page 11, last paragraph of Discussion] is inappropriate. The Authors should be aware that there is a line of research focussing on the role for host genetic factors in determining variable susceptibility to develop the phenotypes associated to SARS-CoV-2 infection. A large body of literature has been published – see for instance ‘Genetic variants of the human host influencing the coronavirus-associated phenotypes (SARS, MERS and COVID-19): rapid systematic review and field synopsis’, Human Genomics 2020, which also addresses the quality of methodological approaches; Beck and Aksentijevich, Science 2020, and citations therein; and the recently published genome-wide association studies.

15.Response: Thank you for this great insight. We have strengthened the section that describes the implications and importance of our work. We have also included the suggested citations (much appreciated), which further drive home our point.

Minor issues:

- Some references are incomplete.

16.Response: This has been corrected.

- A few acronyms should be defined [e.g. SOFA, RDW].

17.Response: This has been corrected.

Attachment

Submitted filename: Response to Reviewers.docx

Decision Letter 1

Chiara Lazzeri

9 Mar 2021

Characterizing COVID-19 Clinical Phenotypes and Associated Comorbidities and Complication Profiles

PONE-D-20-32255R1

Dear Dr. Ingraham,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Chiara Lazzeri

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

Acceptance letter

Chiara Lazzeri

18 Mar 2021

PONE-D-20-32255R1

Characterizing COVID-19 clinical phenotypes and associated comorbidities and complication profiles

Dear Dr. Ingraham:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Chiara Lazzeri

Academic Editor

PLOS ONE

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 Fig. Consensus cumulative distribution functions.

    Cumulative distribution functions (CDF) for a randomly selected imputed dataset are shown. A range of phenotype counts (2–7) was considered, and the optimal number of phenotypes is 3.

    (TIF)

    S2 Fig. Delta area.

    The relative change in delta area under the cumulative distribution function is shown for the range of phenotypes (k = 2–7) for a randomly selected imputed dataset. The optimal choice of phenotypes is 3. Abbreviations: CDF (cumulative distribution function).

    (TIF)
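The delta-area calculation described for S2 Fig can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the grid resolution, and the example areas are hypothetical, and the convention used (the smallest k keeps its raw area, later k use the relative change from the previous k) is an assumption based on common consensus clustering practice.

```python
# Illustrative sketch only: all numbers below are made up, not study data.

def empirical_cdf_area(consensus_values, grid_points=101):
    """Approximate the area under the empirical CDF of consensus values on [0, 1]."""
    n = len(consensus_values)
    area = 0.0
    prev_cdf = None
    for i in range(grid_points):
        x = i / (grid_points - 1)
        cdf = sum(1 for v in consensus_values if v <= x) / n
        if prev_cdf is not None:
            # Trapezoid rule between neighbouring grid points.
            area += 0.5 * (prev_cdf + cdf) / (grid_points - 1)
        prev_cdf = cdf
    return area


def delta_area(areas_by_k):
    """Relative change in CDF area between successive numbers of clusters.

    Assumed convention: the smallest k keeps its raw area; every later k
    uses (A_k - A_prev) / A_prev.
    """
    ks = sorted(areas_by_k)
    deltas = {ks[0]: areas_by_k[ks[0]]}
    for prev_k, k in zip(ks, ks[1:]):
        deltas[k] = (areas_by_k[k] - areas_by_k[prev_k]) / areas_by_k[prev_k]
    return deltas


# Hypothetical CDF areas for k = 2..5.
example_areas = {2: 0.40, 3: 0.60, 4: 0.63, 5: 0.645}
example_deltas = delta_area(example_areas)
```

In a plot of such values, the number of clusters is typically chosen where the relative gain levels off; in this made-up example the gain drops sharply after k = 3.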

    S3 Fig. Consensus matrix with 3 clusters.

    A consensus matrix heatmap is shown for a randomly selected imputed dataset clustered into 3 phenotypes. The heatmap allows visualization of consensus cluster assignments to evaluate cluster stability. Darker shades of green indicate higher stability.

    (TIF)

    S4 Fig. Consensus matrix with 4 clusters.

    A consensus matrix heatmap is shown for a randomly selected imputed dataset clustered into 4 phenotypes. The heatmap allows visualization of consensus cluster assignments to evaluate cluster stability. Darker shades of green indicate higher stability. The choice of 4 clusters shows less stability than 3 clusters (see S3 Fig).

    (TIF)

    S5 Fig. Cumulative proportion of variance explained.

    The proportion of variance explained by each principal component is summed over all principal components. For example, PC1 and PC2 cumulatively explain 20% of the variation in the dataset. Abbreviations: PC1 (principal component 1); PC2 (principal component 2).

    (TIF)

    S6 Fig. Contribution of variables to PC1.

    The contributions of each of the 33 variables used in the clustering to principal component 1 are shown. The red line marks the expected average contribution of each variable if the contributions of the variables were uniform across the dataset. Variables contributing most to the observed pattern in PC1 are D-dimer and albumin. Abbreviations: PC1 (principal component 1); Abs_Nphil_Ct (absolute neutrophil count); LDH (lactate dehydrogenase); CRP (C-reactive protein); WBC (white blood cell count); HCT (hematocrit); HGB (hemoglobin); Tbili (total bilirubin); RDW (red cell distribution width); AST (aspartate aminotransferase); Alk_phos (alkaline phosphatase); RR (respiratory rate); CA (calcium); TP (total protein); INR (international normalized ratio of prothrombin time); CO2 (carbon dioxide); K (potassium); O2SAT (oxygen saturation); BMI (body mass index); PLT (platelet); PP (pulse pressure); Na (sodium); SBP (systolic blood pressure); Abs_mono_ct (absolute monocyte count); MCV (mean corpuscular volume).

    (TIF)
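The reference line described for S6 Fig can be illustrated with a short sketch. It assumes, as is common in PCA tooling but not stated in the source, that a variable's contribution to a component is its squared loading as a percentage of the sum of squared loadings. The function names and the example loadings are hypothetical.

```python
# Illustrative sketch only: the loadings below are made up, not study data.

def contributions_percent(loadings):
    """Percent contribution of each variable to one principal component.

    Assumption: contribution = squared loading / sum of squared loadings * 100.
    """
    total = sum(w * w for w in loadings)
    return [100.0 * w * w / total for w in loadings]


def uniform_reference(n_variables):
    """Expected contribution (%) of each variable if all contributed equally."""
    return 100.0 / n_variables


# With the 33 clustering variables, the reference line sits at 100 / 33 percent.
reference_33 = uniform_reference(33)

# Hypothetical loadings for a 4-variable component.
example_loadings = [0.8, 0.4, 0.4, 0.2]
example_contrib = contributions_percent(example_loadings)

# Indices of variables contributing more than the uniform expectation.
above_reference = [
    i for i, c in enumerate(example_contrib)
    if c > uniform_reference(len(example_loadings))
]
```

Variables whose bars rise above the reference line contribute more than they would under a uniform split, which is how the caption singles out D-dimer and albumin for PC1.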

    S7 Fig. Comorbidities by phenotype.

    Chord diagram illustrates the prevalence of comorbidities (% observed) for the three clinical phenotypes.

    (TIF)

    S8 Fig. Complications by phenotype.

    Chord diagram illustrates the prevalence of complications (% observed) for the three clinical phenotypes.

    (TIF)

    S9 Fig. Clinical outcomes by phenotype.

    Chord diagram illustrates the prevalence of clinical outcomes (% observed) for the three clinical phenotypes. Abbreviations: ICU (intensive care unit); Vent (mechanical ventilation); Readmit (readmission to hospital or ICU); ECMO (extracorporeal membrane oxygenation).

    (TIF)

    S1 Table. Categories of comorbidities and ICD 10 codes used.

    (PDF)

    S2 Table. List of complications contributing to each complication category.

    (PDF)

    S3 Table. Association of clinical phenotype with in-hospital complications.

    (PDF)

    S4 Table

    (PDF)

    S5 Table. Home medications and hospital day 5 laboratory values of hospitalized COVID-19 patients with clinical phenotypes I, II, and III.

    (PDF)

    S1 File

    (XLSX)

    Attachment

    Submitted filename: PONE.docx

    Attachment

    Submitted filename: Response to Reviewers.docx

    Data Availability Statement

    The full dataset cannot be shared publicly because it contains private electronic healthcare record data from M Health Fairview and is subject to HIPAA laws, which restrict sharing of data without data use agreements in place. In accordance with HIPAA regulations, a de-identified dataset has been included with this publication.


    Articles from PLoS ONE are provided here courtesy of PLOS
