Severe acute infection and chronic pulmonary disease are risk factors for developing post-COVID-19 conditions

Pritha Ghosh; Michiel JM Niesen; Colin Pawlowski; Hari Bandi; Unice Yoo; Patrick J Lenehan; Praveen Kumar M; Mihika Nadig; Jason Ross; Sankar Ardhanari; John C O’Horo; AJ Venkatakrishnan; Clifford J Rosen; Amalio Telenti; Ryan T Hurt; Venky Soundararajan

doi:10.1101/2022.11.30.22282831

This is a preprint.

It has not yet been peer reviewed by a journal.

[Preprint]. 2022 Dec 1:2022.11.30.22282831. [Version 1] doi: 10.1101/2022.11.30.22282831

PMCID: PMC9753786 PMID: 36523407

Abstract

Post-COVID-19 conditions, also known as “long COVID”, have significantly impacted the lives of many individuals, but the risk factors for these conditions are poorly understood. In this study, we performed a retrospective EHR analysis of 89,843 individuals at a multi-state health system in the United States with PCR-confirmed COVID-19, including 1,086 patients diagnosed with long COVID and 1,086 matched controls not diagnosed with long COVID. For these two cohorts, we evaluated a wide range of clinical covariates, including laboratory tests, medication orders, phenotypes recorded in the clinical notes, and outcomes. We found that chronic pulmonary disease (CPD) was significantly more common as a pre-existing condition in the long COVID cohort than in the control cohort (odds ratio: 1.9, 95% CI: [1.5, 2.6]). Additionally, long COVID patients were more likely to have a history of migraine (odds ratio: 2.2, 95% CI: [1.6, 3.1]) and fibromyalgia (odds ratio: 2.3, 95% CI: [1.3, 3.8]). During the acute infection phase, the following lab measurements were abnormal in the long COVID cohort: high triglycerides (mean_longCOVID: 278.5 mg/dL vs. mean_control: 141.4 mg/dL), low HDL cholesterol levels (mean_longCOVID: 38.4 mg/dL vs. mean_control: 52.5 mg/dL), and high neutrophil-lymphocyte ratio (mean_longCOVID: 10.7 vs. mean_control: 7.2). The hospitalization rate during the acute infection phase was also higher in the long COVID cohort than in the control cohort (rate_longCOVID: 5% vs. rate_control: 1%). Overall, this study suggests that the severity of acute infection and a history of CPD, migraine, chronic fatigue syndrome (CFS), or fibromyalgia may be risk factors for long COVID symptoms. Our findings motivate clinical studies to evaluate whether suppressing acute disease severity proactively, especially in patients at high risk, can reduce the incidence of long COVID.

Introduction

According to CDC estimates, approximately 58% of the United States population has had a SARS-CoV-2 infection at least once through February 2022,¹ and the total number of confirmed COVID-19 deaths surpassed 1 million in May 2022.² Given the high prevalence of COVID-19 and its large burden on health systems and society overall, it is a public health imperative to understand the short-, medium-, and long-term effects of this disease so that optimal care can be offered to COVID-19 patients during their infection and their convalescence. There is mounting evidence that SARS-CoV-2 infection may have significant long-term health effects for some individuals. For example, some individuals, particularly those infected with earlier variants of SARS-CoV-2, may experience persistent loss of taste and/or smell.³ The WHO developed a clinical case definition for post-COVID-19 conditions (also known as “long COVID”), which lists fatigue, shortness of breath, and cognitive dysfunction as common symptoms.⁴ In October 2021, an ICD code for long COVID was adopted internationally (U09.9). According to the National Center for Health Statistics (NCHS) Household Pulse Survey, approximately 34% of individuals who were infected with SARS-CoV-2 report symptoms lasting three months or more after their infection.⁵ One large retrospective study found that anosmia, hair loss, sneezing, ejaculation difficulty, and reduced libido were the most commonly reported long COVID symptoms, and that risk factors include female sex, belonging to an ethnic minority, socioeconomic deprivation, smoking, obesity, and a wide range of comorbidities.⁶ Currently, prospective studies are underway to characterize the long-term sequelae of COVID-19, including the CDC INSPIRE study⁷ and the NIH RECOVER initiative.⁸

Here, we conduct a large-scale retrospective analysis of de-identified electronic health records from a multi-state health system to characterize long COVID conditions and associated risk factors. We consider a cohort of patients with long COVID based on an ICD code diagnosis and a control cohort of COVID-19 patients without a long COVID diagnosis. We perform 1:1 matching to ensure that the cohorts are balanced on clinical characteristics, including demographics, date of infection, geography, and the number of prior laboratory testing encounters. We examine trends in lab test measurements for these two matched cohorts during a baseline phase before COVID-19 diagnosis and an acute COVID-19 phase. In addition, we compare other clinical features between these two cohorts, including hospitalization, diagnoses, medications, and signs and symptoms captured in clinical notes.

Methods

nference platform with de-identified electronic health record data

We used the nference Clinical nSights platform to conduct this analysis. This platform includes de-identified records from over 6.9 million patients, spanning multiple US states. This de-identified environment includes structured tables derived from electronic health records (EHR) data such as ECG waveforms, diagnosis codes, laboratory tests, vital signs, medications administered, medications ordered, procedures, and flowsheets. In addition, this environment includes unstructured tables derived from the EHR, such as ECG, radiology and pathology reports, and clinical notes. All personally identifiable information in this environment (e.g., names, locations, dates) has been excluded or substituted using a best-in-class de-identification methodology.⁹

Study design

In the de-identified EHR database, the study population included all individuals with at least one positive SARS-CoV-2 PCR test between June 1, 2021 (four months before the first use of the long COVID ICD-10 code) and May 28, 2022. Individuals without a primary care provider on record in the health system or with no clinical encounters recorded in the past three years were excluded from the analysis. Individuals with at least one ICD-10 code for long COVID (U09.9, “Post COVID-19 condition, unspecified”) at least 7 days after a positive SARS-CoV-2 PCR test were grouped into the long COVID cohort, and the rest of the study population without this ICD-10 code was grouped into the control cohort. For individuals in the long COVID cohort, the date of the most recent positive PCR test prior to the first U09.9 ICD-10 code was considered to be the index date. For individuals in the control cohort, the date of the first positive PCR test during the study period was considered to be the index date. In Figure 1, we provide an overview of the study design.
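The cohort and index-date rules above can be sketched in a few lines of Python. This is an illustrative reading of the text, not the study's code: the function name `assign_cohort` and its inputs are hypothetical, the provider and encounter exclusions are omitted, and two points the text leaves open are resolved by assumption (only PCR tests inside the study window are considered, and the "first" U09.9 code is the first one that follows a positive PCR test by at least 7 days).

```python
from datetime import date, timedelta
from typing import List, Optional, Tuple

STUDY_START = date(2021, 6, 1)
STUDY_END = date(2022, 5, 28)
MIN_GAP = timedelta(days=7)  # U09.9 must follow a positive PCR by at least 7 days


def assign_cohort(positive_pcr_dates: List[date],
                  u099_dates: List[date]) -> Optional[Tuple[str, date]]:
    """Return (cohort, index_date), or None if no positive PCR falls in the window."""
    # Assumption: only positive PCR tests inside the study window count.
    in_window = sorted(d for d in positive_pcr_dates
                       if STUDY_START <= d <= STUDY_END)
    if not in_window:
        return None

    # U09.9 codes recorded at least 7 days after some positive PCR test.
    qualifying = [c for c in sorted(u099_dates)
                  if any(c - p >= MIN_GAP for p in in_window)]
    if qualifying:
        first_code = qualifying[0]
        # Index date: most recent positive PCR before the first qualifying code.
        index_date = max(p for p in in_window if p < first_code)
        return ("long_covid", index_date)

    # Controls: first positive PCR test during the study period.
    return ("control", in_window[0])
```

For example, a patient with positive tests on 2021-07-01 and 2021-12-01 and a U09.9 code on 2021-12-20 lands in the long COVID cohort with an index date of 2021-12-01.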

Figure 1: **(a)** Timeline capturing the journey of a patient with long COVID. There are three main phases: i) baseline (10 to 365 days before infection), ii) acute COVID-19 (0 to 14 days after infection), and iii) post-COVID-19 (28 to 42 days after infection). **(b)** Comparison of new onset symptoms and diseases recorded in EHR notes following a positive SARS-CoV-2 PCR test. Only phenotypes for which there is a significant difference in reporting (Fisher’s exact test, p-value < 0.05) between the long COVID and control cohorts are shown. **(c)** Lab tests with a significant difference between the matched long COVID and control cohorts. For each lab test, mean test values and 95% confidence intervals are shown. The normal ranges for these lab tests²⁸⁻³¹ are shaded in gray.

Definition of the matched control cohort

To identify risk factors for long COVID, we constructed a 1:1 matched control cohort starting from the unmatched study population. This cohort was exactly matched on potentially confounding factors for long COVID ICD-10 diagnosis, including demographics (age, sex, race, ethnicity), health system site, date of infection (within two weeks), and the number of lab test encounters at the health system within the past year. Individuals in the long COVID cohort without a corresponding matched control (54 out of 1,140 individuals) were dropped from the matched analysis.
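A minimal sketch of exact 1:1 matching under these criteria follows. The text does not describe the matching algorithm itself, so the greedy, without-replacement strategy, the random tie-breaking, and the use of bucketed age and lab-encounter counts as exact-match keys are all assumptions. The field names (`age_bucket`, `site`, `lab_bucket`, `index_date`) are hypothetical.

```python
import random
from collections import defaultdict
from datetime import date


def match_key(person):
    """Characteristics that must agree exactly (field names are hypothetical)."""
    return (person["age_bucket"], person["sex"], person["race"],
            person["ethnicity"], person["site"], person["lab_bucket"])


def match_one_to_one(cases, controls, max_days=14, seed=0):
    """Greedy 1:1 matching without replacement (an assumed strategy)."""
    rng = random.Random(seed)
    pool = defaultdict(list)
    for control in controls:
        pool[match_key(control)].append(control)

    pairs, unmatched = [], []
    for case in cases:
        key = match_key(case)
        # Infection dates must fall within two weeks of each other.
        candidates = [c for c in pool[key]
                      if abs((c["index_date"] - case["index_date"]).days) <= max_days]
        if not candidates:
            unmatched.append(case)  # dropped from the matched analysis
            continue
        chosen = rng.choice(candidates)
        pool[key].remove(chosen)  # each control is used at most once
        pairs.append((case, chosen))
    return pairs, unmatched
```

Cases that end up in `unmatched` correspond to the individuals dropped from the matched analysis.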

Extraction of phenotypes from clinical notes

A Bidirectional Encoder Representations from Transformers (BERT)-based classification model was used to classify the sentiment for phenotypes mentioned in EHR clinical notes. BERT is a transformer-based language model that is pre-trained on unlabeled text and can then be fine-tuned for tasks such as sentence classification. This model was previously used to identify signs and symptoms of COVID-19,¹⁰ short- and long-term complications of COVID-19,¹¹ and adverse events of mRNA-based COVID-19 vaccines.¹² Given a sentence that includes any phenotype, this model outputs one of the following labels: “Yes” - confirmed diagnosis, “Maybe” - possible diagnosis, “No” - ruled out the diagnosis, or “Other” - none of the above. A dataset of 18,490 manually annotated sentences extracted from EHR clinical notes containing over 250 different phenotypes was used to train the model. The classification model achieves an out-of-sample accuracy of 93.6% and precision and recall values above 95%.¹⁰

For this study, we applied the BERT model to classify the sentiment of 64 phenotypes (Table S1) in the clinical notes for individuals in the long COVID and control cohorts during each of the study phases. This list of phenotypes was obtained from the CDC website for long COVID¹³ and publicly available literature sources; the methodology to identify candidate long COVID phenotypes from the literature is described in the following methods section. For the analysis of clinical notes, we defined the following phases: the baseline or pre-COVID-19 phase (10 to 365 days before infection), the acute COVID-19 phase (0 to 14 days after infection), and the post-COVID-19 phase (28 to 42 days after infection). The time window for the post-COVID-19 phase was selected for two reasons: it captures most new long COVID diagnoses (60% of the long COVID cohort was diagnosed with long COVID before day 42, and 18% was diagnosed between days 28 and 42), and it is a window in which we observed significant differences in overall phenotype reporting (Figure S2). Individuals without at least one clinical note during the baseline phase and individuals with fewer than 42 days of follow-up after the positive PCR test were excluded from this analysis. For the baseline phase, an individual was counted as positive for the phenotype if they had at least one mention of the phenotype with a “Yes” label and a confidence score greater than 0.8 (a “positive sentiment”). For each prediction, the confidence score is a number between 0 and 1 that reflects the certainty of the model that the prediction is correct, with 0 being the least certain and 1 being the most certain. In this study, we selected a threshold of 0.8 for the confidence score based on manual review of a subset of model predictions. For the acute and post-COVID-19 phases, an individual was counted as positive for a phenotype only if they had a positive sentiment for the phenotype during that phase (i.e., “Yes” label and confidence score > 0.8) without any positive sentiment in the baseline phase. We term such phenotypes “new onset”. We also quantified the overall prevalence of positive sentiments for any of the 64 phenotypes during 7-day intervals from 42 days before the positive PCR test to 42 days after the positive PCR test (Figure S2).
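The "new onset" rule can be expressed compactly. The sketch below assumes each model prediction is available as a record with a `label`, a `confidence`, and a `day` offset relative to the positive PCR test; these field names and the helper functions are hypothetical, while the thresholds and phase windows are those stated in the text.

```python
THRESHOLD = 0.8

# Phase windows in days relative to the positive PCR test (inclusive bounds).
PHASES = {"baseline": (-365, -10), "acute": (0, 14), "post": (28, 42)}


def is_positive(mention):
    """A 'positive sentiment': label 'Yes' with confidence strictly above 0.8."""
    return mention["label"] == "Yes" and mention["confidence"] > THRESHOLD


def phase_of(day):
    """Map a day offset to a study phase, or None if it falls in no phase."""
    for name, (low, high) in PHASES.items():
        if low <= day <= high:
            return name
    return None


def is_new_onset(mentions, phase):
    """mentions: all predictions for one patient and one phenotype.

    True if the phenotype has a positive sentiment in `phase` ('acute' or
    'post') and no positive sentiment during the baseline phase.
    """
    positive_phases = {phase_of(m["day"]) for m in mentions if is_positive(m)}
    return phase in positive_phases and "baseline" not in positive_phases
```

Under this rule, a patient with a confident "Yes" mention of cough both at day -100 and at day 5 is not counted as new onset in the acute phase, because the phenotype was already present at baseline.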

Identification of candidate long COVID phenotypes from publicly available literature sources

The nferX Signals application (https://research.nferx.com/dv/202011/signals/) was used to determine candidate long COVID phenotypes from publicly available literature sources. This application enables the user to search for biomedical associations in the free text of over 100 million documents from over 80K sources, including but not limited to PubMed articles, clinicaltrials.gov, patent applications, SEC filings, blogs, conferences, and news articles. For this study, we used this application to identify disease phenotypes frequently mentioned in the biomedical literature in the context of long COVID and its associated synonyms (e.g., “pasc”, “post-COVID condition”). The full list of disease phenotypes that were considered includes approximately 140K unique phenotypes compiled from 9 sources, which are available in the nferX “Diseases” collection (see Table S2). The synonym lists for each of these phenotypes and for “long COVID” were determined by the nferX Signals application. For each phenotype in the “Diseases” collection, we computed a metric called the “nferX local score”, which measures the strength of the association between that phenotype and long COVID in the nferX corpus of biomedical literature. The formula to compute the nferX local score is provided in the Supplemental Materials (see Figure S1). In particular, phenotypes that co-occur relatively frequently with long COVID in the corpus within a specified word span achieve high local scores, and phenotypes that co-occur relatively infrequently achieve low local scores. Phenotypes with a significantly high association (local score > 3.0) for a word span of +/− 50 words were considered as candidate long COVID phenotypes, excluding COVID-19 and non-specific disease phenotypes. The final list of 64 phenotypes considered for this study is the union of the list of phenotypes on the CDC website for long COVID¹³ and the list of candidate long COVID phenotypes identified by the nferX Signals application (see Table S1).

Comparison of lab measurements

For the matched long COVID and control cohorts, we computed: (a) the mean value of a lab test for each patient contributing to the analysis of this lab test (mean_individual: patient-level data summarization), and (b) the mean of all mean_individual values for the patients in a cohort (mean_population: population-level data summarization). We performed these calculations for the baseline and acute COVID-19 phases. We compared the mean_population (hereafter referred to as ‘mean’) values for a lab test between: i) the baseline and acute COVID-19 phases for the long COVID cohort, and ii) the long COVID and control cohorts in the acute COVID-19 phase. In both cases, we report p-values from Mann-Whitney U tests, subsequently corrected for multiple comparisons using the Benjamini-Yekutieli (False Discovery Rate) method. We also calculated 95% confidence intervals around the mean_population values by bootstrap resampling (1000 samples).
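The two-level summarization, the bootstrap interval, and the multiple-testing correction can be sketched with the Python standard library alone. This is an illustration under stated assumptions rather than the study's code: the percentile bootstrap is one common variant (the text does not say which was used), the function names are hypothetical, and the Mann-Whitney U test itself is not reimplemented here, so the correction takes already-computed p-values as input. In practice `scipy.stats.mannwhitneyu` and `statsmodels.stats.multitest.multipletests` with `method="fdr_by"` provide these steps.

```python
import random
from statistics import mean


def mean_population(values_by_patient):
    """Average each patient's measurements first, then average across patients."""
    per_patient = [mean(v) for v in values_by_patient.values() if v]
    return mean(per_patient), per_patient


def bootstrap_ci(per_patient, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean of patient-level means (assumed variant)."""
    rng = random.Random(seed)
    n = len(per_patient)
    boots = sorted(mean(rng.choices(per_patient, k=n)) for _ in range(n_boot))
    lower = boots[round((alpha / 2) * n_boot)]
    upper = boots[round((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


def benjamini_yekutieli(p_values):
    """Benjamini-Yekutieli adjusted p-values, returned in the input order."""
    m = len(p_values)
    c_m = sum(1.0 / k for k in range(1, m + 1))  # harmonic correction factor
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m * c_m / rank)
        adjusted[i] = running_min
    return adjusted
```

Averaging within patients first keeps a patient with many repeated measurements from dominating the cohort mean, which is the point of the mean_individual step.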

Comparison of clinical characteristics

We compared the clinical characteristics of the long COVID and control cohorts and reported odds ratios and 95% confidence intervals. For age, we considered the following buckets: <18, 18-24, 25-34, 35-44, 45-54, 55-64, 65-74, and 75+ years old. For race, we grouped the categories (“Asian,” “Asian - Far East,” and “Asian - Indian Subcontinent”) as “Asian,” and we grouped the categories (“Chose not to disclose,” “Unable to provide,” and “Unknown”) as “Unknown.” For ethnicity, we grouped the categories (“Choose not to disclose” and “Unknown”) as “Unknown.” For the number of lab test encounters within the past year, we considered the following buckets: 0, 1-3, and 4+ lab test encounters. Individuals with at least one dose of the Janssen COVID-19 vaccine or two or more doses of the Pfizer or Moderna COVID-19 vaccines on record were considered to be fully vaccinated. Comorbidities were determined based on ICD codes observed during the baseline phase. Comorbidities in the Charlson Comorbidity Index¹⁴ were considered along with auto-immune diseases and related conditions, including chronic fatigue syndrome, postural tachycardia syndrome without hypotension, fibromyalgia, and migraine. We also compared medications administered or ordered for the matched long COVID and control cohorts during the baseline, acute COVID-19, and post-COVID-19 phases. We report p-values from Fisher’s exact test performed for each phase.
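As an illustration of how an odds ratio and its 95% confidence interval can be obtained from a 2×2 table, the sketch below uses the Wald (log odds ratio) interval. The text does not state which interval method was used, so this choice is an assumption, and the function name and the counts in the usage note are hypothetical.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with a Wald confidence interval (z=1.96 gives about 95%).

    a: long COVID patients with the condition
    b: long COVID patients without the condition
    c: control patients with the condition
    d: control patients without the condition
    """
    if min(a, b, c, d) == 0:
        raise ValueError("Wald interval is undefined when a cell count is zero")
    odds_ratio = (a * d) / (b * c)
    # Standard error of the log odds ratio.
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper
```

With hypothetical counts of 20 of 100 cases and 10 of 100 controls having a condition, `odds_ratio_ci(20, 80, 10, 90)` gives an odds ratio of 2.25. An interval that excludes 1 corresponds to a significant association at roughly the 5% level.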

Statistical analysis software

All statistical analyses were performed using NumPy (version 1.23.3), SciPy (version 1.9.1), and statsmodels (version 0.13.2) in Python 3.9.6.

Results

The study population included 88,943 patients with a positive PCR test for SARS-CoV-2, including 1,140 patients with an ICD-10 code diagnosis for long COVID (U09.9). In Table S3, we provide the clinical characteristics of the unmatched cohorts. We found that the rate of long COVID was higher among females than among males (odds ratio: 1.42, 95% CI: [1.26, 1.60]). In addition, the median age of individuals in the long COVID cohort was significantly higher than in the control cohort (Table S3). We performed a matched analysis to control for differences in demographics and other potential confounding factors for long COVID diagnosis (see Methods section for details). In Table 1, we provide the comorbidities and clinical outcomes for the final 1:1 matched long COVID and control cohorts. In Table S4, we provide a summary of the matched clinical characteristics for these cohorts. For the rest of this section, we present the results based on these matched cohorts.

Table 1: Comorbidities and clinical outcomes of long COVID and matched control cohorts.

For each categorical variable, the percentage of patients in each cohort is shown along with the odds ratio and corresponding 95% confidence interval.

Characteristic	Long COVID cohort (matched)	Control cohort (matched)	Odds Ratio [95% CI]

Number of individuals	1,086	1,086	-

Fully vaccinated before infection¹ (%)
- Pfizer (two or more doses)	37	41	0.85 [0.71, 1.00]
- Moderna (two or more doses)	16	20	0.78 [0.62, 0.97]*
- Janssen (one or more doses)	5	4	1.37 [0.90, 2.07]
- Any other vaccine (two or more doses)	0	<1	-

Charlson comorbidities in baseline phase (%)
- Cancer	6	6	0.98 [0.68, 1.41]
- Cerebrovascular disease	3	3	0.94 [0.58, 1.53]
- Chronic pulmonary disease	15	8	1.94 [1.48, 2.55]***
- Congestive heart failure	8	6	1.40 [1.00, 1.97]
- Dementia	<1	<1	-
- Diabetes without chronic complication	12	10	1.14 [0.87, 1.50]
- Hemiplegia or paraplegia	<1	<1	0.33 [0.07, 1.65]
- Metastatic solid tumor	1	2	0.63 [0.34, 1.20]
- Mild liver disease	3	5	0.69 [0.45, 1.06]
- Moderate or severe liver disease	<1	<1	-
- Myocardial infarction	3	1	1.83 [0.99, 3.40]
- Peptic ulcer disease	1	<1	2.52 [0.97, 6.52]
- Peripheral vascular disease	7	5	1.31 [0.92, 1.86]
- Renal disease	14	10	1.40 [1.08, 1.82]*
- Rheumatic disease	5	3	1.48 [0.97, 2.27]
- At least one of the listed comorbidities	40	36	1.21 [1.01, 1.44]*

Auto-immune diseases and potentially related conditions in baseline phase (%)
- Chronic fatigue syndrome	1	<1	2.16 [0.88, 5.32]
- Postural tachycardia syndrome without hypotension	<1	<1	-
- Fibromyalgia	4	2	2.25 [1.32, 3.84]*
- Migraine	10	5	2.22 [1.57, 3.14]***
- At least one of the listed conditions	13	6	2.27 [1.67, 3.08]***

Individuals admitted 0-14 days post-infection (%)
- Hospitalized	5	1	4.74 [2.58, 8.70]***
- ICU admission	3	1	2.63 [1.34, 5.15]*
- Intubated	3	1	2.29 [1.26, 4.16]*

Odds ratios that are statistically significant (p-value < 0.05) are indicated with *, and those that are highly significant (p-value < 0.001) are indicated with ***. Odds ratios for comparisons with <1% of patients in both cohorts are not shown. The matched clinical characteristics for these two cohorts are provided in Table S4.

¹ Patients who have received COVID-19 vaccine doses from multiple manufacturers are also included here.

Cough, difficulty breathing, and tiredness are the most commonly reported conditions for the long COVID cohort in the post-COVID-19 infection phase

Next, we compared the rates of phenotypes reported in the clinical notes for the long COVID and control cohorts. For each phenotype, we observed higher rates in the long COVID cohort compared to the control cohort during both the acute COVID-19 and post-COVID-19 phases (Figure 1B, Figure S2). Overall phenotype reporting was highest immediately following incidence of COVID-19, and for the long COVID cohort we found that phenotype reporting was increased compared to baseline reporting throughout (Figure S2). In contrast, phenotype reporting in the control cohort was back at baseline levels within 20 days of their positive PCR test (Figure S2). For the long COVID cohort, almost all of the phenotypes were reported at lower rates during the post-COVID-19 phase compared to the acute COVID-19 phase, with the exception of brain fog (increase from <1% to 3%) and sleep problems (increase from 2% to 3%), which were both higher during the post-COVID-19 phase. The most common phenotypes in the long COVID cohort during the post-COVID-19 phase were cough (14%), difficulty breathing (12%), and tiredness (10%).

Comparison of patient characteristics before infection

To identify features associated with a higher risk of developing post-COVID-19 conditions, we assessed differences during the baseline phase. We observed that patients with chronic pulmonary disease (CPD) had higher rates of long COVID diagnosis (odds ratio: 1.94, 95% CI: [1.48, 2.55]) (Table 1). This subpopulation of patients with CPD included patients with asthma, COPD, emphysema, and bronchiectasis (Figure S3). We also observed that individuals with renal disease had higher rates of long COVID diagnosis (odds ratio: 1.40, 95% CI: [1.08, 1.82]). Auto-immune diseases and related conditions, including migraine (odds ratio: 2.22, 95% CI: [1.57, 3.14]) and fibromyalgia (odds ratio: 2.25, 95% CI: [1.32, 3.84]), were also more common as pre-existing conditions in the long COVID cohort (Table 1).

Comparison of lab test measurements during acute infection

To determine whether there are clinical signatures of acute COVID-19 disease indicative of increased risk for subsequent post-COVID-19 conditions, we assessed differences during the acute COVID-19 phase. We observed differences consistent with increased acute disease severity in the long COVID cohort compared with their matched controls. In the long COVID cohort, hospital admission rates (within 14 days of infection) were significantly increased (Table 1, rate_longCOVID: 5% vs. rate_control: 1%, p-value: <0.001, odds ratio: 4.74 [2.58, 8.70]). Similarly, ICU admission rates were significantly higher in the long COVID cohort (rate_longCOVID: 3% vs. rate_control: 1%, p-value: <0.01, odds ratio: 2.63 [1.34, 5.15]).

To assess whether laboratory measurements could predict onset of long COVID, we analyzed measurements for 82 tests contributed by more than ten patients in both the long COVID and the control cohorts during acute SARS-CoV-2 infection (Table S5). For 15 lab tests, the long COVID cohort exhibited a significant difference in mean test results (p-value < 0.05) during the acute phase, compared both to the control cohort during the acute phase and to the long COVID cohort during the baseline phase. Further, we compared the test results for these 15 lab tests to their known normal ranges (shaded region, Figure 1C and Figures S4 and S5). Six of these 15 tests in the long COVID cohort exhibited mean test results outside the normal range in the acute phase (Figure 1C). Specifically, we observed increased levels of neutrophil-lymphocyte ratio (mean_longCOVID: 10.7, 95% CI: [7.9, 14.3] vs. mean_control: 7.2 [3.9, 11.0]), alanine aminotransferase (42.4 [38.0, 47.6] U/L vs. 36.7 [28.9, 46.4] U/L), and serum triglyceride (278.5 [203.5, 372.7] mg/dL vs. 141.4 [104.3, 187.0] mg/dL). We also observed decreased levels of serum HDL cholesterol (38.4 [32.9, 44.6] mg/dL vs. 52.5 [45.5, 60.3] mg/dL).
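The screening logic in this paragraph is a sequence of three filters, which the following sketch makes explicit. The record layout, field names, and function name are hypothetical, and any normal range passed in is supplied by the caller; no reference ranges from the paper are reproduced here.

```python
def screen_lab_tests(tests, alpha=0.05, min_patients=10):
    """Apply the three screening steps to per-test summary records.

    Returns (name, outside_normal_range) for each test that has enough
    patients in both cohorts and differs significantly from both the
    control cohort and the long COVID baseline.
    """
    results = []
    for t in tests:
        # Step 1: more than ten contributing patients in both cohorts.
        enough_data = t["n_long"] > min_patients and t["n_control"] > min_patients
        # Step 2: significant against controls AND against own baseline.
        significant = t["p_vs_control"] < alpha and t["p_vs_baseline"] < alpha
        if not (enough_data and significant):
            continue
        # Step 3: is the acute-phase mean outside the normal range?
        low, high = t["normal_range"]
        outside = not (low <= t["mean_long_acute"] <= high)
        results.append((t["name"], outside))
    return results
```

Requiring significance against both comparators separates changes tied to the acute infection in long COVID patients from differences that were already present at baseline.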

Concordant signals of more severe acute disease in long COVID patients are also found in the medications ordered and administered during both the acute and post-acute phases (Figure S6, Table S6). Notably, antivirals, anticoagulants, and steroids were administered at significantly higher rates in the long COVID cohort. We did not observe a significant difference for monoclonal antibodies. Administration of albuterol was already elevated during the baseline phase, consistent with a higher prevalence of CPD in the long COVID cohort (Figure S7, Table S6).

Discussion

In this study, we provide an in-depth characterization of a cohort of 1,086 patients diagnosed with long COVID compared with a matched control cohort. We found that the long COVID cohort was significantly enriched in patients with a history of CPD, fibromyalgia, and migraine. Additionally, we found that the patients who developed long COVID showed signs of more severe COVID-19 during their acute infection (0 to 14 days after infection) based on hospitalization, lab measurements, and medications administered.

Our findings are consistent with previous studies that have investigated long COVID signs and symptoms. We found that the most common phenotypes reported by long COVID patients included cough, breathing difficulties, tiredness, and heart palpitations.¹³,¹⁵ We also found that long COVID patients exhibited low HDL cholesterol and high triglyceride levels in their serum during the acute COVID-19 phase, consistent with a previous retrospective study of 1,411 hospitalized COVID-19 patients.¹⁶ Previous studies have also shown that low albumin levels and elevated transaminases (ALT, AST) are associated with severe COVID-19 outcomes, and we observed the same to a lesser extent (Figure S4).¹⁷⁻¹⁹ Several of the long COVID characteristics observed in this study were also shown to be important variables for a recently described predictor of long COVID, including difficulty breathing, dyspnea, cough, hospitalization, albuterol use, and CPD.¹⁵

In addition, patients in both the long COVID cohort and the control cohort had elevated serum glucose levels during the acute COVID-19 phase. In prior work, cases of metabolic dysfunction during and after SARS-CoV-2 infection have been reported, ranging from new-onset diabetes mellitus (both Type 1 and Type 2) to asymptomatic insulin resistance and glucose intolerance.²⁰⁻²³ Acute SARS-CoV-2 infection can lead to metabolic dysfunction, including abnormal lipid profiles and sustained elevation in plasma glucose, through the chronic elaboration of cytokines, glucocorticoid treatment, and sustained stress related to severe infection and comorbidity.¹⁶,²⁰,²⁴ There is evidence that these metabolic disturbances may be related to viral persistence in adipose tissue.²⁵ Data from the current study also point to the potential for a metabolic signature. For example, changes in glucose disposal and lipid handling during the acute illness phase may be an early signal of further symptomatology and long COVID.

There are several limitations to this analysis. First, this is a retrospective study carried out in a single multi-state health system, so the clinical characteristics of the study population are not representative of the entire population of patients with post-COVID-19 conditions. Second, the ICD-10 code for post-COVID-19 conditions only became available in the United States on October 1, 2021,²⁶ so this analysis was restricted to long COVID cases reported during the Delta and Omicron waves of the pandemic. Third, although we control for demographics, health system site, date of infection, and the number of prior lab test encounters, additional confounding factors might explain the differences between the long COVID and control cohorts. For example, individuals in the long COVID cohort may engage in more health-seeking behaviors and thus have higher rates of reported comorbidities than the control group. Additionally, since patients in the long COVID cohort were more likely to be hospitalized, their EHR data may be more complete. More complete recording may partially explain the higher observed rates of medications or disease symptoms in the long COVID cohort. We therefore cannot infer causal relationships between the observed enrichments and long COVID incidence. Fourth, not all of the patients in the long COVID and control cohorts underwent laboratory testing, so the observed distributions of lab values may not represent the distribution of lab values in the overall cohort. For example, some lab tests are ordered only in cases of a suspected diagnosis. In follow-up studies, methods for imputing missing values may be applied, such as zero imputation, mean imputation, and multiple imputation.²⁷

Overall, this study helps clarify risk factors for long COVID and motivates future research on the relationship between early interventions in COVID-19 and the onset of long COVID. Future studies are needed to define individuals at highest risk for persistent symptomatology and possible interventions to forestall or prevent long COVID.

Supplementary Material

NIHPP2022.11.30.22282831v1-supplement-1.pdf (2 MB, PDF)

References

1.Clarke K. E. N. Seroprevalence of Infection-Induced SARS-CoV-2 Antibodies — United States, September 2021–February 2022. MMWR Morb. Mortal. Wkly. Rep. 71, (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
2.CDC. COVID Data Tracker. Centers for Disease Control and Prevention; https://covid.cdc.gov/covid-data-tracker (2020). [Google Scholar]
3. Tan B. K. J. et al. Prognosis and persistence of smell and taste dysfunction in patients with covid-19: meta-analysis with parametric cure modelling of recovery curves. BMJ 378, (2022).
4. A clinical case definition of post COVID-19 condition by a Delphi consensus, 6 October 2021. https://www.who.int/publications/i/item/WHO-2019-nCoV-Post_COVID-19_condition-Clinical_case_definition-2021.1 (2021).
5. Long COVID. https://www.cdc.gov/nchs/covid19/pulse/long-covid.htm#technical_notes (2022).
6. Subramanian A. et al. Symptoms and risk factors for long COVID in non-hospitalized adults. Nat. Med. 1–9 (2022).
7. O’Laughlin K. N. et al. Study protocol for the Innovative Support for Patients with SARS-COV-2 Infections Registry (INSPIRE): A longitudinal study of the medium and long-term sequelae of SARS-CoV-2 infection. PLoS One 17, e0264260 (2022).
8. RECOVER: Researching COVID to Enhance Recovery. RECOVER: Researching COVID to Enhance Recovery https://recovercovid.org/.
9. Murugadoss K. et al. Building a best-in-class automated de-identification tool for electronic health records through ensemble learning. Patterns (N Y) 2, 100255 (2021).
10. Wagner T. et al. Augmented curation of clinical notes from a massive EHR system reveals symptoms of impending COVID-19 diagnosis. Elife 9, (2020).
11. Venkatakrishnan A. J. et al. Mapping each pre-existing condition’s association to short-term and long-term COVID-19 complications. NPJ Digit Med 4, 117 (2021).
12. McMurry R. et al. Real-time analysis of a mass vaccination effort confirms the safety of FDA-authorized mRNA COVID-19 vaccines. Med (N Y) 2, 965–978.e5 (2021).
13. CDC. Long COVID or Post-COVID Conditions. Centers for Disease Control and Prevention; https://www.cdc.gov/coronavirus/2019-ncov/long-term-effects/index.html (2022).
14. Charlson M. E., Pompei P., Ales K. L. & MacKenzie C. R. A new method of classifying prognostic comorbidity in longitudinal studies: development and validation. J. Chronic Dis. 40, 373–383 (1987).
15. Pfaff E. R. et al. Identifying who has long COVID in the USA: a machine learning approach using N3C data. Lancet Digit Health 4, e532–e541 (2022).
16. Masana L. et al. Low HDL and high triglycerides predict COVID-19 severity. Sci. Rep. 11, (2021).
17. Huang J. et al. Hypoalbuminemia predicts the outcome of COVID-19 independent of age and co-morbidity. J. Med. Virol. 92, 2152 (2020).
18. Chen C. et al. Hypoalbuminemia – An Indicator of the Severity and Prognosis of COVID-19 Patients: A Multicentre Retrospective Analysis. Infect. Drug Resist. 14, 3699 (2021).
19. Wagner J. et al. Elevated transaminases and hypoalbuminemia in Covid-19 are prognostic factors for disease severity. Sci. Rep. 11, 1–5 (2021).
20. Lim S., Bae J. H., Kwon H.-S. & Nauck M. A. COVID-19 and diabetes mellitus: from pathophysiology to clinical management. Nat. Rev. Endocrinol. 17, 11–30 (2021).
21. Montefusco L. et al. Acute and long-term disruption of glycometabolic control after SARS-CoV-2 infection. Nature Metabolism 3, 774–785 (2021).
22. Barrett C. E. Risk for Newly Diagnosed Diabetes >30 Days After SARS-CoV-2 Infection Among Persons Aged <18 Years — United States, March 1, 2020–June 28, 2021. MMWR Morb. Mortal. Wkly. Rep. 71, (2022).
23. Scherer P. E., Kirwan J. P. & Rosen C. J. Post-acute sequelae of COVID-19: A metabolic perspective. (2022) doi: 10.7554/eLife.78200.
24. Reiterer M. et al. Hyperglycemia in acute COVID-19 is characterized by insulin resistance and adipose tissue infectivity by SARS-CoV-2. Cell Metab. 33, 2174 (2021).
25. Martínez-Colón G. J. et al. SARS-CoV-2 infection drives an inflammatory response in human adipose tissue through infection of adipocytes and macrophages. Sci. Transl. Med. (2022) doi: 10.1126/scitranslmed.abm9151.
26. CDC. Public Health Recommendations. Centers for Disease Control and Prevention; https://www.cdc.gov/coronavirus/2019-ncov/hcp/clinical-care/post-covid-public-health-recs.html (2022).
27. Groenwold R. H. H. Informative missingness in electronic health record systems: the curse of knowing. Diagnostic and Prognostic Research 4, 1–6 (2020).
28. Tests and procedures. https://www.mayoclinic.org/tests-procedures (2020).
29. Test catalog - Mayo Clinic Laboratories. https://www.mayocliniclabs.com/test-catalog.
30. Diagnostics & Testing. Cleveland Clinic https://my.clevelandclinic.org/health/diagnostics.
31. Medical Tests. ucsfhealth.org https://www.ucsfhealth.org/medical-tests.

Associated Data

Supplementary Materials

NIHPP2022.11.30.22282831v1-supplement-1.pdf (2 MB, PDF)
