2024 Oct;89(4). doi: 10.1016/j.jinf.2024.106235

Prevalence, risk factors and characterisation of individuals with long COVID using Electronic Health Records in over 1.5 million COVID cases in England

Han-I Wang a,e,⁎,1, Tim Doran a, Michael G Crooks b, Kamlesh Khunti c, Melissa Heightman d, Arturo Gonzalez-Izquierdo e, Muhammad Qummer Ul Arfeen e, Antony Loveless f, Amitava Banerjee e, Christina Van Der Feltz-Cornelis a,b,e
PMCID: PMC11409608  PMID: 39121972

Summary

Objectives

This study examines clinically confirmed long-COVID symptoms and diagnosis among individuals with COVID in England, aiming to understand prevalence and associated risk factors using electronic health records. To further understand long COVID, the study also explored differences in risks and symptom profiles in three subgroups: hospitalised, non-hospitalised, and untreated COVID cases.

Methods

A population-based longitudinal cohort study was conducted using data from 1,554,040 individuals with confirmed SARS-CoV-2 infection via Clinical Practice Research Datalink. Descriptive statistics explored the prevalence of long COVID symptoms 12 weeks post-infection, and Cox regression models analysed the associated risk factors. Sensitivity analysis was conducted to test the impact of right-censoring data.

Results

During an average 400-day follow-up, 7.4% of individuals with COVID had at least one long-COVID symptom after the acute phase, yet only 0.5% had long-COVID diagnostic codes. The most common long-COVID symptoms included cough (17.7%), back pain (15.2%), stomach-ache (11.2%), headache (11.1%), and sore throat (10.0%). The same trend was observed in all three subgroups. Risk factors associated with long-COVID symptoms were female sex, non-white ethnicity, obesity, and pre-existing medical conditions like anxiety, depression, type II diabetes, and somatic symptom disorders.

Conclusions

This study is the first to investigate the prevalence and risk factors of clinically confirmed long-COVID in the general population. The findings could help clinicians identify higher risk individuals for timely intervention and allow decision-makers to more efficiently allocate resources for managing long-COVID.

Keywords: Long COVID, Post SARS-CoV-2, Symptoms, Prevalence, Risk factor

Introduction

The term long-COVID is commonly used to describe signs and symptoms that individuals continue to experience, or newly develop, after an acute infection with COVID-19. Based on the National Institute for Health and Care Excellence (NICE) definition, this includes both ongoing symptomatic COVID-19, persisting for four to 12 weeks, and post-COVID-19 syndrome, with signs and symptoms persisting more than 12 weeks and not explained by an alternative diagnosis.1

Studies have identified over 100 symptoms associated with long-COVID,2, 3, 4, 5, 6, 7 with estimated prevalence of long-COVID ranging from less than 5% among individuals with initially mild infections4, 7 to 70% among hospitalised COVID cases.8 However, the true prevalence of long-COVID and its symptoms in the general population remains unclear. This is because the reported rates vary widely due to country-specific differences, representation of the study population,9 and reliance on self-reporting versus clinical confirmation of symptoms. Self-report studies have two significant limitations. First, potential over-diagnosis without clinical confirmation introduces bias in estimating the prevalence of long-COVID. Second, individuals reporting symptoms may not necessarily seek treatment. The discrepancy between symptom reporting and health service utilisation poses difficulties in planning resource allocation and modelling demand for healthcare interventions. To address these concerns, two matched case-controlled studies have utilised electronic health records (EHRs) to examine the prevalence of long-COVID symptoms in post-hospitalised COVID-19 individuals and non-hospitalised individuals.5, 9 However, the findings of these studies cannot be generalised to the general population.

Several studies have also identified risk factors associated with long-COVID. A study including 4128 general COVID cases indicated that age, female sex, hospitalisation during acute COVID-19, and comorbidities like asthma may increase the risk of long-COVID.4 Another study involving 2712 community-dwelling individuals with COVID showed an association between female sex, smoking, poor self-perceived health status, and the presence of severe long-COVID symptoms.10 However, both studies have limitations in terms of generalisability, as they relied on self-reported symptoms.

Existing studies so far have not compared long-COVID risks or symptoms between individuals treated for COVID-19 in hospital or primary care settings and those who did not receive treatment. Such comparisons are crucial for informing COVID-19 management policies, especially with the development of specialised long-COVID clinics.11 While some research suggests that hospitalised COVID-19 individuals, particularly those intubated, are at higher risk of developing long-COVID compared to outpatients,12 the link between acute infection severity and long-COVID is still under debate. This includes the use of immunosuppressive drugs, given the ongoing debate about their potential protective role.13 Additionally, whether staying home during the acute phase, as instructed during the pandemic, is a risk factor for developing long-COVID compared to receiving treatment remains unclear. To address these issues, a study based on the general population, utilising primary care and hospital data from national healthcare registers and comparing between subgroups, is needed to inform healthcare policy and service planning.

The aim of this study is to provide a comprehensive overview of long-COVID prevalence in the general population, and to assess the associations between demographic and clinical risk factors and having long-COVID among adults following confirmed SARS-CoV-2 infection using national EHRs. To further understand long-COVID, the study also explored differences in risks and symptom profiles in three subgroups of the general population: individuals who received hospital treatment (hospitalised), those who received only primary care treatment (non-hospitalised), and those who received no treatment (untreated) during the acute phase.

Method

Study design and data source

A retrospective, population-based, longitudinal cohort study was conducted using data from both the Clinical Practice Research Datalink (CPRD) AURUM and GOLD databases. CPRD GOLD, the original version of the database covering 9% of the UK population, and CPRD AURUM, the newer version covering 13% of the UK population, are representative of the wider UK population with regard to age and sex.14, 15 To avoid overlap, individuals from CPRD AURUM were excluded if their registered general practices were also included in CPRD GOLD. Overlapping cases were removed from CPRD AURUM rather than CPRD GOLD to maintain statistical power, because AURUM is the larger dataset and removing cases from it has less impact on how well the data represent the population of England.

CPRD data included individual demographics, symptoms, diagnoses, tests and referrals in primary care. They were further linked to the Hospital Episode Statistics–Admitted Patient Care (HES-APC) for specialist care information, the Office for National Statistics (ONS) for Civil Registration of Deaths, the Index of Multiple Deprivation (IMD) for area deprivation, and COVID-19 datasets for COVID diagnosis. The COVID-19 datasets include the national laboratory COVID-19 testing data from the Second Generation Surveillance System (SGSS) and COVID-19 hospital admission data from the COVID-19 Hospitalisation in England Surveillance System (CHESS). Data availability and linkage are summarised in Supplementary Material 1. Because it relies on linked data, this study concerns England only: HES does not cover the whole UK and includes data only for the cohort subset registered with practices in England.

Study population and follow-up period

Individuals aged 18 years or over with a first diagnosis of SARS-CoV-2 infection between 1 January 2020 and 28 February 2021 and registered with their general practice for at least 15 months before the study's index date (to ensure data quality16, 17) were extracted from the linked CPRD data (both GOLD and AURUM). Following the approach suggested by Thygesen and colleagues,18 COVID cases were identified by the presence of at least one of the following events: COVID diagnosis Read v2 codes in GOLD,19 COVID diagnosis SNOMED-CT (Systematized Nomenclature of Medicine Clinical Terms) codes in AURUM,20 COVID-related ICD-10 codes21 in HES-APC, COVID-related ICD-10 codes in ONS death registration, positive polymerase chain reaction (PCR) test results in SGSS,22, 23 or the presence in CHESS. SNOMED-CT is a globally used clinical terminology system for EHRs, while Read v2 is a coded thesaurus of clinical terms used in the NHS since 1985. Although CPRD currently employs different coding systems for GOLD and AURUM, it is gradually transitioning from Read v2 to SNOMED-CT as the Read v2 system nears retirement. The index date (or COVID diagnosis date) was defined as the date of the first COVID-related event. Hence, the study cohort included three types: hospitalised individuals (admitted for acute COVID-19 infection), non-hospitalised individuals (seeking primary care treatment only), and untreated individuals (diagnosed cases without treatment in hospital or primary care), providing a comprehensive view of all COVID cases. Details about resolving disagreements between CPRD and HES in terms of patient characteristics followed Wang and colleagues' approach and have been described elsewhere.16 The final cohort comprised 1,554,040 COVID cases (355,386 from GOLD and 1,198,654 from AURUM). The detailed code lists are described in Supplementary Material 2, and the study flowchart is presented in Supplementary Material 3.
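The index-date rule described above can be illustrated with a minimal Python sketch that takes the earliest COVID-related event across the linked sources. The source labels and the `index_date` helper are hypothetical names chosen for illustration; they are not the study's actual variables or code.

```python
from datetime import date


def index_date(events):
    """Return the date of the earliest COVID-related event, or None if there are none.

    `events` is a list of (source_label, event_date) tuples. The labels are
    hypothetical and only indicate which linked dataset an event came from.
    """
    dates = [event_date for _source, event_date in events]
    return min(dates) if dates else None


# Illustrative (made-up) events for one individual.
example_events = [
    ("SGSS_PCR_POSITIVE", date(2020, 11, 3)),
    ("HES_APC_ICD10", date(2020, 11, 9)),
    ("AURUM_SNOMED", date(2020, 11, 5)),
]

first_event = index_date(example_events)
print(first_event)  # 2020-11-03
```

In this sketch, an individual with no recorded event has no index date and would not enter the cohort.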

The study cohort were followed until the study end date (31 December 2021) using the CPRD data exclusively. The CPRD data were used because of their up-to-date coverage (see Supplementary Material 1) and ability to capture most long-COVID diagnoses and symptoms (e.g. fatigue and dyspnoea), which are more commonly recorded in CPRD than in other linked data sources (e.g. HES-APC).

Long-COVID diagnosis and symptoms

Based on NHS guidance, long-COVID in this study was defined as the presence of either a long-COVID diagnosis or at least one long-COVID symptom occurring from 12 weeks after the index date. The long-COVID diagnosis code list is based on Davis’s study24 and can be found in Supplementary Material 2. Long-COVID diagnosis codes were more comprehensive in SNOMED-CT, as Read v2 was about to be retired. The key 37 long-COVID symptoms, selected from over 100 symptoms reported in the literature, were identified based on NHS guidance,25, 26 a rapid narrative review, and expert opinions. PubMed was searched up to May 15, 2023, using terms “long COVID,” “post COVID-19,” “post-acute COVID,” “long SARS-CoV-2,” combined with “symptom” and/or “risk factor,” with no language restrictions. Symptom code lists were developed rigorously based on the Clinical Research Using Linked Bespoke Studies and Electronic Health Records (CALIBER) phenotypes, the clusters of Quality and Outcomes Framework (QOF) business rules, and expert clinical input (CFC). CALIBER is a research platform that offers consistent data definitions and advanced phenotyping algorithms for specific diseases and conditions, while the QOF clusters, part of the UK's primary care pay-for-performance scheme, identify and describe specific diseases or conditions using a combination of diagnostic codes and clinical indicators. The list of the 37 long-COVID symptoms can be found in Supplementary Material 4. Symptoms that appeared after a second COVID diagnosis were not considered.
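The case definition above can be sketched in Python as follows. This is an illustrative reading only: the `has_long_covid` helper is hypothetical, and two boundary choices are assumptions (a record exactly 12 weeks after the index date counts, and a record on the day of a second COVID diagnosis is still considered).

```python
from datetime import date, timedelta

TWELVE_WEEKS = timedelta(weeks=12)  # 84 days


def has_long_covid(index, record_dates, second_infection=None):
    """Return True if any long-COVID diagnosis or symptom record qualifies.

    A record qualifies if it falls on or after index + 12 weeks (assumed
    inclusive) and, when a second COVID diagnosis date is known, not after
    that date.
    """
    threshold = index + TWELVE_WEEKS
    for record_date in record_dates:
        if record_date < threshold:
            continue  # still within the acute / ongoing-symptomatic window
        if second_infection is not None and record_date > second_infection:
            continue  # recorded after a second COVID diagnosis: not considered
        return True
    return False


index = date(2020, 10, 1)
print(has_long_covid(index, [date(2020, 11, 15)]))  # False
print(has_long_covid(index, [date(2021, 1, 10)]))   # True
```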

Risk factors

Two main types of information were considered as risk factors: baseline characteristics and pre-existing long-term conditions (LTCs). The baseline characteristics included age, sex, ethnic group, area deprivation, and body mass index (BMI), while pre-existing LTCs included physical and mental health conditions recorded prior to the COVID diagnosis date. Following NHS guidance27, 28, 29 and expert opinions (CFC), 39 key LTCs were identified and considered as potential risk factors, including receiving immunosuppressive therapy, which is a combined group of treatments (see Supplementary Material 5). Supplementary Material 6 details the methods used to determine the baseline characteristics during the identification period.

For BMI, weight and height measurements were preferred over codes for obesity. BMI was categorised as underweight (<18.5 kg/m²), normal weight (18.5–24 kg/m²), overweight (25–29 kg/m²), and obese (≥30 kg/m²). Ethnic group was categorised as either white or non-white. Area deprivation was categorised into five quintiles based on IMD. Missing data on ethnic group, area deprivation, BMI, and smoking status were assigned to "unknown" categories within the corresponding variable.
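The BMI categorisation can be sketched in Python as below. The `bmi_category` helper is hypothetical. Because the published bands are given as whole numbers, the sketch assumes half-open intervals (normal: 18.5 to under 25; overweight: 25 to under 30) so that non-integer values such as 24.6 are still classified. Missing measurements fall into the "unknown" category, as in the text.

```python
def bmi_category(weight_kg, height_m):
    """Classify BMI (kg/m^2) from weight and height measurements.

    Returns "unknown" when a measurement is missing or height is not positive.
    Band boundaries are an assumption (half-open intervals), see lead-in.
    """
    if weight_kg is None or height_m is None or height_m <= 0:
        return "unknown"
    bmi = weight_kg / height_m ** 2
    if bmi < 18.5:
        return "underweight"
    if bmi < 25:
        return "normal"
    if bmi < 30:
        return "overweight"
    return "obese"


print(bmi_category(70, 1.75))    # normal (BMI about 22.9)
print(bmi_category(95, 1.75))    # obese (BMI about 31.0)
print(bmi_category(None, 1.75))  # unknown
```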

Statistical methods

Descriptive statistics were used to summarise the cohort and the prevalence of long-COVID symptoms. Continuous variables, such as age, were summarised using means and standard deviations, and categorical variables, such as area deprivation and the presence of long-COVID symptoms, using frequencies.

To understand long-COVID symptoms and risk factors in three study subgroups (hospitalised, non-hospitalised, and untreated COVID-19 individuals), prevalence of long-COVID symptoms and clinical risk factors between individuals with long-COVID and those without long-COVID within and between subgroups were compared. Mann-Whitney U tests were used for continuous variables, and Chi-square or Fisher's exact tests for discrete variables, as appropriate.

To explore the risk factors for having long-COVID, a Cox regression model was employed to handle right censoring in the data, which occurs when the event of interest has not occurred for some individuals within the study period. Having long-COVID or not was treated as the outcome variable, while the baseline and clinical characteristics were treated as risk factors. Both univariate and multivariable Cox regression models were performed. The multivariable model was adjusted for five baseline characteristics (age, sex, ethnicity, area deprivation, and BMI) and 42 pre-existing LTCs (33 physical and 9 mental health conditions) listed in Table 1. Baseline characteristics used the largest group as the reference (male, white, least deprived, and normal BMI), and clinical characteristics used the normative group (‘No’) as the reference. Hazard ratios (HRs) and 95% confidence intervals (CIs) were reported, and a P value (two-sided) less than 0.05 was considered to be statistically significant. The variance inflation factors (VIF) and tolerance values (1/VIF) were used to test for multicollinearity,30 with a VIF greater than 2.50 taken as an indicator of multicollinearity.31 The test of scaled Schoenfeld residuals was used to assess the proportional hazard assumption.32
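The right censoring handled by the Cox model can be illustrated with a minimal Python sketch of how each individual's follow-up time and event indicator might be derived. The `time_to_event` helper is hypothetical, and censoring at the study end date only is an assumption made for illustration; the study's own derivation is not described at this level of detail.

```python
from datetime import date

STUDY_END = date(2021, 12, 31)  # study end date stated in the text


def time_to_event(index, first_long_covid, study_end=STUDY_END):
    """Return (days_from_index, event_observed) for one individual.

    `first_long_covid` is the date of the first qualifying long-COVID record,
    or None if no such record exists. Individuals without a record on or
    before the study end are right-censored at the study end (assumption).
    """
    if first_long_covid is not None and first_long_covid <= study_end:
        return (first_long_covid - index).days, True
    return (study_end - index).days, False


print(time_to_event(date(2020, 12, 1), date(2021, 4, 1)))  # (121, True)
print(time_to_event(date(2020, 12, 1), None))              # (395, False)
```

Under this assumption, the latest and earliest eligible index dates (28 February 2021 and 1 January 2020) give censoring times of 306 and 730 days, which matches the follow-up range reported in Table 1.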

Table 1.

Baseline and clinical characteristics of individuals diagnosed with COVID (n = 1,554,040).

Total N (%) Hospitalised N (%) Non-hospitalised N (%) Untreated N (%) P value
Total 1,554,040 (100.0%) 171,662 (100.0%) 416,358 (100.0%) 966,020 (100.0%)
Age <.001
 Mean (sd) 45.2 (17.9) 64.0 (19.2) 44.4 (16.8) 42.1 (15.9)
 Median (min-max) 42.8 (18−99) 65.1 (18−99) 43.0 (18−99) 40.0 (18−99)
Sex <.001
 Male 690,424 (44.4%) 88,997 (51.8%) 183,126 (44.0%) 418,301 (43.3%)
 Female 863,607 (55.6%) 82,664 (48.2%) 233,231 (56.0%) 547,712 (56.7%)
 Unknown 9 (0.0%) 1 (0.0%) 1 (0.0%) 7 (0.0%)
Ethnicity <.001
 White 1,018,497 (65.5%) 126,241 (73.5%) 266,476 (64.0%) 625,780 (64.8%)
 Non-white 356,354 (22.9%) 37,704 (22.0%) 109,039 (26.2%) 209,611 (21.7%)
 Unknown 179,189 (11.6%) 7717 (4.5%) 40,843 (9.8%) 130,629 (13.5%)
Deprivation <.001
 1 (most deprived) 238,595 (15.4%) 24,730 (14.4%) 61,532 (14.8%) 152,333 (15.8%)
 2 266,363 (17.1%) 28,256 (16.5%) 67,824 (16.3%) 170,283 (17.6%)
 3 291,140 (18.7%) 31,460 (18.3%) 72,094 (17.3%) 187,586 (19.4%)
 4 360,882 (23.2%) 40,694 (23.7%) 90,399 (21.7%) 229,789 (23.8%)
 5 (least deprived) 361,415 (23.3%) 45,980 (26.8%) 93,017 (22.3%) 222,418 (23.0%)
 Unknown 35,645 (2.3%) 542 (0.3%) 31,492 (7.6%) 3611 (0.4%)
Body mass index (BMI) <.001
 Normal 729,143 (46.9%) 83,374 (48.6%) 255,247 (61.3%) 390,522 (40.4%)
 Overweight 198,353 (12.8%) 25,044 (14.6%) 29,368 (7.1%) 143,941 (14.9%)
 Obesity 228,200 (14.7%) 38,489 (22.4%) 81,971 (19.7%) 107,740 (11.2%)
 Underweight 46,942 (3.0%) 2743 (1.6%) 7089 (1.7%) 37,110 (3.8%)
 Uncertain/unknown 351,402 (22.6%) 22,012 (12.8%) 42,683 (10.2%) 286,707 (29.7%)
Long-term physical conditions
 Type II diabetes mellitus (T2DM) 514,960 (33.1%) 90,809 (52.9%) 192,760 (46.3%) 231,391 (24.0%) <.001
 Asthma 485,470 (31.2%) 62,521 (36.4%) 145,221 (34.9%) 277,728 (28.8%) <.001
 Hypertension 479,890 (30.9%) 106,858 (62.3%) 143,302 (34.4%) 229,730 (23.8%) <.001
 Chronic obstructive pulmonary disease (COPD) 204,221 (13.1%) 51,662 (30.1%) 45,755 (11.0%) 106,804 (11.1%) <.001
 Migraine 132,443 (8.5%) 14,841 (8.7%) 48,771 (11.7%) 68,831 (7.1%) <.001
 Chronic heart disease 96,807 (6.2%) 43,991 (25.6%) 21,526 (5.2%) 31,290 (3.2%) <.001
 Unspecified diabetes mellitus 72,820 (4.7%) 27,002 (15.7%) 26,215 (6.3%) 19,603 (2.0%) <.001
 Atrial Fibrillation (AF) 59,900 (3.9%) 30,632 (17.8%) 10,223 (2.5%) 19,045 (2.0%) <.001
 Stroke 56,035 (3.6%) 24,700 (14.4%) 11,124 (2.7%) 20,211 (2.1%) <.001
 Stable angina 49,374 (3.2%) 22,894 (13.3%) 9976 (2.4%) 16,504 (1.7%) <.001
 Heart failure (HF) 46,356 (3.0%) 27,023 (15.7%) 6949 (1.7%) 12,384 (1.3%) <.001
 Type I diabetes mellitus (T1DM) 46,191 (3.0%) 17,670 (10.3%) 14,285 (3.4%) 14,236 (1.5%) <.001
 Allergy 41,539 (2.7%) 7238 (4.2%) 11,414 (2.7%) 22,887 (2.4%) <.001
 Bronchitis 39,886 (2.6%) 7415 (4.3%) 12,435 (3.0%) 20,036 (2.1%) <.001
 Bronchiolitis 36,668 (2.4%) 21,027 (12.3%) 4799 (1.2%) 10,842 (1.1%) <.001
 Myocardial infarction (MI) 29,521 (1.9%) 14,061 (8.2%) 5813 (1.4%) 9647 (1.0%) <.001
 Rheumatoid arthritis 24,302 (1.6%) 8675 (5.1%) 6784 (1.6%) 8843 (0.9%) <.001
 Transient ischaemic attack 22,522 (1.5%) 10,396 (6.1%) 4649 (1.1%) 7477 (0.8%) <.001
 Percutaneous coronary intervention 19,611 (1.3%) 8717 (5.1%) 3812 (0.9%) 7082 (0.7%) <.001
 Peripheral arterial disease 19,260 (1.2%) 10,928 (6.4%) 3072 (0.7%) 5260 (0.5%) <.001
 Aortic valve disease 15,794 (1.0%) 8163 (4.8%) 2858 (0.7%) 4773 (0.5%) <.001
 Multiple Valve Disease 15,621 (1.0%) 9333 (5.4%) 2028 (0.5%) 4260 (0.4%) <.001
 Mitral Valve Disease 14,324 (0.9%) 6656 (3.9%) 2986 (0.7%) 4682 (0.5%) <.001
 Unstable angina 14,462 (0.9%) 7225 (4.2%) 2446 (0.6%) 4791 (0.5%) <.001
 Emphysema 11,740 (0.8%) 7243 (4.2%) 1517 (0.4%) 2980 (0.3%) <.001
 Coronary artery bypass graft (CABG) 9031 (0.6%) 4698 (2.7%) 1669 (0.4%) 2664 (0.3%) <.001
 Abdominal aortic aneurysm 6952 (0.5%) 4118 (2.4%) 979 (0.2%) 1855 (0.2%) <.001
 Cardiac arrest 7154 (0.5%) 3564 (2.1%) 1171 (0.3%) 2419 (0.3%) <.001
 Congenital malformation of the cardiac septa 6290 (0.4%) 873 (0.5%) 1751 (0.4%) 3666 (0.4%) 0.414
 Immunosuppression therapy 6330 (0.4%) 1609 (0.9%) 1127 (0.3%) 3594 (0.4%) <.001
 Dilated cardiomyopathy 2556 (0.2%) 1198 (0.7%) 481 (0.1%) 877 (0.1%) <.001
 Hypertrophic cardiomyopathy 1184 (0.1%) 470 (0.3%) 285 (0.1%) 429 (0.0%) <.001
 Rheumatoid arthritis organ manifestation 219 (0.0%) 84 (0.1%) 62 (0.0%) 73 (0.0%) 0.102
Long-term mental conditions
 Anxiety 384,558 (24.8%) 57,058 (33.2%) 142,472 (34.2%) 185,028 (19.2%) <.001
 Depression 352,334 (22.7%) 58,748 (34.2%) 116,076 (27.9%) 177,510 (18.4%) <.001
 Somatic Symptom Disorders (SSD) 56,102 (3.6%) 7629 (4.4%) 24,380 (5.9%) 24,093 (2.5%) <.001
 Delirium 31,236 (2.0%) 17,814 (10.4%) 3250 (0.8%) 10,172 (1.1%) <.001
 Psychosis 20,537 (1.3%) 6081 (3.5%) 4250 (1.0%) 10,206 (1.1%) <.001
 Bipolar 15,913 (1.0%) 4851 (2.8%) 2932 (0.7%) 8130 (0.8%) <.001
 Schizophrenia 14,784 (1.0%) 4279 (2.5%) 3167 (0.8%) 7338 (0.8%) <.001
 Post-traumatic stress disorder (PTSD) 8159 (0.5%) 2173 (1.3%) 1255 (0.3%) 4731 (0.5%) <.001
 Functional neurological disorder (FND) 2359 (0.2%) 689 (0.4%) 524 (0.1%) 1146 (0.1%) <.001
Long-COVID
 Yes 119,725 (7.7%) 14,574 (8.5%) 94,306 (22.7%) 10,845 (1.1%)
 Long-COVID diagnosis 8233 (0.5%) 2254 (1.3%) 5727 (1.4%) 252 (0.0%)
 Long-COVID symptoms 115,590 (7.4%) 13,673 (8.0%) 91,210 (21.9%) 10,707 (1.1%)
 No 1,434,315 (92.3%) 157,088 (91.5%) 322,052 (77.4%) 955,175 (98.9%)
Follow-up time (days)
 Mean (sd) 400.9 (76.0) 448.9 (117.7) 392.0 (62.4) 396.2 (68.5)
 Median (min-max) 373 (306−730) 394 (306−728) 371 (306−730) 372 (306−724)

A sensitivity analysis was conducted to assess the impact of data censoring. The study cohort was followed for seven months, and a multivariable logistic regression model was employed to derive the odds ratios (ORs) for all the risk factors. The aim was to check the robustness of the primary analysis results by comparing the HRs from the primary analysis with the ORs from the sensitivity analysis. Data importing and extraction were performed using R version 4.1.2, while data manipulation and all data analysis were conducted using SAS version 9.4 (SAS Institute, USA).

Ethical approval and consent to participate

A data-use agreement for CPRD records and linked HES and Office for National Statistics mortality data was granted by the CPRD Independent Scientific Advisory Committee (protocol number: 22_001739).33 Individual consent is not required for observational CPRD studies, but individuals can opt out of contributing to the database.

Role of the funding source

The study was funded by the National Institute for Health and Care Research (NIHR, COV-LT2–0043) as part of the STIMULATE-ICP study. The funders of the study had no role in the study design, data collection, data analysis, data interpretation, or writing of the report. The views expressed in this publication are those of the author(s) and not necessarily those of the NIHR or the Department of Health and Social Care.

Results

Baseline and clinical characteristics

Based on the CPRD GOLD and AURUM linked data, in total, 1,554,040 adults with a first diagnosis of COVID during the period 1 January 2020 to 28 February 2021 were identified and followed until the study end date (31 December 2021), providing an overview of the population during the first/second wave and pre-vaccination phase. Table 1 shows the demographic and clinical characteristics of the individuals with COVID, both overall and by subgroups. Overall, the mean age was 45.2 years (sd = 17.9), 55.6% were female, 65.5% were white, and cases were more frequent in less deprived areas. The most common pre-existing LTCs were type 2 diabetes mellitus (T2DM, 33.1%), asthma (31.2%), hypertension (30.9%), anxiety (24.8%) and depression (22.7%). Individuals' demographic and clinical characteristics in CPRD GOLD and CPRD AURUM were similar.

Regarding the subgroups, hospitalised COVID individuals were older, predominantly white (73.5%), and more likely to be obese and have pre-existing LTCs such as T2DM and hypertension, compared to non-hospitalised or untreated COVID individuals. However, some pre-existing LTCs, such as Somatic Symptom Disorders (SSD), were more commonly seen in non-hospitalised COVID individuals than in other subgroups. Subgroup analysis further revealed that, in each subgroup, individuals with long-COVID had significantly higher proportions of pre-existing physical and mental LTCs (p < .001) compared to those without long-COVID (Supplementary Table 7). However, some pre-existing LTCs, such as congenital malformation of the cardiac septa, dilated cardiomyopathy, and rheumatoid arthritis with organ manifestation, did not show significant differences in the hospitalised and untreated subgroups. Lastly, a higher proportion of non-hospitalised individuals had long-COVID compared to hospitalised and untreated COVID individuals.

Long-COVID diagnosis and symptoms

In total, after at least 12 weeks’ follow-up following the index date, 119,725 out of 1,554,040 (7.7%) COVID cases were considered to have long-COVID, as they had either been diagnosed with long-COVID (0.5%) or presented with at least one of the pre-defined long-COVID symptoms (7.4%).
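The percentages above can be reproduced from the counts in Table 1 with a few lines of Python. The variable names and the `pct` helper are illustrative. The overlap figure is derived by inclusion-exclusion and assumes that the 119,725 long-COVID cases are exactly the union of the diagnosis and symptom groups, as the case definition implies.

```python
cohort = 1_554_040            # all COVID cases (Table 1)
long_covid_any = 119_725      # diagnosis and/or at least one symptom
long_covid_diagnosis = 8_233  # long-COVID diagnostic code recorded
long_covid_symptom = 115_590  # at least one long-COVID symptom recorded


def pct(numerator, denominator):
    """Percentage rounded to one decimal place."""
    return round(100 * numerator / denominator, 1)


print(pct(long_covid_any, cohort))        # 7.7
print(pct(long_covid_diagnosis, cohort))  # 0.5
print(pct(long_covid_symptom, cohort))    # 7.4

# Inclusion-exclusion: individuals with both a diagnosis and a symptom record.
both = long_covid_diagnosis + long_covid_symptom - long_covid_any
print(both)  # 4098
```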

Table 2 shows the recorded diagnosis and symptoms of long-COVID. Overall, the most frequently recorded long-COVID symptoms were cough (17.8%), back pain (15.3%), stomach-ache (11.4%), headache (11.2%), sore throat (10.0%), fatigue (8.1%) and chest pain (8.0%). Similar frequently recorded long-COVID symptoms were observed in each of the three subgroups, with dyspnoea (7.4%), dizziness (6.9%) and rash (6.8%) also common among hospitalised COVID individuals. Brain fog and related symptoms, such as memory loss, occurred more often in the hospitalised subgroup. However, fatigue and insomnia, often associated with brain fog, were similar across all subgroups. Subgroup analysis further revealed that, in each subgroup, individuals with long-COVID had a significantly higher proportion of all the long-COVID symptoms (p < .001) compared to those without long-COVID (Supplementary Table 8).

Table 2.

Long-COVID diagnosis and symptoms (N = 119,725).

Total N (%) Treated in hospitals N (%) Treated at primary care only N (%) Not treated N (%) P value
Long COVID total 119,725 (100%) 14,574 (100%) 94,306 (100%) 10,845 (100%)
 Long COVID diagnosis 8233 (6.9%) 2254 (15.5%) 5727 (6.1%) 252 (2.3%)
 Long COVID symptom 115,590 (96.6%) 13,673 (93.8%) 91,210 (96.7%) 10,707 (98.7%)
  Cough 21,246 (17.8%) 3187 (21.9%) 16,393 (17.4%) 1666 (15.4%) <.001
  Back pain 18,363 (15.3%) 2111 (14.5%) 14,511 (15.4%) 1741 (16.1%) 0.002
  Stomach-ache 13,624 (11.4%) 1682 (11.5%) 10,633 (11.3%) 1309 (12.1%) 0.038
  Headache 13,458 (11.2%) 1211 (8.3%) 10,928 (11.6%) 1319 (12.2%) <.001
  Sore throat 11,990 (10.0%) 977 (6.7%) 9776 (10.4%) 1237 (11.4%) <.001
  Fatigue 9644 (8.1%) 1148 (7.9%) 7715 (8.2%) 781 (7.2%) 0.001
  Chest pain 9610 (8.0%) 1553 (10.7%) 7294 (7.7%) 763 (7.0%) <.001
  Rash 7054 (5.9%) 993 (6.8%) 5370 (5.7%) 691 (6.4%) <.001
  Dizziness 6634 (5.5%) 999 (6.9%) 5087 (5.4%) 548 (5.1%) <.001
  Joint pain 5831 (4.9%) 743 (5.1%) 4601 (4.9%) 487 (4.5%) 0.081
  Vaginal discharge 5369 (4.5%) 288 (2.0%) 4538 (4.8%) 543 (5.0%) <.001
  Dyspnoea 4125 (3.5%) 1082 (7.4%) 2750 (2.9%) 293 (2.7%) <.001
  Nausea 3987 (3.3%) 850 (5.8%) 2782 (3.0%) 355 (3.3%) <.001
  Diarrhoea 3849 (3.2%) 890 (6.1%) 2635 (2.8%) 324 (3.0%) <.001
  Palpitation 3714 (3.1%) 424 (2.9%) 2948 (3.1%) 342 (3.2%) 0.354
  Menorrhagia 2996 (2.5%) 187 (1.3%) 2514 (2.7%) 295 (2.7%) <.001
  Insomnia 2633 (2.2%) 417 (2.9%) 1987 (2.1%) 229 (2.1%) <.001
  Hair loss 2427 (2.0%) 522 (3.6%) 1738 (1.8%) 167 (1.5%) <.001
  Pin and needle 2119 (1.8%) 275 (1.9%) 1686 (1.8%) 158 (1.5%) 0.024
  Erectile dysfunction 1760 (1.5%) 339 (2.3%) 1287 (1.4%) 134 (1.2%) <.001
  Tinnitus 1594 (1.3%) 180 (1.2%) 1292 (1.4%) 122 (1.1%) 0.060
  Blocked nose 1400 (1.2%) 145 (1.0%) 1147 (1.2%) 108 (1.0%) 0.015
  Eye irritation 1386 (1.2%) 245 (1.7%) 1014 (1.1%) 127 (1.2%) <.001
  Fever 1429 (1.2%) 306 (2.1%) 1000 (1.1%) 123 (1.1%) <.001
  Loss of appetite 1279 (1.1%) 350 (2.4%) 823 (0.9%) 106 (1.0%) <.001
  Memory loss 882 (0.7%) 303 (2.1%) 517 (0.6%) 62 (0.6%) <.001
  Walk disturbances 765 (0.6%) 305 (2.1%) 401 (0.4%) 59 (0.5%) <.001
  Muscle pain 695 (0.6%) 158 (1.1%) 486 (0.5%) 51 (0.5%) <.001
  Other symptoms 626 (0.5%) 507 (3.5%) 110 (0.1%) 9 (0.1%) <.001
  Change/Loss of smell 615 (0.5%) 36 (0.3%) 530 (0.6%) 49 (0.5%) <.001
  Brain fog 533 (0.5%) 183 (1.3%) 309 (0.3%) 41 (0.4%) <.001
  Change/Loss of taste 377 (0.3%) 42 (0.3%) 306 (0.3%) 29 (0.3%) 0.500
  Reduced libido 196 (0.2%) 13 (0.1%) 161 (0.2%) 22 (0.2%) 0.044
  Ejaculation difficulties 110 (0.1%) 7 (0.1%) 96 (0.1%) 7 (0.1%) 0.084
  Body pain 83 (0.1%) 24 (0.2%) 51 (0.1%) 8 (0.1%) <.001

Risk factors of having long-COVID

With a minimum of 12 weeks and a maximum of 7 months follow-up (mean/median 400/373 days, respectively), both univariate (unadjusted) and multivariable (adjusted) Cox regression analyses indicated that being female, non-white, obese, and having pre-existing medical conditions such as hypertension, rheumatoid arthritis, T2DM, chronic heart disease, anxiety, depression, and Somatic Symptom Disorder (SSD) were associated with having long-COVID. The top three most significant risk factors were anxiety (adjusted HR: 1.829; 95% CI: 1.806 to 1.853), SSD (adjusted HR: 1.692; 95% CI: 1.659 to 1.726) and T2DM (adjusted HR: 1.646; 95% CI: 1.624 to 1.668).

Several significant factors associated with a lower risk were also identified, including being in the most deprived area (adjusted HR: 0.955; 95% CI: 0.937 to 0.974), having immunosuppressive therapy (adjusted HR: 0.594; 95% CI: 0.538 to 0.657), and delirium (adjusted HR: 0.541; 95% CI: 0.515 to 0.569). Details of HR and 95% CI are presented in Table 3 and Fig. 1. Multicollinearity is not a concern, as all VIFs are below 2.5. However, some of the reported HRs might be biased (highlighted as † in Table 3) due to not satisfying the proportional hazard assumption.

Table 3.

Risk factors of having long-COVID.

Primary analysis
Sensitivity analysis
Unadjusted HR, 95% CI Adjusted HR, 95% CI Unadjusted OR, 95% CI Adjusted OR, 95% CI
Age 1.005 (1.005 to 1.006)*** 0.993 (0.993 to 0.994)*** 1.006 (1.005 to 1.006)*** 0.992 (0.992 to 0.993)***
Sex
 Male Reference group Reference group Reference group Reference group
 Female 1.594 (1.575 to 1.613)*** 1.171 (1.156 to 1.186)*** 1.691 (1.667 to 1.715)*** 1.217 (1.199 to 1.236)***
Ethnicity
 White Reference group Reference group Reference group Reference group
 Non-white 1.133 (1.118 to 1.148)*** 1.145 (1.130 to 1.161)*** 1.145 (1.127 to 1.163)*** 1.180 (1.160 to 1.200)***
 Unknown 0.512 (0.500 to 0.525)*** 0.788 (0.769 to 0.808)*** 0.498 (0.484 to 0.512)*** 0.786 (0.763 to 0.811)***
Deprivation
 1 (most deprived) 0.794 (0.779 to 0.809)*** 0.955 (0.937 to 0.974)*** 0.774 (0.756 to 0.791)*** 0.945 (0.923 to 0.968)***
 2 0.817 (0.802 to 0.832)*** 0.938 (0.921 to 0.955)*** 0.805 (0.788 to 0.823)*** 0.936 (0.915 to 0.957)***
 3 0.827 (0.812 to 0.841)*** 0.914 (0.898 to 0.930)*** 0.813 (0.796 to 0.830)*** 0.906 (0.887 to 0.926)***
 4 0.878 (0.864 to 0.893)*** 0.922 (0.907 to 0.937)*** 0.869 (0.853 to 0.886)*** 0.916 (0.897 to 0.934)***
 5 (least deprived) Reference group Reference group Reference group Reference group
 Unknown 1.872 (1.819 to 1.926)*** 2.187 (2.124 to 2.253)*** 1.954 (1.887 to 2.023)*** 2.433 (2.344 to 2.527)***
BMI
 Normal Reference group Reference group Reference group Reference group
 Underweight 0.289 (0.275 to 0.304)*** 0.355 (0.337 to 0.374)*** 0.285 (0.268 to 0.302)*** 0.352 (0.332 to 0.375)***
 Overweight 0.266 (0.259 to 0.273)*** 0.312 (0.304 to 0.321)*** 0.259 (0.251 to 0.267)*** 0.306 (0.297 to 0.316)***
 Obesity 1.205 (1.189 to 1.221)*** 1.100 (1.085 to 1.116)*** 1.241 (1.220 to 1.261)*** 1.125 (1.106 to 1.145)***
 Unknown 0.175 (0.171 to 0.180)*** 0.270 (0.263 to 0.277)*** 0.166 (0.161 to 0.171)*** 0.264 (0.256 to 0.272)***
Long-term physical conditions
 Type II diabetes mellitus (T2DM) 2.394 (2.367 to 2.421)*** 1.646 (1.624 to 1.668)*** 2.475 (2.442 to 2.509)*** 1.675 (1.648 to 1.702)***
 Migraine 2.388 (2.353 to 2.424)*** 1.527 (1.503 to 1.551)*** 2.570 (2.524 to 2.617)*** 1.617 (1.566 to 1.648)***
 Unspecified diabetes mellitus 2.267 (2.225 to 2.311)*** 1.408 (1.374 to 1.444)*** 2.394 (2.339 to 2.451)*** 1.430 (1.386 to 1.476)***
 Chronic heart disease 1.555 (1.526 to 1.586)*** 1.344 (1.306 to 1.382)*** 1.702 (1.663 to 1.742)*** 1.423 (1.373 to 1.474)***
 Rheumatoid arthritis 1.783 (1.722 to 1.846)*** 1.292 (1.247 to 1.339)*** 1.923 (1.844 to 2.005)*** 1.339 (1.280 to 1.400)***
 Bronchitis 1.567 (1.522 to 1.613)*** 1.236 (1.200 to 1.273)*** 1.628 (1.572 to 1.686)*** 1.256 (1.210 to 1.303)***
 Mitral Valve Disease 1.410 (1.342 to 1.481)*** 1.161 (1.102 to 1.224)*** 1.535 (1.447 to 1.628)*** 1.166 (1.092 to 1.245)***
 Hypertrophic cardiomyopathy 1.425 (1.201 to 1.691)*** 1.130 (0.951 to 1.342) 1.652 (1.355 to 2.014)*** 1.224 (0.996 to 1.505)
 Chronic obstructive pulmonary disease (COPD) 1.488 (1.467 to 1.510)*** 1.127 (1.106 to 1.148)*** 1.588 (1.561 to 1.616)*** 1.167 (1.140 to 1.194)***
 Transient ischaemic attack 1.357 (1.303 to 1.412)*** 1.118 (1.065 to 1.172)*** 1.484 (1.415 to 1.557)*** 1.128 (1.063 to 1.197)***
 Congenital malformation of the cardiac septa 1.236 (1.140 to 1.340)*** 1.085 (1.000 to 1.176)* 1.297 (1.179 to 1.427)*** 1.107 (1.003 to 1.223)*
 Allergy 1.474 (1.431 to 1.518)*** 1.076 (1.045 to 1.109)*** 1.532 (1.479 to 1.587)*** 1.079 (1.040 to 1.120)***
 Coronary artery bypass graft (CABG) 1.336 (1.253 to 1.422)*** 1.076 (1.005 to 1.153)* 1.393 (1.290 to 1.505)*** 1.022 (0.937 to 1.115)
 Hypertension 1.591 (1.572 to 1.609)*** 1.037 (1.023 to 1.051)*** 1.654 (1.531 to 1.677)*** 1.053 (1.036 to 1.071)***
 Aortic valve disease 1.216 (1.157 to 1.279)*** 1.031 (0.987 to 1.088) 1.336 (1.259 to 1.418)*** 1.049 (0.982 to 1.121)
 Abdominal aortic aneurysm 1.041 (0.960 to 1.128) 1.021 (0.940 to 1.110) 1.148 (1.043 to 1.252)** 1.047 (0.946 to 1.159)
 Rheumatoid arthritis organ manifestation 1.701 (1.184 to 2.442)** 1.003 (0.696 to 1.445) 1.664 (1.052 to 2.632)* 0.909 (0.564 to 1.464)
 Asthma 1.562 (1.544 to 1.580)*** 0.999 (0.985 to 1.013) 1.593 (1.571 to 1.615)*** 0.984 (0.967 to 1.000)
 Unstable angina 1.455 (1.387 to 1.527)*** 0.995 (0.942 to 1.052) 1.592 (1.503 to 1.687)*** 0.999 (0.933 to 1.070)
 Myocardial infarction 1.295 (1.249 to 1.342)*** 0.975 (0.930 to 1.021) 1.384 (1.325 to 1.445)*** 0.973 (0.918 to 1.032)
 Dilated cardiomyopathy 1.107 (0.972 to 1.262) 0.947 (0.828 to 1.083) 1.195 (1.024 to 1.394)* 0.934 (0.793 to 1.100)
 Percutaneous coronary intervention 1.277 (1.222 to 1.335)*** 0.946 (0.895 to 1.001) 1.325 (1.256 to 1.399)*** 0.905 (0.843 to 0.970)**
 Cardiac arrest 1.081 (1.000 to 1.169) 0.922 (0.850 to 0.999)* 1.190 (1.084 to 1.305)** 0.937 (0.849 to 1.034)
 Stable angina 1.428 (1.390 to 1.467)*** 0.920 (0.887 to 0.954)*** 1.552 (1.502 to 1.603)*** 0.925 (0.885 to 0.968)**
 Stroke 1.170 (1.138 to 1.203)*** 0.918 (0.887 to 0.950)*** 1.275 (1.234 to 1.318)*** 0.943 (0.904 to 0.983)**
 Peripheral arterial disease 1.198 (1.144 to 1.253)*** 0.909 (0.867 to 0.954)** 1.325 (1.255 to 1.399)*** 0.932 (0.878 to 0.989)*
 Multiple Valve Disease 1.063 (1.008 to 1.121)* 0.897 (0.846 to 0.952)** 1.195 (1.122 to 1.273)*** 0.915 (0.852 to 0.983)*
 Atrial Fibrillation 1.039 (1.010 to 1.068)** 0.895 (0.866 to 0.925)*** 1.143 (1.106 to 1.182)*** 0.929 (0.892 to 0.967)**
 Emphysema 1.175 (1.108 to 1.246)*** 0.892 (0.838 to 0.950)** 1.314 (1.226 to 1.408)*** 0.896 (0.830 to 0.968)**
 Type I diabetes mellitus (T1DM) 1.841 (1.795 to 1.888)*** 0.856 (0.830 to 0.884)*** 1.961 (1.902 to 2.022)*** 0.864 (0.830 to 0.899)***
 Bronchiolitis 1.297 (1.255 to 1.339)*** 0.847 (0.815 to 0.880)*** 1.468 (1.413 to 1.525)*** 0.869 (0.829 to 0.911)***
 Heart failure 1.112 (1.078 to 1.146)*** 0.847 (0.815 to 0.880)*** 1.241 (1.196 to 1.287)*** 0.869 (0.829 to 0.911)***
 Immunosuppression therapy 0.778 (0.704 to 0.859)*** 0.594 (0.538 to 0.657)*** 0.798 (0.710 to 0.897)** 0.572 (0.507 to 0.646)***
Long-term mental conditions
 Anxiety 2.770 (2.739 to 2.802)*** 1.829 (1.806 to 1.853)*** 2.987 (2.946 to 3.028)*** 1.912 (1.882 to 1.943)***
 Somatic Symptom Disorders (SSD) 3.044 (2.986 to 3.103)*** 1.692 (1.659 to 1.726)*** 3.336 (3.258 to 3.416)*** 1.818 (1.773 to 1.865)***
 Depression 2.222 (2.196 to 2.248)*** 1.291 (1.273 to 1.308)*** 2.399 (2.365 to 2.433)*** 1.344 (1.322 to 1.366)***
 Functional neurological disorder (FND) 1.745 (1.562 to 1.950)*** 0.925 (0.828 to 1.034) 1.866 (1.632 to 2.133)*** 0.903 (0.786 to 1.038)
 Psychosis 1.072 (1.022 to 1.124)** 0.864 (0.821 to 0.909)*** 1.179 (1.115 to 1.246)*** 0.892 (0.839 to 0.949)**
 Schizophrenia 1.034 (0.977 to 1.094) 0.768 (0.723 to 0.816)*** 1.126 (1.054 to 1.203)*** 0.784 (0.729 to 0.843)***
 Bipolar 1.089 (1.032 to 1.149)** 0.734 (0.694 to 0.775)*** 1.194 (1.122 to 1.271)*** 0.742 (0.695 to 0.793)***
 Post-traumatic stress disorder (PTSD) 1.114 (1.034 to 1.200)** 0.676 (0.628 to 0.729)*** 1.197 (1.098 to 1.306)*** 0.667 (0.610 to 0.730)***
 Delirium 0.639 (0.609 to 0.670)*** 0.541 (0.515 to 0.569)*** 0.724 (0.685 to 0.765)*** 0.571 (0.538 to 0.606)***
* p < 0.05.

** p < 0.01.

*** p < 0.001.

Does not satisfy the proportionality assumption.

c statistic = 0.752.

Fig. 1. Risk factors and hazard ratios of having long-COVID.

Sensitivity analysis

To assess the impact of data censoring, a sensitivity analysis was conducted. Using a fixed 7-month follow-up period, a multivariate logistic regression identified the risk factors associated with having long-COVID. As shown in Table 3, the odds ratios (ORs) were similar to the hazard ratios (HRs), suggesting that most long-COVID symptoms are captured within the first seven months after the index date.
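As a minimal sketch of the fixed-window idea behind this sensitivity analysis, the Python snippet below recodes time-to-event records into a binary outcome within a fixed follow-up window, which is the form of outcome a logistic regression needs. The record layout, the function name `binary_outcome_within_window`, the 213-day approximation of seven months, and the choice to drop individuals followed for less than the full window are all illustrative assumptions, not details taken from the study's SAS or R code.

```python
from typing import Optional

# Assumption: seven months approximated as 7 * 30.44 days, rounded.
WINDOW_DAYS = 213


def binary_outcome_within_window(
    days_to_symptom: Optional[int],
    days_followed: int,
    window_days: int = WINDOW_DAYS,
) -> Optional[int]:
    """Recode one time-to-event record as a 0/1 outcome for a fixed window.

    Returns 1 if a long-COVID symptom was recorded within the window,
    0 if the person was followed for the whole window without one,
    and None if follow-up ended too early to classify the person.
    """
    if days_to_symptom is not None and days_to_symptom <= window_days:
        return 1
    if days_followed >= window_days:
        return 0
    return None  # censored before the window closed; excluded in this sketch


# Hypothetical records: (days to first symptom or None, days of follow-up)
records = [(95, 400), (None, 400), (300, 400), (None, 120)]
outcomes = [binary_outcome_within_window(s, f) for s, f in records]
print(outcomes)
```

Under these assumptions, a symptom first recorded after the window closes counts as no event, which is why a fixed-window analysis can miss late-onset symptoms that a time-to-event model with longer follow-up would capture.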

Discussion

Main findings

This is the first study to examine long-COVID diagnoses and symptoms in hospitalised (admitted for acute COVID-19 infection), non-hospitalised (seeking primary care only) and untreated COVID cases (positive cases without treatment in hospital or primary care) using EHRs, providing the most comprehensive view of all COVID cases. Around 7.4% of individuals with COVID had at least one long-COVID symptom that required medical attention. However, few had been recorded as having long-COVID, aligning with the findings from a prior study that solely utilised data from primary care practices in the UK.34 Walker and colleagues have reported similar issues with the coding of long-COVID as a diagnosis versus as symptoms.35, 36 Possible explanations include the timing of the availability of long-COVID diagnostic codes, limited awareness among clinicians regarding using such codes, and variations in criteria used by different clinicians for using long-COVID diagnostic codes.

When considering long-COVID prevalence in the three subgroups, our results show that the highest proportion of long-COVID cases (22.7%) was observed among non-hospitalised COVID individuals, followed by hospitalised COVID cases (8.5%) and untreated cases (1.1%). This finding differs slightly from a Swedish study,12 which found a greater risk of long-COVID among hospitalised COVID individuals compared to outpatients. These differences may be due to our study defining the non-hospitalised subgroup as those receiving primary care only and identifying long-COVID based on both diagnosis and relevant symptoms, which are more likely to be recorded in primary care EHRs. Therefore, whether long-COVID is linked with severity of the acute infection or treatment setting remains unclear and requires further investigation to inform healthcare policy.

Our study reveals that the top five most frequently recorded long-COVID symptoms, 12 weeks after the index date, were cough (17.8%), back pain (15.3%), stomach-ache (11.4%), headache (11.2%), and sore throat (10.0%). The same trend was observed in all three subgroups. The ranking differs from that of previous self-report studies,2, 3, 4, 6, 7 indicating discrepancies between self-reported symptoms and symptoms recorded in EHRs. While fatigue (8.1%) and chest pain (8.0%) remained common, other symptoms commonly reported by individuals, such as brain fog and loss of smell, were not commonly recorded in EHRs, suggesting differences in symptom prioritisation between individuals and clinicians, and indicating that not all self-reported long-COVID symptoms may be selected for treatment. The ranking also differed from that of studies utilising EHRs.5, 9 For instance, one study using CPRD data reported that the most commonly observed symptoms in ‘non-hospitalised individuals’ were pain (3.89%), anxiety (0.97%), depression (0.90%), abdominal pain (0.64%), and cough (0.55%), highlighting the prevalence of cough and stomach ache, which is consistent with our findings.5

Our study found that the risk factors associated with long-COVID were being female, non-white, obese, and having pre-existing physical and mental health conditions. Pre-existing mental health conditions such as anxiety and depression carried a risk of long-COVID similar to that of pre-existing physical health conditions. This is in line with the findings of two US survey studies, which found that pre-existing psychological distress may be a risk factor for long-COVID.37, 38 The underlying mechanism could be twofold. First, depression and anxiety are known to have drivers, such as systemic low-grade inflammation, in a subgroup of people who respond well to anti-inflammatory drugs39; this might reinforce the development of long-COVID. Second, a pre-existing inclination towards depressive or anxiety disorder might impair swift recovery from an acute COVID-19 infection in some way, although the exact mechanism would need further exploration in research. SSD is characterised by severe and persistent distress concerning somatic symptoms, whether in the context of a known medical condition or of so-called Medically Not Yet Explained Symptoms.40 Such pre-existing distress may be aggravated by contracting COVID-19 and be sustained by expectations that recovery may be slow. However, the HRs for pre-existing SSD do not align with other pre-existing mental disorders and should be interpreted cautiously. It should be noted that having pre-existing mental disorders does not automatically mean that people with long-COVID would have current mental disorders. Also, since pre-existing LTCs occurred before acute SARS-CoV-2 infection, mental health conditions should not automatically be considered typical long-COVID symptoms, a view supported by the finding that the odds of developing depression and anxiety rise over time in long COVID.41

Our findings also show that migraine is a risk factor for long-COVID: it increased the hazard by about 50% in the adjusted model and more than doubled it in the unadjusted model. This result is consistent with previous studies42, 43 showing that migraine may predispose to the development of long-COVID with symptoms other than migraine headaches. A possible explanation is that migraine and COVID-19 may share underlying mechanisms, such as increased interleukin-6 (IL-6) levels and neuro-inflammation44 and over-expression of angiotensin-converting enzyme 2 (ACE2) in the central and peripheral nervous systems.45

Factors associated with lower risk were also identified, including residing in the most deprived areas and experiencing delirium, suggesting potential barriers to care and diagnosis for long-COVID in these populations.46 Another potential explanation for the lower risk observed in the most deprived areas is the challenge of detecting long-COVID cases within deprived populations. This could be attributed to underreporting by patients in deprived areas.47 In contrast, patients in affluent areas may demonstrate a higher awareness of the condition and possess greater health literacy, potentially leading to an over-reporting of symptoms.48 Furthermore, although a previous review study has suggested that the role of immune suppression is still under debate,49 our findings demonstrate that the risk of having long-COVID is strongly reduced in people on long-term immunosuppressive therapy. Immunosuppressive therapy was associated with a more than 30% lower hazard of long-COVID, suggesting that long-COVID may be associated with a pro-inflammatory immune response in COVID-19 cases, leading to hyperactivation of T cells, macrophages and killer cells and overproduction of inflammatory mediators.50 This finding warrants further research into the use of immunosuppressants in long-COVID.
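The percentage changes quoted in this discussion follow from the hazard ratios by simple arithmetic: a hazard ratio HR corresponds to a relative change in hazard of (HR - 1) × 100%. The short Python sketch below applies that conversion to the migraine and immunosuppression therapy rows of the risk-factor table. The function name `hr_to_percent_change` is hypothetical, and reading the first two columns of each row as the unadjusted and adjusted HRs is an assumption about the column order; the snippet illustrates the conversion only and is not part of the study's analysis code.

```python
def hr_to_percent_change(hr: float) -> float:
    """Relative change in hazard, in percent, implied by a hazard ratio."""
    return (hr - 1.0) * 100.0


# Hazard ratios copied from the first two columns of the risk-factor table
# (assumed here to be the unadjusted and adjusted estimates, in that order).
hazard_ratios = {
    "Migraine, unadjusted": 2.388,
    "Migraine, adjusted": 1.527,
    "Immunosuppression therapy, unadjusted": 0.778,
    "Immunosuppression therapy, adjusted": 0.594,
}

for label, hr in hazard_ratios.items():
    # A positive value is a higher hazard, a negative value a lower hazard.
    print(f"{label}: HR {hr:.3f} -> {hr_to_percent_change(hr):+.1f}%")
```

On these values, the adjusted migraine HR corresponds to a roughly 53% higher hazard and the adjusted immunosuppression therapy HR to a roughly 41% lower hazard.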

Strengths

The current study has several strengths. First, this is the first long-COVID study that focused on individuals in the general population using EHRs, including hospitalised, non-hospitalised and untreated COVID cases, unlike previous studies that used either CPRD data only to identify non-hospitalised COVID cases34, 51 or HES data for hospitalised COVID cases.9 Such an approach provides a true overview of the long-COVID prevalence. It allows for more accurate estimates, as CPRD-linked data contains more detailed information than other data sources, such as the General Practice Extraction Service (GPES) Data for Pandemic Planning and Research (GDPPR). Second, unlike self-report studies, EHRs allow the prevalence of important and clinically confirmed long-COVID symptoms to be measured. They also allow the individuals who seek medical attention for long-COVID symptoms to be identified. Such information will be helpful for decision-makers to develop services for managing long-COVID. Another strength of this study is that our sensitivity analysis yielded similar results.

Limitations

The study was subject to several limitations. The first is the constraint on data availability. As shown in Supplementary Material 1, the study can only describe the prevalence of long-COVID symptoms and associated risk factors up to December 2021.52 Also, the effects of vaccination and different COVID-19 variants could not be assessed because the vaccines were not widely available before March 2021, and the COVID-19 variant information was not included in the CPRD-linked data. Although more recent data are needed, the current study includes the most recent data and the longest follow-up, compared to a previous study that used similar datasets in non-hospitalised people.5

Second, the definition of long-COVID can affect results about its frequency and associated risk factors,53 such as the number of long-COVID symptoms considered. Currently, no agreed set of long-COVID symptoms is available for identifying long-COVID.53 In this study, 37 key symptoms were predefined based on the NHS guidance, a rapid narrative review and expert opinions. Such a set of symptoms may not be comprehensive but could provide a conservative overview when describing long-COVID.

Third, although individuals in CPRD broadly represent the general population, we cannot ascertain the representativeness of individuals with COVID because not all the individuals who had COVID were recorded in the dataset. SGSS may cover most of the COVID-positive cases up until February 2021 (the latest available data date). However, those who had positive results using home testing kits but did not seek medical care would not be identified from EHRs and were therefore not included in this study. Additional limitations include possible problems with ascertainment and miscoding in the early stages of the pandemic, and the predominance of White individuals (two-thirds of the study population), limiting the representation of other ethnicities. This is a recognised issue,54 highlighting the need to explore long-COVID in minority ethnic populations. In addition, the data linkage between UK-based CPRD and England-based HES data has restricted the study population to individuals registered to CPRD general practices in England, potentially differing from the practices in the UK as a whole. Despite these limitations, the generalisability of our findings was supported by the UK Office for National Statistics, which reported a similar distribution to our study group for characteristics such as age at diagnosis and gender.7

Finally, the current study is limited to individuals with the first COVID diagnosis only and did not consider individuals with multiple infections due to difficulty in distinguishing between ongoing COVID and re-infected COVID in the complex CPRD-linked data. Currently, no standard algorithm for this purpose is available. Hence, our findings may underestimate the true prevalence of long-COVID, and the effect of multiple infections on developing long-COVID requires further investigation.

Conclusion

In conclusion, our study provides an overview of the current observation of long-COVID symptoms and associated risk factors based on reliable health records of a large population-based cohort in England. The findings are useful for clinicians to identify higher-risk individuals for timely intervention and allow decision-makers to more efficiently allocate resources for managing long-COVID.

Author contribution

CFC, AB, MGC, KK and MH acquired funding for the study. CFC, AB, and HW designed and directed the project. TD verified the analytical methods. CFC, AB, TD, MGC, KK, and MH provided their expert insights. AG, MQUA, and HW contributed to data management, data linkage, and data curation. HW processed the data and performed the data analyses. HW took the lead in writing the manuscript. All authors provided critical feedback and helped shape the research, analysis, and manuscript.

Declaration of Competing Interest

The authors declare the following financial interests/personal relationships which may be considered as potential competing interests: Prof. Christina Van Der Feltz-Cornelis received travel and accommodation in the UK for lectures from The Lloyd Register Foundation, honoraria from Janssen UK, and royalties for books on psychiatry. Prof. Christina Van Der Feltz-Cornelis has also received grants from the National Institute for Health Research, British Medical Association, European Union’s Horizon 2020 research programme and the Netherlands Organisation for Health Research and Development. Prof. Amitava Banerjee has received grants and/or fees from the National Institute for Health Research, British Medical Association, UK Research and Innovation, European Union’s Horizon 2020 research and innovation programme, EFPIA, and AstraZeneca. Dr. Michael G Crooks has received grants, fees and/or non-financial support from the National Institute for Health and Care Research, AstraZeneca, Boehringer Ingelheim, Chiesi, GlaxoSmithKline, Philips, and Pfizer. Prof. Kamlesh Khunti was chair of the ethnicity subgroup of the UK Scientific Advisory Group for Emergencies (SAGE) and has acted as a consultant, speaker or received grants for investigator-initiated studies from Astra Zeneca and Pfizer. K.K. (chair) and A.B. are members of the National Long Covid Research Group that informs the chief medical officer for England. All other authors have no conflicts of interest to report.

Acknowledgements

This study was funded by the National Institute for Health and Care Research (NIHR, COV-LT2–0043) as part of the STIMULATE-ICP study. KK is supported by the NIHR Applied Research Collaboration East Midlands (ARC EM) and the NIHR Leicester Biomedical Research Centre (BRC). We would like to express our appreciation to Dr. Laura Pasea for her valuable assistance in preparing the data access protocol for accessing data provided by CPRD and initiating the data linkage process. We would also like to express our thanks to the STIMULATE-ICP PPI group for their feedback on the manuscript. Finally, we would like to extend our gratitude for the support provided by the STIMULATE consortium throughout the project. An up-to-date list of Consortium members can be found at https://www.stimulate-icp.org/team, and STIMULATE-ICP can be contacted at info@stimulate-icp.org.

Code availability

The SAS and R code for cleaning and analysing the data can be provided upon reasonable request.

Footnotes

Appendix A

Supplementary data associated with this article can be found in the online version at doi:10.1016/j.jinf.2024.106235.


Data Availability

Researchers can apply to access Clinical Practice Research Datalink (CPRD) data with linkage to Hospital Episode Statistics (HES) through https://www.cprd.com/. Data sharing agreements with CPRD do not permit data sharing with third parties. All formulae and additional sources of information are presented in the paper and Supplementary materials. The SAS code for cleaning and analysing the data can be provided upon reasonable request.

References

  • 1. National Institute for Health and Care Excellence (NICE), Scottish Intercollegiate Guidelines Network (SIGN), Royal College of General Practitioners (RCGP). COVID-19 rapid guideline: managing the long-term effects of COVID-19; 2022. 〈https://www.nice.org.uk/guidance/ng188/resources/covid19-rapid-guideline-%20managing-the-longterm-effects-of-covid19-pdf-51035515742〉.
  • 2. Nalbandian A., Sehgal K., Madhavan M.V., McGroder C., Stevens J.S., Cook J.R., et al. Post-acute COVID-19 syndrome. Nat Med. 2021;27:601–615. doi: 10.1038/s41591-021-01283-z.
  • 3. Subramanian A., Nirantharakumar K., Hughes S., Williams T., Gokhale K., Chandan J., et al. Assessment of 115 symptoms for long covid (post-covid-19 condition) and their risk factors in non-hospitalised individuals: A retrospective matched cohort study in UK primary care. Nat Med. 2022;28:1706–1714.
  • 4. Sudre C.H., Murray B., Varsavsky T., Graham M.S., Penfold R.S., Bowyer R.C., et al. Attributes and predictors of long COVID. Nat Med. 2021;27:626–631. doi: 10.1038/s41591-021-01292-y.
  • 5. Subramanian A., Nirantharakumar K., Hughes S., Myles P., Williams T., Gokhale K.M., et al. Symptoms and risk factors for long COVID in non-hospitalized adults. Nat Med. 2022;28:1706–1714. doi: 10.1038/s41591-022-01909-w.
  • 6. Aiyegbusi O.L., Hughes S.E., Turner G., Rivera S.C., McMullan C., Chandan J.S., et al. Symptoms, complications and management of long COVID: a review. J R Soc Med. 2021;114:428–442. doi: 10.1177/01410768211032850.
  • 7. Office for National Statistics. Prevalence of Ongoing Symptoms Following Coronavirus (COVID-19) Infection in the UK; 2022. 〈https://www.ons.gov.uk/peoplepopulationandcommunity/healthandsocialcare/conditionsanddiseases/bulletins/prevalenceofongoingsymptomsfollowingcoronaviruscovid19infectionintheuk/7july2022〉.
  • 8. Feldman D.E., Boudrias M.-H., Mazer B. Long COVID symptoms in a population-based sample of persons discharged home from hospital. Can J Public Health. 2022;113:930–939. doi: 10.17269/s41997-022-00695-9.
  • 9. Ayoubkhani D., Khunti K., Nafilyan V., Maddox T., Humberstone B., Diamond I., et al. Post-covid syndrome in individuals admitted to hospital with covid-19: retrospective cohort study. BMJ. 2021;372:n693. doi: 10.1136/bmj.n693.
  • 10. Wong M.C.-S., Huang J., Wong Y.-Y., Wong G.L.-H., Yip T.C.-F., Chan R.N.-Y., et al. Epidemiology, symptomatology, and risk factors for long covid symptoms: population-based, multicenter study. JMIR Public Health Surveill. 2023. doi: 10.2196/42315.
  • 11. van der Feltz-Cornelis C.M., Sweetman J., Turk F., Allsopp G., Gabbay M., Khunti K., et al. Integrated care policy recommendations for complex multisystem long term conditions and long COVID. Sci Rep. 2024;14:13634. doi: 10.1038/s41598-024-64060-1.
  • 12. Research to Spread Light on Long COVID Uncertainties. 〈https://www.umu.se/en/news/research-to-spread-light-on-long-covid-uncertainties_11713455/〉.
  • 13. Andersen K.M., Bates B.A., Rashidi E.S., Olex A.L., Mannon R.R., Patel R.C., et al. Long-term use of immunosuppressive medicines and in-hospital COVID-19 outcomes: a retrospective cohort study using data from the National COVID Cohort Collaborative. Lancet Rheuma. 2022;4:e33–e41. doi: 10.1016/S2665-9913(21)00325-8.
  • 14. Herrett E., Gallagher A.M., Bhaskaran K., Forbes H., Mathur R., van Staa T., et al. Data resource profile: Clinical Practice Research Datalink (CPRD). Int J Epidemiol. 2015;44:827–836. doi: 10.1093/ije/dyv098.
  • 15. Wolf A., Dedman D., Campbell J., Booth H., Lunn D., Chapman J., et al. Data resource profile: Clinical Practice Research Datalink (CPRD) Aurum. Int J Epidemiol. 2019;48. doi: 10.1093/ije/dyz034.
  • 16. Wang H.-I., Han L., Jacobs R., Doran T., Holt R.L.G., Prady S.L., et al. Healthcare resource use and costs for people with type 2 diabetes mellitus with and without severe mental illness in England: longitudinal matched-cohort study using the Clinical Practice Research Datalink. Br J Psychiatry J Ment Sci. 2022;221:402–409. doi: 10.1192/bjp.2021.131.
  • 17. NHS Digital. Quality and Outcomes Framework (QOF) Indicators 2013–14; 2013.
  • 18. Thygesen J.H., Tomlinson C., Hollings S., Mizani M.A., Handy A., Akbari A., et al. COVID-19 trajectories among 57 million adults in England: a cohort study using electronic health records. Lancet Digit Health. 2022;4:e542–e557. doi: 10.1016/S2589-7500(22)00091-7.
  • 19. Chisholm J. The Read clinical classification. BMJ. 1990;300:1092. doi: 10.1136/bmj.300.6732.1092.
  • 20. National Library of Medicine. Overview of SNOMED CT. 〈https://www.nlm.nih.gov/healthit/snomedct/snomed_overview.html〉.
  • 21. World Health Organization. ICD-10: International Statistical Classification of Diseases and Related Health Problems: Tenth Revision; 2004. 〈https://apps.who.int/iris/handle/10665/42980〉.
  • 22. Clinical Practice Research Datalink. CPRD GOLD SGSS January 2022. Number of unique patients = 378,135. Clinical Practice Research Datalink; 2022. 〈https://doi.org/10.48329/NK4J-1P27〉.
  • 23. Clinical Practice Research Datalink. CPRD Aurum SGSS January 2022. Number of unique patients = 1,720,744. Clinical Practice Research Datalink; 2022. 〈https://doi.org/10.48329/BEJK-QA30〉.
  • 24. Davis H.E., Assaf G.S., McCorkell L., Wei H., Low R.J., Re’em Y., et al. Characterizing long COVID in an international cohort: 7 months of symptoms and their impact. EClinicalMedicine. 2021;38. doi: 10.1016/j.eclinm.2021.101019.
  • 25. NHS England. Post-COVID Syndrome (Long COVID). 〈https://www.england.nhs.uk/coronavirus/post-covid-syndrome-long-covid/〉.
  • 26. NHS. Long-term Effects of COVID-19 (Long COVID). 〈https://www.nhs.uk/conditions/covid-19/long-term-effects-of-covid-19-long-covid/〉.
  • 27. Public Health England. Guidance on Social Distancing for Everyone in the UK. Gov UK; 2020. 〈https://www.gov.uk/government/publications/covid-19-guidance-on-social-distancing-and-for-vulnerable-people/guidance-on-social-distancing-for-everyone-in-the-uk-and-protecting-older-people-and-vulnerable-adults〉.
  • 28. NHS. Who is at High Risk from Coronavirus (COVID-19). 〈https://www.nhs.uk/conditions/coronavirus-covid-19/people-at-higher-risk/who-is-at-high-risk-from-coronavirus/〉.
  • 29. UK Health Security Agency and Department of Health and Social Care. COVID-19: guidance for people whose immune system means they are at higher risk. 〈https://www.gov.uk/government/publications/covid-19-guidance-for-people-whose-immune-system-means-they-are-at-higher-risk〉.
  • 30. Smith T., Smith B. Graphing the Probability of Event as a Function of Time Using Survivor Function Estimates and the SAS® System’s PROC. 〈https://www.lexjansen.com/wuss/2000/WUSS00109.pdf〉.
  • 31. Johnston R., Jones K., Manley D. Confounding and collinearity in regression analysis: a cautionary tale and an alternative procedure, illustrated by studies of British voting behaviour. Qual Quant. 2018;52:1957–1976. doi: 10.1007/s11135-017-0584-6.
  • 32. Fine J.P., Gray R.J. A proportional hazards model for the subdistribution of a competing risk. J Am Stat Assoc. 1999;94:496–509.
  • 33. Banerjee A., Pasea L., Gonzalez-Izquierdo A., Van der Feltz-Cornelis C., Wang H.-I., Alizadeh Mizani M., et al. Characterizing long COVID patients across pre-existing physical and mental health conditions: risk profiling using Electronic Health Records in the United Kingdom. CPRD approved studies; 2022. 〈https://www.cprd.com/approved-studies/characterizing-long-covid-patients-across-pre-existing-physical-and-mental-health〉.
  • 34. Thompson E.J., Williams D.M., Walker A.J., Mitchell R.E., Niedzwiedz C.L., Yang T.C., et al. Long COVID burden and risk factors in 10 UK longitudinal studies and Electronic Health Records. Nat Commun. 2022;13:3528. doi: 10.1038/s41467-022-30836-0.
  • 35. Walker A.J., MacKenna B., Inglesby P., Tomlinson L., Rentsch C.T., Curtis H.J., et al. Clinical coding of long COVID in English primary care: a federated analysis of 58 million patient records in situ using OpenSAFELY. Br J Gen Pract. 2021;71:e806–e814. doi: 10.3399/BJGP.2021.0301.
  • 36. Willans R., Allsopp G., Jonsson P., Glen F., Macleod J., Wei Y.H., et al. Primary care post-COVID syndrome diagnosis and referral coding. medRxiv. 2023. doi: 10.1101/2023.05.23.23289798.
  • 37. Tenforde M.W., Kim S.S., Lindsell C.J., Billig Rose E., Shapiro N.I., Files D.C., et al. Symptom duration and risk factors for delayed return to usual health among outpatients with COVID-19 in a multistate health care systems network – United States. MMWR Morb Mortal Wkly Rep. 2020;69:993–998. doi: 10.15585/mmwr.mm6930e1.
  • 38. Wang S., Quan L., Chavarro J.E., Slopen N., Kubzansky L.D., Koenen K.C., et al. Associations of depression, anxiety, worry, perceived stress, and loneliness prior to infection with risk of post–COVID-19 conditions. JAMA Psychiatry. 2022;79:1081–1091. doi: 10.1001/jamapsychiatry.2022.2640.
  • 39. Fitton R., Sweetman J., Heseltine-Carp W., van der Feltz-Cornelis C. Anti-inflammatory medications for the treatment of mental disorders: a scoping review. Brain Behav Immun Health. 2022;26. doi: 10.1016/j.bbih.2022.100518.
  • 40. American Psychiatric Association. Diagnostic and Statistical Manual of Mental Disorders (DSM–5). Arlington: American Psychiatric Association; 2013.
  • 41. van der Feltz-Cornelis C., Turk F., Sweetman J., Khunti K., Gabbay M., Shepherd J., et al. Prevalence of mental health conditions and brain fog in people with long COVID: a systematic review and meta-analysis. Gen Hosp Psychiatry. 2024;88:10–22. doi: 10.1016/j.genhosppsych.2024.02.009.
  • 42. Fernández-de-Las-Peñas C., Gómez-Mayordomo V., García-Azorín D., Palacios-Ceña D., Florencio L.L., Guerrero A.L., et al. Previous history of migraine is associated with fatigue, but not headache, as long-term post-COVID symptom after severe acute respiratory SARS-CoV-2 infection: a case-control study. Front Hum Neurosci. 2021;15. doi: 10.3389/fnhum.2021.678472.
  • 43. Magdy R., Elmazny A., Soliman S.H., Elsebaie E.H., Ali S.H., Abdel Fattah A.M., et al. Post-COVID-19 neuropsychiatric manifestations among COVID-19 survivors suffering from migraine: a case-control study. J Headache Pain. 2022;23:101. doi: 10.1186/s10194-022-01468-y.
  • 44. Coomes E.A., Haghbayan H. Interleukin-6 in Covid-19: a systematic review and meta-analysis. Rev Med Virol. 2020;30:1–9. doi: 10.1002/rmv.2141.
  • 45. Sharifkashani S., Bafrani M.A., Khaboushan A.S., Pirzadeh M., Kheirandish A., Yavarpour Bali H., et al. Angiotensin-converting enzyme 2 (ACE2) receptor and SARS-CoV-2: Potential therapeutic targeting. Eur J Pharmacol. 2020;884. doi: 10.1016/j.ejphar.2020.173455.
  • 46. Shabnam S., Razieh C., Dambha-Miller H., Yates T., Gillies C., Chudasama Y.V., et al. Socioeconomic inequalities of Long COVID: a retrospective population-based cohort study in the United Kingdom. J R Soc Med. 2023;116:263–273. doi: 10.1177/01410768231168377.
  • 47. Baz S.A., Fang C., Carpentieri J.D., Sheard L. ‘I don’t know what to do or where to go’. Experiences of accessing healthcare support from the perspectives of people living with Long Covid and healthcare professionals: a qualitative study in Bradford, UK. Health Expect Int J Public Particip Health Care Health Policy. 2023;26:542–554. doi: 10.1111/hex.13687.
  • 48. Levy H., Janke A. Health Literacy and Access to Care. J Health Commun. 2016;21:43–50. doi: 10.1080/10810730.2015.1131776.
  • 49. Crook H., Raza S., Nowell J., Young M., Edison P. Long covid-mechanisms, risk factors, and management. BMJ. 2021;374:n1648. doi: 10.1136/bmj.n1648.
  • 50. Mulchandani R., Lyngdoh T., Kakkar A.K. Deciphering the COVID-19 cytokine storm: systematic review and meta-analysis. Eur J Clin Invest. 2021;51. doi: 10.1111/eci.13429.
  • 51. Baksh R.A., Strydom A., Pape S.E., Chan L.F., Gulliford M.C. Susceptibility to COVID-19 diagnosis in people with down syndrome compared to the general population: matched-cohort study using primary care electronic records in the UK. J Gen Intern Med. 2022;37:2009–2015. doi: 10.1007/s11606-022-07420-9.
  • 52. Clinical Practice Research Datalink. Availability of CPRD linked data; 2023. 〈https://www.cprd.com/cprd-linked-data〉.
  • 53. Ledford H. How common is long COVID? Why studies give different answers. Nature. 2022;606:852–853. doi: 10.1038/d41586-022-01702-2.
  • 54. Khunti K., Banerjee A., Evans R.A., Calvert M. Long COVID research in minority ethnic populations may be lost in translation. Nat Med. 2024:1–2. doi: 10.1038/s41591-024-03070-y.

Associated Data


Supplementary Materials

mmc1.docx (880KB, docx)

Data Availability Statement

Researchers can apply to access Clinical Practice Research Datalink (CPRD) data with linkage to Hospital Episode Statistics (HES) through https://www.cprd.com/. Data sharing agreements with CPRD do not permit data sharing with third parties. All formulae and additional sources of information are presented in the paper and Supplementary materials. The SAS code for cleaning and analysing the data can be provided upon reasonable request.
