N Engl J Med. 2020 Jun 17:NEJMoa2020283. doi: 10.1056/NEJMoa2020283

Genomewide Association Study of Severe Covid-19 with Respiratory Failure

David Ellinghaus 1, Frauke Degenhardt 1, Luis Bujanda 1, Maria Buti 1, Agustín Albillos 1, Pietro Invernizzi 1, Javier Fernández 1, Daniele Prati 1, Guido Baselli 1, Rosanna Asselta 1, Marit M Grimsrud 1, Chiara Milani 1, Fátima Aziz 1, Jan Kässens 1, Sandra May 1, Mareike Wendorff 1, Lars Wienbrandt 1, Florian Uellendahl-Werth 1, Tenghao Zheng 1, Xiaoli Yi 1, Raúl de Pablo 1, Adolfo G Chercoles 1, Adriana Palom 1, Alba-Estela Garcia-Fernandez 1, Francisco Rodriguez-Frias 1, Alberto Zanella 1, Alessandra Bandera 1, Alessandro Protti 1, Alessio Aghemo 1, Ana Lleo 1, Andrea Biondi 1, Andrea Caballero-Garralda 1, Andrea Gori 1, Anja Tanck 1, Anna Carreras Nolla 1, Anna Latiano 1, Anna Ludovica Fracanzani 1, Anna Peschuck 1, Antonio Julià 1, Antonio Pesenti 1, Antonio Voza 1, David Jiménez 1, Beatriz Mateos 1, Beatriz Nafria Jimenez 1, Carmen Quereda 1, Cinzia Paccapelo 1, Christoph Gassner 1, Claudio Angelini 1, Cristina Cea 1, Aurora Solier 1, David Pestaña 1, Eduardo Muñiz-Diaz 1, Elena Sandoval 1, Elvezia M Paraboschi 1, Enrique Navas 1, Félix García Sánchez 1, Ferruccio Ceriotti 1, Filippo Martinelli-Boneschi 1, Flora Peyvandi 1, Francesco Blasi 1, Luis Téllez 1, Albert Blanco-Grau 1, Georg Hemmrich-Stanisak 1, Giacomo Grasselli 1, Giorgio Costantino 1, Giulia Cardamone 1, Giuseppe Foti 1, Serena Aneli 1, Hayato Kurihara 1, Hesham ElAbd 1, Ilaria My 1, Iván Galván-Femenia 1, Javier Martín 1, Jeanette Erdmann 1, Jose Ferrusquía-Acosta 1, Koldo Garcia-Etxebarria 1, Laura Izquierdo-Sanchez 1, Laura R Bettini 1, Lauro Sumoy 1, Leonardo Terranova 1, Leticia Moreira 1, Luigi Santoro 1, Luigia Scudeller 1, Francisco Mesonero 1, Luisa Roade 1, Malte C Rühlemann 1, Marco Schaefer 1, Maria Carrabba 1, Mar Riveiro-Barciela 1, Maria E Figuera Basso 1, Maria G Valsecchi 1, María Hernandez-Tejero 1, Marialbert Acosta-Herrera 1, Mariella D’Angiò 1, Marina Baldini 1, Marina Cazzaniga 1, Martin Schulzky 1, Maurizio Cecconi 1, Michael Wittig 1, Michele Ciccarelli 1, Miguel 
Rodríguez-Gandía 1, Monica Bocciolone 1, Monica Miozzo 1, Nicola Montano 1, Nicole Braun 1, Nicoletta Sacchi 1, Nilda Martínez 1, Onur Özer 1, Orazio Palmieri 1, Paola Faverio 1, Paoletta Preatoni 1, Paolo Bonfanti 1, Paolo Omodei 1, Paolo Tentorio 1, Pedro Castro 1, Pedro M Rodrigues 1, Aaron Blandino Ortiz 1, Rafael de Cid 1, Ricard Ferrer 1, Roberta Gualtierotti 1, Rosa Nieto 1, Siegfried Goerg 1, Salvatore Badalamenti 1, Sara Marsal 1, Giuseppe Matullo 1, Serena Pelusi 1, Simonas Juzenas 1, Stefano Aliberti 1, Valter Monzani 1, Victor Moreno 1, Tanja Wesse 1, Tobias L Lenz 1, Tomas Pumarola 1, Valeria Rimoldi 1, Silvano Bosari 1, Wolfgang Albrecht 1, Wolfgang Peter 1, Manuel Romero-Gómez 1, Mauro D’Amato 1, Stefano Duga 1, Jesus M Banales 1, Johannes R Hov 1, Trine Folseraas 1, Luca Valenti 1, Andre Franke 1, Tom H Karlsen 1, The Severe Covid-19 GWAS Group
PMCID: PMC7315890  PMID: 32558485

Abstract

Background

There is considerable variation in disease behavior among patients infected with severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the virus that causes coronavirus disease 2019 (Covid-19). Genomewide association analysis may allow for the identification of potential genetic factors involved in the development of Covid-19.

Methods

We conducted a genomewide association study involving 1980 patients with Covid-19 and severe disease (defined as respiratory failure) at seven hospitals in the Italian and Spanish epicenters of the SARS-CoV-2 pandemic in Europe. After quality control and the exclusion of population outliers, 835 patients and 1255 control participants from Italy and 775 patients and 950 control participants from Spain were included in the final analysis. In total, we analyzed 8,582,968 single-nucleotide polymorphisms and conducted a meta-analysis of the two case–control panels.

Results

We detected cross-replicating associations with rs11385942 at locus 3p21.31 and with rs657152 at locus 9q34.2, which were significant at the genomewide level (P<5×10⁻⁸) in the meta-analysis of the two case–control panels (odds ratio, 1.77; 95% confidence interval [CI], 1.48 to 2.11; P=1.15×10⁻¹⁰; and odds ratio, 1.32; 95% CI, 1.20 to 1.47; P=4.95×10⁻⁸, respectively). At locus 3p21.31, the association signal spanned the genes SLC6A20, LZTFL1, CCR9, FYCO1, CXCR6, and XCR1. The association signal at locus 9q34.2 coincided with the ABO blood group locus; in this cohort, a blood-group–specific analysis showed a higher risk in blood group A than in other blood groups (odds ratio, 1.45; 95% CI, 1.20 to 1.75; P=1.48×10⁻⁴) and a protective effect in blood group O as compared with other blood groups (odds ratio, 0.65; 95% CI, 0.53 to 0.79; P=1.06×10⁻⁵).

Conclusions

We identified a 3p21.31 gene cluster as a genetic susceptibility locus in patients with Covid-19 with respiratory failure and confirmed a potential involvement of the ABO blood-group system. (Funded by Stein Erik Hagen and others.)


Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) was discovered in Wuhan, China, in late 2019, and coronavirus disease 2019 (Covid-19), the disease caused by SARS-CoV-2, rapidly evolved into a global pandemic.1 As of June 15, 2020, there were more than 8.03 million confirmed cases worldwide, with total deaths exceeding 436,900.2 In Europe, Italy and Spain were severely affected early on, with epidemic peaks starting in the second half of February 2020 (Figure 1) and 61,507 deaths reported by June 15, 2020. Covid-19 has varied manifestations,3 with the large majority of infected persons having only mild symptoms or even no symptoms.4 Mortality rates are driven predominantly by the subgroup of patients who have severe respiratory failure related to interstitial pneumonia in both lungs and acute respiratory distress syndrome.5 Severe Covid-19 with respiratory failure requires early and prolonged support by mechanical ventilation.6

Figure 1. Timeline of Rapid Covid-19 Genomewide Association Study (GWAS).

The main events and milestones of the study are summarized in the plot. Samples from patients in three Italian hospitals (hospital A: Fondazione IRCCS Ca’ Granda Ospedale Maggiore Policlinico, Milan; hospital B: Humanitas Clinical and Research Center, IRCCS, Milan; and hospital C: UNIMIB School of Medicine, San Gerardo Hospital, Monza) and four Spanish hospitals (hospital A: Hospital Clínic and IDIBAPS, Barcelona; hospital B: Hospital Universitario Vall d’Hebron, Barcelona; hospital C: Hospital Universitario Ramón y Cajal, Madrid; and hospital D: Donostia University Hospital, San Sebastian) were obtained around the peak of the local epidemics, and ethics approvals were quickly obtained by means of fast-track procedures (i.e., every local ethics review board supported studies of coronavirus disease 2019 [Covid-19] by providing rapid turn-around times, thus facilitating this fast de novo data generation). All the obtained blood samples were centrally isolated, genotyped, and analyzed within 8 weeks. Control data were obtained from control participants and from historical control data in Italy and Spain. The rapid workflow from patients to target identification shows the usefulness of GWAS, a standardized research tool that often relies on international and interdisciplinary cooperation. One center alone could not have completed this study, not to mention the increase in statistical power that was available because of the contribution of patients from multiple centers. The speed of data production depended heavily on laboratory automation, and the speed of analyses reflects existing analytic pipelines and the support of public so-called imputation servers (here, the Michigan imputation server of the G. Abecasis group). QC denotes quality control.

The pathogenesis of severe Covid-19 and the associated respiratory failure is poorly understood, but higher mortality is consistently associated with older age and male sex.7,8 Clinical associations have also been reported for hypertension, diabetes, and other obesity-related and cardiovascular disease traits, but the relative role of clinical risk factors in determining the severity of Covid-19 has not yet been clarified.7-11 Observational data on lymphocytic endotheliitis and diffuse microvascular and macrovascular thromboembolic complications suggest that Covid-19 is a systemic disease that involves injury to the vascular endothelium but provide little insight into the underlying pathogenesis.12-14 At the peak of the epidemic in Italy and Spain in early 2020, we performed a genomewide association study (GWAS) in an attempt to delineate host genetic factors contributing to severe Covid-19 with respiratory failure. The relatively low disease burden of Covid-19 in Norway and Germany allowed for a complementary team to be set up, whereby genotyping and analysis could occur in parallel with the rapid recruitment of patients in the heavily affected Italian and Spanish epicenters.

Methods

Study Participants and Recruitment

In a cross-sectional design, we recruited 1980 patients with severe Covid-19 from intensive care units and general wards at seven hospitals in four cities in the pandemic epicenters in Italy and Spain (Table S1A in Supplementary Appendix 1, available with the full text of this article at NEJM.org). Severe Covid-19 was defined as hospitalization with respiratory failure and a confirmed SARS-CoV-2 viral RNA polymerase-chain-reaction (PCR) test from nasopharyngeal swabs or other relevant biologic fluids. The hospitals in Italy were the following: Fondazione IRCCS Ca’ Granda Ospedale Maggiore Policlinico in Milan (597 patients); Humanitas Clinical and Research Center, IRCCS, in Milan (154 patients); and UNIMIB (Università degli Studi di Milano–Bicocca) School of Medicine, San Gerardo Hospital, in Monza (a suburb of Milan) (200 patients). The hospitals in Spain were the following: Hospital Clínic and IDIBAPS (Instituto de Investigaciones Biomédicas August Pi i Sunyer) in Barcelona (56 patients); Hospital Universitario Vall d’Hebron in Barcelona (337 patients); Hospital Universitario Ramón y Cajal in Madrid (298 patients); and Donostia University Hospital in San Sebastian (338 patients).

Respiratory failure was defined in the simplest possible manner in order to ensure feasibility: the use of oxygen supplementation or mechanical ventilation, with severity graded according to the maximum respiratory support received at any point during hospitalization (supplemental oxygen therapy only, noninvasive ventilatory support, invasive ventilatory support, or extracorporeal membrane oxygenation). For severity assessments, severity was also dichotomized as no mechanical ventilation or mechanical ventilation. Whole-blood samples or buffy coats from diagnostic venipuncture were obtained for DNA extraction.

For comparison, we included 2381 control participants from Italy and Spain (Table S1B in Supplementary Appendix 1). We recruited 998 randomly selected blood donors at Fondazione IRCCS Ca’ Granda Ospedale Maggiore Policlinico, Milan, who underwent genotyping for the purpose of the present study. A total of 40 of these participants had evidence of the development of anti–SARS-CoV-2 antibodies, all of whom had mild or no Covid-19 symptoms. We also included two control panels of persons with unknown SARS-CoV-2 infection status whose genotype data had been generated in previous studies with the same genotyping array. The panels included 396 healthy volunteers, blood donors, and outpatients of gastroenterology departments in Italy15 and 987 healthy blood donors in San Sebastian, Spain.

Ethics Committee Approval

The project protocol involved the rapid recruitment of patient-participants and no additional project-related procedures (we primarily used material from clinically indicated venipunctures) and afforded anonymity, owing to the minimal data set collected. Differences in recruitment and consent procedures among the centers arose because some centers integrated the project into larger Covid-19 biobanking efforts, whereas other centers did not, and because there were differences in how local ethics committees provided guidance on the handling of anonymization or deidentification of data as well as consent procedures. Written informed consent was obtained, sometimes in a delayed fashion, from the study patients at each center when possible. In some instances, informed consent was provided verbally or by the next of kin, depending on local ethics committee regulations and special policies issued for Covid-19 research. For some severely ill patients, an exemption from informed consent was obtained from a local ethics committee or according to local regulations in order to allow the use of completely anonymized surplus material from diagnostic venipuncture.

The following approvals of the project were obtained from the relevant ethics committees: Germany: Kiel (reference number, D464/20); Italy: Fondazione IRCCS Ca’ Granda Ospedale Maggiore Policlinico (reference numbers, 342_2020 for patients and 334-2020 for control participants), Humanitas Clinical and Research Center, IRCCS (reference number, 316/20), and the University of Milano–Bicocca School of Medicine, San Gerardo Hospital, Monza (the ethics committee of the National Institute of Infectious Diseases Lazzaro Spallanzani reference number, 84/2020); Norway: Regional Committee for Medical and Health Research Ethics in South-Eastern Norway (reference number, 132550); Spain: Hospital Clínic, Barcelona (reference number, HCB/2020/0405), Hospital Universitario Vall d’Hebron, Barcelona (reference number, PR[AG]244/2020), Hospital Universitario Ramón y Cajal, Madrid (reference number, 093/20), and Donostia University Hospital, San Sebastian (reference number, PI2020064).

Sample Processing, Genotyping, and Imputation

We performed DNA extraction using a Chemagic 360 (PerkinElmer) with the low-volume kit CMG-1491 for whole-blood samples and the buffy-coat kit CMG-714 (Chemagen) for buffy coats. For genotyping, we used the Global Screening Array (GSA), version 2.0 (Illumina), which contains 712,189 variants before quality control. Details on genotyping and quality-control procedures are provided in the Supplementary Methods section in Supplementary Appendix 1. To maximize genetic coverage, we performed single-nucleotide polymorphism (SNP) imputation on genome build GRCh38 using the Michigan Imputation Server and 194,512 haplotypes generated by the Trans-Omics for Precision Medicine (TOPMed) program (freeze 5).16
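Imputation yields genotype probabilities rather than hard genotype calls, and the association tests described under Statistical Analysis use allele dosages to carry that uncertainty forward. As a minimal sketch (the function name and the example probabilities are hypothetical, not taken from the study), a dosage is the expected number of copies of the tested allele:

```python
def allele_dosage(p_ref_ref: float, p_ref_alt: float, p_alt_alt: float) -> float:
    """Expected count (0 to 2) of the alternate allele given genotype probabilities."""
    total = p_ref_ref + p_ref_alt + p_alt_alt
    if abs(total - 1.0) > 1e-6:
        raise ValueError("genotype probabilities must sum to 1")
    # One copy in heterozygotes, two copies in alternate homozygotes.
    return p_ref_alt + 2.0 * p_alt_alt


# Hypothetical imputed genotype: most likely heterozygous, with some uncertainty.
print(allele_dosage(0.10, 0.85, 0.05))
```

A well-imputed variant gives dosages close to 0, 1, or 2; intermediate values signal uncertainty that a regression on dosage weights accordingly.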

After the exclusion of samples during quality control (the majority of which were due to population outliers; see the Supplementary Methods section and Tables S1B and S1C), the final case–control data sets comprised 835 patients and 1255 control participants from Italy and 775 patients and 950 control participants from Spain. A total of 8,965,091 SNPs were included in the Italian cohort and 9,140,716 SNPs in the Spanish cohort.

Statistical Analysis

To take imputation uncertainty into account, we tested for phenotypic associations with allele dosage data separately for both the Italian and Spanish case–control panels with the use of the PLINK logistic-regression framework for dosage data (PLINK, version 1.9).17 We carried out two genomewide tests of association that included covariates from principal-component analyses, with adjustments to control for potential population stratification (main analysis) and potential population stratification and age and sex bias (analysis corrected for age and sex). A fixed-effects meta-analysis was conducted with the use of the meta-analysis tool METAL18 on 8,582,968 variants that were common to both the Italian and Spanish data sets with the use of effect-size estimates and their standard errors from the study-specific association analyses.
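The fixed-effects step can be illustrated with inverse-variance weighting, the standard way to pool per-study effect sizes and standard errors. The sketch below is not the METAL implementation; the helper names are hypothetical, and the inputs are the rounded odds ratios and confidence intervals for rs657152 (main analysis) from Table 2, from which standard errors are recovered under a normal approximation.

```python
import math

Z95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def log_or_and_se(odds_ratio, ci_low, ci_high):
    """Recover the log odds ratio and its standard error from a reported 95% CI."""
    beta = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * Z95)
    return beta, se


def fixed_effects_meta(estimates):
    """Inverse-variance weighted pooling of (beta, se) pairs."""
    weights = [1.0 / se ** 2 for _, se in estimates]
    beta = sum(w * b for w, (b, _) in zip(weights, estimates)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    return beta, se


def summarize(beta, se):
    """Convert a pooled log odds ratio into OR, 95% CI, and two-sided Wald P value."""
    z = beta / se
    return {
        "or": math.exp(beta),
        "ci_low": math.exp(beta - Z95 * se),
        "ci_high": math.exp(beta + Z95 * se),
        "p": math.erfc(abs(z) / math.sqrt(2)),
    }


italy = log_or_and_se(1.37, 1.20, 1.57)  # rounded Table 2 values, Italian panel
spain = log_or_and_se(1.26, 1.08, 1.48)  # rounded Table 2 values, Spanish panel
pooled = summarize(*fixed_effects_meta([italy, spain]))
print(pooled)
```

With these rounded inputs the pooled odds ratio comes out close to the reported 1.32. The P value will not match the published one exactly, because the inputs are rounded and the published statistics come from dosage-based regression.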

For the genomewide meta-analysis, we used the commonly accepted threshold of 5×10⁻⁸ for joint P values to determine statistical significance. Bayesian fine-mapping analysis was performed for loci reaching genomewide significance (see the Supplementary Methods section). Genomewide summary statistics of our analyses are publicly available through our web browser (www.c19-genetics.eu) and have been submitted to the European Bioinformatics Institute (www.ebi.ac.uk/gwas; accession numbers, GCST90000255 and GCST90000256).

On the basis of the results from the TOPMed genotype imputation, we selected three ABO SNPs (rs8176747, rs41302905, and rs8176719)19,20 to infer the ABO blood type and calculated odds ratios according to blood type (A vs. B, AB, or O; B vs. A, AB, or O; AB vs. A, B, or O; and O vs. A, AB, or B) (see the Supplementary Methods). To assess in detail the HLA complex at locus 6p21, we performed sequencing-based HLA typing of seven classical HLA loci (HLA-A, -C, -B, -DRB1, -DQA1, -DQB1, and -DPB1) in a subgroup of 835 patients and 891 control participants from Italy and 773 patients from Spain (see the Supplementary Methods). We also assessed allelic distribution according to no mechanical ventilation (supplemental oxygen only) as compared with mechanical ventilation of any type. A similar assessment was made for lead SNPs rs11385942 and rs657152 at loci 3p21.31 and 9q34.2, respectively.
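The blood-type comparisons (one group versus all others) can be sketched as follows. The mapping from the three SNP genotypes to ABO alleles is described in the Supplementary Methods and is not reproduced here, so this example assumes that each person's two ABO alleles have already been assigned. The counts are hypothetical, and the odds ratio is an unadjusted 2×2 estimate with a Wald interval, whereas the reported estimates come from covariate-adjusted analyses.

```python
import math


def abo_phenotype(allele1: str, allele2: str) -> str:
    """Blood group from two ABO alleles; A and B are codominant, O is recessive."""
    alleles = {allele1, allele2}
    if not alleles <= {"A", "B", "O"}:
        raise ValueError("alleles must be 'A', 'B', or 'O'")
    if alleles == {"A", "B"}:
        return "AB"
    if "A" in alleles:
        return "A"
    if "B" in alleles:
        return "B"
    return "O"


def odds_ratio(case_in, case_out, ctrl_in, ctrl_out):
    """Unadjusted odds ratio and Wald 95% CI for 'in group' versus 'all other groups'."""
    estimate = (case_in * ctrl_out) / (case_out * ctrl_in)
    se = math.sqrt(1 / case_in + 1 / case_out + 1 / ctrl_in + 1 / ctrl_out)
    low = math.exp(math.log(estimate) - 1.959964 * se)
    high = math.exp(math.log(estimate) + 1.959964 * se)
    return estimate, low, high


# Hypothetical counts: blood group A versus B, AB, or O.
print(abo_phenotype("A", "O"))
print(odds_ratio(case_in=90, case_out=110, ctrl_in=70, ctrl_out=130))
```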

Results

Patients, Genotyping, and Quality Control

The milestones of the study in the context of the peak outbreaks in Italy and Spain are shown in Figure 1. Data on the age, sex, maximum respiratory support at any point during hospitalization, and relevant coexisting conditions (type 2 diabetes, hypertension, and coronary heart disease) in the patients who were included in the final analysis are shown in Table 1 and in Table S2 in Supplementary Appendix 1. Because we used the same genotyping platform (GSA) to obtain both data sets, we were able to perform a uniform quality control of the merged Italian and Spanish SNP data sets, thus reducing technical confounders to a minimum. A quantile–quantile (Q-Q) plot of the two meta-analyses (the main analysis and the analysis corrected for age and sex) showed significant associations in the tail of the distribution with minimal genomic inflation (λGC=1.015 for main analysis and λGC=1.006 for analysis corrected for age and sex) (Fig. S2 in Supplementary Appendix 1). We also carried out separate association analyses for the Italian and Spanish data sets (see the Supplementary Methods section and Fig. S3).
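The genomic inflation factor is conventionally computed as the median observed chi-square statistic (1 degree of freedom) divided by its expected median under the null, about 0.455. A minimal sketch under that convention, using only the Python standard library (the function name is hypothetical, and P values are assumed to be strictly positive):

```python
import statistics

CHI2_1DF_MEDIAN = 0.454936  # median of a chi-square distribution with 1 df


def genomic_inflation(p_values):
    """Lambda_GC: median observed chi-square (1 df) over its null expectation."""
    norm = statistics.NormalDist()
    # A two-sided P value maps to a z score via the lower tail; squaring gives chi-square.
    chi2 = [norm.inv_cdf(p / 2.0) ** 2 for p in p_values]
    return statistics.median(chi2) / CHI2_1DF_MEDIAN


# Evenly spaced P values mimic a null distribution, so lambda should be close to 1.
null_like = [(i + 0.5) / 1001 for i in range(1001)]
print(genomic_inflation(null_like))
```

Values near 1, such as the 1.015 and 1.006 reported here, indicate little residual inflation from population stratification or technical artifacts.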

Table 1. Overview of Patients Included in the Final Analysis.*

| Characteristic | Italy, hospital A (N=503) | Italy, hospital B (N=140) | Italy, hospital C (N=192) | Spain, hospital A (N=45) | Spain, hospital B (N=228) | Spain, hospital C (N=201) | Spain, hospital D (N=301) |
|---|---|---|---|---|---|---|---|
| Median age (IQR), yr | 64 (54–76) | 67 (57–75) | 66 (56–74) | 69 (59–75) | 65 (56–72) | 69 (60–79) | 67 (57–75) |
| Female sex, no. (%) | 159 (32) | 39 (28) | 51 (27) | 13 (29) | 78 (34) | 50 (25) | 124 (41) |
| Respiratory support, no. (%) | | | | | | | |
| Supplemental oxygen only | 0 | 70 (50) | 67 (35) | 7 (16) | 105 (46) | 106 (53) | 255 (85) |
| Noninvasive ventilation | 399 (79) | 25 (18) | 89 (46) | 6 (13) | 7 (3) | 16 (8) | 0 |
| Ventilator | 104 (21) | 45 (32) | 33 (17) | 31 (69) | 116 (51) | 77 (38) | 46 (15) |
| ECMO | 0 | 0 | 3 (2) | 1 (2) | 0 | 2 (1) | 0 |
| Hypertension, no./total no. (%) | 166/503 (33) | 71/140 (51) | 109/192 (57) | 26/45 (58) | 113/228 (50) | 112/199 (56) | 114/301 (38) |
| Coronary artery disease, no./total no. (%) | 21/503 (4) | 25/140 (18) | 25/192 (13) | 4/45 (9) | 14/228 (6) | 35/199 (18) | 15/301 (5) |
| Diabetes, no./total no. (%) | 63/503 (13) | 18/140 (13) | 34/192 (18) | 10/45 (22) | 50/228 (22) | 57/199 (29) | 65/301 (22) |

* In Italy, hospital A was Fondazione IRCCS Ca’ Granda Ospedale Maggiore Policlinico, Milan; hospital B Humanitas Clinical and Research Center, IRCCS, Milan; and hospital C UNIMIB School of Medicine, San Gerardo Hospital, Monza. In Spain, hospital A was Hospital Clínic and IDIBAPS, Barcelona; hospital B Hospital Universitario Vall d’Hebron, Barcelona; hospital C Hospital Universitario Ramón y Cajal, Madrid; and hospital D Donostia University Hospital, San Sebastian. The predominance of men among the patients and the advanced age (median, >63 years) were consistent across all the centers. The sample numbers provided are after quality control was conducted for the genomewide association study (Table S1C in Supplementary Appendix 1). Allele distributions for detected risk variants at loci 3p21.31 and 9q34.2 in clinical subsets are shown in Table S2. ECMO denotes extracorporeal membrane oxygenation, and IQR interquartile range.

Genomewide Association Analysis

We found two loci to be associated with Covid-19–induced respiratory failure with genomewide significance (P<5×10⁻⁸) in the main meta-analysis: the rs11385942 insertion–deletion GA or G variant at locus 3p21.31 (odds ratio for the GA allele, 1.77; 95% confidence interval [CI], 1.48 to 2.11; P=1.15×10⁻¹⁰) and the rs657152 A or C SNP at locus 9q34.2 (odds ratio for the A allele, 1.32; 95% CI, 1.20 to 1.47; P=4.95×10⁻⁸) (Figure 2 and Table 2 and Supplementary Appendix 2, available at NEJM.org). Both loci showed nominally significant association in both the Spanish and Italian subanalyses (Table 2). The meta-analysis association results for recessive and heterozygous genetic models for the two meta-analyses (main analysis and the analysis corrected for age and sex) are provided in Supplementary Appendix 3, available at NEJM.org. The imputation quality of the associated markers was good (Table 2 and Supplementary Appendix 2), and manual inspection of genotype cluster plots of genotyped SNPs in these regions showed distinct genotype clouds for homozygous and heterozygous calls (Fig. S4 in Supplementary Appendix 1). Furthermore, the analyses that were corrected for age and sex corroborated the observations at both rs11385942 (meta-analysis odds ratio, 2.11; 95% CI, 1.70 to 2.61; P=9.46×10⁻¹²) and rs657152 (meta-analysis odds ratio, 1.39; 95% CI, 1.22 to 1.59; P=5.35×10⁻⁷) (Table 2 and Fig. S5 in Supplementary Appendix 1).
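As a rough plausibility check on the reported odds ratios, a crude allelic odds ratio can be computed directly from the allele frequencies in patients and controls listed in Table 2. This is only a sketch: it ignores the principal-component, age, and sex adjustments and the dosage-based regression used in the study, so it approximates rather than reproduces the published estimates. The function name is hypothetical.

```python
def allelic_odds_ratio(freq_patients: float, freq_controls: float) -> float:
    """Crude odds ratio for carrying the allele, from allele frequencies alone."""
    for freq in (freq_patients, freq_controls):
        if not 0.0 < freq < 1.0:
            raise ValueError("allele frequencies must lie strictly between 0 and 1")
    odds_patients = freq_patients / (1.0 - freq_patients)
    odds_controls = freq_controls / (1.0 - freq_controls)
    return odds_patients / odds_controls


# Rounded frequencies of the rs657152 A allele as listed in Table 2.
print(round(allelic_odds_ratio(0.42, 0.35), 2))
```

For rs657152 the crude value is about 1.34, in line with the adjusted odds ratios of 1.26 to 1.37 reported for the individual panels.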

Figure 2. GWAS Summary (Manhattan) Plot of the Meta-analysis Association Statistics Highlighting Two Susceptibility Loci with Genomewide Significance for Severe Covid-19 with Respiratory Failure.

Shown is a Manhattan plot of the association statistics from the main meta-analysis (controlled for potential population stratification). The red dashed line indicates the genomewide significance threshold of a P value less than 5×10⁻⁸. Figure S6 in Supplementary Appendix 1 shows Manhattan plots that include hits passing a suggestive significance threshold of a P value less than 1×10⁻⁵ (total of 24 additional suggestive genomic loci) (see the Supplementary Methods section and Supplementary Appendix 4).

Table 2. Susceptibility Loci Associated with Severe Covid-19 with Respiratory Failure.*

| Chromosome and Analysis | Meta-analysis: P Value | Meta-analysis: Odds Ratio (95% CI) | Italian Panel: P Value | Italian Panel: Odds Ratio (95% CI) | Italian Panel: Allele Frequency, patient | Italian Panel: Allele Frequency, control | Spanish Panel: P Value | Spanish Panel: Odds Ratio (95% CI) | Spanish Panel: Allele Frequency, patient | Spanish Panel: Allele Frequency, control |
|---|---|---|---|---|---|---|---|---|---|---|
| 3p21.31, main analysis | 1.15×10⁻¹⁰ | 1.77 (1.48–2.11) | 1.98×10⁻⁷ | 1.74 (1.27–2.38) | 0.14 | 0.09 | 1.32×10⁻⁴ | 1.85 (1.50–2.28) | 0.09 | 0.05 |
| 3p21.31, analysis corrected for age and sex | 9.46×10⁻¹² | 2.11 (1.70–2.61) | 7.02×10⁻⁸ | 1.95 (1.53–2.48) | 0.14 | 0.09 | 1.17×10⁻⁵ | 2.79 (1.76–4.42) | 0.09 | 0.05 |
| 9q34.2, main analysis | 4.95×10⁻⁸ | 1.32 (1.20–1.47) | 2.90×10⁻⁶ | 1.37 (1.20–1.57) | 0.42 | 0.35 | 3.55×10⁻³ | 1.26 (1.08–1.48) | 0.42 | 0.35 |
| 9q34.2, analysis corrected for age and sex | 5.35×10⁻⁷ | 1.39 (1.22–1.59) | 5.31×10⁻⁵ | 1.37 (1.17–1.60) | 0.42 | 0.35 | 2.81×10⁻³ | 1.45 (1.13–1.84) | 0.42 | 0.35 |

* The meta-analysis included 1610 patients and 2205 control participants; the Italian analysis, 835 and 1255, respectively; and the Spanish analysis, 775 and 950, respectively. Allele frequencies of the minor or risk allele (see below) are shown among the patients and the control participants. All the association test statistics were adjusted for the top 10 principal components from the principal-component analysis. Two analyses were performed: a main analysis, which was corrected for 10 principal components, and an analysis that was corrected for age and sex in addition to 10 principal components. In the analyses that were corrected for age and sex, 25 control participants were excluded from the Spanish analysis and the meta-analysis because of missing covariate data. The P values and corresponding odds ratios and 95% confidence intervals (CIs) are shown with respect to the minor allele. Association results for the recessive and heterozygous models for both meta-analyses (main and corrected for age and sex) are shown in Supplementary Appendix 3. Covid-19 denotes coronavirus disease 2019.

For chromosome 3p21.31, the association boundaries for each index single-nucleotide polymorphism (SNP; see the Supplementary Methods section), with the genomic positions retrieved from genome build hg38, were chr3:45800446 through 46135604. The Single Nucleotide Polymorphism database (dbSNP) identifier was rs11385942 (the rs identifier from the National Center for Biotechnology Information, rs11385942, is annotated as chr3:45834968 through 45834969:AAA:AA in dbSNP, version 153, and as chr3:45834967:GA:G in the Trans-Omics for Precision Medicine [TOPMed] imputation reference panel). The SNP rs11385942 was imputed according to TOPMed with high confidence (TOPMed estimated imputation accuracy, R²=0.94 and R²=0.95 for the Italian and Spanish panels, respectively) (Supplementary Appendix 2). The minor or risk allele was GA, and the major allele was G. The key genes (i.e., the candidate genes in the region) were SLC6A20, LZTFL1, FYCO1, CXCR6, XCR1, and CCR9.

For chromosome 9q34.2, the association boundaries for each index SNP, with the genomic positions retrieved from genome build hg38, were chr9:133257521 through 133279871. The SNP rs657152 was genotyped according to the Global Screening Array (GSA) in the Italian and Spanish panels (Supplementary Appendix 2). The minor or risk allele was A, and the major allele was C. The key gene was ABO.

The allele frequencies in Spanish and Italian control data sets from previously published studies21-27 are consistent with those we report here (Supplementary Appendix 2). A further 24 different genomic loci showed suggestive evidence (P<1×10⁻⁵) for association with Covid-19–induced respiratory failure in the main analysis (Supplementary Appendix 4, available at NEJM.org, and Fig. S6 in Supplementary Appendix 1). Association signals at loci 3p21.31 and 9q34.2 were fine-mapped to 22 and 38 variants, respectively, with greater than 95% certainty (Figure 3A and 3B and Supplementary Appendix 5, available at NEJM.org).
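The idea of a 95% credible set can be sketched with Wakefield-style approximate Bayes factors under the simplifying assumption of a single causal variant per locus. This is not the fine-mapping procedure used in the study (see the Supplementary Methods); the prior standard deviation of 0.2 on the log odds ratio, the variant names, and the effect sizes are all hypothetical.

```python
import math


def approx_bayes_factor(beta, se, prior_sd=0.2):
    """Approximate Bayes factor in favor of association (prior_sd is an assumption)."""
    v = se ** 2        # sampling variance of the effect estimate
    w = prior_sd ** 2  # prior variance of the true effect
    z2 = (beta / se) ** 2
    return math.sqrt(v / (v + w)) * math.exp(0.5 * z2 * w / (v + w))


def credible_set(variants, coverage=0.95):
    """Smallest set of variants whose posterior probabilities sum to the coverage."""
    abf = {name: approx_bayes_factor(b, s) for name, (b, s) in variants.items()}
    total = sum(abf.values())
    posterior = {name: value / total for name, value in abf.items()}
    chosen, cumulative = [], 0.0
    for name in sorted(posterior, key=posterior.get, reverse=True):
        chosen.append(name)
        cumulative += posterior[name]
        if cumulative >= coverage:
            break
    return chosen, posterior


# Hypothetical variants at one locus: name -> (log odds ratio, standard error).
toy_locus = {"v1": (0.57, 0.09), "v2": (0.55, 0.09), "v3": (0.10, 0.09)}
selected, posterior = credible_set(toy_locus)
print(selected, posterior)
```

Variants in strong linkage disequilibrium have similar test statistics and therefore share posterior probability, which is why a credible set at a locus can contain dozens of variants rather than one.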

Figure 3. Regional Association Plots of Susceptibility Loci Associated with Severe Covid-19 with Respiratory Failure.

Bayesian fine-mapping analysis prioritized 22 and 38 variants for loci 3p21.31 (Panel A) and 9q34.2 (Panel B), respectively, with greater than 95% certainty. The linkage disequilibrium values were calculated on the basis of genotypes of the merged Italian and Spanish data sets derived from TOPMed (Trans-Omics for Precision Medicine) imputation. The positions in the genome assembly hg38 are plotted. The recombination rate is shown in centimorgans (cM) per million base pairs (Mb). The plot shows the names and locations of the genes; the transcribed strand is indicated with an arrow. Genes are represented with intronic and exonic regions. The purple diamond in each panel represents the variant most strongly associated with severe Covid-19 and respiratory failure.

Chromosome 3p21.31

The association signal at locus 3p21.31 comprised six genes (SLC6A20, LZTFL1, CCR9, FYCO1, CXCR6, and XCR1) (Figure 3A). The risk allele GA of rs11385942 is associated with reduced expression of CXCR6 and increased expression of SLC6A20, and LZTFL1 is strongly expressed in human lung cells (Fig. S7 and Supplementary Appendix 6, available at NEJM.org). We found that the frequency of the risk allele of the lead variant at 3p21.31 (rs11385942) was higher among patients who received mechanical ventilation than among those who received oxygen supplementation only in both the main meta-analysis (odds ratio, 1.70; 95% CI, 1.27 to 2.26; P=3.30×10⁻⁴) and the meta-analysis corrected for age and sex (odds ratio, 1.56; 95% CI, 1.17 to 2.01; P=0.003) (Supplementary Appendix 7, available at NEJM.org). Furthermore, the 19 patients who were homozygous for the rs11385942 risk allele were younger than the 1591 patients who were heterozygous or homozygous for the nonrisk allele (median age, 59 years [interquartile range, 49 to 68] vs. 66 years [interquartile range, 56 to 75]; P=0.005). Available variant database entries suggest that the frequency of this risk allele varies among populations worldwide (Fig. S8 in Supplementary Appendix 1).

ABO Locus

At locus 9q34.2, the association signal coincided with the ABO blood group locus (Figure 3B and Fig. S9 in Supplementary Appendix 1). Accordingly, the distribution of ABO blood groups (predicted from combinations of genotypes of three different SNPs) was skewed among patients with Covid-19 who had respiratory failure, as compared with the distribution among control participants. In the meta-analysis corrected for age and sex, we found a higher risk among persons with blood group A than among patients with other blood groups (odds ratio, 1.45; 95% CI, 1.20 to 1.75; P=1.48×10⁻⁴) and a protective effect for blood group O as compared with the other blood groups (odds ratio, 0.65; 95% CI, 0.53 to 0.79; P=1.06×10⁻⁵). Details are provided in Supplementary Appendix 8, available at NEJM.org. Both associations and effect directions were consistent in the separate Spanish and Italian case–control analyses. We found no significant difference in blood-group distribution between patients receiving supplemental oxygen only and those receiving mechanical ventilation of any kind. The ABO blood-group frequency distributions in public registries are provided for comparison in Supplementary Appendix 8, along with details of the results presented here, and corroborate our observations.

HLA Analysis

Given its important role in several viral infections, we scrutinized the extended HLA region (chromosome 6, 25 through 34 Mb). There were no SNP association signals at the HLA complex that met even the threshold for suggestive association (P<1×10⁻⁵) (Fig. S10 in Supplementary Appendix 1). Dedicated analysis of the classical HLA loci showed no significant allele associations with either Covid-19 or disease severity (oxygen supplementation only or mechanical ventilation of any kind), and further analysis of heterozygote and divergent allele advantage or predicted number of HLA-bound SARS-CoV-2 peptides did not show significant associations with Covid-19 in this data set (see the HLA Analyses section in Supplementary Appendix 1 and Supplementary Appendix 9, available at NEJM.org).

Discussion

Using a pragmatic approach with simplified inclusion criteria and a complementary team of clinicians at the European Covid-19 epicenters in Italy and Spain and scientists in the less-burdened countries of Germany and Norway, we performed a GWAS that included de novo genotyping for Covid-19 with respiratory failure in approximately 2 months. We detected a novel susceptibility locus at a chromosome 3p21.31 gene cluster and confirmed a potential involvement of the ABO blood-group system in Covid-19.

On chromosome 3p21.31, the peak association signal covered a cluster of six genes (SLC6A20, LZTFL1, CCR9, FYCO1, CXCR6, and XCR1), several of which have functions that are potentially relevant to Covid-19. A causative gene cannot be reliably implicated by the present data. One candidate is SLC6A20, which encodes the sodium–imino acid (proline) transporter 1 (SIT1) and which functionally interacts with angiotensin-converting enzyme 2, the SARS-CoV-2 cell-surface receptor.28,29 However, the locus also contains genes encoding chemokine receptors, including the C-C motif chemokine receptor 9 (CCR9) and the C-X-C motif chemokine receptor 6 (CXCR6), the latter of which regulates the specific location of lung-resident memory CD8 T cells throughout the sustained immune response to airway pathogens, including influenza viruses.30 Flanking genes (e.g., CCR1 and CCR2) also have relevant functions,31 and further studies will be needed to delineate the functional consequences of detected associations.

The preliminary results from the Covid-19 Host Genetics Consortium32 include suggestive associations within the same locus at chromosome 3p21.31, which lend considerable support to our findings (Fig. S11 in Supplementary Appendix 1). The consortium analysis also used population-based controls, but the patients included persons with mild Covid-19 and those with severe Covid-19. The parallel findings nevertheless underscore an important point about the ascertainment of patients and controls in genetic studies of Covid-19. Because the majority of patients with SARS-CoV-2 infection are asymptomatic, any sample involving patients with a positive nasopharyngeal RNA test is likely to hold a bias toward some degree of symptomatic burden. Two of the identifiers for inclusion in the current study were a positive result for the presence of SARS-CoV-2 according to PCR testing and receipt of respiratory support (an extreme Covid-19 phenotype). As such, it seems reasonable to conclude that the chromosome 3p21.31 locus is involved in Covid-19 susceptibility per se, with a possible enrichment in patients with severe disease. This latter interpretation is supported by the significantly higher frequency of the risk allele among patients who received mechanical ventilation than among those who received supplemental oxygen only as well as by the finding of younger age among patients who were homozygous for the risk allele than among patients who were heterozygous or homozygous for the nonrisk allele.

Nongenetic studies that were reported as preprints33,34 have previously implicated ABO blood groups in Covid-19 susceptibility, and ABO blood groups have also been implicated in susceptibility to SARS-CoV-1 infection.35 Our genetic data confirm that blood group O is associated with a lower risk of acquiring Covid-19 than non-O blood groups, whereas blood group A is associated with a higher risk than non-A blood groups.33,34 The biologic mechanisms undergirding these findings may have to do with the ABO group per se (e.g., with the development of neutralizing antibodies against protein-linked N-glycans)36 or with other biologic effects of the identified variant,37-39 including the stabilization of von Willebrand factor.40,41 The ABO locus holds considerable risk for population stratification,42 which is increased by the inclusion of randomly selected blood donors in the current study (for which there is an inherent risk of blood group O enrichment). Alignment of the allele frequencies at the ABO locus in our control population with those in several non–blood-donor control populations would suggest that this is not a major bias, and at least one study34 that tested for association with blood type used disease controls with no affiliation to blood donors.

The pragmatic aspects leading to the feasibility of this massive undertaking in a very short period of time during the extreme clinical circumstances of the pandemic imposed limitations that will be important to explore in follow-up studies. For example, to enable the recruitment of study participants, a bare minimum of clinical metadata was requested. For this reason, extensive genotype–phenotype elaboration of current findings could not be conducted, and adjustments for all potential sources of bias (e.g., underlying cardiovascular and metabolic factors relevant to Covid-19) could not be performed. Furthermore, we have limited information about the SARS-CoV-2 infection status in the control participants; this concern is mitigated by the fact that the presence of susceptible persons in the control group would only bias the tests toward the null. In addition, few restrictions were imposed during inclusion, which led to genotyped samples having to be excluded owing to differing ethnic groups (population outliers). Further exploration of current findings, both as to their usefulness in clinical risk profiling of patients with Covid-19 and toward a mechanistic understanding of the underlying pathophysiology, is warranted.
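The point above about bias toward the null can be made concrete with a small calculation. The sketch below is illustrative only and rests on a simplifying assumption that is not stated in the article: a fraction of the controls would in fact be cases if infected, and those persons carry the risk allele at the case frequency. The allele frequencies and the 10% misclassified fraction are invented for the example and are not study estimates.

```python
def allelic_odds_ratio(case_freq: float, control_freq: float) -> float:
    """Allelic odds ratio from risk-allele frequencies in cases and controls."""
    return (case_freq / (1.0 - case_freq)) / (control_freq / (1.0 - control_freq))


def contaminated_control_freq(true_control_freq: float,
                              case_freq: float,
                              fraction_misclassified: float) -> float:
    """Risk-allele frequency in a control group in which a given fraction
    of persons actually belongs to the (unobserved) case population."""
    return ((1.0 - fraction_misclassified) * true_control_freq
            + fraction_misclassified * case_freq)


# Hypothetical inputs, chosen only to illustrate the direction of the bias.
case_freq = 0.15
true_control_freq = 0.10
fraction_misclassified = 0.10

true_or = allelic_odds_ratio(case_freq, true_control_freq)
observed_control_freq = contaminated_control_freq(
    true_control_freq, case_freq, fraction_misclassified
)
observed_or = allelic_odds_ratio(case_freq, observed_control_freq)

print(f"true OR: {true_or:.3f}, observed OR: {observed_or:.3f}")
```

Under these assumptions the observed odds ratio lies between 1 and the true odds ratio, so misclassified controls weaken an association rather than create one.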

Acknowledgments

We thank all the patients who consented to participate in this study, and we express our condolences to the families of patients who died from Covid-19. We also thank the entire clinical staff during the outbreak situation at the different centers who were able to work on this scientific study in parallel with their clinical duties; all the members of the Humanitas Covid-19 Task Force for contributions to the recruitment of patients (see the Supplementary Notes section in Supplementary Appendix 1); Sören Brunak and Karina Banasik for discussions on the ABO association; Goncalo Abecasis and his team for providing the Michigan imputation server; Fabrizio Bossa and Francesca Tavano for contributions to control-sample acquisition; Maria Reig for help in the case-sample acquisition; the staff of the Basque Biobank in Spain for assistance in the acquisition of samples; the staff of GCAT|Genomes for Life, a cohort study of the Genomes of Catalonia, Institute for Health Science Research Germans Trias i Pujol, for data contribution; Alexander Eck, Jenspeter Horst, and Jens Scholz for supporting the HLA typing in the project; and the members of the ethics commissions, review boards, and consortia who fast-track reviewed our applications and enabled this rapid genetic discovery study.


This article was published on June 17, 2020, at NEJM.org.

Footnotes

Supported by a philanthropic donation from Stein Erik Hagen and Canica; by a grant from the Deutsche Forschungsgemeinschaft Cluster of Excellence “Precision Medicine in Chronic Inflammation” (EXC2167); by a Fondazione IRCCS Ca’ Granda Ospedale Maggiore Policlinico Covid-19 Biobank grant (to Dr. Valenti); by grants from the Italian Ministry of Health (RF-2016-02364358, to Dr. Valenti) and Ministero dell’Istruzione, dell’Università e della Ricerca project “Dipartimenti di Eccellenza 2018–2022” (D15D18000410001, to the Department of Medical Sciences, University of Turin); by a grant from the Spanish Ministry of Science and Innovation JdC fellowship (IJC2018-035131-I, to Dr. Acosta-Herrera); and by the GCAT Cession Research Project PI-2020-01. HLA typing was performed and supported by the Stefan-Morsch-Stiftung.

Disclosure forms provided by the authors are available with the full text of this article at NEJM.org.

References

  • 1. Zhu N, Zhang D, Wang W, et al. A novel coronavirus from patients with pneumonia in China, 2019. N Engl J Med 2020;382:727-733.
  • 2. Dong E, Du H, Gardner L. An interactive Web-based dashboard to track COVID-19 in real time. Lancet Infect Dis 2020;20:533-534.
  • 3. Coronaviridae Study Group of the International Committee on Taxonomy of Viruses. The species Severe acute respiratory syndrome-related coronavirus: classifying 2019-nCoV and naming it SARS-CoV-2. Nat Microbiol 2020;5:536-544.
  • 4. Wu Z, McGoogan JM. Characteristics of and important lessons from the coronavirus disease 2019 (COVID-19) outbreak in China: summary of a report of 72 314 cases from the Chinese Center for Disease Control and Prevention. JAMA 2020. February 24 (Epub ahead of print).
  • 5. Berlin DA, Gulick RM, Martinez FJ. Severe Covid-19. N Engl J Med. DOI: 10.1056/NEJMcp2009575.
  • 6. Marini JJ, Gattinoni L. Management of COVID-19 respiratory distress. JAMA 2020. April 24 (Epub ahead of print).
  • 7. Zhou F, Yu T, Du R, et al. Clinical course and risk factors for mortality of adult inpatients with COVID-19 in Wuhan, China: a retrospective cohort study. Lancet 2020;395:1054-1062.
  • 8. Li X, Xu S, Yu M, et al. Risk factors for severity and mortality in adult COVID-19 inpatients in Wuhan. J Allergy Clin Immunol 2020. April 12 (Epub ahead of print).
  • 9. Chen R, Liang W, Jiang M, et al. Risk factors of fatal outcome in hospitalized subjects with coronavirus disease 2019 from a nationwide analysis in China. Chest 2020. April 15 (Epub ahead of print).
  • 10. Docherty AB, Harrison EM, Green CA, et al. Features of 20133 UK patients in hospital with covid-19 using the ISARIC WHO Clinical Characterisation Protocol: prospective observational cohort study. BMJ 2020;369:m1985-m1985.
  • 11. Richardson S, Hirsch JS, Narasimhan M, et al. Presenting characteristics, comorbidities, and outcomes among 5700 patients hospitalized with COVID-19 in the New York City area. JAMA 2020;323(20):2052-2059.
  • 12. Levi M, Thachil J, Iba T, Levy JH. Coagulation abnormalities and thrombosis in patients with COVID-19. Lancet Haematol 2020;7(6):e438-e440.
  • 13. Varga Z, Flammer AJ, Steiger P, et al. Endothelial cell infection and endotheliitis in COVID-19. Lancet 2020;395:1417-1418.
  • 14. Ackermann M, Verleden SE, Kuehnel M, et al. Pulmonary vascular endothelialitis, thrombosis, and angiogenesis in Covid-19. N Engl J Med. DOI: 10.1056/NEJMoa2015432.
  • 15. Franke A, McGovern DP, Barrett JC, et al. Genome-wide meta-analysis increases to 71 the number of confirmed Crohn’s disease susceptibility loci. Nat Genet 2010;42:1118-1125.
  • 16. Taliun D, Harris DN, Kessler MD, et al. Sequencing of 53,831 diverse genomes from the NHLBI TOPMed Program. March 6, 2019. (https://www.biorxiv.org/content/10.1101/563866v1). preprint.
  • 17. Chang CC, Chow CC, Tellier LC, Vattikuti S, Purcell SM, Lee JJ. Second-generation PLINK: rising to the challenge of larger and richer datasets. Gigascience 2015;4:7-7.
  • 18. Willer CJ, Li Y, Abecasis GR. METAL: fast and efficient meta-analysis of genomewide association scans. Bioinformatics 2010;26:2190-2191.
  • 19. Bugert P, Rink G, Kemp K, Klüter H. Blood group ABO genotyping in paternity testing. Transfus Med Hemother 2012;39:182-186.
  • 20. Robinson J, Barker DJ, Georgiou X, Cooper MA, Flicek P, Marsh SGE. IPD-IMGT/HLA database. Nucleic Acids Res 2020;48(D1):D948-D955.
  • 21. Dubois PC, Trynka G, Franke L, et al. Multiple common variants for celiac disease influencing immune gene expression. Nat Genet 2010;42:295-302.
  • 22. Bentham J, Morris DL, Graham DSC, et al. Genetic association analyses implicate aberrant regulation of innate and adaptive immunity genes in the pathogenesis of systemic lupus erythematosus. Nat Genet 2015;47:1457-1464.
  • 23. Myocardial Infarction Genetics Consortium. Genome-wide association of early-onset myocardial infarction with single nucleotide polymorphisms and copy number variants. Nat Genet 2009;41:334-341.
  • 24. Julià A, González I, Fernández-Nebro A, et al. A genome-wide association study identifies SLC8A3 as a susceptibility locus for ACPA-positive rheumatoid arthritis. Rheumatology (Oxford) 2016;55:1106-1111.
  • 25. López-Isac E, Acosta-Herrera M, Kerick M, et al. GWAS for systemic sclerosis identifies multiple risk loci and highlights fibrotic and vasculopathy pathways. Nat Commun 2019;10:4955-4955.
  • 26. Obón-Santacana M, Vilardell M, Carreras A, et al. GCAT|Genomes for life: a prospective cohort study of the genomes of Catalonia. BMJ Open 2018;8(3):e018324-e018324.
  • 27. Galván-Femenía I, Obón-Santacana M, Piñeyro D, et al. Multitrait genome association analysis identifies new susceptibility genes for human anthropometric variation in the GCAT cohort. J Med Genet 2018;55:765-778.
  • 28. Vuille-dit-Bille RN, Camargo SM, Emmenegger L, et al. Human intestine luminal ACE2 and amino acid transporter expression increased by ACE-inhibitors. Amino Acids 2015;47:693-705.
  • 29. Kuba K, Imai Y, Ohto-Nakanishi T, Penninger JM. Trilogy of ACE2: a peptidase in the renin-angiotensin system, a SARS receptor, and a partner for amino acid transporters. Pharmacol Ther 2010;128:119-128.
  • 30. Wein AN, McMaster SR, Takamura S, et al. CXCR6 regulates localization of tissue-resident memory CD8 T cells to the airways. J Exp Med 2019;216:2748-2762.
  • 31. Hickey MJ, Held KS, Baum E, Gao JL, Murphy PM, Lane TE. CCR1 deficiency increases susceptibility to fatal coronavirus infection of the central nervous system. Viral Immunol 2007;20:599-608.
  • 32. COVID-19 Host Genetics Initiative. The COVID-19 Host Genetics Initiative, a global initiative to elucidate the role of host genetic factors in susceptibility and severity of the SARS-CoV-2 virus pandemic. Eur J Hum Genet 2020;28:715-718.
  • 33. Zhao J, Yang Y, Huang H, et al. Relationship between the ABO blood group and the COVID-19 susceptibility. March 27, 2020. (https://www.medrxiv.org/content/10.1101/2020.03.11.20031096v2). preprint.
  • 34. Zietz M, Tatonetti NP. Testing the association between blood type and COVID-19 infection, intubation, and death. April 11, 2020. (https://www.medrxiv.org/content/10.1101/2020.04.08.20058073v1). preprint.
  • 35. Cheng Y, Cheng G, Chui CH, et al. ABO blood group and susceptibility to severe acute respiratory syndrome. JAMA 2005;293:1450-1451.
  • 36. Breiman A, Ruvën-Clouet N, Le Pendu J. Harnessing the natural anti-glycan immune response to limit the transmission of enveloped viruses such as SARS-CoV-2. PLoS Pathog 2020;16(5):e1008556-e1008556.
  • 37. Comuzzie AG, Cole SA, Laston SL, et al. Novel genetic loci identified for the pathophysiology of childhood obesity in the Hispanic population. PLoS One 2012;7(12):e51954-e51954.
  • 38. Aziz M, Fatima R, Assaly R. Elevated interleukin-6 and severe COVID-19: A meta-analysis. J Med Virol 2020. April 28 (Epub ahead of print).
  • 39. Naitza S, Porcu E, Steri M, et al. A genome-wide association scan on the levels of markers of inflammation in Sardinians reveals associations that underpin its complex regulation. PLoS Genet 2012;8(1):e1002480-e1002480.
  • 40. Franchini M, Crestani S, Frattini F, Sissa C, Bonfanti C. ABO blood group and von Willebrand factor: biological implications. Clin Chem Lab Med 2014;52:1273-1276.
  • 41. Murray GP, Post SR, Post GR. ABO blood group is a determinant of von Willebrand factor protein levels in human pulmonary endothelial cells. J Clin Pathol 2020;73:347-349.
  • 42. Thomson G, Bodmer WF. Letter: population stratification as an explanation of IQ and ABO association. Nature 1975;254:363-364.


Articles from The New England Journal of Medicine are provided here courtesy of Massachusetts Medical Society
