Abstract
Background
Developing insight into the pathogenesis of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is of critical importance to overcome the global pandemic caused by coronavirus disease 2019 (covid-19). In this study, we have applied Mendelian randomization (MR) to systematically evaluate the effect of 10 cardiometabolic risk factors and genetic liability to lifetime smoking on 97 circulating host proteins postulated to either interact or contribute to the maladaptive host response of SARS-CoV-2.
Methods
We applied the inverse variance weighted (IVW) approach and several robust MR methods in a two-sample setting to systemically estimate the genetically predicted effect of each risk factor in turn on levels of each circulating protein. Multivariable MR was conducted to simultaneously evaluate the effects of multiple risk factors on the same protein. We also applied MR using cis-regulatory variants at the genomic location responsible for encoding these proteins to estimate whether their circulating levels may influence severe SARS-CoV-2.
Findings
In total, we identified evidence supporting 105 effects between risk factors and circulating proteins which were robust to multiple testing corrections and sensitivity analyzes. For example, body mass index provided evidence of an effect on 23 circulating proteins with a variety of functions, such as inflammatory markers c-reactive protein (IVW Beta=0.34 per standard deviation change, 95% CI=0.26 to 0.41, P = 2.19 × 10−16) and interleukin-1 receptor antagonist (IVW Beta=0.23, 95% CI=0.17 to 0.30, P = 9.04 × 10−12). Further analyzes using multivariable MR provided evidence that the effect of BMI on lowering immunoglobulin G, an antibody class involved in protection from infection, is substantially mediated by raised triglycerides levels (IVW Beta=-0.18, 95% CI=-0.25 to -0.12, P = 2.32 × 10−08, proportion mediated=44.1%). The strongest evidence that any of the circulating proteins highlighted by our initial analysis influence severe SARS-CoV-2 was identified for soluble glycoprotein 130 (odds ratio=1.81, 95% CI=1.25 to 2.62, P = 0.002), a signal transductor for interleukin-6 type cytokines which are involved in inflammatory response. However, based on current case samples for severe SARS-CoV-2 we were unable to replicate findings in independent samples.
Interpretation
Our findings highlight several key proteins which are influenced by established exposures for disease. Future research to determine whether these circulating proteins mediate environmental effects onto risk of SARS-CoV-2 infection or covid-19 progression are warranted to help elucidate therapeutic strategies for severe covid-19 disease.
Funding
The Medical Research Council, the Wellcome Trust, the British Heart Foundation and UK Research and Innovation.
Keywords: SARS-CoV-2, Covid19, Mendelian randomization, Cardiometabolic risk factors, Circulating proteins
Research in context.
Evidence before this study
It remains unclear why certain individuals develop more severe symptoms of coronavirus disease 19 (covid-19) compared to others. However, increasingly findings from the literature suggest that established cardiometabolic risk factors, such as body mass index and smoking, influence the risk of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) which is caused by covid-19.
In this study, we used data from a resource involving genome-wide association studies (GWAS) of 97 unique proteins which may play a role in severe SARS-CoV-2 (available at https://omicscience.org/apps/covidpgwas/). This enabled us to use genetic variation to estimate the effects of 10 cardiometabolic risk factors on each of these proteins in turn using an approach known as Mendelian randomization (MR). We additionally used this approach to estimate the effects of these circulating proteins on risk of severe SARS-CoV-2, using studies from the literature who have made GWAS results on this outcome.
Added value of this study
Our study provides a systematic evaluation of the genetically predicted effects of 10 cardiometabolic risk factors on each of the 97 unique proteins. Altogether, we found 105 effects which were robust to multiple testing corrections which may be valuable for future covid-19 research. We also evaluated whether any of these proteins influence risk of SARS-CoV-2, with soluble glycoprotein 130 providing the strongest evidence of a genetically predicted effect using MR. This protein is involved on the interleukin 6 receptor pathway which plays an important role in the body's immune response. However, further data is required to robustly support this gene's putative role in risk of severe SARS-CoV-2.
Implications of all the available evidence
Our findings are important in terms of developing insight into the molecular pathways by which modifiable lifestyle factors influence disease risk. Specifically with respect to severe covid-19, we note that the GWAS datasets of SAR-CoV-2 used in this work will capture genetic effects on increased susceptibility to infection as well as progression to severe symptoms. This is particularly important when considering the implications of therapeutically targeting any of the proteins highlighted by our study.
Alt-text: Unlabelled box
Research in context
Evidence before this study
It remains unclear why certain individuals develop more severe symptoms of coronavirus disease 19 (covid-19) compared to others. However, increasingly findings from the literature suggest that established cardiometabolic risk factors, such as body mass index and smoking, influence the risk of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) which is caused by covid-19.
In this study, we used data from a resource involving genome-wide association studies (GWAS) of 97 unique proteins which may play a role in severe SARS-CoV-2 (available at https://omicscience.org/apps/covidpgwas/). This enabled us to use genetic variation to estimate the effects of 10 cardiometabolic risk factors on each of these proteins in turn using an approach known as Mendelian randomization (MR). We additionally used this approach to estimate the effects of these circulating proteins on risk of severe SARS-CoV-2, using studies from the literature who have made GWAS results on this outcome.
Added value of this study
Our study provides a systematic evaluation of the genetically predicted effects of 10 cardiometabolic risk factors on each of the 97 unique proteins. Altogether, we found 105 effects which were robust to multiple testing corrections which may be valuable for future covid-19 research. We also evaluated whether any of these proteins influence risk of SARS-CoV-2, with soluble glycoprotein 130 providing the strongest evidence of a genetically predicted effect using MR. This protein is involved on the interleukin 6 receptor pathway which plays an important role in the body's immune response. However, further data is required to robustly support this gene's putative role in risk of severe SARS-CoV-2.
Implications of all the available evidence
Our findings are important in terms of developing insight into the molecular pathways by which modifiable lifestyle factors influence disease risk. Specifically with respect to severe covid-19, we note that the GWAS datasets of SAR-CoV-2 used in this work will capture genetic effects on increased susceptibility to infection as well as progression to severe symptoms. This is particularly important when considering the implications of therapeutically targeting any of the proteins highlighted by our study.
1. Introduction
On the 11th of March 2020 the World Health Organisation (WHO) declared the coronavirus disease 2019 (covid-19) a global pandemic [1]. Although strict lockdown measures have been enforced in many countries to control the spread of infection, the number of deaths worldwide which have been attributed to severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) continues to rise [2]. Furthermore, despite widespread ongoing biomedical research it remains unclear why some individuals develop severe symptoms of SARS-CoV-2 once contracting covid-19, whereas an estimated 80% of individuals display either asymptomatic or mild infections [3]. It is becoming increasingly evident however based on findings from the literature that established cardiometabolic disease risk factors play a role in the severity of symptoms for SARS-CoV-2 [4, 5].
To address this challenge, researchers in the field, led by colleagues from the MRC Epidemiology Unit, have rapidly generated a curated dataset concerning the genetic architecture of 97 unique proteins which may be involved in influencing severe SARS-CoV-2 [6]. These include inflammatory cytokines and antibodies (such as immunoglobulin G) which are involved in immune response to infection, proteins involved in fibrinolysis and blood coagulation and gene products which have been reported to interact with SARS-CoV-2 proteins in human cells [7]. A complete list of these proteins can be found in Supplementary Table 1.
This curated resource provides an opportunity to undertake Mendelian randomization (MR) analyzes to develop insight into modifiable risk factors that influence these SARS-CoV-2-related proteins, as well as potential downstream consequences on risk of covid-19. MR can be implemented as a form of instrumental variable analysis which exploits the random assortment of genetic alleles at birth under Mendel's laws of Inheritance [8, 9]. As such genetic variants can be leveraged as instrumental variables to investigate causal relationships between conventional exposures (such as cardiometabolic risk factors) and outcomes (such as circulating proteins) (Fig. 1A). As these inherited genetic variants are fixed at conception, MR is typically robust to confounding factors and reverse causation which can bias analyzes in an observational setting which do not make use of human genetics data.
In this study, we systematically applied MR to estimate the effects of 10 cardiometabolic exposures and genetic liability to lifetime smoking in turn on each of the SARS-CoV-2 prioritized proteins. We focused on these types of exposures given findings from the literature providing evidence that they may increase risk of severe covid-19, including observations from cohort studies, healthcare records and previous genetic studies [10], [11], [12], [13], [14]. This was followed by a series of sensitivity analyzes as well as applying multivariable MR to evaluate whether exposures independently influence the same circulating protein or act along overlapping causal pathways. We also sought to investigate the potential effects of proteins highlighted by this analysis on risk of severe covid-19 using data from recently conducted genome-wide association studies (GWAS).
2. Methods
2.1. Data resources
2.1.1. Deriving genetic instruments for modifiable exposures
We obtained genetic instruments for 11 exposures using data from large-scale GWAS. These were body mass index (BMI), systolic blood pressure (SBP), diastolic blood pressure (DBP), high density lipoprotein (HDL) cholesterol, low density lipoprotein (LDL) cholesterol, triglycerides, apolipoprotein A-I (Apo A-I), apolipoprotein B (Apo B), genetic liability to lifetime smoking, waist-hip-ratio adjusted for BMI and childhood adiposity based on reported body size at age 10 [15], [16], [17], [18]. Details on the study characteristics for the GWAS used to derive these instruments can be found in Supplementary Table 2.
We undertook linkage disequilibrium (LD) clumping to identify independent genetic instruments for these 11 exposures assessed using the software PLINK [19]. This process involves removing genetic variants which are correlated with the mostly strongly associated variant with a trait of interested in a region based on pairwise LD (using r2 < 0.001 in this study) using a reference panel of 503 individuals of European descent from phase 3 (version 5) of the 1000 genomes project [20].
2.1.2. Quantitative trait loci data for SARS-CoV-2-related proteins
All pQTL summary statistics for 97 unique proteins were obtained from the https://omicscience.org/apps/covidpgwas webserver [6]. Details on how these pQTL were derived are described in detail in the study by Pietzner et al. and outlined in Supplementary Fig. 1. Briefly, plasma samples from 10,708 individuals from the Fenland population-based cohort study were eligible for analysis after exclusions. In total, 409 circulating proteins were prioritized due to any of the following criteria; evidence suggesting that they interact with SARS-CoV-2 (n = 332) [7], associated with disease severity (n = 26) [21], involved in viral entry (n = 2) [22] or that they are clinical biomarkers of adverse, prognosis, complications and disease deterioration (n = 54) [23], [24], [25], [26]. Of the proteins, SOMAscan proteomic assays were used to derive data on 179 of them.
In the same set of participants, imputed genotype data on 17,652,797 genetic variants were available after imputation using the UK10K+1000 G phase3 reference panel. Summary statistics were available in a total of 97 unique proteins from the webserver as they had at least 1 pQTL acting in cis, which is defined here as genetic variants robustly associated with circulating proteins (based on P<5 × 10−08) and located within a 1Mb window around the genes responsible for encoding them.
We undertook LD clumping as before to identify pQTL to be used as instruments in MR analyzes. However, for protein instrumental variables we applied a more lenient LD threshold of r2<0.2 to identify weakly correlated pQTL all within a cis-window of 1Mb either side of the lead cis-pQTL for each protein analyzed. MR was then undertaken for protein targets whilst taking into account LD structure of pQTL. Further details on this method have been described in detail by Burgess et al [27]. In brief, the pairwise correlations between all instruments are incorporated in the standard error of the test statistic for the summary-level weighted generalized linear regression MR when deriving causal estimates.
2.1.3. Covid-19 GWAS datasets
Genetic estimates on SARS-CoV-2 were obtained using data from a GWAS of severe covid-19 [28] based on 1980 patients from intensive care units and wards at seven hospital located in the pandemic epicenters in Italy and Spain (accessed on 11/08/2020). Severe covid-19 was defined as hospitalization with respiratory failure and a confirmed SARS-CoV-2 viral RNA polymerase-chain-reaction (PCR) test using nasopharyngeal swabs or other biological fluids. General population controls analysed in this GWAS had unknown covid-19 status. Effect estimates from these GWAS were mapped to hg19 coordinates using the LiftOver tool (https://genome.sph.umich.edu/wiki/LiftOver). We analyzed GWAS data on severe SARS-CoV-2 data as detection and classification of these is likely to be more complete and less selected than for reported covid-19 symptoms or SARS-CoV-2 test positivity, for which potentially misleading conclusions can be drawn [29]. We also sought out replication of findings using GWAS results from the covid-19 host genetic initiative [30] using data on hospitalized covid-19 cases compared to population controls (https://www.covid19hg.org/results/, accessed on 11/08/2020), as well as a GWAS of mortality attributed to covid-19 in the UK Biobank study compared to population controls based on analyzes by Johnson and colleagues [31] (available at https://grasp.nhlbi.nih.gov/Covid19GWASResults.aspx, accessed on 11/08/2020).
2.2. Ethics statement
All data analyzed in this study is summary-level data. The relevant ethical approval can be found in the corresponding studies referenced for each dataset.
2.3. Statistical analysis
We firstly applied MR to estimate the effect of each of the 11 exposures in turn on each protein in a two-sample setting [32]. Initially we used the inverse variance weighted (IVW) approach which takes the SNP-outcome estimates and regresses them on those for the SNP-exposure associations. This provides an overall weighted estimate of the causal effect which is based on the inverse of the square of the standard error for the SNP-outcome association [33]. We applied a correction using false discovery rate (FDR)<5% to these results to account for multiple testing. This threshold has been used in this study as a heuristic to highlight findings with the strongest statistical evidence to investigate in further detail.
For effects which survived FDR corrections we applied other MR methods as sensitivity analyzes. This firstly involved applying the weighted median and MR-Egger approaches which are more robust to horizontal pleiotropy in comparison to the IVW method [34, 35]. We also applied the MR directionality test (also referred to as the ‘Steiger method’) to support evidence that our genetic variant is a valid instrument for the exposure in line with the underlying assumptions of MR [36]. Multivariable MR was undertaken to evaluate the direct effects of exposures on circulating proteins whilst accounting for the effects of other exposures [37, 38]. In doing so, we were able to investigate whether the effect of exposures on proteins were putatively mediated via another exposure. To estimate the proportion mediated we applied the product method described previously by Burgess and colleagues [39], using genetic instruments from the GIANT consortium for BMI to avoid overlapping samples with the UK Biobank [40].
We next applied MR to investigate the genetically predicted effects of circulating proteins robust to FDR corrections and sensitivity analyzes in the previous analysis on risk of severe SARS-CoV-2 using GWAS results from Ellinghaus et al. This was undertaken by applying the IVW method which uses correlated cis-regulatory variants whilst accounting for their correlation structure [27]. For proteins which provided strong evidence of an effect using the IVW method, we also applied the MR-Egger approach whilst accounting for correlation structure amongst instruments. The MR-Egger is traditionally applied to investigate horizontal pleiotropy in MR studies, although was applied predominantly as a sensitivity analysis in this work given the low number of instrumental variables available for our cis-pQTL analyzes. We also attempted to replicate findings using the GWAS data from the covid-19 HGI and Johnson et al. analyzes [30, 31]. Finally, we applied the cis-correlated IVW approach systematically to all proteins with at least 2 cis-pQTL (as this is the minimum number required for the IVW method) for each covid-19 GWAS dataset. This allowed us to highlight proteins which may play a role in disease but are not strongly under the influence of modifiable risk factors. Importantly, given that these proteins were derived in samples who did not have covid-19, our MR estimates are therefore mainly useful in terms of prioritising candidates for further research rather than implicating them directly in severe covid-19 susceptibility.
All analyzes were undertaken using the ‘TwoSampleMR’ and ‘MendelianRandomization’ packages using R (version 3.5.1) [41, 42]. The forest plot in Fig. 2 was generated using ‘ggplot2’ v2.2.1 [43]. Fig. 3 was generated using the LD link resource [44].
2.4. Role of funding source
The funders had no role in study design, data collection, data analyzes, interpretation, or writing of report.
3. Results
3.1. A systematic Mendelian randomization analysis of circulating proteins
Across the 11 exposures assessed, there were 253 genetically predicted effects on circulating proteins which survived FDR<5% corrections using the IVW method (Supplementary Table 3). Amongst top findings was a strong effect of BMI on C-reactive protein (CRP) levels (Beta=0.34 per standard deviation change in BMI, 95% CI=0.26 to 0.41, P = 2.19 × 10−16) which is a well-established marker of chronic inflammation [45]. Elsewhere, there was strong evidence of an effect of HDL cholesterol on elevated levels of serum amyloid A-1 (Beta=0.23, 95% CI=0.16 to 0.29, P = 1.59 × 10−12) and A-2 (Beta=0.24, 95% CI=0.17 to 0.30, P = 4.38 × 10−13) proteins. Undertaking sensitivity analyzes found 106 effects that were robust to FDR<5% corrections using either the weighted median or MR-Egger methods (Supplementary Tables 4 and 5). The MR directionality tests provided evidence that assumptions regarding directionality may have been violated for one of these effects, which was between waist-hip-ratio adjusted for BMI and the protein ITIH3 (IVW Beta=−0.31, 95% CI=−0.46 to −0.16, P = 4.07 × 10−05) (Supplementary Table 6). IVW estimates for the 105 effects which were robust to sensitivity analyzes can be found in Supplementary Table 7. An overview of the analytical pipeline applied in this section including method and datasets used can be found in Supplementary Fig. 2.
Amongst the exposures which contributed most to the remaining 105 effects were BMI (23 effects) and triglycerides (27 effects). As illustrated in Fig. 2, the effects driven by BMI were typically spread across the 6 subcategories of circulating proteins. This included effects on coagulation factor IX (IVW Beta=0.21, 95% CI=0.14 to 0.27, P = 2.42 × 10−07), tissue-type plasminogen activator (Beta=0.16, 95% CI=0.09 to 0.23, P = 2.38 × 10−06) and cytokines including interleukin-1 receptor antagonist (Beta=0.23, 95% CI=0.17 to 0.30, P = 9.04 × 10−12). In contrast, the majority of triglycerides effects were found to be on proteins allocated to the SARS-CoV-Human protein-protein interaction (PPI) or severe covid-19 disease subcategories. There were exceptions to this however, such as an effect on clotting factor vitamin K-dependent protein S (Beta=0.21, 95% CI=0.15 to 0.27, P = 1.25 × 10−10) and cytokine signal transducer interleukin-6 receptor subunit beta (Beta=−0.21, 95% CI=−0.28 to −0.13, P = 4.12 × 10−08). We also note the conflicting directions of effect which risk factors have on the proteins assessed even within the same category. For example, within the fibrinolysis category BMI provided evidence of an effect on higher levels of fibrinogen (Beta=0.09, 95% CI=0.02 to 0.16, P = 0.008), as well as an inverse effect on antithrombin III (Beta=−0.25, 95% CI=−0.32 to −0.18, P = 4.43 × 10−13). Findings from the literature supported the direction of effect between BMI and circulating proteins for various effects identified in this analysis (CRP, Factor B and H, the interleukin 1 family of proteins, SAA/2, fibrinogen and antithrombin III), although for others there was no clear prior evidence suggesting that obesity influences their levels (Supplementary Table 8).
BMI is recognized to causally influence triglycerides and we therefore undertook multivariable MR to evaluate the direct effects of BMI and triglycerides on the 7 proteins which they had in common based on univariable estimates in the previous analysis. The majority of these effects remained robust after accounting for the effects of the other exposure, suggesting that BMI and triglycerides influence these proteins directly via separate causal pathways (Supplementary Table 9). The notable exception to this was the effect of BMI on immunoglobulin G. Effect estimates for BMI on this circulating protein identified in the univariable analysis (Beta=−0.11, 95% CI=−0.18 to 0.04, P = 0.02) attenuated to the null when accounting for the effect of triglycerides (Beta=−0.06, 95% CI=−0.13 to 0.02, P = 0.13). This suggests that BMI indirectly lowers immunoglobulin G due to its influence on raising triglyceride levels. We estimated that 44% of the BMI effect on immunoglobulin G was mediated via triglycerides using mediation MR.
3.2. Harnessing cis-regulatory variants to evaluate effects of circulating proteins on risk of severe SARS-CoV-2
For each protein highlighted in the previous analysis, we undertook MR using cis-pQTL as instruments to estimate their effects on risk of severe SARS-CoV-2 (Supplementary Table 10). Although no results survived multiple testing corrections based on a false-discovery rate threshold of less than 5%, we sought to replicate our lead findings from this analysis. The protein which provided the strongest evidence that it may influence risk of covid-19 was gp130, soluble, also known as glycoprotein 130 (odds ratio (OR)=1.81 increased risk of severe SARS-CoV-2 per doubling of gp130, 95% CI=1.25 to 2.62, P = 0.002). This protein is encoded by the IL6ST gene and is responsible for signal transduction with all members of the interleukin 6 receptor family [46]. There were 18 weakly correlated pQTL scattered across this locus used as instrumental variables as illustrated in Fig. 3. Their pairwise LD correlations can be found in Supplementary Table 11. Single variant associations for these 18 pQTL with severe SARS-CoV-2 suggested that the two instruments largely responsible for driving the overall IVW estimate were rs929108 (P = 0.002), which is located downstream of IL6ST (i.e. the opposite side compared to IL31RA) and rs6875155 (P = 0.006), located within the gene body of IL6ST itself.
Applying the MR-Egger method accounting for correlation structure using the 18 IL6ST instruments produced an elevated but imprecisely effect estimates which included the null (OR=1.55, 95% CI=1.00 to 2.39, P = 0.05). We were unable to detect robust evidence of replication using the two other covid-19 GWAS datasets (covid-19 host genetics initiative: OR=1.11, 95% CI=0.83 to 1.48, P = 0.48 & Risk of death due to covid-19 death in UK Biobank: OR=1.26, 95% CI=0.79 to 2.00, P = 0.34). Lastly, we applied the IVW method accounting for correlation structure to all proteins with at least 2 cis-pQTL to evaluate their genetically predicted effects on risk of covid-19. However, using current sample sizes we did not detect strong evidence that these circulating proteins influence risk of severe covid-19 based on multiple testing corrections (Supplementary Tables 13–15).
4. Discussion
We have undertaken a comprehensive Mendelian randomization study to systematically evaluate the effect of 11 established risk factors for disease on circulating levels of proteins related to SARS-CoV-2. Our main findings are that among the modifiable risk factors assessed, BMI and triglycerides showed the widest repertoire of causal effects on these circulating proteins (providing evidence of causation for 23 and 27 effects, respectively). Furthermore, of the circulating proteins investigated by our study, the strongest evidence of an effect on developing severe covid-19 was identified for glycoprotein 130, which is involved in the transmission of molecular signals for inflammatory interleukin cytokines.
It is important to recognise that the case definitions of severe covid-19 in these studies will capture both genetic effects on increased susceptibility to infection as well as increased progression to severe symptoms [47]. It is likely that genetic effects will differ between infection and progression, indeed they could even be in different directions. MR estimates derived using these datasets should therefore take this into account when interpreting the potential implications of therapeutically targeting any proteins highlighted by this (and similar) studies (Fig. 1B).
Amongst the 105 effects which were robust to multiple testing and sensitivity analyzes there were several well-established relationships based on the literature. For example, having a high BMI is a known driver of systemic inflammation as indexed by C-reactive protein levels [45] and acute inflammatory markers such as fibrinogen [48]. Other findings fit with the known biology of cardiometabolic risk factors and proteins identified by our analysis, such as the effect of HDL cholesterol on serum amyloid A-1 and A-2 proteins, which have previously been proposed as clinically applicable surrogates of HDL vascular functionality [49]. Whilst our results are therefore of immediate importance for SARS-CoV-2 research, they may also be valuable for future endeavors interested in the therapeutic potential of these proteins with respect to a wide range of disease outcomes.
There were several results from our study which may assist in unraveling the complex pathogenesis of severe SARS-CoV-2. For example, immunoglobulin G (IgG) is a class of antibodies produced by plasma B cells in the immune system in response to a pathogen [50]. Our results indicate that having a high BMI may reduce levels of circulating IgG, suggesting that people with obesity have less of this class of antibody to help protect from infection. That being said, an important consideration when interpreting this finding is that IgG levels were measured in individual's in a healthy state and can therefore only act a proxy for IgG response to infection. Additionally, generic IgG levels were measured rather than the specific adaptive immune response to SARS-CoV-2.
Additionally, our multivariable MR estimates for the effect BMI on IgG attenuated when accounting for the effect of triglycerides on this class of antibodies. This suggests that triglycerides may mediate the lowering effect of BMI on IgG, which we estimated as 41.1% of the total effect of BMI on IgG levels being mediated via triglycerides. Further research into the role of IgG and B cell immunity in response to the covid-19 pathogen is therefore warranted, particularly given that IgG is being measured by tests for antibody responses to SARS-CoV-2 [51]. Along with evaluating the effect of modifiable risk factors on antibody mediated immunity to covid-19, it will be critical to develop insight into how these factors influence cell mediated immunity given the emerging importance of the adaptive immune response to SARS-CoV-2 [52].
Using MR to estimate the genetically predicted effects of circulating proteins on severe covid-19 risk highlighted glycoprotein 130 as the protein with the strongest evidence of an effect on severe SARS-CoV-2 (OR=1.81, 95% CI=1.25 to 2.62, P = 0.002), however we were unable to replicate these findings in larger samples. A possible explanation for this could be the heterogeneity between the SARS-CoV-2 GWAS datasets analyzes and their variable case definitions. This requires robust replication that would be necessary before initiating further in-depth analyzes. Glycoprotein 130 is encoded by the IL6ST gene and belongs to the interleukin-6 family of cytokines [46]. It's activation is dependent upon the binding of cytokines with their receptors, such as interleukin-6 (IL6) with interleukin-6 receptor (IL6R) [53]. This is noteworthy due to the extensive interest in repurposing IL6R blockers as a potential therapeutic strategy for SARS-CoV-2 [54], [55], [56], [57], [58]. As lowering the levels of circulating IL6R will lead to lower activation of glycoprotein 130, estimates in this study suggest that this might result in reduced risk of severe SARS-CoV-2 symptoms. These findings therefore corroborate results from a recent MR study which used human genetic data to support the efficacy of IL6R inhibition as a potential treatment option for severe SARS-CoV-2 symptoms [59]. However the MR studies to date have not been able to reliably separate influences on risk of becoming infected from risk of progressing to severe disease following infection (Fig. 1B). Thus they do not provide robust evidence as to whether IL6R inhibition would be expected to favourably influence outcome in severe covid-19. Adequately powered randomized controlled trial data are essential for evaluating the clinical value of therapeutic intervention targeting IL6R [55].
This study has several limitations which should be taken into account when interpreting its findings. The current sample sizes of the SARS-CoV-2 GWAS are (as one would expect) relatively modest compared to large-scale GWAS data which MR studies are contemporaneously applied to, meaning that our cis-pQTL analysis is likely underpowered. We analyzed severe covid-19 as an outcome to mitigate reported selection bias of cases [29], so larger sample sizes of severe SARS-CoV-2 GWAS in the future should improve the statistical power of our approach. Furthermore, although data from plasma is of unprecedented sample size compared to previous large-scale pQTL analyzes (n = 10,708), it remains comparably modest to the sample sizes of GWAS used to derive instrument for the cardiometabolic exposures in this work. This is exaggerated by the fact that protein MRs are typically conducted using instruments relating to a single gene and therefore these genetic variants will explain a lower proportion of variance in the exposure than if instruments are taken from across the genome [60]. Therefore, although we have undertaken thorough evaluations to interrogate bi-directional relationships between the exposures and proteins in this study, the discrepancies between the samples sizes makes the direction of effect challenging to orientate (the majority of exposure instruments were derived using sample sizes of n=~440,000). Finally, although data from plasma pQTL studies provide an exceptional opportunity to leverage instruments for MR studies, it should be noted that serum plasma may not capture signatures confined to disease or cell-type relevant tissues. This is particularly important for a disease with a large autoimmune component such as covid-19 and further emphasis should therefore be noted when interpreting the results of our study on proteins such as IgG. Finally, studies need to be conducted on data that allow MR to separately investigate modifiable influences on acquiring SARS-CoV-2 and on progressing to severe covid-19 or death (Fig. 1B).
In conclusion, our MR study identified many effects between conventional risk factors and circulating proteins which provides a platform for prospective endeavors to dissect related disease pathways. Future research into the pathogenesis of the proteins highlighted by this study are warranted to discern whether they may hold therapeutic potential for severe covid-19.
Funding
TGR, SF, REM and GDS work at the MRC Integrative Epidemiology Unit at at the University of Bristol (MC_UU_00011/1). GDS conducts research at the NIHR Biomedical Research Centre at the University Hospitals Bristol NHS Foundation Trust and the University of Bristol. The views expressed in this publication are those of the author(s) and not necessarily those of the NHS, the National Institute for Health Research or the Department of Health. SF is supported by a Wellcome Trust PhD studentship in Molecular, Genetic and Lifecourse Epidemiology [108902/Z/15/Z]. TGR is a UKRI Innovation Research Fellow (MR/S003886/1). MVH works in a unit that receives funding from the UK Medical Research Council and is supported by a British Heart Foundation Intermediate Clinical Research Fellowship (FS/18/23/33512) and the National Institute for Health Research Oxford Biomedical Research Centre.
Author contributions
TGR undertook data analysis. SF conducted the literature search. All authors contributed to the design of the study and wrote the manuscript. All authors read and approved the final version of the manuscript.
Materials and correspondence
This publication is the work of the authors and TGR will serve as guarantor for the contents of this paper.
Declaration of Interests
Dr Holmes has collaborated with Boehringer Ingelheim in research, and in adherence to the University of Oxford's Clinical Trial Service Unit & Epidemiological Studies Unit (CSTU) staff policy, did not accept personal honoraria or other payments from pharmaceutical companies. All other authors declare no conflicts of interest.
Acknowledgments
We are extremely grateful to Maik Pietzner, Claudia Langenberg and all their colleagues who were involved in identifying the pQTL for the SARS-CoV-2-related proteins analyzed in this study. We are also enormously thankful to all the authors of the Ellinghaus et al. GWAS, the covid-19 Host Genetics Initiative and Dr Andrew Johnson and colleagues for sharing their summary statistics which made analyzes of covid-19 risk and cause of death in this study possible.
Footnotes
Supplementary material associated with this article can be found, in the online version, at doi:10.1016/j.ebiom.2021.103228.
Appendix. Supplementary materials
References
- 1.The World Health Organisation. https://www.who.int/dg/speeches/detail/who-director-general-s-opening-remarks-at-the-media-briefing-on-covid-19-11-march-2020.
- 2.Dong E., Du H., Gardner L. An interactive web-based dashboard to track COVID-19 in real time. Lancet Infect Dis. 2020;20(5):533–534. doi: 10.1016/S1473-3099(20)30120-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 3.Wu Z., McGoogan J.M. Characteristics of and important lessons from the coronavirus disease 2019 (COVID-19) outbreak in China: summary of a report of 72314 cases from the Chinese center for disease control and prevention. JAMA. 2020 doi: 10.1001/jama.2020.2648. [DOI] [PubMed] [Google Scholar]
- 4.Sattar N., McInnes I.B., McMurray J.J.V. Obesity is a risk factor for severe COVID-19 infection: multiple potential mechanisms. Circulation. 2020;142(1):4–6. doi: 10.1161/CIRCULATIONAHA.120.047659. [DOI] [PubMed] [Google Scholar]
- 5.Fang L., Karakiulakis G., Roth M. Are patients with hypertension and diabetes mellitus at increased risk for COVID-19 infection? Lancet Respir Med. 2020;8(4):e21. doi: 10.1016/S2213-2600(20)30116-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 6.Pietzner M., Wheeler E., Carrasco-Zanini J. Genetic architecture of host proteins interacting with SARS-CoV-2. bioRxiv. 2020 doi: 10.1038/s41467-020-19996-z. 2020.07.01.182709. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 7.Gordon D.E., Jang G.M., Bouhaddou M. A SARS-CoV-2 protein interaction map reveals targets for drug repurposing. Nature. 2020;583(7816):459–468. doi: 10.1038/s41586-020-2286-9. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 8.Davey Smith G., Ebrahim S. Mendelian randomization': can genetic epidemiology contribute to understanding environmental determinants of disease? Int J Epidemiol. 2003;32(1):1–22. doi: 10.1093/ije/dyg070. [DOI] [PubMed] [Google Scholar]
- 9.Davey Smith G., Hemani G. Mendelian randomization: genetic anchors for causal inference in epidemiological studies. Hum Mol Genet. 2014;23(R1):R89–R98. doi: 10.1093/hmg/ddu328. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 10.Williamson E.J., Walker A.J., Bhaskaran K. Factors associated with COVID-19-related death using OpenSAFELY. Nature. 2020;584(7821):430–436. doi: 10.1038/s41586-020-2521-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 11.Ponsford M.J., Gkatzionis A., Walker V.M. Cardiometabolic traits, sepsis, and severe COVID-19: a Mendelian randomization investigation. Circulation. 2020;142(18):1791–1793. doi: 10.1161/CIRCULATIONAHA.120.050753. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 12.Aung N., Khanji M.Y., Munroe P.B., Petersen S.E. Causal inference for genetic obesity, cardiometabolic profile and COVID-19 susceptibility: a mendelian randomization study. Front Genet. 2020;11 doi: 10.3389/fgene.2020.586308. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 13.Liu D., Cui P., Zeng S. Risk factors for developing into critical COVID-19 patients in Wuhan, China: a multicenter, retrospective, cohort study. EClinicalMedicine. 2020;25 doi: 10.1016/j.eclinm.2020.100471. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 14.Morawietz H., Julius U., Bornstein S.R. Cardiovascular diseases, lipid-lowering therapies and European registries in the COVID-19 pandemic. Cardiovasc Res. 2020;116(10):e122–e125. doi: 10.1093/cvr/cvaa176. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 15.O'Connor L.J., Price A.L. Distinguishing genetic correlation from causation across 52 diseases and complex traits. Nat Genet. 2018;50(12):1728–1734. doi: 10.1038/s41588-018-0255-0. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 16.Wootton R.E., Richmond R.C., Stuijfzand B.G. Evidence for causal effects of lifetime smoking on risk for depression and schizophrenia: a Mendelian randomisation study. Psychol Med. 2019:1–9. doi: 10.1017/S0033291719002678. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 17.Richardson T.G., Sanderson E., Palmer T.M. Evaluating the relationship between circulating lipoprotein lipids and apolipoproteins with risk of coronary heart disease: a multivariable Mendelian randomisation analysis. PLoS Med. 2020;17(3) doi: 10.1371/journal.pmed.1003062. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 18.Richardson T.G., Sanderson E., Elsworth B., Tilling K., Davey Smith G. Use of genetic variation to separate the effects of early and later life adiposity on disease risk: mendelian randomisation study. BMJ. 2020;369:m1203. doi: 10.1136/bmj.m1203. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 19.Chang C.C., Chow C.C., Tellier L.C., Vattikuti S., Purcell S.M., Lee J.J. Second-generation PLINK: rising to the challenge of larger and richer datasets. Gigascience. 2015;4:7. doi: 10.1186/s13742-015-0047-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 20.The 1000 Genomes Project. Auton A., Brooks L.D. A global reference for human genetic variation. Nature. 2015;526(7571):68–74. doi: 10.1038/nature15393. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 21.Messner C.B., Demichev V., Wendisch D. Ultra-high-throughput clinical proteomics reveals classifiers of COVID-19 infection. Cell Syst. 2020;11(1):11–24e4. doi: 10.1016/j.cels.2020.05.012. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 22.Hoffmann M., Kleine-Weber H., Schroeder S. SARS-CoV-2 cell entry depends on ACE2 and TMPRSS2 and is blocked by a clinically proven protease inhibitor. Cell. 2020;181(2) doi: 10.1016/j.cell.2020.02.052. 271-80e8. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 23.Jose R.J., Manuel A. COVID-19 cytokine storm: the interplay between inflammation and coagulation. Lancet Respir Med. 2020;8(6):e46–e47. doi: 10.1016/S2213-2600(20)30216-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 24.Merad M., Martin J.C. Pathological inflammation in patients with COVID-19: a key role for monocytes and macrophages. Nat Rev Immunol. 2020;20(6):355–362. doi: 10.1038/s41577-020-0331-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 25.Zhang L., Yan X., Fan Q. D-dimer levels on admission to predict in-hospital mortality in patients with Covid-19. J Thromb Haemost. 2020;18(6):1324–1329. doi: 10.1111/jth.14859. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 26.Violi F., Pastori D., Cangemi R., Pignatelli P., Loffredo L. Hypercoagulation and antithrombotic treatment in coronavirus 2019: a new challenge. Thromb Haemost. 2020;120(6):949–956. doi: 10.1055/s-0040-1710317. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 27.Burgess S., Dudbridge F., Thompson S.G. Combining information on multiple instrumental variables in Mendelian randomization: comparison of allele score and summarized data methods. Stat Med. 2016;35(11):1880–1906. doi: 10.1002/sim.6835. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 28.Ellinghaus D., Degenhardt F., Bujanda L. Genomewide association study of severe Covid-19 with respiratory failure. N Engl J Med. 2020 doi: 10.1056/NEJMoa2020283. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 29.Griffith G.J., Morris T.T., Tudball M.J. Collider bias undermines our understanding of COVID-19 disease risk and severity. Nat Commun. 2020;11(1):5749. doi: 10.1038/s41467-020-19478-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 30.The COVID-19 Host Genetics Initiative The COVID-19 host genetics initiative, a global initiative to elucidate the role of host genetic factors in susceptibility and severity of the SARS-CoV-2 virus pandemic. Eur J Hum Genet. 2020;28(6):715–718. doi: 10.1038/s41431-020-0636-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 31.Johnson A.D. https://grasp.nhlbi.nih.gov/Covid19GWASResults.aspx.
- 32.Lawlor D.A. Commentary: two-sample Mendelian randomization: opportunities and challenges. Int J Epidemiol. 2016;45(3):908–915. doi: 10.1093/ije/dyw127. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 33.Burgess S., Butterworth A., Thompson S.G. Mendelian randomization analysis with multiple genetic variants using summarized data. Genet Epidemiol. 2013;37(7):658–665. doi: 10.1002/gepi.21758. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 34.Bowden J., Davey Smith G., Burgess S. Mendelian randomization with invalid instruments: effect estimation and bias detection through Egger regression. Int J Epidemiol. 2015;44(2):512–525. doi: 10.1093/ije/dyv080. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 35.Bowden J., Davey Smith G., Haycock P.C., Burgess S. Consistent estimation in Mendelian randomization with some invalid instruments using a weighted median estimator. Genet Epidemiol. 2016;40(4):304–314. doi: 10.1002/gepi.21965. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 36.Hemani G., Tilling K., Davey Smith G. Orienting the causal relationship between imprecisely measured traits using GWAS summary data. PLoS Genet. 2017;13(11) doi: 10.1371/journal.pgen.1007081. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 37.Sanderson E., Smith G.D., Windmeijer F., Bowden J. An examination of multivariable Mendelian randomization in the single-sample and two-sample summary data settings. Int J Epidemiol. 2020 doi: 10.1093/ije/dyaa101. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 38.Burgess S., Thompson S.G. Multivariable Mendelian randomization: the use of pleiotropic genetic variants to estimate causal effects. Am J Epidemiol. 2015;181(4):251–260. doi: 10.1093/aje/kwu283. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 39.Burgess S., Daniel R.M., Butterworth A.S., Thompson S.G. Consortium EP-I. Network Mendelian randomization: using genetic variants as instrumental variables to investigate mediation in causal pathways. Int J Epidemiol. 2015;44(2):484–495. doi: 10.1093/ije/dyu176. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 40.Locke A.E., Kahali B., Berndt S.I. Genetic studies of body mass index yield new insights for obesity biology. Nature. 2015;518(7538):197–206. doi: 10.1038/nature14177. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 41.Hemani G., Zheng J., Elsworth B. The MR-base platform supports systematic causal inference across the human phenome. Elife. 2018;7 doi: 10.7554/eLife.34408. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 42.Yavorska O.O., Burgess S. Mendelian randomization: an R package for performing Mendelian randomization analyses using summarized data. Int J Epidemiol. 2017;46(6):1734–1739. doi: 10.1093/ije/dyx034. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 43.Ginestet C. ggplot2 elegant graphics for data analysis. J R Stat Soc a Stat. 2011;174:245. [Google Scholar]
- 44.Machiela M.J., Chanock S.J. LDassoc: an online tool for interactively exploring genome-wide association study results and prioritizing variants for functional investigation. Bioinformatics. 2018;34(5):887–889. doi: 10.1093/bioinformatics/btx561. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 45.Timpson N.J., Nordestgaard B.G., Harbord R.M. C-reactive protein levels and body mass index: elucidating direction of causation through reciprocal Mendelian randomization. Int J Obes Lond. 2011;35(2):300–308. doi: 10.1038/ijo.2010.137. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 46.Heinrich P.C., Behrmann I., Haan S., Hermanns H.M., Muller-Newen G., Schaper F. Principles of interleukin (IL)-6-type cytokine signalling and its regulation. Biochem J. 2003;374(Pt 1):1–20. doi: 10.1042/BJ20030407. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 47.Paternoster L., Tilling K., Davey Smith G. Genetic epidemiology and Mendelian randomization for informing disease therapeutics: conceptual and methodological challenges. PLoS Genet. 2017;13(10) doi: 10.1371/journal.pgen.1006944. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 48.Fibrinogen Studies Collaboration. Kaptoge S., White I.R. Associations of plasma fibrinogen levels with established cardiovascular disease risk factors, inflammatory markers, and other characteristics: individual participant meta-analysis of 154,211 adults in 31 prospective studies: the fibrinogen studies collaboration. Am J Epidemiol. 2007;166(8):867–879. doi: 10.1093/aje/kwm191. [DOI] [PubMed] [Google Scholar]
- 49.Zewinger S., Drechsler C., Kleber M.E. Serum amyloid A: high-density lipoproteins interaction and cardiovascular risk. Eur Heart J. 2015;36(43):3007–3016. doi: 10.1093/eurheartj/ehv352. [DOI] [PubMed] [Google Scholar]
- 50.Reynolds H.Y. Immunoglobulin G and its function in the human respiratory tract. Mayo Clin Proc. 1988;63(2):161–174. doi: 10.1016/s0025-6196(12)64949-0. [DOI] [PubMed] [Google Scholar]
- 51.Long Q.X., Liu B.Z., Deng H.J. Antibody responses to SARS-CoV-2 in patients with COVID-19. Nat Med. 2020;26(6):845–848. doi: 10.1038/s41591-020-0897-1. [DOI] [PubMed] [Google Scholar]
- 52.Le Bert N., Tan A.T., Kunasegaran K. SARS-CoV-2-specific T cell immunity in cases of COVID-19 and SARS, and uninfected controls. Nature. 2020 doi: 10.1038/s41586-020-2550-z. [DOI] [PubMed] [Google Scholar]
- 53.Kurth I., Horsten U., Pflanz S. Activation of the signal transducer glycoprotein 130 by both IL-6 and IL-11 requires two distinct binding epitopes. J Immunol. 1999;162(3):1480–1487. [PubMed] [Google Scholar]
- 54.Xu X., Han M., Li T. Effective treatment of severe COVID-19 patients with tocilizumab. Proc Natl Acad Sci USA. 2020;117(20):10970–10975. doi: 10.1073/pnas.2005615117. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 55.Parr J.B. Time to reassess tocilizumab's role in COVID-19 pneumonia. JAMA Intern Med. 2020 doi: 10.1001/jamainternmed.2020.6557. [DOI] [PubMed] [Google Scholar]
- 56.Gupta S., Wang W., Hayek S.S. Association between early treatment with tocilizumab and mortality among critically ill patients with COVID-19. JAMA Intern Med. 2020 doi: 10.1001/jamainternmed.2020.6252. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 57.Salvarani C., Dolci G., Massari M. Effect of tocilizumab vs standard care on clinical worsening in patients hospitalized with COVID-19 pneumonia: a randomized clinical trial. JAMA Intern Med. 2020 doi: 10.1001/jamainternmed.2020.6615. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 58.Hermine O., Mariette X., Tharaux P.L. Effect of tocilizumab vs usual care in adults hospitalized with covid-19 and moderate or severe pneumonia: a randomized clinical trial. JAMA Intern Med. 2020 doi: 10.1001/jamainternmed.2020.6820. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 59.Bovijn J., Lindgren C.M., Holmes M.V. Genetic variants mimicking therapeutic inhibition of IL-6 receptor signaling and risk of COVID-19. Lancet Rheumatol. 2020;2(11):e658–e659. doi: 10.1016/S2665-9913(20)30345-3. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 60.Holmes M.V., Richardson T.G., Ference B.A., Davies N.M., Davey Smith G. Integrating genomics with biomarkers and therapeutic targets to invigorate cardiovascular drug development. Nat Rev Cardiol. 2021 doi: 10.1038/s41569-020-00493-1. in press. [DOI] [PubMed] [Google Scholar]
Associated Data
This section collects any data citations, data availability statements, or supplementary materials included in this article.