Abstract
Mendelian randomization (MR) is increasingly popular for inferring causal relationships between an exposure and a trait. Typically, instrument SNPs are selected from an exposure GWAS based on their summary statistics, and the same summary statistics for the selected SNPs are then used in subsequent analyses. However, this practice suffers from selection bias and can invalidate MR methods, as showcased with two popular methods: the summary data-based MR (SMR) method and the two-sample MR Steiger method. The SMR method is conservative, while the MR Steiger method can be either conservative or liberal. A simple yet more powerful alternative to SMR is proposed.
Subject terms: Genetics, Gene expression, Genetic association study
Introduction
As a feasible alternative to expensive and sometimes impossible randomized clinical trials, Mendelian randomization (MR) is increasingly popular for inferring causal relationships between an exposure and a trait [1–3]. Summary data-based two-sample MR methods often take the following two steps:
- Step 1
Obtain instruments (typically SNPs) from an exposure GWAS (genome-wide association study) that are significant at the genome-wide level (typically $p < 5 \times 10^{-8}$);
- Step 2
Investigate the causal relationship between the exposure and the trait, using the summary exposure GWAS statistics at the selected SNPs and a trait GWAS. The summary exposure GWAS statistics are those used in Step 1 for SNP selection.
One appealing feature of these methods is that they only rely on summary statistics on the exposure GWAS and the trait GWAS. Individual-level data are not needed.
The validity of inference from this two-step approach is affected by selection bias. When conducting causal inference in Step 2 with the SNPs selected in Step 1, the summary statistics from the exposure GWAS cannot be regarded as random samples of the true population association strength [4–6]. Treating them as random samples leads to over-estimation of the effect size of these SNPs on the exposure. Association strength in a random sample is often much weaker, a phenomenon commonly seen in studies aimed at replicating previous findings. This selection bias has been noted in the literature [6,7], but its effect on hypothesis testing in two-sample summary-data-based Mendelian randomization is largely unknown.
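The over-estimation described above can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: every simulated SNP shares one true z-score (the square root of 13, a value chosen here for convenience), the discovery and replication statistics are independent unit-variance normal draws, and the function name `winners_curse_demo` is hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist, mean


def winners_curse_demo(true_z=sqrt(13), n_snps=20000, alpha=5e-8, seed=1):
    """Compare discovery and replication z-scores for SNPs selected at alpha.

    Assumption: every simulated SNP has the same true z-score `true_z`, and
    the discovery and replication estimates are independent N(true_z, 1) draws.
    """
    rng = random.Random(seed)
    z_cut = -NormalDist().inv_cdf(alpha / 2)  # two-sided cut-off, about 5.45
    discovery, replication = [], []
    for _ in range(n_snps):
        z_disc = rng.gauss(true_z, 1.0)  # statistic used for selection (Step 1)
        z_rep = rng.gauss(true_z, 1.0)   # independent replication statistic
        if abs(z_disc) > z_cut:
            discovery.append(z_disc)
            replication.append(z_rep)
    return {
        "z_cut": z_cut,
        "n_selected": len(discovery),
        "mean_discovery": mean(discovery),
        "mean_replication": mean(replication),
    }


if __name__ == "__main__":
    print(winners_curse_demo())
```

Among the selected SNPs, the mean discovery z-score necessarily exceeds the cut-off, whereas the mean replication z-score stays close to the true value. This is the gap that arises when the selection statistics are reused in Step 2.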
Two popular MR methods, the summary data-based MR method [2] and the two-sample MR Steiger method [1], are considered. For the summary data-based MR method, the most significant SNP (instead of several SNPs) from a gene is selected as the instrument from the exposure GWAS. For the two-sample MR Steiger method, a SNP significantly associated with both the exposure and the trait in their respective GWAS is selected. The genotype score (0, 1, or 2) at this SNP is denoted by g. The exposure level is denoted by x and the trait value is denoted by y. The Wald statistic on chi-square scale for testing the association between the SNP and the exposure is denoted by $T_{gx}$. Its value is supposed to be large because it satisfies the selection criterion used in Step 1. For instance, when the selection criterion is $p < 5 \times 10^{-8}$, there must be $T_{gx} > 29.71679$. The Wald statistic for testing the association between the SNP and the trait is denoted by $T_{gy}$, which is independent of $T_{gx}$.
Results
Summary data-based MR
Summary data-based MR [2] (SMR) is a popular MR method for inferring causality between x and y. Its null is $H_0: b_{xy} = 0$, where $b_{xy}$ is the true regression coefficient for x with y the response. The two-stage least squares (2SLS) estimate of $b_{xy}$ is

$$\hat{b}_{xy} = \frac{\hat{b}_{gy}}{\hat{b}_{gx}}, \tag{1}$$

where $\hat{b}_{gx}$ is the least squares estimate of $b_{gx}$, the regression coefficient for g with x the response, and $\hat{b}_{gy}$ is the least squares estimate of $b_{gy}$, the regression coefficient for g with y the response. $\hat{b}_{xy}$ is also known as the Wald ratio [5]. A causal relationship between exposure x and trait y is declared if the following test statistic is significant [2]:

$$T_{SMR} = \frac{T_{gx}\,T_{gy}}{T_{gx} + T_{gy}},$$

where $T_{gx}$ and $T_{gy}$ are the Wald statistics on chi-square scale for the SNP-exposure and SNP-trait associations, respectively. The null distribution of $T_{SMR}$ is approximated by a 1-df chi-square using the Delta method [2].
There are several issues with the SMR statistic $T_{SMR}$. The derivation of its null distribution assumes that $\hat{b}_{gx}$, the estimated SNP-exposure regression coefficient, is a consistent estimator of $b_{gx}$ and (asymptotically) follows a normal distribution ([2], Online Methods). However, these two conditions do not hold. If the significance level used in Step 1 is $5 \times 10^{-8}$, the SNP-exposure chi-square statistic must satisfy $T_{gx} > 29.71679$, which implies $|\hat{b}_{gx}| > 5.45\,\mathrm{se}(\hat{b}_{gx})$. As a result, the distribution of $\hat{b}_{gx}$ is not (asymptotically) normal and $\hat{b}_{gx}$ is not a consistent estimator of $b_{gx}$. To numerically demonstrate this point, 10,000 random samples of $T_{gx}$ are generated from a 1-df chi-square with a large non-centrality parameter of 13 (to make sure that a reasonable number of them are significant). Among them, 322 are significant at the genome-wide significance level $5 \times 10^{-8}$. The quantile-quantile plot of these selected $T_{gx}$ against 322 random samples from a 1-df chi-square with non-centrality 13 is shown in Fig. 1. The distribution of the selected $T_{gx}$ is clearly different from the distribution of the unselected $T_{gx}$.
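The count of 322 significant draws out of 10,000 can be checked analytically. The sketch below assumes the usual representation of a 1-df non-central chi-square variable as the square of a unit-variance normal whose mean is the square root of the non-centrality parameter. The function name `selection_probability` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def selection_probability(ncp, alpha=5e-8):
    """P(T > cut-off) for T ~ non-central chi-square(1 df, ncp).

    Uses T = (Z + sqrt(ncp))**2 with Z standard normal.
    """
    nd = NormalDist()
    z_cut = -nd.inv_cdf(alpha / 2)   # square root of the chi-square cut-off
    mu = sqrt(ncp)
    upper = nd.cdf(mu - z_cut)       # P(Z + mu >  z_cut)
    lower = nd.cdf(-mu - z_cut)      # P(Z + mu < -z_cut)
    return upper + lower


if __name__ == "__main__":
    p = selection_probability(13.0)
    n = 10_000
    print(f"selection probability: {p:.4f}")
    print(f"expected number selected out of {n}: {n * p:.0f}")
    print(f"binomial SD of that count: {sqrt(n * p * (1 - p)):.1f}")
```

With non-centrality 13, roughly 3% of draws pass the genome-wide threshold, so a count of 322 out of 10,000 is well within sampling variation.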
The applicability of the Delta method to approximating the distribution of $T_{SMR}$ is in doubt even in the absence of the selection process imposed on $T_{gx}$. Approximating the null distribution of $T_{SMR}$ by a 1-df chi-square is equivalent to approximating the null distribution of $\hat{b}_{xy}$ by a normal distribution. However, according to Eq. (1), $\hat{b}_{xy}$ is a ratio of two normals. In general, the distribution of the ratio of two normal variables cannot be approximated by a normal, as it can take a variety of shapes such as bimodal, unimodal, or asymmetric [8]. It is known that if $b_{gx}$ and $b_{gy}$ are both equal to 0, the distribution of $\hat{b}_{xy}$ would be a Cauchy, a fat-tailed distribution whose mean and variance do not exist. For the case $b_{gx} > 0$ and $b_{gy} > 0$, the distribution of $\hat{b}_{xy}$ can be approximated by a normal only in certain intervals [8].
For the case , to our best knowledge, there are no known theoretical results regarding whether the distribution of can be approximated by a normal. The only thing we are sure about is that the distribution of is symmetric because the distribution of is the same as the distribution of . A numerical example is used to examine the distribution of . Ten thousand random ’s are generated from and 10,000 random are generated from N(0, 1). A normal quantile plot of is generated using the qqnorm and qqline functions in R with their default settings and is shown in Fig. 2 (left panel). Similar to a Cauchy distribution, the distribution of is apparently fat-tailed compared to a normal: the lower end is more negative while the upper end is more positive.
A normal quantile plot is also generated for and is shown in the right panel of Fig. 2. It may be surprising that the distribution of appears to be a normal. The reason of this phenomenon is that the range of is greatly reduced under the selection criterion. According to Eq. (1), is roughly proportional to with high probability.
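A more general argument that the approximating distribution of $T_{SMR}$ is not 1-df chi-square is the following. Since

$$T_{SMR} = \frac{T_{gx}\,T_{gy}}{T_{gx} + T_{gy}} = T_{gy} \cdot \frac{T_{gx}}{T_{gx} + T_{gy}},$$

there is $T_{SMR} \le T_{gy}$ regardless of the distribution of $T_{gx}$. That is, $T_{SMR}$ is always dominated by $T_{gy}$. Similarly, $T_{SMR}$ is always dominated by $T_{gx}$. Therefore, $T_{SMR} \le \min(T_{gx}, T_{gy})$. Since $T_{gx}$ and $T_{gy}$ approximately follow independent 1-df chi-square distributions, the approximating distribution of $\min(T_{gx}, T_{gy})$ cannot be 1-df chi-square. Neither can the approximate distribution of $T_{SMR}$. Using a 1-df chi-square distribution for $T_{SMR}$ results in a conservative test.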
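The conservativeness argued above can be checked numerically. The following is a minimal sketch, not the simulation design used in this paper. It assumes the SNP-exposure statistic is a 1-df non-central chi-square with non-centrality 13 (as in the earlier numerical example), that the SNP-trait statistic is a central 1-df chi-square (no effect on the trait), and that selection uses the genome-wide cut-off. The function names are hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist


def t_smr(t_gx, t_gy):
    """SMR statistic: half the harmonic mean of the two chi-square statistics."""
    return t_gx * t_gy / (t_gx + t_gy)


def smr_null_rejection_rate(ncp_gx=13.0, n_draws=200_000, level=0.05,
                            alpha_select=5e-8, seed=7):
    """Fraction of selected SNPs whose T_SMR exceeds the 1-df chi-square
    cut-off when the SNP has no effect on the trait."""
    rng = random.Random(seed)
    nd = NormalDist()
    select_cut = nd.inv_cdf(alpha_select / 2) ** 2   # about 29.72
    test_cut = nd.inv_cdf(level / 2) ** 2            # about 3.84 for level 0.05
    mu = sqrt(ncp_gx)
    n_selected = n_rejected = 0
    for _ in range(n_draws):
        t_gx = rng.gauss(mu, 1.0) ** 2   # SNP-exposure statistic (non-central)
        if t_gx <= select_cut:
            continue                      # SNP fails the Step 1 selection
        t_gy = rng.gauss(0.0, 1.0) ** 2  # SNP-trait statistic under the null
        n_selected += 1
        n_rejected += t_smr(t_gx, t_gy) > test_cut
    return n_rejected / n_selected, n_selected


if __name__ == "__main__":
    rate, n_sel = smr_null_rejection_rate()
    print(f"selected SNPs: {n_sel}, rejection rate at 0.05: {rate:.4f}")
```

Because the SMR statistic never exceeds the SNP-trait statistic, the rejection rate among selected SNPs falls below the nominal level in this setting.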
We performed extensive simulations to investigate the null distribution of the SMR statistic in a more realistic setting by using imputed GWAS genotype data from the Atherosclerosis Risk in Communities (ARIC) study of European-ancestry samples [9]. Specifically, we simulated gene expression levels for each Ensembl gene on autosomes at varying numbers of causal eQTLs (n = 1, 5, and 10), (narrow sense) heritability levels ($h^2$ = 0.1, 0.2, 0.4, 0.8), and sample sizes (N = 250, 500, 1000, and 2000). We tested association between all SNPs within each gene and expression levels of the gene, and only genes whose top associated SNP met the selection criteria () were subjected to the SMR test. GWAS association signals were randomly assigned from a standard normal distribution. Figure 3 shows the QQ plot for the SMR statistics when instrumental eQTLs were selected from genes with 5 causal eQTLs and a heritability of $h^2$ = 0.4 at all four sample sizes. Clearly, the SMR statistics were lower than the expected null values at the tail of the distribution, though the distribution became closer to the null at larger sample sizes, which may be explained by the stronger eQTL signals, as shown in our numerical example above. The complete set of QQ plots for the SMR test statistic is shown in Supplementary Figs. S1–S12 online. Overall, our simulations showed that the SMR statistics were conservative and did not strictly follow the 1-df chi-square distribution, especially when the effect size of each individual eQTL was small on average. These results are consistent with our theoretical insights.
More on SMR and a conditional test
One may want to use an estimate of $b_{gx}$ that takes into account the selection. However, such an estimate is not expected to be simple given the complexity of the selection (e.g., the SNP is the most significant one among a number of SNPs). Another alternative is to use another exposure GWAS, independent of the exposure GWAS used in Step 1, to estimate $b_{gx}$ and then compute the SMR statistic $T_{SMR} = T_{gx} T_{gy} / (T_{gx} + T_{gy})$. However, this is not recommended because $T_{SMR}$ is inherently conservative. $T_{SMR}$ is equal to half of the harmonic mean of $T_{gx}$ and $T_{gy}$. Fixing one of $T_{gx}$ and $T_{gy}$, say $T_{gy}$, and changing $T_{gx}$, $T_{SMR}$ reaches its smallest value, 0, when $T_{gx} = 0$ and converges to $T_{gy}$ when $T_{gx} \to \infty$. The conservativeness of $T_{SMR}$ is also observed in simulation studies by Veturi and Ritchie [10].
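The behaviour of the SMR statistic as a function of the SNP-exposure statistic can be made explicit with a short derivation. It assumes only the form of $T_{SMR}$ as the product of the two chi-square statistics divided by their sum, with $T_{gy} > 0$ held fixed.

```latex
\[
T_{SMR}(T_{gx}) = \frac{T_{gx}\,T_{gy}}{T_{gx}+T_{gy}}, \qquad
\frac{\partial T_{SMR}}{\partial T_{gx}}
  = \frac{T_{gy}(T_{gx}+T_{gy}) - T_{gx}T_{gy}}{(T_{gx}+T_{gy})^{2}}
  = \frac{T_{gy}^{2}}{(T_{gx}+T_{gy})^{2}} > 0,
\]
\[
T_{SMR}(0) = 0, \qquad
\lim_{T_{gx}\to\infty} T_{SMR}(T_{gx})
  = \lim_{T_{gx}\to\infty} \frac{T_{gy}}{1 + T_{gy}/T_{gx}} = T_{gy}.
\]
```

The statistic is therefore strictly increasing in $T_{gx}$ and bounded above by $T_{gy}$, so even an arbitrarily strong instrument cannot make $T_{SMR}$ reach the SNP-trait statistic itself.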
The null hypothesis for $T_{SMR}$ was not specifically defined in Zhu et al. [2]. It is unlikely to be the intended null $H_0: b_{xy} = 0$. Actually, similar to Sobel's statistic popular in mediation analysis, the null corresponding to $T_{SMR}$ is $H_0: b_{gx} b_{gy} = 0$. For this null, a statistic more powerful than $T_{SMR}$ is $\min(T_{gx}, T_{gy})$. The statistic $\min(T_{gx}, T_{gy})$ rejects the null if and only if both $T_{gx}$ and $T_{gy}$ are significant. Therefore, whenever $T_{SMR}$ rejects the null, $\min(T_{gx}, T_{gy})$ will, but not vice versa. This is because $T_{SMR} \le \min(T_{gx}, T_{gy})$.
A test more powerful than $\min(T_{gx}, T_{gy})$ (hence also more powerful than $T_{SMR}$) in the current situation is a conditional test. Because the SNP is selected for its significant association with the exposure, the situation $b_{gx} = 0$ can be excluded. Given this information, a meaningful null would be $H_0: b_{gy} = 0$, for which a test statistic is $T_{gy}$, the chi-square statistic for the SNP-trait association. The null is rejected when $T_{gy}$ is significant. This test, conditional on a significant $T_{gx}$ statistic, assumes that there is no pleiotropy. That is, the selected SNP affects the trait only through the exposure and there are no other paths. In other words, the selected SNP is a valid instrument. In light of Eq. (1), $b_{xy} = 0$ if and only if $b_{gy} = 0$ when the possibility of $b_{gx} = 0$ is excluded. Hence the null for this conditional test is equivalent to $H_0: b_{xy} = 0$. This test is asymptotically valid because $T_{gy}$ asymptotically follows a 1-df chi-square distribution. The threshold for significance for this test is not at the genome level. Rather, it is at the gene level and only needs to be corrected for the number of genes for which SNPs are selected as instruments. This results in a more powerful testing procedure than using a genome-wide threshold.
An empirical study
We compared the performance of the proposed conditional test and the SMR test in an empirical study of schizophrenia. We used the largest GWAS summary statistics for schizophrenia to date [11] and the eQTL results from the analysis of 1387 brain samples (prefrontal cortex) by PsychENCODE [12] (downloaded from the SMR data resource website). In total, 9639 genes were tested for SMR at a top associated cis-eQTL () and 65 genes were significant after Bonferroni correction. In contrast, the conditional test, whose test statistic is $T_{gy}$ and which considers only those instrumental eQTLs, discovered 127 Bonferroni-significant genes, including 62 genes not detected by SMR (Supplementary Table S1 online). Among the genes missed by SMR, there were several strong candidates for schizophrenia, such as AKT3 [13–15], RGS6 [16,17], and KCNN3. It may not be surprising that AKT3 and RGS6 were identified, as these two genes harbored genome-wide significant variants ($p < 5 \times 10^{-8}$) in the original GWAS [11], but the discovery of KCNN3 was novel and the strongest SNP-level association evidence for this gene was only at (rs10796933). Of note, our previous study also showed evidence for the association of KCNN3 with schizophrenia through integrated analysis of GWAS with methylation QTL [18].
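The gain from a gene-level threshold can be quantified with a short calculation. The sketch below assumes that the Bonferroni correction divides a family-wise level of 0.05 by the 9639 genes tested. The exact level and denominator used in the empirical study are not stated here, so both are assumptions, and the function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

_ND = NormalDist()


def chi2_1df_cutoff(alpha):
    """Critical value of a 1-df chi-square test at level alpha."""
    return _ND.inv_cdf(alpha / 2) ** 2


def chi2_1df_pvalue(t):
    """Upper-tail p-value of a 1-df chi-square statistic t."""
    return 2 * _ND.cdf(-sqrt(t))


def conditional_test(t_gy, n_genes, level=0.05):
    """Bonferroni-corrected conditional test based on T_gy alone.

    Returns the p-value and whether it passes the gene-level threshold.
    """
    p = chi2_1df_pvalue(t_gy)
    return p, p < level / n_genes


if __name__ == "__main__":
    n_genes = 9639  # number of genes tested in the empirical study
    print(f"gene-level cut-off on T_gy:  {chi2_1df_cutoff(0.05 / n_genes):.2f}")
    print(f"genome-wide cut-off (5e-8):  {chi2_1df_cutoff(5e-8):.2f}")
    print(conditional_test(25.0, n_genes))
```

Under these assumptions, the gene-level cut-off on the chi-square scale is markedly lower than the genome-wide cut-off of about 29.72, which is one source of the additional discoveries made by the conditional test.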
Two-sample MR Steiger method
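The two-sample MR Steiger method [1,19] assumes that there is a causal relationship between the exposure and the trait and that the selected SNP is a valid instrument for one of them (but it is unknown for which one) [1]. A SNP is selected not only for its association with the exposure but also for its association with the trait [1,19]. The null for the two-sample MR Steiger test is $H_0: \rho_{gx} = \rho_{gy}$, where $\rho_{gx}$ and $\rho_{gy}$ are the (population) Pearson correlation coefficients of g with x and of g with y, respectively. Let $r_{gx}$ and $r_{gy}$ be the sample correlation coefficients corresponding to $\rho_{gx}$ and $\rho_{gy}$, respectively. Using Fisher's Z transformation, there are

$$Z_{gx} = \frac{1}{2}\ln\frac{1 + r_{gx}}{1 - r_{gx}} \sim N\left(\frac{1}{2}\ln\frac{1 + \rho_{gx}}{1 - \rho_{gx}},\ \frac{1}{n_x - 3}\right), \tag{2}$$

$$Z_{gy} = \frac{1}{2}\ln\frac{1 + r_{gy}}{1 - r_{gy}} \sim N\left(\frac{1}{2}\ln\frac{1 + \rho_{gy}}{1 - \rho_{gy}},\ \frac{1}{n_y - 3}\right), \tag{3}$$

where $n_x$ and $n_y$ are the sample sizes of the exposure GWAS and the trait GWAS, respectively. The null is equivalent to saying that the mean of $Z_{gx}$ is equal to the mean of $Z_{gy}$. The two-sample MR Steiger method uses the following statistic [1,19]:

$$Z = \frac{Z_{gx} - Z_{gy}}{\sqrt{\dfrac{1}{n_x - 3} + \dfrac{1}{n_y - 3}}}.$$

If $Z$ is significant and positive, the causal direction is from x to y. If $Z$ is significant and negative, the causal direction is from y to x.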
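The Steiger statistic described above can be sketched in a few lines. This is an illustrative implementation, not the code of any published package. It assumes the standard Fisher transformation (the inverse hyperbolic tangent of the sample correlation) with variance 1/(n − 3), and the function name `steiger_z` is hypothetical.

```python
from math import atanh, sqrt
from statistics import NormalDist


def steiger_z(r_gx, r_gy, n_x, n_y):
    """Two-sample Steiger Z statistic and its two-sided normal p-value.

    r_gx, r_gy: sample correlations of the SNP with exposure and trait.
    n_x, n_y:   sample sizes of the exposure and trait GWAS.
    """
    z_gx = atanh(r_gx)   # Fisher's Z transform, 0.5 * ln((1 + r) / (1 - r))
    z_gy = atanh(r_gy)
    se = sqrt(1.0 / (n_x - 3) + 1.0 / (n_y - 3))
    z = (z_gx - z_gy) / se
    p = 2 * NormalDist().cdf(-abs(z))
    return z, p


if __name__ == "__main__":
    z, p = steiger_z(0.30, 0.10, 1000, 1000)
    direction = "x -> y" if z > 0 else "y -> x"
    print(f"Z = {z:.3f}, p = {p:.2e}, suggested direction: {direction}")
```

The p-value returned here relies on the standard normal reference distribution, which is exactly the approximation whose validity is in question once the SNP has been selected for significance.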
However, the Steiger statistic $Z$ does not approximately follow a standard normal distribution because the SNP is selected for its significant p-values. Using a selection criterion $p < 5 \times 10^{-8}$, or $T_{gx}$ and $T_{gy}$ greater than 29.71679 on 1-df chi-square scale, the sample correlation coefficients $r_{gx}$ and $r_{gy}$ would be at least 0.4823663, 0.1700451, or 0.05443772 for $n =$ 100, 1000, or 10,000, given the relationship $r^2 = T/(T + n - 2)$ between a sample correlation $r$, its 1-df chi-square statistic $T$, and the sample size $n$. Although this selection procedure is useful for selecting the instrument SNP, it imposes a lower limit on $r_{gx}$ and $r_{gy}$. $r_{gx}$ over-estimates $\rho_{gx}$ and is not consistent. The mean of the statistic $Z$ is not around 0 even when $H_0$ holds if the two sample sizes differ ($n_x \neq n_y$). The distribution of the Fisher-transformed correlation $Z_{gx}$ is truncated and is not normal. So is the distribution of $Z_{gy}$. The variance of $Z_{gx}$ is smaller than $1/(n_x - 3)$ due to selection. Similarly, the variance of $Z_{gy}$ is smaller than $1/(n_y - 3)$. When $n_x = n_y$, the numerator of $Z$ is around 0 and $Z$ is conservative. When $n_x \neq n_y$, the numerator of $Z$ is no longer around 0 and $Z$ is liberal. Overall, the distributions of $Z_{gx}$ and $Z_{gy}$ are truncated normal instead of normal. The argument that the statistic $Z$ asymptotically follows a standard normal does not hold. The two-sample MR Steiger method can be either liberal or conservative.
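The lower limits on the sample correlations quoted above can be reproduced numerically. The sketch assumes that the chi-square statistic and the sample correlation are linked through the usual regression t-statistic, that is, $T = (n - 2) r^2 / (1 - r^2)$. This form is an assumption chosen because it reproduces the quoted values, and the function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def chi2_threshold(alpha=5e-8):
    """1-df chi-square cut-off corresponding to a two-sided p-value alpha."""
    z = -NormalDist().inv_cdf(alpha / 2)
    return z * z


def min_abs_r(n, t_cut):
    """Smallest |r| whose chi-square statistic exceeds t_cut at sample size n,
    assuming T = (n - 2) * r**2 / (1 - r**2)."""
    return sqrt(t_cut / (t_cut + n - 2))


if __name__ == "__main__":
    t_cut = chi2_threshold()
    print(f"chi-square cut-off: {t_cut:.5f}")
    for n in (100, 1000, 10_000):
        print(f"n = {n:>6}: minimum |r| = {min_abs_r(n, t_cut):.7f}")
```

The lower bound shrinks roughly like the inverse square root of the sample size, so two GWAS of different sizes truncate their correlations at different points even when the population correlations are equal.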
Numerical examples are constructed. First we consider the case and demonstrate the effect of selection severity. Ten thousand random samples of and are independently generated from the normal distributions shown in Eqs. (2) and (3). These and form a matrix. The first column contains values for and the second for . Only the rows satisfying and are kept. This selection criterion corresponds to on the p-value scale. When , there are 2557 selected on which the statistic is computed. The sample mean of selected is 0.1903508 while the sample mean of selected () is lower, as expected. A normal quantile-quantile plot of is shown in the left panel of Fig. 4. Clearly the distribution of is different from normal. Type I error rates are inflated. At significance level 0.05 and 0.01, the type I error rates (i.e., the proportion of significant statistics) are 0.08916699 and 0.01486117, respectively. If , the selection is less severe. Almost 75% (7434 out of 10,000) s are selected. Even so, the distribution of shows apparent departure from normal as shown in the right panel of Fig. 4. At significance level 0.05 and 0.01, the type I error rates are 0.0306699 and 0.005111649, respectively. In this case, appears to be conservative.
We also considered larger sample sizes. When , and , 2,375 s are selected. When and , 6,410 s are selected. As shown in Fig. 5, there is apparent departure of the distribution of from a normal. At significance level 0.05, the type I error rate is 0.09515789 for the case and is 0.03728549 for . At significance level 0.01, the type I error rates are 0.01515789 and 0.006084243, respectively. The type I error rates can be either inflated or deflated.
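A simulation in the spirit of these numerical examples can be sketched as follows. The parameter values used here (a common population correlation of 0.15 or 0.19, an exposure sample size of 1000 and a trait sample size of 10,000) are illustrative assumptions and are not claimed to be the settings behind the figures reported above. Selection is assumed to act on positive correlations only, with thresholds derived from $r^2 = T/(T + n - 2)$, and the function name is hypothetical.

```python
import random
from math import atanh, sqrt
from statistics import NormalDist


def steiger_type1_under_selection(rho, n_x, n_y, n_draws=20_000,
                                  level=0.05, alpha_select=5e-8, seed=11):
    """Empirical type I error of the Steiger Z test after SNP selection.

    Both Fisher-transformed correlations share the population value
    atanh(rho), so the null holds; only draws passing both selection
    thresholds are kept.
    """
    rng = random.Random(seed)
    nd = NormalDist()
    t_cut = nd.inv_cdf(alpha_select / 2) ** 2
    # Selection thresholds on the Fisher Z scale (assumed r-to-T relationship).
    cut_x = atanh(sqrt(t_cut / (t_cut + n_x - 2)))
    cut_y = atanh(sqrt(t_cut / (t_cut + n_y - 2)))
    sd_x, sd_y = 1 / sqrt(n_x - 3), 1 / sqrt(n_y - 3)
    z_crit = -nd.inv_cdf(level / 2)
    mu = atanh(rho)
    kept = rejected = 0
    for _ in range(n_draws):
        z_gx = rng.gauss(mu, sd_x)
        z_gy = rng.gauss(mu, sd_y)
        if z_gx > cut_x and z_gy > cut_y:
            kept += 1
            z = (z_gx - z_gy) / sqrt(sd_x ** 2 + sd_y ** 2)
            rejected += abs(z) > z_crit
    return rejected / kept, kept


if __name__ == "__main__":
    for rho in (0.15, 0.19):
        rate, kept = steiger_type1_under_selection(rho, 1000, 10_000)
        print(f"rho = {rho}: kept {kept}, type I error at 0.05 = {rate:.4f}")
```

Under these assumed settings, severe selection (the smaller correlation) pushes the empirical type I error above the nominal level, while milder selection pulls it below, mirroring the liberal and conservative behaviour described in the text.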
One remedy would be to estimate and by maximizing the conditional likelihood given the SNP selection criteria. Let and denote the density function and the distribution function of the standard normal, respectively. The likelihood ratio statistic for testing is where
with and selection thresholds corresponding to and , respectively. However, due to selection, computation of and can be challenging. One alternative method is to use an exposure GWAS and a trait GWAS that are independent of those used to select the SNP. However, such studies may be impractical to obtain6.
Discussion
Summary statistics MR is subject to selection bias, resulting in excessive false positives (for instance, with the MR Steiger method) or missed discoveries (for instance, with the SMR method). This bias is a form of winner's curse. Selection bias has been discussed in the literature in the context of the choice of instrument SNPs [7], colocalisation tests [20], and estimation of the exposure effect [5,6].
Our work complements previous studies on selection bias due to the selection of SNPs. While previous work focused on the effect of this bias on the Wald ratio [5,6] (i.e., estimation), ours focuses on testing whether the exposure causally affects the outcome (i.e., inference). Selection bias leads to underestimation of the Wald ratio [5], but the resulting test can be either liberal or conservative depending on the MR method used. Most importantly, the SMR method is conservative even in the absence of selection bias, where $\hat{b}_{gx}$ is approximately normal.
Correcting for selection bias is a challenging task. Zhao et al. [6] get around this issue by using an independent exposure GWAS. In contrast, our conditional test, an alternative to the SMR method, uses only the trait GWAS. It may be extended to accommodate multiple instrumental SNPs and the presence of pleiotropy.
Acknowledgements
This study was partially supported by National Institutes of Health grant R01MH121394 (to SH).
Author contributions
K.W. conceived the experiment and conducted some simulations; S.H. conducted the simulation study and the empirical study. Both authors drafted and reviewed the manuscript.
Competing interests
The authors declare no competing interests.
Footnotes
Publisher's note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
The online version contains supplementary material available at 10.1038/s41598-021-87219-6.
References
- 1. Hemani G, Tilling K, Smith GD. Orienting the causal relationship between imprecisely measured traits using GWAS summary data. PLoS Genet. 2017;13:e1007081. doi: 10.1371/journal.pgen.1007081.
- 2. Zhu Z, et al. Integration of summary data from GWAS and eQTL studies predicts complex trait gene targets. Nat. Genet. 2016;48:481. doi: 10.1038/ng.3538.
- 3. Morrison J, Knoblauch N, Marcus JH, Stephens M, He X. Mendelian randomization accounting for correlated and uncorrelated pleiotropic effects using genome-wide summary statistics. Nat. Genet. 2020;52:740–747.
- 4. Bowden J, Dudbridge F. Unbiased estimation of odds ratios: Combining genomewide association scans with replication studies. Genet. Epidemiol. 2009;33:406–418. doi: 10.1002/gepi.20394.
- 5. Haycock PC, et al. Best (but oft-forgotten) practices: The design, analysis, and interpretation of Mendelian randomization studies. Am. J. Clin. Nutr. 2016;103:965–978. doi: 10.3945/ajcn.115.118216.
- 6. Zhao Q, et al. Statistical inference in two-sample summary-data Mendelian randomization using robust adjusted profile score. Ann. Stat. 2020;48:1742–1769. doi: 10.1214/19-AOS1866.
- 7. Hemani G, et al. The MR-Base platform supports systematic causal inference across the human phenome. Elife. 2018;7:e34408. doi: 10.7554/eLife.34408.
- 8. Díaz-Francés E, Rubio FJ. On the existence of a normal approximation to the distribution of the ratio of two independent normal random variables. Stat. Pap. 2013;54:309–323. doi: 10.1007/s00362-012-0429-2.
- 9. The ARIC investigators. The Atherosclerosis Risk in Communities (ARIC) Study: Design and objectives. Am. J. Epidemiol. 1989;129:687–702.
- 10. Veturi Y, Ritchie MD. How powerful are summary-based methods for identifying expression-trait associations under different genetic architectures? Pac. Symp. Biocomput. 2018;23:228–239.
- 11. Pardiñas AF, et al. Common schizophrenia alleles are enriched in mutation-intolerant genes and in regions under strong background selection. Nat. Genet. 2018;50:381–389. doi: 10.1038/s41588-018-0059-2.
- 12. Wang D, et al. Comprehensive functional genomic resource and integrative model for the human brain. Science. 2018;362:eaat8464.
- 13. Schizophrenia Working Group of the Psychiatric Genomics Consortium and others. Biological insights from 108 schizophrenia-associated genetic loci. Nature. 2014;511:421–427. doi: 10.1038/nature13595.
- 14. Ripke S, et al. Genome-wide association analysis identifies 13 new risk loci for schizophrenia. Nat. Genet. 2013;45:1150. doi: 10.1038/ng.2742.
- 15. Ruderfer DM, et al. Polygenic overlap between schizophrenia risk and antipsychotic response: A genomic medicine approach. Lancet Psychiatry. 2016;3:350–357. doi: 10.1016/S2215-0366(15)00553-2.
- 16. Ahlers KE, Chakravarti B, Fisher RA. RGS6 as a novel therapeutic target in CNS diseases and cancer. AAPS J. 2016;18:560–572. doi: 10.1208/s12248-016-9899-9.
- 17. Radulescu E, et al. Identification and prioritization of gene sets associated with schizophrenia risk by co-expression network analysis in human brain. Mol. Psychiatry. 2018;25:791–804.
- 18. Han S, et al. Integrating brain methylome with GWAS for psychiatric risk gene discovery. bioRxiv. 2018:440206.
- 19. Xue H, Pan W. Inferring causal direction between two traits in the presence of horizontal pleiotropy with GWAS summary data. PLoS Genet. 2020;16:e1009105. doi: 10.1371/journal.pgen.1009105.
- 20. Wallace C. Statistical testing of shared genetic control for potentially related traits. Genet. Epidemiol. 2013;37:802–813. doi: 10.1002/gepi.21765.