Scientific Reports. 2021 Apr 7;11:7585. doi: 10.1038/s41598-021-87219-6

Effect of selection bias on two sample summary data based Mendelian randomization

Kai Wang 1, Shizhong Han 2,3
PMCID: PMC8027662  PMID: 33828182

Abstract

Mendelian randomization (MR) is becoming increasingly popular for inferring a causal relationship between an exposure and a trait. Typically, instrument SNPs are selected from an exposure GWAS based on their summary statistics, and the same summary statistics on the selected SNPs are used for subsequent analyses. However, this practice suffers from selection bias and can invalidate MR methods, as showcased via two popular methods: the summary data-based MR (SMR) method and the two-sample MR Steiger method. The SMR method is conservative, while the MR Steiger method can be either conservative or liberal. A simple yet more powerful alternative to SMR is proposed.

Subject terms: Genetics, Gene expression, Genetic association study

Introduction

As a feasible alternative to expensive and sometimes impossible randomized clinical trials, Mendelian randomization (MR) is becoming increasingly popular for inferring a causal relationship between an exposure and a trait [1–3]. Summary data-based two-sample MR methods often take the following two steps:

Step 1

Obtain instruments (typically SNPs) from an exposure GWAS (genome-wide association study) that are significant at the genome-wide level (typically p < 5×10⁻⁸);

Step 2

Investigate the causal relationship between the exposure and the trait, using the summary exposure GWAS statistics at the selected SNPs and a trait GWAS. The summary exposure GWAS statistics are those used in Step 1 for SNP selection.
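As a minimal sketch of Step 1, assume the exposure GWAS summary statistics are available as (SNP, effect estimate, standard error) records. The SNP identifiers and numbers below are hypothetical and serve only to illustrate the selection rule.

```python
import math

GENOME_WIDE_P = 5e-8

def wald_p_value(beta, se):
    """Two-sided Wald test p-value, i.e. the 1-df chi-square upper tail."""
    w = (beta / se) ** 2
    return math.erfc(math.sqrt(w / 2.0))

def select_instruments(exposure_gwas, alpha=GENOME_WIDE_P):
    """Step 1: keep SNPs whose exposure association passes the threshold."""
    return [snp for snp, beta, se in exposure_gwas
            if wald_p_value(beta, se) < alpha]

# Hypothetical summary statistics: (SNP id, effect estimate, standard error).
exposure_gwas = [
    ("snp_a", 0.30, 0.05),   # z = 6.0, passes the threshold
    ("snp_b", 0.10, 0.05),   # z = 2.0, does not pass
    ("snp_c", -0.28, 0.05),  # z = -5.6, passes the threshold
]
selected = select_instruments(exposure_gwas)
print(selected)
```

Step 2 then reuses the very same estimates for the selected SNPs, which is where the selection bias examined in this paper enters.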

One appealing feature of these methods is that they only rely on summary statistics on the exposure GWAS and the trait GWAS. Individual-level data are not needed.

The inference validity of this two-step approach is affected by selection bias. When conducting causal inference in Step 2 with respect to the SNPs selected in Step 1, the summary statistics from the exposure GWAS cannot be regarded as random samples for the true population association strength [4–6]. Treating them as random samples leads to over-estimation of the effect size of these SNPs on the exposure. Association strength in a random sample is often much weaker, a phenomenon commonly seen in studies aimed at replicating previous findings. This selection bias has been noted in the literature [6,7]. However, its effect on hypothesis testing in two-sample summary-data-based Mendelian randomization is largely unknown.

Two popular MR methods, the summary data-based MR method [2] and the two-sample MR Steiger method [1], are considered. For the summary data-based MR method, the most significant SNP (instead of several SNPs) from a gene is selected as the instrument from the exposure GWAS. For the two-sample MR Steiger method, a SNP significantly associated with both the exposure and the trait in their respective GWAS is selected. The genotype score (0, 1, or 2) at this SNP is denoted by g. The exposure level is denoted by x and the trait value is denoted by y. The Wald statistic on the chi-square scale for testing the association between the SNP and the exposure is denoted by Wgx. Its value is necessarily large because it satisfies the selection criterion used in Step 1. For instance, when the selection criterion is p < 5×10⁻⁸, Wgx > 29.71679 must hold. The Wald statistic for testing the association between the SNP and the trait is denoted by Wgy, which is independent of Wgx.
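The cutoff on Wgx quoted above follows from the fact that a 1-df chi-square variable is the square of a standard normal variable. The sketch below recovers it with Python's standard library; the function name is an illustrative choice, not part of any MR software.

```python
from statistics import NormalDist

def chi2_1df_threshold(p):
    """Critical value c such that P(chi-square with 1 df > c) = p."""
    # A two-sided normal test at level p uses the (1 - p/2) quantile.
    z = NormalDist().inv_cdf(1.0 - p / 2.0)
    return z * z

threshold = chi2_1df_threshold(5e-8)
print(threshold)
```

The printed value should agree with the 29.71679 cutoff used throughout the text, up to rounding.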

Results

Summary data-based MR

Summary data-based MR [2] (SMR) is a popular MR method for inferring causality between x and y. Its null hypothesis is H0: bxy = 0, where bxy is the true regression coefficient for x with y as the response. The two-stage least squares (2SLS) estimate of bxy is

$$\hat{b}_{xy} = \frac{\hat{b}_{gy}}{\hat{b}_{gx}}, \tag{1}$$

where b^gx is the least squares estimate of bgx, the regression coefficient for g with x as the response, and b^gy is the least squares estimate of bgy, the regression coefficient for g with y as the response. b^xy is also known as the Wald ratio [5]. A causal relationship between the exposure x and the trait y is declared if the following test statistic is significant [2]:

$$T_{SMR} = \frac{W_{gx} W_{gy}}{W_{gx} + W_{gy}},$$

where Wgx = [b^gx/SE(b^gx)]² and Wgy = [b^gy/SE(b^gy)]² are Wald statistics on the chi-square scale. The null distribution of TSMR is approximated by a 1-df chi-square distribution using the Delta method [2].
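To make the definitions concrete, the following sketch computes the Wald ratio and TSMR from the four summary statistics. The function names and input values are hypothetical, and this is not the SMR software's own interface.

```python
def wald_chi2(beta, se):
    """Wald statistic on the chi-square scale."""
    return (beta / se) ** 2

def smr(b_gx, se_gx, b_gy, se_gy):
    """Return the Wald ratio estimate of bxy and the SMR statistic."""
    b_xy = b_gy / b_gx
    w_gx = wald_chi2(b_gx, se_gx)
    w_gy = wald_chi2(b_gy, se_gy)
    t_smr = w_gx * w_gy / (w_gx + w_gy)
    return b_xy, t_smr

# Hypothetical estimates: Wgx = 36 and Wgy = 9, so TSMR = 324 / 45 = 7.2.
b_xy, t_smr = smr(b_gx=0.6, se_gx=0.1, b_gy=0.12, se_gy=0.04)
print(b_xy, t_smr)
```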

There are several issues with the statistic TSMR. The derivation of its null distribution assumes that b^gx is a consistent estimator of bgx and (asymptotically) follows a normal distribution ([2], Online Methods). However, these two conditions do not hold. If the significance level used in Step 1 is 5×10⁻⁸, then Wgx ≥ 29.71679 must hold, which implies |b^gx| ≥ √29.71679 × SE(b^gx). As a result, the distribution of b^gx is not (asymptotically) normal and b^gx is not a consistent estimator of bgx. To demonstrate this point numerically, 10,000 random samples of Wgx are generated from a 1-df chi-square distribution with a large non-centrality parameter of 13 (chosen so that a reasonable number of {Wgx} pass the selection threshold). Among them, 322 are significant at the genome-wide significance level 5×10⁻⁸. The quantile-quantile plot of these selected {Wgx} against 322 random samples {Wgy} from a 1-df chi-square distribution with non-centrality parameter 13 is shown in Fig. 1. The distribution of {Wgx : Wgx ≥ 29.71679} is clearly different from the distribution of {Wgx}.

Figure 1. Quantile-quantile plot for selected {Wgx} (322 out of 10,000) and 322 random {Wgy}. The distribution of selected {Wgx} is different from the distribution of random {Wgy}, as shown by the deviation of the points from the 45° line. The vertical line indicates the selection threshold Wgx ≥ 29.71679, which corresponds to the genome-wide significance level 5×10⁻⁸.
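The numerical experiment behind Fig. 1 can be sketched in a few lines. The seed is an arbitrary assumption of this sketch, so the number of selected draws will differ somewhat from the 322 reported in the text.

```python
import math
import random

THRESHOLD = 29.71679   # 1-df chi-square cutoff for p < 5e-8
NCP = 13.0             # non-centrality parameter used in the text

rng = random.Random(2021)  # arbitrary seed (assumption of this sketch)

# A 1-df non-central chi-square draw is the square of a N(sqrt(ncp), 1) draw.
w_all = [rng.gauss(math.sqrt(NCP), 1.0) ** 2 for _ in range(10_000)]
w_selected = [w for w in w_all if w >= THRESHOLD]

mean_all = sum(w_all) / len(w_all)
mean_selected = sum(w_selected) / len(w_selected)
print(len(w_selected), mean_all, mean_selected)
```

The mean of the selected statistics necessarily exceeds the threshold, far above the mean of all draws (about ncp + 1 = 14), which is the over-estimation caused by selection.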

The applicability of the Delta method to approximating the distribution of TSMR is in doubt even in the absence of the selection process imposed on Wgx. Approximating the null distribution of TSMR by a 1-df chi-square distribution is equivalent to approximating the null distribution of b^xy by a normal distribution. However, according to Eq. (1), b^xy is a ratio of two normal variables. In general, the distribution of the ratio of two normal variables cannot be approximated by a normal distribution, as it can take a variety of shapes, such as bimodal, unimodal, or asymmetric [8]. It is known that if bgx and bgy are both equal to 0, the distribution of b^xy is a Cauchy distribution, a fat-tailed distribution whose mean and variance do not exist. For the case bgx ≠ 0 and bgy ≠ 0, the distribution of b^xy can be approximated by a normal distribution only in certain intervals [8].

For the case bgy = 0, to the best of our knowledge, there are no known theoretical results regarding whether the distribution of b^xy can be approximated by a normal distribution. The only thing we are sure about is that the distribution of b^xy is symmetric, because the distribution of -b^xy = (-b^gy)/b^gx is the same as the distribution of b^xy. A numerical example is used to examine the distribution of b^xy. Ten thousand random values of b^gx are generated from N(13,1) and 10,000 random values of b^gy are generated from N(0, 1). A normal quantile plot of b^gy/b^gx is generated using the qqnorm and qqline functions in R with their default settings and is shown in Fig. 2 (left panel). Similar to a Cauchy distribution, the distribution of b^xy = b^gy/b^gx is apparently fat-tailed compared to a normal distribution: the lower end is more negative while the upper end is more positive.

Figure 2. Normal quantile plot for 10,000 b^xy = b^gy/b^gx generated under bgx = 13 and bgy = 0. The distribution of {b^xy} appears to be fat-tailed compared to a normal distribution (left panel). The distribution of {b^xy : b^gx is significant} (436 out of 10,000) seems to be normal (right panel) due to the selection imposed on b^gx. See the text for explanation.

A normal quantile plot is also generated for {b^xy : Wgx ≥ 29.71679} and is shown in the right panel of Fig. 2. It may be surprising that the distribution of {b^xy : Wgx ≥ 29.71679} appears to be normal. The reason for this phenomenon is that the range of b^gx is greatly reduced under the selection criterion. According to Eq. (1), b^xy is then roughly proportional to b^gy with high probability.

A more general argument that the approximating distribution of TSMR is not 1-df chi-square is the following. Since

$$T_{SMR} = W_{gy} \cdot \frac{1}{1 + W_{gy}/W_{gx}},$$

TSMR < Wgy holds regardless of the distribution of Wgx. That is, TSMR is always dominated by Wgy. Similarly, TSMR is always dominated by Wgx. Therefore, TSMR < min{Wgx, Wgy}. Since Wgx and Wgy approximately follow independent 1-df chi-square distributions, the approximating distribution of min{Wgx, Wgy} cannot be 1-df chi-square, and neither can the approximating distribution of TSMR. Using a 1-df chi-square distribution for TSMR results in a conservative test.
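The domination argument can be illustrated by simulation. The sketch below assumes a non-centrality of 13 for the exposure signal (as in the earlier numerical example), a trait statistic generated under bgy = 0, and an arbitrary seed; it is not the simulation design used for Fig. 3.

```python
import math
import random

CUTOFF_GX = 29.71679   # selection threshold on Wgx (p < 5e-8)
CRIT_05 = 3.841459     # 1-df chi-square critical value at level 0.05

def t_smr(w_gx, w_gy):
    return w_gx * w_gy / (w_gx + w_gy)

rng = random.Random(7)  # arbitrary seed (assumption of this sketch)
pairs = []
for _ in range(100_000):
    w_gx = rng.gauss(math.sqrt(13.0), 1.0) ** 2   # exposure signal, ncp 13
    w_gy = rng.gauss(0.0, 1.0) ** 2               # trait under bgy = 0
    if w_gx >= CUTOFF_GX:                         # Step 1 selection
        pairs.append((w_gx, w_gy))

# TSMR is below both Wald statistics for every selected pair.
always_below = all(t_smr(x, y) < min(x, y) for x, y in pairs)
# Rejection rates at nominal level 0.05 when each statistic is referred
# to the 1-df chi-square distribution.
rate_smr = sum(t_smr(x, y) > CRIT_05 for x, y in pairs) / len(pairs)
rate_wgy = sum(y > CRIT_05 for x, y in pairs) / len(pairs)
print(len(pairs), always_below, rate_smr, rate_wgy)
```

Because TSMR never exceeds Wgy, its rejection rate can never exceed that of Wgy, whose rate is close to the nominal level under this null.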

We performed extensive simulations to investigate the null distribution of the SMR statistic in a more realistic setting by using imputed GWAS genotype data from the Atherosclerosis Risk in Communities (ARIC) study of European-ancestry samples [9]. Specifically, we simulated gene expression levels for each Ensembl gene on autosomes at varying numbers of causal eQTLs (n = 1, 5, and 10), (narrow-sense) heritability levels (h² = 0.1, 0.2, 0.4, 0.8), and sample sizes (N = 250, 500, 1000, and 2000). We tested the association between all SNPs within each gene and the expression level of the gene, and only genes whose top associated SNP met the selection criterion (p < 5×10⁻⁸) were subjected to the SMR test. GWAS association signals were randomly assigned from a standard normal distribution. Figure 3 shows the QQ plot for the SMR statistics when instrumental eQTLs were selected from genes with 5 causal eQTLs and a heritability of 0.4 at all four sample sizes. Clearly, the SMR statistics were lower than the expected null values at the tail of the distribution, though the distribution became closer to the null at larger sample sizes, which may be explained by the stronger eQTL signals, as shown in our numerical example above. The complete set of QQ plots for the SMR test statistic is shown in Supplementary Figs. S1–S12 online. Overall, our simulations showed that the SMR statistics were conservative and did not strictly follow the 1-df chi-square distribution, especially when the effect size of each individual eQTL was small on average. These results are consistent with our theoretical insights.

Figure 3. Quantile-quantile plot for simulated SMR statistics against quantiles of the 1-df chi-square distribution. Instrumental eQTLs for the SMR test were the top associated eQTLs (p < 5×10⁻⁸) selected from genes whose expression levels were simulated under a genetic model of 5 causal eQTLs and heritability of 0.4 at four different sample sizes (N = 250, 500, 1000, and 2000). The grey areas represent the 95% confidence band around the 1-df chi-square statistics.

More on SMR and a conditional test

One may want to use an estimate of bgx that takes the selection into account. However, such an estimate is not expected to be simple given the complexity of the selection (e.g., the SNP is the most significant one among a number of SNPs). Another alternative is to use a second exposure GWAS, independent of the exposure GWAS used in Step 1, to estimate bgx and then compute TSMR. However, this is not recommended because TSMR is inherently conservative. TSMR is equal to half of the harmonic mean of Wgx and Wgy. Fixing one of them, say Wgx, and letting Wgy vary over Wgy ≥ Wgx, TSMR takes its smallest value Wgx/2 at Wgy = Wgx and converges to Wgx as Wgy → ∞. The conservativeness of TSMR is also observed in simulation studies by Veturi and Ritchie [10].
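The harmonic-mean relationship can be verified with a short derivation. Here H denotes the harmonic mean, a symbol introduced only for this illustration.

```latex
\begin{aligned}
T_{SMR} &= \frac{W_{gx} W_{gy}}{W_{gx} + W_{gy}}
         = \left( \frac{1}{W_{gx}} + \frac{1}{W_{gy}} \right)^{-1}
         = \frac{1}{2} H(W_{gx}, W_{gy}),
 \qquad H(a, b) = \frac{2ab}{a + b}, \\
T_{SMR}\Big|_{W_{gy} = W_{gx}} &= \frac{W_{gx}^{2}}{2 W_{gx}} = \frac{W_{gx}}{2}, \\
\lim_{W_{gy} \to \infty} T_{SMR} &= \lim_{W_{gy} \to \infty} \frac{W_{gx}}{1 + W_{gx}/W_{gy}} = W_{gx}.
\end{aligned}
```

Since the ratio of TSMR to min{Wgx, Wgy} equals max{Wgx, Wgy}/(Wgx + Wgy), TSMR always lies between one half of min{Wgx, Wgy} and min{Wgx, Wgy} itself.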

The null hypothesis for TSMR was not specifically defined in Zhu et al. [2]. It is unlikely to be the intended null H0: bxy = 0. In fact, similar to Sobel's statistic, which is popular in mediation analysis, the null corresponding to TSMR is H0: bgx = 0 or bgy = 0. For this null, a statistic more powerful than TSMR is min{Wgx, Wgy}. The statistic min{Wgx, Wgy} rejects the null H0: bgx = 0 or bgy = 0 if and only if both Wgx and Wgy are significant. Because TSMR < min{Wgx, Wgy}, whenever TSMR rejects the null at a given critical value, min{Wgx, Wgy} does as well, but not vice versa.

A test more powerful than min{Wgx, Wgy} (hence also more powerful than TSMR) in the current situation is a conditional test. Because the SNP is selected for its significant association with the exposure, the situation bgx = 0 can be excluded. Given this information, a meaningful null would be H0: bgy = 0, bgx ≠ 0, for which a test statistic is Wgy. The null is rejected when Wgy is significant. This test, conditional on a significant Wgx statistic, assumes that there is no pleiotropy. That is, the selected SNP affects the trait only through the exposure and there are no other paths. In other words, the selected SNP is a valid instrument. In light of Eq. (1), bgy = 0 if and only if bxy = 0 when the possibility of bgx = 0 is excluded. Hence the null for this conditional test is equivalent to H0: bxy = 0. This test is asymptotically valid because Wgy asymptotically follows a 1-df chi-square distribution. The threshold for significance for this test is not at the genome level. Rather, it is at the gene level and only needs to be corrected for the number of genes for which SNPs are selected as instruments. This results in a more powerful testing procedure than using a genome-wide threshold.
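A minimal sketch of the conditional test follows, assuming a Bonferroni correction over the number of genes that contributed an instrument. The function names, the trait-GWAS estimates, and the gene count of 10,000 are hypothetical.

```python
import math

def chi2_1df_p_value(w):
    """Upper-tail probability of a 1-df chi-square distribution at w."""
    return math.erfc(math.sqrt(w / 2.0))

def conditional_test(b_gy, se_gy, n_genes, alpha=0.05):
    """Test H0: bgy = 0 for an instrument already selected on Wgx.

    The level is Bonferroni-corrected for the number of genes with a
    selected instrument, not for the number of SNPs genome-wide.
    """
    w_gy = (b_gy / se_gy) ** 2
    p = chi2_1df_p_value(w_gy)
    return w_gy, p, p < alpha / n_genes

# Hypothetical trait-GWAS estimate for one selected instrument (z = 5).
w_gy, p, significant = conditional_test(b_gy=0.05, se_gy=0.01, n_genes=10_000)
print(w_gy, p, significant)
```

Only trait-GWAS statistics enter the test; the exposure GWAS is used solely to choose the instrument.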

An empirical study

We compared the performance of the proposed conditional test and the SMR test in an empirical study of schizophrenia. We used the largest schizophrenia GWAS summary statistics available to date [11] and the eQTL results from an analysis of 1387 brain samples (prefrontal cortex) by PsychENCODE [12] (downloaded from the SMR data resource website). In total, 9639 genes were tested by SMR at a top associated cis-eQTL (p < 5×10⁻⁸) and 65 genes were significant after Bonferroni correction. In contrast, the conditional test, whose test statistic is Wgy and which considers only those instrumental eQTLs, discovered 127 Bonferroni-significant genes, including 62 genes not detected by SMR (p < 0.05/9639 = 5.18726×10⁻⁶; Supplementary Table S1 online). Among the genes missed by SMR, there were several strong candidates for schizophrenia, such as AKT3 [13–15], RGS6 [16,17], and KCNN3. It may not be surprising that AKT3 and RGS6 were identified, as these two genes harbored genome-wide significant variants (p < 5×10⁻⁸) in the original GWAS [11], but the discovery of KCNN3 was novel: the strongest SNP-level association evidence for this gene was only at p = 9×10⁻⁷ (rs10796933). Of note, our previous study also showed evidence for the association of KCNN3 with schizophrenia through integrated analysis of GWAS with methylation QTL [18].

Two-sample MR Steiger method

The two-sample MR Steiger method [1,19] assumes that there is a causal relationship between the exposure and the trait and that the selected SNP is a valid instrument for one of them (but it is unknown for which one) [1]. A SNP is selected not only for its association with the exposure but also for its association with the trait [1,19]. The null for the two-sample MR Steiger test is H0: ρgx = ρgy, where ρgx = Corr(g, x) and ρgy = Corr(g, y) are the (population) Pearson correlation coefficients. Let ρ^gx and ρ^gy be the sample correlation coefficients corresponding to ρgx and ρgy, respectively. Using Fisher's Z transformation, we have

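$$z_{gx} := \frac{1}{2}\ln\frac{1+|\hat{\rho}_{gx}|}{1-|\hat{\rho}_{gx}|} \sim N\left(\frac{1}{2}\ln\frac{1+|\rho_{gx}|}{1-|\rho_{gx}|},\ \frac{1}{n_x-3}\right), \quad \text{and} \tag{2}$$

$$z_{gy} := \frac{1}{2}\ln\frac{1+|\hat{\rho}_{gy}|}{1-|\hat{\rho}_{gy}|} \sim N\left(\frac{1}{2}\ln\frac{1+|\rho_{gy}|}{1-|\rho_{gy}|},\ \frac{1}{n_y-3}\right), \tag{3}$$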

where nx and ny are the sample sizes. The null H0: ρgx = ρgy is equivalent to saying that the mean of zgx is equal to the mean of zgy. The two-sample MR Steiger method uses the following statistic [1,19]:

$$T_{Steiger} = \frac{z_{gx} - z_{gy}}{\sqrt{1/(n_x-3) + 1/(n_y-3)}} \sim N(0,1).$$

If TSteiger is significant and positive, the causal direction is from x to y. If TSteiger is significant and negative, the causal direction is from y to x.
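The statistic and the decision rule above can be sketched as follows. The function names and the input correlations are hypothetical, and this is not the interface of any existing MR package.

```python
import math

def fisher_z(r):
    """Fisher's Z transform of |r|, as in Eqs. (2) and (3)."""
    return math.atanh(abs(r))

def steiger_statistic(r_gx, n_x, r_gy, n_y):
    """Difference of the Fisher-Z values divided by its standard error."""
    se = math.sqrt(1.0 / (n_x - 3) + 1.0 / (n_y - 3))
    return (fisher_z(r_gx) - fisher_z(r_gy)) / se

def two_sided_p(t):
    """Two-sided p-value under the nominal N(0, 1) reference distribution."""
    return math.erfc(abs(t) / math.sqrt(2.0))

# Hypothetical sample correlations and sample sizes.
t = steiger_statistic(r_gx=0.20, n_x=1000, r_gy=0.10, n_y=10_000)
direction = "x -> y" if t > 0 else "y -> x"
print(t, two_sided_p(t), direction)
```

The p-value here relies on the nominal standard normal reference distribution, which is exactly the assumption that SNP selection calls into question.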

However, the statistic TSteiger does not approximately follow a standard normal distribution because the SNP is selected for its significant p-values. Using the selection criterion p < 5×10⁻⁸, that is, Wgx and Wgy greater than 29.71679 on the 1-df chi-square scale, the sample correlation coefficient |ρ^gx| must be at least 0.4823663, 0.1700451, or 0.05443772 for nx = 100, 1000, or 10,000, respectively, given the relationship |ρ^gx| = 1/√(1 + (nx - 2)/Wgx); the same bounds apply to |ρ^gy| with ny in place of nx. Although this selection procedure is useful for selecting the instrument SNP, it imposes a lower limit on |ρ^gx| and |ρ^gy|. As a result, |ρ^gx| (|ρ^gy|) over-estimates |ρgx| (|ρgy|) and is not consistent. The mean of the statistic TSteiger is not around 0 even when H0: ρgx = ρgy holds if nx ≠ ny. The distribution of zgx is truncated and is not normal, and the same is true of zgy. The variance of zgx is smaller than 1/(nx - 3) due to selection. Similarly, the variance of zgy is smaller than 1/(ny - 3). When nx = ny, the numerator of TSteiger is around 0 and TSteiger is conservative. When nx ≠ ny, the numerator of TSteiger is no longer around 0 and TSteiger is liberal. Overall, the distributions of zgx and zgy are truncated normal instead of normal. The argument that the statistic TSteiger asymptotically follows a standard normal distribution does not hold. The two-sample MR Steiger method can be either liberal or conservative.
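The lower limits on the sample correlation quoted above follow directly from the stated relationship between the correlation and the Wald statistic. A short check, with an illustrative function name:

```python
import math

W_MIN = 29.71679  # 1-df chi-square cutoff for p < 5e-8

def min_abs_correlation(n, w=W_MIN):
    """Smallest |sample correlation| compatible with a Wald statistic >= w."""
    return 1.0 / math.sqrt(1.0 + (n - 2) / w)

bounds = {n: min_abs_correlation(n) for n in (100, 1000, 10_000)}
for n, r in bounds.items():
    print(n, r)
```

The bound shrinks roughly like the square root of W_MIN/n, so larger studies admit much weaker correlations, but the truncation never disappears.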

Numerical examples are constructed. First we consider the case nx = 1000, ny = 10,000 and demonstrate the effect of selection severity. Ten thousand random samples of zgx and zgy are independently generated from the normal distributions shown in Eqs. (2) and (3). These zgx and zgy form a 10,000×2 matrix. The first column contains values for zgx and the second for zgy. Only the rows satisfying zgx ≥ 0.5 ln[(1+0.1700451)/(1-0.1700451)] = 0.17171315 and zgy ≥ 0.5 ln[(1+0.05443772)/(1-0.05443772)] = 0.05449159 are kept. This selection criterion corresponds to 5×10⁻⁸ on the p-value scale. When ρgx = ρgy = 0.15, there are 2557 pairs (zgx, zgy) selected, on which the statistic TSteiger is computed. The sample mean of selected {zgx} is 0.1903508 while the sample mean of selected {zgy} (= 0.1512602) is lower, as expected. A normal quantile-quantile plot of TSteiger is shown in the left panel of Fig. 4. Clearly the distribution of TSteiger is different from normal. Type I error rates are inflated. At significance levels 0.05 and 0.01, the type I error rates (i.e., the proportions of significant TSteiger statistics) are 0.08916699 and 0.01486117, respectively. If ρgx = ρgy = 0.19, the selection is less severe. Almost 75% (7434 out of 10,000) of the pairs (zgx, zgy) are selected. Even so, the distribution of TSteiger shows an apparent departure from normal, as shown in the right panel of Fig. 4. At significance levels 0.05 and 0.01, the type I error rates are 0.0306699 and 0.005111649, respectively. In this case, TSteiger appears to be conservative.

Figure 4. Normal Q-Q plot of simulated TSteiger with nx = 1000, ny = 10,000. {(zgx, zgy)} are selected from 10,000 replicates at the genome-wide significance level 5×10⁻⁸.
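The numerical example can be reproduced in outline with the sketch below. It draws zgx and zgy from the normal distributions in Eqs. (2) and (3) under H0, applies the selection thresholds, and reports the empirical type I error rate at level 0.05. The function name and seed are assumptions of this sketch, so the counts will not match the reported figures exactly.

```python
import math
import random

def simulate_steiger(rho, n_x, n_y, reps=10_000, w_min=29.71679, seed=1):
    """Return (number of selected pairs, empirical type I error at 0.05)."""
    rng = random.Random(seed)          # arbitrary seed (assumption)
    mu = math.atanh(rho)               # common mean of zgx and zgy under H0
    sd_x = 1.0 / math.sqrt(n_x - 3)
    sd_y = 1.0 / math.sqrt(n_y - 3)
    # Selection thresholds on the Fisher-Z scale implied by p < 5e-8.
    c_x = math.atanh(1.0 / math.sqrt(1.0 + (n_x - 2) / w_min))
    c_y = math.atanh(1.0 / math.sqrt(1.0 + (n_y - 2) / w_min))
    se = math.sqrt(sd_x ** 2 + sd_y ** 2)
    stats = []
    for _ in range(reps):
        z_gx = rng.gauss(mu, sd_x)
        z_gy = rng.gauss(mu, sd_y)
        if z_gx >= c_x and z_gy >= c_y:     # keep only selected pairs
            stats.append((z_gx - z_gy) / se)
    rate_05 = sum(abs(t) > 1.959964 for t in stats) / len(stats)
    return len(stats), rate_05

n_selected, type1_05 = simulate_steiger(rho=0.15, n_x=1000, n_y=10_000)
print(n_selected, type1_05)
```

Calling the same function with rho=0.19 gives the less severe selection scenario described in the text.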

We also considered larger sample sizes. When nx = 100,000, ny = 300,000, and ρgx = ρgy = 0.015, 2375 pairs (zgx, zgy) are selected. When nx = 150,000, ny = 400,000, and ρgx = ρgy = 0.015, 6410 pairs (zgx, zgy) are selected. As shown in Fig. 5, the distribution of TSteiger clearly departs from a normal distribution. At significance level 0.05, the type I error rate is 0.09515789 for the case nx = 100,000, ny = 300,000 and 0.03728549 for the case nx = 150,000, ny = 400,000. At significance level 0.01, the type I error rates are 0.01515789 and 0.006084243, respectively. The type I error rates can be either inflated or deflated.

Figure 5. Normal Q-Q plot of simulated TSteiger with ρgx = ρgy = 0.015. {(zgx, zgy)} are selected from 10,000 replicates at the genome-wide significance level 5×10⁻⁸.

One remedy would be to estimate ρgx and ρgy by maximizing the conditional likelihood given the SNP selection criteria. Let ϕ(·) and Φ(·) denote the density function and the distribution function of the standard normal, respectively. The likelihood ratio statistic for testing H0 is 2log(L1/L0) where

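$$L_1 = \max_{\mu_{gx}} \frac{\phi\left(\sqrt{n_x-3}\,(z_{gx}-\mu_{gx})\right)}{1-\Phi\left(\sqrt{n_x-3}\,(c_{gx}-\mu_{gx})\right)} \cdot \max_{\mu_{gy}} \frac{\phi\left(\sqrt{n_y-3}\,(z_{gy}-\mu_{gy})\right)}{1-\Phi\left(\sqrt{n_y-3}\,(c_{gy}-\mu_{gy})\right)},$$

$$L_0 = \max_{\mu_{g}} \frac{\phi\left(\sqrt{n_x-3}\,(z_{gx}-\mu_{g})\right)}{1-\Phi\left(\sqrt{n_x-3}\,(c_{gx}-\mu_{g})\right)} \cdot \frac{\phi\left(\sqrt{n_y-3}\,(z_{gy}-\mu_{g})\right)}{1-\Phi\left(\sqrt{n_y-3}\,(c_{gy}-\mu_{g})\right)}$$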

with cgx and cgy the selection thresholds corresponding to zgx and zgy, respectively. However, due to selection, computation of L1 and L0 can be challenging. One alternative method is to use an exposure GWAS and a trait GWAS that are independent of those used to select the SNP. However, such studies may be impractical to obtain [6].

Discussion

Summary statistics MR is subject to selection bias, resulting in excessive false positives (for instance, the MR Steiger method) or missed discoveries (for instance, the SMR method). This bias is a form of winner's curse. Selection bias has been discussed in the literature in the context of the choice of the instrument SNPs [7], colocalisation tests [20], and estimation of the exposure effect [5,6].

Our work complements previous studies on selection bias due to the selection of SNPs. While previous work focused on the effect of this bias on the Wald ratio [5,6] (i.e., estimation), ours focuses on testing whether the exposure causally affects the outcome (i.e., inference). Selection bias leads to underestimation of the Wald ratio [5], but its effect on the type I error rate can be either liberal or conservative depending on the MR method used. Most importantly, the SMR method is conservative even in the absence of selection bias, where b^gx is approximately normal.

Correcting for selection bias is a challenging task. Zhao et al. [6] get around this issue by using an independent exposure GWAS. On the other hand, our conditional test, an alternative to the SMR method, uses only the trait GWAS. It may be expanded to accommodate multiple instrumental SNPs and the presence of pleiotropy.

Acknowledgements

This study was partially supported by National Institutes of Health grant R01MH121394 (to SH).

Author contributions

K.W. conceived the experiment and conducted some simulations, S.H. conducted the simulation study and the empirical study. Both authors drafted and reviewed the manuscript.

Competing interests

The authors declare no competing interests.

Footnotes

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

The online version contains supplementary material available at 10.1038/s41598-021-87219-6.

References

1. Hemani G, Tilling K, Smith GD. Orienting the causal relationship between imprecisely measured traits using GWAS summary data. PLoS Genet. 2017;13:e1007081. doi: 10.1371/journal.pgen.1007081.
2. Zhu Z, et al. Integration of summary data from GWAS and eQTL studies predicts complex trait gene targets. Nat. Genet. 2016;48:481. doi: 10.1038/ng.3538.
3. Morrison J, Knoblauch N, Marcus JH, Stephens M, He X. Mendelian randomization accounting for correlated and uncorrelated pleiotropic effects using genome-wide summary statistics. Nat. Genet. 2020;52:740–747.
4. Bowden J, Dudbridge F. Unbiased estimation of odds ratios: Combining genomewide association scans with replication studies. Genet. Epidemiol. 2009;33:406–418. doi: 10.1002/gepi.20394.
5. Haycock PC, et al. Best (but oft-forgotten) practices: The design, analysis, and interpretation of Mendelian randomization studies. Am. J. Clin. Nutr. 2016;103:965–978. doi: 10.3945/ajcn.115.118216.
6. Zhao Q, et al. Statistical inference in two-sample summary-data mendelian randomization using robust adjusted profile score. Ann. Stat. 2020;48:1742–1769. doi: 10.1214/19-AOS1866.
7. Hemani G, et al. The MR-Base platform supports systematic causal inference across the human phenome. Elife. 2018;7:e34408. doi: 10.7554/eLife.34408.
8. Díaz-Francés E, Rubio FJ. On the existence of a normal approximation to the distribution of the ratio of two independent normal random variables. Stat. Pap. 2013;54:309–323. doi: 10.1007/s00362-012-0429-2.
9. The ARIC investigators. The Atherosclerosis Risk in Communities (ARIC) Study: Design and objectives. Am. J. 1989;129:687–702.
10. Veturi Y, Ritchie MD. How powerful are summary-based methods for identifying expression-trait associations under different genetic architectures? Pac. Symp. Biocomput. 2018;23:228–239.
11. Pardiñas AF, et al. Common schizophrenia alleles are enriched in mutation-intolerant genes and in regions under strong background selection. Nat. Genet. 2018;50:381–389. doi: 10.1038/s41588-018-0059-2.
12. Wang D, et al. Comprehensive functional genomic resource and integrative model for the human brain. Science. 2018;362:eaat8464.
13. Schizophrenia Working Group of the Psychiatric Genomics Consortium and others. Biological insights from 108 schizophrenia-associated genetic loci. Nature. 2014;511:421–427. doi: 10.1038/nature13595.
14. Ripke S, et al. Genome-wide association analysis identifies 13 new risk loci for schizophrenia. Nat. Genet. 2013;45:1150. doi: 10.1038/ng.2742.
15. Ruderfer DM, et al. Polygenic overlap between schizophrenia risk and antipsychotic response: A genomic medicine approach. Lancet Psychiatry. 2016;3:350–357. doi: 10.1016/S2215-0366(15)00553-2.
16. Ahlers KE, Chakravarti B, Fisher RA. RGS6 as a novel therapeutic target in CNS diseases and cancer. AAPS J. 2016;18:560–572. doi: 10.1208/s12248-016-9899-9.
17. Radulescu E, et al. Identification and prioritization of gene sets associated with schizophrenia risk by co-expression network analysis in human brain. Mol. Psychiatry. 2018;25:791–804.
18. Han S, et al. Integrating brain methylome with gwas for psychiatric risk gene discovery. bioRxiv. 2018:440206.
19. Xue H, Pan W. Inferring causal direction between two traits in the presence of horizontal pleiotropy with GWAS summary data. PLoS Genet. 2020;16:e1009105. doi: 10.1371/journal.pgen.1009105.
20. Wallace C. Statistical testing of shared genetic control for potentially related traits. Genet. Epidemiol. 2013;37:802–813. doi: 10.1002/gepi.21765.
