American Journal of Human Genetics. 2016 Jun 16;99(1):76–88. doi: 10.1016/j.ajhg.2016.05.001

Transethnic Genetic-Correlation Estimates from Summary Statistics

Brielin C Brown 1, Asian Genetic Epidemiology Network Type 2 Diabetes Consortium, Chun Jimmie Ye 2, Alkes L Price 3, Noah Zaitlen 4
PMCID: PMC5005434  PMID: 27321947

Abstract

The increasing number of genetic association studies conducted in multiple populations provides an unprecedented opportunity to study how the genetic architecture of complex phenotypes varies between populations, a problem important for both medical and population genetics. Here, we have developed a method for estimating the transethnic genetic correlation: the correlation of causal-variant effect sizes at SNPs common in populations. This method takes advantage of the entire spectrum of SNP associations and uses only summary-level data from genome-wide association studies. This avoids the computational costs and privacy concerns associated with genotype-level information while remaining scalable to hundreds of thousands of individuals and millions of SNPs. We applied our method to data on gene expression, rheumatoid arthritis, and type 2 diabetes and overwhelmingly found that the genetic correlation was significantly less than 1. Our method is implemented in a Python package called Popcorn.

Introduction

Many complex human phenotypes vary dramatically in their distributions between populations as a result of a combination of genetic and environmental differences. For example, northern Europeans are on average taller than southern Europeans,1 and African Americans have a higher rate of hypertension than European Americans.2 Differences in allele frequencies, effect sizes, and genetic architectures drive the genetic contribution to population phenotypic differentiation. Understanding the root causes of phenotypic differences worldwide has profound implications for biomedical and clinical practice in diverse populations, for the transferability of epidemiological results, and for aiding multi-ethnic disease mapping,3, 4 assessing the contribution of non-additive and rare-variant effects, and modeling the genetic architecture of complex traits. In this work, we consider a central question in the global study of phenotype: do genetic variants have the same phenotypic effects in different populations?

Although the vast majority of genome-wide association studies (GWASs) have been conducted in European populations,5, 6 the growing number of non-European and multi-ethnic studies4, 7, 8 provides an opportunity to study distributions of genetic effects across populations. For example, one recent study used mixed-model-based methods to show that the genome-wide genetic correlation of schizophrenia between European and African Americans is nonzero.9 Although such genotype-based methods are powerful, computational costs and privacy concerns limit their utility. In this work, we make two significant contributions to studies of transethnic genetic correlation. First, we expand the definition of genetic correlation to better account for a transethnic context. Second, we develop an approach that uses only summary-level GWAS data to estimate genetic correlation across populations. Like other recent methods based on summary statistics,10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21 our approach supplements summary association data with linkage disequilibrium (LD) information from external reference panels, avoids privacy concerns, and is scalable to hundreds of thousands of individuals and millions of markers. Unlike traditional approaches that focus on the similarity of GWAS results,22, 23, 24, 25, 26 we use the entire spectrum of GWAS associations while accounting for LD to avoid filtering correlated SNPs.

In a single population, the genetic correlation of two phenotypes is defined as the correlation coefficient of SNP effect sizes.19, 27 In multiple populations, differences in allele frequency motivate multiple possible definitions of genetic correlation. Because a variant can have a higher effect size but lower frequency in one population, we consider both the correlation of allele effect sizes and the correlation of allelic impact. We define the transethnic genetic-effect correlation (ρge, previously defined by Lee et al.27 and implemented in Genome-wide Complex Trait Analysis [GCTA]) as the correlation coefficient of the per-allele SNP effect sizes. Similarly, we define the transethnic genetic-impact correlation (ρgi) as the correlation coefficient of the population-specific allele-variance-normalized SNP effect sizes.

Intuitively, the genetic-effect correlation measures the extent to which the same variant has the same phenotypic change, whereas the genetic-impact correlation gives more weight to common alleles than to rare ones separately in each population. Consider the case of a SNP that is rare in population 1 but common in population 2 and has an identical effect size in both populations. In this case, the correlation of effect sizes (genetic-effect correlation, ρge) is 1. This, however, provides an incomplete picture of the relationship between the two populations, given that the allele has a much bigger impact on the distribution of the phenotype in population 2. Therefore, we define the genetic-impact correlation, ρgi, as the correlation of effect sizes after genotypes are normalized to have mean 0 and variance 1. In our hypothetical case, ρgi < ρge, but the opposite can also be true. Consider again the case of a SNP rare in population 1 but common in population 2. If the effect size is large in the first population but small in the second, then ρge might be much less than 1, but the impact of the allele in the two populations will be similar. Therefore, ρgi will be close to 1. Although other definitions of the genetic correlation are possible (see Discussion), these quantities capture two important questions about the study of disease in multiple populations: to what extent do the same mutations in multiple populations differ in their phenotypic effects, and to what extent are these differences mitigated or exacerbated by differences in allele frequency?
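The first hypothetical case above can be made concrete with a few lines of Python. This is a minimal sketch on made-up numbers: the four SNPs, their effect sizes, and their allele frequencies are all illustrative assumptions, and `pearson` and `allele_sd` are helper names introduced here rather than part of Popcorn.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def allele_sd(f):
    """Allele standard deviation under Hardy-Weinberg: sqrt(2 f (1 - f))."""
    return math.sqrt(2.0 * f * (1.0 - f))


# Hypothetical toy data: identical per-allele effects in both populations,
# but allele frequencies that differ between the populations.
beta1 = [0.10, -0.20, 0.05, 0.30]
beta2 = list(beta1)
f1 = [0.05, 0.40, 0.30, 0.10]
f2 = [0.45, 0.10, 0.30, 0.50]

# Genetic-effect correlation: correlation of per-allele effect sizes.
rho_ge = pearson(beta1, beta2)

# Genetic-impact correlation: correlation of variance-normalized effects.
impact1 = [allele_sd(f) * b for f, b in zip(f1, beta1)]
impact2 = [allele_sd(f) * b for f, b in zip(f2, beta2)]
rho_gi = pearson(impact1, impact2)

print(f"rho_ge = {rho_ge:.3f}, rho_gi = {rho_gi:.3f}")
```

Because the per-allele effects are identical, the genetic-effect correlation is 1, whereas the frequency differences pull the genetic-impact correlation below 1.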

To estimate genetic correlation, we take a Bayesian approach wherein we assume genotypes are drawn separately from within each population and effect sizes have a normal prior (the infinitesimal model28). Although this model is unlikely to represent reality, it has been used successfully in practice.9, 17, 18, 29, 30 The infinitesimal assumption yields a multivariate normal distribution on the observed test statistics (Z scores), where the covariance matrix is a function of the heritability and genetic correlation. Rather than pruning SNPs in LD,11, 31, 32 this allows us to explicitly model the resulting inflation of Z scores. We then maximize an approximate weighted likelihood function to find the heritability and genetic correlation. This method is implemented in a Python package called Popcorn. Although it is derived for quantitative phenotypes, Popcorn extends easily to binary phenotypes under the liability threshold model. We show via extensive simulation that Popcorn produces unbiased estimates of the genetic correlation and the population-specific heritabilities with a SE that decreases as the number of SNPs and individuals in the studies increases. Furthermore, we show that our approach is robust to violations of the infinitesimal assumption.

We applied Popcorn to European and Yoruban gene-expression data,33 as well as GWAS summary statistics from European and East Asian cohorts affected by rheumatoid arthritis (RA) and type 2 diabetes (T2D).34, 35 Our analysis of GEUVADIS (Genetic European Variation in Health and Disease) data showed that our summary-statistic-based estimator is concordant with the mixed-model-based estimator. We found that the mean transethnic genetic correlation across all genes was low (ρge = 0.320 [0.009]) but increased substantially when the gene was highly heritable in both populations (ρge = 0.772 [0.017]). In RA and T2D, we found ρge to be 0.463 (0.058) and 0.621 (0.088), respectively.

Across all phenotypes considered, we overwhelmingly found that the transethnic genetic correlation is significantly less than 1. This observation highlights the need to study phenotypes in multiple populations because it implies that, up to the effects of unobserved variants, effect sizes at common SNPs tend to differ between populations. This indicates that results might not transfer between populations, and therefore predicting disease risk in non-Europeans on the basis of current GWAS results could be problematic. Our results provide further evidence that gaining insight into the genetic architecture of complex traits will require a multi-population approach.

Material and Methods

Our method takes as input summary association statistics from two studies of a phenotype in two different populations, along with two sets of reference genotypes each matching one of the populations in the study. Our method has two steps: first, we estimate the diagonal elements of the LD-matrix products $\Sigma_1^2$, $\Sigma_2^2$, and $\Sigma_1\Sigma_2$; second, using these estimates, we find the maximum-likelihood values and estimate SEs of the parameters of interest: $h_1^2$ or $h_2^2$ and ρge or ρgi. The details follow.

Consider two GWASs conducted on the same phenotype in different populations. Assume we have $N_1$ individuals genotyped on M SNPs in study 1 and $N_2$ individuals genotyped on the same SNPs in study 2. Let $X_1$ and $X_2$ be the matrices of mean-centered genotypes in studies 1 and 2, respectively, and let $Y_1$ and $Y_2$ be their normalized phenotypes. Let $f_1$ and $f_2$ be vectors of the allele frequencies of the M SNPs common to both populations. If we assume Hardy-Weinberg equilibrium within each population separately, the allele variances are $\sigma_1^2 = 2f_1(1 - f_1)$ and $\sigma_2^2 = 2f_2(1 - f_2)$. Let $\beta_1$ and $\beta_2$ be the (unobserved) per-allele effect sizes for each SNP in studies 1 and 2, respectively. The heritability in study 1 is then $h_1^2 = \sum_i \sigma_{1i}^2 \beta_{1i}^2$ (and likewise for study 2). The objective of this work is to estimate transethnic genetic correlation from summary statistics of common variants, $Z_1 = (X_1/\sigma_1)^T Y_1 / \sqrt{N_1}$ (and likewise for study 2), and estimates of population LD matrices ($\Sigma_1$ and $\Sigma_2$) from external reference panels. Define the genetic-effect correlation as $\rho_{ge} = \mathrm{Cor}(\beta_1, \beta_2)$ and the genetic-impact correlation as $\rho_{gi} = \mathrm{Cor}(\sigma_1\beta_1, \sigma_2\beta_2)$.
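The quantities defined in this paragraph can be sketched in plain Python. This is an illustrative sketch, not Popcorn's code: the function names are hypothetical, genotypes are assumed to be a list of per-individual rows of allele counts, and the Z score is assumed to be scaled by the square root of the sample size so that it has roughly unit variance under the null.

```python
import math


def allele_variance(f):
    """Allele variance under Hardy-Weinberg equilibrium: 2 f (1 - f)."""
    return 2.0 * f * (1.0 - f)


def heritability(freqs, betas):
    """h^2 = sum_i sigma_i^2 * beta_i^2 for a phenotype with variance 1."""
    return sum(allele_variance(f) * b * b for f, b in zip(freqs, betas))


def z_scores(genotypes, phenotypes, freqs):
    """Per-SNP Z scores, Z_j = sum_n (x_nj / sigma_j) * y_n / sqrt(N).

    genotypes: list of N rows, each a list of M allele counts.
    phenotypes: list of N normalized phenotype values.
    freqs: list of M allele frequencies (assumed strictly between 0 and 1).
    """
    n = len(genotypes)
    z = []
    for j, f in enumerate(freqs):
        sigma = math.sqrt(allele_variance(f))
        # Mean-center the genotype column before scaling by sigma.
        mean_j = sum(row[j] for row in genotypes) / n
        total = sum((row[j] - mean_j) / sigma * y
                    for row, y in zip(genotypes, phenotypes))
        z.append(total / math.sqrt(n))
    return z
```

For example, `heritability([0.5], [1.0])` returns 0.5, because a single SNP at frequency 0.5 has allele variance 0.5.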

We assume that the genotypes are drawn randomly from each population and that phenotypes are generated by the linear model Y1 = X1β1 + ε1 (likewise for phenotype 2). When effect sizes β are assumed to be inversely proportional to allele frequency, as is commonly done,17, 30 we show (Appendix A) that under the linear infinitesimal genetic architecture, the joint distribution of the Z scores from each study is asymptotically multivariate normal with mean 0 and variance

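$$\mathrm{Var}(Z) = \begin{bmatrix} \Sigma_1 + \dfrac{N_1 + 1}{M} h_1^2 \Sigma_1^2 & \dfrac{\rho_{gi} \sqrt{h_1^2 h_2^2} \sqrt{N_1 N_2}}{M} \Sigma_1 \Sigma_2 \\ \dfrac{\rho_{gi} \sqrt{h_1^2 h_2^2} \sqrt{N_1 N_2}}{M} \Sigma_2 \Sigma_1 & \Sigma_2 + \dfrac{N_2 + 1}{M} h_2^2 \Sigma_2^2 \end{bmatrix}$$ (Equation 1)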

However, when effect sizes are assumed to be independent of allele frequency, we show

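$$\mathrm{Var}(Z) = \begin{bmatrix} \Sigma_1 + \dfrac{N_1 + 1}{\|\sigma_1^2\|_1} h_1^2 \Sigma_1 \sigma_1^2 \Sigma_1 & \dfrac{\rho_{ge} \sqrt{h_1^2 h_2^2} \sqrt{N_1 N_2}}{\sqrt{\|\sigma_1^2\|_1 \|\sigma_2^2\|_1}} \Sigma_1 \sqrt{\sigma_1^2 \sigma_2^2} \Sigma_2 \\ \dfrac{\rho_{ge} \sqrt{h_1^2 h_2^2} \sqrt{N_1 N_2}}{\sqrt{\|\sigma_1^2\|_1 \|\sigma_2^2\|_1}} \Sigma_2 \sqrt{\sigma_2^2 \sigma_1^2} \Sigma_1 & \Sigma_2 + \dfrac{N_2 + 1}{\|\sigma_2^2\|_1} h_2^2 \Sigma_2 \sigma_2^2 \Sigma_2 \end{bmatrix}$$ (Equation 2)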

Given these equations for variance, we can estimate the quantities ρgi or ρge and $h_1^2$ or $h_2^2$ by maximizing the multivariate normal likelihood, $l(\rho_{g\{i,e\}}, h_1^2, h_2^2 \mid Z, \Sigma, \sigma) \propto -\ln(|C|) - Z^T C^{-1} Z$, where C is either of the above covariance matrices in Equation 1 or 2. Because $\Sigma_1$ and $\Sigma_2$ are estimated from finite external reference panels, estimating the maximum likelihood of the above multivariate normal distribution leads to over-fitting. We employ two optimizations to avoid this problem. First, we maximize an approximate weighted likelihood that uses only the diagonal elements of each block of Var(Z). This allows us to account for the LD-induced inflation of test statistics, but it discards covariance information between pairs of Z scores and therefore leads to over-counting Z scores of SNPs in high LD. To compensate for this, we downweight Z scores of SNPs in proportion to their LD. Second, rather than compute the full products $\Sigma_1^2$, $\Sigma_2^2$, and $\Sigma_1\Sigma_2$ over all M SNPs in the genome, we choose a window size W and approximate the product by $(\Sigma_a \Sigma_b)_{ii} = \sum_{w=i-W}^{i+W} r_{a,iw}\, r_{b,iw}$. These optimizations are similar to those employed by LD-score regression.17 The full details of the derivation and optimization are provided in Appendix A.
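The two optimizations can be sketched as follows. This is one possible reading of the description above, not the Popcorn implementation (the exact form is in Appendix A): it assumes the genetic-impact model of Equation 1, takes the diagonal of each LD matrix to be 1, builds a 2×2 covariance per SNP from the diagonals of the four blocks, and leaves the per-SNP weights to the caller. All function names are hypothetical, and every SNP is assumed to be polymorphic in both reference panels.

```python
import math


def correlation(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def column(genotypes, j):
    """Column j of a row-major genotype matrix."""
    return [row[j] for row in genotypes]


def windowed_scores(ref_a, ref_b, window):
    """Approximate diag(Sigma_a Sigma_b) from two reference panels.

    For SNP i, sums r_a(i, w) * r_b(i, w) over all w with |w - i| <= window.
    """
    m = len(ref_a[0])
    scores = []
    for i in range(m):
        lo, hi = max(0, i - window), min(m - 1, i + window)
        total = 0.0
        for w in range(lo, hi + 1):
            r_a = correlation(column(ref_a, i), column(ref_a, w))
            r_b = correlation(column(ref_b, i), column(ref_b, w))
            total += r_a * r_b
        scores.append(total)
    return scores


def diag_loglik(z1, z2, s11, s22, s12, n1, n2, m, h1, h2, rho, weights):
    """Weighted sum over SNPs of -ln|C_i| - z_i^T C_i^{-1} z_i.

    C_i is the 2x2 matrix formed from the i-th diagonal entries of the
    blocks of Equation 1; s11, s22, s12 are the windowed scores for
    Sigma_1^2, Sigma_2^2, and Sigma_1 Sigma_2.
    """
    cross = rho * math.sqrt(h1 * h2) * math.sqrt(n1 * n2) / m
    total = 0.0
    for i in range(len(z1)):
        v1 = 1.0 + (n1 + 1) / m * h1 * s11[i]
        v2 = 1.0 + (n2 + 1) / m * h2 * s22[i]
        c = cross * s12[i]
        det = v1 * v2 - c * c
        quad = (v2 * z1[i] ** 2 - 2.0 * c * z1[i] * z2[i]
                + v1 * z2[i] ** 2) / det
        total += weights[i] * (-math.log(det) - quad)
    return total
```

A numerical optimizer would then search over `h1`, `h2`, and `rho` for the values that maximize `diag_loglik`. A natural weight under the description above is the reciprocal of each SNP's LD score, but that choice is an assumption of this sketch.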

Results

Simulated Genotypes and Simulated Phenotypes

Using HAPGEN2,34 we simulated 50,000 European (EUR)-like and 50,000 East Asian (EAS)-like individuals at 248,953 chromosome 1–3 SNPs with an allele frequency above 1% in both EUR and EAS HapMap 3 populations. HAPGEN2 implements a model that combines haplotype recombination with mutation and results in excess local relatedness among the simulated individuals. To account for this local structure, we used PLINK 235 to filter individuals with genetic relatedness above 0.05, resulting in 4,499 EUR-like individuals and 4,837 EAS-like individuals. From these simulated individuals, 500 per population were chosen uniformly at random to serve as an external reference panel for estimating Σ1 and Σ2.

In each simulation, effect sizes were drawn from a “spike and slab” model, where $(\beta_{1i}, \beta_{2i}) \sim N\left(0, \begin{bmatrix} h_1^2 & \rho_{ge}\sqrt{h_1^2 h_2^2} \\ \rho_{ge}\sqrt{h_1^2 h_2^2} & h_2^2 \end{bmatrix}\right)$ with probability p and $(\beta_{1i}, \beta_{2i}) = (0, 0)$ with probability 1 − p. ρgi was analytically computed from the simulated effect sizes and allele frequencies in the simulated reference genotypes. Quantitative phenotypes were generated under a linear model with independent and identically distributed noise and normalized to have mean 0 and variance 1, whereas binary phenotypes were generated under a liability threshold model where individuals are labeled as case subjects when their liability exceeds a threshold $\tau = \Phi^{-1}(1 - K)$, in which K is the population disease prevalence.36
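The simulation scheme in this paragraph can be sketched with the Python standard library. This is a minimal sketch under stated assumptions, not the authors' simulation code: the function names are hypothetical, and the per-SNP variance is scaled by 1/(M p) so that the expected total variance equals the target heritability, a scaling the text does not spell out.

```python
import math
import random
from statistics import NormalDist


def sample_effects(m, p, h1, h2, rho_ge, rng):
    """Draw m pairs (beta1_i, beta2_i) from a spike-and-slab model.

    With probability p the pair is bivariate normal with correlation
    rho_ge; otherwise both effects are exactly zero. Requires 0 < p <= 1.
    """
    # Assumption: each causal SNP gets variance h^2 / (m * p), so that the
    # expected total variance over standardized SNPs is h^2.
    sd1 = math.sqrt(h1 / (m * p))
    sd2 = math.sqrt(h2 / (m * p))
    beta1, beta2 = [], []
    for _ in range(m):
        if rng.random() < p:
            a, b = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
            # Cholesky factor of [[1, rho], [rho, 1]] applied to (a, b).
            beta1.append(sd1 * a)
            beta2.append(sd2 * (rho_ge * a + math.sqrt(1.0 - rho_ge ** 2) * b))
        else:
            beta1.append(0.0)
            beta2.append(0.0)
    return beta1, beta2


def liability_threshold(prevalence):
    """tau = Phi^{-1}(1 - K) for population prevalence K."""
    return NormalDist().inv_cdf(1.0 - prevalence)


def case_status(liabilities, prevalence):
    """Label an individual a case (1) when liability exceeds tau, else 0."""
    tau = liability_threshold(prevalence)
    return [1 if value > tau else 0 for value in liabilities]
```

Passing a seeded generator such as `random.Random(0)` as `rng` makes a simulation run reproducible.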

We varied $h_1^2$, $h_2^2$, ρge, and ρgi, as well as the number of individuals in each study ($N_1$ and $N_2$), the number of SNPs (M), the population prevalence (K), and proportion of causal variants (p) in the simulated GWASs, and generated summary statistics for each study. The results shown in Figure 1 and Figure S1 demonstrate that the estimators are nearly unbiased as the genetic correlation and heritabilities vary. Furthermore, by varying the proportion of causal variants p, we show that our estimator is robust to violations of the infinitesimal assumption (Figure S2). In Figure S3, we show that the SE of the estimator decreases as the number of SNPs and individuals in the study increases. In Figure S4, we simulate data for 4,499 EUR-like individuals and 15,101 EAS-like individuals for a range of genetic correlations to show that our estimators remain nearly unbiased when the sample sizes of the two populations are very different. Finally, we show in Table S1 that our estimates of the heritability of liability in case-control studies are nearly unbiased.

Figure 1.

True and Estimated Genetic-Impact and Genetic-Effect Correlations

All simulations were conducted with a simulated EUR and EAS heritability of 0.5 with 4,499 simulated EUR and 4,836 simulated EAS individuals at 248,953 SNPs.

Simulations with Nonstandard Disease Models

Our approach, as well as genotype-based methods such as GCTA, makes assumptions about the genetic architecture of complex traits. Previous work has shown that violations of these assumptions can lead to bias in heritability estimation;37 therefore, we sought to quantify the extent to which this bias might affect our estimates. We simulated phenotypes under six different disease models: (1) independent, where the effect size is independent of allele frequency; (2) inverse, where the effect size is inversely proportional to allele frequency; (3) rare, where only SNPs with an allele frequency under 10% affect the trait; (4) common, where only SNPs with an allele frequency between 40% and 50% affect the trait; (5) difference, where effect size is proportional to the difference in allele frequency; and (6) adversarial, a difference model where the sign of β is set to increase the phenotype in the population where the allele is most common. Additional genetic architectures, including ones where effect sizes are not a direct function of MAF,38 are possible.

We simulated phenotypes by using genotypes with an allele frequency above 1% or 5% and compared the true and estimated genetic-impact and genetic-effect correlations among all models (Table 1). We found that when only SNPs with a frequency above 5% in both populations were used, the difference in ρge and ρgi was minimal except in the most adversarial cases. Even in the adversarial model, the true difference was only 7%. Although they are unlikely to represent reality, the four nonstandard disease models result in substantial bias in our estimators. When SNPs with an allele frequency above 1% in both populations are included, the differences are more pronounced. This is because the normalizing constant 1/σ rapidly increases as the SNP becomes more rare. Indeed, as SNPs become more rare, having an accurate disease model becomes increasingly important. Therefore, we proceeded with a 5% MAF cutoff in our analysis of real data and used the notation $h_c^2$ to refer to the heritability of SNPs with an allele frequency above 5% in both populations (the common-SNP heritability). Note, however, that one of the advantages of maximum-likelihood estimation in general is that the likelihood can be reformulated to mimic the disease model of interest.
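The growth of the normalizing constant 1/σ at low allele frequency is easy to check numerically. The short sketch below assumes the Hardy-Weinberg allele variance 2f(1 − f) used in the Methods; the function name is hypothetical.

```python
import math


def inverse_allele_sd(f):
    """Normalizing constant 1 / sigma, with sigma^2 = 2 f (1 - f)."""
    return 1.0 / math.sqrt(2.0 * f * (1.0 - f))


for freq in (0.50, 0.05, 0.01):
    print(f"f = {freq:.2f}: 1/sigma = {inverse_allele_sd(freq):.2f}")
```

The constant is about 1.41 at a frequency of 50%, 3.24 at 5%, and 7.11 at 1%, so moving the cutoff from 5% to 1% more than doubles the largest scaling applied to any SNP.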

Table 1.

True and Estimated Values of Genetic-Impact and Genetic-Effect Correlations in Simulated EUR-like and EAS-like Genotypes

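| Model | True ρge (MAF > 0.01) | True ρgi (MAF > 0.01) | Estimated ρge (MAF > 0.01) | Estimated ρgi (MAF > 0.01) | True ρge (MAF > 0.05) | True ρgi (MAF > 0.05) | Estimated ρge (MAF > 0.05) | Estimated ρgi (MAF > 0.05) |
|---|---|---|---|---|---|---|---|---|
| Independent | 0.500 | 0.478 | 0.500 | 0.460 | 0.500 | 0.488 | 0.509 | 0.469 |
| Inverse | 0.431 | 0.500 | 0.567 | 0.496 | 0.479 | 0.500 | 0.555 | 0.482 |
| Rare | 0.500 | 0.467 | 0.382 | 0.863 | 0.500 | 0.496 | 0.998 | 0.756 |
| Common | 0.500 | 0.500 | 0.522 | 0.493 | 0.500 | 0.500 | 0.502 | 0.496 |
| Difference | 0.500 | 0.416 | 0.354 | 0.435 | 0.500 | 0.461 | 0.410 | 0.412 |
| Adversarial | 0.710 | 0.604 | 0.525 | 0.651 | 0.714 | 0.667 | 0.601 | 0.675 |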

Results are the average of 100 simulations with a phenotype heritability of 0.5 in each population.

Validating Popcorn by Using Gene Expression in GEUVADIS

We compared the common-SNP heritability ($h_c^2$) and genetic-correlation estimates of Popcorn to those of GCTA in the GEUVADIS dataset, for which raw genotypes are publicly available. GEUVADIS consists of RNA-sequencing (RNA-seq) data for 464 lymphoblastoid cell line (LCL) samples from five populations in the 1000 Genomes Project. Of these, 375 are of European ancestry (CEU [Utah residents with ancestry from northern and western Europe from the CEPH collection], FIN [Finnish in Finland], GBR [British in England and Scotland], and TSI [Toscani in Italia]), and 89 are of African ancestry (YRI [Yoruba in Ibadan, Nigeria]). Raw RNA-seq reads obtained from the European Nucleotide Archive (accession number ENA: ERP001942) were aligned to the transcriptome with hg19 coordinates from the UCSC Genome Browser. RSEM39 was used for estimating the abundances of each annotated isoform, and total gene abundance was calculated as the sum of all isoform abundances normalized to one million total counts or transcripts per million (TPM). For mapping of expression quantitative trait loci (eQTLs), European and Yoruban samples were analyzed separately. For each population, we median normalized TPMs to account for differences in sequencing depth in each sample and standardized to mean 0 and variance 1. Of the 29,763 total genes, 9,350 with TPM > 2 in both populations were chosen for this analysis.

For each gene and using 30 principal components as covariates, we conducted a cis-eQTL association study at all SNPs within 1 Mb of the gene body and with an allele frequency above 5% in both populations. We found that GCTA and Popcorn agreed on the global distribution of heritability (Figure S5) and that GCTA’s estimates of genetic correlation had a similar distribution to Popcorn’s estimates of genetic-effect and genetic-impact correlation (Figure 2). Although the number of SNPs and individuals included in each gene analysis is too small for obtaining accurate point estimates of the genetic correlation on a per-gene basis (N = 464, M = 4279.5), the large number of genes enables accurate estimation of the global mean heritability and genetic correlation.

Figure 2.

The Distributions of the Estimates of Genetic Correlation Computed with Popcorn and GCTA Are Compared

The distribution was computed via Gaussian kernel density estimation on the set of genetic-correlation estimates.

Common-SNP Heritability and Genetic Correlation of Gene Expression in GEUVADIS

We found that the average cis-$h_c^2$ of the expression of the genes we analyzed was 0.093 (0.002) in EUR and 0.088 (0.002) in YRI. Our estimates are higher than previously reported average cis-heritability estimates of 0.055 in whole blood and 0.057 in adipose,40 which could have arisen for several reasons. First, we removed 68% of the transcripts that are lowly expressed in either YRI or EUR data. Second, estimates from RNA-seq analysis of cell lines might not be directly comparable to microarray data from tissue.

The average genetic-effect correlation was 0.320 (0.010), whereas the average genetic-impact correlation was 0.313 (0.010). Notably, the genetic correlation increased as the cis-$h_c^2$ of expression in both populations increased (Figure 3). In particular, when the cis-$h_c^2$ of the gene was at least 0.2 in both populations, the genetic-effect correlation was 0.772 (0.017), whereas the genetic-impact correlation was 0.753 (0.018).
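The conditioning step described here, averaging per-gene correlation estimates over genes whose heritability is at least a threshold in both populations, can be sketched as below. The function name and the toy values in the usage note are hypothetical, and the SE is taken to be the standard error of the mean, which is an assumption about how the reported values were computed.

```python
import math


def conditional_mean_se(rho, h1, h2, threshold):
    """Mean and standard error of rho over genes with h1, h2 >= threshold.

    rho, h1, h2: per-gene estimates of genetic correlation and of the
    heritability in populations 1 and 2.
    Returns (mean, se, n_genes), or None when fewer than two genes pass.
    """
    kept = [r for r, a, b in zip(rho, h1, h2)
            if a >= threshold and b >= threshold]
    n = len(kept)
    if n < 2:
        return None
    mean = sum(kept) / n
    variance = sum((r - mean) ** 2 for r in kept) / (n - 1)
    return mean, math.sqrt(variance / n), n
```

With toy inputs `rho=[0.2, 0.8, 0.6]`, `h1=[0.05, 0.3, 0.25]`, `h2=[0.3, 0.3, 0.25]`, and a threshold of 0.2, only the last two genes pass, giving a mean of 0.7 with an SE of 0.1.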

Figure 3.

Genetic Correlation as a Function of Heritability for Gene Expression

The mean and SE of the genetic correlation of the set of genes with $h_1^2$ and $h_2^2$ exceeding threshold h in each analysis (y axis) are plotted against h (x axis).

In order to verify that our analysis did not contain any small-sample-size or conditioning biases, we analyzed the genetic correlation of simulated phenotypes over the GEUVADIS genotypes. We sampled pairs of heritabilities from the distribution of estimated expression heritability and simulated pairs of phenotypes to have the given heritability and a genetic-effect correlation of 0.0 over randomly chosen 4,000 bp regions from chromosome 1 of the GEUVADIS genotypes. Without conditioning, the average estimated genetic-effect correlation was −0.002 (0.003), indicating that the estimator remained unbiased. Furthermore, with conditioning on the heritability estimates above a certain threshold, the average estimated genetic-effect correlation was not significantly different from 0.0 (Figure S6).

We found that although the average genetic correlation was low, the genetic correlation increased with the cis-$h_c^2$ of the gene, indicating that as cis-genetic regulation of gene expression increases, it does so similarly in both YRI and EUR populations. This could help interpret the recent observation that although the global genetic correlation of gene expression across tissues is low,40 cis-eQTLs tend to replicate across tissues.41 Because the presence of a cis-eQTL indicates substantial cis-genetic regulation, an analysis of eQTL replication across tissues implicitly conditions on a high heritability of gene expression and therefore might indicate a much higher genetic correlation than the average.

Summary Statistics of RA and T2D

Finally, we sought to examine the transethnic ρgi and ρge in RA and T2D cohorts for which raw genotypes are not available. We obtained summary statistics from a RA GWAS of 58,284 individuals of European descent and 22,515 individuals of East Asian descent8 and from T2D GWASs of 69,033 individuals of European descent (DIAGRAM stage 142) and 18,817 individuals of East Asian descent.43 We used genotypes from 504 East Asian and 503 European individuals sequenced as part of the 1000 Genomes Project as population-specific external reference panels for our EAS and EUR summary statistics, respectively. We removed the major histocompatibility region (chromosome 6, 25–35 Mb) from the RA summary statistics. We estimated the common-SNP heritability and genetic correlation by using 2,539,629 strand-unambiguous SNPs genotyped or imputed in both RA studies and 1,054,079 strand-unambiguous SNPs genotyped or imputed in both T2D studies; all SNPs had an allele frequency above 5% in 1000 Genomes EUR and EAS populations. The $h_c^2$ and genetic-correlation estimates are presented in Table 2. Our RA $h_c^2$ estimates of 0.177 (0.015) and 0.221 (0.026) for EUR and EAS, respectively, are lower than a previously reported mixed-model-based heritability estimate of 0.32 (0.037) in Europeans.45 Similarly, our T2D $h_c^2$ estimates of 0.242 (0.013) and 0.105 (0.021) for EUR and EAS, respectively, are lower than a previously reported mixed-model-based estimate of 0.51 (0.065) in Europeans.45 We stress that this discrepancy is most likely due to the correction of genomic control in summary association data (such correction does not affect genetic-correlation estimates19) and the difference between common-SNP heritability $h_c^2$ and total narrow-sense heritability $h^2$. Furthermore, estimates of the heritability of T2D from family studies can vary significantly.46, 47

Table 2.

Heritability and Genetic Correlation of RA and T2D between EUR and EAS Populations

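| Trait | Statistic | $h^2_{\mathrm{EUR}}$ (liability) | $h^2_{\mathrm{EAS}}$ (liability) | ρge | ρgi |
|---|---|---|---|---|---|
| RA | Estimate (SE) | 0.18 (0.02) | 0.22 (0.03) | 0.46 (0.06) | 0.46 (0.06) |
| RA | 95% CI | [0.15, 0.21] | [0.16, 0.28] | [0.34, 0.58] | [0.34, 0.58] |
| RA | p > 0 | 3.90e−32 | 1.89e−17 | 1.37e−15 | 8.16e−16 |
| RA | p < 1 | 0.0 | 3.1e−197 | 2.53e−20 | 4.87e−22 |
| T2D | Estimate (SE) | 0.24 (0.01) | 0.11 (0.02) | 0.62 (0.09) | 0.61 (0.08) |
| T2D | 95% CI | [0.22, 0.26] | [0.07, 0.15] | [0.44, 0.80] | [0.45, 0.77] |
| T2D | p > 0 | 2.41e−77 | 5.73e−7 | 1.70e−12 | 2.85e−13 |
| T2D | p < 1 | 0.0 | 0.0 | 1.066e−5 | 2.06e−6 |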

EUR RA data contained 8,875 case and 29,367 control subjects for a study prevalence of 0.23. EAS RA data contained 4,873 case and 17,642 control subjects for a study prevalence of 0.22. RA prevalence was assumed to be 0.5% in both populations.8 T2D EUR data contained 12,171 case and 56,862 control subjects for a study prevalence of 0.18. T2D EAS data contained 6,952 case and 11,865 control subjects for a study prevalence of 0.37. T2D EUR prevalence was assumed to be 8%,42 whereas T2D EAS prevalence was assumed to be 9%.44 CI, confidence interval.

We found the genetic-effect correlation in RA and T2D to be 0.463 (0.058) and 0.621 (0.088), respectively, and the genetic-impact correlation was not significantly different at 0.455 (0.056) and 0.606 (0.083), respectively. The transethnic genetic-impact and genetic-effect correlations for both phenotypes were significantly different from both 1 and 0 (Table 2), showing that although the phenotypes have clear genetic overlap, the per-allele effect sizes differ significantly between the two populations.
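One common way to turn an estimate and its SE into a confidence interval and tests against 0 and 1 is a Wald-type calculation under a normal approximation, sketched below. This section does not state which test produced the values in Table 2 or whether it was one- or two-sided, so the one-sided form here is an assumption and the output is not expected to reproduce the table exactly. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def upper_tail(z):
    """P(Z > z) for a standard normal, via erfc for far-tail accuracy."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def wald_summary(estimate, se, level=0.95):
    """Normal-approximation CI plus one-sided tests against 0 and 1.

    Returns ((ci_low, ci_high), p_greater_than_0, p_less_than_1).
    """
    zcrit = NormalDist().inv_cdf(0.5 + level / 2.0)
    ci = (estimate - zcrit * se, estimate + zcrit * se)
    p_gt_0 = upper_tail(estimate / se)          # H0: value = 0, H1: value > 0
    p_lt_1 = upper_tail((1.0 - estimate) / se)  # H0: value = 1, H1: value < 1
    return ci, p_gt_0, p_lt_1


# Reported genetic-effect correlation for RA: 0.463 with SE 0.058.
ci, p_gt_0, p_lt_1 = wald_summary(0.463, 0.058)
print(f"95% CI = [{ci[0]:.3f}, {ci[1]:.3f}]")
```

For the reported RA values, the interval excludes both 0 and 1 by a wide margin, which is the sense in which the correlation is significantly different from both.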

Summary Statistics of Height and BMI

To further validate that our observations were not a statistical artifact, we used Popcorn to estimate the genetic correlation of one trait in one population across studies and compared it with those of GCTA and LDSC (LD Score). We obtained sex-stratified summary statistics of height and BMI from the GIANT consortium48 and used Popcorn and LDSC to estimate the genetic correlation of height and BMI. Values for GCTA were taken from Yang et al.49 Scores for Popcorn and LDSC were computed from variants with an allele frequency above 5% in 1000 Genomes European-descent individuals, and genetic correlation was computed with all strand-unambiguous variants with an allele frequency above 5% in HapMap 3 (these are supplied with the summary statistics). Popcorn’s sex-stratified genetic correlations of height and BMI were not significantly different from 1.0 or from those of LDSC or GCTA (Table S2).

Discussion

We have developed transethnic genetic-effect and genetic-impact correlations and provided a method for estimating these quantities on the basis of only summary-level GWAS information and suitable reference panels. We have applied our estimator to several phenotypes: RA, T2D, and gene expression. Although the GEUVADIS dataset lacks enough power for inferring the genetic correlation of single or small subsets of genes, we can make inferences about the global structure of genetic correlation of gene expression. We have found that the global mean genetic correlation is low but that it increases substantially when the heritability is high in both populations. In all phenotypes analyzed, the genetic correlation was significantly different from both 0 and 1. Our results show that global differences in SNP effect size of complex traits can be large. In contrast, effect sizes of gene expression appear to be more conserved where there is strong genetic regulation.

It is not possible to draw conclusions about polygenic selection from estimates of transethnic genetic correlation. The effect sizes can be identical (ρge=1) while polygenic selection acts to change only the allele frequencies. Similarly, the effect sizes can be different (ρge<1) without selection. Differences in effect sizes at common SNPs can result from many phenomena. We expect that untyped and unimputed variants that are differentially linked to observed SNPs, along with rare or population-specific variants differentially linked to observed SNPs, will contribute significantly. If a gene-gene or gene-environment interaction exists but only marginal effects are tested, the observed marginal effects could be different in each population as a result of differences in allele frequency, even if the interaction effect is the same in both populations, and this will result in decreased genetic correlation. Although within-locus (dominance) interactions might also play a role,50 the magnitude of this effect has been debated.51 Statistical noise could also be to blame, given that within-population meta-analyses of the same trait do not always show identical effects; however, we estimated the sex-stratified genetic correlation of height and BMI and found that neither was significantly different from 1, which agrees with previous results.49 We emphasize that we cannot differentiate between these effects on the basis of this analysis alone, and establishing how much each of these effects contributes to inter-population differences in effect size will require further research.

Estimates of the transethnic genetic correlation are important for several reasons. They might help inform best practices for transethnic meta-analysis and potentially offer improvements over current methods that use FST to cluster populations for analysis.4 Further, the transethnic genetic correlation constrains the limit of out-of-sample phenotype predictive power. If the maximum within-population correlation of predicted phenotype P to true phenotype Y is $\rho_{YP}^{\max} = \sqrt{h_1^2}$, then the maximum out-of-population correlation is $\rho_{YP}^{\max} = \rho_{ge}\sqrt{h_1^2}$ (Appendix A). Our observation that the genetic correlation is low for RA, T2D, and gene expression shows that out-of-population phenotypic predictive power is quite low. Similarly, it implies that assessing disease risk in non-Europeans on the basis of current GWAS results might be problematic; gaining insight into differences in genetic architecture and improving risk assessment will necessitate increased study of disease in many populations.
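The prediction bound can be illustrated with a short calculation. The sketch assumes the bound is the genetic-effect correlation times the square root of the heritability, and it plugs in the rounded RA estimates from Table 2 purely as an illustration; the function name is hypothetical.

```python
import math


def max_prediction_correlation(h2, rho_ge=1.0):
    """Upper bound on Cor(predicted, true phenotype): rho_ge * sqrt(h2).

    With rho_ge = 1 this is the within-population bound sqrt(h2).
    """
    return rho_ge * math.sqrt(h2)


within = max_prediction_correlation(0.18)
across = max_prediction_correlation(0.18, rho_ge=0.46)
print(f"within-population bound: {within:.3f}")
print(f"out-of-population bound: {across:.3f}")
```

With a heritability of 0.18 and a genetic-effect correlation of 0.46, the bound falls from roughly 0.42 within the population to roughly 0.20 across populations.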

Although the genetic correlation of multiple phenotypes in one population has a relatively straightforward definition, extending this to multiple populations motivates multiple possible extensions. In this work, we have provided estimators for the correlation of genetic effect and genetic impact, but other quantities related to the shared genetics of complex traits between populations include the correlation of explained variance, $\mathrm{Cor}(\sigma_1^2\beta_1^2, \sigma_2^2\beta_2^2)$, and the proportion of causal variants shared by the two populations. Interestingly, although our goal was to construct an estimator that determines the extent of genetic sharing independently of allele frequency, we have observed that the correlations of genetic effect and genetic impact are similar. Furthermore, our simulations show that under a random-effects model utilizing only SNPs with an allele frequency above 5% in both populations, the true genetic-effect and genetic-impact correlations are similar. We conclude that at variants common in both populations, differences in effect size and not allele frequency are driving the transethnic phenotypic differences in these traits.

Our approach to estimating genetic correlation has two major advantages over mixed-model-based approaches. First, utilizing summary statistics allows application of the method without the data-sharing and privacy concerns that come with raw genotypes. Second, our approach is linear in the number of SNPs and thus avoids the computational bottleneck of estimating the genetic relationship matrix. Conceptually, our approach is very similar to that taken by LD-score regression. Indeed, the diagonal of the LD-matrix product in one population is exactly the LD scores ($(\Sigma_1^2)_{ii} = l_i$). One could ignore our likelihood-based approach and define cross-population scores as $c_i = \sum_m r_{1im} r_{2im}$ in order to exploit the linear relationship $E[Z_{1i}Z_{2i}] = (\sqrt{N_1N_2}/M)\,\rho_{gi}\sqrt{h_1^2h_2^2}\,c_i$ (a similar approach can be taken for the genetic-effect correlation). Given that LD-score regression has been successfully used for computing the genetic correlation of two phenotypes in a single population, this derivation can be viewed as an extension of LD-score regression to one phenotype in two different populations. The main difference in our approach is choosing maximum likelihood rather than regression in order to fit the model. A comparison of our method to the LDSC software shows that they perform similarly as heritability estimators (Figure S7).
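The regression view described above can be illustrated with a small sketch. It computes single-population LD scores and cross-population scores $c_i$ from two toy LD matrices and evaluates the expected Z-score product for one SNP. The function names and the toy matrices are illustrative assumptions; this is not the Popcorn or LDSC implementation.

```python
import math

def ld_scores(r):
    """LD scores l_i = sum_m r[i][m]**2, the diagonal of r @ r for symmetric r."""
    return [sum(v * v for v in row) for row in r]

def cross_population_scores(r1, r2):
    """Cross-population scores c_i = sum_m r1[i][m] * r2[i][m]."""
    return [sum(a * b for a, b in zip(row1, row2)) for row1, row2 in zip(r1, r2)]

def expected_z_product(c_i, n1, n2, m, rho_gi, h1_sq, h2_sq):
    """E[Z_1i Z_2i] = (sqrt(N1 N2) / M) * rho_gi * sqrt(h1^2 h2^2) * c_i."""
    return math.sqrt(n1 * n2) / m * rho_gi * math.sqrt(h1_sq * h2_sq) * c_i

# Toy LD matrices for three SNPs in two populations (hypothetical values).
r1 = [[1.0, 0.5, 0.0], [0.5, 1.0, 0.2], [0.0, 0.2, 1.0]]
r2 = [[1.0, 0.1, 0.0], [0.1, 1.0, 0.4], [0.0, 0.4, 1.0]]
scores = cross_population_scores(r1, r2)
```

Regressing the observed products $Z_{1i}Z_{2i}$ on $c_i$ would then recover the slope, which is proportional to $\rho_{gi}$ once the two heritabilities are known.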

Of course, our method is not without drawbacks. First, it requires a large sample size and large number of SNPs to achieve SEs low enough for accurate estimation. Until recently, large-sample GWASs have been rare in non-European populations, although they are becoming more common. Similarly, the quality of reference panels could suffer in non-European populations, which could affect downstream analysis.52 Second, our method is limited to analyzing relatively common SNPs, both because having an accurate disease model is important for the analysis of rare variants and because estimates of effect size and correlation coefficients have a high SE at rare SNPs.17 Third, our analysis is currently limited to SNPs that are present in both populations. Indeed, it is currently unclear how best to handle population-specific variants in this framework. Fourth, our estimator of ρ is bounded between −1 and 1. This could induce bias when the true value is close to the boundary and the sample size is small. Fifth and finally, admixed populations induce very long-range LD that is not accounted for in our approach, and we are therefore limited to unadmixed populations.17

Our analysis leaves open several avenues for future work. We approximately maximize the likelihood of an M×M multivariate normal distribution via a method that uses only the diagonal elements of each block and discards covariance information between Z scores. A better approximation might lower the SE of the estimator, facilitating an analysis of the genetic correlation of functional categories, pathways, and genetic regions. We would also like to extend our analysis to include population-specific variants and variants with frequencies from 1% to 5% or lower than 1%. Our simulations indicate that having an accurate disease model is important for determining the difference between genetic-effect and genetic-impact correlations when rare variants are included. Maximum-likelihood approaches are well suited to different genetic architectures. For example, one could estimate both the global relationship between allele frequency and effect size and the global relationship between per-SNP $F_{ST}$ and genetic correlation by incorporating parameters α and γ into the prior distribution of the effect sizes:

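$$\beta_{1i}, \beta_{2i} \sim N\left(0, \begin{bmatrix} h_1^2\,\sigma_{1i}^{\alpha} & \rho_{ge}\sqrt{h_1^2h_2^2}\,F_{STi}^{\gamma} \\ \rho_{ge}\sqrt{h_1^2h_2^2}\,F_{STi}^{\gamma} & h_2^2\,\sigma_{1i}^{\alpha} \end{bmatrix}\right).$$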

We expect that incorporating these parameters will improve estimates of heritability and genetic correlation while revealing important biological insights.

Acknowledgments

The authors would like to acknowledge Lior Pachter, Hilary Finucane, and Yukinori Okada for insightful discussion about the problem. B.C.B. is supported by the National Science Foundation Graduate Research Fellowship Program. A.L.P. is supported by NIH grant R01 HG006399. N.Z. is supported by NIH grant K25HL121295.

Published: June 16, 2016

Footnotes

Supplemental Data include six figures and one table and can be found with this article online at http://dx.doi.org/10.1016/j.ajhg.2016.05.001.

Appendix A

Consider two GWASs of a phenotype conducted in different populations. Assume we have $N_1$ individuals genotyped or imputed to $M$ SNPs in study 1 and $N_2$ individuals genotyped or imputed to $M$ SNPs in study 2. Let $X_1$ and $X_2$ be the mean-centered genotype matrices and $Y_1$ and $Y_2$ the mean-centered phenotypes of the individuals in studies 1 and 2, respectively. Let $f_1$ and $f_2$ be the allele frequencies of the $M$ SNPs common to both populations. If we assume Hardy-Weinberg equilibrium, the allele variances are $\sigma_1^2 = 2f_1(1-f_1)$ and $\sigma_2^2 = 2f_2(1-f_2)$. Let $\beta_1$ and $\beta_2$ be the (unobserved) per-allele effect sizes for each SNP in studies 1 and 2, respectively. Define the genetic-impact correlation as $\rho_{gi} = \mathrm{Cor}(\sqrt{\sigma_1^2}\,\beta_1, \sqrt{\sigma_2^2}\,\beta_2)$ and the genetic-effect correlation as $\rho_{ge} = \mathrm{Cor}(\beta_1, \beta_2)$. We present a maximum-likelihood framework for estimating the heritability of the phenotype in study 1 and its SE, the heritability of the phenotype in study 2 and its SE, and the genetic-effect and genetic-impact correlations of the phenotype between the studies and their SEs given only the summary statistics $Z_1$ and $Z_2$ and reference genotypes $G_1$ and $G_2$ representing the populations in the studies. We assume that genotypes are drawn randomly from populations with expected correlation matrices $\Sigma_1$ and $\Sigma_2$ and that every SNP is causal with a normally distributed effect size (although this assumption is not necessary in practice; see Figure S1).
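To make the two definitions concrete, the following sketch computes both correlations for a handful of SNPs when the true per-allele effects are known. The effect sizes and allele frequencies are made-up illustrative values, and the helper names are assumptions, not part of Popcorn.

```python
import math

def allele_variance(f):
    """Allele variance under Hardy-Weinberg equilibrium: 2 f (1 - f)."""
    return 2.0 * f * (1.0 - f)

def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def genetic_effect_correlation(beta1, beta2):
    """rho_ge: correlation of per-allele effect sizes."""
    return pearson(beta1, beta2)

def genetic_impact_correlation(beta1, beta2, f1, f2):
    """rho_gi: correlation of effects scaled by the allele standard deviation."""
    s1 = [math.sqrt(allele_variance(f)) * b for f, b in zip(f1, beta1)]
    s2 = [math.sqrt(allele_variance(f)) * b for f, b in zip(f2, beta2)]
    return pearson(s1, s2)

# Identical per-allele effects but different allele frequencies (hypothetical).
beta = [0.1, -0.2, 0.3, 0.05]
rho_ge = genetic_effect_correlation(beta, beta)
rho_gi = genetic_impact_correlation(beta, beta, [0.5, 0.5, 0.5, 0.5], [0.1, 0.4, 0.2, 0.5])
```

In this toy case the per-allele effects are identical, so the genetic-effect correlation is 1, while differing allele frequencies pull the genetic-impact correlation slightly below 1.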

Genetic-Impact Correlation

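Let $X_1' = X_1/\sqrt{\sigma_1^2}$ (and similarly for study 2) be normalized genotype matrices. We consider the standard linear model for the generation of phenotypes, where $Y_1 = X_1'\beta_1 + \epsilon_1$ and $Y_2 = X_2'\beta_2 + \epsilon_2$.

For convenience of notation, let $h_{ix} = \rho_{gi}\sqrt{h_1^2h_2^2}$. We assume that the SNP effects follow the infinitesimal model, where every SNP has an effect size drawn from the normal distribution, and that the residuals are independent for each individual and normally distributed:

$$\begin{pmatrix}\beta_1 \\ \beta_2\end{pmatrix} \sim N\left(\begin{bmatrix}0 \\ 0\end{bmatrix}, \frac{1}{M}\begin{bmatrix}h_1^2 I_M & h_{ix} I_M \\ h_{ix} I_M & h_2^2 I_M\end{bmatrix}\right) \quad \text{(Equation A1)}$$

$$\begin{pmatrix}\epsilon_1 \\ \epsilon_2\end{pmatrix} \sim N\left(\begin{bmatrix}0 \\ 0\end{bmatrix}, \begin{bmatrix}(1-h_1^2) I_{N_1} & 0 \\ 0 & (1-h_2^2) I_{N_2}\end{bmatrix}\right), \quad \text{(Equation A2)}$$

where $h_1^2$ and $h_2^2$ are the heritabilities of the disease in studies 1 and 2, respectively, and $\rho_{gi}$ is the genetic-impact correlation.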
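As a minimal sketch of the effect-size prior in Equation A1, the code below draws correlated effect pairs for $M$ SNPs using only the Python standard library. The function names, the seed, and the parameter values are illustrative assumptions.

```python
import math
import random

def simulate_effects(m, h1_sq, h2_sq, rho_gi, seed=0):
    """Draw (beta_1m, beta_2m) with variances h1^2/M and h2^2/M and
    covariance rho_gi * sqrt(h1^2 h2^2) / M, as in Equation A1."""
    rng = random.Random(seed)
    sd1 = math.sqrt(h1_sq / m)
    sd2 = math.sqrt(h2_sq / m)
    beta1, beta2 = [], []
    for _ in range(m):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        beta1.append(sd1 * z1)
        # Mix z1 and an independent z2 so that Cor(beta1, beta2) = rho_gi.
        beta2.append(sd2 * (rho_gi * z1 + math.sqrt(1.0 - rho_gi ** 2) * z2))
    return beta1, beta2

def sample_correlation(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

beta1, beta2 = simulate_effects(m=20000, h1_sq=0.5, h2_sq=0.3, rho_gi=0.6, seed=1)
```

With many SNPs, the sample correlation of the two effect vectors should be close to the chosen $\rho_{gi}$, and the sum of squared effects in each study should be close to the corresponding heritability.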

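Using the above model, we compute the distribution of the observed Z scores as a function of the reference-panel correlations and the model parameters ($h_1^2$, $h_2^2$, and $\rho_{gi}$). Given a distribution for $Z$ and an observation of $Z$, we can then choose the parameters that give the highest probability of observing $Z$. First, we compute the distribution of $Z$. It is well known that the Z scores of a linear regression are normally distributed given $\beta$ when the sample size is large enough. Because $P(Z, \beta) = P(Z \mid \beta)\,P(\beta)$ and the product of normal distributions is normal, $Z$ is marginally normal, and we need to compute only the unconditional mean and variance of $Z$ to know its distribution. Specifically, let $Z = [Z_1^T, Z_2^T]^T$. Then, its mean is

$$E[Z] = E\begin{bmatrix}\dfrac{X_1^{\prime T}Y_1}{\sqrt{N_1}} \\[6pt] \dfrac{X_2^{\prime T}Y_2}{\sqrt{N_2}}\end{bmatrix} = \begin{bmatrix}\dfrac{1}{\sqrt{N_1}}\left(E[X_1^{\prime T}X_1']\,E[\beta_1] + E[X_1^{\prime T}]\,E[\epsilon_1]\right) \\[6pt] \dfrac{1}{\sqrt{N_2}}\left(E[X_2^{\prime T}X_2']\,E[\beta_2] + E[X_2^{\prime T}]\,E[\epsilon_2]\right)\end{bmatrix} = 0.$$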

The within-population variance is

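$$\begin{aligned}
\mathrm{Cov}[Z_{1i}, Z_{1j}] &= E[Z_{1i}Z_{1j}] = E_{X,\beta,\epsilon}\big[E[Z_{1i}Z_{1j} \mid X, \beta, \epsilon]\big] \\
&= \frac{1}{N_1}E_{X,\beta,\epsilon}\big[X_{1i}^{\prime T}(X_1'\beta_1 + \epsilon_1)(X_1'\beta_1 + \epsilon_1)^T X_{1j}'\big] \\
&= \frac{1}{N_1}E_{X,\beta}\big[X_{1i}^{\prime T}X_1'\beta_1\beta_1^T X_1^{\prime T}X_{1j}'\big] + \frac{1}{N_1}E_{X,\epsilon}\big[X_{1i}^{\prime T}\epsilon_1\epsilon_1^T X_{1j}'\big] \\
&= \frac{h_1^2}{MN_1}E_X\big[X_{1i}^{\prime T}X_1'X_1^{\prime T}X_{1j}'\big] + \frac{1-h_1^2}{N_1}E_X\big[X_{1i}^{\prime T}X_{1j}'\big] \\
&= \frac{h_1^2}{MN_1}\Big(N_1 M r_{1ij} + N_1\sum_{m=1}^{M} r_{1im}r_{1jm} + N_1^2\sum_{m=1}^{M} r_{1im}r_{1jm}\Big) + \frac{1-h_1^2}{N_1}\,N_1 r_{1ij} \\
&= r_{1ij} + \frac{N_1+1}{M}\,h_1^2\,\Sigma_1^{(i)}\Sigma_{1(j)},
\end{aligned}$$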

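where $r_{pij} = \Sigma_{pij}$ is the correlation coefficient of SNPs $i$ and $j$ in population $p$. Similarly, the between-population covariance is

$$\begin{aligned}
\mathrm{Cov}[Z_{1i}, Z_{2j}] &= \frac{1}{\sqrt{N_1N_2}}E_{X,\beta}\big[X_{1i}^{\prime T}X_1'\beta_1\beta_2^T X_2^{\prime T}X_{2j}'\big] + \frac{1}{\sqrt{N_1N_2}}E_{X,\epsilon}\big[X_{1i}^{\prime T}\epsilon_1\epsilon_2^T X_{2j}'\big] \\
&= \frac{h_{ix}}{M\sqrt{N_1N_2}}E_X\big[X_{1i}^{\prime T}X_1'X_2^{\prime T}X_{2j}'\big] \\
&= \frac{h_{ix}}{M\sqrt{N_1N_2}}\Big(N_1N_2\sum_{m=1}^{M} r_{1im}r_{2jm}\Big) \\
&= \frac{\sqrt{N_1N_2}}{M}\,h_{ix}\,\Sigma_1^{(i)}\Sigma_{2(j)},
\end{aligned}$$

where $\Sigma^{(i)}$ denotes the $i$th row of $\Sigma$, and $\Sigma_{(j)}$ denotes the $j$th column. The covariance of the Z scores is thus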

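$$C = \mathrm{Var}(Z) = \begin{bmatrix}\Sigma_1 + \dfrac{N_1+1}{M}h_1^2\Sigma_1^2 & h_{ix}\dfrac{\sqrt{N_1N_2}}{M}\Sigma_1\Sigma_2 \\[6pt] h_{ix}\dfrac{\sqrt{N_1N_2}}{M}\Sigma_2\Sigma_1 & \Sigma_2 + \dfrac{N_2+1}{M}h_2^2\Sigma_2^2\end{bmatrix} \quad \text{(Equation A3)}$$

and $Z \sim N(0, C)$.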
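The block structure of Equation A3 can be written out directly for small LD matrices. The sketch below is a dense, pure-Python illustration meant only for toy sizes; the function names are assumptions, and a real analysis would never form the full $2M \times 2M$ matrix.

```python
import math

def matmul(a, b):
    """Dense matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def z_score_covariance(sigma1, sigma2, n1, n2, h1_sq, h2_sq, rho_gi):
    """Build the 2M x 2M covariance of Z = [Z1, Z2] from Equation A3."""
    m = len(sigma1)
    h_ix = rho_gi * math.sqrt(h1_sq * h2_sq)
    s11, s22 = matmul(sigma1, sigma1), matmul(sigma2, sigma2)
    s12, s21 = matmul(sigma1, sigma2), matmul(sigma2, sigma1)
    k1 = (n1 + 1) / m * h1_sq              # scale of the Sigma_1^2 term
    k2 = (n2 + 1) / m * h2_sq              # scale of the Sigma_2^2 term
    kx = h_ix * math.sqrt(n1 * n2) / m     # scale of the cross-population blocks
    c = [[0.0] * (2 * m) for _ in range(2 * m)]
    for i in range(m):
        for j in range(m):
            c[i][j] = sigma1[i][j] + k1 * s11[i][j]
            c[i][m + j] = kx * s12[i][j]
            c[m + i][j] = kx * s21[i][j]
            c[m + i][m + j] = sigma2[i][j] + k2 * s22[i][j]
    return c

# Two SNPs with hypothetical LD in each population.
sigma1 = [[1.0, 0.5], [0.5, 1.0]]
sigma2 = [[1.0, 0.2], [0.2, 1.0]]
C = z_score_covariance(sigma1, sigma2, n1=1000, n2=2000, h1_sq=0.5, h2_sq=0.4, rho_gi=0.8)
```

Because $(\Sigma_1\Sigma_2)^T = \Sigma_2\Sigma_1$ for symmetric LD matrices, the resulting matrix is symmetric, as a covariance matrix must be.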

Genetic-Effect Correlation

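Let $h_{ex} = \rho_{ge}\sqrt{h_1^2h_2^2}$. We modify the procedure above to use mean-centered instead of normalized genotype matrices and model the distribution of the effect sizes as

$$\begin{pmatrix}\beta_1 \\ \beta_2\end{pmatrix} \sim N\left(\begin{bmatrix}0 \\ 0\end{bmatrix}, \begin{bmatrix}\dfrac{h_1^2}{\|\sigma_1^2\|_1} I_M & \dfrac{h_{ex}}{\sqrt{\|\sigma_1^2\|_1\|\sigma_2^2\|_1}} I_M \\[8pt] \dfrac{h_{ex}}{\sqrt{\|\sigma_1^2\|_1\|\sigma_2^2\|_1}} I_M & \dfrac{h_2^2}{\|\sigma_2^2\|_1} I_M\end{bmatrix}\right), \quad \text{(Equation A4)}$$

where $\|\sigma_p^2\|_1$ is the sum of the allele variances in population $p$. Notice that a linear model with effect sizes acting on unnormalized genotypes is the same as a linear model with effect sizes acting on normalized genotypes under the substitution $\beta_{1,2} \to \sqrt{\sigma_{1,2}^2}\,\beta_{1,2}$. Therefore, the covariance of Z scores on the per-allele scale can be immediately inferred from the prior derivation:

$$C = \mathrm{Var}(Z) = \begin{bmatrix}\Sigma_1 + \dfrac{N_1+1}{\|\sigma_1^2\|_1}h_1^2\,\Sigma_1\sigma_1^2\Sigma_1 & h_{ex}\dfrac{\sqrt{N_1N_2}}{\sqrt{\|\sigma_1^2\|_1\|\sigma_2^2\|_1}}\,\Sigma_1\sqrt{\sigma_1^2\sigma_2^2}\,\Sigma_2 \\[8pt] h_{ex}\dfrac{\sqrt{N_1N_2}}{\sqrt{\|\sigma_1^2\|_1\|\sigma_2^2\|_1}}\,\Sigma_2\sqrt{\sigma_2^2\sigma_1^2}\,\Sigma_1 & \Sigma_2 + \dfrac{N_2+1}{\|\sigma_2^2\|_1}h_2^2\,\Sigma_2\sigma_2^2\Sigma_2\end{bmatrix}.$$

Inside these matrix products, $\sigma_p^2$ acts as the diagonal matrix of allele variances.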

Approximate Maximum-Likelihood Estimation

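Let $C = \begin{bmatrix}C_{11} & C_{12} \\ C_{21} & C_{22}\end{bmatrix}$ be either of the above covariance matrices written in block form. We approximately optimize the above likelihood as follows: first, we find $h_1^2$ and $h_2^2$ by maximizing the likelihood corresponding to $C_{11}$ and $C_{22}$, and then we find $\rho_{gi}$ or $\rho_{ge}$ by maximizing the likelihood corresponding to $C_{12}$:

$$\begin{aligned}
l(h_1^2 \mid Z_1, \Sigma, \sigma) &\propto -\sum_{i=1}^{M} w_{11i}\left(\ln(C_{11ii}) + \frac{Z_{1i}^2}{C_{11ii}}\right) \\
l(h_2^2 \mid Z_2, \Sigma, \sigma) &\propto -\sum_{i=1}^{M} w_{22i}\left(\ln(C_{22ii}) + \frac{Z_{2i}^2}{C_{22ii}}\right) \\
l(\rho_{g\{i,e\}} \mid Z, \hat{h}_1^2, \hat{h}_2^2, \Sigma, \sigma) &\propto -\sum_{i=1}^{M} w_{12i}\left(\ln(C_{12ii}) + \frac{Z_{1i}Z_{2i}}{C_{12ii}}\right).
\end{aligned}$$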

Because we are discarding between-SNP covariance information ($\mathrm{Cov}(Z_{1i}, Z_{1j})$), highly correlated SNPs will be over-counted in our approximate likelihood. As a simple example, notice that two SNPs in perfect LD will each contribute identical terms to the approximate likelihood and therefore should be downweighted by a factor of 1/2. The extent to which SNP $i$ is over-counted is exactly the $i$th diagonal entry of its corresponding LD-matrix product. Therefore, we let $w_{jki}^{gi} = 1/(\Sigma_j\Sigma_k)_{ii}$ and $w_{jki}^{ge} = 1/(\Sigma_j\sqrt{\sigma_j^2\sigma_k^2}\,\Sigma_k)_{ii}$ to reduce the variance in our estimates of the parameters $h_1^2$, $h_2^2$, $\rho_{gi}$, and $\rho_{ge}$.
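The first step of the approximate maximum-likelihood procedure (estimating one heritability from the diagonal of $C_{11}$ under the genetic-impact model) can be sketched as follows. This is a toy illustration that substitutes a simple grid search for a numerical optimizer; the function names and inputs are assumptions and do not reflect how Popcorn performs the optimization.

```python
import math

def approx_log_likelihood(h_sq, z, ld, n, m):
    """Weighted diagonal log-likelihood (up to constants) for one study.
    C_ii = 1 + (N + 1) / M * h^2 * l_i is the diagonal of the C_11 block of
    Equation A3, and each SNP is weighted by 1 / l_i."""
    total = 0.0
    for z_i, l_i in zip(z, ld):
        c_ii = 1.0 + (n + 1) / m * h_sq * l_i
        total -= (1.0 / l_i) * (math.log(c_ii) + z_i * z_i / c_ii)
    return total

def estimate_heritability(z, ld, n, m, steps=1000):
    """Grid search over h^2 in [0, 1]; a stand-in for a proper optimizer."""
    grid = [k / steps for k in range(steps + 1)]
    return max(grid, key=lambda h: approx_log_likelihood(h, z, ld, n, m))

# Noise-free check: set Z_i^2 to its expected value under h^2 = 0.3.
ld = [1.5, 4.0, 10.0, 2.5]          # hypothetical LD scores
n, m, h_true = 5000, 4, 0.3
z = [math.sqrt(1.0 + (n + 1) / m * h_true * l_i) for l_i in ld]
h_hat = estimate_heritability(z, ld, n, m)
```

Each term $\ln C + Z^2/C$ is minimized when $C = Z^2$, so with noise-free inputs the search recovers the heritability used to generate them. The correlation step would proceed analogously using the diagonal of $C_{12}$ and the weights $w_{12i}$.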

Furthermore, rather than compute the full products $\Sigma_1^2$, $\Sigma_2^2$, and $\Sigma_1\Sigma_2$ over all $M$ SNPs in the genome, we choose a window size $W$ and approximate the product by $(\Sigma_a\Sigma_b)_{ii} = \sum_{w=i-W}^{i+W} r_{aiw}r_{biw}$. Although maximum-likelihood estimation admits a straightforward estimate of the SE via the Fisher information, we found these estimates to be inaccurate in practice. Instead, we use a block jackknife with a block size equal to $\min(100, M/200)$ SNPs to ensure that blocks are large enough for the removal of residual correlations.
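Both ideas in this paragraph, the windowed LD-matrix product and the block jackknife, can be sketched in a few lines. The code below assumes dense toy LD matrices and a generic estimator function; these names and the delete-one-block formula shown are illustrative assumptions rather than the package's internals.

```python
import math

def windowed_ld_product_diag(r_a, r_b, window):
    """Approximate (Sigma_a Sigma_b)_ii by summing r_a[i][w] * r_b[i][w]
    over SNPs w within `window` positions of SNP i."""
    m = len(r_a)
    out = []
    for i in range(m):
        lo, hi = max(0, i - window), min(m - 1, i + window)
        out.append(sum(r_a[i][w] * r_b[i][w] for w in range(lo, hi + 1)))
    return out

def block_jackknife_se(values, estimator, block_size):
    """Delete-one-block jackknife standard error of estimator(values)."""
    blocks = [values[s:s + block_size] for s in range(0, len(values), block_size)]
    b = len(blocks)
    leave_one_out = []
    for skip in range(b):
        kept = [v for k, blk in enumerate(blocks) if k != skip for v in blk]
        leave_one_out.append(estimator(kept))
    mean_est = sum(leave_one_out) / b
    var = (b - 1) / b * sum((e - mean_est) ** 2 for e in leave_one_out)
    return math.sqrt(var)

def mean(xs):
    return sum(xs) / len(xs)

r = [[1.0, 0.5, 0.1], [0.5, 1.0, 0.3], [0.1, 0.3, 1.0]]  # hypothetical LD matrix
narrow = windowed_ld_product_diag(r, r, window=1)
full = windowed_ld_product_diag(r, r, window=5)
se = block_jackknife_se([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], mean, block_size=1)
```

A window at least as wide as the matrix reproduces the exact diagonal, and for the sample mean with blocks of size one the jackknife reduces to the familiar standard error $s/\sqrt{n}$.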

Out-of-Population Prediction of Phenotypic Values

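Consider using the results of a GWAS with perfect power in population 2 to predict the phenotypic values of a set of individuals from population 1. This defines the upper limit of the correlation of true and predicted phenotypic values. Let the true values of the effect sizes in population 2 be $\beta_2$. Let the true phenotypes in population 1 be $Y = X_1\beta_1 + \epsilon_1$ and the predicted phenotypes be $P = X_1\beta_2$. We are interested in the correlation of the predicted and true phenotypes, $\rho_{YP}^{MAX} = \mathrm{Cor}(Y, P)$. Notice that given $X$, the true and predicted phenotypes of each individual are an affine transformation of a multivariate normal random variable ($\beta$):

$$\begin{bmatrix}Y_i \\ P_i\end{bmatrix} = \begin{bmatrix}X^{(i)} & 0_M \\ 0_M & X^{(i)}\end{bmatrix}\begin{bmatrix}\beta_1 \\ \beta_2\end{bmatrix} + \begin{bmatrix}\epsilon_i \\ 0\end{bmatrix}.$$

Therefore, $(Y_i, P_i)$ for individual $i$ is multivariate normal with the expected covariance matrix

$$\begin{aligned}
E_X[\mathrm{Cov}(Y_i, P_i)] &= E_X\left(\begin{bmatrix}X^{(i)} & 0_M \\ 0_M & X^{(i)}\end{bmatrix}\begin{bmatrix}\dfrac{1}{\|\sigma_1^2\|_1} I_M & \dfrac{h_{ex}}{\sqrt{\|\sigma_1^2\|_1\|\sigma_2^2\|_1}} I_M \\[8pt] \dfrac{h_{ex}}{\sqrt{\|\sigma_1^2\|_1\|\sigma_2^2\|_1}} I_M & \dfrac{h_2^2}{\|\sigma_2^2\|_1} I_M\end{bmatrix}\begin{bmatrix}X^{(i)} & 0_M \\ 0_M & X^{(i)}\end{bmatrix}^T\right) \\
&= E_X\begin{bmatrix}\dfrac{\sum_m X_{im}^2}{\|\sigma_1^2\|_1} & \dfrac{h_{ex}\sum_m X_{im}^2}{\sqrt{\|\sigma_1^2\|_1\|\sigma_2^2\|_1}} \\[8pt] \dfrac{h_{ex}\sum_m X_{im}^2}{\sqrt{\|\sigma_1^2\|_1\|\sigma_2^2\|_1}} & \dfrac{h_2^2\sum_m X_{im}^2}{\|\sigma_2^2\|_1}\end{bmatrix} \\
&= \begin{bmatrix}1 & h_{ex}\sqrt{\dfrac{\|\sigma_1^2\|_1}{\|\sigma_2^2\|_1}} \\[8pt] h_{ex}\sqrt{\dfrac{\|\sigma_1^2\|_1}{\|\sigma_2^2\|_1}} & h_2^2\dfrac{\|\sigma_1^2\|_1}{\|\sigma_2^2\|_1}\end{bmatrix},
\end{aligned}$$

where the last step uses $E_X[\sum_m X_{im}^2] = \|\sigma_1^2\|_1$, the sum of the allele variances in population 1, because individual $i$ is drawn from population 1.

Therefore, the expected correlation $E[\mathrm{Cor}(Y_i, P_i)]$ is

$$\frac{h_{ex}}{\sqrt{h_2^2}}\sqrt{\frac{\|\sigma_1^2\|_1}{\|\sigma_2^2\|_1}}\sqrt{\frac{\|\sigma_2^2\|_1}{\|\sigma_1^2\|_1}} = \rho_{ge}\sqrt{h_1^2}.$$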

The sample correlation tends to the expected population correlation as the number of samples increases; therefore,

$$\rho_{YP}^{MAX} = \mathrm{Cor}(Y, P) \to \rho_{ge}\sqrt{h_1^2} \quad \text{(Equation A5)}$$

as $N \to \infty$.
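The cancellation of the allele-frequency terms in this derivation can be checked numerically. The sketch below evaluates the entries of the expected covariance matrix for arbitrary allele frequencies and returns the implied correlation. It assumes, as in the genetic-effect model, that per-allele effect variances are scaled by the sum of allele variances in each population; all names and input values are hypothetical.

```python
import math

def allele_variance_sum(freqs):
    """Sum over SNPs of the Hardy-Weinberg allele variances 2 f (1 - f)."""
    return sum(2.0 * f * (1.0 - f) for f in freqs)

def expected_prediction_correlation(f1, f2, h1_sq, h2_sq, rho_ge):
    """Expected Cor(Y_i, P_i) when population-2 effects predict population-1
    phenotypes, built from the entries of the expected covariance matrix."""
    norm1 = allele_variance_sum(f1)
    norm2 = allele_variance_sum(f2)
    h_ex = rho_ge * math.sqrt(h1_sq * h2_sq)
    # E_X[sum_m X_im^2] for an individual sampled from population 1.
    expected_sq_genotype = norm1
    var_y = expected_sq_genotype / norm1
    cov_yp = h_ex * expected_sq_genotype / math.sqrt(norm1 * norm2)
    var_p = h2_sq * expected_sq_genotype / norm2
    return cov_yp / math.sqrt(var_y * var_p)

# Hypothetical allele frequencies and parameters.
cor_a = expected_prediction_correlation([0.1, 0.3, 0.5], [0.4, 0.2, 0.25], 0.49, 0.25, 0.8)
cor_b = expected_prediction_correlation([0.45, 0.05, 0.2], [0.1, 0.1, 0.5], 0.49, 0.25, 0.8)
```

Both calls give the same value, $\rho_{ge}\sqrt{h_1^2} = 0.8 \times 0.7$, regardless of the allele frequencies supplied, which is the content of Equation A5.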

Web Resources

Supplemental Data

Document S1. Figures S1–S6 and Table S1
mmc1.pdf (927.6KB, pdf)
Document S2. Article plus Supplemental Data
mmc2.pdf (1.6MB, pdf)

References

  • 1. Robinson M.R., Hemani G., Medina-Gomez C., Mezzavilla M., Esko T., Shakhbazov K., Powell J.E., Vinkhuyzen A., Berndt S.I., Gustafsson S. Population genetic differentiation of height and body mass index across Europe. Nat. Genet. 2015;47:1357–1362. doi: 10.1038/ng.3401.
  • 2. Burt V.L., Whelton P., Roccella E.J., Brown C., Cutler J.A., Higgins M., Horan M.J., Labarthe D. Prevalence of hypertension in the US adult population. Results from the Third National Health and Nutrition Examination Survey, 1988-1991. Hypertension. 1995;25:305–313. doi: 10.1161/01.hyp.25.3.305.
  • 3. Coram M.A., Candille S.I., Duan Q., Chan K.H.K., Li Y., Kooperberg C., Reiner A.P., Tang H. Leveraging Multi-ethnic Evidence for Mapping Complex Traits in Minority Populations: An Empirical Bayes Approach. Am. J. Hum. Genet. 2015;96:740–752. doi: 10.1016/j.ajhg.2015.03.008.
  • 4. Morris A.P. Transethnic meta-analysis of genomewide association studies. Genet. Epidemiol. 2011;35:809–822. doi: 10.1002/gepi.20630.
  • 5. Bustamante C.D., Burchard E.G., De la Vega F.M. Genomics for the world. Nature. 2011;475:163–165. doi: 10.1038/475163a.
  • 6. Oh S.S., Galanter J., Thakur N., Pino-Yanes M., Barcelo N.E., White M.J., de Bruin D.M., Greenblatt R.M., Bibbins-Domingo K., Wu A.H.B. Diversity in Clinical and Biomedical Research: A Promise Yet to Be Fulfilled. PLoS Med. 2015;12:e1001918. doi: 10.1371/journal.pmed.1001918.
  • 7. Coronary Artery Disease (C4D) Genetics Consortium A genome-wide association study in Europeans and South Asians identifies five new loci for coronary artery disease. Nat. Genet. 2011;43:339–344. doi: 10.1038/ng.782.
  • 8. Okada Y., Wu D., Trynka G., Raj T., Terao C., Ikari K., Kochi Y., Ohmura K., Suzuki A., Yoshida S., RACI consortium. GARNET consortium Genetics of rheumatoid arthritis contributes to biology and drug discovery. Nature. 2014;506:376–381. doi: 10.1038/nature12873.
  • 9. de Candia T.R., Lee S.H., Yang J., Browning B.L., Gejman P.V., Levinson D.F., Mowry B.J., Hewitt J.K., Goddard M.E., O’Donovan M.C., International Schizophrenia Consortium. Molecular Genetics of Schizophrenia Collaboration Additive genetic variation in schizophrenia risk is shared by populations of African and European descent. Am. J. Hum. Genet. 2013;93:463–470. doi: 10.1016/j.ajhg.2013.07.007.
  • 10. Lee S., Teslovich T.M., Boehnke M., Lin X. General framework for meta-analysis of rare variants in sequencing association studies. Am. J. Hum. Genet. 2013;93:42–53. doi: 10.1016/j.ajhg.2013.05.010.
  • 11. Palla L., Dudbridge F. A Fast Method that Uses Polygenic Scores to Estimate the Variance Explained by Genome-wide Marker Panels and the Proportion of Variants Affecting a Trait. Am. J. Hum. Genet. 2015;97:250–259. doi: 10.1016/j.ajhg.2015.06.005.
  • 12. Yang J., Ferreira T., Morris A.P., Medland S.E., Madden P.A.F., Heath A.C., Martin N.G., Montgomery G.W., Weedon M.N., Loos R.J., Genetic Investigation of ANthropometric Traits (GIANT) Consortium. DIAbetes Genetics Replication And Meta-analysis (DIAGRAM) Consortium Conditional and joint multiple-SNP analysis of GWAS summary statistics identifies additional variants influencing complex traits. Nat. Genet. 2012;44:369–375, S1–S3. doi: 10.1038/ng.2213.
  • 13. Pasaniuc B., Zaitlen N., Shi H., Bhatia G., Gusev A., Pickrell J., Hirschhorn J., Strachan D.P., Patterson N., Price A.L. Fast and accurate imputation of summary statistics enhances evidence of functional enrichment. Bioinformatics. 2014;30:2906–2914. doi: 10.1093/bioinformatics/btu416.
  • 14. Hormozdiari F., Kostem E., Kang E.Y., Pasaniuc B., Eskin E. Identifying causal variants at loci with multiple signals of association. Genetics. 2014;198:497–508. doi: 10.1534/genetics.114.167908.
  • 15. Hormozdiari F., Kichaev G., Yang W.-Y., Pasaniuc B., Eskin E. Identification of causal genes for complex traits. Bioinformatics. 2015;31:i206–i213. doi: 10.1093/bioinformatics/btv240.
  • 16. Kichaev G., Yang W.-Y., Lindstrom S., Hormozdiari F., Eskin E., Price A.L., Kraft P., Pasaniuc B. Integrating functional data to prioritize causal variants in statistical fine-mapping studies. PLoS Genet. 2014;10:e1004722. doi: 10.1371/journal.pgen.1004722.
  • 17. Bulik-Sullivan B.K., Loh P.-R., Finucane H.K., Ripke S., Yang J., Patterson N., Daly M.J., Price A.L., Neale B.M., Schizophrenia Working Group of the Psychiatric Genomics Consortium LD Score regression distinguishes confounding from polygenicity in genome-wide association studies. Nat. Genet. 2015;47:291–295. doi: 10.1038/ng.3211.
  • 18. Finucane H.K., Bulik-Sullivan B., Gusev A., Trynka G., Reshef Y., Loh P.-R., Anttila V., Xu H., Zang C., Farh K., ReproGen Consortium. Schizophrenia Working Group of the Psychiatric Genomics Consortium. RACI Consortium Partitioning heritability by functional annotation using genome-wide association summary statistics. Nat. Genet. 2015;47:1228–1235. doi: 10.1038/ng.3404.
  • 19. Bulik-Sullivan B., Finucane H.K., Anttila V., Gusev A., Day F.R., Loh P.-R., Duncan L., Perry J.R., Patterson N., Robinson E.B., ReproGen Consortium. Psychiatric Genomics Consortium. Genetic Consortium for Anorexia Nervosa of the Wellcome Trust Case Control Consortium 3 An atlas of genetic correlations across human diseases and traits. Nat. Genet. 2015;47:1236–1241. doi: 10.1038/ng.3406.
  • 20. Park D.S., Brown B., Eng C., Huntsman S., Hu D., Torgerson D.G., Burchard E.G., Zaitlen N. Adapt-Mix: learning local genetic correlation structure improves summary statistics-based analyses. Bioinformatics. 2015;31:i181–i189. doi: 10.1093/bioinformatics/btv230.
  • 21. Xu Z., Duan Q., Yan S., Chen W., Li M., Lange E., Li Y. DISSCO: direct imputation of summary statistics allowing covariates. Bioinformatics. 2015;31:2434–2442. doi: 10.1093/bioinformatics/btv168.
  • 22. Purcell S.M., Wray N.R., Stone J.L., Visscher P.M., O’Donovan M.C., Sullivan P.F., Sklar P., International Schizophrenia Consortium Common polygenic variation contributes to risk of schizophrenia and bipolar disorder. Nature. 2009;460:748–752. doi: 10.1038/nature08185.
  • 23. Zuo L., Zhang C.K., Wang F., Li C.-S.R., Zhao H., Lu L., Zhang X.-Y., Lu L., Zhang H., Zhang F. A novel, functional and replicable risk gene region for alcohol dependence identified by genome-wide association study. PLoS ONE. 2011;6:e26726. doi: 10.1371/journal.pone.0026726.
  • 24. Fesinmeyer M.D., North K.E., Ritchie M.D., Lim U., Franceschini N., Wilkens L.R., Gross M.D., Bůžková P., Glenn K., Quibrera P.M. Genetic risk factors for BMI and obesity in an ethnically diverse population: results from the population architecture using genomics and epidemiology (PAGE) study. Obesity (Silver Spring) 2013;21:835–846. doi: 10.1002/oby.20268.
  • 25. Chang M.H., Ned R.M., Hong Y., Yesupriya A., Yang Q., Liu T., Janssens A.C.J.W., Dowling N.F. Racial/ethnic variation in the association of lipid-related genetic variants with blood lipids in the US adult population. Circ Cardiovasc Genet. 2011;4:523–533. doi: 10.1161/CIRCGENETICS.111.959577.
  • 26. Waters K.M., Stram D.O., Hassanein M.T., Le Marchand L., Wilkens L.R., Maskarinec G., Monroe K.R., Kolonel L.N., Altshuler D., Henderson B.E., Haiman C.A. Consistent association of type 2 diabetes risk variants found in europeans in diverse racial and ethnic groups. PLoS Genet. 2010;6:e1001078. doi: 10.1371/journal.pgen.1001078.
  • 27. Lee S.H., Yang J., Goddard M.E., Visscher P.M., Wray N.R. Estimation of pleiotropy between complex diseases using single-nucleotide polymorphism-derived genomic relationships and restricted maximum likelihood. Bioinformatics. 2012;28:2540–2542. doi: 10.1093/bioinformatics/bts474.
  • 28. Falconer D.S., Mackay T.F.C. Longman; Essex, England: 1996. Introduction to quantitative genetics.
  • 29. Loh P.-R., Tucker G., Bulik-Sullivan B.K., Vilhjálmsson B.J., Finucane H.K., Salem R.M., Chasman D.I., Ridker P.M., Neale B.M., Berger B. Efficient Bayesian mixed-model analysis increases association power in large cohorts. Nat. Genet. 2015;47:284–290. doi: 10.1038/ng.3190.
  • 30. Yang J., Benyamin B., McEvoy B.P., Gordon S., Henders A.K., Nyholt D.R., Madden P.A., Heath A.C., Martin N.G., Montgomery G.W. Common SNPs explain a large proportion of the heritability for human height. Nat. Genet. 2010;42:565–569. doi: 10.1038/ng.608.
  • 31. So H.-C., Li M., Sham P.C. Uncovering the total heritability explained by all true susceptibility variants in a genome-wide association study. Genet. Epidemiol. 2011;35:447–456. doi: 10.1002/gepi.20593.
  • 32. Vattikuti S., Guo J., Chow C.C. Heritability and genetic correlations explained by common SNPs for metabolic syndrome traits. PLoS Genet. 2012;8:e1002637. doi: 10.1371/journal.pgen.1002637.
  • 33. ’t Hoen P.A.C., Friedländer M.R., Almlöf J., Sammeth M., Pulyakhina I., Anvar S.Y., Laros J.F.J., Buermans H.P.J., Karlberg O., Brännvall M., GEUVADIS Consortium Reproducibility of high-throughput mRNA and small RNA sequencing across laboratories. Nat. Biotechnol. 2013;31:1015–1022. doi: 10.1038/nbt.2702.
  • 34. Su Z., Marchini J., Donnelly P. HAPGEN2: simulation of multiple disease SNPs. Bioinformatics. 2011;27:2304–2305. doi: 10.1093/bioinformatics/btr341.
  • 35. Chang C.C., Chow C.C., Tellier L.C., Vattikuti S., Purcell S.M., Lee J.J. Second-generation PLINK: rising to the challenge of larger and richer datasets. Gigascience. 2015;4:7. doi: 10.1186/s13742-015-0047-8.
  • 36. Lee S.H., Wray N.R., Goddard M.E., Visscher P.M. Estimating missing heritability for disease from genome-wide association studies. Am. J. Hum. Genet. 2011;88:294–305. doi: 10.1016/j.ajhg.2011.02.002.
  • 37. Speed D., Hemani G., Johnson M.R., Balding D.J. Improved heritability estimation from genome-wide SNPs. Am. J. Hum. Genet. 2012;91:1011–1021. doi: 10.1016/j.ajhg.2012.10.010.
  • 38. Yang J., Bakshi A., Zhu Z., Hemani G., Vinkhuyzen A.A.E., Lee S.H., Robinson M.R., Perry J.R.B., Nolte I.M., van Vliet-Ostaptchouk J.V., LifeLines Cohort Study Genetic variance estimation with imputed variants finds negligible missing heritability for human height and body mass index. Nat. Genet. 2015;47:1114–1120. doi: 10.1038/ng.3390.
  • 39. Li B., Dewey C.N. RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC Bioinformatics. 2011;12:323. doi: 10.1186/1471-2105-12-323.
  • 40. Price A.L., Helgason A., Thorleifsson G., McCarroll S.A., Kong A., Stefansson K. Single-tissue and cross-tissue heritability of gene expression via identity-by-descent in related or unrelated individuals. PLoS Genet. 2011;7:e1001317. doi: 10.1371/journal.pgen.1001317.
  • 41. Gaffney D.J. Global properties and functional complexity of human gene regulatory variation. PLoS Genet. 2013;9:e1003501. doi: 10.1371/journal.pgen.1003501.
  • 42. Morris A.P., Voight B.F., Teslovich T.M., Ferreira T., Segrè A.V., Steinthorsdottir V., Strawbridge R.J., Khan H., Grallert H., Mahajan A., Wellcome Trust Case Control Consortium. Meta-Analyses of Glucose and Insulin-related traits Consortium (MAGIC) Investigators. Genetic Investigation of ANthropometric Traits (GIANT) Consortium. Asian Genetic Epidemiology Network–Type 2 Diabetes (AGEN-T2D) Consortium. South Asian Type 2 Diabetes (SAT2D) Consortium. DIAbetes Genetics Replication And Meta-analysis (DIAGRAM) Consortium Large-scale association analysis provides insights into the genetic architecture and pathophysiology of type 2 diabetes. Nat. Genet. 2012;44:981–990. doi: 10.1038/ng.2383.
  • 43. Cho Y.S., Chen C.-H., Hu C., Long J., Ong R.T., Sim X., Takeuchi F., Wu Y., Go M.J., Yamauchi T., DIAGRAM Consortium. MuTHER Consortium Meta-analysis of genome-wide association studies identifies eight new loci for type 2 diabetes in east Asians. Nat. Genet. 2012;44:67–72. doi: 10.1038/ng.1019.
  • 44. Ma R.C.W., Chan J.C.N. Type 2 diabetes in East Asians: similarities and differences with populations in Europe and the United States. Ann. N Y Acad. Sci. 2013;1281:64–91. doi: 10.1111/nyas.12098.
  • 45. Stahl E.A., Wegmann D., Trynka G., Gutierrez-Achury J., Do R., Voight B.F., Kraft P., Chen R., Kallberg H.J., Kurreeman F.A.S., Diabetes Genetics Replication and Meta-analysis Consortium. Myocardial Infarction Genetics Consortium Bayesian inference analyses of the polygenic architecture of rheumatoid arthritis. Nat. Genet. 2012;44:483–489. doi: 10.1038/ng.2232.
  • 46. Marenberg M.E., Risch N., Berkman L.F., Floderus B., de Faire U. Genetic susceptibility to death from coronary heart disease in a study of twins. N. Engl. J. Med. 1994;330:1041–1046. doi: 10.1056/NEJM199404143301503.
  • 47. Nora J.J., Lortscher R.H., Spangler R.D., Nora A.H., Kimberling W.J. Genetic--epidemiologic study of early-onset ischemic heart disease. Circulation. 1980;61:503–508. doi: 10.1161/01.cir.61.3.503.
  • 48. Randall J.C., Winkler T.W., Kutalik Z., Berndt S.I., Jackson A.U., Monda K.L., Kilpeläinen T.O., Esko T., Mägi R., Li S., DIAGRAM Consortium. MAGIC Investigators Sex-stratified genome-wide association studies including 270,000 individuals show sexual dimorphism in genetic loci for anthropometric traits. PLoS Genet. 2013;9:e1003500. doi: 10.1371/journal.pgen.1003500.
  • 49. Yang J., Bakshi A., Zhu Z., Hemani G., Vinkhuyzen A.A.E., Nolte I.M., van Vliet-Ostaptchouk J.V., Snieder H., Esko T., Milani L., Lifelines Cohort Study Genome-wide genetic homogeneity between sexes and populations for human height and body mass index. Hum. Mol. Genet. 2015;24:7445–7449. doi: 10.1093/hmg/ddv443.
  • 50. Chen X., Kuja-Halkola R., Rahman I., Arpegård J., Viktorin A., Karlsson R., Hägg S., Svensson P., Pedersen N.L., Magnusson P.K.E. Dominant Genetic Variation and Missing Heritability for Human Complex Traits: Insights from Twin versus Genome-wide Common SNP Models. Am. J. Hum. Genet. 2015;97:708–714. doi: 10.1016/j.ajhg.2015.10.004.
  • 51. Zhu Z., Bakshi A., Vinkhuyzen A.A.E., Hemani G., Lee S.H., Nolte I.M., van Vliet-Ostaptchouk J.V., Snieder H., Esko T., Milani L., LifeLines Cohort Study Dominance genetic variation contributes little to the missing heritability for human complex traits. Am. J. Hum. Genet. 2015;96:377–385. doi: 10.1016/j.ajhg.2015.01.001.
  • 52. Marchini J., Howie B. Genotype imputation for genome-wide association studies. Nat. Rev. Genet. 2010;11:499–511. doi: 10.1038/nrg2796.

Articles from American Journal of Human Genetics are provided here courtesy of American Society of Human Genetics
