American Journal of Human Genetics. 2020 May 21;106(6):805–817. doi: 10.1016/j.ajhg.2020.04.012

Localizing Components of Shared Transethnic Genetic Architecture of Complex Traits from GWAS Summary Data

Huwenbo Shi 1,2,3,11, Kathryn S Burch 1,11,∗∗, Ruth Johnson 4, Malika K Freund 5, Gleb Kichaev 1, Nicholas Mancuso 6, Astrid M Manuel 7, Natalie Dong 8, Bogdan Pasaniuc 1,5,9,10
PMCID: PMC7273527  PMID: 32442408

Abstract

Despite strong transethnic genetic correlations reported in the literature for many complex traits, the non-transferability of polygenic risk scores across populations suggests the presence of population-specific components of genetic architecture. We propose an approach that models GWAS summary data for one trait in two populations to estimate genome-wide proportions of population-specific/shared causal SNPs. In simulations across various genetic architectures, we show that our approach yields approximately unbiased estimates with in-sample LD and slight upward-bias with out-of-sample LD. We analyze nine complex traits in individuals of East Asian and European ancestry, restricting to common SNPs (MAF > 5%), and find that most common causal SNPs are shared by both populations. Using the genome-wide estimates as priors in an empirical Bayes framework, we perform fine-mapping and observe that high-posterior SNPs (for both the population-specific and shared causal configurations) have highly correlated effects in East Asians and Europeans. In population-specific GWAS risk regions, we observe a 2.8× enrichment of shared high-posterior SNPs, suggesting that population-specific GWAS risk regions harbor shared causal SNPs that are undetected in the other GWASs due to differences in LD, allele frequencies, and/or sample size. Finally, we report enrichments of shared high-posterior SNPs in 53 tissue-specific functional categories and find evidence that SNP-heritability enrichments are driven largely by many low-effect common SNPs.

Keywords: transethnic, GWAS, linkage disequilibrium, fine-mapping, complex traits, PRS, polygenicity, ancestry

Introduction

Genetic and phenotypic variations among humans have been shaped by many factors, including migration histories, geodemographic events, and environmental background.1, 2, 3, 4, 5 As a result, the underlying genetic architecture of a given complex trait—defined here in terms of “polygenicity” (the number of variants with nonzero effects)6, 7, 8, 9, 10 and the coupling of causal effect sizes with minor allele frequency (MAF),11,12 linkage disequilibrium (LD),13, 14, 15 and other genomic features16—varies among ancestral populations. While the vast majority of genome-wide association studies (GWASs) to date have been performed in individuals of European descent,17, 18, 19, 20 growing numbers of studies performed in individuals of non-European ancestry21, 22, 23, 24, 25, 26, 27 have created opportunities for well-powered transethnic genetic studies.21,22,24,26,28, 29, 30, 31, 32, 33

Risk regions identified through GWASs tend to replicate across populations,17,21,22,33, 34, 35 indicating that complex traits have genetic components that are shared among populations. Indeed, for certain post-GWAS analyses such as disease mapping23,31,36 and statistical fine-mapping,28,37, 38, 39, 40 under the assumption that two populations share one or more causal variants, population-specific LD patterns can be leveraged to improve performance over approaches that model a single population. On the other hand, several studies have shown that heterogeneity in genetic architectures limits transferability of polygenic risk scores (PRSs) across populations;5,41, 42, 43, 44, 45, 46, 47, 48 critically, if applied in a clinical setting, existing PRSs may exacerbate health disparities among ethnic groups.49 The population specificity of existing PRSs as well as estimates of transethnic genetic correlations less than one reported in the literature30,50, 51, 52, 53 indicate that (1) LD tagging and allele frequencies of shared causal variants vary across populations, (2) that a sizeable number of causal variants are population specific, and/or (3) that causal effect sizes vary across populations due to, for example, different gene-environment interactions. In a region with population-specific LD, a single genetic variant that is significantly associated with a trait in two populations may actually be tagging distinct population-specific causal variants (Figure 1). Conversely, two distinct associations in two populations may be driven by the same underlying causal variants (i.e., colocalization). Thus, identifying shared and population-specific components of genetic architecture could help improve transethnic analyses (e.g., transferability of PRSs across populations19,41,42,45,46) and uncover novel disease etiologies.

Figure 1.


Toy Examples to Illustrate How Population-Specific LD Patterns Affect GWAS Associations

(A) SNPs 3 and 5 are causal in both East Asians and Europeans and have the same causal effect size of 0.1. However, due to different LD patterns in East Asians and Europeans, SNPs 2 and 4 are observed to be GWAS significant, respectively.

(B) Different SNPs are causal in East Asians (SNPs 1 and 5) and Europeans (SNPs 2 and 4). However, due to population-specific LD, SNP 3 is observed to be GWAS significant in both populations. The stars in the rightmost plots represent the SNPs with true nonzero effects; the GWAS-significant SNP is highlighted in a darker color.

In this work, we introduce PESCA (population-specific/shared causal variants), an approach that requires only GWAS summary association statistics and ancestry-matched estimates of LD to infer genome-wide proportions of population-specific and shared causal variants for a single trait in two populations. These estimates are then used as priors in an empirical Bayes framework to localize and test for enrichment of population-specific/shared causal variants in regions of interest. In this context, a “causal variant” is a variant measured in the given GWAS that either has a nonzero effect on the trait (e.g., a nonsynonymous variant that alters protein folding) or tags a nonzero effect at an unmeasured variant through LD. It is therefore important to note that the set of “causal variants” that PESCA aims to identify is defined with respect to the set of variants included in the GWAS and can contain variants with indirect nonzero effects that are statistical rather than biological in nature (this is analogous to the definition of SNP-heritability, which is also a function of a specific set of SNPs11,54, 55, 56). We also note that the definition of enrichment used here is related to, but conceptually distinct from, definitions of SNP-heritability enrichment.13,16 Under our framework, an enrichment of causal SNPs greater than 1 indicates that, compared to the genome-wide background, there are more causal SNPs in that region than expected57,58 (Material and Methods). In contrast, an enrichment of SNP-heritability greater than 1 indicates that the average per-SNP effect size in the region is larger than the genome-wide average per-SNP effect size.

Through extensive simulations, we show that our method yields approximately unbiased estimates of the proportions of population-specific/shared causal variants if in-sample LD is used and slightly upward-biased estimates if LD is estimated from an external reference panel. We then show that using these estimates as priors to perform fine-mapping (Material and Methods) produces well-calibrated per-SNP posterior probabilities and enrichment test statistics. We apply our approach to publicly available GWAS summary statistics for nine complex traits and diseases in individuals of East Asian (EAS) and European (EUR) ancestry (average NEAS = 94,621, NEUR = 103,507) (Table 1), restricting to common SNPs (MAF > 5%) and using 1000 Genomes59,60 to estimate ancestry-matched LD. On average across the nine traits, we estimate that approximately 80% (SD 15%) of common SNPs that are causal in EAS and 84% (SD 8%) of those in EUR are shared by the other population. Consistent with previous studies based on SNP-heritability,55,61 we find that high-posterior SNPs are distributed uniformly across the genome. We observe that population-specific GWAS risk regions have, on average across the 9 traits, a 2.8× enrichment of shared high-posterior SNPs relative to the genome-wide background, suggesting that many EAS-specific and EUR-specific GWAS risk regions harbor shared causal SNPs that are undetected in the other population due to differences in LD, allele frequencies, and/or GWAS sample size. 
The effect sizes of SNPs with posterior probability > 0.8 of being causal (for any causal configuration) are highly correlated between EAS and EUR, concordant with replication slopes between EAS and EUR marginal effects close to 1 that have been reported for several complex diseases33 and with strong transethnic genetic correlations previously reported for the same traits analyzed in this work (average $\hat{\rho}_g = 0.79 \pm 0.07$ SEM across the 9 traits).51 Finally, we show that regions flanking genes that are specifically expressed in trait-relevant tissues62 harbor a disproportionate number of shared high-posterior SNPs. Many of the same tissue-specific gene sets are also enriched with SNP-heritability, implying that SNP-heritability enrichments are driven by many low-effect SNPs rather than a small number of high-effect SNPs. Our results suggest that common causal SNPs have similar etiological roles in EAS and EUR and that transferability of PRS and other GWAS findings across populations can be improved by explicitly correcting for population-specific LD and allele frequencies.

Table 1.

Estimated Numbers and Percentages of Population-Specific/Shared Common Causal SNPs for Nine Complex Traits

| Trait Name (abbrev.) | Pop. | Ref. | $\hat{h}_g^2$ (SE) % | Sample Size (n) | Total # SNPs (MAF > 5%) | EAS-Specific Causals (SE); % | EUR-Specific Causals (SE); % | Shared Causals (SE); % | $\hat{\rho}_g$ (SE)51 |
|---|---|---|---|---|---|---|---|---|---|
| Body mass index (BMI) | EAS | 22 | 19.8 (0.6) | 224,698 | 258,130 | 982 (2); 0.4% | 1,033 (2); 0.4% | 25,641 (16); 10% | 0.80 (0.02) |
| | EUR | 63 | 20.6 (0.9) | 158,284 | | | | | |
| Mean corpuscular hemoglobin (MCH) | EAS | 21 | 18.6 (2.2) | 108,054 | 480,684 | 1,165 (6); 0.2% | 728 (3); 0.2% | 3,082 (4); 0.6% | 0.88 (0.05) |
| | EUR | 64 | 22.7 (3.2) | 172,332 | | | | | |
| Mean corpuscular volume (MCV) | EAS | 21 | 21.0 (2.1) | 108,256 | 480,678 | 1,004 (4); 0.2% | 737 (5); 0.2% | 3,256 (8); 0.7% | 0.89 (0.05) |
| | EUR | 64 | 23.6 (3.1) | 172,433 | | | | | |
| High-density lipoprotein (HDL) | EAS | 21 | 20.7 (3.0) | 70,657 | 268,198 | 3,167 (12); 1% | 652 (2); 0.2% | 4,789 (9); 2% | 0.89 (0.06) |
| | EUR | 65 | 16.4 (2.2) | 89,614 | | | | | |
| Low-density lipoprotein (LDL) | EAS | 21 | 9.5 (1.3) | 72,866 | 268,201 | 969 (5); 0.4% | 742 (2); 0.3% | 3,129 (6); 1% | 0.66 (0.11) |
| | EUR | 65 | 13.6 (1.9) | 85,491 | | | | | |
| Total cholesterol (TC) | EAS | 21 | 8.1 (0.8) | 128,305 | 268,197 | 1,892 (3); 0.7% | 1,493 (5); 0.6% | 5,058 (12); 2% | 0.91 (0.07) |
| | EUR | 65 | 22.5 (2.1) | 89,865 | | | | | |
| Triglyceride (TG) | EAS | 21 | 13.5 (3.3) | 105,597 | 268,198 | 2,245 (3); 0.8% | 511 (4); 0.2% | 3,432 (7); 1% | 0.93 (0.07) |
| | EUR | 65 | 13.6 (2.2) | 86,502 | | | | | |
| Major depressive disorder (MDD) | EAS | 66 | 35.6 (3.4) | 10,640 | 389,593 | 88 (4); 0.02% | 3,280 (6); 0.8% | 7,830 (6); 2% | 0.34 (0.07) |
| | EUR | 67 | 19.0 (1.8) | 18,759 | | | | | |
| Rheumatoid arthritis (RA) | EAS | 36 | 28.9 (18.3) | 22,515 | 526,206 | 3 (0.3); 6e−04% | 124 (2); 0.02% | 1,080 (6); 0.2% | 0.87 (0.10) |
| | EUR | 36 | 9.5 (1.9) | 58,284 | | | | | |

We estimated genome-wide SNP-heritability using LD score regression54 with the intercept constrained to 1 (i.e., assuming no population stratification). Trans-ethnic genetic correlation estimates (ρˆg) computed from a similar set of summary statistics were obtained from a previous study.51 Standard errors of the estimated numbers of population-specific/shared causal SNPs were computed using the last 50 iterations of the EM-MCMC algorithm.

Material and Methods

Distribution of GWAS Summary Statistics in Two Populations

For a given complex trait, we model the causal statuses of SNP $i$ in two populations as a binary vector of size two, $C_i = c_{i1}c_{i2}$, where each bit, $c_{i1} \in \{0,1\}$ and $c_{i2} \in \{0,1\}$, represents the causal status of SNP $i$ in populations 1 and 2, respectively. $C_i = 00$ indicates that SNP $i$ is not causal in either population; $C_i = 01$ and $C_i = 10$ indicate that SNP $i$ is causal only in the first and second population, respectively; and $C_i = 11$ indicates that SNP $i$ is causal in both populations. We assume $C_i$ follows a multivariate Bernoulli (MVB) distribution68,69

$$C_i \sim \mathrm{MVB}(f_{00}, f_{01}, f_{10}, f_{11})$$

in order to facilitate optimization and interpretation (Supplemental Material and Methods). Assuming the causal status vector of a SNP is independent from those of other SNPs ($C_i \perp C_j$ for $i \neq j$), the joint probability of the causal statuses of $p$ SNPs is $\Pr(C_1, \ldots, C_p) = \prod_{i=1}^{p} \Pr(C_i)$.

Given two genome-wide association studies with sample sizes $n_1$ and $n_2$ for the first and second populations, respectively, we derive the distribution of Z-scores, $Z_1$ and $Z_2$ (both are $p \times 1$ vectors), conditional on the causal status vectors for each population, $c_1 = (c_{11}, \ldots, c_{p1})^T$ and $c_2 = (c_{12}, \ldots, c_{p2})^T$. Although it is reasonable to suspect that there are nonzero cross-population correlations of effect sizes at shared causal SNPs, to facilitate inference, we impose the (potentially strong) assumption that $Z_1$ and $Z_2$ are independent given $c_1$ and $c_2$. Thus, for population $j$,

$$Z_j \mid c_j \sim \mathrm{MVN}\left(0,\; V_j + \sigma_j^2 V_j\, \mathrm{diag}(c_j)\, V_j\right)$$

where $V_j$ is the $p \times p$ LD matrix for population $j$; $\mathrm{diag}(c_j)$ is a diagonal matrix in which the $k$th diagonal element is 1 if $c_{kj} = 1$ and 0 if $c_{kj} = 0$; and $\sigma_j^2 = n_j h_{gj}^2 / |c_j|$, where $h_{gj}^2$ and $|c_j|$ are the SNP-heritability of the trait and the number of causal SNPs, respectively, in population $j$ (Supplemental Material and Methods).

Finally, we derive the joint probability of $Z_1$ and $Z_2$ by integrating over all possible causal status vectors in the two populations:

$$\Pr(Z_1, Z_2; f) = \sum_{c_1} \sum_{c_2} \left[ \prod_{i=1}^{p} \Pr(C_i = c_{i1}c_{i2}) \prod_{j=1}^{2} \mathrm{MVN}\left(Z_j;\; 0,\; V_j + \sigma_j^2 V_j\, \mathrm{diag}(c_j)\, V_j\right) \right] \quad \text{(Equation 1)}$$

where $f = (f_{00}, f_{01}, f_{10}, f_{11})$ is the vector of parameters of the MVB distribution. In practice, we partition the genome into approximately independent regions70 and model the distribution of Z-scores at all regions as the product of the distribution of Z-scores in each region (Supplemental Material and Methods).
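
To make the structure of Equation 1 concrete, the following is a minimal Python sketch that evaluates the joint likelihood by brute-force enumeration of all $4^p$ causal configurations in a toy region. It is an illustration under stated assumptions, not the PESCA implementation: the function names are hypothetical, the prior is passed directly as a probability for each pair $(c_{i1}, c_{i2})$ rather than through the MVB parameters $f$, and the covariance is taken to be $V_j$ alone when no SNP is causal (where $\sigma_j^2 = n_j h_{gj}^2/|c_j|$ is undefined).

```python
import itertools
import math

import numpy as np

# Hypothetical helper names; none of these come from the PESCA software.
CONFIGS = [(0, 0), (0, 1), (1, 0), (1, 1)]  # (c_i1, c_i2) for one SNP


def mvn_logpdf(z, cov):
    """Log density of a zero-mean multivariate normal evaluated at z."""
    _, logdet = np.linalg.slogdet(cov)
    quad = z @ np.linalg.solve(cov, z)
    return -0.5 * (len(z) * math.log(2.0 * math.pi) + logdet + quad)


def zscore_cov(V, c, n, h2):
    """V + sigma^2 * V diag(c) V, with sigma^2 = n * h2 / |c|.

    Assumption: when no SNP is causal (|c| = 0) the covariance is just V.
    """
    c = np.asarray(c, dtype=float)
    n_causal = c.sum()
    if n_causal == 0:
        return V.copy()
    sigma2 = n * h2 / n_causal
    return V + sigma2 * (V @ np.diag(c) @ V)


def brute_force_likelihood(Z1, Z2, V1, V2, n1, n2, h2_1, h2_2, prior):
    """Sum over all 4^p joint causal configurations (feasible only for tiny p).

    `prior` maps each (c_i1, c_i2) pair to its probability; this is an
    assumed, directly specified stand-in for the MVB parameters f.
    """
    p = len(Z1)
    total = 0.0
    for config in itertools.product(CONFIGS, repeat=p):
        c1 = [pair[0] for pair in config]
        c2 = [pair[1] for pair in config]
        log_term = sum(math.log(prior[pair]) for pair in config)
        log_term += mvn_logpdf(Z1, zscore_cov(V1, c1, n1, h2_1))
        log_term += mvn_logpdf(Z2, zscore_cov(V2, c2, n2, h2_2))
        total += math.exp(log_term)
    return total
```

The number of terms grows as $4^p$, which is why exhaustive enumeration is only usable for a handful of SNPs and why a sampling-based approximation is needed at realistic region sizes.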

Estimating Genome-wide Proportions of Population-Specific/Shared Causal SNPs

We use expectation-maximization (EM) coupled with Markov Chain Monte Carlo (MCMC) to maximize the likelihood function in Equation 1 over the MVB parameters $f$. We initialize $f$ to $f = (0, 3.9, 3.9, 3.9)$, which corresponds to 2% of SNPs being causal in population 1, 2% being causal in population 2, and 2% being shared causals. In the expectation step, we approximate the surrogate function $Q(f \mid f^{(t)})$ using an efficient Gibbs sampler; in the maximization step, we maximize $Q(f \mid f^{(t)})$ using analytical formulae (Supplemental Material and Methods). From the estimated $f$, denoted $\hat{f}$, we recover the proportions of population-specific and shared causal SNPs. For computational efficiency, we apply the EM algorithm to each chromosome in parallel and aggregate the chromosomal estimates to obtain estimates of the genome-wide proportions of population-specific/shared causal SNPs.

Evaluating per-SNP Posterior Probabilities of Being Causal in a Single or Both Populations

We estimate the posterior probability of each SNP to be causal in a single population (population-specific) or both populations (shared), using the estimated genome-wide proportions of population-specific and shared causal variants (obtained from $\hat{f}$) as prior probabilities in an empirical Bayes framework. Specifically, for each SNP $i$, we evaluate the posterior probabilities $\Pr(C_i = 01 \mid Z_1, Z_2; \hat{f})$, $\Pr(C_i = 10 \mid Z_1, Z_2; \hat{f})$, and $\Pr(C_i = 11 \mid Z_1, Z_2; \hat{f})$. Since evaluating these probabilities requires integrating over the posterior probabilities of all $2^{2p}$ possible causal status configurations, we use a Gibbs sampler to efficiently approximate the posterior probabilities (Supplemental Material and Methods).
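
As a rough illustration of how a Gibbs sampler can approximate these per-SNP posteriors, the sketch below updates one SNP at a time, drawing its configuration from the conditional distribution given the current configurations of all other SNPs. This is a naive version written for clarity, assuming a prior given directly as configuration probabilities and recomputing the full likelihood at every update; the efficient sampler described in the Supplemental Material and Methods is not reproduced here, and all function names are hypothetical.

```python
import math

import numpy as np

STATES = [(0, 0), (0, 1), (1, 0), (1, 1)]  # (c_i1, c_i2)


def mvn_logpdf(z, cov):
    """Log density of a zero-mean multivariate normal evaluated at z."""
    _, logdet = np.linalg.slogdet(cov)
    quad = z @ np.linalg.solve(cov, z)
    return -0.5 * (len(z) * math.log(2.0 * math.pi) + logdet + quad)


def zscore_cov(V, c, n, h2):
    """V + (n * h2 / |c|) * V diag(c) V; assumed to be V when |c| = 0."""
    c = np.asarray(c, dtype=float)
    n_causal = c.sum()
    if n_causal == 0:
        return V.copy()
    return V + (n * h2 / n_causal) * (V @ np.diag(c) @ V)


def log_joint(C, Z, V, n, h2, prior):
    """log Pr(C) + sum over populations of log MVN(Z_j | c_j).

    C is a (p, 2) integer array; Z, V, n, h2 are length-2 sequences
    holding the per-population Z-scores, LD matrices, sample sizes, and
    SNP-heritabilities.
    """
    lp = sum(math.log(prior[(int(row[0]), int(row[1]))]) for row in C)
    for j in range(2):
        lp += mvn_logpdf(Z[j], zscore_cov(V[j], C[:, j], n[j], h2[j]))
    return lp


def gibbs_posterior(Z, V, n, h2, prior, n_burn=200, n_samples=1000, seed=0):
    """Return a (p, 4) array of estimated posteriors, columns ordered as STATES."""
    rng = np.random.default_rng(seed)
    p = len(Z[0])
    C = np.zeros((p, 2), dtype=int)  # start with no causal SNPs
    counts = np.zeros((p, 4))
    for it in range(n_burn + n_samples):
        for i in range(p):
            # Score each of the four states for SNP i, holding the rest fixed.
            logw = np.empty(4)
            for s, state in enumerate(STATES):
                C[i] = state
                logw[s] = log_joint(C, Z, V, n, h2, prior)
            w = np.exp(logw - logw.max())
            C[i] = STATES[rng.choice(4, p=w / w.sum())]
        if it >= n_burn:
            for i in range(p):
                counts[i, STATES.index((int(C[i, 0]), int(C[i, 1])))] += 1
    return counts / n_samples
```

Each row of the returned array sums to 1; the last three columns correspond to the two population-specific configurations and the shared configuration.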

Estimating the Numbers of Population-Specific/Shared Causal SNPs in a Region

We infer the posterior expected numbers of population-specific/shared causal SNPs in a region (e.g., an LD block or a chromosome) conditional on the Z-scores ($Z_1$ and $Z_2$) by summing, across all SNPs in the region, the per-SNP posterior probabilities of being causal in a single or both populations. For example, in a region with $p$ SNPs, the posterior expected number of shared causal SNPs is $E[q_{11} \mid Z_1, Z_2; \hat{f}] = \sum_{i=1}^{p} E[\mathbb{1}\{C_i = 11\} \mid Z_1, Z_2; \hat{f}] = \sum_{i=1}^{p} \Pr(C_i = 11 \mid Z_1, Z_2; \hat{f})$. Since SNPs in a region are highly correlated, invalidating the use of jackknife to estimate standard errors, we refrain from reporting standard errors of the posterior expected regional numbers of population-specific/shared causal SNPs.
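
The regional summary is a plain sum of per-SNP posterior probabilities. A small Python sketch follows; the function name and the posterior values are made up for illustration, and the dictionary keys follow the configuration labels used in the text.

```python
def expected_causal_counts(posteriors):
    """Sum per-SNP posterior probabilities within one region.

    `posteriors` is a list with one dict per SNP, keyed by causal
    configuration ("01", "10", "11") as labeled in the text.
    """
    totals = {"01": 0.0, "10": 0.0, "11": 0.0}
    for snp in posteriors:
        for config in totals:
            totals[config] += snp[config]
    return totals


# Made-up posterior probabilities for a three-SNP region.
region = [
    {"01": 0.25, "10": 0.0, "11": 0.5},
    {"01": 0.0, "10": 0.125, "11": 0.75},
    {"01": 0.0, "10": 0.0, "11": 0.25},
]
counts = expected_causal_counts(region)
```

In this toy region the posterior expected number of shared causal SNPs is 0.5 + 0.75 + 0.25 = 1.5.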

Defining LD Blocks that Are Approximately Independent in Two Populations

For computational efficiency, PESCA assumes that, in both populations, a SNP in a given block is independent from all SNPs in all other blocks. This assumption requires defining blocks of SNPs that are approximately LD independent in both populations. To this end, we first compute the “transethnic LD matrix” ($V_{\text{trans}}$) from the East Asian- and European-ancestry LD matrices ($V_{\text{EAS}}$ and $V_{\text{EUR}}$) by setting each element in the transethnic LD matrix to whichever of the East Asian-specific and European-specific pairwise LD values is larger in absolute value; i.e., $V_{\text{trans},ij} = V_{\text{EAS},ij}$ if $|V_{\text{EAS},ij}| > |V_{\text{EUR},ij}|$ and $V_{\text{trans},ij} = V_{\text{EUR},ij}$ if $|V_{\text{EUR},ij}| > |V_{\text{EAS},ij}|$. The resulting matrix $V_{\text{trans}}$ is block diagonal due to shared recombination hotspots in both populations; in practice, we apply this procedure to each chromosome separately to obtain 22 chromosome-wide transethnic LD matrices. We then apply LDetect70 to define LD blocks within the transethnic LD matrix. Applying this procedure using the 1000 Genomes Phase 3 reference panel59,60 to create the transethnic LD matrix produces 1,368 LD blocks (average length of 2 Mb) that are approximately independent in individuals of East Asian and European ancestry.
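
The element-wise rule for the transethnic LD matrix can be sketched in a few lines of NumPy. The function name is hypothetical, and the handling of ties (equal absolute values, where the text does not specify a choice) is an assumption: the sketch keeps the East Asian entry.

```python
import numpy as np


def transethnic_ld(V_eas, V_eur):
    """Element-wise pick of the LD entry with the larger absolute value.

    Assumption: on exact ties in absolute value, the EAS entry is kept.
    """
    V_eas = np.asarray(V_eas, dtype=float)
    V_eur = np.asarray(V_eur, dtype=float)
    return np.where(np.abs(V_eas) >= np.abs(V_eur), V_eas, V_eur)


# Toy 3-SNP LD matrices (made-up values).
V_eas = np.array([[1.0, 0.2, 0.7],
                  [0.2, 1.0, 0.0],
                  [0.7, 0.0, 1.0]])
V_eur = np.array([[1.0, -0.6, 0.1],
                  [-0.6, 1.0, 0.0],
                  [0.1, 0.0, 1.0]])
V_trans = transethnic_ld(V_eas, V_eur)
```

Here the pair (1, 2) takes the European value of -0.6 and the pair (1, 3) takes the East Asian value of 0.7, so a pair of SNPs is treated as linked whenever it is in LD in either population.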

Enrichment of Population-Specific/Shared Causal SNPs in Functional Annotations

We define the enrichment of population-specific/shared causal SNPs in a functional annotation as the ratio between the posterior and prior expected numbers of population-specific/shared causal SNPs. Specifically, we estimate the enrichment of population-specific/shared causal SNPs in a functional annotation $k$ relative to the genome-wide background as

$$\hat{\alpha}_{k,b} = \frac{E[q_{k,b} \mid Z_1, Z_2, \hat{f}]}{E[q_{k,b} \mid \hat{f}]} = \frac{\sum_{i \in \psi(k)} \Pr(C_i = b \mid Z_1, Z_2, \hat{f})}{p_k \Pr(C_i = b)}$$

where $b \in \{01, 10, 11\}$, $q_{k,b}$ is the number of population-specific ($b = 01$ or $b = 10$) or shared ($b = 11$) causal variants, $\psi(k)$ is the set of SNPs in functional annotation $k$, and $p_k$ is the number of SNPs in functional annotation $k$. The numerator, $E[q_{k,b} \mid Z_1, Z_2, \hat{f}]$, and denominator, $E[q_{k,b} \mid \hat{f}]$, represent the posterior (conditioned on Z-scores) and prior expected numbers of causal SNPs in functional annotation $k$, respectively. We estimate the standard error of $\hat{\alpha}_{k,b}$ using block jackknife over 1,368 non-overlapping approximately LD-independent blocks across the entire genome. The resulting enrichment test statistics, $(\hat{\alpha}_{k,b} - 1)/\mathrm{SE}(\hat{\alpha}_{k,b})$, approximately follow a t-distribution with degrees of freedom equal to the number of blocks minus 1.71 Since we are interested in identifying categories of SNPs that harbor more population-specific/shared causal SNPs than expected (i.e., enrichment > 1), we report p values from a one-tailed t test where the null hypothesis is enrichment ≤ 1.

We note that our definition of enrichment of causal SNPs is related to, but conceptually different from, enrichment of SNP-heritability.13,16,62 A positive enrichment of causal SNPs in a functional category indicates that, compared to the genome-wide background, there are more causal SNPs in that category than expected; a positive enrichment of SNP-heritability in a category indicates that the average per-SNP effect size in the category is larger than the genome-wide average per-SNP effect size.
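
The enrichment ratio and its jackknife standard error can be sketched as follows. This is a simplified illustration, not the PESCA code: the function names are hypothetical, the jackknife shown is the plain unweighted delete-one-block version (the weighting used in the cited procedure may differ), and every leave-one-out subset is assumed to retain at least one annotated SNP.

```python
import numpy as np


def enrichment(post, in_annot, prior_prob):
    """Posterior over prior expected number of causal SNPs in an annotation.

    post: per-SNP posterior probability of configuration b.
    in_annot: boolean mask of SNPs in the annotation.
    prior_prob: genome-wide prior probability of configuration b.
    """
    post = np.asarray(post, dtype=float)
    in_annot = np.asarray(in_annot, dtype=bool)
    return post[in_annot].sum() / (in_annot.sum() * prior_prob)


def block_jackknife_se(post, in_annot, prior_prob, block_ids):
    """Unweighted delete-one-block jackknife standard error (an assumption)."""
    post = np.asarray(post, dtype=float)
    in_annot = np.asarray(in_annot, dtype=bool)
    block_ids = np.asarray(block_ids)
    blocks = np.unique(block_ids)
    g = len(blocks)
    loo = np.array([
        enrichment(post[block_ids != b], in_annot[block_ids != b], prior_prob)
        for b in blocks
    ])
    return float(np.sqrt((g - 1) / g * np.sum((loo - loo.mean()) ** 2)))


def enrichment_t_stat(alpha_hat, se):
    """Test statistic (alpha_hat - 1) / SE for the null of enrichment <= 1."""
    return (alpha_hat - 1.0) / se


# Toy data: 8 SNPs in 4 blocks, the first 4 SNPs are annotated.
post = [0.9, 0.5, 0.3, 0.3, 0.1, 0.1, 0.1, 0.1]
in_annot = [True, True, True, True, False, False, False, False]
block_ids = [0, 0, 1, 1, 2, 2, 3, 3]
alpha_hat = enrichment(post, in_annot, prior_prob=0.25)
se = block_jackknife_se(post, in_annot, 0.25, block_ids)
t_stat = enrichment_t_stat(alpha_hat, se)
```

A one-tailed p value would then be read from a t-distribution with (number of blocks minus 1) degrees of freedom, as described in the text.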

Simulation Framework

We used real chromosome 22 genotypes of 10,000 individuals of East Asian ancestry from CONVERGE66 and 50,000 individuals of white British ancestry from the UK Biobank72,73 to simulate causal effects and phenotypes. First, we used PLINK74 (v.1.9) to remove redundant SNPs in the 1000 Genomes Phase 3 reference panel59,60 such that there are no pairs of SNPs with $r_{ij}^2 > 0.95$ ($i \neq j$). We also removed strand-ambiguous SNPs and SNPs with MAF < 1% in either reference panel, resulting in a total of M = 8,599 SNPs on chromosome 22 to use in simulations.

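Given genotypes at $M$ SNPs for $n_1$ and $n_2$ individuals in populations 1 and 2, respectively, we assume the standard linear models $y_1 = X_1\beta_1 + \epsilon_1$ (population 1) and $y_2 = X_2\beta_2 + \epsilon_2$ (population 2). We assume the phenotypes are standardized within each population such that $E[y_1] = 0$, $\mathrm{Var}[y_1] = I$ and $E[y_2] = 0$, $\mathrm{Var}[y_2] = I$. Given $c_1$ and $c_2$, the index sets of causal SNPs in each population, the effects at the $i$th causal SNP in each population, $\beta_{1i}$ and $\beta_{2i}$, are drawn from

$$\beta_{1c_1} \mid c_1 \sim \mathrm{MVN}\left(0,\; \frac{h_{g1}^2}{|c_1|} I_{c_1}\right), \quad \beta_{2c_2} \mid c_2 \sim \mathrm{MVN}\left(0,\; \frac{h_{g2}^2}{|c_2|} I_{c_2}\right)$$

where $|c_1| = \sum_{i=1}^{M} c_{i1}$ and $|c_2| = \sum_{i=1}^{M} c_{i2}$ are the total numbers of causal SNPs in each population, $h_{g1}^2$ and $h_{g2}^2$ are the total SNP-heritabilities in each population, and $E[\beta_{1i}\beta_{1j}] = \mathrm{Cov}[\beta_{1i}, \beta_{1j}] = 0$ and $E[\beta_{2i}\beta_{2j}] = \mathrm{Cov}[\beta_{2i}, \beta_{2j}] = 0$ for SNPs $i \neq j$. The effects at non-causal SNPs are set to 0. The environmental effects for the $n$th individual in each population are drawn i.i.d. from $\epsilon_{1n} \sim N(0, 1 - h_{g1}^2)$ and $\epsilon_{2n} \sim N(0, 1 - h_{g2}^2)$.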

Finally, given the real genotypes and simulated phenotypes for each population, we compute Z-scores for all SNPs in population $k$ as $Z_k = (1/\sqrt{n_k})\, y_k^T X_k$.
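
The simulation recipe for one population can be sketched in Python as below. This is an illustration under assumptions rather than the code used in the study: the function name is hypothetical, the genotype columns are standardized inside the function (the text does not state this explicitly, but the Z-score formula presumes it), and the toy genotypes are drawn from a binomial distribution instead of real data.

```python
import numpy as np


def simulate_zscores(genotypes, causal_idx, h2, rng):
    """Simulate a phenotype and marginal Z-scores for one population.

    genotypes: (n, M) array of allele counts.
    causal_idx: indices of causal SNPs.
    h2: total SNP-heritability.
    Returns (Z, beta) with Z = X^T y / sqrt(n).
    """
    n, m = genotypes.shape
    # Assumption: standardize each SNP to mean 0, variance 1.
    X = (genotypes - genotypes.mean(axis=0)) / genotypes.std(axis=0)
    beta = np.zeros(m)
    k = len(causal_idx)
    # Causal effects ~ N(0, h2 / |c|); non-causal effects stay at 0.
    beta[causal_idx] = rng.normal(0.0, np.sqrt(h2 / k), size=k)
    # Environmental noise ~ N(0, 1 - h2), i.i.d. across individuals.
    eps = rng.normal(0.0, np.sqrt(1.0 - h2), size=n)
    y = X @ beta + eps
    Z = X.T @ y / np.sqrt(n)
    return Z, beta


# Toy example: 500 individuals, 20 independent SNPs with MAF 0.3.
rng = np.random.default_rng(0)
G = rng.binomial(2, 0.3, size=(500, 20)).astype(float)
Z, beta = simulate_zscores(G, causal_idx=[3, 7], h2=0.5, rng=rng)
```

Running the function once per population, with different causal index sets that overlap at the shared causal SNPs, gives the pair of Z-score vectors that serve as input to the method.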

Application to Nine Complex Traits and Diseases

We downloaded publicly available East Asian- and European-ancestry GWAS summary statistics for body mass index (BMI), mean corpuscular hemoglobin (MCH), mean corpuscular volume (MCV), high-density lipoprotein (HDL), low-density lipoprotein (LDL), total cholesterol (TC), triglycerides (TG), major depressive disorder (MDD), and rheumatoid arthritis (RA) from various sources (Table 1). The European-ancestry BMI GWAS is doubly corrected for genomic inflation factor,63 which induces downward-bias in the estimated SNP-heritability; we correct this bias by re-inflating the Z-scores for this GWAS by a factor of 1.24. For all traits, we restrict to SNPs with MAF > 5% in both populations to reduce noise in the LD matrices estimated from 1000 Genomes.59,60 We use PLINK74 (v.1.9) to remove redundant SNPs such that $\hat{r}_{ij}^2 < 0.95$ for all SNPs $i \neq j$ in both ancestry-matched 1000 Genomes59,60 reference panels. The resulting numbers of SNPs that were analyzed for each trait are listed in Table 1.

For each trait, we test for enrichment of population-specific/shared causal SNPs in 53 publicly available tissue-specific gene annotations,62 each of which represents a set of genes that are “specifically expressed” in a GTEx75 tissue (referred to as “SEG annotations”). We set the threshold for statistical significance to p value < 0.05/53 (Bonferroni correction for the number of tests performed per trait).

Results

Performance of PESCA in Simulations

We assessed the performance of PESCA in simulations starting from real genotypes of individuals with East Asian66 (EAS) or European72,73 (EUR) ancestry (NEAS = 10K, NEUR = 50K, M = 8,599 SNPs) (Material and Methods). First, we find that when in-sample LD from the GWAS is available, PESCA yields approximately unbiased estimates of the numbers of population-specific/shared causal SNPs (Figure 2, top panel). For example, in simulations where we randomly selected 50 EAS-specific, 50 EUR-specific, and 50 shared causal SNPs, we obtained estimates (and corresponding standard errors) of 37.8 (4.5) EAS-specific, 40.3 (4.9) EUR-specific, and 64.9 (6.3) shared causal SNPs, respectively. When external reference LD is used (in this case, from 1000 Genomes59,60), PESCA yields a slight upward bias (Figure 2, bottom panel); on the same simulated data, we obtained estimates of 48.0 (5.9) EAS-specific, 53.7 (7.44) EUR-specific, and 78.8 (7.6) shared causal SNPs.

Figure 2.


Genome-wide Estimates of the Numbers of Population-Specific/Shared Causal SNPs in Simulations

The estimates are approximately unbiased when in-sample LD is used (top) and upward-biased when external reference LD is used (bottom). For both populations, we simulate such that the product of SNP-heritability and GWAS sample size is 500. Means and standard errors were obtained from 25 independent simulations. Error bars represent ±1.96 standard errors.

We observe a slight decrease in accuracy as the effective sample size, the product of SNP-heritability and sample size ($N \times h_g^2$), decreases (Figures S1–S5). This is expected as the likelihood of the GWAS summary statistics is a function of $N \times h_g^2$ (Material and Methods); as the expected per-SNP variance at causal SNPs ($N \times h_g^2$ divided by the number of causal SNPs) decreases, GWAS summary statistics provide less information on the causal status of each SNP. Since it is often the case that the sample size of one GWAS is larger than that of the other, we perform simulations in which SNP-heritability is fixed to 0.05 in both populations, the EAS sample size is fixed to $N_{\text{EAS}} = 10^4$, and the EUR sample size is varied such that the effective sample size of the EUR GWAS is 1–5× larger than that of the EAS GWAS. We find that the genome-wide estimators are relatively robust with in-sample LD; with external estimates of LD, when effective sample size differs by a factor of 2 or more, the estimator for the number of EUR-specific causal SNPs becomes less biased while the EAS-specific and shared causal estimators become increasingly inflated (Figure S6). In addition, while it seems likely that the effect sizes of shared causal SNPs would be positively correlated across populations, the PESCA model assumes zero cross-population correlation in order to facilitate inference (Material and Methods). We therefore perform simulations under an alternative model in which EAS and EUR effect sizes at shared causal SNPs are positively correlated and find that our estimates of the genome-wide numbers of shared and population-specific causal SNPs become increasingly inflated and deflated, respectively, as the correlation increases from 0 to 1 (Figure S7).

Next, we use the estimated genome-wide proportions of population-specific/shared causal SNPs to evaluate per-SNP posterior probabilities of being causal in a single population (EAS only or EUR only) or in both populations (Material and Methods). For each of the three causal configurations of interest (EAS only, EUR only, and shared), we observe an increase in the average correlation between the per-SNP posterior probabilities and the true causal status vector for that configuration as N×hg2 increases and as the total number of causal SNPs decreases (i.e., as per-SNP causal effect sizes increase) (Figures S8 and S9). As expected, as the simulated proportion of shared causal SNPs increases, the average correlation between the posterior probabilities and true causal status vectors increases for the shared causal configuration and decreases for the population-specific causal configurations (Figures S8 and S9). Since we did not have access to individual-level genotypes sampled from an ancestral group with shorter LD blocks (e.g., African-ancestry individuals), we use the EAS and EUR LD scores of each SNP as proxies for the strength of LD in the region housing the SNP to investigate the impact of population-specific LD patterns on the per-SNP posterior probabilities. Among the true causal SNPs (shared or population-specific), the posterior probabilities are relatively invariant to the magnitude of the EAS and EUR LD scores (Figure S10). In other words, under the PESCA framework, power to detect a given true causal SNP does not depend on its LD score in either population. Restricting to a set of “high-posterior SNPs” (defined here as SNPs with posterior probability greater than some threshold t), we investigate whether PESCA systematically misclassifies SNPs based on the magnitude of their LD scores. Again, we observe that the average EAS and EUR LD scores do not vary significantly between the true and false positive classifications (Table S1). 
We then assessed whether our proposed statistics for testing for enrichment of population-specific/shared causal SNPs in functional annotations (Material and Methods) are well calibrated under the null hypothesis of no enrichment. Overall, when both population-specific and shared causal SNPs are drawn at random, the enrichment test statistics are conservative at different levels of polygenicity and GWAS power (N×hg2), irrespective of whether in-sample LD or external reference LD is used (Figures S11 and S16).

Finally, we evaluated the computational efficiency of each stage of inference. In the first stage of inference (estimating genome-wide proportions of population-specific/shared causal SNPs), the expectation step of the EM algorithm uses Gibbs sampling to efficiently sample from the posterior of the causal status vectors (Supplemental Material and Methods). We set both the number of burn-in iterations and the number of samples to 5,000 for the MCMC within the expectation step and found that the overall EM typically converged within 200 iterations (Figures S17–S19). Run-time per EM iteration increases with the number of causal SNPs (Figure S20); for example, in simulations with a total of 8,599 SNPs, when the maximum number of EM iterations was set to 200, PESCA took an average of 90 min to obtain estimates in simulations with 20 randomly selected causal SNPs and 360 min in simulations with 100 randomly selected causal SNPs. This is expected because the likelihood function being maximized is proportional to the Bayes factor of only the causal SNPs (Supplemental Material and Methods). In the second stage of inference (evaluating posterior probabilities for each SNP), we set both the number of burn-in iterations and the number of samples to 5,000 for the MCMC and, to ensure stable estimates of the posterior probability, we report the average posterior probability from 20 iterations of the Gibbs sampling procedure. The average run-time was 5 min in simulations with 20 causal SNPs and 28 min in simulations with 100 causal SNPs (Figure S20). We note that both stages of inference can be parallelized to decrease run time.

Expected Genome-wide Proportions of Shared Causal SNPs for Nine Complex Traits

We obtained publicly available GWAS summary statistics for nine (non-independent) complex traits and diseases in individuals of EAS and EUR ancestry (average NEAS = 94,621, NEUR = 103,507) (Table 1) and applied PESCA to estimate the genome-wide proportions of population-specific/shared common causal SNPs (Material and Methods). To ensure convergence, we applied 750 EM iterations for each trait (Figures S21–S23). Across the nine traits, the estimated proportions of common causal SNPs in each population (the sum of the numbers of population-specific and shared causal SNPs) are consistent with previously reported estimates of polygenicity in single populations.7,8,55,76,77 For example, we estimate that approximately 10% of common SNPs have nonzero effects on BMI in both EAS and EUR and that 2%–3% have nonzero effects on the lipids traits (Table 1). The low estimates for major depressive disorder and rheumatoid arthritis may be explained in part by their small GWAS sample sizes. While there is heterogeneity in the estimated proportions of shared causal SNPs across the nine traits, we find that most common causal SNPs are shared between the populations, consistent with findings from previous studies.33 For example, for BMI, we estimate that approximately 96% of common causal SNPs in each population are also causal in the other; for total cholesterol (TC), we estimate that 73% of common causal SNPs in EAS and 77% of those in EUR are shared by both populations (Table 1).
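
The shared percentages quoted above follow directly from the counts in Table 1: for each population, the shared count is divided by the sum of that population's specific count and the shared count. A short Python check, with a hypothetical function name and the BMI and TC counts copied from Table 1:

```python
def percent_shared(n_specific, n_shared):
    """Percentage of a population's causal SNPs that are shared."""
    return 100.0 * n_shared / (n_specific + n_shared)


# (EAS-specific, EUR-specific, shared) counts from Table 1.
table1_counts = {
    "BMI": (982, 1033, 25641),
    "TC": (1892, 1493, 5058),
}

shared_pct = {
    trait: (percent_shared(eas, shared), percent_shared(eur, shared))
    for trait, (eas, eur, shared) in table1_counts.items()
}
```

For BMI this gives roughly 96% in both populations, and for TC roughly 73% in EAS and 77% in EUR, matching the values reported in the text.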

High-Posterior SNPs Are Distributed Nearly Uniformly across the Genome

We define 1,368 regions that are approximately LD independent in both populations and estimate the posterior expected numbers of population-specific/shared causal SNPs in each region (Material and Methods). For all nine traits, high-posterior SNPs for both the population-specific and shared causal configurations are spread nearly uniformly across the genome (Figures 3 and S24–S31). For example, mean corpuscular hemoglobin (MCH) harbored, on average, 0.68 (SD 0.42) EAS-specific, 0.53 (SD 0.40) EUR-specific, and 2.19 (SD 1.46) shared high-posterior SNPs per region (Figures 3 and S29). Aggregating posterior probabilities by chromosome, we find that the posterior expected numbers of EAS-specific, EUR-specific, and shared causal SNPs per chromosome are highly correlated with chromosome length (Figures S32–S34), recapitulating previous findings based on regional SNP-heritability.55,61

Figure 3.

Distributions of the Numbers of Population-Specific/Shared Causal SNPs across 1,368 Regions that are Approximately Independent in Both EAS and EUR

Each violin plot represents the distribution of the posterior expected number of population-specific (red/green) or shared (blue) causal SNPs per region; details on how the regions were defined can be found in the Material and Methods. For a single region, the posterior expected number of SNPs in a given causal configuration is estimated by summing, across all SNPs in the region, the per-SNP posterior probabilities of having that causal configuration. The dark lines mark the means of the distributions. The traits are sorted on the x-axis by the average number of shared high-posterior SNPs per region.
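
The per-region summation described in this caption can be sketched in a few lines of Python. This is an illustration only, not PESCA's implementation: the function name `expected_causal_per_region`, the input layout, and the toy posterior values are all hypothetical.

```python
from collections import defaultdict


def expected_causal_per_region(snps):
    """Sum per-SNP posterior probabilities within each region.

    `snps` is an iterable of (region_id, posteriors) pairs, where `posteriors`
    maps a causal configuration ("eas", "eur", "shared") to the posterior
    probability that the SNP has that configuration.
    """
    totals = defaultdict(lambda: {"eas": 0.0, "eur": 0.0, "shared": 0.0})
    for region_id, posteriors in snps:
        for config, prob in posteriors.items():
            totals[region_id][config] += prob
    return dict(totals)


# Toy input (made-up numbers): three SNPs in region 1, one SNP in region 2.
toy_snps = [
    (1, {"eas": 0.10, "eur": 0.05, "shared": 0.70}),
    (1, {"eas": 0.02, "eur": 0.01, "shared": 0.90}),
    (1, {"eas": 0.30, "eur": 0.00, "shared": 0.20}),
    (2, {"eas": 0.00, "eur": 0.40, "shared": 0.10}),
]
per_region = expected_causal_per_region(toy_snps)
```

Because the result is a sum of probabilities, it is an expected count and need not be an integer.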

Distributions of High-Posterior SNPs across GWAS Risk Regions

We aggregate per-SNP posterior probabilities within GWAS risk regions that are EAS-specific, EUR-specific, or shared by both populations and find that most GWAS risk regions harbor two or more shared high-posterior SNPs (Figures 4 and S35–S39), concordant with previous findings on allelic heterogeneity of complex traits.55,78,79 On average across the 9 traits, we observe a 2.8× enrichment of shared high-posterior SNPs in population-specific GWAS risk regions relative to the genome-wide background. For example, for MCH, the EAS-specific and EUR-specific GWAS risk regions harbor an average of 3.0 (SD 1.7) and 3.3 (SD 1.5) shared high-posterior SNPs per region, respectively, whereas the average number of shared high-posterior SNPs per region across all regions is 2.0 (SD 1.3) (Figure 4). While BMI, the blood traits (MCH and MCV), and rheumatoid arthritis have similar numbers of EAS-specific and EUR-specific high-posterior SNPs in their population-specific GWAS risk regions, the lipids traits (HDL, LDL, total cholesterol, and triglycerides) have significantly more EAS-specific high-posterior SNPs in all GWAS risk regions (Figures 4 and S35–S39).

Figure 4.

Distributions of the Numbers of Population-Specific/Shared Causal Variants at GWAS Risk Regions for Mean Corpuscular Hemoglobin (MCH)

Each violin plot represents the distribution of the posterior expected number of population-specific (red/green) or shared (blue) causal SNPs at regions with significant associations ($p_{\mathrm{GWAS}} < 5 \times 10^{-8}$) in EAS GWAS only, EUR GWAS only, both EAS and EUR, and neither GWAS. The dark lines mark the means of the distributions.

For each causal configuration (EAS-specific, EUR-specific, or shared), we examine the effect sizes of high-posterior SNPs (posterior probability > 0.8) in EAS and EUR (Figure 5). Across the 9 traits, the majority of EAS-specific high-posterior SNPs are nominally significant ($p_{\mathrm{GWAS}} < 5 \times 10^{-6}$) either in the EAS GWAS only or in both GWASs. While five EUR-specific high-posterior SNPs are nominally significant in only the EAS GWAS, the majority are nominally significant either in the EUR GWAS only or in both GWASs. We observe strong correlations between the effect sizes in EAS and EUR for all three sets of high-posterior SNPs (Pearson $r^2$ of 0.79 [EAS-specific], 0.73 [EUR-specific], and 0.80 [shared]) that are driven by SNPs that are nominally significant in both GWASs (Figure 5). Taken together, these results suggest that most population-specific GWAS risk regions harbor shared causal variants that are undetected in the other population due to heterogeneity in LD structures, allele frequencies, and/or GWAS sample sizes.
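
The comparison above amounts to thresholding SNPs on posterior probability and then computing a squared Pearson correlation between the two populations' marginal effect sizes. A minimal sketch in Python follows, assuming a simple list-of-dicts layout; the helper names, field names, and toy values are hypothetical and are not taken from the PESCA software or from the data analyzed here.

```python
def pearson_r2(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy ** 2 / (sxx * syy)


def high_posterior(snps, threshold=0.8):
    """Keep SNPs whose posterior probability exceeds the threshold."""
    return [s for s in snps if s["posterior"] > threshold]


# Toy SNPs (made-up posteriors and marginal effect sizes).
toy_snps = [
    {"posterior": 0.95, "beta_eas": 0.020, "beta_eur": 0.018},
    {"posterior": 0.85, "beta_eas": -0.010, "beta_eur": -0.012},
    {"posterior": 0.90, "beta_eas": 0.030, "beta_eur": 0.025},
    {"posterior": 0.50, "beta_eas": 0.050, "beta_eur": -0.050},  # filtered out
    {"posterior": 0.81, "beta_eas": -0.025, "beta_eur": -0.020},
]
selected = high_posterior(toy_snps)
r2 = pearson_r2(
    [s["beta_eas"] for s in selected],
    [s["beta_eur"] for s in selected],
)
```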

Figure 5.

Marginal Regression Coefficients of High-Posterior SNPs for Nine Complex Traits

Each plot corresponds to one of the three causal configurations of interest: EAS-specific (A), EUR-specific (B), and shared (C). Each point represents a SNP with posterior probability > 0.8 for a single trait. The x-axis and y-axis mark the estimated marginal effect sizes in EAS and EUR, respectively. The colors indicate whether the SNP is nominally significant ($p_{\mathrm{GWAS}} < 5 \times 10^{-6}$) in both GWASs (purple), the EAS GWAS only (orange), the EUR GWAS only (green), or in neither GWAS (gray). The gray band marks the 95% confidence interval of the regression line.

Enrichment of High-Posterior SNPs near Genes Expressed in Trait-Relevant Tissues

Motivated by recent work that found enrichment of SNP-heritability in regions near genes that are “specifically expressed” in trait-relevant tissues and cell types (referred to as “SEG annotations”), we tested for enrichments of population-specific and shared causal SNPs in the same 53 tissue-specific SEG annotations.62 For a given causal configuration, the enrichment of causal SNPs in an annotation is defined as the ratio between the posterior and prior expected numbers of causal SNPs in the annotation (Material and Methods). For 8 of the 9 traits, we find significant enrichment of shared high-posterior SNPs in at least one SEG annotation (p value < 0.05/53 to correct for 53 tests per trait) (Figures S40–S44). All SEG annotations with significant enrichments of population-specific high-posterior SNPs are also enriched with shared high-posterior SNPs for the same trait, providing additional evidence that many signatures of population-specific genetic architecture are induced by population-specific LD and allele frequencies rather than distinct genetic etiologies. We do not find enrichment of any high-posterior SNPs in any SEG annotation for major depressive disorder (MDD) (Figure S44), which could be due to low GWAS sample sizes (Table 1). Finally, for each SEG annotation, we obtain a meta-analyzed transethnic SNP-heritability enrichment by computing the inverse-variance weighted average of the EAS and EUR SNP-heritability enrichments (estimated separately using stratified LD score regression13,16). We observe a strong correlation between the meta-analyzed SNP-heritability enrichments and the enrichments of shared high-posterior SNPs (Figure 6), suggesting that SNP-heritability enrichments are largely driven by many low-effect SNPs rather than a small number of high-effect SNPs.
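
The three quantities used in this paragraph (the posterior-to-prior enrichment ratio, the Bonferroni threshold, and the inverse-variance weighted average) can be sketched as follows. This is a hedged illustration rather than the procedure in Material and Methods: in particular, it assumes the prior expected number of causal SNPs in an annotation is the genome-wide proportion multiplied by the number of SNPs in the annotation, and all function names and toy numbers are hypothetical.

```python
def enrichment(posterior_probs, prior_prob):
    """Ratio of posterior to prior expected numbers of causal SNPs in an annotation.

    Assumption: the prior expected number is `prior_prob` (a genome-wide
    proportion) times the number of SNPs in the annotation.
    """
    posterior_expected = sum(posterior_probs)
    prior_expected = prior_prob * len(posterior_probs)
    return posterior_expected / prior_expected


def inverse_variance_weighted(estimates, standard_errors):
    """Inverse-variance weighted average of estimates and its standard error."""
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    mean = sum(w * e for w, e in zip(weights, estimates)) / total_weight
    return mean, (1.0 / total_weight) ** 0.5


# Bonferroni-corrected significance threshold for 53 SEG annotations per trait.
BONFERRONI_ALPHA = 0.05 / 53

# Toy usage (made-up numbers): EAS and EUR enrichment estimates with their SEs.
meta_enrichment, meta_se = inverse_variance_weighted([2.0, 4.0], [0.5, 1.0])
```

The weighted average is pulled toward the estimate with the smaller standard error, so the better-powered GWAS contributes more to the meta-analyzed enrichment.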

Figure 6.

Enrichments of Shared High-Posterior SNPs in 53 Tissue-Specific Functional Categories are Highly Correlated with SNP-Heritability Enrichments

Each point is a trait-tissue pair; each tissue-specific functional category (SEG annotation) is a set of genes that are “specifically expressed” in one of 53 GTEx tissues. The x-axis is the estimated enrichment of shared high-posterior SNPs in the SEG annotation from PESCA. The y-axis is the meta-analyzed transethnic SNP-heritability enrichment of the SEG annotation, defined as the inverse-variance weighted average of the EAS and EUR SNP-heritability enrichments (estimated separately using stratified LD score regression). The points are colored by whether the trait has a statistically significant enrichment of shared high-posterior SNPs in the corresponding SEG annotation (FDR < 0.1). The gray band marks the 95% confidence interval of the regression line. Enrichment estimates and standard errors for each trait-tissue pair can be found in Figures S40–S44.

Discussion

We have presented PESCA, a method for estimating the genome-wide proportions of SNPs with nonzero effects in a single population (population-specific) or in two populations (shared) from GWAS summary statistics and estimates of LD. We applied PESCA to EAS and EUR GWAS summary statistics for nine complex traits and find that, while the lipids traits have significantly more EAS-specific common causal SNPs compared to the remaining traits, the majority of common causal SNPs are shared by both populations. Regions that harbor statistically significant GWAS associations for one population are enriched with SNPs with high-posterior probability of being causal in both populations. Moreover, high-posterior SNPs (posterior probability > 0.8 for any causal configuration) have highly correlated effect sizes in EAS and EUR, recapitulating findings of previous studies.33 For all traits except MDD, we identify tissue-specific SEG annotations62 enriched with shared high-posterior SNPs and observe that all SEG annotations enriched with population-specific high-posterior SNPs are a subset of those enriched with shared high-posterior SNPs. Taken together, our results indicate that most population-specific GWAS risk regions contain shared common causal SNPs that are undetected in the second population due to differences in LD or allele frequencies. This suggests that localizing shared components of genetic architecture and explicitly correcting for population-specific LD and allele frequencies may help improve transferability of results from well-powered European-ancestry studies to other understudied populations. Based on the simulation results in Figure S1 (in which 100% of causal SNPs are shared) and our estimates of SNP-heritability for the traits in Table 1, we recommend applying PESCA to summary statistics for which the effective per-SNP sample size, $N \times h_g^2$ divided by the number of causal SNPs, is at least 3 for both GWASs. For a typical quantitative trait (e.g., Table 1), this corresponds to a total effective sample size of approximately $N \times h_g^2 > 10{,}000$.
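
The sample-size recommendation above can be checked with a few lines of arithmetic. The Python sketch below is illustrative only: the function names and the toy values for sample size, SNP-heritability, and number of causal SNPs are hypothetical and are not taken from Table 1.

```python
def effective_per_snp_sample_size(n, h2g, n_causal):
    """N x h_g^2 divided by the number of causal SNPs."""
    return n * h2g / n_causal


def meets_recommendation(gwas_a, gwas_b, threshold=3.0):
    """True if both GWASs reach the recommended effective per-SNP sample size.

    Each GWAS is a dict with keys "n" (sample size), "h2g" (SNP-heritability),
    and "n_causal" (number of causal SNPs).
    """
    return all(
        effective_per_snp_sample_size(g["n"], g["h2g"], g["n_causal"]) >= threshold
        for g in (gwas_a, gwas_b)
    )


# Toy inputs (made-up values for two hypothetical GWASs of the same trait).
gwas_eas = {"n": 100_000, "h2g": 0.20, "n_causal": 5_000}  # 20,000 / 5,000 = 4.0
gwas_eur = {"n": 90_000, "h2g": 0.15, "n_causal": 5_000}   # 13,500 / 5,000 = 2.7
ok = meets_recommendation(gwas_eas, gwas_eur)
```

In this toy case the check fails because the second GWAS falls below the threshold, even though the first GWAS exceeds it; the recommendation applies to both GWASs jointly.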

We conclude by discussing the caveats and limitations of our analyses. First, the estimated proportions of causal SNPs must be interpreted with caution as they can be influenced by gene-environment interactions. For example, if a SNP has a nonzero effect on a trait only in the presence of environmental factors that are specific to EAS-ancestry individuals, PESCA will interpret that SNP as an EAS-specific causal SNP even though it would have a nonzero effect in EUR-ancestry individuals in the presence of the same environmental factors.

Second, we chose to analyze a set of traits for which EAS and EUR GWAS summary statistics were publicly available. Since most publicly available summary statistics of large-scale GWASs are meta-analyses of smaller studies, in-sample LD is often unavailable. While PESCA with in-sample LD is relatively robust to differential GWAS power, with external LD, performance decreases when the GWAS effective sample sizes differ by more than a factor of 2. We note, however, that for the real traits analyzed in this work, effective sample size differs by a maximum factor of 2 (mean corpuscular hemoglobin; Table 1). Additionally, PESCA currently cannot be applied to admixed populations if in-sample LD is unavailable. An extension of PESCA to properly account for external/noisy estimates of LD would thus increase its utility; we defer a thorough investigation of this to future work. In parallel, in light of ongoing efforts at several institutions to establish biobanks,72,73,80–82 we believe that well-powered GWASs (with in-sample LD) will become increasingly available for diverse and admixed populations. Another challenge is that many publicly available summary statistics were computed from fixed-effect meta-analyses or linear mixed models. Since the PESCA model is defined with respect to GWAS marginal effects estimated by ordinary least-squares (OLS) regression, it is unclear whether PESCA is sensitive to non-OLS association statistics, which have different statistical properties. We defer a thorough investigation of this to future work.

Third, we restricted our analyses to SNPs with MAF > 5% in both populations to reduce noise in the LD matrices estimated from external reference panels. Consequently, the estimates we report in this work do not capture effects of low frequency or rare variants that are not well-tagged by common SNPs. Furthermore, since most common variants are shared across continental populations and rarer variants tend to localize among closely related populations,60 our study design undersamples population-specific causal variants. We note, however, that lower MAF thresholds can be used if in-sample LD is available. We also note that for the purpose of improving transferability of polygenic risk scores (PRSs) across populations, prediction accuracy depends largely on the accuracy of the PRS weights at common SNPs (the average per-SNP contribution to total SNP-heritability is larger for common SNPs than for low frequency or rare variants11).

Finally, PESCA can be sensitive to model misspecification. For computational efficiency, PESCA relies on having regions that are approximately LD independent in both populations; if there is LD leakage between regions, the estimated proportions of causal SNPs will be biased. We therefore recommend defining LD blocks for each pair of populations one analyzes. Similarly, to facilitate inference, PESCA does not explicitly model cross-population correlations of effect sizes at shared causal variants. We conjecture that modeling these correlations can further improve performance.

Declaration of Interests

The authors declare no competing interests.

Acknowledgments

We are grateful to Kangcheng Hou, Alkes L. Price, and Steven Gazal for helpful discussions that greatly improved the quality of this manuscript. We also thank Na Cai, Sriram Sankararaman, Jonathan Flint, and the UK Biobank (application #33297) for providing resources that made this work possible. This work was funded in part by the National Institutes of Health (NIH) under awards R01HG009120, R01HG006399, R01MH115676, U01CA194393, T32NS048004, T32MH073526, and T32HG002536.

Published: May 21, 2020

Footnotes

Supplemental Data can be found online at https://doi.org/10.1016/j.ajhg.2020.04.012.

Contributor Information

Huwenbo Shi, Email: hshi@hsph.harvard.edu.

Kathryn S. Burch, Email: kathrynburch@ucla.edu.

Web Resources

Supplemental Data

Document S1. Figures S1–S44, Table S1, and Supplemental Material and Methods
Document S2. Article plus Supplemental Information

References

  • 1.Campbell M.C., Tishkoff S.A. The evolution of human genetic and phenotypic variation in Africa. Curr. Biol. 2010;20:R166–R173. doi: 10.1016/j.cub.2009.11.050. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 2.Cavalli-Sforza L.L., Menozzi P., Piazza A. Demic expansions and human evolution. Science. 1993;259:639–646. doi: 10.1126/science.8430313. [DOI] [PubMed] [Google Scholar]
  • 3.Pritchard J.K., Pickrell J.K., Coop G. The genetics of human adaptation: hard sweeps, soft sweeps, and polygenic adaptation. Curr. Biol. 2010;20:R208–R215. doi: 10.1016/j.cub.2009.11.055. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4.Laland K.N., Odling-Smee J., Myles S. How culture shaped the human genome: bringing genetics and the human sciences together. Nat. Rev. Genet. 2010;11:137–148. doi: 10.1038/nrg2734. [DOI] [PubMed] [Google Scholar]
  • 5.Martin A.R., Gignoux C.R., Walters R.K., Wojcik G.L., Neale B.M., Gravel S., Daly M.J., Bustamante C.D., Kenny E.E. Human demographic history impacts genetic risk prediction across diverse populations. Am. J. Hum. Genet. 2017;100:635–649. doi: 10.1016/j.ajhg.2017.03.004. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 6.Timpson N.J., Greenwood C.M.T., Soranzo N., Lawson D.J., Richards J.B. Genetic architecture: the shape of the genetic contribution to human traits and disease. Nat. Rev. Genet. 2018;19:110–124. doi: 10.1038/nrg.2017.101. [DOI] [PubMed] [Google Scholar]
  • 7.O’Connor L.J., Schoech A.P., Hormozdiari F., Gazal S., Patterson N., Price A.L. Extreme Polygenicity of Complex Traits Is Explained by Negative Selection. Am. J. Hum. Genet. 2019;105:456–476. doi: 10.1016/j.ajhg.2019.07.003. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 8.Zeng J., de Vlaming R., Wu Y., Robinson M.R., Lloyd-Jones L.R., Yengo L., Yap C.X., Xue A., Sidorenko J., McRae A.F. Signatures of negative selection in the genetic architecture of human complex traits. Nat. Genet. 2018;50:746–753. doi: 10.1038/s41588-018-0101-4. [DOI] [PubMed] [Google Scholar]
  • 9.Zhang Y., Qi G., Park J.-H., Chatterjee N. Estimation of complex effect-size distributions using summary-level statistics from genome-wide association studies across 32 complex traits. Nat. Genet. 2018;50:1318–1326. doi: 10.1038/s41588-018-0193-x. [DOI] [PubMed] [Google Scholar]
  • 10.Zhu X., Stephens M. Large-scale genome-wide enrichment analyses identify new trait-associated genes and pathways across 31 human phenotypes. Nat. Commun. 2018;9:4361. doi: 10.1038/s41467-018-06805-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 11.Yang J., Benyamin B., McEvoy B.P., Gordon S., Henders A.K., Nyholt D.R., Madden P.A., Heath A.C., Martin N.G., Montgomery G.W. Common SNPs explain a large proportion of the heritability for human height. Nat. Genet. 2010;42:565–569. doi: 10.1038/ng.608. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 12.Schoech A.P., Jordan D.M., Loh P.-R., Gazal S., O’Connor L.J., Balick D.J., Palamara P.F., Finucane H.K., Sunyaev S.R., Price A.L. Quantification of frequency-dependent genetic architectures in 25 UK Biobank traits reveals action of negative selection. Nat. Commun. 2019;10:790. doi: 10.1038/s41467-019-08424-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 13.Gazal S., Finucane H.K., Furlotte N.A., Loh P.-R., Palamara P.F., Liu X., Schoech A., Bulik-Sullivan B., Neale B.M., Gusev A., Price A.L. Linkage disequilibrium-dependent architecture of human complex traits shows action of negative selection. Nat. Genet. 2017;49:1421–1427. doi: 10.1038/ng.3954. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 14.Speed D., Hemani G., Johnson M.R., Balding D.J. Improved heritability estimation from genome-wide SNPs. Am. J. Hum. Genet. 2012;91:1011–1021. doi: 10.1016/j.ajhg.2012.10.010. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15.Speed D., Cai N., Johnson M.R., Nejentsev S., Balding D.J., UCLEB Consortium Reevaluation of SNP heritability in complex human traits. Nat. Genet. 2017;49:986–992. doi: 10.1038/ng.3865. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 16.Finucane H.K., Bulik-Sullivan B., Gusev A., Trynka G., Reshef Y., Loh P.-R., Anttila V., Xu H., Zang C., Farh K., ReproGen Consortium. Schizophrenia Working Group of the Psychiatric Genomics Consortium. RACI Consortium Partitioning heritability by functional annotation using genome-wide association summary statistics. Nat. Genet. 2015;47:1228–1235. doi: 10.1038/ng.3404. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 17.Visscher P.M., Wray N.R., Zhang Q., Sklar P., McCarthy M.I., Brown M.A., Yang J. 10 Years of GWAS Discovery: Biology, Function, and Translation. Am. J. Hum. Genet. 2017;101:5–22. doi: 10.1016/j.ajhg.2017.06.005. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 18.Popejoy A.B., Fullerton S.M. Genomics is failing on diversity. Nature. 2016;538:161–164. doi: 10.1038/538161a. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 19.Rosenberg N.A., Huang L., Jewett E.M., Szpiech Z.A., Jankovic I., Boehnke M. Genome-wide association studies in diverse populations. Nat. Rev. Genet. 2010;11:356–366. doi: 10.1038/nrg2760. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 20.Visscher P.M., Brown M.A., McCarthy M.I., Yang J. Five years of GWAS discovery. Am. J. Hum. Genet. 2012;90:7–24. doi: 10.1016/j.ajhg.2011.11.029. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 21.Kanai M., Akiyama M., Takahashi A., Matoba N., Momozawa Y., Ikeda M., Iwata N., Ikegawa S., Hirata M., Matsuda K. Genetic analysis of quantitative traits in the Japanese population links cell types to complex human diseases. Nat. Genet. 2018;50:390–400. doi: 10.1038/s41588-018-0047-6. [DOI] [PubMed] [Google Scholar]
  • 22.Akiyama M., Okada Y., Kanai M., Takahashi A., Momozawa Y., Ikeda M., Iwata N., Ikegawa S., Hirata M., Matsuda K. Genome-wide association study identifies 112 new loci for body mass index in the Japanese population. Nat. Genet. 2017;49:1458–1467. doi: 10.1038/ng.3951. [DOI] [PubMed] [Google Scholar]
  • 23.Li Z., Chen J., Yu H., He L., Xu Y., Zhang D., Yi Q., Li C., Li X., Shen J. Genome-wide association analysis identifies 30 new susceptibility loci for schizophrenia. Nat. Genet. 2017;49:1576–1583. doi: 10.1038/ng.3973. [DOI] [PubMed] [Google Scholar]
  • 24.Liu J.Z., van Sommeren S., Huang H., Ng S.C., Alberts R., Takahashi A., Ripke S., Lee J.C., Jostins L., Shah T., International Multiple Sclerosis Genetics Consortium. International IBD Genetics Consortium Association analyses identify 38 susceptibility loci for inflammatory bowel disease and highlight shared genetic risk across populations. Nat. Genet. 2015;47:979–986. doi: 10.1038/ng.3359. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 25.Ng M.C.Y., Shriner D., Chen B.H., Li J., Chen W.-M., Guo X., Liu J., Bielinski S.J., Yanek L.R., Nalls M.A., FIND Consortium. eMERGE Consortium. DIAGRAM Consortium. MuTHER Consortium. MEta-analysis of type 2 DIabetes in African Americans Consortium Meta-analysis of genome-wide association studies in African Americans provides insights into the genetic architecture of type 2 diabetes. PLoS Genet. 2014;10:e1004517. doi: 10.1371/journal.pgen.1004517. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 26.Franceschini N., Fox E., Zhang Z., Edwards T.L., Nalls M.A., Sung Y.J., Tayo B.O., Sun Y.V., Gottesman O., Adeyemo A., Asian Genetic Epidemiology Network Consortium Genome-wide association analysis of blood-pressure traits in African-ancestry individuals reveals common associated genes in African and non-African populations. Am. J. Hum. Genet. 2013;93:545–554. doi: 10.1016/j.ajhg.2013.07.010. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 27.Schick U.M., Jain D., Hodonsky C.J., Morrison J.V., Davis J.P., Brown L., Sofer T., Conomos M.P., Schurmann C., McHugh C.P. Genome-wide association study of platelet count identifies ancestry-specific loci in Hispanic/Latino Americans. Am. J. Hum. Genet. 2016;98:229–242. doi: 10.1016/j.ajhg.2015.12.003. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 28.Kichaev G., Pasaniuc B. Leveraging Functional-Annotation Data in Trans-ethnic Fine-Mapping Studies. Am. J. Hum. Genet. 2015;97:260–271. doi: 10.1016/j.ajhg.2015.06.007. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 29.Mancuso N., Rohland N., Rand K.A., Tandon A., Allen A., Quinque D., Mallick S., Li H., Stram A., Sheng X., PRACTICAL consortium The contribution of rare variation to prostate cancer heritability. Nat. Genet. 2016;48:30–35. doi: 10.1038/ng.3446. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 30.Brown B.C., Ye C.J., Price A.L., Zaitlen N., Asian Genetic Epidemiology Network Type 2 Diabetes Consortium Transethnic genetic-correlation estimates from summary statistics. Am. J. Hum. Genet. 2016;99:76–88. doi: 10.1016/j.ajhg.2016.05.001. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 31.Morris A.P. Transethnic meta-analysis of genomewide association studies. Genet. Epidemiol. 2011;35:809–822. doi: 10.1002/gepi.20630. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 32.Lam M., Chen C.-Y., Li Z., Martin A.R., Bryois J., Ma X., Gaspar H., Ikeda M., Benyamin B., Brown B.C., Schizophrenia Working Group of the Psychiatric Genomics Consortium. Indonesia Schizophrenia Consortium. Genetic REsearch on schizophreniA neTwork-China and the Netherlands (GREAT-CN) Comparative genetic architectures of schizophrenia in East Asian and European populations. Nat. Genet. 2019;51:1670–1678. doi: 10.1038/s41588-019-0512-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 33.Marigorta U.M., Navarro A. High trans-ethnic replicability of GWAS results implies common causal variants. PLoS Genet. 2013;9:e1003566. doi: 10.1371/journal.pgen.1003566. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 34.Kraft P., Zeggini E., Ioannidis J.P.A. Replication in genome-wide association studies. Stat. Sci. 2009;24:561–573. doi: 10.1214/09-STS290. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 35.Li Y.R., Keating B.J. Trans-ethnic genome-wide association studies: advantages and challenges of mapping in diverse populations. Genome Med. 2014;6:91. doi: 10.1186/s13073-014-0091-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 36.Okada Y., Wu D., Trynka G., Raj T., Terao C., Ikari K., Kochi Y., Ohmura K., Suzuki A., Yoshida S., RACI consortium. GARNET consortium Genetics of rheumatoid arthritis contributes to biology and drug discovery. Nature. 2014;506:376–381. doi: 10.1038/nature12873. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 37.Wu Y., Waite L.L., Jackson A.U., Sheu W.H.-H., Buyske S., Absher D., Arnett D.K., Boerwinkle E., Bonnycastle L.L., Carty C.L. Trans-ethnic fine-mapping of lipid loci identifies population-specific signals and allelic heterogeneity that increases the trait variance explained. PLoS Genet. 2013;9:e1003379. doi: 10.1371/journal.pgen.1003379. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 38.Asimit J.L., Rainbow D.B., Fortune M.D., Grinberg N.F., Wicker L.S., Wallace C. Stochastic search and joint fine-mapping increases accuracy and identifies previously unreported associations in immune-mediated diseases. Nat. Commun. 2019;10:3216. doi: 10.1038/s41467-019-11271-0. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 39.Zaitlen N., Paşaniuc B., Gur T., Ziv E., Halperin E. Leveraging genetic variability across populations for the identification of causal variants. Am. J. Hum. Genet. 2010;86:23–33. doi: 10.1016/j.ajhg.2009.11.016. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 40.Wen X., Luca F., Pique-Regi R. Cross-population joint analysis of eQTLs: fine mapping and functional annotation. PLoS Genet. 2015;11:e1005176. doi: 10.1371/journal.pgen.1005176. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 41.Vilhjálmsson B.J., Yang J., Finucane H.K., Gusev A., Lindström S., Ripke S., Genovese G., Loh P.-R., Bhatia G., Do R., Schizophrenia Working Group of the Psychiatric Genomics Consortium, Discovery, Biology, and Risk of Inherited Variants in Breast Cancer (DRIVE) study Modeling linkage disequilibrium increases accuracy of polygenic risk scores. Am. J. Hum. Genet. 2015;97:576–592. doi: 10.1016/j.ajhg.2015.09.001. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 42.Márquez-Luna C., Loh P.-R., Price A.L., South Asian Type 2 Diabetes (SAT2D) Consortium. SIGMA Type 2 Diabetes Consortium Multiethnic polygenic risk scores improve risk prediction in diverse populations. Genet. Epidemiol. 2017;41:811–823. doi: 10.1002/gepi.22083. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 43.Lewis C.M., Vassos E. Prospects for using risk scores in polygenic medicine. Genome Med. 2017;9:96. doi: 10.1186/s13073-017-0489-y. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 44.Curtis D. Polygenic risk score for schizophrenia is more strongly associated with ancestry than with schizophrenia. Psychiatr. Genet. 2018;28:85–89. doi: 10.1097/YPG.0000000000000206. [DOI] [PubMed] [Google Scholar]
  • 45.Chen C.-Y., Han J., Hunter D.J., Kraft P., Price A.L. Explicit Modeling of Ancestry Improves Polygenic Risk Scores and BLUP Prediction. Genet. Epidemiol. 2015;39:427–438. doi: 10.1002/gepi.21906. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 46.Wojcik G.L., Graff M., Nishimura K.K., Tao R., Haessler J., Gignoux C.R., Highland H.M., Patel Y.M., Sorokin E.P., Avery C.L. Genetic analyses of diverse populations improves discovery for complex traits. Nature. 2019;570:514–518. doi: 10.1038/s41586-019-1310-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 47.Sirugo G., Williams S.M., Tishkoff S.A. The Missing Diversity in Human Genetic Studies. Cell. 2019;177:26–31. doi: 10.1016/j.cell.2019.02.048. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 48.Gurdasani D., Barroso I., Zeggini E., Sandhu M.S. Genomics of disease risk in globally diverse populations. Nat. Rev. Genet. 2019;20:520–535. doi: 10.1038/s41576-019-0144-0. [DOI] [PubMed] [Google Scholar]
  • 49.Martin A.R., Kanai M., Kamatani Y., Okada Y., Neale B.M., Daly M.J. Clinical use of current polygenic risk scores may exacerbate health disparities. Nat. Genet. 2019;51:584–591. doi: 10.1038/s41588-019-0379-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 50.Ikeda M., Takahashi A., Kamatani Y., Momozawa Y., Saito T., Kondo K., Shimasaki A., Kawase K., Sakusabe T., Iwayama Y. Genome-Wide Association Study Detected Novel Susceptibility Genes for Schizophrenia and Shared Trans-Populations/Diseases Genetic Effect. Schizophr. Bull. 2019;45:824–834. doi: 10.1093/schbul/sby140. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 51.Shi H., Gazal S., Kanai M., Koch E.M., Schoech A.P., Kim S.S., Luo Y., Amariuta T., Okada Y., Raychaudhuri S. Population-specific causal disease effect sizes in functionally important regions impacted by selection. bioRxiv. 2019 doi: 10.1101/803452. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 52.Galinsky K.J., Reshef Y.A., Finucane H.K., Loh P.-R., Zaitlen N., Patterson N.J., Brown B.C., Price A.L. Estimating cross-population genetic correlations of causal effect sizes. Genet. Epidemiol. 2019;43:180–188. doi: 10.1002/gepi.22173. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 53.Guo J., Bakshi A., Wang Y., Jiang L., Yengo L., Goddard M.E., Visscher P.M., Yang J. Quantifying genetic heterogeneity between continental populations for human height and body mass index. bioRxiv. 2019 doi: 10.1101/839373. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 54.Bulik-Sullivan B.K., Loh P.-R., Finucane H.K., Ripke S., Yang J., Patterson N., Daly M.J., Price A.L., Neale B.M., Schizophrenia Working Group of the Psychiatric Genomics Consortium LD Score regression distinguishes confounding from polygenicity in genome-wide association studies. Nat. Genet. 2015;47:291–295. doi: 10.1038/ng.3211. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 55.Shi H., Kichaev G., Pasaniuc B. Contrasting the Genetic Architecture of 30 Complex Traits from Summary Association Data. Am. J. Hum. Genet. 2016;99:139–153. doi: 10.1016/j.ajhg.2016.05.013. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 56.Hou K., Burch K.S., Majumdar A., Shi H., Mancuso N., Wu Y., Sankararaman S., Pasaniuc B. Accurate estimation of SNP-heritability from biobank-scale data irrespective of genetic architecture. Nat. Genet. 2019;51:1244–1251. doi: 10.1038/s41588-019-0465-0. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 57.Farh K.K.-H., Marson A., Zhu J., Kleinewietfeld M., Housley W.J., Beik S., Shoresh N., Whitton H., Ryan R.J.H., Shishkin A.A. Genetic and epigenetic fine mapping of causal autoimmune disease variants. Nature. 2015;518:337–343. doi: 10.1038/nature13835. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 58.Huang H., Fang M., Jostins L., Umićević Mirkov M., Boucher G., Anderson C.A., Andersen V., Cleynen I., Cortes A., Crins F., International Inflammatory Bowel Disease Genetics Consortium Fine-mapping inflammatory bowel disease loci to single-variant resolution. Nature. 2017;547:173–178. doi: 10.1038/nature22969. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 59.Abecasis G.R., Auton A., Brooks L.D., DePristo M.A., Durbin R.M., Handsaker R.E., Kang H.M., Marth G.T., McVean G.A., 1000 Genomes Project Consortium An integrated map of genetic variation from 1,092 human genomes. Nature. 2012;491:56–65. doi: 10.1038/nature11632. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 60.Auton A., Brooks L.D., Durbin R.M., Garrison E.P., Kang H.M., Korbel J.O., Marchini J.L., McCarthy S., McVean G.A., Abecasis G.R., 1000 Genomes Project Consortium A global reference for human genetic variation. Nature. 2015;526:68–74. doi: 10.1038/nature15393. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 61.Loh P.-R., Bhatia G., Gusev A., Finucane H.K., Bulik-Sullivan B.K., Pollack S.J., de Candia T.R., Lee S.H., Wray N.R., Kendler K.S., Schizophrenia Working Group of Psychiatric Genomics Consortium Contrasting genetic architectures of schizophrenia and other complex diseases using fast variance-components analysis. Nat. Genet. 2015;47:1385–1392. doi: 10.1038/ng.3431. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 62.Finucane H.K., Reshef Y.A., Anttila V., Slowikowski K., Gusev A., Byrnes A., Gazal S., Loh P.-R., Lareau C., Shoresh N., Brainstorm Consortium Heritability enrichment of specifically expressed genes identifies disease-relevant tissues and cell types. Nat. Genet. 2018;50:621–629. doi: 10.1038/s41588-018-0081-4.
  • 63.Locke A.E., Kahali B., Berndt S.I., Justice A.E., Pers T.H., Day F.R., Powell C., Vedantam S., Buchkovich M.L., Yang J., LifeLines Cohort Study. ADIPOGen Consortium. AGEN-BMI Working Group. CARDIOGRAMplusC4D Consortium. CKDGen Consortium. GLGC. ICBP. MAGIC Investigators. MuTHER Consortium. MIGen Consortium. PAGE Consortium. ReproGen Consortium. GENIE Consortium. International Endogene Consortium Genetic studies of body mass index yield new insights for obesity biology. Nature. 2015;518:197–206. doi: 10.1038/nature14177.
  • 64.Astle W.J., Elding H., Jiang T., Allen D., Ruklisa D., Mann A.L., Mead D., Bouman H., Riveros-Mckay F., Kostadima M.A. The allelic landscape of human blood cell trait variation and links to common complex disease. Cell. 2016;167:1415–1429.e19. doi: 10.1016/j.cell.2016.10.042.
  • 65.Teslovich T.M., Musunuru K., Smith A.V., Edmondson A.C., Stylianou I.M., Koseki M., Pirruccello J.P., Ripatti S., Chasman D.I., Willer C.J. Biological, clinical and population relevance of 95 loci for blood lipids. Nature. 2010;466:707–713. doi: 10.1038/nature09270.
  • 66.Cai N., Bigdeli T.B., Kretzschmar W., Li Y., Liang J., Song L., Hu J., Li Q., Jin W., Hu Z., CONVERGE consortium Sparse whole-genome sequencing identifies two loci for major depressive disorder. Nature. 2015;523:588–591. doi: 10.1038/nature14659.
  • 67.Wray N.R., Ripke S., Matthiesen M., Trzaskowski M., Byrne E.M., Abdellaoui A., Adams M.J., Agerbo E., Air T.M., Andlauer T.M.F. Genome-wide association analyses identify 44 risk variants and refine the genetic architecture of major depression. Nat. Genet. 2018;50:668–681. doi: 10.1038/s41588-018-0090-3.
  • 68.Dai B., Ding S., Wahba G. Multivariate Bernoulli distribution. Bernoulli. 2013;19:1465–1483.
  • 69.Shi H., Pasaniuc B., Lange K.L. A multivariate Bernoulli model to predict DNaseI hypersensitivity status from haplotype data. Bioinformatics. 2015;31:3514–3521. doi: 10.1093/bioinformatics/btv397.
  • 70.Berisa T., Pickrell J.K. Approximately independent linkage disequilibrium blocks in human populations. Bioinformatics. 2016;32:283–285. doi: 10.1093/bioinformatics/btv546.
  • 71.Miller R.G. Jackknifing variances. Ann. Math. Stat. 1968;39:567–582.
  • 72.Sudlow C., Gallacher J., Allen N., Beral V., Burton P., Danesh J., Downey P., Elliott P., Green J., Landray M. UK Biobank: an open access resource for identifying the causes of a wide range of complex diseases of middle and old age. PLoS Med. 2015;12:e1001779. doi: 10.1371/journal.pmed.1001779.
  • 73.Bycroft C., Freeman C., Petkova D., Band G., Elliott L.T., Sharp K., Motyer A., Vukcevic D., Delaneau O., O’Connell J. The UK Biobank resource with deep phenotyping and genomic data. Nature. 2018;562:203–209. doi: 10.1038/s41586-018-0579-z.
  • 74.Purcell S., Neale B., Todd-Brown K., Thomas L., Ferreira M.A.R., Bender D., Maller J., Sklar P., de Bakker P.I.W., Daly M.J., Sham P.C. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 2007;81:559–575. doi: 10.1086/519795.
  • 75.Lonsdale J., Thomas J., Salvatore M., Phillips R., Lo E., Shad S., Hasz R., Walters G., Garcia F., Young N., GTEx Consortium The Genotype-Tissue Expression (GTEx) project. Nat. Genet. 2013;45:580–585. doi: 10.1038/ng.2653.
  • 76.Johnson R., Shi H., Pasaniuc B., Sankararaman S. A unifying framework for joint trait analysis under a non-infinitesimal model. Bioinformatics. 2018;34:i195–i201. doi: 10.1093/bioinformatics/bty254.
  • 77.Holland D., Frei O., Desikan R., Fan C.-C., Shadrin A.A., Smeland O.B., Sundar V.S., Thompson P., Andreassen O.A., Dale A.M. Beyond SNP Heritability: Polygenicity and Discoverability of Phenotypes Estimated with a Univariate Gaussian Mixture Model. bioRxiv. 2019 doi: 10.1101/133132.
  • 78.Hormozdiari F., Zhu A., Kichaev G., Ju C.J.-T., Segrè A.V., Joo J.W.J., Won H., Sankararaman S., Pasaniuc B., Shifman S., Eskin E. Widespread allelic heterogeneity in complex traits. Am. J. Hum. Genet. 2017;100:789–802. doi: 10.1016/j.ajhg.2017.04.005.
  • 79.Gusev A., Bhatia G., Zaitlen N., Vilhjalmsson B.J., Diogo D., Stahl E.A., Gregersen P.K., Worthington J., Klareskog L., Raychaudhuri S. Quantifying missing heritability at known GWAS loci. PLoS Genet. 2013;9:e1003993. doi: 10.1371/journal.pgen.1003993.
  • 80.Nagai A., Hirata M., Kamatani Y., Muto K., Matsuda K., Kiyohara Y., Ninomiya T., Tamakoshi A., Yamagata Z., Mushiroda T., BioBank Japan Cooperative Hospital Group Overview of the BioBank Japan Project: Study design and profile. J. Epidemiol. 2017;27(3S):S2–S8. doi: 10.1016/j.je.2016.12.005.
  • 81.Gaziano J.M., Concato J., Brophy M., Fiore L., Pyarajan S., Breeling J., Whitbourne S., Deen J., Shannon C., Humphries D. Million Veteran Program: A mega-biobank to study genetic influences on health and disease. J. Clin. Epidemiol. 2016;70:214–223. doi: 10.1016/j.jclinepi.2015.09.016.
  • 82.Leitsalu L., Haller T., Esko T., Tammesoo M.-L., Alavere H., Snieder H., Perola M., Ng P.C., Mägi R., Milani L. Cohort Profile: Estonian Biobank of the Estonian Genome Center, University of Tartu. Int. J. Epidemiol. 2015;44:1137–1147. doi: 10.1093/ije/dyt268.

Associated Data

Supplementary Materials

Document S1. Figures S1–S44, Table S1, and Supplemental Material and Methods
mmc1.pdf (6.8MB, pdf)
Document S2. Article plus Supplemental Information
mmc2.pdf (7.8MB, pdf)
