Summary
Over the past two decades, genome-wide association studies (GWASs) have successfully advanced our understanding of the genetic basis of complex traits. Despite these fruitful discoveries, most GWAS samples are collected from European populations, and GWASs are often criticized for their lack of ancestral diversity. Trans-ancestry association mapping (TRAM) offers an exciting opportunity to narrow the disparities in genetic studies between non-Europeans and Europeans. Here, we propose a statistical method, LOG-TRAM, to leverage the local genetic architecture for TRAM. Using biobank-scale datasets, we showed that LOG-TRAM can greatly improve the statistical power of identifying risk variants in under-represented populations while producing well-calibrated p values. We applied LOG-TRAM to the GWAS summary statistics of various complex traits/diseases from BioBank Japan, UK Biobank, and African populations. We obtained substantial gains in power and achieved effective correction of confounding biases in TRAM. Finally, we showed that LOG-TRAM can be successfully applied to identify ancestry-specific loci and that the LOG-TRAM output can be further used to construct more accurate polygenic risk scores in under-represented populations.
Keywords: GWAS, meta-analysis, trans-ancestry, local genetic architecture, confounding bias
We introduce LOG-TRAM, a method to leverage the local genetic architecture for trans-ancestry association mapping. Not only can LOG-TRAM control type I errors at the nominal level, but it also can improve the statistical power of identifying risk variants in under-represented populations.
Introduction
Thousands of genome-wide association studies (GWASs) have been conducted to understand the genetic basis of complex human traits/diseases since the first GWAS publication in 2005.1 As of April 2022, more than 370,000 genome-wide significant associations ($p < 5\times10^{-8}$) have been identified between single-nucleotide polymorphisms (SNPs) and complex human traits.2 The summary statistics from GWASs are accessible through public gateways, such as the GWAS Catalog.2 These datasets contain rich information on genetic variations and phenotypes, providing an unprecedented research opportunity in biomedical and social science.3
Despite these great achievements, GWAS findings are limited by the lack of ancestral diversity. According to the GWAS Diversity Monitor (https://gwasdiversitymonitor.com), the large majority of GWAS participants to date have been of European ancestry (EUR).4 Non-European populations are severely under-represented in GWASs; for example, participants of East Asian (EAS) and African ancestries each make up only a small share of the total.4 Because most GWAS findings are based on individuals of European ancestry, they may not be directly extrapolated to non-European ancestries.5 As genetic studies with diversified ancestries are important in achieving global health equity, trans-ancestry association mapping (TRAM), which aims to identify risk genetic variants across ancestries (particularly in the under-represented populations), has become a critical step toward precision medicine.5,6
The challenges of TRAM arise from two major aspects. First, the genetic architectures of a phenotype are heterogeneous across ancestries.5 Some trait-associated SNPs have vastly different allele frequencies between European and non-European ancestries.5,7 SNP effect sizes and linkage disequilibrium (LD) patterns can also vary across ancestries.8 Second, the publicly released GWAS summary statistics still suffer from confounding biases.9 Although principal-component analysis (PCA)10 and linear mixed models (LMMs)11 are commonly used for association mapping in GWASs, the population stratification in biobank-scale data, such as socioeconomic status12 or geographic structure,13 may not be fully accounted for in these standard approaches.14 Without correcting confounding biases hidden in GWAS summary statistics, TRAM will produce many false positive findings.
Much effort has been devoted to the development of statistical methods for TRAM. To handle heterogeneity of genetic architectures, several statistical methods have been developed for meta-analysis of GWAS data in various contexts.15 For example, the random-effects (RE) model is a popular tool for meta-analysis of GWASs.16,17 Despite its popularity, the traditional RE model assumes heterogeneity of effect sizes even under the null hypothesis, which makes it fairly conservative.18 To improve the power of RE models, a new RE method named RE218 was developed by assuming no heterogeneity under the null hypothesis. Later, these authors further extended RE2 to allow meta-analysis of correlated statistics, namely, RE2C.19 Unlike these general hypothesis-testing methods, MANTRA was specifically designed for meta-analysis of trans-ancestry GWAS data.20 MANTRA assumes that effect sizes are closely matched for similar genetic ancestries while allowing heterogeneous effect sizes for more distal ancestries. MTAG21 is a recently developed method to analyze multiple GWASs from a single ancestry. MTAG can greatly improve the statistical power of association mapping because it uses the global genetic correlation to borrow information across related traits. However, it assumes that the variance and covariance of marginal effect sizes are homogeneous across SNPs, which may limit its use in the trans-ancestry setting. Very recently, MAMA22 was developed to extend MTAG to TRAM by accounting for heterogeneity in LD across ancestries. However, neither MTAG nor MAMA is fully satisfactory when the local genetic architecture differs from the global architecture.
In this work, we develop a statistical method to leverage the local genetic architecture for trans-ancestry association mapping (LOG-TRAM). Not only can LOG-TRAM control type I errors at the nominal level, it can also improve the statistical power of identifying risk variants in under-represented populations. Below are the keys to the success of LOG-TRAM. First, it can greatly improve the statistical power of association mapping by making use of biobank-scale datasets of the auxiliary population (e.g., the UK Biobank [UKBB] dataset) while accounting for heterogeneity among multiple ancestries. Second, compared to existing meta-analysis methods that consider the global genetic architecture, LOG-TRAM focuses on the local genetic architectures to localize risk variants, including local heritability, local co-heritability, allele frequencies, SNP effect sizes, and LD patterns. Third, it is capable of correcting the confounding bias hidden in GWAS summary statistics to avoid inflated type I errors. Fourth, LOG-TRAM only takes summary statistics from multiple ancestries as inputs and outputs well-calibrated p values. With the innovations of our model design, LOG-TRAM is computationally efficient, as it has a closed-form solution at each step. Through comprehensive simulation studies, we demonstrated that LOG-TRAM largely outperformed existing meta-analysis approaches in terms of type I error rate and power. Then we applied LOG-TRAM to the GWAS summary statistics of 29 complex traits and diseases from BioBank Japan (BBJ) and UKBB. The analysis results show that LOG-TRAM can effectively account for confounding biases in the GWAS summary statistics and achieve a substantial gain in power for identification of risk variants. Furthermore, we showed the generality of our method in other under-represented populations by applying LOG-TRAM to integrate the GWASs of 17 traits from African and EUR/EAS. 
Finally, we successfully applied LOG-TRAM to identify ancestry-specific loci and showed that the LOG-TRAM output can be used for the construction of more accurate polygenic risk scores (PRSs) in under-represented populations.
Material and methods
Notation and problem setup
To introduce LOG-TRAM, we begin our formulation with the individual-level GWAS data. Without loss of generality, we consider two populations and the following model that relates genotypes with phenotypes in populations 1 and 2, respectively,
$$\mathbf{y}_1 = \mathbf{X}_1\boldsymbol{\beta}_1 + \mathbf{e}_1, \qquad \mathbf{y}_2 = \mathbf{X}_2\boldsymbol{\beta}_2 + \mathbf{e}_2 \qquad \text{(Equation 1)}$$
where $\mathbf{y}_1 \in \mathbb{R}^{n_1}$ and $\mathbf{y}_2 \in \mathbb{R}^{n_2}$ are two phenotype vectors, $\mathbf{X}_1 \in \mathbb{R}^{n_1 \times M}$ and $\mathbf{X}_2 \in \mathbb{R}^{n_2 \times M}$ are standardized genotype matrices whose columns have zero mean and unit variance, $\boldsymbol{\beta}_1, \boldsymbol{\beta}_2 \in \mathbb{R}^{M}$ are the SNP effect sizes, the residual vectors $\mathbf{e}_1$ and $\mathbf{e}_2$ are independent error terms, M is the number of SNPs, and $n_1$ and $n_2$ are the sample sizes of populations 1 and 2, respectively. We consider population 1 as the under-represented target population and population 2 as the auxiliary population with a biobank-scale sample size, i.e., $n_2 \gg n_1$. We expect that information from population 2 can be useful for association mapping in population 1. Here, we assume that the covariates (e.g., age, sex, and principal components) have been properly adjusted. A more detailed treatment of covariate adjustment can be found in our previous work23 and other related work.24 We also implicitly assume that SNP effect sizes increase as the allele frequencies decrease when working with standardized genotype matrices.23
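Suppose that individual-level data are not accessible but summary statistics $\{\hat\beta_{1j}, s_{1j}\}_{j=1}^{M}$ and $\{\hat\beta_{2j}, s_{2j}\}_{j=1}^{M}$ from the simple linear regressions are available:
$$\hat\beta_{pj} = \frac{\mathbf{x}_{pj}^\top \mathbf{y}_p}{\mathbf{x}_{pj}^\top \mathbf{x}_{pj}}, \qquad s_{pj}^2 = \frac{\hat\sigma_{pj}^2}{\mathbf{x}_{pj}^\top \mathbf{x}_{pj}}, \qquad p = 1, 2,\; j = 1, \ldots, M \qquad \text{(Equation 2)}$$
where $\mathbf{x}_{pj}$ is the j-th column of $\mathbf{X}_p$ and $\hat\sigma_{pj}^2$ is the residual variance of the regression of $\mathbf{y}_p$ on $\mathbf{x}_{pj}$.
Traditional association methods25 often report genome-wide significant association status based on the Z scores $z_{1j} = \hat\beta_{1j}/s_{1j}$ and $z_{2j} = \hat\beta_{2j}/s_{2j}$. However, these association methods have several limitations. First, they do not account for the heterogeneity of LD patterns across ancestries. Second, they do not correct possible confounding factors hidden in the summary statistics, resulting in inflated type I errors. Third, they do not fully utilize the correlation between the summary statistics of the two populations, leading to suboptimal statistical power.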
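As a minimal sketch of how the summary statistics in Equation 2 arise, the NumPy code below fits the per-SNP simple regressions on simulated data. It assumes column-standardized genotypes and a centered phenotype (so no intercept is fitted) and uses $n-2$ residual degrees of freedom; the function name `marginal_summary_stats` and the toy data are illustrative, not part of the LOG-TRAM software.

```python
import numpy as np

def marginal_summary_stats(X, y):
    """Per-SNP simple linear regression of y on each column of X (Equation 2).

    Assumes X is column-standardized and y is centered, so no intercept is fit.
    Returns the estimates beta_hat, their standard errors s, and the Z scores.
    """
    n = X.shape[0]
    xtx = np.einsum("ij,ij->j", X, X)                       # x_j^T x_j for every SNP
    beta_hat = (X.T @ y) / xtx                               # marginal effect estimates
    resid = y[:, None] - X * beta_hat                        # n x M matrix of residuals
    sigma2 = np.einsum("ij,ij->j", resid, resid) / (n - 2)   # residual variances
    se = np.sqrt(sigma2 / xtx)
    return beta_hat, se, beta_hat / se

# Toy data: 500 individuals, 20 independent SNPs, one causal SNP (index 3).
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 20))
X = (X - X.mean(axis=0)) / X.std(axis=0)
y = 0.3 * X[:, 3] + rng.standard_normal(500)
y = y - y.mean()
beta_hat, se, z = marginal_summary_stats(X, y)
```

Forming the full residual matrix keeps the code short; for biobank-scale data one would instead compute the residual sums of squares from $\mathbf{y}^\top\mathbf{y}$ and $\hat\beta_{pj}$ without materializing it.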
Recent meta-analysis studies have revealed that the statistical power of association mapping in under-represented populations can be improved by leveraging the global cross-population genetic correlation.20,22 However, current trans-ancestry GWAS meta-analysis relies on the assumption that the variance and covariance of SNP effect sizes are homogeneous across the entire genome.21,22 In fact, accumulating evidence suggests that different genomic regions contribute disproportionately to heritability, inducing heterogeneous genetic similarity between populations in different local regions.26, 27, 28, 29, 30 For example, a very recent study28 has shown that the shared genetics between autism spectrum disorder (ASD) and cognitive performance (CP) and between ASD and intellectual disability are distinct at a local scale, which explains an empirical paradox: a positive genetic correlation between ASD and CP contradicts the comorbidity between ASD and intellectual disability. Therefore, it can be more appropriate to localize risk variants by leveraging the local genetic architecture rather than the global architecture. To see this more intuitively, consider the global genetic correlation $\rho$ averaged across the entire genome and the local genetic correlation $\rho_{\mathcal{A}}$ defined on a local genomic region $\mathcal{A}$, where $\rho_{\mathcal{A}}$ can be very different from the global $\rho$.27,28,31,32 On the one hand, $\rho_{\mathcal{A}}$ can be nearly zero even if the global correlation $\rho$ is substantial. Clearly, no information should then be borrowed from the auxiliary population for association mapping of the target population in the local region $\mathcal{A}$, and leveraging the global genetic correlation for TRAM leads to inflated type I errors. On the other hand, when the local genetic correlation $\rho_{\mathcal{A}}$ is strong but the global genetic correlation is weak (e.g., $\rho \approx 0$), using the global genetic correlation for TRAM loses substantial statistical power.
To leverage the local genetic architecture for TRAM, we propose a statistical method named LOG-TRAM, which only requires GWAS summary statistics and a reference genome (e.g., The 1000 Genomes Project) as inputs (Figure 1). The output of LOG-TRAM has the following desired properties: (1) by borrowing information from biobank-scale European datasets through the local genetic architecture, the statistical power of non-EUR association studies can be largely improved; (2) the influence of confounding factors can be adjusted; (3) the LOG-TRAM estimate is unbiased and its standard error can be much smaller than that of standard GWASs (see supplemental methods, section 3.4); and (4) the LOG-TRAM output can be used for downstream analysis, such as construction of more accurate polygenic risk scores in the under-represented populations.
Figure 1.
LOG-TRAM overview
LOG-TRAM only requires GWAS summary statistics and reference genomes of the considered populations as inputs. During the trans-ancestry meta-analysis, LOG-TRAM can handle the heterogeneity of genetic architectures (e.g., allele frequencies, LD patterns, and SNP effect sizes) across populations, and leverage information from biobank-scale European datasets through the local genetic architecture to increase statistical power and simultaneously reduce false positives for non-EUR association studies.
The LOG-TRAM model
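To leverage the local genetic architecture, we first partition the genome into consecutive non-overlapping regions (e.g., 1-Mb segments). Then we focus on identifying risk variants within one region at a time. Given the local region $\mathcal{A}$, we extend Equation 1 as:
$$\mathbf{y}_1 = \mathbf{X}_{1\mathcal{A}}\boldsymbol{\beta}_{1\mathcal{A}} + \mathbf{X}_{1\mathcal{A}^c}\boldsymbol{\beta}_{1\mathcal{A}^c} + \mathbf{e}_1, \qquad \mathbf{y}_2 = \mathbf{X}_{2\mathcal{A}}\boldsymbol{\beta}_{2\mathcal{A}} + \mathbf{X}_{2\mathcal{A}^c}\boldsymbol{\beta}_{2\mathcal{A}^c} + \mathbf{e}_2 \qquad \text{(Equation 3)}$$
where $\mathcal{A}^c$ is the set of all SNPs in the genome except the local region $\mathcal{A}$, $\mathbf{X}_{p\mathcal{A}}$ and $\boldsymbol{\beta}_{p\mathcal{A}}$ collect the genotype columns and effect sizes of the SNPs in $\mathcal{A}$ (and analogously for $\mathcal{A}^c$), and $\mathbf{e}_1$ and $\mathbf{e}_2$ are vectors of independent noise with $\mathbb{E}(\mathbf{e}_p) = \mathbf{0}$, $\mathrm{Var}(\mathbf{e}_1) = \sigma_{e_1}^2\mathbf{I}_{n_1}$, and $\mathrm{Var}(\mathbf{e}_2) = \sigma_{e_2}^2\mathbf{I}_{n_2}$. Partitioning the genome into the local region $\mathcal{A}$ and the background region $\mathcal{A}^c$ allows us to model the regional genetic architecture differently from the global pattern. To achieve this, we introduce the following probabilistic structure for $\boldsymbol{\beta}_j = (\beta_{1j}, \beta_{2j})^\top$ with $j \in \mathcal{A}$ and $j \in \mathcal{A}^c$:
$$\mathbb{E}(\boldsymbol{\beta}_j) = \mathbf{0}, \quad \mathrm{Var}(\boldsymbol{\beta}_j) = \begin{cases} \boldsymbol{\Omega}_{\mathcal{A}} = \begin{pmatrix} \omega_{1\mathcal{A}} & \omega_{12\mathcal{A}} \\ \omega_{12\mathcal{A}} & \omega_{2\mathcal{A}} \end{pmatrix}, & j \in \mathcal{A}, \\[6pt] \boldsymbol{\Omega}_{\mathcal{A}^c} = \begin{pmatrix} \omega_{1\mathcal{A}^c} & \omega_{12\mathcal{A}^c} \\ \omega_{12\mathcal{A}^c} & \omega_{2\mathcal{A}^c} \end{pmatrix}, & j \in \mathcal{A}^c, \end{cases} \quad \mathrm{Cov}(\boldsymbol{\beta}_j, \boldsymbol{\beta}_{j'}) = \mathbf{0}\ (j \neq j') \qquad \text{(Equation 4)}$$
where $\boldsymbol{\Omega}_{\mathcal{A}}$ and $\boldsymbol{\Omega}_{\mathcal{A}^c}$ capture the genetic covariance of the two populations in the local region $\mathcal{A}$ and background region $\mathcal{A}^c$, respectively. The diagonal elements represent the local/global per-SNP heritability for populations 1 and 2, and the off-diagonal elements represent the local/global per-SNP co-heritability between the two populations. Without loss of generality, we assume that the phenotypes have been standardized, i.e., $\mathrm{Var}(y_{1i}) = \mathrm{Var}(y_{2i}) = 1$. Based on Equations 3 and 4, we have $M_{\mathcal{A}}\omega_{1\mathcal{A}} + M_{\mathcal{A}^c}\omega_{1\mathcal{A}^c} + \sigma_{e_1}^2 = 1$ and $M_{\mathcal{A}}\omega_{2\mathcal{A}} + M_{\mathcal{A}^c}\omega_{2\mathcal{A}^c} + \sigma_{e_2}^2 = 1$, where $M_{\mathcal{A}}$ and $M_{\mathcal{A}^c}$ represent the number of SNPs in $\mathcal{A}$ and $\mathcal{A}^c$, respectively.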
The flexible probabilistic structure given in Equation 4 has three salient properties. First, we allow the effect sizes of SNP j in the target region $\mathcal{A}$ to have different variances ($\omega_{1\mathcal{A}}$ and $\omega_{2\mathcal{A}}$) in different populations. Importantly, we achieve this through moment conditions, which do not require any distributional assumptions on $(\beta_{1j}, \beta_{2j})$. When only a subset of SNPs have non-zero effects, $\omega_{1\mathcal{A}}$ and $\omega_{2\mathcal{A}}$ measure the average variances in local region $\mathcal{A}$ induced by the non-zero effects in the corresponding populations. This formulation also offers the flexibility to capture heterogeneous effect sizes across populations. In the extreme case where SNPs in local region $\mathcal{A}$ have zero effects in population 1 but non-zero effects in population 2, we allow $\omega_{1\mathcal{A}} = 0$ while $\omega_{2\mathcal{A}} > 0$. Second, we characterize the relationship between cross-population effect sizes in region $\mathcal{A}$ by the genetic covariance term $\omega_{12\mathcal{A}}$. When a subset of SNPs have similar effect sizes in the two populations, this is reflected by a non-zero $\omega_{12\mathcal{A}}$. This property allows us to borrow local information from biobank-scale datasets. Third, rather than assuming that the genetic covariance of SNP effect sizes is homogeneous across the entire genome, i.e., $\boldsymbol{\Omega}_{\mathcal{A}} = \boldsymbol{\Omega}_{\mathcal{A}^c}$, we allow the per-SNP heritability of the local genomic region ($\omega_{1\mathcal{A}}$ and $\omega_{2\mathcal{A}}$) to differ from that of the genome background ($\omega_{1\mathcal{A}^c}$ and $\omega_{2\mathcal{A}^c}$) and the local per-SNP co-heritability $\omega_{12\mathcal{A}}$ to differ from the global per-SNP co-heritability $\omega_{12\mathcal{A}^c}$, effectively capturing heterogeneous local structures. This assumption distinguishes LOG-TRAM from MTAG and MAMA, both of which assume a global covariance $\boldsymbol{\Omega}$ shared across all SNPs $j = 1, \ldots, M$.
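To make the roles of $\boldsymbol{\Omega}_{\mathcal{A}}$ and $\boldsymbol{\Omega}_{\mathcal{A}^c}$ concrete, the sketch below computes the local and background genetic correlations they imply, using the standard definition $\omega_{12}/\sqrt{\omega_1\omega_2}$ (assumed here, not a formula stated above), together with the residual variances implied by the standardization identity $M_{\mathcal{A}}\omega_{p\mathcal{A}} + M_{\mathcal{A}^c}\omega_{p\mathcal{A}^c} + \sigma_{e_p}^2 = 1$. The parameter values are invented so that sharing is strong locally but absent in the background, the situation where a global-correlation method has nothing to borrow.

```python
import numpy as np

def genetic_correlation(Omega):
    """Correlation implied by a 2x2 per-SNP covariance [[w1, w12], [w12, w2]]."""
    return Omega[0, 1] / np.sqrt(Omega[0, 0] * Omega[1, 1])

def residual_variances(Omega_A, Omega_Ac, M_A, M_Ac):
    """sigma_e^2 for both populations when the phenotypes have unit variance."""
    h2 = M_A * np.diag(Omega_A) + M_Ac * np.diag(Omega_Ac)   # heritability per population
    return 1.0 - h2

# Illustrative values: strong local sharing, no background sharing.
Omega_A = np.array([[2e-5, 1.8e-5], [1.8e-5, 2e-5]])
Omega_Ac = np.array([[1e-6, 0.0], [0.0, 1e-6]])
rho_A = genetic_correlation(Omega_A)      # about 0.9
rho_Ac = genetic_correlation(Omega_Ac)    # 0.0
sigma2_e = residual_variances(Omega_A, Omega_Ac, M_A=500, M_Ac=9500)
```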
So far, we have described the method without covariates. In the presence of covariates such as gender, age, and principal-component scores, we extend Equation 3 as
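$$\mathbf{y}_1 = \mathbf{W}_1\boldsymbol{\alpha}_1 + \mathbf{X}_{1\mathcal{A}}\boldsymbol{\beta}_{1\mathcal{A}} + \mathbf{X}_{1\mathcal{A}^c}\boldsymbol{\beta}_{1\mathcal{A}^c} + \mathbf{e}_1, \qquad \mathbf{y}_2 = \mathbf{W}_2\boldsymbol{\alpha}_2 + \mathbf{X}_{2\mathcal{A}}\boldsymbol{\beta}_{2\mathcal{A}} + \mathbf{X}_{2\mathcal{A}^c}\boldsymbol{\beta}_{2\mathcal{A}^c} + \mathbf{e}_2 \qquad \text{(Equation 5)}$$
where $\mathbf{W}_1$ and $\mathbf{W}_2$ are the covariate matrices of the target and auxiliary populations, respectively, and $\boldsymbol{\alpha}_1$ and $\boldsymbol{\alpha}_2$ are the corresponding vectors of covariate effects. To remove the covariates, we define the projection matrices $\mathbf{P}_1 = \mathbf{I}_{n_1} - \mathbf{W}_1(\mathbf{W}_1^\top\mathbf{W}_1)^{-1}\mathbf{W}_1^\top$ and $\mathbf{P}_2 = \mathbf{I}_{n_2} - \mathbf{W}_2(\mathbf{W}_2^\top\mathbf{W}_2)^{-1}\mathbf{W}_2^\top$ and obtain the new working model as
$$\tilde{\mathbf{y}}_1 = \tilde{\mathbf{X}}_{1\mathcal{A}}\boldsymbol{\beta}_{1\mathcal{A}} + \tilde{\mathbf{X}}_{1\mathcal{A}^c}\boldsymbol{\beta}_{1\mathcal{A}^c} + \tilde{\mathbf{e}}_1, \qquad \tilde{\mathbf{y}}_2 = \tilde{\mathbf{X}}_{2\mathcal{A}}\boldsymbol{\beta}_{2\mathcal{A}} + \tilde{\mathbf{X}}_{2\mathcal{A}^c}\boldsymbol{\beta}_{2\mathcal{A}^c} + \tilde{\mathbf{e}}_2 \qquad \text{(Equation 6)}$$
where $\tilde{\mathbf{y}}_1 = \mathbf{P}_1\mathbf{y}_1$, $\tilde{\mathbf{y}}_2 = \mathbf{P}_2\mathbf{y}_2$, $\tilde{\mathbf{X}}_1 = \mathbf{P}_1\mathbf{X}_1$, $\tilde{\mathbf{X}}_2 = \mathbf{P}_2\mathbf{X}_2$, $\tilde{\mathbf{e}}_1 = \mathbf{P}_1\mathbf{e}_1$, and $\tilde{\mathbf{e}}_2 = \mathbf{P}_2\mathbf{e}_2$. By projecting out the covariates, Equation 5 reduces to the form of Equation 3. In genomics, this projection approach is commonly used for association mapping33 and heritability estimation.34 In the main text, without loss of generality, we work with Equation 3 in the absence of covariates.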
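A minimal NumPy sketch of the covariate projection: rather than forming $\mathbf{P}_p$ explicitly (an $n_p \times n_p$ matrix), it computes $\mathbf{P}_p\mathbf{A}$ as the residuals of a least-squares fit of $\mathbf{A}$ on $\mathbf{W}_p$, which is algebraically identical when $\mathbf{W}_p$ has full column rank. The covariates (intercept, age, sex) and the helper name `project_out` are placeholders for illustration.

```python
import numpy as np

def project_out(W, A):
    """Return P A with P = I - W (W^T W)^{-1} W^T, without forming the n x n matrix P."""
    coef, *_ = np.linalg.lstsq(W, A, rcond=None)   # least-squares fit of A on W
    return A - W @ coef                             # residuals, i.e. P A

rng = np.random.default_rng(1)
n = 200
W = np.column_stack([np.ones(n), rng.normal(50, 10, n), rng.integers(0, 2, n)])  # intercept, age, sex
X = rng.standard_normal((n, 5))
y = W @ np.array([1.0, 0.02, 0.5]) + 0.2 * X[:, 0] + rng.standard_normal(n)
y_tilde, X_tilde = project_out(W, y), project_out(W, X)
```

The projected phenotype and genotypes are orthogonal to every covariate column, and projecting twice changes nothing, which is why Equation 6 has the same form as Equation 3.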
The LOG-TRAM model with summary-level data
The individual-level GWAS data are often not easily accessible because of privacy concerns. To overcome this difficulty, we consider the rows of $\mathbf{X}_1$ and $\mathbf{X}_2$ as independent and identically distributed samples from the target population and auxiliary population, respectively. As such, we define the correlation between SNP j and SNP k as $r_{1jk} = \mathrm{Corr}(x_{1ij}, x_{1ik})$ and $r_{2jk} = \mathrm{Corr}(x_{2ij}, x_{2ik})$ for the target and auxiliary populations, respectively, where $x_{1ij}$ denotes the standardized genotype of individual i at SNP j in population 1 (and analogously for population 2). We can then define the underlying true marginal effect sizes as
$$\mu_{1j} = \sum_{k=1}^{M} r_{1jk}\beta_{1k}, \qquad \mu_{2j} = \sum_{k=1}^{M} r_{2jk}\beta_{2k} \qquad \text{(Equation 7)}$$
As we can observe, the true marginal effect size is the sum of the true effect sizes of the SNPs in LD with SNP j, each weighted by its correlation with SNP j. With the above model specification, we can show that the following relationship holds for the summary statistics of SNP j in the local region $\mathcal{A}$:
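Equation 7 is a matrix-vector product, $\boldsymbol{\mu}_p = \mathbf{R}_p\boldsymbol{\beta}_p$, where $\mathbf{R}_p$ holds the correlations $r_{pjk}$. The sketch below estimates $\mathbf{R}_p$ from a reference genotype panel with `np.corrcoef` and applies it; using plain sample correlations from a reference panel is a simplifying assumption here, and the function names are illustrative.

```python
import numpy as np

def ld_matrix(G):
    """SNP-SNP correlations r_jk from an n_ref x M reference genotype matrix.

    Monomorphic SNPs (zero variance) would produce NaN and should be removed first.
    """
    return np.corrcoef(G, rowvar=False)

def marginal_effects(R, beta):
    """mu_j = sum_k r_jk * beta_k for every SNP j (Equation 7)."""
    return R @ beta

# Three SNPs: SNPs 0 and 1 are in LD (r = 0.5), SNP 2 is independent; only SNP 0 is causal.
R = np.array([[1.0, 0.5, 0.0], [0.5, 1.0, 0.0], [0.0, 0.0, 1.0]])
mu = marginal_effects(R, np.array([0.2, 0.0, 0.0]))   # -> [0.2, 0.1, 0.0]
```

The non-causal SNP 1 acquires a marginal effect of 0.1 purely through LD, which is why the marginal statistics of a SNP reflect the effects of its LD neighbors.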
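$$\hat\beta_{1j} = \mu_{1j} + \epsilon_{1j}, \qquad \hat\beta_{2j} = \mu_{2j} + \epsilon_{2j}, \qquad j \in \mathcal{A} \qquad \text{(Equation 8)}$$
where $\epsilon_{1j}$ and $\epsilon_{2j}$ are zero-mean, independent estimation errors with $\mathrm{Var}(\epsilon_{1j}) = s_{1j}^2$ and $\mathrm{Var}(\epsilon_{2j}) = s_{2j}^2$. Because the two datasets are from different populations, we have $\mathrm{Cov}(\epsilon_{1j}, \epsilon_{2j}) = 0$.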
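To establish the connection between the observed summary statistics and the model parameters $(\boldsymbol{\Omega}_{\mathcal{A}}, \boldsymbol{\Omega}_{\mathcal{A}^c})$, we first introduce the LD scores in the context of the LOG-TRAM model. Given the SNP correlations within a population ($r_{1jk}$ and $r_{2jk}$), we denote $\ell_{1j}(\mathcal{A}) = \sum_{k\in\mathcal{A}} r_{1jk}^2$ and $\ell_{1j}(\mathcal{A}^c) = \sum_{k\in\mathcal{A}^c} r_{1jk}^2$ as the single-population LD scores of SNP j in population 1 for regions $\mathcal{A}$ and $\mathcal{A}^c$, respectively, $\ell_{2j}(\mathcal{A})$ and $\ell_{2j}(\mathcal{A}^c)$ (defined analogously with $r_{2jk}$) as the single-population LD scores of SNP j in population 2 for regions $\mathcal{A}$ and $\mathcal{A}^c$, respectively, and $\ell_{12j}(\mathcal{A}) = \sum_{k\in\mathcal{A}} r_{1jk}r_{2jk}$ and $\ell_{12j}(\mathcal{A}^c) = \sum_{k\in\mathcal{A}^c} r_{1jk}r_{2jk}$ as the cross-population LD scores of SNP j for regions $\mathcal{A}$ and $\mathcal{A}^c$, respectively. We leave the details of LD score calculation to supplemental methods, section 3.5.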
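All six LD scores are row sums over region-specific columns of the two correlation matrices. The sketch below computes them for every SNP from $\mathbf{R}_1$, $\mathbf{R}_2$, and a boolean mask for $\mathcal{A}$. It deliberately omits any finite-reference-panel bias correction of $r^2$ (the supplemental methods describe the procedure actually used), and the function name and dictionary keys are hypothetical.

```python
import numpy as np

def ld_scores(R1, R2, in_A):
    """Single- and cross-population LD scores of every SNP, split by region.

    R1, R2: M x M SNP correlation matrices of populations 1 and 2.
    in_A:   boolean mask of length M, True for SNPs in the local region A.
    Example key: scores["l12_A"][j] = sum over k in A of r_1jk * r_2jk.
    """
    scores = {}
    for region, cols in (("A", in_A), ("Ac", ~in_A)):
        scores["l1_" + region] = (R1[:, cols] ** 2).sum(axis=1)
        scores["l2_" + region] = (R2[:, cols] ** 2).sum(axis=1)
        scores["l12_" + region] = (R1[:, cols] * R2[:, cols]).sum(axis=1)
    return scores

# Two SNPs whose LD has opposite signs in the two populations; SNP 0 lies in A.
R1 = np.array([[1.0, 0.5], [0.5, 1.0]])
R2 = np.array([[1.0, -0.2], [-0.2, 1.0]])
L = ld_scores(R1, R2, np.array([True, False]))
```

Note that a cross-population LD score can be negative or much smaller than either single-population score when LD patterns disagree.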
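On the basis of the above, we can derive the following relationships (see supplemental methods, section 3.1):
$$\begin{aligned} \mathrm{Var}(\mu_{1j}) &= \omega_{1\mathcal{A}}\ell_{1j}(\mathcal{A}) + \omega_{1\mathcal{A}^c}\ell_{1j}(\mathcal{A}^c), \\ \mathrm{Var}(\mu_{2j}) &= \omega_{2\mathcal{A}}\ell_{2j}(\mathcal{A}) + \omega_{2\mathcal{A}^c}\ell_{2j}(\mathcal{A}^c), \\ \mathrm{Cov}(\mu_{1j}, \mu_{2j}) &= \omega_{12\mathcal{A}}\ell_{12j}(\mathcal{A}) + \omega_{12\mathcal{A}^c}\ell_{12j}(\mathcal{A}^c) \end{aligned} \qquad \text{(Equation 9)}$$
As LD decays exponentially with distance,35 Equation 9 implies that the summary statistics of a SNP $j \in \mathcal{A}$ largely depend on the local genetic architecture, including the local LD scores ($\ell_{1j}(\mathcal{A})$, $\ell_{2j}(\mathcal{A})$, and $\ell_{12j}(\mathcal{A})$), local per-SNP heritability ($\omega_{1\mathcal{A}}$ and $\omega_{2\mathcal{A}}$), and local per-SNP co-heritability ($\omega_{12\mathcal{A}}$). Combining Equation 8 and Equation 9, we can further derive the relationship between the summary statistics $\hat{\boldsymbol{\beta}}_j = (\hat\beta_{1j}, \hat\beta_{2j})^\top$ and the model parameters $\boldsymbol{\mu}_j = (\mu_{1j}, \mu_{2j})^\top$ and $(\boldsymbol{\Omega}_{\mathcal{A}}, \boldsymbol{\Omega}_{\mathcal{A}^c})$:
$$\hat{\boldsymbol{\beta}}_j = \boldsymbol{\mu}_j + \boldsymbol{\epsilon}_j, \quad \mathbb{E}(\boldsymbol{\mu}_j) = \mathbf{0}, \quad \mathrm{Var}(\boldsymbol{\mu}_j) = \boldsymbol{\Sigma}_j = \boldsymbol{\Omega}_{\mathcal{A}} \circ \mathbf{L}_j(\mathcal{A}) + \boldsymbol{\Omega}_{\mathcal{A}^c} \circ \mathbf{L}_j(\mathcal{A}^c), \quad \mathrm{Var}(\boldsymbol{\epsilon}_j) = \mathrm{diag}(s_{1j}^2, s_{2j}^2) \qquad \text{(Equation 10)}$$
where $\circ$ is the element-wise product, $\boldsymbol{\epsilon}_j = (\epsilon_{1j}, \epsilon_{2j})^\top$, and $\mathbf{L}_j(\mathcal{A}) = \begin{pmatrix} \ell_{1j}(\mathcal{A}) & \ell_{12j}(\mathcal{A}) \\ \ell_{12j}(\mathcal{A}) & \ell_{2j}(\mathcal{A}) \end{pmatrix}$, with $\mathbf{L}_j(\mathcal{A}^c)$ defined analogously.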
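A short sketch of how $\boldsymbol{\Sigma}_j$ in Equation 10 is assembled for a single SNP; the LD scores and per-SNP (co-)heritabilities are invented for illustration. In NumPy, `*` on two arrays of the same shape is exactly the element-wise product $\circ$, so the cross-population LD score directly scales the shared covariance in the off-diagonal entry.

```python
import numpy as np

def ld_score_matrix(l1, l2, l12):
    """L_j(region) = [[l1, l12], [l12, l2]] for one SNP and one region."""
    return np.array([[l1, l12], [l12, l2]])

def sigma_j(Omega_A, Omega_Ac, L_A, L_Ac):
    """Var(mu_j) = Omega_A o L_j(A) + Omega_Ac o L_j(A^c); '*' is element-wise in NumPy."""
    return Omega_A * L_A + Omega_Ac * L_Ac

Omega_A = np.array([[2e-5, 1.8e-5], [1.8e-5, 2e-5]])
Omega_Ac = np.array([[1e-6, 0.0], [0.0, 1e-6]])
S = sigma_j(Omega_A, Omega_Ac,
            ld_score_matrix(30.0, 20.0, 15.0),   # local LD scores of SNP j
            ld_score_matrix(1.0, 1.0, 0.5))      # background LD scores of SNP j
```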
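To account for the hidden confounding biases in GWAS summary statistics, we generalize the linear model in Equation 3 based on the genetic drift model used in LDSC9 and modify Equation 9 as follows (see supplemental methods, section 3.2):
$$\begin{aligned} \mathrm{Var}(\hat\beta_{1j}) &= \omega_{1\mathcal{A}}\ell_{1j}(\mathcal{A}) + \omega_{1\mathcal{A}^c}\ell_{1j}(\mathcal{A}^c) + c_1 s_{1j}^2, \\ \mathrm{Var}(\hat\beta_{2j}) &= \omega_{2\mathcal{A}}\ell_{2j}(\mathcal{A}) + \omega_{2\mathcal{A}^c}\ell_{2j}(\mathcal{A}^c) + c_2 s_{2j}^2, \\ \mathrm{Cov}(\hat\beta_{1j}, \hat\beta_{2j}) &= \omega_{12\mathcal{A}}\ell_{12j}(\mathcal{A}) + \omega_{12\mathcal{A}^c}\ell_{12j}(\mathcal{A}^c) + c_{12} s_{1j}s_{2j} \end{aligned} \qquad \text{(Equation 11)}$$
where $c_1$, $c_2$, and $c_{12}$ are inflation constants that adjust for the confounding biases of GWAS standard errors. From Equation 11, we can see that the influence of population stratification on the variance of the marginal estimates remains nearly constant across SNPs (captured by $c_1$, $c_2$, and $c_{12}$), while the magnitudes of genetic effects are tagged by LD scores. Therefore, Equation 10 can be updated as follows:
$$\hat{\boldsymbol{\beta}}_j = \boldsymbol{\mu}_j + \boldsymbol{\epsilon}_j, \quad \mathrm{Var}(\boldsymbol{\mu}_j) = \boldsymbol{\Sigma}_j, \quad \mathrm{Var}(\boldsymbol{\epsilon}_j) = \boldsymbol{\Lambda}_j = \begin{pmatrix} c_1 s_{1j}^2 & c_{12}s_{1j}s_{2j} \\ c_{12}s_{1j}s_{2j} & c_2 s_{2j}^2 \end{pmatrix} \qquad \text{(Equation 12)}$$
In real data analysis, we find that the inflation constants $c_1$ and $c_2$ are often larger than one, and $c_{12}$ is very close to zero because trans-ancestry association studies have no overlapping samples.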
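To see why $c_1$ behaves like an LDSC intercept, consider null SNPs ($\mu_{1j} = 0$): by Equation 11, $\mathbb{E}(z_{1j}^2) = c_1$. The simulation below checks this numerically with an arbitrary $c_1 = 1.2$ and the approximation $s_{1j} \approx 1/\sqrt{n_1}$ for standardized genotypes and phenotypes; both values are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
n1, c1 = 20_000, 1.2            # illustrative sample size and inflation constant
s1 = 1.0 / np.sqrt(n1)          # approximate GWAS standard error under standardization
M_null = 200_000
# Null SNPs: mu_1j = 0, so beta_hat_1j carries only the (inflated) estimation error.
beta_hat_null = rng.normal(0.0, np.sqrt(c1) * s1, size=M_null)
z2 = (beta_hat_null / s1) ** 2
mean_z2 = z2.mean()             # close to c1, the intercept of the regression in Equation 18
```

If $c_1$ were ignored, every null Z score would be treated as having unit variance, and the excess of about 20% in $z^2$ would translate directly into inflated type I errors.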
The LOG-TRAM estimator
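We use the generalized method of moments (GMM)22 to derive the LOG-TRAM estimator $\hat\mu_{1j}$. Using Equation 12, we first obtain the conditional mean and conditional variance of $\hat{\boldsymbol{\beta}}_j$ given $\mu_{1j}$ (see supplemental methods, section 3.3) as
$$\mathbb{E}(\hat{\boldsymbol{\beta}}_j \mid \mu_{1j}) = \frac{\boldsymbol{\Sigma}_{j,1}}{\Sigma_{j,11}}\,\mu_{1j} \qquad \text{(Equation 13)}$$
$$\mathbf{V}_{1j} := \mathrm{Var}(\hat{\boldsymbol{\beta}}_j \mid \mu_{1j}) = \boldsymbol{\Sigma}_j - \frac{\boldsymbol{\Sigma}_{j,1}\boldsymbol{\Sigma}_{j,1}^\top}{\Sigma_{j,11}} + \boldsymbol{\Lambda}_j \qquad \text{(Equation 14)}$$
where $\boldsymbol{\Sigma}_{j,1}$ is the first column of $\boldsymbol{\Sigma}_j$ and $\Sigma_{j,11}$ is its first diagonal element. On the basis of Equation 13, we can define $\mathbf{a}_{1j} = \boldsymbol{\Sigma}_{j,1}/\Sigma_{j,11}$ and give the first-order moment condition
$$\mathbb{E}\left[\mathbf{a}_{1j}^\top\mathbf{V}_{1j}^{-1}\left(\hat{\boldsymbol{\beta}}_j - \mathbf{a}_{1j}\mu_{1j}\right)\right] = 0 \qquad \text{(Equation 15)}$$
On the basis of GMM, we can obtain the LOG-TRAM estimator as
$$\hat\mu_{1j} = \frac{\mathbf{a}_{1j}^\top\mathbf{V}_{1j}^{-1}\hat{\boldsymbol{\beta}}_j}{\mathbf{a}_{1j}^\top\mathbf{V}_{1j}^{-1}\mathbf{a}_{1j}} \qquad \text{(Equation 16)}$$
and its variance as
$$\mathrm{Var}(\hat\mu_{1j}) = \left(\mathbf{a}_{1j}^\top\mathbf{V}_{1j}^{-1}\mathbf{a}_{1j}\right)^{-1} \qquad \text{(Equation 17)}$$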
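Similarly, we can obtain $\hat\mu_{2j}$ and its variance $\mathrm{Var}(\hat\mu_{2j})$ by conditioning on $\mu_{2j}$, i.e., by replacing $\mathbf{a}_{1j}$ and $\mathbf{V}_{1j}$ with $\mathbf{a}_{2j} = \boldsymbol{\Sigma}_{j,2}/\Sigma_{j,22}$ and $\mathbf{V}_{2j} = \boldsymbol{\Sigma}_j - \boldsymbol{\Sigma}_{j,2}\boldsymbol{\Sigma}_{j,2}^\top/\Sigma_{j,22} + \boldsymbol{\Lambda}_j$, where $\boldsymbol{\Sigma}_{j,2}$ is the second column of $\boldsymbol{\Sigma}_j$. Equation 16 implies that the LOG-TRAM estimator incorporates three kinds of information. First, the LD heterogeneity between populations is properly adjusted by the cross-population LD scores $\ell_{12j}(\mathcal{A})$ and $\ell_{12j}(\mathcal{A}^c)$ when constructing $\boldsymbol{\Sigma}_j$ defined in Equation 10. When the LD structures are very different, the cross-population LD scores become smaller, which down-weights the genetic covariance. Second, by introducing the local genetic covariance $\omega_{12\mathcal{A}}$, LOG-TRAM is able to utilize the local genetic architecture to improve association mapping when $\omega_{12\mathcal{A}}$ is non-zero. When $\omega_{12\mathcal{A}} = 0$, indicating that the effect sizes of the two populations are uncorrelated in region $\mathcal{A}$, we show that the LOG-TRAM statistics would be equivalent to the original GWAS statistics (see supplemental methods, section 3.6), which avoids incorrectly borrowing information from the auxiliary population. Importantly, LOG-TRAM estimates both the variances ($\omega_{1\mathcal{A}}$ and $\omega_{2\mathcal{A}}$) and the covariance $\omega_{12\mathcal{A}}$ from the data with unbiased moment estimators (see the following section on parameter estimation and statistical inference), making it adaptive to the local genetic architecture in real data analysis. Third, LOG-TRAM corrects confounding biases in the GWAS data through the term $\boldsymbol{\Lambda}_j$ and thus produces well-calibrated statistics.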
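The closed-form estimator in Equations 13–17 takes only a few lines. The NumPy sketch below computes $\hat\mu_{pj}$ and its variance for one SNP given $\hat{\boldsymbol{\beta}}_j$, $\boldsymbol{\Sigma}_j$, and $\boldsymbol{\Lambda}_j$; it is an illustrative reimplementation rather than the authors' released code, and the function name and inputs are hypothetical. The usage lines check the property stated above: with no cross-population covariance and $c_{12} = 0$, the estimate reduces to the original GWAS estimate with variance $c_1 s_{1j}^2$, whereas a positive local covariance shrinks the variance.

```python
import numpy as np

def log_tram_snp(beta_hat, Sigma, Lam, pop=0):
    """GMM estimate of mu_pj and its variance for one SNP (Equations 13-17).

    beta_hat: length-2 marginal GWAS estimates (target, auxiliary).
    Sigma:    2x2 Var(mu_j) from Equation 10.
    Lam:      2x2 error covariance with inflation constants (Equation 12).
    pop:      0 for the target population, 1 for the auxiliary population.
    """
    col = Sigma[:, pop]
    a = col / Sigma[pop, pop]                               # E(beta_hat | mu_pj) = a * mu_pj
    V = Sigma - np.outer(col, col) / Sigma[pop, pop] + Lam  # Var(beta_hat | mu_pj)
    Vinv_a = np.linalg.solve(V, a)                          # V^{-1} a (V is symmetric)
    info = a @ Vinv_a                                       # a^T V^{-1} a
    return (Vinv_a @ beta_hat) / info, 1.0 / info

b = np.array([0.02, 0.05])
Lam = np.diag([1.1e-4, 1.3e-6])                             # c1*s1^2 and c2*s2^2, with c12 = 0
est0, var0 = log_tram_snp(b, np.diag([1e-5, 2e-5]), Lam)   # no shared genetic covariance
est1, var1 = log_tram_snp(b, np.array([[1e-5, 0.9e-5], [0.9e-5, 1e-5]]), Lam)
```

Solving a 2x2 system per SNP, instead of inverting $\mathbf{V}_{1j}$, keeps the computation cheap and numerically stable, consistent with the closed-form, per-SNP nature of the method.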
Parameter estimation and statistical inference
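To obtain the unknown parameters in $\boldsymbol{\Sigma}_j$ and $\boldsymbol{\Lambda}_j$ of Equation 16, we consider the following regressions derived from Equation 11:
$$\begin{aligned} \mathbb{E}(z_{1j}^2) &= \omega_{1\mathcal{A}}\frac{\ell_{1j}(\mathcal{A})}{s_{1j}^2} + \omega_{1\mathcal{A}^c}\frac{\ell_{1j}(\mathcal{A}^c)}{s_{1j}^2} + c_1, \\ \mathbb{E}(z_{2j}^2) &= \omega_{2\mathcal{A}}\frac{\ell_{2j}(\mathcal{A})}{s_{2j}^2} + \omega_{2\mathcal{A}^c}\frac{\ell_{2j}(\mathcal{A}^c)}{s_{2j}^2} + c_2, \\ \mathbb{E}(z_{1j}z_{2j}) &= \omega_{12\mathcal{A}}\frac{\ell_{12j}(\mathcal{A})}{s_{1j}s_{2j}} + \omega_{12\mathcal{A}^c}\frac{\ell_{12j}(\mathcal{A}^c)}{s_{1j}s_{2j}} + c_{12} \end{aligned} \qquad \text{(Equation 18)}$$
Practically, we regress the squared Z scores of population 1, computed as $z_{1j}^2 = \hat\beta_{1j}^2/s_{1j}^2$, on the scaled single-population LD scores $\ell_{1j}(\mathcal{A})/s_{1j}^2$ and $\ell_{1j}(\mathcal{A}^c)/s_{1j}^2$, and we regress the squared Z scores $z_{2j}^2$ on the scaled single-population LD scores $\ell_{2j}(\mathcal{A})/s_{2j}^2$ and $\ell_{2j}(\mathcal{A}^c)/s_{2j}^2$ of population 2. The slopes of the two regressions are the estimates of the per-SNP heritabilities ($\omega_{1\mathcal{A}}$, $\omega_{1\mathcal{A}^c}$) and ($\omega_{2\mathcal{A}}$, $\omega_{2\mathcal{A}^c}$), and the intercepts are the estimates of $c_1$ and $c_2$.
Similarly, we regress the products of Z scores from the two populations, $z_{1j}z_{2j}$, on $\ell_{12j}(\mathcal{A})/(s_{1j}s_{2j})$ and $\ell_{12j}(\mathcal{A}^c)/(s_{1j}s_{2j})$. The slopes of this regression are the estimates of the local per-SNP co-heritability $\omega_{12\mathcal{A}}$ and the background per-SNP co-heritability $\omega_{12\mathcal{A}^c}$, and the intercept is the estimate of $c_{12}$. We followed LDSC and estimated the regression coefficients using a weighted least-squares estimator.9,36 To reduce the influence of outliers, we used a two-step estimator similar to that of LDSC. In the first step, we estimated the intercepts after excluding SNPs with outlying (very large) squared Z scores. In the second step, we estimated the heritabilities or co-heritabilities with all SNPs by fixing the intercepts at the values estimated in the first step.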
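The two-step procedure can be sketched with ordinary least squares as below. This is a simplification: LDSC additionally uses regression weights (and block-jackknife standard errors), which are omitted here, and the step-1 filter `keep` is supplied by the caller because the exact outlier threshold is not restated in this section. In practice the response would be $z_{1j}^2$ (or $z_{1j}z_{2j}$) and the two regressors the scaled LD scores of Equation 18; the check below uses noise-free synthetic data.

```python
import numpy as np

def two_step_regression(response, regressors, keep):
    """Two-step LD-score-style regression (unweighted sketch).

    response:   length-M array, e.g. z_1j^2 or z_1j * z_2j.
    regressors: M x 2 array of scaled local and background LD scores.
    keep:       boolean mask of SNPs used to estimate the intercept in step 1.
    Returns (intercept, slopes), i.e. (c, [omega_A, omega_Ac]).
    """
    # Step 1: fit intercept and slopes on the filtered SNPs only.
    design = np.column_stack([np.ones(keep.sum()), regressors[keep]])
    coef, *_ = np.linalg.lstsq(design, response[keep], rcond=None)
    intercept = coef[0]
    # Step 2: re-estimate the slopes on all SNPs with the intercept held fixed.
    slopes, *_ = np.linalg.lstsq(regressors, response - intercept, rcond=None)
    return intercept, slopes

# Noise-free check: response = 1.1 + 2 * x1 + 3 * x2.
rng = np.random.default_rng(4)
Xreg = rng.uniform(size=(1000, 2))
resp = 1.1 + Xreg @ np.array([2.0, 3.0])
c_hat, w_hat = two_step_regression(resp, Xreg, keep=resp < 5.0)
```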
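Plugging in the estimated parameters $\hat{\boldsymbol{\Omega}}_{\mathcal{A}}$, $\hat{\boldsymbol{\Omega}}_{\mathcal{A}^c}$, $\hat c_1$, $\hat c_2$, and $\hat c_{12}$, we can construct the estimates of $\boldsymbol{\Sigma}_j$ and $\boldsymbol{\Lambda}_j$ as
$$\hat{\boldsymbol{\Sigma}}_j = \hat{\boldsymbol{\Omega}}_{\mathcal{A}} \circ \mathbf{L}_j(\mathcal{A}) + \hat{\boldsymbol{\Omega}}_{\mathcal{A}^c} \circ \mathbf{L}_j(\mathcal{A}^c), \qquad \hat{\boldsymbol{\Lambda}}_j = \begin{pmatrix} \hat c_1 s_{1j}^2 & \hat c_{12}s_{1j}s_{2j} \\ \hat c_{12}s_{1j}s_{2j} & \hat c_2 s_{2j}^2 \end{pmatrix} \qquad \text{(Equation 19)}$$
Because these estimates are derived with the method of moments (MoM) based on Equation 4, we do not require normality of $(\beta_{1j}, \beta_{2j})$. With the moment conditions in Equation 4, the MoM estimators are unbiased and robust, which allows us to produce well-calibrated statistics (see supplemental methods, section 3.7). We then obtain the LOG-TRAM estimates $\hat\mu_{1j}$ and $\hat\mu_{2j}$ from Equation 16 and the corresponding variances from Equation 17. The standard errors are computed as $\mathrm{se}(\hat\mu_{1j}) = \sqrt{\mathrm{Var}(\hat\mu_{1j})}$ and $\mathrm{se}(\hat\mu_{2j}) = \sqrt{\mathrm{Var}(\hat\mu_{2j})}$. With these LOG-TRAM outputs, we can detect non-zero SNP effect sizes ($\mu_{1j}$ and $\mu_{2j}$) based on the Wald test.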
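For completeness, the Wald test on a LOG-TRAM estimate and its standard error reduces to a two-sided normal tail probability. The dependency-free sketch below uses `math.erfc`, since $2\{1 - \Phi(|z|)\} = \mathrm{erfc}(|z|/\sqrt{2})$; the function name is illustrative.

```python
import math

def wald_pvalue(estimate, se):
    """Two-sided p value for H0: effect = 0 with a standard normal reference."""
    z = estimate / se
    return math.erfc(abs(z) / math.sqrt(2.0))   # equals 2 * (1 - Phi(|z|))
```

Using `erfc` directly avoids the cancellation in `1 - Phi(|z|)` for very large Z scores, which matters at genome-wide significance levels.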
Identification of variants with ancestry-specific effects by LOG-TRAM
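Not only can LOG-TRAM provide inference on the SNP effect sizes in the two populations, but it also offers a statistical test to identify variants with ancestry-specific effects. Specifically, we denote the difference of the SNP effect sizes as $\delta_j = \mu_{1j} - \mu_{2j}$, where $\mu_{1j}$ and $\mu_{2j}$ are the underlying true marginal effect sizes defined in Equation 7. The estimate of $\delta_j$ can be obtained from the LOG-TRAM output:
$$\hat\delta_j = \hat\mu_{1j} - \hat\mu_{2j} \qquad \text{(Equation 20)}$$
The variance of $\hat\delta_j$ is given as
$$\mathrm{Var}(\hat\delta_j) = \mathrm{Var}(\hat\mu_{1j}) + \mathrm{Var}(\hat\mu_{2j}) - 2\,\mathrm{Cov}(\hat\mu_{1j}, \hat\mu_{2j}) \qquad \text{(Equation 21)}$$
Here, the covariance term is derived as
$$\mathrm{Cov}(\hat\mu_{1j}, \hat\mu_{2j}) = \mathbf{w}_{1j}^\top\,\mathrm{Var}(\hat{\boldsymbol{\beta}}_j)\,\mathbf{w}_{2j} \qquad \text{(Equation 22)}$$
where, for population $p \in \{1, 2\}$, $\mathbf{w}_{pj}$ is defined as $\mathbf{w}_{pj} = \mathbf{V}_{pj}^{-1}\mathbf{a}_{pj}/(\mathbf{a}_{pj}^\top\mathbf{V}_{pj}^{-1}\mathbf{a}_{pj})$, so that $\hat\mu_{pj} = \mathbf{w}_{pj}^\top\hat{\boldsymbol{\beta}}_j$ by Equation 16. With Equation 20 and Equation 21, we can obtain the LOG-TRAM-based difference test statistic $T_j = \hat\delta_j/\sqrt{\mathrm{Var}(\hat\delta_j)}$ and apply the Wald test to test the hypothesis $H_0: \delta_j = 0$.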
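The difference test can be sketched by reusing the weight vectors $\mathbf{w}_{pj}$, since each $\hat\mu_{pj} = \mathbf{w}_{pj}^\top\hat{\boldsymbol{\beta}}_j$. In the sketch below, the covariance matrix plugged into Equation 22 is passed in by the caller (`cov_beta_hat`) because its exact form is an assumption on our part; the helper names are hypothetical. The function also returns the traditional GWAS-based statistic for comparison: with no cross-population covariance and $c_1 = c_2 = 1$, $c_{12} = 0$, the two statistics coincide.

```python
import math
import numpy as np

def gmm_weights(Sigma, Lam, pop):
    """w_pj with mu_hat_pj = w_pj @ beta_hat (Equations 13-16), and Var(mu_hat_pj)."""
    col = Sigma[:, pop]
    a = col / Sigma[pop, pop]
    V = Sigma - np.outer(col, col) / Sigma[pop, pop] + Lam
    Vinv_a = np.linalg.solve(V, a)
    info = a @ Vinv_a
    return Vinv_a / info, 1.0 / info

def difference_tests(beta_hat, se, Sigma, Lam, cov_beta_hat):
    """LOG-TRAM-based and GWAS-based statistics for H0: mu_1j - mu_2j = 0."""
    w1, var1 = gmm_weights(Sigma, Lam, 0)
    w2, var2 = gmm_weights(Sigma, Lam, 1)
    delta = (w1 - w2) @ beta_hat                              # Equation 20
    cov12 = w1 @ cov_beta_hat @ w2                           # Equation 22, caller-supplied covariance
    t_logtram = delta / math.sqrt(var1 + var2 - 2.0 * cov12) # Equation 21
    t_gwas = (beta_hat[0] - beta_hat[1]) / math.sqrt(se[0] ** 2 + se[1] ** 2)
    return t_logtram, t_gwas

b, s = np.array([0.03, 0.01]), np.array([0.01, 0.002])
Lam = np.diag(s ** 2)                                        # c1 = c2 = 1, c12 = 0
t_lt, t_gw = difference_tests(b, s, np.diag([1e-5, 1e-5]), Lam, Lam)
```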
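Recall that traditional methods simply obtain the estimate of $\delta_j$ and its variance from the original GWAS estimates, $\hat\delta_j^{\mathrm{GWAS}} = \hat\beta_{1j} - \hat\beta_{2j}$ and $\mathrm{Var}(\hat\delta_j^{\mathrm{GWAS}}) = s_{1j}^2 + s_{2j}^2$, and then derive the GWAS-based test statistic $T_j^{\mathrm{GWAS}} = \hat\delta_j^{\mathrm{GWAS}}/\sqrt{s_{1j}^2 + s_{2j}^2}$. As LOG-TRAM can borrow information across populations and account for confounding bias, it can substantially improve the statistical power of identifying variants with ancestry-specific effects.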
Compared methods
We compared the performance of LOG-TRAM with several commonly used GWAS meta-analysis methods, including fixed-effect inverse-variance-weighted (IVW) meta-analysis (e.g., FE37), the random-effects model (e.g., RE218,19), MANTRA,20 MTAG,21 and MAMA.22 We are aware that some of these methods, such as FE and RE2, are not specifically designed for trans-ancestry association mapping (TRAM). Nevertheless, evaluating them is informative, and their performance serves as a reference. Specifically, the fixed-effect (FE) method37 was developed to perform meta-analysis of homogeneous GWASs (e.g., meta-analysis of two different cohort studies of the same phenotype), where a common effect size is assumed across studies. The random-effects (RE) models16 relax the fixed-effect assumption and allow heterogeneity of effect sizes. A popular RE method in GWASs is RE2,18 which improves the power of standard RE models by assuming no heterogeneity under the null hypothesis. RE2C19 further improves RE2 by allowing correlated statistics. MANTRA,20 a method specifically designed for TRAM, assumes that effect sizes are similar for similar genetic ancestries and differ for distal ancestries. MANTRA uses a Bayesian partition model to assign populations into ethnic clusters, where the distance between a population and the cluster center is measured by the fixation index $F_{ST}$.38 MTAG21 is a recently developed method to analyze multiple GWASs from a single ancestry. Although MTAG is not designed for TRAM, it has been extended to TRAM as a new method named MAMA,22 in which the global genetic correlation between the same traits across ancestries is the key to borrowing information. Indeed, LOG-TRAM can also be viewed as an extension of MTAG; however, we extend MTAG differently from MAMA.
Motivated by increasing recent evidence26, 27, 28, 29, 30 that the local genetic architecture can be quite different from the global genetic architecture, we developed LOG-TRAM for TRAM by leveraging the local genetic architecture. Through comprehensive simulation studies and real data analysis, we show that LOG-TRAM has advantages when the local genetic architecture differs from the global genetic architecture and performs comparably when the two are consistent.
Results
Simulation study
We performed simulations to compare LOG-TRAM with several existing methods, including FE, RE2, MANTRA, MTAG, and MAMA. To fairly compare different methods, we organized our simulation studies into two categories. (1) The local genetic architecture differs from the global genetic architecture. In this setting, LOG-TRAM can outperform existing statistical methods. (2) The local genetic architecture is consistent with the global genetic architecture. In this setting, LOG-TRAM is comparable to the methods that explore the global genetic correlation, such as MTAG and MAMA.
The local genetic architecture differs from the global genetic architecture
We first conducted simulations to evaluate the type I error rate. We considered 18K samples from the EAS cohort23 and 100K samples from UKBB as the target and auxiliary populations, respectively. We used their genotype matrices to mimic the different LD patterns and allele frequencies between populations. A total of 17,248 HapMap3-matched SNPs from chromosome 20 were used in our simulation study. We partitioned chromosome 20 into two segments at base pair position 3,119,133 (GRCh37). One segment, taking up about 95% of chromosome 20, was considered as the background region $\mathcal{A}^c$. We generated a polygenic scenario by setting 10% of SNPs with shared non-zero effects and varied the background trans-ancestry genetic correlation (denoted as $\rho_{\mathcal{A}^c}$) over a range of values. The shared effects were simulated from the bivariate normal distribution $(\beta_{1j}, \beta_{2j})^\top \sim \mathcal{N}\left(\mathbf{0}, \frac{h^2}{0.1M_{\mathcal{A}^c}}\begin{pmatrix} 1 & \rho_{\mathcal{A}^c} \\ \rho_{\mathcal{A}^c} & 1 \end{pmatrix}\right)$ for the non-zero-effect SNPs $j \in \mathcal{A}^c$, where $h^2 = 0.01$ means that the total heritability contributed by chromosome 20 is around 0.01, and $M_{\mathcal{A}^c}$ is the number of SNPs in the background region $\mathcal{A}^c$. The other segment, taking up 5% of chromosome 20, was treated as the local region $\mathcal{A}$. We then used the SNPs in $\mathcal{A}$ to assess the type I error rate of the compared methods. Specifically, we set $\beta_{1j} = 0$ for $j \in \mathcal{A}$ for EAS while simulating the true effects of EUR from a normal distribution $\beta_{2j} \sim \mathcal{N}(0, h_{\mathcal{A}}^2/(0.1M_{\mathcal{A}}))$, where $h_{\mathcal{A}}^2$ is the local heritability, $M_{\mathcal{A}}$ is the number of SNPs in the local region $\mathcal{A}$, and 0.1 represents that 10% of SNPs jointly contribute to the local heritability. The local per-SNP heritability $h_{\mathcal{A}}^2/M_{\mathcal{A}}$ was set to $\kappa$ times the background per-SNP heritability $h^2/M_{\mathcal{A}^c}$. Under this setting, 10% of SNPs in region $\mathcal{A}$ have non-zero effects in population 2, while all SNPs in $\mathcal{A}$ have zero effects in population 1. The parameter $\kappa$ allows us to vary the enrichment fold of local heritability at 1, 3, 5, and 10, such that the local genetic architecture differs from the global one to varying degrees. Given the effect sizes and genotype matrices, quantitative phenotypes in both populations were simulated with Equation 3. After that, we marginally regressed the simulated phenotypes on each SNP to obtain the Z scores of the two datasets. 
Finally, after performing meta-analysis, we reported the fraction of SNPs with a p value less than 0.05 in the local region of EAS as the type I error rate.
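The type I error rate reported here is simply the rejection rate among null SNPs. The sketch below illustrates the computation on synthetic Z scores: calibrated statistics reject at about the nominal 5%, whereas a 20% variance inflation (an arbitrary choice, mimicking uncorrected confounding) pushes the rate noticeably higher. The helper name is hypothetical; the critical value comes from the standard library's `statistics.NormalDist`.

```python
import numpy as np
from statistics import NormalDist

def empirical_rejection_rate(z, alpha=0.05):
    """Fraction of SNPs whose two-sided p value is below alpha."""
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)   # about 1.96 for alpha = 0.05
    return float(np.mean(np.abs(z) > z_crit))

rng = np.random.default_rng(5)
calibrated = empirical_rejection_rate(rng.standard_normal(200_000))
inflated = empirical_rejection_rate(rng.normal(0.0, np.sqrt(1.2), 200_000))
```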
As expected, Figure 2A shows that only the single-ancestry-based GWAS and LOG-TRAM have well-controlled type I error rates regardless of the enrichment fold of local heritability for region in EUR. In contrast, the type I error rates of other meta-analysis methods with the homogeneous assumption (e.g., FE, RE2, MANTRA, MTAG, MAMA) increase as the enrichment fold increases from 1 to 10. More specifically, the FE and RE2 approaches performed worst in most cases. When the background trans-ancestry genetic correlation is non-zero, MANTRA, MTAG, and MAMA suffered from severely inflated type I error rates, as they misused or overused information from large-scale EUR GWAS summary statistics. We also examined whether the performance of LOG-TRAM could be sensitive to the size of window (or local genomic region). Specifically, we considered two window sizes, 1 M and 2 M base pairs, to partition the whole chromosome into multiple non-overlapping local regions. We observed that the type I error rates of LOG-TRAM were well controlled despite the different window sizes. Therefore, we set a window size of 1 M base pairs as the default setting. In summary, LOG-TRAM has a satisfactory control of the type I error rate and its performance is insensitive to the size of a local region. Because FE and RE2 showed severe inflation of the type I error rate, we did not include them in the power comparison.
Figure 2.
Comparisons among LOG-TRAM, MAMA, MTAG, MANTRA, RE2, FE, and single-ancestry GWASs in simulation studies across different genetic architectures
(A) Average type I error rates assessed under different settings of background trans-ancestry genetic correlations and the fold enrichment of EUR local heritability. Error bars represent the standard errors of type I error rates evaluated on 50 replications.
(B) Average statistical power in multiple simulation scenarios with different combinations of background trans-ancestry genetic correlations and local trans-ancestry genetic correlations. Results are also summarized from 50 replications. More simulation results for different settings of total heritability and proportion of causal SNPs are provided in Figures S1–S8.
(C) Average type I error rates assessed when SNPs in the same local region of both populations were simulated as null SNPs.
(D) Average statistical power evaluated in multiple simulation scenarios where the background trans-ancestry genetic correlations were equal to the local trans-ancestry genetic correlations. Error bars represent the standard errors of power evaluated on 50 replications.
Next, we evaluated the power of the compared methods in the cross-population setting. Unlike the null-SNP setting in the EAS local region used to evaluate type I error rates, here we generated the SNP effects of the two populations in the local region from a bivariate normal distribution whose correlation is the local trans-ancestry genetic correlation of that region, with 10% of SNPs (proportion 0.1) having non-zero effects that jointly contribute to the local heritability. We considered a wide spectrum of local correlations to mimic the heterogeneous genetic similarity between populations across genomic regions. We anticipated that LOG-TRAM would achieve a substantial power gain in identifying SNPs with non-zero effects when the local genetic correlation was strong and the global genetic correlation was weak. As described above, we simulated the quantitative phenotypes and obtained the Z scores of the two datasets. Finally, after performing meta-analysis, we reported the fraction of non-zero SNPs in the EAS local region with a p value less than 0.05 as the power. As shown in Figure 2B, LOG-TRAM achieved the best performance in all settings. When the global/background trans-ancestry genetic correlation was 0, LOG-TRAM was still able to increase power by borrowing information from the EUR datasets through the non-zero local genetic correlation. By contrast, MAMA and MTAG reduced to the standard single-ancestry GWAS because no information could be borrowed from the large-scale EUR datasets. As the global genetic correlation approached the local genetic correlation, MAMA and MTAG achieved performance comparable to that of LOG-TRAM.
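Both the power and the type I error rate in these simulations reduce to a rejection rate at the 0.05 level. A minimal sketch, assuming the meta-analysis output is available as standard-normal Z scores and that tests are two-sided (the function names are hypothetical):

```python
import math
import numpy as np

def two_sided_p(z):
    """Two-sided p value for standard-normal Z scores: P(|Z| >= |z|)."""
    z = np.abs(np.asarray(z, dtype=float))
    return np.array([math.erfc(v / math.sqrt(2.0)) for v in z])

def rejection_rate(z, alpha=0.05):
    """Fraction of SNPs whose p value falls below alpha.

    On SNPs with non-zero effects this is the empirical power;
    on null SNPs it is the empirical type I error rate.
    """
    return float(np.mean(two_sided_p(z) < alpha))

z_null = np.array([0.3, -1.2, 2.5, 0.8])     # only |z| = 2.5 passes 0.05
z_causal = np.array([2.1, -3.0, 1.5, 4.2])   # all but |z| = 1.5 pass 0.05
type1 = rejection_rate(z_null)     # 0.25
power = rejection_rate(z_causal)   # 0.75
```

In a full simulation these rates would be averaged over replications (50 in the study) to obtain the points and error bars in Figure 2.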
The local genetic architecture is consistent with the global genetic architecture
For the simulation in which the local genetic architecture is consistent with the global genetic architecture, we randomly selected 10% of SNPs across the whole chromosome as non-zero-effect SNPs. We then simulated their effect sizes in the two populations from a bivariate normal distribution with mean zero, variance $h^2/(0.1M)$ in each population, and correlation $\rho$, where $h^2 = 0.01$ means that the total heritability contributed by chromosome 20 is around 0.01, $\rho$ is the trans-ancestry genetic correlation (varied across settings), $M$ is the number of SNPs in the chromosome, and the factor 0.1 means that 10% of SNPs had non-zero effects in both populations. Clearly, under this setting the local genetic architecture is consistent with the global genetic architecture.
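A sketch of this kind of effect-size simulation, under stated assumptions: standardized genotypes, equal heritability in both populations, causal SNPs shared across populations, and a hypothetical chromosome with 20,000 SNPs (the actual SNP count of chromosome 20 in the study is not given here). The function name and seed are illustrative only.

```python
import numpy as np

def simulate_effects(n_snps, h2=0.01, rho=0.5, prop_causal=0.1, seed=0):
    """Draw sparse, correlated effect sizes for two populations (EAS, EUR).

    A random `prop_causal` fraction of SNPs is causal in both populations.
    Causal effects follow a bivariate normal with variance
    h2 / (prop_causal * n_snps) in each population and correlation rho,
    so the expected total heritability (standardized genotypes) is h2.
    """
    rng = np.random.default_rng(seed)
    n_causal = int(round(prop_causal * n_snps))
    var = h2 / n_causal
    cov = var * np.array([[1.0, rho], [rho, 1.0]])
    causal = rng.choice(n_snps, size=n_causal, replace=False)
    beta = np.zeros((n_snps, 2))   # column 0: EAS, column 1: EUR
    beta[causal] = rng.multivariate_normal(np.zeros(2), cov, size=n_causal)
    return beta, causal

beta, causal = simulate_effects(n_snps=20_000, h2=0.01, rho=0.5)
```

Setting the effects of a chosen region to zero in both columns would give the null SNPs used for the type I error evaluation.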
We first used SNPs in the local region to evaluate the power of the compared methods in the cross-population setting. As described above, we reported the fraction of non-zero SNPs in the EAS local region with a p value less than 0.05 as the power. In this setting, the stronger assumptions made by FE and RE2 were satisfied, and thus they showed higher statistical power than the other methods. As shown in Figure 2D, LOG-TRAM achieved performance comparable to that of the meta-analysis methods with the homogeneous assumption (e.g., MTAG and MAMA).
Next, we again used SNPs in the local region to assess the type I error rates of the compared methods. To generate null SNPs, we set the effect sizes of SNPs in the local region to zero in both EAS and EUR. Similarly, we reported the fraction of SNPs with a p value less than 0.05 in the EAS local region as the type I error rate. As shown in Figure 2C and Figure S9, all the compared methods have well-controlled type I error rates regardless of the trans-ancestry genetic correlation. Nevertheless, RE2, MANTRA, MTAG, and MAMA may produce slightly deflated test statistics when the trans-ancestry genetic correlation is as high as 0.6, suggesting that they may be slightly conservative in this setting. In summary, when the local genetic architecture is consistent with the global genetic architecture, LOG-TRAM achieves comparable power and satisfactory control of the type I error rate.
Under the setting where the local genetic architecture is consistent with the global genetic architecture, we further compared the performance of the GWAS-based and LOG-TRAM-based difference tests for identifying ancestry-specific loci in simulation studies. As shown in Figure S10, the LOG-TRAM-based difference test has significantly higher power than the GWAS-based test while achieving well-controlled type I error rates. In summary, simulation studies suggest that LOG-TRAM has a great advantage over other methods by leveraging the local genetic architecture.
Real data analysis
Application of LOG-TRAM for trans-ancestry association mapping in East Asian population
To evaluate the performance of LOG-TRAM in real applications, we applied it to publicly available GWAS summary statistics of 29 phenotypes from EAS and EUR. The details of the datasets are given in Table S1. For each GWAS dataset, we used SNPs that overlapped the HapMap 3 list and removed SNPs with ambiguous alleles. For a pair of phenotypes from different populations, we aligned the signs of their effect sizes to the same allele. The LD scores were estimated with a sliding window of 1 M base pairs using 417 EUR and 377 EAS individuals from the 1000 Genomes Project. Figure 3A summarizes the results for all analyzed phenotypes in the EAS population. Across all 29 traits, LOG-TRAM consistently identified more independent loci than the standard single-ancestry GWASs. In total, LOG-TRAM identified 1,954 lead SNPs in EAS, of which 842 were not reported by the standard BBJ GWASs and were thus considered novel loci. In addition, for each trait, we compared the original GWAS sample size with the LOG-TRAM effective sample size estimated from the mean $\chi^2$ statistic. Specifically, we assessed how large a sample size would be needed to attain an equivalent increase in the mean $\chi^2$ statistic of LOG-TRAM. Overall, the original GWAS sample size would have to be increased by 1% (height; the small increase is mainly due to the unadjusted confounding bias of the height GWASs in both populations) to 186% (fasting insulin, FI) to achieve a power gain equivalent to that of LOG-TRAM (Figure 3B and Table S2). We note that the substantial power gain in EAS should be attributed to two indispensable factors: (1) the large sample size of the EUR GWAS summary statistics (ranging from 89,858 to 560,658; see Table S2 for more details) and (2) the non-zero local genetic covariance between the two populations (Figure 4E).
As an example, compared with the 22 independent loci identified in the original BBJ GWAS of systolic blood pressure (SBP), LOG-TRAM achieved substantially higher power for EAS associations by identifying more significant loci. Equivalently, LOG-TRAM increased the effective sample size of SBP from 136,597 in the original BBJ GWAS to 231,836, indicating that LOG-TRAM can borrow information from UKBB to perform association analysis in EAS. It is also worth noting that LOG-TRAM is computationally efficient: it took only 8 min on average to complete the analysis of the whole genome. Timing was evaluated on a Linux computing server with 20 CPU cores (Intel(R) Xeon(R) Gold 6230N CPU at 2.30 GHz), 1 TB of memory, and a 22 TB solid-state disk.
Figure 3.
Summary of analysis results obtained by applying LOG-TRAM to combine the EAS and EUR GWAS summary statistics of 29 phenotypes
(A) LOG-TRAM identified independent novel lead SNPs for the 29 phenotypes in EAS compared to the original GWASs (Table S1).
(B) The effective sample size of association statistics output by LOG-TRAM was computed as $N_{\mathrm{eff}} = M(\overline{\chi^2} - c)/(h^2 \bar{\ell})$, where $M$ is the number of SNPs, $\overline{\chi^2}$ is the mean $\chi^2$ statistic of the LOG-TRAM output, $c$ is the LDSC intercept of the LOG-TRAM summary statistics, $h^2$ is the heritability of the target trait, and $\bar{\ell}$ is the mean LD score.
(C) The estimated LDSC intercepts and standard errors (error bars) obtained by LOG-TRAM and alternative approaches, including the original GWAS, MAMA, MTAG, and RE2.
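The effective-sample-size formula in panel (B) follows from averaging the LDSC expectation $E[\chi^2_j] = N h^2 \ell_j / M + c$ over SNPs and solving for $N$. A minimal sketch with hypothetical function names; the inputs for `n_eff` are illustrative, not values from the study, while the SBP sample sizes are those reported in the text.

```python
def effective_sample_size(mean_chi2, intercept, h2, mean_ld, n_snps):
    """N_eff = M * (mean chi^2 - c) / (h2 * mean LD score)."""
    return n_snps * (mean_chi2 - intercept) / (h2 * mean_ld)

def relative_increase(n_new, n_old):
    """Fractional increase of an (effective) sample size over the original."""
    return n_new / n_old - 1.0

# Illustrative inputs only (not values from the study).
n_eff = effective_sample_size(mean_chi2=1.5, intercept=1.02, h2=0.2,
                              mean_ld=100.0, n_snps=1_000_000)   # 24,000

# SBP: original BBJ N versus the LOG-TRAM effective N reported in the text.
sbp_gain = relative_increase(231_836, 136_597)   # about 0.70, i.e. ~70%
```

Subtracting the intercept $c$ before rescaling is what keeps confounding-driven inflation of $\chi^2$ from being counted as extra sample size.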
Figure 4.
Trans-ancestry association mapping of BMI
(A) Manhattan plot of the BMI GWAS for the BBJ males, where 24 independent lead SNPs were identified.
(B) Manhattan plot of the LOG-TRAM results for the BBJ males. LOG-TRAM identified 77 independent lead SNPs, of which 53 were not identified in the original BBJ male GWAS. Lead SNPs identified by both LOG-TRAM and GWAS are marked by a cross. Novel lead SNPs are marked by a triangle.
(C) Comparison of LOG-TRAM p values with the original BMI GWAS p values of the BBJ males.
(D) Local per-SNP heritability estimated by LOG-TRAM on the basis of BBJ (male) and UKBB.
(E) Local per-SNP co-heritability between BBJ male and UKBB.
(F) Comparison of the effect sizes of lead SNPs output by LOG-TRAM with the GWAS effect sizes in the replication dataset (BBJ females).
(G) The QQ plot compares the p values in the replication GWAS data: (1) LOG-TRAM lead SNPs, (2) LOG-TRAM novel lead SNPs, (3) lead SNPs identified from UKBB, and (4) randomly selected SNPs from the replication GWAS. Clearly, SNPs reported by LOG-TRAM are strongly supported in the replication study.
(H) Tissue/cell-type SEG annotation analysis of LOG-TRAM results for BMI in EAS. The p in the y-axis label represents p values of the one-sided test of stratified LDSC regression coefficients. Each circle represents a tissue or cell type from the Franke lab dataset; large circles pass the Bonferroni correction. The dashed line represents the significance threshold.
Next, we quantified the magnitude of confounding bias in the resulting meta-analysis association statistics by using the LDSC intercept.9 Under the widely accepted LDSC assumption, the LDSC intercept should be one in the absence of confounding bias. As shown in Figure 3C, the LDSC intercept of the standard EAS height GWAS was estimated to be 1.22 (SE = 0.02), suggesting that the standard GWAS suffered from confounding bias. Owing to its built-in correction for confounding bias, LOG-TRAM reduced the LDSC intercept to 1.05 (SE = 0.02). By contrast, the LDSC intercept of the MTAG results was smaller than one, suggesting that MTAG over-corrected the confounding bias. The LDSC intercept of the MAMA results was 1.15 (SE = 0.03), indicating that MAMA was unable to fully correct the confounding bias. RE2 performed the worst, with the LDSC intercept increasing to 1.92 (SE = 0.05). Regarding the association results for EUR (right panel of Figure 3C), it is well known that the EUR height GWAS from UKBB suffers heavily from uncorrected population structure, with an LDSC intercept as high as 1.78 (SE = 0.05). After correction by LOG-TRAM, the LDSC intercept decreased to 1.09 (SE = 0.04). By comparison, the LDSC intercepts of MTAG, MAMA, and RE2 were estimated to be 1.22 (SE = 0.04), 1.25 (SE = 0.03), and 1.92 (SE = 0.05), respectively, indicating that the results produced by these methods still suffered from confounding bias. Similarly, for other traits such as body mass index (BMI), mean corpuscular hemoglobin (MCH), SBP, and atrial fibrillation (AF), the LDSC intercepts of the LOG-TRAM results were all close to one. This evidence indicates that LOG-TRAM can effectively account for confounding bias (see Figures S11, S12, and S13 and Tables S3 and S4 for more examples).
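To make the intercept diagnostic concrete, the sketch below fits a simplified, unweighted version of the LDSC regression by ordinary least squares. It is an illustration only, assuming per-SNP $\chi^2$ statistics and LD scores are given as arrays; the actual LDSC software additionally uses regression weights and a block jackknife for standard errors, both omitted here. The data in the usage lines are synthetic.

```python
import numpy as np

def ldsc_intercept(chi2, ld_scores):
    """Estimate the LDSC intercept and slope by OLS of chi^2 on LD score.

    Under LDSC, E[chi^2_j] = N * h2 * l_j / M + c; an intercept c close to 1
    indicates little confounding, while c > 1 suggests residual bias.
    This unweighted fit omits LDSC's regression weights and block jackknife.
    """
    chi2 = np.asarray(chi2, dtype=float)
    ld = np.asarray(ld_scores, dtype=float)
    X = np.column_stack([np.ones_like(ld), ld])
    (intercept, slope), *_ = np.linalg.lstsq(X, chi2, rcond=None)
    return intercept, slope

ld = np.array([10.0, 50.0, 100.0, 200.0, 400.0])
chi2_biased = 1.2 + 0.005 * ld   # inflated baseline, as with uncorrected structure
c, s = ldsc_intercept(chi2_biased, ld)
```

Because polygenic signal scales with LD score while uniform confounding does not, the two are separated into slope and intercept, which is why the intercept rather than the mean $\chi^2$ is used to gauge bias.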
As a concrete example of TRAM, we applied LOG-TRAM to the GWASs of BMI, using the GWAS summary statistics of BBJ (males) and UKBB as inputs, and obtained LOG-TRAM results for EAS and EUR. We used the GWAS of BBJ females as a replication dataset to validate the LOG-TRAM output in EAS. Compared with the original GWAS of BBJ males (Figure 4A), LOG-TRAM identified 53 novel lead SNPs (Figure 4B). We first evaluated the credibility of the signals discovered by LOG-TRAM by comparing the effect sizes and p values of the EAS lead SNPs identified by LOG-TRAM with those obtained from the independent replication GWAS (BBJ females). Higher consistency between the effect sizes of the discovery and replication cohorts suggests higher credibility of the results. Similarly, a smaller p value in the replication GWAS indicates that the lead SNP is unlikely to be significant by chance. Specifically, we regressed the effect sizes of the lead SNPs from the LOG-TRAM output against the effect sizes observed in the replication GWAS. The regression slope can be viewed as a quantitative indicator of replicability. Ideally, the slope should be close to 1.0, but it can deviate from 1.0 as a result of sex differences in BMI.39,40 We observed a regression slope of 0.705 (SE = 0.04) for the LOG-TRAM results (Figure 4F). As a comparison, the regression slopes were smaller for both MTAG (0.615, SE = 0.03, Figure S14C) and MAMA (0.630, SE = 0.03, Figure S14B). The significance of lead SNPs in the replication GWAS can also serve as an indicator of replicability. As shown in Figures 4G and S14D, the lead SNPs identified by LOG-TRAM showed more significant p values than the SNPs identified from the standard GWAS in European ancestry or by the other meta-analysis methods. These results suggest better replicability of LOG-TRAM.
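The replication slope can be computed by ordinary least squares. This minimal sketch assumes the replication effect sizes are regressed on the discovery (LOG-TRAM) effect sizes with an intercept; the exact regression specification behind Figure 4F is an assumption here, the function name is hypothetical, and the data in the usage lines are synthetic.

```python
import numpy as np

def replication_slope(beta_discovery, beta_replication):
    """OLS slope (with intercept) of replication effects on discovery effects.

    Returns the slope and its standard error; a slope near 1 indicates that
    effect sizes replicate in magnitude, while a smaller slope indicates
    attenuation in the replication cohort.
    """
    x = np.asarray(beta_discovery, dtype=float)
    y = np.asarray(beta_replication, dtype=float)
    n = x.size
    xc = x - x.mean()
    slope = np.dot(xc, y - y.mean()) / np.dot(xc, xc)
    resid = y - y.mean() - slope * xc          # residuals of the fitted line
    se = np.sqrt(np.dot(resid, resid) / (n - 2) / np.dot(xc, xc))
    return slope, se

x = np.array([0.10, 0.20, 0.30, 0.40])
slope, se = replication_slope(x, 0.7 * x)   # exact attenuation by 0.7
```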
Next, we focused on the interpretation of the novel lead SNPs identified by LOG-TRAM. Among the 53 novel lead SNPs, the most significant was rs7217403 (Figure 4C). The local region harboring this variant has 1.1/0.17 = 6.47-fold and 3.33/0.26 = 12.8-fold enrichment of local heritability in EAS and EUR, respectively (Figure 4D). Furthermore, the local co-heritability of this region was higher than the global co-heritability (1.82 compared with 0.14, Figure 4E), suggesting that LOG-TRAM can effectively leverage the local genetic architecture to boost the power of association mapping. In the replication analysis (Figure 4F), the highlighted SNP, rs7217403, showed an effect notably consistent with the LOG-TRAM results. Although rs7217403 is located in an intergenic region, it maps near MAP2K3, which has been reported to be significantly associated with BMI across diverse populations, including American Indians,41 Europeans,42,43 and East Asians.44 The expression level of MAP2K3 was positively correlated with BMI in adipose tissue, and in vitro studies suggested that MAP2K3 is activated during adipogenesis.41 Given this statistical and biological evidence, MAP2K3 appears to be a reproducible obesity locus, but knowledge of the molecular mechanisms underlying the association is still lacking. The intergenic variant rs7217403, identified by LOG-TRAM in EAS, may shed light on the genetic etiology of BMI and uncover biologically meaningful variation.
To cross-validate our LOG-TRAM results and reduce the influence of sex differences in the replication analysis, we used UKBB and the BBJ females as the discovery cohort and then replicated the LOG-TRAM output using the BBJ males. The LOG-TRAM results were highly consistent with those of the main analysis (Figures S15, S16, and S17). Compared with the original BMI GWAS of BBJ females, LOG-TRAM identified 63 novel lead SNPs (Figure S15B). The most significant novel lead SNP was again rs7217403 (Figure S15C). The local region harboring this variant showed a consistent local genetic architecture, with 1.81/0.19 = 9.53-fold enrichment of local heritability in EAS and 2.33/0.16 = 14.5-fold enrichment of local co-heritability (Figure S15E).
Besides the most significant novel lead SNP (rs7217403), other novel signals discovered by LOG-TRAM and their nearest genes also show potential biological functions related to BMI. As shown in Figure S18C, the local region harboring the novel lead SNP rs987237 has 0.62/0.17 = 3.65-fold and 3.66/0.26 = 14.1-fold enrichment of local heritability in EAS and EUR, respectively. Meanwhile, the local co-heritability of this region was significantly higher than the global co-heritability (1.49 compared with 0.14). The local heritability/co-heritability estimates are consistent in the cross-validation analysis (Figure S18D). The annotated gene, TFAP2B, functions as both a transcriptional activator and repressor and regulates downstream genes involved in important biological functions, including face, body wall, and limb development.45,46 In addition, studies have identified substantial associations between TFAP2B and BMI,47, 48, 49 suggesting that TFAP2B plays a critical role in controlling body formation. Another locus, rs879620, also shows strong (Figure S19C) and robust (Figure S19D) local heritability and co-heritability enrichment in EAS and EUR. SNP rs879620 is located in the 3′ UTR of ADCY9, which encodes a membrane-bound enzyme that catalyzes the formation of the ubiquitous second messenger cyclic adenosine monophosphate (cAMP) from adenosine triphosphate (ATP). Mutations in ADCY9 have been reported to be associated with cardiovascular disease,50 dyslipidemia,51 and obesity in Europeans8,52 and East Asians.49,53 The connection between the molecular mechanisms of these associated SNPs and the etiology of complex diseases needs further investigation.
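The fold enrichments quoted above are ratios of local to genome-wide per-SNP heritability, so any common scaling of the two quantities cancels. The short check below recomputes them from the rounded values given in the text; they agree with the reported folds up to rounding. The helper name is hypothetical.

```python
def fold_enrichment(local_per_snp, global_per_snp):
    """Ratio of local to genome-wide per-SNP (co-)heritability."""
    return local_per_snp / global_per_snp

# Rounded values quoted in the text for the regions around rs7217403 and rs987237.
examples = {
    "rs7217403, EAS heritability": fold_enrichment(1.1, 0.17),
    "rs7217403, EUR heritability": fold_enrichment(3.33, 0.26),
    "rs987237, EAS heritability": fold_enrichment(0.62, 0.17),
    "rs987237, EUR heritability": fold_enrichment(3.66, 0.26),
}
for label, fold in examples.items():
    print(f"{label}: {fold:.2f}-fold")
```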
Finally, we applied stratified LD score regression54 to explore the tissues that are functionally relevant to BMI in EAS on the basis of the BBJ GWAS summary statistics and the LOG-TRAM output. We used publicly available EAS samples from the 1000 Genomes Project as the LD reference and 108 Franke tissue/cell-type specifically expressed gene (SEG) sets as annotations to construct LD scores and evaluate tissue/cell specificity. The Franke lab annotation55 is the largest publicly available tissue/cell-type SEG dataset, comprising 37,427 human samples. As a comprehensive SEG annotation dataset, it covers a wide spectrum of human tissues/cell types, including adipose, blood, immune, and central nervous system (CNS) tissues. Integrative analysis of the Franke lab annotations and GWAS summary statistics may offer novel biological insights into the etiologies of human complex traits/diseases. As shown in Figure S20, none of the tissues/cell types in the annotations was significantly associated with BMI when the original BBJ GWAS was used. In contrast, a significant enrichment was detected in neural stem cells after applying LOG-TRAM (Figure 4H). Moreover, consistent with previous tissue-specific enrichment analyses of BMI in European populations,54,56,57 the LOG-TRAM results show robust and strong signals across a wide range of brain tissues/cell types, suggesting that these brain tissues/cell types may also play an important functional role in BMI in EAS.
Application of LOG-TRAM for trans-ancestry association mapping in African population
Beyond the integrative analysis of GWASs from EAS and EUR, we further show that LOG-TRAM can be applied to integrate GWASs from other populations. As a demonstration, we applied LOG-TRAM to improve association mapping in the African population (AFR). Specifically, we analyzed 17 phenotypes by combining AFR GWASs with EUR/EAS GWASs, including 11 hematological phenotypes (e.g., lymphocyte count [Lym] and eosinophil count [Eosino]),58 three glycemic phenotypes (e.g., fasting glucose [FG] and fasting insulin [FI]),59 and three lipid phenotypes (high-density lipoprotein [HDL], low-density lipoprotein [LDL], and total cholesterol [TC]).60 The details of the datasets are given in Table S5. For each GWAS dataset, we used SNPs that overlapped with the HapMap 3 list and removed SNPs with ambiguous alleles. For a pair of phenotypes from different populations, we aligned the signs of their effect sizes to the same allele. We estimated the LD scores of the AFR population with 505 African samples from the 1000 Genomes Project and used the same definition of 1 M base pair windows as local regions. Similarly, we computed the effective sample sizes of the 17 phenotypes in AFR after applying LOG-TRAM to combine the AFR GWAS with the GWAS from another population. As expected, the estimated effective sample sizes of the LOG-TRAM outputs were larger than the sample sizes of the original AFR GWASs when integrating with either EUR (Figure 5A) or EAS (Figure 5B). In particular, when the EUR GWASs were combined, LOG-TRAM identified more independent loci in all 17 phenotypes (Figure 5C). We also show the QQ plots of Lym (Figure 5D, left), Eosino (Figure 5D, middle), and FG (Figure 5D, right) as concrete examples. Compared with integrating AFR with EAS, LOG-TRAM gained more power when combining AFR with EUR because of the larger sample size of the EUR GWASs.
Figure 5.
Summary of analysis results obtained by applying LOG-TRAM to combine the AFR and EUR/EAS GWAS summary statistics of 17 overlapping phenotypes
(A and B) The effective sample size of association statistics output by LOG-TRAM when combining AFR with EUR (A) or EAS (B) GWAS summary statistics of 17 phenotypes (Table S5).
(C) Comparison of the number of independent lead SNPs identified by LOG-TRAM and original GWASs for the 17 phenotypes in AFR.
(D) QQ plots of single-ancestry GWAS and LOG-TRAM results for lymphocyte count (left), eosinophil count (middle), and fasting glucose (right).
Identification of ancestry-specific loci
Taking the GWAS summary statistics from BBJ and UKBB as input, we applied the LOG-TRAM-based difference test to a number of complex traits, including mean corpuscular hemoglobin concentration (MCHC), mean corpuscular volume (MCV), HDL, LDL, type 2 diabetes (T2D), and BMI. The test statistics for SNPs overlapping across the two populations were computed with Equations 20 and 21. Here, we take MCHC as an example. In addition to the 23 genome-wide significant ancestry-specific lead SNPs identified by the standard GWAS-based difference test, the LOG-TRAM-based difference test identified six novel lead SNPs with significantly different effect sizes between EAS and EUR (Figures 6A and 6B). Although these novel SNPs (red points in Figure 6C) show different effect sizes between EAS and EUR, they were not identified by the GWAS-based difference test because of the limited sample size. In contrast, the LOG-TRAM-based difference test successfully captured these signals because LOG-TRAM improves the power of association statistics through a larger effective sample size. From a biological perspective, the SNPs identified by the LOG-TRAM-based difference test were enriched in functional categories related to population differences, suggesting the functional importance of ancestry-specific SNPs. Notably, as shown in Figure 6D, SNPs in the top quantile of the background selection statistic61 have more significant p values, whereas SNPs in the top quantile of recombination rate62 have less significant p values than average. This pattern is consistent with the observation of Shi et al.,30 in which the background selection statistic is positively correlated with the depletion of trans-ancestry genetic correlation, whereas recombination rate shows the reverse pattern. The ancestry-specific loci identification results for five other traits are provided in Figures S21–S25.
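Equations 20 and 21 are not reproduced in this section. As a hedged illustration of the general idea only, the sketch below implements the classical two-sample Z test for a difference in effect sizes, assuming the two GWASs share no samples so that the estimates are independent; it is not necessarily identical to the LOG-TRAM-based statistics. The function name and the effect estimates in the usage line are hypothetical.

```python
import math

def difference_test(beta1, se1, beta2, se2):
    """Z test for H0: beta1 == beta2 with independent estimates.

    Returns the Z statistic and its two-sided p value.
    """
    z = (beta1 - beta2) / math.sqrt(se1 ** 2 + se2 ** 2)
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p

# Hypothetical per-allele estimates for one SNP in EAS (1) and EUR (2).
z, p = difference_test(beta1=0.05, se1=0.006, beta2=0.01, se2=0.002)
```

Because the standard errors enter the denominator, shrinking them (as a larger effective sample size does) makes the same effect-size gap more detectable, which is the mechanism described in the paragraph above.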
Figure 6.
Applications of LOG-TRAM in ancestry-specific loci identification and PRS construction
(A) Manhattan plot of GWAS-based difference test results for MCHC. Lead SNPs are marked by a cross.
(B) Manhattan plot of LOG-TRAM-based difference test results for MCHC. The dashed line marks the threshold for genome-wide significance. Lead SNPs identified by both the LOG-TRAM and GWAS-based difference tests are marked by crosses. Novel lead SNPs are marked by triangles.
(C) Comparison of MCHC GWAS effect sizes for lead SNPs between EAS and EUR. Red points denote novel loci identified by the LOG-TRAM-based difference test; black points denote lead loci identified in both the LOG-TRAM and GWAS-based difference tests. Vertical and horizontal error bars represent standard errors of GWAS effect sizes in EAS and EUR, respectively.
(D) The QQ-plots of p values for all SNPs and SNPs in the top quantile of four continuous-valued annotations.
(E) Predictive $R^2$ for EAS height PRS models based on GWAS and LOG-TRAM association statistics.
The LOG-TRAM result for construction of polygenic risk scores
To demonstrate the utility of LOG-TRAM in the construction of polygenic risk scores (PRSs) in under-represented populations, we compared the predictive power of PRSs constructed using the standard GWAS summary statistics and the LOG-TRAM summary statistics, taking human height as an example. We first applied the most commonly used PRS method, LDpred,63 to build an EAS height PRS model (denoted GWAS-based PRS) using the GWAS summary statistics of height from BBJ (n = 159,095). Using the GWAS summary statistics of height from BBJ and UKBB, we applied LOG-TRAM to generate association statistics for the EAS population and then used these summary statistics (the LOG-TRAM output) to construct a PRS, referred to as LOG-TRAM-based PRS. We then evaluated prediction performance in an independent Chinese testing dataset.23 Specifically, we measured the prediction accuracy of each PRS model by the predictive $R^2$, defined as the squared correlation between the predicted PRSs and the residual phenotypes after regressing out covariates (e.g., gender, age, and genomic PCs).23,64
As shown in Figure 6E, the predictive accuracy of the GWAS-based PRS is $R^2 = 0.144$. In contrast, the PRS constructed from the LOG-TRAM association statistics achieved $R^2 = 0.169$, a 17% accuracy gain ((0.169 − 0.144)/0.144 ≈ 17%), indicating that the LOG-TRAM association statistics of EAS height successfully borrowed information from the large-scale UKBB datasets. The improvement in prediction accuracy was stable across PRS approaches. When we applied a recently developed PRS method named “dbslmm,”65 we obtained a predictive $R^2 = 0.149$ with the GWAS summary statistics. With the same PRS method, we achieved a predictive $R^2 = 0.185$ with the LOG-TRAM association statistics, a 24% improvement ((0.185 − 0.149)/0.149 ≈ 24%). In summary, these results suggest that the LOG-TRAM output can substantially improve the accuracy of PRSs for under-represented populations.
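The predictive $R^2$ and the relative gains above can be sketched as follows. This assumes the phenotype is residualized by OLS on the covariates (with an intercept) before it is correlated with the PRS; the function names are hypothetical, and the gains are recomputed from the $R^2$ values reported in this section.

```python
import numpy as np

def predictive_r2(prs, phenotype, covariates):
    """Squared correlation between a PRS and the covariate-adjusted phenotype.

    Covariates (e.g. gender, age, genomic PCs) are regressed out of the
    phenotype by OLS with an intercept before the correlation is computed.
    """
    y = np.asarray(phenotype, dtype=float)
    X = np.column_stack([np.ones(len(y)), np.asarray(covariates, dtype=float)])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    r = np.corrcoef(np.asarray(prs, dtype=float), resid)[0, 1]
    return r ** 2

def relative_gain(r2_new, r2_base):
    """Relative improvement of a new R^2 over a baseline R^2."""
    return (r2_new - r2_base) / r2_base

ldpred_gain = relative_gain(0.169, 0.144)   # about 0.17
dbslmm_gain = relative_gain(0.185, 0.149)   # about 0.24
```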
Discussion
In this paper, we have introduced a trans-ancestry meta-analysis method, LOG-TRAM, that aims to improve the statistical power of GWASs in non-Europeans by leveraging locally shared genetic architectures with biobank-scale auxiliary datasets. Through comprehensive simulations, we showed that, compared with existing approaches, our method achieves greater statistical power while controlling the type I error rate, and that it is robust across various genetic architecture settings. We applied LOG-TRAM to GWASs of 29 complex traits and diseases from EAS and EUR, achieving substantial gains in power and effective correction of confounding biases. We found that LOG-TRAM results were reproducible in independent studies. We demonstrated the ability of LOG-TRAM to integrate other populations by applying it to combine AFR GWASs with EAS/EUR GWASs. Finally, we demonstrated that the LOG-TRAM results can be further used for identification of ancestry-specific loci and construction of PRSs in under-represented populations.
As mentioned above, accumulating evidence reveals that the shared genetic basis between phenotypes/populations varies across genomic regions.26,27,29,30 For example, the effect sizes of SNPs can be phenotype- or population-specific, and the genetic architectures of multiple ancestries can differ locally. Therefore, assuming a constant variance-covariance matrix of effect sizes across the whole genome contradicts biological intuition in many circumstances. Stratified LDSC56 is a representative method for exploring local genetic architecture. It assumes that SNPs in different genomic regions (e.g., functional categories) contribute disproportionately to heritability and estimates the per-SNP heritability in each region by regressing the association statistics on the LD scores corresponding to each region. Recently, several other statistical methods, including ρ-HESS,27 SUPERGNOVA,28 and LOGODetect,31 have been developed to estimate local genetic correlation across traits in a single population. Their results consistently suggest that the local genetic correlation can differ greatly from the global genetic correlation, offering new insights into complex human diseases and traits. However, these methods for exploring local genetic architecture do not account for heterogeneity across ancestries, so directly applying them to association mapping in the trans-ancestry setting remains problematic. To develop LOG-TRAM, we considered a sliding window (e.g., a 1 M base pair segment) to decompose the whole genome into a local region and a background region, which allows us to model regional genetic architectures differently from the global pattern. By leveraging the local genetic architecture and accounting for confounding bias, LOG-TRAM can not only estimate local heritability but also be applied for association mapping in the trans-ancestry setting.
We have systematically compared the performance of LOG-TRAM with several commonly used methods for meta-analysis of GWAS data, including FE, RE2, MANTRA, MTAG, and MAMA, and demonstrated the benefit of leveraging local genetic architecture in TRAM. Besides these approaches, we are also aware of Tractor,66 which is an individual-level method for TRAM in admixed populations. It first infers the local ancestry composition of admixed individuals in their genomes and then conducts association mapping. While both Tractor and LOG-TRAM aim to boost the power of association mapping in the multi-ancestry context and estimate ancestry-specific effects, they are different in the following aspects. First, they are designed to address different difficulties. Tractor is designed to account for the local ancestry of admixed individuals when conducting association mapping in admixed populations (e.g., African Americans and Latino/Hispanics). LOG-TRAM aims to improve the power of association mapping in an under-represented population by combining well-powered GWASs from European populations. Second, they take different levels of GWAS data as input. Tractor takes individual-level data with admixed individuals as input, while LOG-TRAM uses summary-level GWAS data from different populations as input. Third, they utilize the local genetic information in different ways. Tractor infers the local ancestry of admixed individuals while LOG-TRAM estimates the local genetic covariance across different populations. Therefore, Tractor and LOG-TRAM are designed for different purposes.
Because it requires characterizing local genetic architectures, LOG-TRAM has limitations. First, the value of the local covariance matrix can depend on the definition of a “local” region. In principle, a region defined by a smaller genomic segment captures the genetic architecture at higher resolution. However, it is harder to accurately estimate the local variance and covariance of effect sizes from the smaller number of SNPs in a smaller region, so in practice a trade-off between fine resolution and accurate parameter estimation is required. In genomic data analysis, 1 M base pair regions are often treated as local regions (e.g., Loh et al.26). Therefore, we used a sliding window of 1 M base pair regions in our main analysis and demonstrated the robustness of this choice by comparing the results with those obtained with 2 M base pair regions. Second, the estimation errors of the local genetic covariance matrix can be larger than those of the global one. Our simulations confirmed that the variance of the local genetic covariance estimates becomes quite large when the sample size of the discovery GWAS is less than 10,000, which reduces the power of association mapping (see supplemental methods, section 3.8). Therefore, for stable parameter estimation, we recommend applying LOG-TRAM to GWASs with at least 10,000 samples. This sample size requirement is often satisfied in current real data analyses (see Table S2 for examples).
Although this study mainly focused on local regions partitioned by a sliding window of fixed size, LOG-TRAM can also be applied to more general "local regions" defined by functional annotations or tissue/cell-type SEG annotations. Indeed, SNPs in functional categories (e.g., gene regulatory elements,67, 68, 69, 70 epigenomic regulation,56,71,72 and tissue-specific expressed genes30,73) are well known to be enriched for the heritability of complex traits, which underscores how widespread heterogeneous genetic architectures are across the genome. Consequently, these functional annotations are widely used in genetic studies to increase statistical power, including GWASs,74, 75, 76 pleiotropy analysis,77 fine-mapping,78,79 and polygenic risk scores.80,81 A recent study82 suggested that functional annotations have great potential to improve the portability of trans-ancestry polygenic risk scores, indicating that biological mechanisms are substantially shared across populations. Therefore, leveraging functional or SEG annotations can also increase the power of trans-ancestry association mapping for under-represented populations. To illustrate this, we treated the SNPs annotated by each of 20 binary functional annotations from the Baseline-LD-X model30 as a local region and estimated their local genetic architectures for LOG-TRAM. Intuitively, each annotation plays the role of one 1 Mb segment in the sliding-window analysis. We applied this functionally informed LOG-TRAM to the BMI and T2D GWASs from BBJ and UKBB. Compared with the original EAS GWASs, it identified 85% and 16% more independently significant loci for BMI and T2D, respectively (Figures S28 and S31). In addition, we used the 53 SEG annotations30 as local regions and applied SEG-informed LOG-TRAM to the same traits. As shown in Figures S28 and S31, SEG-informed LOG-TRAM achieved comparable power gains in EAS, identifying 114% and 25% more independently significant loci for BMI and T2D, respectively.
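Treating annotations as local regions mainly changes how SNPs are grouped. The sketch below reuses a naive moment estimator (no LD, no sample overlap) and makes several assumptions not stated in the text. A SNP that belongs to several annotations contributes to each of them, and sparse annotations are skipped below a minimum SNP count. The threshold, the simulated 5× enrichment, the sample sizes, and all function names are hypothetical. None of this is taken from the LOG-TRAM implementation.

```python
import numpy as np


def local_cov_mom(b1, s1, b2, s2):
    """Naive 2x2 covariance of true effects (assumes no LD and no sample overlap)."""
    v1 = np.mean(b1**2 - s1**2)
    v2 = np.mean(b2**2 - s2**2)
    c12 = np.mean(b1 * b2)
    return np.array([[v1, c12], [c12, v2]])


def annotation_region_covs(annot, b1, s1, b2, s2, min_snps=50):
    """Treat each binary annotation column as one 'local region'.

    annot: (M SNPs x K annotations) 0/1 matrix; a SNP may sit in several columns.
    Returns {annotation index: 2x2 estimate}, skipping sparse annotations.
    """
    annot = np.asarray(annot, dtype=bool)
    out = {}
    for k in range(annot.shape[1]):
        idx = annot[:, k]
        if idx.sum() < min_snps:
            continue  # too few SNPs for a stable estimate
        out[k] = local_cov_mom(b1[idx], s1[idx], b2[idx], s2[idx])
    return out


def demo(seed=0, m=5_000):
    """Simulate 3 hypothetical annotations: 0 is enriched, 2 is very sparse."""
    rng = np.random.default_rng(seed)
    annot = rng.random((m, 3)) < np.array([0.1, 0.3, 0.005])
    sd = np.where(annot[:, 0], np.sqrt(5e-4), np.sqrt(1e-4))  # 5x variance enrichment
    z = rng.standard_normal((m, 2))
    beta1 = sd * z[:, 0]
    beta2 = sd * (0.8 * z[:, 0] + 0.6 * z[:, 1])  # cross-population correlation 0.8
    s1 = np.full(m, 1 / np.sqrt(50_000))   # target GWAS
    s2 = np.full(m, 1 / np.sqrt(300_000))  # auxiliary GWAS
    b1 = beta1 + rng.normal(0.0, s1)
    b2 = beta2 + rng.normal(0.0, s2)
    return annotation_region_covs(annot, b1, s1, b2, s2)


if __name__ == "__main__":
    for k, omega in demo().items():
        print(k, np.round(omega, 6))
```

The enriched annotation receives a larger estimated local variance than the background annotation. This annotation-level heterogeneity is what annotation-defined regions are meant to capture.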
Our LOG-TRAM approach warrants further investigation in the following directions. First, pleiotropic effects have been shown to be widespread in the genome.83 In a systematic analysis of 4,155 publicly available GWASs, 90% of trait-associated loci affected multiple phenotypes simultaneously.2 Hence, jointly modeling multiple GWAS traits across populations may further boost the statistical power of trans-ancestry association mapping. Second, LOG-TRAM assumes that, within a given population, SNPs with lower allele frequencies tend to have larger effect sizes. More specifically, we modeled the relationship between the per-allele effect size $v$ and the allele frequency $f$ as $\mathrm{Var}(v) \propto [f(1-f)]^{\alpha}$, where $\alpha = -1$. This assumption has been shown to be a stable choice in simulation studies,84 and it was also adopted in a previous trans-ancestry analysis.85 Recent studies estimating $\alpha$ found that although the estimates are negative for most complex traits, they vary moderately across traits.86,87 It would therefore be more appropriate to estimate $\alpha$ for each trait rather than fixing $\alpha = -1$. Third, although LOG-TRAM can effectively integrate GWASs from different populations, its input GWASs usually exclude admixed individuals. Excluding admixed individuals can reduce false positives, but at the cost of power in association mapping. Considering the increasingly admixed populations and the enrichment of heritable diseases in admixed populations,66 it would be interesting to integrate admixed populations (e.g., African Americans) with large-scale GWASs conducted in a more homogeneous ancestry background (e.g., UKBB) to improve power. Because LOG-TRAM characterizes local genetic architectures, it offers a convenient formulation for taking local ancestry in admixed populations into account. Inspired by Tractor,66 one could first infer the ancestry of each local genomic region for the individuals in an admixed GWAS. With the inferred local ancestry, the local cross-population genetic correlation between the target ancestry and the ancestry of an auxiliary GWAS can be estimated and leveraged to improve the power of association mapping. We will explore these potential improvements in future work.
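The frequency-dependence assumption and the suggestion to estimate $\alpha$ per trait can be made concrete with a small sketch. Under Hardy-Weinberg equilibrium, the genotype variance is $2f(1-f)$. With $\alpha = -1$, the effect variance on the standardized-genotype scale is therefore the same at every allele frequency, while $\alpha > -1$ gives common SNPs larger standardized effects. The toy estimator below is a profile-likelihood grid search that assumes the true per-allele effects are observed without noise or LD. That is a strong simplification: published approaches (e.g., refs 86,87) are far more involved. All function names and simulated values are hypothetical.

```python
import numpy as np


def per_allele_var(f, alpha=-1.0, sigma2=1.0):
    """Var(v) proportional to [f(1-f)]^alpha; alpha < 0 gives rarer SNPs larger effects."""
    f = np.asarray(f, dtype=float)
    return sigma2 * (f * (1.0 - f)) ** alpha


def standardized_var(f, alpha=-1.0, sigma2=1.0):
    """Effect variance on the standardized-genotype scale (genotype variance 2f(1-f))."""
    f = np.asarray(f, dtype=float)
    return per_allele_var(f, alpha, sigma2) * 2.0 * f * (1.0 - f)


def estimate_alpha(v, f, grid=None):
    """Toy profile-likelihood grid search for alpha, assuming true per-allele
    effects v are observed directly (no estimation noise, no LD)."""
    grid = np.arange(-1.5, 0.51, 0.01) if grid is None else grid
    best, best_ll = None, -np.inf
    for a in grid:
        w = (f * (1.0 - f)) ** a
        s2 = np.mean(v**2 / w)  # MLE of sigma2 for this alpha
        ll = -0.5 * np.sum(np.log(s2 * w) + v**2 / (s2 * w))
        if ll > best_ll:
            best, best_ll = a, ll
    return best


if __name__ == "__main__":
    freqs = np.array([0.01, 0.05, 0.2, 0.5])
    print(standardized_var(freqs, alpha=-1.0))   # constant (about 2.0) across frequencies
    print(standardized_var(freqs, alpha=-0.25))  # increases toward common SNPs
    rng = np.random.default_rng(0)
    f = rng.uniform(0.01, 0.5, size=20_000)
    v = rng.normal(0.0, np.sqrt(per_allele_var(f, alpha=-0.4, sigma2=1e-4)))
    print(round(float(estimate_alpha(v, f)), 2))  # should be near the simulated -0.4
```

With many SNPs spanning a range of frequencies, the grid search recovers the simulated exponent closely. Real GWAS estimates are much noisier, which is one reason fixing $\alpha$ has been a pragmatic default.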
Acknowledgments
This work is supported in part by National Key R&D Program of China (2020YFA0713900), Hong Kong Research Grant Council [12301417, 16307818, 16301419, 16308120, 16307221], Hong Kong Innovation and Technology Fund [PRP/029/19FX], Hong Kong University of Science and Technology [startup grant R9405, Z0428 from the Big Data Institute], the Open Research Fund from Shenzhen Research Institute of Big Data [2019ORF01004], and Guangdong Provincial Key Laboratory of Big Data Computing, The Chinese University of Hong Kong, Shenzhen. The computational task for this work was partially performed with the X-GPU cluster supported by the RGC Collaborative Research Fund: C6021-19EF.
Declaration of interests
The authors declare no competing interests.
Published: June 16, 2022
Footnotes
Supplemental information can be found online at https://doi.org/10.1016/j.ajhg.2022.05.013.
Contributor Information
Xiang Wan, Email: wanxiang@sribd.cn.
Can Yang, Email: macyang@ust.hk.
Web resources
LOG-TRAM, https://github.com/YangLabHKUST/LOG-TRAM
Supplemental information
Data and code availability
The publicly available GWAS summary statistics for meta-analysis were obtained from the links summarized in Tables S1 and S5. The UK Biobank data are from the UK Biobank resource under application number 30186. The LOG-TRAM software and source code used in this study are publicly available in the GitHub repository of LOG-TRAM (https://github.com/YangLabHKUST/LOG-TRAM).
References
- 1. Klein R.J., Zeiss C., Chew E.Y., Tsai J.-Y., Sackler R.S., Haynes C., Henning A.K., SanGiovanni J.P., Mane S.M., Mayne S.T., et al. Complement factor H polymorphism in age-related macular degeneration. Science. 2005;308:385–389. doi: 10.1126/science.1109557.
- 2. Watanabe K., Stringer S., Frei O., Umićević Mirkov M., de Leeuw C., Polderman T.J.C., van der Sluis S., Andreassen O.A., Neale B.M., Posthuma D. A global overview of pleiotropy and genetic architecture in complex traits. Nat. Genet. 2019;51:1339–1348. doi: 10.1038/s41588-019-0481-0.
- 3. Tam V., Patel N., Turcotte M., Bossé Y., Paré G., Meyre D. Benefits and limitations of genome-wide association studies. Nat. Rev. Genet. 2019;20:467–484. doi: 10.1038/s41576-019-0127-1.
- 4. Mills M.C., Rahal C. The GWAS diversity monitor tracks diversity by disease in real time. Nat. Genet. 2020;52:242–243. doi: 10.1038/s41588-020-0580-y.
- 5. Gurdasani D., Barroso I., Zeggini E., Sandhu M.S. Genomics of disease risk in globally diverse populations. Nat. Rev. Genet. 2019;20:520–535. doi: 10.1038/s41576-019-0144-0.
- 6. Hindorff L.A., Bonham V.L., Brody L.C., Ginoza M.E.C., Hutter C.M., Manolio T.A., Green E.D. Prioritizing diversity in human genomics research. Nat. Rev. Genet. 2018;19:175–185. doi: 10.1038/nrg.2017.89.
- 7. 1000 Genomes Project Consortium. Auton A., Brooks L.D., Durbin R.M., Garrison E.P., Kang H.M., Korbel J.O., Marchini J.L., McCarthy S., McVean G.A., Abecasis G.R. A global reference for human genetic variation. Nature. 2015;526:68–74. doi: 10.1038/nature15393.
- 8. Wojcik G.L., Graff M., Nishimura K.K., Tao R., Haessler J., Gignoux C.R., Highland H.M., Patel Y.M., Sorokin E.P., Avery C.L., et al. Genetic analyses of diverse populations improves discovery for complex traits. Nature. 2019;570:514–518. doi: 10.1038/s41586-019-1310-4.
- 9. Bulik-Sullivan B.K., Loh P.-R., Finucane H.K., Ripke S., Yang J., Patterson N., Daly M.J., Price A.L., Neale B.M. LD score regression distinguishes confounding from polygenicity in genome-wide association studies. Nat. Genet. 2015;47:291–295. doi: 10.1038/ng.3211.
- 10. Price A.L., Patterson N.J., Plenge R.M., Weinblatt M.E., Shadick N.A., Reich D. Principal components analysis corrects for stratification in genome-wide association studies. Nat. Genet. 2006;38:904–909. doi: 10.1038/ng1847.
- 11. Yang J., Zaitlen N.A., Goddard M.E., Visscher P.M., Price A.L. Advantages and pitfalls in the application of mixed-model association methods. Nat. Genet. 2014;46:100–106. doi: 10.1038/ng.2876.
- 12. Abdellaoui A., Hugh-Jones D., Yengo L., Kemper K.E., Nivard M.G., Veul L., Holtz Y., Zietsch B.P., Frayling T.M., Wray N.R., et al. Genetic correlates of social stratification in Great Britain. Nat. Human Behav. 2019;3:1332–1342. doi: 10.1038/s41562-019-0757-5.
- 13. Haworth S., Mitchell R., Corbin L., Wade K.H., Dudding T., Budu-Aggrey A., Carslake D., Hemani G., Paternoster L., Smith G.D., et al. Apparent latent structure within the UK Biobank sample has implications for epidemiological analysis. Nat. Commun. 2019;10:333–339. doi: 10.1038/s41467-018-08219-1.
- 14. Hu X., Jia Z., Lin Z., Wang Y., Peng H., Zhao H., Wan X., Yang C. Mendelian randomization for causal inference accounting for pleiotropy and sample structure using genome-wide summary statistics. Preprint at bioRxiv. 2021. doi: 10.1101/2021.03.11.434915.
- 15. Li Y.R., Keating B.J. Trans-ethnic genome-wide association studies: advantages and challenges of mapping in diverse populations. Genome Med. 2014;6:91. doi: 10.1186/s13073-014-0091-5.
- 16. DerSimonian R., Laird N. Meta-analysis in clinical trials. Contr. Clin. Trials. 1986;7:177–188. doi: 10.1016/0197-2456(86)90046-2.
- 17. Evangelou E., Ioannidis J.P.A. Meta-analysis methods for genome-wide association studies and beyond. Nat. Rev. Genet. 2013;14:379–389. doi: 10.1038/nrg3472.
- 18. Han B., Eskin E. Random-effects model aimed at discovering associations in meta-analysis of genome-wide association studies. Am. J. Hum. Genet. 2011;88:586–598. doi: 10.1016/j.ajhg.2011.04.014.
- 19. Lee C.H., Eskin E., Han B. Increasing the power of meta-analysis of genome-wide association studies to detect heterogeneous effects. Bioinformatics. 2017;33:i379–i388. doi: 10.1093/bioinformatics/btx242.
- 20. Morris A.P. Transethnic meta-analysis of genomewide association studies. Genet. Epidemiol. 2011;35:809–822. doi: 10.1002/gepi.20630.
- 21. Turley P., Walters R.K., Maghzian O., Okbay A., Lee J.J., Fontana M.A., Nguyen-Viet T.A., Wedow R., Zacher M., Furlotte N.A., et al. Multi-trait analysis of genome-wide association summary statistics using MTAG. Nat. Genet. 2018;50:229–237. doi: 10.1038/s41588-017-0009-4.
- 22. Turley P., Martin A.R., Goldman G., Li H., Kanai M., Walters R.K., Jala J.B., Lin K., Millwood I.Y., Carey C.E., et al. Multi-ancestry meta-analysis yields novel genetic discoveries and ancestry-specific associations. Preprint at bioRxiv. 2021. doi: 10.1101/2021.04.23.441003.
- 23. Cai M., Xiao J., Zhang S., Wan X., Zhao H., Chen G., Yang C. A unified framework for cross-population trait prediction by leveraging the genetic correlation of polygenic traits. Am. J. Hum. Genet. 2021;108:632–655. doi: 10.1016/j.ajhg.2021.03.002.
- 24. Luo Y., Li X., Wang X., Gazal S., Mercader J.M., Neale B.M., Florez J.C., Auton A., Price A.L., Finucane H.K., et al. Estimating heritability and its enrichment in tissue-specific gene sets in admixed populations. Hum. Mol. Genet. 2021;30:1521–1534. doi: 10.1093/hmg/ddab130.
- 25. Stephens M., Balding D.J. Bayesian statistical methods for genetic association studies. Nat. Rev. Genet. 2009;10:681–690. doi: 10.1038/nrg2615.
- 26. Loh P.-R., Bhatia G., Gusev A., Finucane H.K., Bulik-Sullivan B.K., Pollack S.J., de Candia T.R., Lee S.H., Wray N.R., Kendler K.S., et al. Contrasting genetic architectures of schizophrenia and other complex diseases using fast variance-components analysis. Nat. Genet. 2015;47:1385–1392. doi: 10.1038/ng.3431.
- 27. Shi H., Mancuso N., Spendlove S., Pasaniuc B. Local genetic correlation gives insights into the shared genetic architecture of complex traits. Am. J. Hum. Genet. 2017;101:737–751. doi: 10.1016/j.ajhg.2017.09.022.
- 28. Zhang Y., Lu Q., Ye Y., Huang K., Liu W., Wu Y., Zhong X., Li B., Yu Z., Travers B.G., et al. SUPERGNOVA: local genetic correlation analysis reveals heterogeneous etiologic sharing of complex traits. Genome Biol. 2021;22:262–330. doi: 10.1186/s13059-021-02478-w.
- 29. Werme J., van der Sluis S., Posthuma D., de Leeuw C.A. An integrated framework for local genetic correlation analysis. Nat. Genet. 2022;54:274–282. doi: 10.1038/s41588-022-01017-y.
- 30. Shi H., Gazal S., Kanai M., Koch E.M., Schoech A.P., Siewert K.M., Kim S.S., Luo Y., Amariuta T., Huang H., et al. Population-specific causal disease effect sizes in functionally important regions impacted by selection. Nat. Commun. 2021;12:1098–1115. doi: 10.1038/s41467-021-21286-1.
- 31. Guo H., Li J.J., Lu Q., Hou L. Detecting local genetic correlations with scan statistics. Nat. Commun. 2021;12:2033–2113. doi: 10.1038/s41467-021-22334-6.
- 32. van Rheenen W., Peyrot W.J., Schork A.J., Lee S.H., Wray N.R. Genetic correlations of polygenic disease traits: from theory to practice. Nat. Rev. Genet. 2019;20:567–581. doi: 10.1038/s41576-019-0137-z.
- 33. Loh P.-R., Tucker G., Bulik-Sullivan B.K., Vilhjalmsson B.J., Finucane H.K., Salem R.M., Chasman D.I., Ridker P.M., Neale B.M., Berger B., Patterson N., et al. Efficient Bayesian mixed-model analysis increases association power in large cohorts. Nat. Genet. 2015;47:284–290. doi: 10.1038/ng.3190.
- 34. Yang J., Lee S.H., Goddard M.E., Visscher P.M. GCTA: a tool for genome-wide complex trait analysis. Am. J. Hum. Genet. 2011;88:76–82. doi: 10.1016/j.ajhg.2010.11.011.
- 35. Wen X., Stephens M. Using linear predictors to impute allele frequencies from summary or pooled genotype data. Ann. Appl. Stat. 2010;4:1158. doi: 10.1214/10-aoas338.
- 36. Bulik-Sullivan B., Finucane H.K., Anttila V., Gusev A., Day F.R., Loh P.-R., Duncan L., Perry J.R.B., Patterson N., Robinson E.B., et al. An atlas of genetic correlations across human diseases and traits. Nat. Genet. 2015;47:1236–1241. doi: 10.1038/ng.3406.
- 37. Willer C.J., Li Y., Abecasis G.R. METAL: fast and efficient meta-analysis of genomewide association scans. Bioinformatics. 2010;26:2190–2191. doi: 10.1093/bioinformatics/btq340.
- 38. Wright S. The interpretation of population structure by F-statistics with special regard to systems of mating. Evolution. 1965;19:395. doi: 10.2307/2406450.
- 39. Xi B., Shen Y., Reilly K.H., Zhao X., Cheng H., Hou D., Wang X., Mi J. Sex-dependent associations of genetic variants identified by GWAS with indices of adiposity and obesity risk in a Chinese children population. Clin. Endocrinol. 2013;79:523–528. doi: 10.1111/cen.12091.
- 40. Link J.C., Reue K. Genetic basis for sex differences in obesity and lipid metabolism. Annu. Rev. Nutr. 2017;37:225–245. doi: 10.1146/annurev-nutr-071816-064827.
- 41. Bian L., Traurig M., Hanson R.L., Marinelarena A., Kobes S., Muller Y.L., Malhotra A., Huang K., et al. MAP2K3 is associated with body mass index in American Indians and Caucasians and may mediate hypothalamic inflammation. Hum. Mol. Genet. 2013;22:4438–4449. doi: 10.1093/hmg/ddt291.
- 42. Winkler T.W., Justice A.E., Graff M., Barata L., Feitosa M.F., Chu S., Czajkowski J., Esko T., Fall T., Kilpeläinen T.O., et al. The influence of age and sex on genetic associations with adult body size and shape: a large-scale genome-wide interaction study. PLoS Genet. 2015;11:e1005378. doi: 10.1371/journal.pgen.1005378.
- 43. Locke A.E., Kahali B., Berndt S.I., Justice A.E., Pers T.H., Day F.R., Powell C., Vedantam S., Buchkovich M.L., Yang J., et al. Genetic studies of body mass index yield new insights for obesity biology. Nature. 2015;518:197–206. doi: 10.1038/nature14177.
- 44. Akiyama M., Okada Y., Kanai M., Takahashi A., Momozawa Y., Ikeda M., Iwata N., Ikegawa S., Hirata M., Matsuda K., et al. Genome-wide association study identifies 112 new loci for body mass index in the Japanese population. Nat. Genet. 2017;49:1458–1467. doi: 10.1038/ng.3951.
- 45. Satoda M., Zhao F., Diaz G.A., Burn J., Goodship J., Davidson H.R., Pierpont M.E.M., Gelb B.D. Mutations in TFAP2B cause Char syndrome, a familial form of patent ductus arteriosus. Nat. Genet. 2000;25:42–46. doi: 10.1038/75578.
- 46. Zhao F., Weismann C.G., Satoda M., Pierpont M.E.M., Sweeney E., Thompson E.M., Gelb B.D. Novel TFAP2B mutations that cause Char syndrome provide a genotype-phenotype correlation. Am. J. Hum. Genet. 2001;69:695–703. doi: 10.1086/323410.
- 47. Gong J., Nishimura K.K., Fernandez-Rhodes L., Haessler J., Bien S., Graff M., Lim U., Lu Y., Gross M., Fornage M., et al. Trans-ethnic analysis of metabochip data identifies two new loci associated with BMI. Int. J. Obes. 2018;42:384–390. doi: 10.1038/ijo.2017.304.
- 48. Wang H., Zhang F., Zeng J., Wu Y., Kemper K.E., Xue A., Zhang M., Powell J.E., Goddard M.E., Wray N.R., et al. Genotype-by-environment interactions inferred from genetic effects on phenotypic variability in the UK Biobank. Sci. Adv. 2019;5:eaaw3538. doi: 10.1126/sciadv.aaw3538.
- 49. Sakaue S., Kanai M., Tanigawa Y., Karjalainen J., Kurki M., Koshiba S., Narita A., Konuma T., Yamamoto K., Akiyama M., et al. A cross-population atlas of genetic associations for 220 human phenotypes. Nat. Genet. 2021;53:1415–1424. doi: 10.1038/s41588-021-00931-x.
- 50. Rautureau Y., Deschambault V., Higgins M.-È., Rivas D., Mecteau M., Geoffroy P., Miquel G., Uy K., Sanchez R., Lavoie V., et al. ADCY9 (adenylate cyclase type 9) inactivation protects from atherosclerosis only in the absence of CETP (cholesteryl ester transfer protein). Circulation. 2018;138:1677–1692. doi: 10.1161/circulationaha.117.031134.
- 51. Tardif J.-C., Rhéaume E., Perreault L.-P.L., Grégoire J.C., Feroz Zada Y., Asselin G., Provost S., Barhdadi A., Rhainds D., L’Allier P.L., Ibrahim R., et al. Pharmacogenomic determinants of the cardiovascular effects of dalcetrapib. Circ. Cardiovasc. Genet. 2015;8:372–382. doi: 10.1161/circgenetics.114.000663.
- 52. Zhu Z., Guo Y., Shi H., Liu C.-L., Panganiban R.A., Chung W., O’Connor L.J., Himes B.E., Gazal S., Hasegawa K., et al. Shared genetic and experimental links between obesity-related traits and asthma subtypes in UK Biobank. J. Allergy Clin. Immunol. 2020;145:537–549. doi: 10.1016/j.jaci.2019.09.035.
- 53. Wen W., Zheng W., Okada Y., Takeuchi F., Tabara Y., Hwang J.-Y., Dorajoo R., Li H., Tsai F.-J., Yang X., et al. Meta-analysis of genome-wide association studies in East Asian-ancestry populations identifies four new loci for body mass index. Hum. Mol. Genet. 2014;23:5492–5504. doi: 10.1093/hmg/ddu248.
- 54. Finucane H.K., Reshef Y.A., Anttila V., Slowikowski K., Gusev A., Byrnes A., Gazal S., Loh P.-R., Lareau C., Shoresh N., et al. Heritability enrichment of specifically expressed genes identifies disease-relevant tissues and cell types. Nat. Genet. 2018;50:621–629. doi: 10.1038/s41588-018-0081-4.
- 55. Fehrmann R.S.N., Karjalainen J.M., Krajewska M., Westra H.-J., Maloney D., Simeonov A., Pers T.H., Hirschhorn J.N., Jansen R.C., Schultes E.A., et al. Gene expression analysis identifies global gene dosage sensitivity in cancer. Nat. Genet. 2015;47:115–125. doi: 10.1038/ng.3173.
- 56. Finucane H.K., Bulik-Sullivan B., Gusev A., Trynka G., Reshef Y., Loh P.-R., Anttila V., Xu H., Zang C., Farh K., et al. Partitioning heritability by functional annotation using genome-wide association summary statistics. Nat. Genet. 2015;47:1228–1235. doi: 10.1038/ng.3404.
- 57. Lu Q., Powles R.L., Wang Q., He B.J., Zhao H. Integrative tissue-specific functional annotations in the human genome provide novel insights on many complex traits and improve signal prioritization in genome wide association studies. PLoS Genet. 2016;12:e1005947. doi: 10.1371/journal.pgen.1005947.
- 58. Chen M.-H., Raffield L.M., Mousas A., Sakaue S., Huffman J.E., Moscati A., Trivedi B., Jiang T., Akbari P., Vuckovic D., et al. Trans-ethnic and ancestry-specific blood-cell genetics in 746,667 individuals from 5 global populations. Cell. 2020;182:1198–1213. doi: 10.1016/j.cell.2020.06.045.
- 59. Chen J., Spracklen C.N., Marenne G., Varshney A., Corbin L.J., Luan J., Willems S.M., Wu Y., Zhang X., Horikoshi M., et al. The trans-ancestral genomic architecture of glycemic traits. Nat. Genet. 2021;53:840–860. doi: 10.1038/s41588-021-00852-9.
- 60. Graham S.E., Clarke S.L., Wu K.H.H., Kanoni S., Zajac G.J.M., Ramdas S., Surakka I., Ntalla I., Vedantam S., Winkler T.W., et al. The power of genetic diversity in genome-wide association studies of lipids. Nature. 2021;600:675–679. doi: 10.1038/s41586-021-04064-3.
- 61. Gazal S., Finucane H.K., Furlotte N.A., Loh P.-R., Palamara P.F., Liu X., Schoech A., Bulik-Sullivan B., Neale B.M., Gusev A., et al. Linkage disequilibrium–dependent architecture of human complex traits shows action of negative selection. Nat. Genet. 2017;49:1421–1427. doi: 10.1038/ng.3954.
- 62. Myers S., Bottolo L., Freeman C., McVean G., Donnelly P. A fine-scale map of recombination rates and hotspots across the human genome. Science. 2005;310:321–324. doi: 10.1126/science.1117196.
- 63. Vilhjálmsson B.J., Yang J., Finucane H.K., Gusev A., Lindström S., Ripke S., Genovese G., Loh P.-R., Bhatia G., Do R., et al. Modeling linkage disequilibrium increases accuracy of polygenic risk scores. Am. J. Hum. Genet. 2015;97:576–592. doi: 10.1016/j.ajhg.2015.09.001.
- 64. Xiao J., Cai M., Hu X., Wan X., Chen G., Yang C. XPXP: improving polygenic prediction by cross-population and cross-phenotype analysis. Bioinformatics. 2022;38:1947–1955. doi: 10.1093/bioinformatics/btac029.
- 65. Yang S., Zhou X. Accurate and scalable construction of polygenic scores in large biobank data sets. Am. J. Hum. Genet. 2020;106:679–693. doi: 10.1016/j.ajhg.2020.03.013.
- 66. Atkinson E.G., Maihofer A.X., Kanai M., Martin A.R., Karczewski K.J., Santoro M.L., Ulirsch J.C., Kamatani Y., Okada Y., Finucane H.K., et al. Tractor uses local ancestry to enable the inclusion of admixed individuals in GWAS and to boost power. Nat. Genet. 2021;53:195–204. doi: 10.1038/s41588-020-00766-y.
- 67. Pickrell J.K. Joint analysis of functional genomic data and genome-wide association studies of 18 human traits. Am. J. Hum. Genet. 2014;94:559–573. doi: 10.1016/j.ajhg.2014.03.004.
- 68. Yang C., Wan X., Lin X., Chen M., Zhou X., Liu J. CoMM: a collaborative mixed model to dissecting genetic contributions to complex traits by leveraging regulatory information. Bioinformatics. 2019;35:1644–1652. doi: 10.1093/bioinformatics/bty865.
- 69. Cai M., Chen L.S., Liu J., Yang C. IGREX for quantifying the impact of genetically regulated expression on phenotypes. NAR Genom. Bioinformatics. 2020;2:lqaa010. doi: 10.1093/nargab/lqaa010.
- 70. Shi X., Chai X., Yang Y., Cheng Q., Jiao Y., Chen H., Huang J., Yang C., Liu J. A tissue-specific collaborative mixed model for jointly analyzing multiple tissues in transcriptome-wide association studies. Nucleic Acids Res. 2020;48:e109. doi: 10.1093/nar/gkaa767.
- 71. Kundaje A., Meuleman W., Ernst J., Bilenky M., Yen A., Heravi-Moussavi A., Kheradpour P., Zhang Z., Wang J., Ziller M.J., et al. Integrative analysis of 111 reference human epigenomes. Nature. 2015;518:317–330. doi: 10.1038/nature14248.
- 72. Lu Q., Li B., Ou D., Erlendsdottir M., Powles R.L., Jiang T., Hu Y., Chang D., Jin C., Dai W., et al. A powerful approach to estimating annotation-stratified genetic covariance via GWAS summary statistics. Am. J. Hum. Genet. 2017;101:939–964. doi: 10.1016/j.ajhg.2017.11.001.
- 73. Zhu X., Stephens M. Large-scale genome-wide enrichment analyses identify new trait-associated genes and pathways across 31 human phenotypes. Nat. Commun. 2018;9:4361–4414. doi: 10.1038/s41467-018-06805-x.
- 74. Chung D., Yang C., Li C., Gelernter J., Zhao H. GPA: a statistical approach to prioritizing GWAS results by integrating pleiotropy and annotation. PLoS Genet. 2014;10:e1004787. doi: 10.1371/journal.pgen.1004787.
- 75. Ming J., Dai M., Cai M., Wan X., Liu J., Yang C. LSMM: a statistical approach to integrating functional annotations with genome-wide association studies. Bioinformatics. 2018;34:2788–2796. doi: 10.1093/bioinformatics/bty187.
- 76. Kichaev G., Bhatia G., Loh P.-R., Gazal S., Burch K., Freund M.K., Schoech A., Pasaniuc B., Price A.L. Leveraging polygenic functional enrichment to improve GWAS power. Am. J. Hum. Genet. 2019;104:65–75. doi: 10.1016/j.ajhg.2018.11.008.
- 77. Ming J., Wang T., Yang C. LPM: a latent probit model to characterize the relationship among complex traits using summary statistics from multiple GWASs and functional annotations. Bioinformatics. 2020;36:2506–2514. doi: 10.1093/bioinformatics/btz947.
- 78. Kichaev G., Yang W.-Y., Lindstrom S., Hormozdiari F., Eskin E., Price A.L., Kraft P., Pasaniuc B. Integrating functional data to prioritize causal variants in statistical fine-mapping studies. PLoS Genet. 2014;10:e1004722. doi: 10.1371/journal.pgen.1004722.
- 79. Li Y., Kellis M. Joint Bayesian inference of risk variants and tissue-specific epigenomic enrichments across multiple complex human diseases. Nucleic Acids Res. 2016;44:e144. doi: 10.1093/nar/gkw627.
- 80. Hu Y., Lu Q., Powles R., Yao X., Yang C., Fang F., Xu X., Zhao H. Leveraging functional annotations in genetic risk prediction for human complex diseases. PLoS Comput. Biol. 2017;13:e1005589. doi: 10.1371/journal.pcbi.1005589.
- 81. Márquez-Luna C., Gazal S., Loh P.-R., Kim S.S., Furlotte N., Adam A., Price A.L. Incorporating functional priors improves polygenic prediction accuracy in UK Biobank and 23andMe data sets. Nat. Commun. 2021;12:1–11. doi: 10.1038/s41467-021-25171-9.
- 82. Amariuta T., Ishigaki K., Sugishita H., Ohta T., Koido M., Dey K.K., Matsuda K., Murakami Y., Price A.L., Kawakami E., Terao C., Raychaudhuri S. Improving the trans-ancestry portability of polygenic risk scores by prioritizing variants in predicted cell-type-specific regulatory elements. Nat. Genet. 2020;52:1346–1354. doi: 10.1038/s41588-020-00740-8.
- 83. Yang C., Li C., Wang Q., Chung D., Zhao H. Implications of pleiotropy: challenges and opportunities for mining big data in biomedicine. Front. Genet. 2015;6:229. doi: 10.3389/fgene.2015.00229.
- 84. Speed D., Hemani G., Johnson M.R., Balding D.J. Improved heritability estimation from genome-wide SNPs. Am. J. Hum. Genet. 2012;91:1011–1021. doi: 10.1016/j.ajhg.2012.10.010.
- 85. Yang L., Neale B.M., Liu L., Lee S.H., Wray N.R., Ji N., Li H., Qian Q., Wang D., Li J., et al. Polygenic transmission and complex neuro developmental network for attention deficit hyperactivity disorder: genome-wide association study of both common and rare variants. Am. J. Med. Genet. Part B: Neuropsychiatric Genetics. 2013;162:419–430. doi: 10.1002/ajmg.b.32169.
- 86. Zeng J., De Vlaming R., Wu Y., Robinson M.R., Lloyd-Jones L.R., Yengo L., Yap C.X., Xue A., Sidorenko J., McRae A.F., et al. Signatures of negative selection in the genetic architecture of human complex traits. Nat. Genet. 2018;50:746–753. doi: 10.1038/s41588-018-0101-4.
- 87. Speed D., Holmes J., Balding D.J. Evaluating and improving heritability models using summary statistics. Nat. Genet. 2020;52:458–462. doi: 10.1038/s41588-020-0600-y.