A gene-level methylome-wide association analysis identifies novel Alzheimer’s disease genes

Chong Wu; Jonathan Bradley; Yanming Li; Lang Wu; Hong-Wen Deng

doi:10.1093/bioinformatics/btab045

Bioinformatics. 2021 Feb 1;37(14):1933–1940. doi: 10.1093/bioinformatics/btab045

Editor: Peter Robinson

PMCID: PMC8337007 PMID: 33523132

Abstract

Motivation

Transcriptome-wide association studies (TWAS) have successfully facilitated the discovery of novel genetic risk loci for many complex traits, including late-onset Alzheimer’s disease (AD). However, most existing TWAS methods rely only on gene expression and ignore epigenetic modifications (e.g. DNA methylation) and functional regulatory information (e.g. enhancer-promoter interactions), both of which contribute significantly to the genetic basis of AD.

Results

We develop a novel gene-level association testing method that integrates genetically regulated DNA methylation and enhancer–target gene pairs with genome-wide association study (GWAS) summary results. Through simulations, we show that our approach, referred to as the CMO (cross methylome omnibus) test, yields well-controlled type I error rates and achieves much higher statistical power than competing methods under a wide range of scenarios. Furthermore, compared with TWAS, CMO identified an average of 124% more associations when analyzing several brain imaging-related GWAS results. By analyzing the largest AD GWAS to date, of 71 880 cases and 383 378 controls, CMO identified six novel loci for AD that were missed by competing methods.

Availability and implementation

The data used in this work were obtained from the following publicly available datasets: IGAP1, GWAX, UK Biobank, results from a 2019 meta-analyzed AD GWAS and results from an imaging-derived phenotype GWAS. The data resources are summarized in Supplementary Table S7. We used publicly available software and tools for the competing methods. All code used to generate the results reported in this manuscript, as well as software for our newly proposed method CMO, is available at https://github.com/ChongWuLab/CMO.

Supplementary information

Supplementary data are available at Bioinformatics online.

1 Introduction

Alzheimer’s disease (AD) is characterized by a progressive loss of memory and cognition, and no effective treatment or cure currently exists (Canter et al., 2016). The main hallmarks of AD are amyloid plaques and neurofibrillary tangles, which may develop 20 years or more before the onset of rapid neurodegeneration and cognitive symptoms (Canter et al., 2016). Effective and early assessment of AD risk is critical for identifying individuals at high risk and reducing the public health burden of AD. While Aβ and tau levels in cerebrospinal fluid (CSF) are considered potential markers for assessing AD risk, lumbar puncture is painful and routine CSF testing is difficult (Lee et al., 2019). By contrast, several studies (Rahman et al., 2020; Xu et al., 2010) have suggested that specific genes in blood can serve as candidate biomarkers for risk assessment. Thus, identifying AD-associated genes in blood beyond those currently known may lead to more powerful risk assessment.

One strategy to identify associated genes is to use gene-level integrative association tests (Gamazon et al., 2015; Gusev et al., 2016; Hu et al., 2019; Wu and Pan, 2018a, 2019; Xu et al., 2017a, b), which aggregate potential regulatory effects of individual genetic variants across a gene of interest to test gene-trait associations. One such family of tests is transcriptome-wide association studies (TWAS) (Gamazon et al., 2015; Gusev et al., 2016; Hu et al., 2019; Xu et al., 2017b), which integrate expression reference panels (i.e. expression quantitative trait loci [eQTL] cohorts with matched individual-level expression and genotype data) with genome-wide association studies (GWAS) data to discover gene-trait associations. TWAS have identified many novel trait-associated genes, and have deepened our understanding of genetic regulation in many complex traits, including AD (Raj et al., 2018).

There is an exciting opportunity to develop novel and powerful integrative association tests by incorporating additional functional and epigenetic information. Specifically, there has been increasing interest in the role that epigenetic modifications such as DNA methylation (DNAm), which mediate interactions between the genome and the environment, play in complex disease etiology (De Jager et al., 2014; Lunnon and Mill, 2013). DNAm is essential for mammalian development and mediates many critical cellular processes, including gene regulation, genomic imprinting and X chromosome inactivation (Smith and Meissner, 2013). Recent epigenome-wide association studies (EWAS) (Rakyan et al., 2011) have rapidly identified robust associations between DNAm and complex traits, including AD (Roubroeks et al., 2017). Critically, many of the identified CpG sites differ from the well-established AD genetic risk loci identified by GWAS, highlighting the potential of EWAS to provide additional insights into AD pathogenesis and risk assessment (Watson et al., 2016). However, a causal relationship between DNAm and AD is difficult to establish because DNAm levels are affected by environmental and lifestyle factors (Alegría-Torres et al., 2011). To mitigate limitations commonly encountered in conventional EWAS, such as reverse causation and unmeasured confounding, methylome-wide analyses (MWAS) (Baselmans et al., 2019; Freytag et al., 2018) have been proposed to test associations between genetically regulated DNAm and the trait of interest. However, existing methods largely focus on individual CpG sites, testing each site separately.
Additionally, we and others have shown that integrating enhancer-promoter interactions improves the statistical power of gene-level association tests (Wu and Pan, 2018b), because enhancers are key gene-regulatory DNA sequences that control gene expression through physical contact with their cognate genes (Furlong and Levine, 2018). Furthermore, our previous work (Wu and Pan, 2019) has shown the promising performance of integrating DNAm quantitative trait loci (mQTLs) located in enhancers, promoters and the gene body with GWAS results. However, it remains unclear how to effectively integrate genetically regulated DNAm and enhancer-promoter interactions with GWAS results.

In this work, we propose a novel, computationally efficient gene-level association test, referred to as the cross-methylome omnibus (CMO) test. CMO integrates genetically regulated DNAm in enhancers, promoters and the gene body to identify disease-associated genes missed by existing methods. In an applied analysis of AD risk, we focused on a set of prediction models for blood tissue to generate candidates that may facilitate AD risk assessment. Through simulations and analysis of several brain imaging-derived phenotypes, we demonstrate that CMO achieves high statistical power while controlling the type 1 error rate well. Additionally, by reanalyzing several AD GWAS summary datasets (Jansen et al., 2019; Lambert et al., 2013; Liu et al., 2017), we demonstrate that our approach reproducibly identifies additional AD-associated genes that competing methods fail to identify. These newly identified genes in blood may serve as promising candidates for AD risk assessment.

2 Materials and methods

2.1 Overview of CMO test

We build upon previous work (Baselmans et al., 2019; Freytag et al., 2018; Gamazon et al., 2015; Gusev et al., 2016; Wu and Pan, 2019; Xu et al., 2017b) to develop a novel integrative gene-based test for identifying genes that may influence the trait of interest through DNAm pathways. CMO involves three main steps. First, CMO links CpG sites located in enhancers, promoters and the gene body (including both exons and introns) to a target gene, because DNAm in enhancers and promoters may play vital roles in gene regulation (Lu et al., 2014) and integrating such information may improve statistical power. Of note, unlike our previous work (Wu and Pan, 2018b, 2019), which integrated only two sets of enhancer-promoter interaction data for illustration purposes, CMO uses the much more comprehensive enhancer-promoter interaction information from GeneHancer (Fishilevich et al., 2017). Second, by leveraging existing DNAm prediction models (built from reference datasets that have both genetic and DNAm data), CMO tests the association between the genetically regulated DNAm of each linked CpG site and the trait of interest using a weighted gene-based test. Because the underlying genetic architecture is unknown, different CpG-based tests may be more powerful under different scenarios; to offer consistently high statistical power, we apply the aggregated Cauchy association test (ACAT) (Liu et al., 2019; Liu and Xie, 2020) to combine the results from a set of widely used tests. Third, CMO applies the ACAT method again to combine statistical evidence across the CpG sites linked to a target gene and thereby tests whether the target gene is associated with the trait of interest.

2.2 Details of CMO test

The aim of our method CMO is to identify associated genes that affect the trait of interest through DNA methylation pathways. We consider only one gene for illustration because the same procedure can be applied for each gene sequentially. The null hypothesis is that the gene to be tested is not associated with the trait of interest. CMO involves the following three steps.

Step 1: Linking CpG sites to a target gene. CpG sites associated with gene expression levels are enriched in enhancers, promoters and gene bodies (Gutierrez-Arcelus et al., 2015). Through the three-dimensional structure of the genome, distal enhancers engage in physical contact with their target-gene promoters and thus play important regulatory roles (Schoenfelder and Fraser, 2019). For example, for many genes, increased DNA methylation levels in enhancer regions are associated with expression levels (Lu et al., 2014) and chromatin accessibility (Thurman et al., 2012). These facts motivated us to link CpG sites that are located in the enhancers, promoters and the gene body to their target genes.

Specifically, we define enhancers, promoters and the gene body as follows. The gene body region includes all introns and exons of a target gene. To include cis-acting regulatory regions and avoid having to determine gene orientation, the two promoters of a target gene are defined as 500 bp extensions (Andersson et al., 2014) of the gene body beyond its transcription start site (TSS) and transcription end site (TES), respectively. We use an integrated database called GeneHancer (Fishilevich et al., 2017) to determine the enhancer regions of each gene. By integrating reported enhancers from four genome-wide databases (ENCODE, the Ensembl regulatory build, FANTOM and the VISTA Enhancer Browser) and linking enhancers to their target genes via several complementary approaches, GeneHancer provides comprehensive enhancer-gene information (Fishilevich et al., 2017).

Genomic coordinates of genes are obtained from the human genome assembly GRCh37/hg19. We use the Illumina HumanMethylation450 v1.2 manifest file to determine the positions of CpG sites and link the CpG sites in the gene body, the two promoters and the enhancer regions to the target gene accordingly. Suppose p CpG sites have been linked to a target gene.
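A minimal Python sketch of the Step 1 linking rule, assuming gene coordinates, GeneHancer enhancer intervals and 450K manifest positions have already been parsed into plain Python objects. The `Gene` class, the `link_cpgs` function and all coordinates below are hypothetical illustrations, not part of the CMO software. Because the two promoters are 500 bp flanks on both sides of the gene body, no strand information is needed.

```python
from dataclasses import dataclass, field

@dataclass
class Gene:
    """Hypothetical container for one gene (GRCh37/hg19 coordinates)."""
    name: str
    chrom: str
    start: int  # lower coordinate of the gene body (TSS or TES, depending on strand)
    end: int    # upper coordinate of the gene body
    enhancers: list = field(default_factory=list)  # (start, end) pairs, e.g. from GeneHancer

def link_cpgs(gene, cpg_positions, ext=500):
    """Map CpG IDs to 'gene_body', 'promoter' (500 bp flanks) or 'enhancer'.

    cpg_positions: dict of cpg_id -> (chrom, position), e.g. parsed from the
    450K manifest. A CpG falling in several regions gets the first matching label.
    """
    linked = {}
    for cpg_id, (chrom, pos) in cpg_positions.items():
        if chrom != gene.chrom:
            continue
        if gene.start <= pos <= gene.end:
            linked[cpg_id] = "gene_body"
        elif gene.start - ext <= pos < gene.start or gene.end < pos <= gene.end + ext:
            linked[cpg_id] = "promoter"
        elif any(lo <= pos <= hi for lo, hi in gene.enhancers):
            linked[cpg_id] = "enhancer"
    return linked

# Toy example with made-up coordinates
gene = Gene("GENE_X", "19", 10_000, 20_000, enhancers=[(50_000, 51_000)])
cpgs = {
    "cg_a": ("19", 15_000),  # inside the gene body
    "cg_b": ("19", 9_700),   # within 500 bp beyond the lower boundary
    "cg_c": ("19", 50_500),  # inside a linked enhancer
    "cg_d": ("19", 30_000),  # not linked
    "cg_e": ("1", 15_000),   # different chromosome
}
print(link_cpgs(gene, cpgs))
# {'cg_a': 'gene_body', 'cg_b': 'promoter', 'cg_c': 'enhancer'}
```

Keeping the region label makes it easy to drop enhancer CpG sites, which mirrors the ‘Gene Body CMO’ comparison reported in the Results.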

Step 2: Testing each linked CpG site. Similar to the conventional TWAS framework, Baselmans et al. (2019) built prediction models for genetically regulated DNAm. Specifically, at a false discovery rate (FDR) of 5%, 151 729 CpG sites had at least one significant DNAm quantitative trait locus (mQTL). For each of these 151 729 CpG sites, a Lasso model was fitted with local single nucleotide polymorphisms (SNPs) (within a 250 kb extension) as predictors and the DNA methylation level as the outcome. Baselmans et al. (2019) further provided the mQTL-derived weights $W = (W_{1}, \dots, W_{k})'$ and the corresponding LD matrix V for the local SNPs as a publicly available resource, on which CMO builds.

We focus on one CpG site for illustration. From the GWAS summary data, we obtain the Z score vector $Z = (Z_{1}, \dots, Z_{k})'$ for the local SNPs of the CpG site being considered, where $Z_{j} = \hat{β}_{j} / SE_{j}$, and $\hat{β}_{j}$ and $SE_{j}$ are the estimated marginal effect of SNP j and its standard error, respectively. Under the null hypothesis (no association), Z approximately follows a multivariate normal distribution with mean 0 and covariance V, that is, $Z \sim N (0, V)$, where V is provided by Baselmans et al. (2019).
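The Z scores and their null distribution can be sketched in a few lines of NumPy. Drawing null Z vectors from N(0, V) is a common way to check a statistic's calibration empirically. The function names are illustrative, the LD matrix is a toy example, and the small diagonal jitter is our assumption for coping with near-singular LD matrices (strongly rank-deficient matrices may need a larger jitter or an eigendecomposition instead).

```python
import numpy as np

def z_from_summary(beta_hat, se):
    """Z_j = beta_hat_j / SE_j for each local SNP."""
    return np.asarray(beta_hat, dtype=float) / np.asarray(se, dtype=float)

def sample_null_z(V, n_draws, rng, jitter=1e-10):
    """Draw n_draws vectors Z ~ N(0, V); each row is one draw."""
    V = np.asarray(V, dtype=float)
    L = np.linalg.cholesky(V + jitter * np.eye(V.shape[0]))  # V is approximately L @ L.T
    return rng.standard_normal((n_draws, V.shape[0])) @ L.T

rng = np.random.default_rng(2021)
V = np.array([[1.0, 0.8], [0.8, 1.0]])  # toy LD matrix for two SNPs
Z_null = sample_null_z(V, 100_000, rng)
print(np.round(np.cov(Z_null, rowvar=False), 2))  # should be close to V
```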

To test the association between the estimated genetically regulated DNAm and the trait of interest, Baselmans et al. (2019) proposed the MWAS test statistic

T_{MWAS} = (\sum_{j = 1}^{k} W_{j} Z_{j}) / \sqrt{W' VW},

where W_j is the mQTL-derived weight for SNP j provided by Baselmans et al. (2019). MWAS can be viewed as a standardized weighted burden test. Under the null, $T_{MWAS}$ asymptotically follows a standard normal distribution, so its P-value can be computed analytically.
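A minimal sketch of the MWAS statistic, assuming the weights W, the Z scores and the LD matrix V are arrays over the same k SNPs and that a two-sided P-value is wanted (the sidedness is our assumption). The normal tail 2{1 - Φ(|T|)} is computed with `math.erfc`, which avoids a SciPy dependency; the function name and toy inputs are hypothetical.

```python
import math
import numpy as np

def mwas_test(w, z, V):
    """Return (T_MWAS, two-sided P-value) for one CpG site."""
    w = np.asarray(w, dtype=float)
    z = np.asarray(z, dtype=float)
    V = np.asarray(V, dtype=float)
    t = float(w @ z) / math.sqrt(float(w @ V @ w))  # sum_j W_j Z_j / sqrt(W' V W)
    p = math.erfc(abs(t) / math.sqrt(2.0))          # equals 2 * (1 - Phi(|t|))
    return t, p

# Toy example: two correlated SNPs whose weights have the same sign
t, p = mwas_test([0.5, 0.5], [2.0, 2.5], [[1.0, 0.6], [0.6, 1.0]])
```

Because W appears in both the numerator and the square root of the denominator, rescaling all weights by a positive constant leaves $T_{MWAS}$ unchanged.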

By up-weighting methylation-associated SNPs, MWAS improves statistical power and reduces concerns about reverse causality (Baselmans et al., 2019). However, MWAS shares the limitations of TWAS (Wainberg et al., 2019; Xu et al., 2017b). Specifically, the burden test is derived under the over-simplifying assumption that all (weighted) SNPs have the same effect size. Thus, the burden test (i.e. MWAS) loses power if the effects of the (weighted) SNPs have different directions or are sparse (i.e. many are zero). Other widely used tests, such as SSU (Pan, 2009) and ACAT (Liu et al., 2019; Liu and Xie, 2020), are powerful under many alternatives (Pan et al., 2014).
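The loss of power under opposite effect directions can be seen in a toy calculation. The sketch assumes two independent SNPs (V = I) with unit weights; in that special case the weighted SSU statistic is exactly χ² with 2 degrees of freedom, whose survival function is exp(-x/2). With general weights and LD, the mixture-of-chi-square calculation described in this section is needed instead. The numbers are illustrative, not from the study.

```python
import math

# Hypothetical toy input: two independent SNPs (V = I), unit weights,
# equally strong effects in opposite directions.
w = [1.0, 1.0]
z = [3.0, -3.0]

# Weighted burden (MWAS-type) statistic; with V = I, W'VW is the sum of W_j^2
t_burden = sum(wj * zj for wj, zj in zip(w, z)) / math.sqrt(sum(wj ** 2 for wj in w))
p_burden = math.erfc(abs(t_burden) / math.sqrt(2.0))  # two-sided normal P-value

# Weighted SSU statistic; here it is chi-square with 2 df, so P = exp(-T/2)
t_ssu = sum((wj * zj) ** 2 for wj, zj in zip(w, z))
p_ssu = math.exp(-t_ssu / 2.0)

print(f"burden: T = {t_burden:.2f}, P = {p_burden:.3g}")  # effects cancel, P = 1
print(f"SSU:    T = {t_ssu:.1f}, P = {p_ssu:.3g}")        # squared terms add up
```

The burden statistic is exactly zero because the two weighted Z scores cancel, whereas SSU squares each term before summing and still detects the signal.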

To further improve power, in addition to the burden test we apply the weighted SSU test and the weighted ACAT test. The SSU test statistic is

T_{SSU} = \sum_{j = 1}^{k} {(W_{j} Z_{j})}^{2} .

Under the null, $T_{SSU}$ asymptotically follows a mixture of chi-square distributions, and its P-value can be calculated analytically (Pan, 2009). The ACAT test statistic is

T_{ACAT} = \sum_{j = 1}^{k} | W_{j} | tan {(0.5 - p_{j}) π},

where p_j is the P-value for SNP j. We use the absolute value of W_j as the weight for SNP j because ACAT requires non-negative weights. Briefly, ACAT first transforms the original P-values into standard Cauchy random variables under the null hypothesis and then uses their weighted sum as the test statistic. Under the null, T_ACAT approximately follows a Cauchy distribution centered at 0 with scale equal to the sum of the weights, so its P-value can be approximated by

p_{ACAT} \approx 1 / 2 - \arctan (T_{ACAT} / w) / π,

where $w = \sum_{j = 1}^{k} | W_{j} |$. This approximation is accurate, computationally efficient and valid for arbitrary correlation structures among the SNPs (Liu and Xie, 2020).
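A sketch of the weighted ACAT test in plain Python, assuming P-values lie in (0, 1] and at least one weight is non-zero; the function names are hypothetical. The switch to 1/(pπ) for extremely small P-values and to 1/(tπ) for very large scaled statistics are numerical safeguards we add for illustration (tan and arctan lose precision in those tails); they are not part of the formulas above.

```python
import math

def cauchy_transform(p):
    """tan((0.5 - p) * pi), using its asymptote 1 / (p * pi) when p is tiny."""
    if p < 1e-15:
        return 1.0 / (p * math.pi)
    return math.tan((0.5 - p) * math.pi)

def acat_test(pvals, weights):
    """Weighted ACAT over SNPs: returns (T_ACAT, approximate P-value)."""
    w_abs = [abs(w) for w in weights]          # ACAT requires non-negative weights
    total = sum(w_abs)                          # w = sum_j |W_j|
    t = sum(wj * cauchy_transform(p) for wj, p in zip(w_abs, pvals))
    scaled = t / total                          # T_ACAT / w is approximately standard Cauchy
    if scaled > 1e15:
        return t, 1.0 / (scaled * math.pi)      # far-tail asymptote of the Cauchy
    return t, 0.5 - math.atan(scaled) / math.pi

# Usage: SNP-level P-values combined with (signed) mQTL weights
t, p = acat_test([1e-4, 0.3, 0.7], [0.8, -0.2, 0.05])
```

A useful sanity check is that combining identical P-values returns that same P-value, whatever the weights.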

Because the optimal test depends on the underlying truth (Pan et al., 2014), which varies in practice, we apply the ACAT method to combine the above three tests, offering high statistical power across different levels of sparsity of causal variants. Specifically, we define the test statistic as

T = \frac{1}{3} [tan {(0.5 - p_{MWAS}) π} + tan {(0.5 - p_{SSU}) π} + tan {(0.5 - p_{ACAT}) π}],

where $p_{MWAS}, p_{SSU}$ and $p_{ACAT}$ are the P-values for the MWAS, SSU and ACAT tests, respectively. Then the corresponding P-value is calculated as $0.5 - {\arctan (T)} / π$ . We repeat the above procedure for each linked CpG site of a target gene and obtain $P = (p_{1}, \dots, p_{p})'$ for the p CpG sites defined in Step 1.
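As a worked example with hypothetical P-values $p_{MWAS} = 0.01$, $p_{SSU} = 0.2$ and $p_{ACAT} = 0.5$ for one CpG site, the combined statistic and its P-value are approximately:

```latex
\begin{aligned}
T &= \tfrac{1}{3}\left[\tan(0.49\pi) + \tan(0.3\pi) + \tan(0)\right]
   \approx \tfrac{1}{3}\,(31.82 + 1.376 + 0) \approx 11.07, \\
p &= 0.5 - \arctan(11.07)/\pi \approx 0.0287.
\end{aligned}
```

The combined P-value is driven mainly by the smallest input and lands close to the Bonferroni-style value $3 \times 0.01 = 0.03$, while the uninformative tests barely dilute it.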

Step 3: Integrating over multiple CpG sites in a target gene. To robustly aggregate information from different CpG sites, we propose an omnibus test, the cross-methylome omnibus (CMO) test, which applies the ACAT method to combine information from the linked CpG sites. The test statistic is

T_{CMO} = \frac{1}{p} \sum_{j = 1}^{p} tan {(0.5 - p_{j}) π} .

Under the null, $T_{CMO}$ is well approximated by a standard Cauchy distribution, and its P-value can be calculated as $0.5 - {\arctan (T_{CMO})} / π$. Of note, the approximation error goes to zero as the P-value of CMO goes to zero (Liu et al., 2019; Liu and Xie, 2020); in other words, the P-value approximation is particularly accurate for genes with significant results. More importantly, through the ACAT method, we efficiently integrate results from CpG sites in enhancers, promoters and gene body regions, achieving high power under a wide range of scenarios.
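Steps 2 and 3 can be chained in a short sketch: each linked CpG site's three P-values (MWAS, SSU, ACAT) are first Cauchy-combined with equal weights, and the resulting per-CpG P-values are Cauchy-combined again into a gene-level P-value. The input format (a dictionary keyed by CpG ID), the function names and the P-values are hypothetical; the tiny-P safeguards are our numerical choice.

```python
import math

def cauchy_combine(pvals):
    """Equal-weight Cauchy (ACAT) combination of P-values in (0, 1]."""
    terms = [1.0 / (p * math.pi) if p < 1e-15 else math.tan((0.5 - p) * math.pi)
             for p in pvals]
    t = sum(terms) / len(terms)
    if t > 1e15:                      # far tail: P is approximately 1 / (t * pi)
        return 1.0 / (t * math.pi)
    return 0.5 - math.atan(t) / math.pi

def cmo_gene_pvalue(cpg_pvals):
    """cpg_pvals: dict of cpg_id -> (p_MWAS, p_SSU, p_ACAT) for each linked CpG."""
    per_cpg = [cauchy_combine(triple) for triple in cpg_pvals.values()]  # Step 2
    return cauchy_combine(per_cpg)                                       # Step 3

# Toy gene with three linked CpG sites (made-up P-values)
example = {
    "cg_body": (0.01, 0.2, 0.5),
    "cg_promoter": (0.6, 0.4, 0.9),
    "cg_enhancer": (3e-7, 1e-6, 2e-5),
}
print(cmo_gene_pvalue(example))
```

In this toy gene, the single strongly associated enhancer CpG drives the gene-level result even though the other two sites show little signal; the gene-level P-value ends up at roughly three times the smallest per-CpG P-value.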

2.3 Simulation settings

We conducted simulations to compare the performance of CMO and competing methods in terms of type 1 error rates and statistical power. Specifically, we used data from UK Biobank (application number 48240) and randomly chose 10 000 independent White British individuals. The imputed genetic data for 5911 cis-SNPs [with minor allele frequency (MAF) $> 1\%$, Hardy-Weinberg P-value $> 10^{- 6}$ and imputation ‘info’ score > 0.4] of APOE, the gene most significantly associated with AD, were used in the simulation studies.

We first simulated both genetically regulated DNAm values at different CpG sites and genetically regulated gene expression in different tissues. We randomly selected m CpG sites and m tissues; of note, we selected an equal number of CpG sites and tissues to compare CMO with TWAS fairly. The cis components of DNAm at CpG i and of gene expression in tissue j were then generated as $C_{i} = X {\hat{W}}_{i}^{C}$ and $E_{j} = X {\hat{W}}_{j}^{E}$, respectively, where X is the n × k centered and standardized genotype matrix. To account for estimation errors in the effect sizes, we generated ${\hat{W}}_{i}^{C}$ and ${\hat{W}}_{j}^{E}$ in two steps. First, we obtained the originally estimated effect sizes (i.e. weights) ${\tilde{W}}_{i}^{C}$ and ${\tilde{W}}_{j}^{E}$ from Baselmans et al. (2019) and Gusev et al. (2016), respectively. Second, we randomly selected a pre-specified proportion (denoted by q) of the weight elements in ${\tilde{W}}_{i}^{C}$ and ${\tilde{W}}_{j}^{E}$ and set them to zero to obtain the final ${\hat{W}}_{i}^{C}$ and ${\hat{W}}_{j}^{E}$, respectively.
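The weight-perturbation step can be sketched with NumPy. The genotypes and "original" weights below are random stand-ins (not UK Biobank data or the Baselmans et al. and Gusev et al. weights), the function names are hypothetical, and rounding q·k to the nearest integer is our assumption about how a proportion q is turned into a count.

```python
import numpy as np

def perturb_weights(w_tilde, q, rng):
    """Zero out a randomly chosen proportion q of the weights (estimation error)."""
    w_hat = np.array(w_tilde, dtype=float)          # copy; the original is untouched
    n_zero = int(round(q * w_hat.size))
    idx = rng.choice(w_hat.size, size=n_zero, replace=False)
    w_hat[idx] = 0.0
    return w_hat

def standardize(G):
    """Center and scale each SNP column of a genotype matrix."""
    G = np.asarray(G, dtype=float)
    return (G - G.mean(axis=0)) / G.std(axis=0)

rng = np.random.default_rng(0)
n, k = 500, 20
X = standardize(rng.binomial(2, 0.3, size=(n, k)))  # toy 0/1/2 genotypes
w_tilde = rng.normal(size=k)                        # stand-in for published weights
w_hat = perturb_weights(w_tilde, q=0.5, rng=rng)
C = X @ w_hat                                       # cis component C_i = X W_i
```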

Next, we simulated the quantitative trait as $Y = β \sum_{i = 1}^{m} C_{i} + α \sum_{j = 1}^{m} E_{j} + ϵ$, where $β = h_{c} / \mathrm{sd}(\sum_{i = 1}^{m} C_{i})$ is the effect of the CpG sites on the trait and $α = h_{g} / \mathrm{sd}(\sum_{j = 1}^{m} E_{j})$ is the effect of gene expression on the trait. $ϵ \sim N (0, σ^{2})$ is random noise, where $σ^{2}$ is determined by the pre-specified heritability of all the CpG sites and gene expression in the tissues being considered. Here sd(x) denotes the standard deviation of x, and h_c and h_g control the relative heritability contributions of all CpG sites and gene expression, respectively. We performed an association scan on the simulated data $(Y, X)$ and computed a Z score for each SNP by linear regression. To compare with ‘E + G + Methyl’, we also simulated mQTL data. Briefly, for each CpG site located in enhancers, promoters and gene body regions, we generated DNA methylation levels for 742 subjects [to match the mQTL data (Gaunt et al., 2016) used in ‘E + G + Methyl’] with heritability set to 0.05. We then ran a linear regression for each SNP-CpG pair and declared mQTLs at $P < 1 \times 10^{- 7}$, the cutoff used in ‘E + G + Methyl’.
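A sketch of the trait model and the per-SNP association scan. The text does not spell out how σ² follows from the pre-specified heritability; the code assumes h² = var(G)/(var(G) + σ²) for the combined genetic component G, and falls back to σ² = 1 when h_c = h_g = 0. The function names and toy inputs are hypothetical.

```python
import numpy as np

def simulate_trait(C_list, E_list, h_c, h_g, h2, rng):
    """Y = beta * sum_i C_i + alpha * sum_j E_j + eps."""
    sum_c = np.sum(C_list, axis=0)
    sum_e = np.sum(E_list, axis=0)
    beta = h_c / sum_c.std()       # effect of the CpG sites
    alpha = h_g / sum_e.std()      # effect of gene expression
    g = beta * sum_c + alpha * sum_e
    var_g = g.var()
    # ASSUMPTION: choose sigma^2 so that var(g) / (var(g) + sigma^2) = h2
    sigma2 = var_g * (1.0 - h2) / h2 if var_g > 0 else 1.0
    return g + rng.normal(scale=np.sqrt(sigma2), size=g.shape)

def marginal_z(X, y):
    """Per-SNP Z = beta_hat / SE from simple linear regression of y on each column."""
    n = X.shape[0]
    xc = X - X.mean(axis=0)
    yc = y - y.mean()
    sxx = np.sum(xc ** 2, axis=0)
    beta_hat = xc.T @ yc / sxx
    rss = np.sum(yc ** 2) - beta_hat ** 2 * sxx   # residual sum of squares per SNP
    se = np.sqrt(rss / (n - 2) / sxx)
    return beta_hat / se

rng = np.random.default_rng(1)
n, k, m = 5000, 30, 3
X = rng.standard_normal((n, k))                      # toy standardized genotypes
C_list = [X @ rng.normal(size=k) for _ in range(m)]  # toy CpG cis components
E_list = [X @ rng.normal(size=k) for _ in range(m)]  # toy expression cis components
y = simulate_trait(C_list, E_list, h_c=-1.0, h_g=1.0, h2=0.01, rng=rng)
z = marginal_z(X, y)
```

Computing all k regressions at once from centered columns avoids a per-SNP model fit, which matters when the scan covers thousands of cis-SNPs and many replicates.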

To evaluate the type 1 error rate, we set h_c = 0 and h_g = 0, i.e. no association between the gene and the trait. To evaluate power, we considered several combinations of h_c and h_g; for example, we set $h_{c} = - 1$ and h_g = 1 because methylation usually represses gene expression. To evaluate the effect of integrating enhancer information, we further considered situations in which all causal CpG sites were located in enhancer regions and situations in which no causal CpG site was located in enhancer regions. We also considered diverse disease architectures by varying the heritability (0.005 and 0.01). We repeated the simulations 1 000 000 times under the null and 1000 times under each alternative, and compared our proposed CMO test with ‘E + G + Methyl’ (SSU), standard TWAS and a multi-tissue TWAS called MultiXcan (Barbeira et al., 2019). Furthermore, we considered taking the union of CpG sites in gene body and promoter regions while applying an additional Bonferroni correction, denoted as union of MWAS (or MWAS for simplicity). Because using the union of TWAS/MWAS with an additional Bonferroni correction may be too conservative, we also investigated combining the results of TWAS and MWAS by the ACAT method (denoted by TWAS2 and MWAS2, respectively). Statistical power was calculated as the proportion of the 1000 replicates with a P-value reaching the genome-wide significance threshold $0.05 / 20\,000 = 2.5 \times 10^{- 6}$.
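The combination rules and the power calculation can be sketched as follows: `union_bonferroni` multiplies the smallest P-value by the number of tissues (or CpG sites), `acat_equal` is an equal-weight Cauchy combination in the spirit of TWAS2/MWAS2, and `empirical_power` counts replicates below the genome-wide threshold. Function names and inputs are illustrative, and the tiny-P guard is our addition.

```python
import math

GENOME_WIDE = 0.05 / 20_000  # 2.5e-6, as in the simulation settings

def union_bonferroni(pvals):
    """Union test: smallest P-value times the number of tests, capped at 1."""
    return min(1.0, min(pvals) * len(pvals))

def acat_equal(pvals):
    """Equal-weight Cauchy combination (TWAS2 / MWAS2-style)."""
    terms = [1.0 / (p * math.pi) if p < 1e-15 else math.tan((0.5 - p) * math.pi)
             for p in pvals]
    t = sum(terms) / len(terms)
    return 1.0 / (t * math.pi) if t > 1e15 else 0.5 - math.atan(t) / math.pi

def empirical_power(pvals, threshold=GENOME_WIDE):
    """Fraction of simulated replicates whose P-value falls below the threshold."""
    return sum(p < threshold for p in pvals) / len(pvals)

# One replicate with TWAS P-values from three tissues (made-up numbers)
tissue_p = [1e-6, 0.3, 0.8]
print(union_bonferroni(tissue_p), acat_equal(tissue_p))
```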

2.4 Application to GWAS data

We applied CMO and competing methods to the summary results from all 21 GWAS of Susceptibility Weighted Imaging (Elliott et al., 2018) (N = 8428) to further compare the performance of different methods.

To further illustrate the proposed method and deepen our understanding of genetic regulation in AD, we applied different gene-based association tests to identify AD risk genes by reanalyzing three AD GWAS summary datasets: a meta-analyzed AD GWAS dataset with 17 008 AD cases and 37 154 controls (Lambert et al., 2013), denoted as IGAP1; genome-wide association study by proxy (GWAX) results with 14 482 (proxy) AD cases and 100 082 (proxy) controls from the UK Biobank (Liu et al., 2017), denoted as GWAX; and the largest available AD GWAS data, of 71 880 (proxy) AD cases and 383 378 (proxy) controls of European ancestry (Jansen et al., 2019). Of note, the proxy AD cases were defined by family history and showed strong genetic correlation with AD ($r_{g} = 0.81$) (Jansen et al., 2019).

We performed a multi-stage gene-level association study for AD. In the performance evaluation stage, we identified significantly associated genes in the IGAP1 dataset. Next, we attempted to replicate these findings in independent data from GWAX and further validated them using the GWAS Catalog (Buniello et al., 2019). The P-values for the replication rates were calculated by a hypergeometric test, with the background probabilities estimated from all the genes being tested. Finally, we applied CMO and competing methods to the largest available AD GWAS dataset (Jansen et al., 2019) for new discovery. We further performed the ‘Core Analysis’ in Ingenuity Pathway Analysis (IPA) (Krämer et al., 2014) on the identified genes to assess enriched pathways, diseases and networks.
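The replication P-value can be sketched as an upper-tail hypergeometric probability. The mapping below is our reading of the text, not a documented implementation: N is the number of genes tested, K the number of those genes that replicate (or appear in the GWAS Catalog), n the number of genes identified in discovery and x the observed overlap. Python's exact integer arithmetic (`math.comb`) keeps the tail exact before the final division; the example numbers are hypothetical.

```python
from math import comb

def hypergeom_upper_tail(x, N, K, n):
    """P(X >= x) for X ~ Hypergeometric(population N, K successes, n draws)."""
    hi = min(K, n)
    numer = sum(comb(K, i) * comb(N - K, n - i) for i in range(x, hi + 1))
    return numer / comb(N, n)

# Hypothetical example: 15 000 genes tested, 300 replicate, 90 discovered, 12 overlap
print(hypergeom_upper_tail(12, N=15_000, K=300, n=90))
```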

We applied Bonferroni correction to determine the P-value cutoffs for gene-level association tests. Specifically, when focusing on the common gene set that can be analyzed by all methods, we used the same Bonferroni correction cutoff; otherwise, we used 0.05 divided by the total number of genes tested for each method.

3 Results

3.1 Simulation studies

One key advantage of CMO is its high power to identify new outcome-associated genes that competing methods cannot identify. To evaluate the type 1 error rate and statistical power under different settings, we performed simulation studies using 10 000 randomly selected independent White British individuals from UK Biobank (application number 48240). The type 1 error rate was well controlled at different significance levels (Table 1, Supplementary Fig. S1).

Table 1.

Type 1 error rate under different significance levels

α	CMO	TWAS	MultiXcan	MWAS	E + G + Methyl
$5 \times 10^{- 2}$	$4.22 \times 10^{- 2}$	$2.95 \times 10^{- 2}$	$5.11 \times 10^{- 2}$	$4.00 \times 10^{- 2}$	$2.87 \times 10^{- 2}$
$1 \times 10^{- 2}$	$8.75 \times 10^{- 3}$	$6.01 \times 10^{- 3}$	$1.05 \times 10^{- 2}$	$9.14 \times 10^{- 3}$	$5.91 \times 10^{- 3}$
$5 \times 10^{- 3}$	$4.39 \times 10^{- 3}$	$3.07 \times 10^{- 3}$	$5.32 \times 10^{- 3}$	$4.81 \times 10^{- 3}$	$2.92 \times 10^{- 3}$
$1 \times 10^{- 3}$	$8.92 \times 10^{- 4}$	$6.12 \times 10^{- 4}$	$1.14 \times 10^{- 3}$	$1.05 \times 10^{- 3}$	$5.97 \times 10^{- 4}$
$5 \times 10^{- 4}$	$4.39 \times 10^{- 4}$	$3.05 \times 10^{- 4}$	$6.08 \times 10^{- 4}$	$5.52 \times 10^{- 4}$	$3.06 \times 10^{- 4}$
$1 \times 10^{- 4}$	$9.70 \times 10^{- 5}$	$6.00 \times 10^{- 5}$	$1.14 \times 10^{- 4}$	$1.27 \times 10^{- 4}$	$7.20 \times 10^{- 5}$
$5 \times 10^{- 5}$	$5.50 \times 10^{- 5}$	$2.70 \times 10^{- 5}$	$6.00 \times 10^{- 5}$	$6.50 \times 10^{- 5}$	$3.90 \times 10^{- 5}$

Note: We repeated simulations one million (10⁶) times.

Furthermore, the statistical power of our proposed CMO test was much higher than competing methods under most scenarios we considered. For example, when three CpG sites and three tissues were associated with the trait (m = 3) and DNAm levels were negatively associated with gene expression ( $h_{c} = - 1$ , h_g = 1), CMO was much more powerful than competing methods (Fig. 1). The power difference between CMO and competing methods became even larger when half of the weights were non-informative (q = 0.5). This is because the burden test (i.e. TWAS/MWAS) is known to lose power when the effect sizes are sparse. Importantly, CMO was more powerful than ‘E + G + Methyl’ because ‘E + G + Methyl’ only used the significant mQTL to construct the test while CMO incorporated the genetically predicted DNA methylation levels.

Fig. 1. CMO improved statistical power. Each method is indicated in the bar plot. We set m = 3, $h_{c} = - 1$ and h_g = 1.

One concern is that using the union of TWAS or MWAS with an additional Bonferroni correction may be too conservative. To investigate this, we applied the ACAT method to combine the results from TWAS or MWAS (denoted by TWAS2 and MWAS2, respectively). Although TWAS2 did improve power, the improvement was minimal, and its power remained lower than that of MultiXcan, a method that combines multi-tissue TWAS results through principal component analysis (Fig. 1). Because taking the union of TWAS with an additional Bonferroni correction is widely used in the literature (Barbeira et al., 2019; Gamazon et al., 2019), we used the union of TWAS or MWAS with an additional Bonferroni correction in the real data applications.

Figure 1 shows that CMO was more powerful than ‘Gene Body CMO’, a modified CMO that does not integrate CpG sites in enhancer regions. This indicates that integrating enhancer information improved power when the causal CpG sites were randomly selected from enhancers, promoters and gene body regions. To further investigate the effectiveness of incorporating enhancer regions, we considered two situations: all causal CpG sites resided in enhancer regions, or none did. When all causal CpG sites resided in enhancer regions, CMO was much more powerful than ‘Gene Body CMO’ (Supplementary Figs S2 and S3). In contrast, when none of the causal CpG sites resided in enhancer regions, the power of CMO was only slightly lower than that of ‘Gene Body CMO’, showing that CMO maintains high power even when enhancer information is irrelevant (Supplementary Figs S2 and S3). Thus, we recommend integrating enhancer information to improve power, because CpG sites in enhancer regions may play regulatory roles.

The above findings were consistent under many different scenarios, including varying the number of causal CpG sites and tissues ( $m = 1$ and m = 5; Supplementary Figs S4 and S5), the effect directions of CpG sites (Supplementary Figs S6–S8) and the estimation errors in weights (q = 0.3 and q = 0.7; Supplementary Figs S9–S11).

Finally, we investigated situations in which none of the CpG sites was directly associated with the trait (h_c = 0 and h_g = 1). These situations should favor TWAS and MultiXcan, as SNPs affect the outcome only through gene expression pathways. However, CMO still achieved power comparable to (and sometimes higher than) that of TWAS and MultiXcan (Supplementary Figs S12–S14). This is because the predicted DNA methylation models and gene expression models are correlated, and the effects of genetic variants may be mediated through DNA methylation to transcription (Wu et al., 2018).

In summary, these simulations show that CMO achieves high power under a wide range of scenarios while keeping type 1 error rates well controlled.

3.2 CMO identifies more associations than competing methods

We applied CMO to the summary results from all 21 GWAS of Susceptibility Weighted Imaging (SWI) reported earlier (Elliott et al., 2018) (N = 8428) to further evaluate the statistical power of different integrative gene-based tests. The QQ plots indicate that CMO controls the type 1 error rate, and the genomic control factors of CMO were conservative (Supplementary Fig. S15). This is because the CMO P-value approximation is accurate mainly for highly significant P-values, whereas the genomic control factor is constructed from the median of the empirically observed test statistics, which corresponds to a P-value of around 0.5 under the null. We compared the CMO results with those of TWAS and MultiXcan: CMO identified substantially more associations, a 124.2% improvement over TWAS and a 91.4% improvement over MultiXcan (Fig. 2). Additionally, CMO identified 268.8% more associated genes than the naive test (union of MWAS), which combines the results across CpG sites in the gene body and promoter regions and applies an additional Bonferroni correction. More importantly, CMO identified 33.6% more associated genes than ‘Gene Body CMO’, highlighting that incorporating enhancer information improves the power of CMO. CMO also identified 453.3% more associated genes than ‘E + G + Methyl’, because CMO combines the genetically predicted DNAm models whereas ‘E + G + Methyl’ focuses only on mQTLs.
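To see why a median-based genomic control factor can look conservative for CMO, recall how λ_GC is usually computed: each P-value is converted to the 1-df χ² statistic that would produce it, and the median of these statistics is divided by the χ²₁ median (about 0.455). The sketch below uses only the standard library (the function name is illustrative) and shows that λ_GC depends only on P-values near 0.5, the range where the CMO approximation is least accurate.

```python
from statistics import NormalDist, median

CHI2_1_MEDIAN = NormalDist().inv_cdf(0.25) ** 2  # median of chi-square(1), about 0.4549

def genomic_control_lambda(pvals):
    """lambda_GC = median of implied 1-df chi-square statistics / chi-square(1) median."""
    nd = NormalDist()
    # z = Phi^{-1}(p/2) is the two-sided Z giving P-value p; chi2 = z^2 (requires p > 0)
    chi2 = [nd.inv_cdf(p / 2.0) ** 2 for p in pvals]
    return median(chi2) / CHI2_1_MEDIAN

# Well-calibrated (uniform) P-values give lambda close to 1;
# P-values pushed towards 1 in the bulk give lambda below 1.
grid = [(i + 0.5) / 1001 for i in range(1001)]
print(round(genomic_control_lambda(grid), 3))
print(round(genomic_control_lambda([p ** 0.5 for p in grid]), 3))
```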

Fig. 2. CMO identified more associations for 21 Susceptibility Weighted Imaging (SWI) related traits. The left and right panels show the numbers of significant genes identified among all available genes and among the common set of 6171 genes, respectively. Each method is indicated in the bar plot. The union of TWAS considers all tissues while applying an additional Bonferroni correction. The union of MWAS combines results across CpG sites in the gene body and promoter regions while applying an additional Bonferroni correction. ‘E + G + Methyl’ combines the results of the Hippo and MCF-7 cell line.

Because different methods test different sets of genes, we also compared the methods over a common set of 6171 genes that could be analyzed by all methods. Again, CMO identified substantially more associations than competing methods (Fig. 2), and this improvement was observed consistently across all traits analyzed (Supplementary Tables S1 and S2).

3.3 CMO identifies replicable associated genes for AD

We performed a multi-stage gene-level association study for AD (Methods) to further demonstrate CMO’s effectiveness and improve our understanding of the genetic basis of AD. Specifically, we first applied the methods to the smaller stage 1 GWAS summary statistics from the International Genomics of Alzheimer’s Project (Lambert et al., 2013) (IGAP1; N = 54 162) for discovery. CMO, Union of MWAS, Union of TWAS and MultiXcan identified 87, 32, 35 and 32 genome-wide significant genes, respectively (Fig. 3). CMO identified many more significant genes than TWAS and MultiXcan, perhaps because DNA methylation can regulate gene expression and CMO incorporates enhancer-promoter interactions. In addition, ‘Gene Body CMO’ and ‘E + G + Methyl’ identified 70 and 15 significant genes, respectively. As expected, the majority of the genes identified by ‘Gene Body CMO’ and ‘E + G + Methyl’ (86.3%; 63 out of 73) were also identified by CMO. When we focused on the common set of 11 708 genes that could be analyzed by TWAS, MultiXcan, CMO and MWAS, CMO still identified substantially more associations than the competing methods (Supplementary Fig. S16).

Fig. 3. — CMO identified more significant genes in the IGAP1 dataset. TWAS denotes the union of TWAS across all tissues with an additional Bonferroni correction. MWAS denotes the union of MWAS, which combines results across CpG sites in the gene body and promoter regions with an additional Bonferroni correction.

For replication, we used independent summary results from the GWAS by proxy of UK Biobank data (Liu et al., 2017) (GWAX; N = 114 564). Even though GWAX used a ‘proxy’ AD phenotype based on family history, the replication rate was high: of the 87 significant genes identified by CMO, 12 were replicated under the Bonferroni-corrected significance threshold ($P = 2.3 \times 10^{-28}$ by a hypergeometric test) and 16 were replicated under a relaxed P-value cutoff of 0.05 ($P = 1.2 \times 10^{-7}$ by a hypergeometric test). Additionally, by searching the GWAS Catalog 1.0 (Version r2019–12–16) (Buniello et al., 2019), we found that 53 of the 87 genes (60.9%) had been reported by existing studies ($P = 6.4 \times 10^{-50}$ by a hypergeometric test). Overall, these two complementary replication efforts demonstrate that the CMO approach is robust and powerful.
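The hypergeometric P-values quoted above measure how surprising an overlap between two gene lists is under random selection. A minimal Python sketch of the one-sided test, $P(X \ge k)$, follows, where $X$ is the number of the $n$ CMO-significant genes that fall among the $K$ genes significant (or previously reported) elsewhere, out of $N$ tested genes. The background sizes $N$ and $K$ behind the reported P-values are not given in this section, so the values in the usage line are hypothetical placeholders and do not reproduce the paper's numbers.

```python
from math import comb

def hypergeom_upper_tail(k: int, N: int, K: int, n: int) -> float:
    """One-sided hypergeometric P-value P(X >= k).

    N: number of genes tested (population size)
    K: number of genes significant in the replication data ("successes")
    n: number of genes significant in the discovery data (draws)
    k: observed overlap between the two gene lists
    """
    if not (0 <= K <= N and 0 <= n <= N):
        raise ValueError("require 0 <= K <= N and 0 <= n <= N")
    # math.comb returns 0 when asked to choose more items than exist.
    numerator = sum(comb(K, i) * comb(N - K, n - i)
                    for i in range(max(k, 0), min(K, n) + 1))
    return numerator / comb(N, n)  # exact integer sums, one final division

# Hypothetical background sizes; only n = 87 and k = 12 come from the text.
print(hypergeom_upper_tail(k=12, N=10_000, K=100, n=87))
```

The same tail probability is the P-value of a one-sided Fisher's exact test on the corresponding 2 × 2 contingency table. Summing exact integer binomial coefficients avoids the cancellation that can occur when extremely small tail probabilities are computed as one minus a cumulative probability.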

3.4 CMO identifies novel associated genes for AD

We next applied CMO to the largest available AD GWAS, with up to 455 258 individuals of European ancestry (Jansen et al., 2019), for novel gene discovery. In total, CMO identified 159 genome-wide significant genes (Fig. 4 and Supplementary Table S3), substantially more than the competing methods (Supplementary Fig. S17 and Supplementary Tables S4–S6). Importantly, 109 of the 159 CMO-identified associations were not identified by MWAS (Supplementary Fig. S17) and were driven by genetically predicted CpG sites in enhancer regions. This highlights the importance of integrating enhancer-promoter interaction information into the CMO test.

Fig. 4. — Manhattan plot for the 2019 meta-analyzed AD GWAS dataset (Jansen *et al.*, 2019) (N = 455 258, CMO test). For visualization purposes, P-values are truncated at $1 \times 10^{-30}$. The horizontal line marks the genome-wide significance threshold of $3 \times 10^{-6}$. The most significant gene at each locus is labeled.

Most significant genes identified by CMO are located at known AD risk loci (Supplementary Table S3) (Jansen et al., 2019; Kunkle et al., 2019). These include the CR1 locus on chromosome 1, the BIN1 and INPP5D loci on chromosome 2, the CD2AP locus on chromosome 6, the CLU locus on chromosome 8, the MS4A6A locus on chromosome 11, the SLC24A4 locus on chromosome 14, the ADAM10 locus on chromosome 15, the KAT8 locus on chromosome 16, the ABI3 locus on chromosome 17, the APOE locus on chromosome 19 and the CASS4 locus on chromosome 20.

Ingenuity pathway analysis (IPA) further supports our findings and provides new insights. For example, the top disease suggested by the ‘Disease and Functions’ module in IPA was late-onset AD ($p = 4.1 \times 10^{-15}$ by Fisher’s exact test), which involves ten associated genes: ABI3, ADAM10, APOE, BIN1, CD2AP, CLU, CR1, EPHA1, MS4A4A and PICALM. The other top suggested diseases were adhesion of blood cells ($p = 3.4 \times 10^{-9}$) and cancer ($p = 4.1 \times 10^{-7}$). Cancer and AD show an inverse relationship involving many shared biological mechanisms, such as p53, estrogen and growth factors (Shafi, 2016). Interestingly, filgrastim, a drug that stimulates the growth of white blood cells, was linked to the identified genes ($p = 1.5 \times 10^{-5}$). This finding suggests a potential drug-repurposing opportunity, using filgrastim to treat AD, that may deserve further investigation.

Beyond these confirmatory results, we also identified six novel loci for AD, each at least 500 kb away from the genome-wide significant risk SNPs identified in the 2019 meta-analyzed AD GWAS dataset: ZKSCAN5 ($p = 1.08 \times 10^{-13}$), FZD3 ($p = 1.52 \times 10^{-16}$), BNIP2 ($p = 5.57 \times 10^{-9}$), ZNF720 ($p = 9.22 \times 10^{-7}$), IL34 ($p = 1.6 \times 10^{-6}$) and ZNF404 ($p < 1 \times 10^{-20}$). In comparison, Union of TWAS, MultiXcan and MWAS identified 3, 1 and 0 novel genes, respectively (Supplementary Fig. S18). FZD3 encodes a receptor required for the Wnt signaling pathway (Zhang et al., 2019). The Wnt signaling pathway is involved in the development of the central nervous system, and loss of Wnt signaling is associated with the neurotoxicity of amyloid-β, an established major player in AD pathogenesis (Inestrosa et al., 2012). Another example is IL34, which has been reported to be associated with tau protein, a biomarker for AD (Deming et al., 2017). Critically, none of these six novel genes was identified by the competing methods, showcasing the power of our proposed approach.
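The novelty criterion above is a distance filter. The Python sketch below flags a gene as novel when every genome-wide significant SNP on the same chromosome lies at least 500 kb from the gene. It assumes that all coordinates share one genome build and that distance is measured from the nearest gene boundary (0 for a SNP inside the gene); the paper may instead measure from the transcription start site or another reference point. Gene names, coordinates and SNP positions are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple

@dataclass(frozen=True)
class Gene:
    name: str
    chrom: str
    start: int  # gene start position (bp)
    end: int    # gene end position (bp), end >= start

def distance_to_gene(gene: Gene, pos: int) -> int:
    """Distance (bp) from a SNP position to the nearest gene boundary; 0 if inside."""
    if pos < gene.start:
        return gene.start - pos
    if pos > gene.end:
        return pos - gene.end
    return 0

def novel_genes(genes: Iterable[Gene],
                sig_snps: Iterable[Tuple[str, int]],
                window: int = 500_000) -> List[str]:
    """Names of genes at least `window` bp from every significant SNP."""
    snps = list(sig_snps)
    return [g.name for g in genes
            if all(chrom != g.chrom or distance_to_gene(g, pos) >= window
                   for chrom, pos in snps)]

genes = [
    Gene("GENE_X", "1", 1_000_000, 1_010_000),  # 190 kb from a lead SNP
    Gene("GENE_Y", "1", 5_000_000, 5_020_000),  # 3.8 Mb away on chr 1
    Gene("GENE_Z", "2", 1_000_000, 1_010_000),  # 990 kb away on chr 2
]
lead_snps = [("1", 1_200_000), ("2", 2_000_000)]
print(novel_genes(genes, lead_snps))  # ['GENE_Y', 'GENE_Z']
```

SNPs on other chromosomes never disqualify a gene, so only same-chromosome distances are checked.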

4 Discussion

Integrating genetic regulatory information into a gene-level association test usually improves statistical power, because most identified GWAS variants are located in non-coding regions and act by affecting gene regulation. Both simulations and real data applications with summary-level GWAS results demonstrate the potential power gain of our proposed method, CMO. Unlike existing integrative approaches, CMO is the first method to integrate genetically predicted DNAm values and enhancer-promoter interactions with GWAS association results. Compared with TWAS and its extension MultiXcan, our approach generally identifies substantially more associated and previously overlooked genes for AD, some of which were replicated in an independent dataset. By analyzing the largest AD GWAS to date, of 71 880 (proxy) cases and 383 378 (proxy) controls of European ancestry, we found that genes identified by CMO were enriched in the late-onset AD pathway and were linked to filgrastim, a drug that stimulates the growth of white blood cells. Furthermore, CMO identified six novel loci for AD, which can be further investigated to improve AD risk assessment.
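Integrating a genetically predicted molecular trait with GWAS summary results is commonly done with a TWAS-style weighted Z-score (Gusev et al., 2016): for SNP prediction weights $w$, GWAS Z-scores $z$ and an LD correlation matrix $R$ among the same SNPs, $Z = w^\top z / \sqrt{w^\top R w}$. The pure-Python sketch below computes that statistic and its two-sided normal P-value for one predicted CpG site. That CMO's per-CpG statistics take exactly this form is an assumption here, not something this section states, and all numbers are toy values.

```python
import math
from typing import Sequence

def predicted_trait_z(w: Sequence[float], z: Sequence[float],
                      R: Sequence[Sequence[float]]) -> float:
    """TWAS-style statistic Z = w'z / sqrt(w'Rw) from GWAS summary statistics.

    w: SNP weights of a prediction model (e.g. for one CpG site's DNAm level)
    z: GWAS Z-scores for the same SNPs, with the same allele coding as w
    R: LD (correlation) matrix among those SNPs
    """
    m = len(w)
    if len(z) != m or len(R) != m or any(len(row) != m for row in R):
        raise ValueError("w, z and R must describe the same SNPs")
    numerator = sum(wi * zi for wi, zi in zip(w, z))
    variance = sum(w[i] * R[i][j] * w[j] for i in range(m) for j in range(m))
    if variance <= 0:
        raise ValueError("w'Rw must be positive")
    return numerator / math.sqrt(variance)

def two_sided_p(stat: float) -> float:
    """Two-sided P-value of a standard normal statistic: P(|N(0,1)| > |stat|)."""
    return math.erfc(abs(stat) / math.sqrt(2.0))

# Two SNPs in moderate LD (toy numbers).
w = [0.4, -0.2]
z = [3.1, -2.5]
R = [[1.0, 0.3], [0.3, 1.0]]
stat = predicted_trait_z(w, z, R)
print(stat, two_sided_p(stat))
```

The denominator accounts for LD among the SNPs, which is why summary statistics plus a reference LD matrix suffice; the accuracy of the weights $w$ is what limits power, echoing the dependence on prediction-model quality noted in the limitations.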

CMO is computationally efficient because its P-values can be calculated analytically. For example, using parallel computation with 100 cores on a standard server (4 GB of memory per core), CMO calculated P-values for all available genes in the IGAP1 data in approximately 1.5 minutes. To facilitate data analysis for both clinical and statistical investigators, we have implemented CMO in open-source software.

CMO builds upon many previous works. Briefly, we apply the ACAT method (Liu et al., 2019; Liu and Xie, 2020) to efficiently combine evidence across CpG sites located in enhancer, promoter and gene body regions. We also adopt an adaptive test idea (Pan et al., 2014) to maintain high power across different scenarios. Furthermore, CMO can be viewed as a substantial improvement over our previous method ‘E + G + Methyl’ (Wu and Pan, 2019). ‘E + G + Methyl’ uses two enhancer databases for illustration purposes and integrates only mQTLs. In contrast, CMO uses a much more comprehensive enhancer database, GeneHancer (Fishilevich et al., 2017), to define the enhancer regions for a target gene and integrates DNA methylation prediction models. Through simulations and real data analyses, we showed that CMO performed better than ‘E + G + Methyl’.
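The ACAT step named above combines P-values with the Cauchy combination statistic of Liu and Xie (2020), $T = \sum_i \omega_i \tan\{(0.5 - p_i)\pi\}$ with nonnegative weights summing to one, whose P-value is approximately $1/2 - \arctan(T)/\pi$ and remains accurate in the tail under arbitrary dependence among the $p_i$. A minimal Python sketch follows. It covers only this combination step, not how CMO forms its per-region statistics or its adaptive component, and the small-$p$ and large-$T$ approximations are our choice for numerical stability. Equal weights match the current CMO setting described in the limitations; the optional `weights` argument is a hypothetical hook for the promoter up-weighting suggested there.

```python
import math
from typing import Optional, Sequence

def acat(pvalues: Sequence[float],
         weights: Optional[Sequence[float]] = None) -> float:
    """Cauchy combination (ACAT) of P-values, each strictly inside (0, 1)."""
    if len(pvalues) == 0:
        raise ValueError("need at least one P-value")
    if any(not 0.0 < p < 1.0 for p in pvalues):
        raise ValueError("P-values must lie strictly between 0 and 1")
    if weights is None:
        weights = [1.0] * len(pvalues)  # equal weights for all CpG sites
    if len(weights) != len(pvalues) or any(wt < 0 for wt in weights):
        raise ValueError("weights must be nonnegative and match pvalues")
    total = float(sum(weights))
    if total <= 0:
        raise ValueError("weights must not all be zero")

    stat = 0.0
    for wt, p in zip(weights, pvalues):
        wt /= total  # normalize weights to sum to one
        if p < 1e-16:
            # tan((0.5 - p) * pi) is close to 1 / (p * pi) for tiny p
            stat += wt / (p * math.pi)
        else:
            stat += wt * math.tan((0.5 - p) * math.pi)

    if stat > 1e15:
        # For a huge statistic, 0.5 - arctan(T) / pi is close to 1 / (T * pi)
        return 1.0 / (stat * math.pi)
    return 0.5 - math.atan(stat) / math.pi

print(acat([0.02, 0.3, 0.8]))
print(acat([0.02, 0.3, 0.8], weights=[2.0, 1.0, 1.0]))  # hypothetical up-weighting
```

Because the Cauchy tail is heavy, a single very small P-value can dominate the combined result, which is why ACAT is known to be powerful when only a few of the combined CpG sites carry signal.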

We note several limitations of the CMO method. First, CMO is an association-based test, and the significant genes it identifies do not imply causality. One may apply fine-mapping methods such as FOCUS (Mancuso et al., 2019) and FOGS (Wu and Pan, 2020) to prioritize putative causal genes. Second, the power of CMO depends on how accurate the genetically predicted DNAm values are, just as the power of TWAS is limited by the quality of imputed gene expression. The prediction models currently used were provided by Baselmans et al. (2019) and were constructed based on adults of European ancestry. We expect the power of CMO to improve further when better prediction models are incorporated. For example, one can incorporate functional annotations (Li et al., 2020) in a weighted penalized regression framework to improve DNA methylation prediction and thus the power of CMO. Third, for simplicity, all CpG sites have the same weight in the current version of CMO. The power may be further improved by incorporating prior information, such as giving more weight to CpG sites in promoter regions. Fourth, we focused on blood tissue for illustration and risk assessment. Similar ideas can be applied to tissue-specific analysis to better understand the etiology of AD. For example, we may use PsychENCODE (Wang et al., 2018) and ROS/MAP data (Bennett et al., 2012) to construct brain-specific DNAm prediction models and incorporate brain-specific enhancer information. Fifth, CMO is designed for independent samples. However, following previous works (Chen et al., 2016, 2019; Park et al., 2018), we can calculate Z scores and the corresponding covariance matrix by fitting a mixed effect model, and CMO can then be naturally extended to situations with related samples. Sixth, CMO is formulated for analyzing common variants. Following STAAR (Li et al., 2020), CMO can be extended to analyze whole genome sequencing data once DNA methylation prediction models with rare variants as predictors are built. As rare and low-frequency variants can play a crucial role in improving gene expression prediction models (Yang et al., 2020), we expect that including them may further improve the performance of CMO. We leave these interesting topics for future research.

Supplementary Material

btab045_Supplementary_Data (731.8 KB, zip)

Acknowledgement

The authors thank the Associate Editor and three reviewers for their helpful and insightful comments, which greatly improved the quality of the work. They thank Yanfa Sun for performing the IPA analysis. They thank the individuals involved in the UK Biobank and GWAS datasets for their participation and the research teams for their work on collecting, processing and sharing these datasets. This research was conducted using the UK Biobank resource (application number 48240), subject to a data transfer agreement. They acknowledge all of the studies that made their GWAS summary results publicly available.

Funding

This research was supported by the National Institutes of Health (NIH) grant R03AG070669. H.W.D. was partially supported by the NIH grants [P20GM109036, R01AR069055, U19AG055373 and MH104680].

Conflict of Interest

None declared.

Contributor Information

Chong Wu, Department of Statistics, Florida State University, Tallahassee, FL 32306, USA.

Jonathan Bradley, Department of Statistics, Florida State University, Tallahassee, FL 32306, USA.

Yanming Li, Department of Biostatistics & Data Science, University of Kansas Medical Center, Kansas City, KS 66160, USA.

Lang Wu, Population Sciences in the Pacific Program, University of Hawaii Cancer Center, Honolulu, HI 96813, USA.

Hong-Wen Deng, Tulane Center for Biomedical Informatics and Genomics, Deming Department of Medicine, Tulane University School of Medicine, New Orleans, LA 70112, USA.

References

Alegría-Torres J.A. et al. (2011) Epigenetics and lifestyle. Epigenomics, 3, 267–277.
Andersson R. et al.; The FANTOM Consortium. (2014) An atlas of active enhancers across human cell types and tissues. Nature, 507, 455–461.
Barbeira A.N. et al. (2019) Integrating predicted transcriptome from multiple tissues improves association detection. PLoS Genet., 15, e1007889.
Baselmans B.M. et al.; BIOS consortium. (2019) Multivariate genome-wide analyses of the well-being spectrum. Nat. Genet., 51, 445–451.
Bennett D.A. et al. (2012) Overview and findings from the religious orders study. Curr. Alzheimer Res., 9, 628–645.
Buniello A. et al. (2019) The NHGRI-EBI GWAS catalog of published genome-wide association studies, targeted arrays and summary statistics 2019. Nucleic Acids Res., 47, D1005–D1012.
Canter R.G. et al. (2016) The road to restoring neural circuits for the treatment of Alzheimer’s disease. Nature, 539, 187–196.
Chen H. et al. (2016) Control for population structure and relatedness for binary traits in genetic association studies via logistic mixed models. Am. J. Hum. Genet., 98, 653–666.
Chen H. et al. (2019) Efficient variant set mixed model association tests for continuous and binary traits in large-scale whole-genome sequencing studies. Am. J. Hum. Genet., 104, 260–274.
De Jager P.L. et al. (2014) Alzheimer’s disease: early alterations in brain DNA methylation at ANK1, BIN1, RHBDF2 and other loci. Nat. Neurosci., 17, 1156–1163.
Deming Y. et al.; Alzheimer’s Disease Neuroimaging Initiative (ADNI). (2017) Genome-wide association study identifies four novel loci associated with Alzheimer’s endophenotypes and disease modifiers. Acta Neuropathol., 133, 839–856.
Elliott L.T. et al. (2018) Genome-wide association studies of brain imaging phenotypes in UK Biobank. Nature, 562, 210–216.
Fishilevich S. et al. (2017) GeneHancer: genome-wide integration of enhancers and target genes in GeneCards. Database, 2017, bax028.
Freytag V. et al. (2018) Genetic estimators of DNA methylation provide insights into the molecular basis of polygenic traits. Transl. Psychiatry, 8, 31.
Furlong E.E., Levine M. (2018) Developmental enhancers and chromosome topology. Science, 361, 1341–1345.
Gamazon E.R. et al.; GTEx Consortium. (2015) A gene-based association method for mapping traits using reference transcriptome data. Nat. Genet., 47, 1091–1098.
Gamazon E.R. et al. (2019) Multi-tissue transcriptome analyses identify genetic mechanisms underlying neuropsychiatric traits. Nat. Genet., 51, 933–940.
Gaunt T.R. et al. (2016) Systematic identification of genetic influences on methylation across the human life course. Genome Biol., 17, 1–14.
Gusev A. et al. (2016) Integrative approaches for large-scale transcriptome-wide association studies. Nat. Genet., 48, 245–252.
Gutierrez-Arcelus M. et al. (2015) Tissue-specific effects of genetic and epigenetic variation on gene regulation and splicing. PLoS Genet., 11, e1004958.
Hu Y. et al.; Alzheimer’s Disease Genetics Consortium. (2019) A statistical framework for cross-tissue transcriptome-wide association analysis. Nat. Genet., 51, 568–576.
Inestrosa N.C. et al. (2012) Wnt signaling: role in Alzheimer disease and schizophrenia. J. Neuroimmune Pharmacol., 7, 788–807.
Jansen I.E. et al. (2019) Genome-wide meta-analysis identifies new loci and functional pathways influencing Alzheimer’s disease risk. Nat. Genet., 51, 404–413.
Krämer A. et al. (2014) Causal analysis approaches in ingenuity pathway analysis. Bioinformatics, 30, 523–530.
Kunkle B.W. et al.; Alzheimer Disease Genetics Consortium (ADGC). (2019) Genetic meta-analysis of diagnosed Alzheimer’s disease identifies new risk loci and implicates Aβ, tau, immunity and lipid processing. Nat. Genet., 51, 414–430.
Lambert J.-C. et al.; European Alzheimer's Disease Initiative (EADI). (2013) Meta-analysis of 74,046 individuals identifies 11 new susceptibility loci for Alzheimer’s disease. Nat. Genet., 45, 1452–1458.
Lee J.C. et al. (2019) Diagnosis of Alzheimer’s disease utilizing amyloid and tau as fluid biomarkers. Exp. Mol. Med., 51, 1–10.
Li X. et al.; NHLBI Trans-Omics for Precision Medicine (TOPMed) Consortium. (2020) Dynamic incorporation of multiple in silico functional annotations empowers rare variant association analysis of large whole-genome sequencing studies at scale. Nat. Genet., 52, 969–983.
Liu J.Z. et al. (2017) Case–control association mapping by proxy using family history of disease. Nat. Genet., 49, 325–331.
Liu Y., Xie J. (2020) Cauchy combination test: a powerful test with analytic p-value calculation under arbitrary dependency structures. J. Am. Stat. Assoc., 115, 393–402.
Liu Y. et al. (2019) ACAT: a fast and powerful p value combination method for rare-variant analysis in sequencing studies. Am. J. Hum. Genet., 104, 410–421.
Lu F. et al. (2014) Role of TET proteins in enhancer activity and telomere elongation. Genes Dev., 28, 2103–2119.
Lunnon K., Mill J. (2013) Epigenetic studies in Alzheimer’s disease: current findings, caveats, and considerations for future studies. Am. J. Med. Genet. B Neuropsychiatric Genet., 162, 789–799.
Mancuso N. et al. (2019) Probabilistic fine-mapping of transcriptome-wide association studies. Nat. Genet., 51, 675–682.
Pan W. (2009) Asymptotic tests of association with multiple SNPs in linkage disequilibrium. Genet. Epidemiol., 33, 497–507.
Pan W. et al. (2014) A powerful and adaptive association test for rare variants. Genetics, 197, 1081–1095.
Park J.Y. et al. (2018) Adaptive SNP-SET association testing in generalized linear mixed models with application to family studies. Behav. Genet., 48, 55–66.
Rahman M.R. et al. (2020) Identification of molecular signatures and pathways to identify novel therapeutic targets in Alzheimer’s disease: insights from a systems biomedicine perspective. Genomics, 112, 1290–1299.
Raj T. et al. (2018) Integrative transcriptome analyses of the aging brain implicate altered splicing in Alzheimer’s disease susceptibility. Nat. Genet., 50, 1584–1592.
Rakyan V.K. et al. (2011) Epigenome-wide association studies for common human diseases. Nat. Rev. Genet., 12, 529–541.
Roubroeks J.A. et al. (2017) Epigenetics and DNA methylomic profiling in Alzheimer’s disease and other neurodegenerative diseases. J. Neurochemistry, 143, 158–170.
Schoenfelder S., Fraser P. (2019) Long-range enhancer–promoter contacts in gene expression control. Nat. Rev. Genet., 20, 437–455.
Shafi O. (2016) Inverse relationship between Alzheimer’s disease and cancer, and other factors contributing to Alzheimer’s disease: a systematic review. BMC Neurol., 16, 236.
Smith Z.D., Meissner A. (2013) DNA methylation: roles in mammalian development. Nat. Rev. Genet., 14, 204–220.
Thurman R.E. et al. (2012) The accessible chromatin landscape of the human genome. Nature, 489, 75–82.
Wainberg M. et al. (2019) Opportunities and challenges for transcriptome-wide association studies. Nat. Genet., 51, 592–599.
Wang D. et al.; PsychENCODE Consortium. (2018) Comprehensive functional genomic resource and integrative model for the human brain. Science, 362, eaat8464.
Watson C.T. et al. (2016) Genome-wide DNA methylation profiling in the superior temporal gyrus reveals epigenetic signatures associated with Alzheimer’s disease. Genome Med., 8, 5.
Wu C., Pan W. (2018a) Integrating eQTL data with GWAS summary statistics in pathway-based analysis with application to schizophrenia. Genet. Epidemiol., 42, 303–316.
Wu C., Pan W. (2018b) Integration of enhancer-promoter interactions with GWAS summary results identifies novel schizophrenia-associated genes and pathways. Genetics, 209, 699–709.
Wu C., Pan W. (2019) Integration of methylation QTL and enhancer–target gene maps with schizophrenia GWAS summary results identifies novel genes. Bioinformatics, 35, 3576–3583.
Wu C., Pan W. (2020) A powerful fine-mapping method for transcriptome-wide association studies. Hum. Genet., 139, 199–213.
Wu Y. et al. (2018) Integrative analysis of omics summary data reveals putative mechanisms underlying complex traits. Nat. Commun., 9, 1–14.
Xu H. et al. (2010) Distinctive RNA expression profiles in blood associated with white matter hyperintensities in brain. Stroke, 41, 2744–2749.
Xu Z. et al. (2017a) Imaging-wide association study: integrating imaging endophenotypes in GWAS. Neuroimage, 159, 159–169.
Xu Z. et al. (2017b) A powerful framework for integrating eQTL and GWAS summary data. Genetics, 207, 893–902.
Yang T. et al. (2020) Integrating DNA sequencing and transcriptomic data for association analyses of low-frequency variants and lipid traits. Hum. Mol. Genet., 29, 515–526.
Zhang L. et al. (2019) Silencing of long noncoding RNA SOX21-AS1 relieves neuronal oxidative stress injury in mice with Alzheimer’s disease by upregulating FZD3/5 via the Wnt signaling pathway. Mol. Neurobiol., 56, 3522–3537.


[btab045-B53] Wu Y. et al. (2018) Integrative analysis of omics summary data reveals putative mechanisms underlying complex traits. Nat. Commun., 9, 1–14. [DOI] [PMC free article] [PubMed] [Google Scholar]

[btab045-B54] Xu H. et al. (2010) Distinctive RNA expression profiles in blood associated with white matter hyperintensities in brain. Stroke, 41, 2744–2749. [DOI] [PMC free article] [PubMed] [Google Scholar]

[btab045-B55] Xu Z. et al. (2017a) Imaging-wide association study: integrating imaging endophenotypes in GWAS. Neuroimage, 159, 159–169. [DOI] [PMC free article] [PubMed] [Google Scholar]

[btab045-B56] Xu Z. et al. (2017b) A powerful framework for integrating EQTL and GWAS summary data. Genetics, 207, 893–902. [DOI] [PMC free article] [PubMed] [Google Scholar]

[btab045-B57] Yang T. et al. (2020) Integrating DNA sequencing and transcriptomic data for association analyses of low-frequency variants and lipid traits. Hum. Mol. Genet., 29, 515–526. [DOI] [PMC free article] [PubMed] [Google Scholar]

[btab045-B58] Zhang L. et al. (2019) Silencing of long noncoding RNA sox21-as1 relieves neuronal oxidative stress injury in mice with Alzheimer’s disease by upregulating fzd3/5 via the wnt signaling pathway. Mol. Neurobiol., 56, 3522–3537. [DOI] [PubMed] [Google Scholar]
