Biobank-wide association scan identifies risk factors for late-onset Alzheimer’s disease and endophenotypes

Donghui Yan; Bowen Hu; Burcu F Darst; Shubhabrata Mukherjee; Brian W Kunkle; Yuetiva Deming; Logan Dumitrescu; Yunling Wang; Adam Naj; Amanda Kuzma; Yi Zhao; Hyunseung Kang; Sterling C Johnson; Cruchaga Carlos; Timothy J Hohman; Paul K Crane; Corinne D Engelman; Alzheimer’s Disease Genetics Consortium (ADGC); Qiongshi Lu

eLife. 2024 May 24;12:RP91360. doi: 10.7554/eLife.91360

Editors: Nicholas Mancuso, Timothy E Behrens

PMCID: PMC11126309 PMID: 38787369

Abstract

Rich data from large biobanks, coupled with increasingly accessible association statistics from genome-wide association studies (GWAS), provide great opportunities to dissect the complex relationships among human traits and diseases. We introduce BADGERS, a powerful method to perform polygenic score-based biobank-wide association scans. Compared to traditional approaches, BADGERS uses GWAS summary statistics as input and does not require multiple traits to be measured in the same cohort. We applied BADGERS to two independent datasets for late-onset Alzheimer’s disease (AD; n=61,212). Among 1738 traits in the UK biobank, we identified 48 significant associations for AD. Family history, high cholesterol, and numerous traits related to intelligence and education showed strong and independent associations with AD. Furthermore, we identified 41 significant associations for a variety of AD endophenotypes. While family history and high cholesterol were strongly associated with AD subgroups and pathologies, only intelligence and education-related traits predicted pre-clinical cognitive phenotypes. These results provide novel insights into the distinct biological processes underlying various risk factors for AD.

Research organism: Human

Introduction

Late-onset AD is a prevalent, complex, and devastating neurodegenerative disease with no current cure. Millions of people are currently living with AD worldwide, and the number is expected to grow rapidly as the population continues to age (Prince et al., 2013; Reitz and Mayeux, 2014). With the failure of numerous drug trials, it is of great interest to identify modifiable risk factors that can be potential targets for AD therapeutic development (Østergaard et al., 2015; Larsson et al., 2017; Norton et al., 2014). Epidemiological studies that directly test associations between measured risk factors and AD are difficult to conduct and interpret because identified associations are, in many cases, affected by confounding and reverse causality. Although these challenges are ubiquitous in risk factor studies for complex diseases, they are particularly critical for AD due to its extended pre-clinical stage: irreversible pathologic changes have already occurred in the decade or two prior to clinical symptoms (Jack et al., 2013). On the other hand, Mendelian randomization methods (Sleiman and Grant, 2010; Davey Smith and Hemani, 2014; Zhu et al., 2018) have been developed to identify causal risk factors for disease using data from GWAS. Despite their favorable theoretical properties in identifying causal relationships, these methods have limited statistical power and are therefore not suitable for hypothesis-free screening of risk factors.

Motivated by transcriptome-wide association study – an analysis strategy that identifies genes whose genetically regulated expression values are associated with disease (Gamazon et al., 2015; Gusev et al., 2016; Hu et al., 2018), we seek a systematic and statistically powerful approach to identify risk factors using summary association statistics from large-scale GWAS. GWAS for late-onset AD have been successful, and dozens of associated loci have been identified to date (Lambert et al., 2013; Harold et al., 2009; Hollingworth et al., 2011; Naj et al., 2011; Seshadri et al., 2010; Jun et al., 2017). Although direct information on risk factors is limited in these studies, dense genotype data on a large number of samples, in conjunction with independent reference datasets for thousands of complex human traits such as the UK biobank (Bycroft et al., 2017), make it possible to genetically impute potential risk factors and test their associations with AD. This strategy allows researchers to study risk factors that are not directly measured in AD studies. Furthermore, it reduces reverse causality because the imputation models are trained on independent, younger, and mostly dementia-free reference cohorts, thereby improving the interpretability of findings.

Here, we introduce BADGERS (Biobank-wide Association Discovery using GEnetic Risk Scores), a statistically powerful and computationally efficient method to identify associations between a disease of interest and a large number of genetically imputed complex traits using GWAS summary statistics. We applied BADGERS to identify associated risk factors for AD from 1738 heritable traits in the UK biobank and replicated our findings in independent samples. Furthermore, we performed multivariate conditional analysis, Mendelian randomization, and follow-up association analysis with a variety of AD biomarkers, pathologies, and pre-clinical cognitive phenotypes to provide mechanistic insights into our findings.

Results

Method overview

Here, we briefly introduce the BADGERS model. The workflow of BADGERS is shown in Figure 1. A flowchart of all the analyses included in the manuscript is shown in the supplementary material (Figure 1—figure supplement 1). Complete statistical details are discussed in the Methods section.

Figure 1. — BADGERS takes (a) Alzheimer’s disease genome-wide association studies (GWAS), (b) linkage disequilibrium (LD) reference panel, and (c) Human traits GWAS from the UK biobank as input. The generated result will be the (d) Association between Alzheimer’s disease and human traits. In graph (d), each triangle represents one human trait, and different colors represent different trait categories.

BADGERS is a two-stage method to test associations between traits. First, polygenic risk scores (PRS) are trained to impute complex traits using genetic data. Next, we test the association between a disease or trait of interest and various genetically-imputed traits. Given a PRS model, the imputed trait can be denoted as

\hat{T} = X W

where $X_{N \times M}$ is the genotype matrix for $N$ individuals in a GWAS, and $W_{M \times 1}$ is the $M \times 1$ vector of pre-calculated SNP weights in the PRS model. Then, we test the association between measured trait $Y$ and imputed trait $\hat{T}$ via a univariate linear model.

Y = α + \hat{T} γ + δ

The test statistic for $γ$ can be expressed as:

Z = \frac{\hat{γ}}{\mathrm{se}(\hat{γ})} \approx W^{T} Γ \tilde{Z}

where $\tilde{Z}$ is the vector of SNP-level association z-scores for trait $Y$, and $Γ$ is a diagonal matrix with the $j$-th diagonal element being the ratio between the standard deviation of the $j$-th SNP and that of $\hat{T}$.
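The univariate approximation can be illustrated numerically. The following is a minimal sketch, not the authors' implementation: the function names, the simulated genotypes, and the use of the same sample as the LD reference are all assumptions made for illustration (in practice the LD matrix would come from an external reference panel). It computes $W^{T} Γ \tilde{Z}$ from summary-level inputs and compares it with the regression z-score obtained from individual-level data.

```python
import numpy as np


def badgers_univariate_z(weights, snp_z, snp_sd, ld_cov):
    """Approximate Z = W^T Gamma Z~ from summary-level inputs."""
    weights = np.asarray(weights, dtype=float)
    snp_z = np.asarray(snp_z, dtype=float)
    snp_sd = np.asarray(snp_sd, dtype=float)
    # Standard deviation of the imputed trait T_hat = X W, from the SNP covariance.
    sd_t = np.sqrt(weights @ np.asarray(ld_cov, dtype=float) @ weights)
    gamma_diag = snp_sd / sd_t  # diagonal of Gamma: sd(X_j) / sd(T_hat)
    return float(np.sum(weights * gamma_diag * snp_z))


def regression_z(x, y):
    """Test statistic of the slope in a simple linear regression of y on x."""
    n = len(y)
    r = np.corrcoef(x, y)[0, 1]
    return float(r * np.sqrt((n - 2) / (1.0 - r * r)))


def simulate_comparison(n=5000, m=20, effect=0.05, seed=0):
    """Hypothetical simulation: summary-based Z versus individual-level Z."""
    rng = np.random.default_rng(seed)
    maf = rng.uniform(0.1, 0.5, size=m)
    X = rng.binomial(2, maf, size=(n, m)).astype(float)  # genotypes coded 0/1/2
    W = rng.normal(size=m)                               # illustrative PRS weights
    T = X @ W                                            # imputed trait
    Y = effect * (T - T.mean()) / T.std() + rng.normal(size=n)
    snp_z = np.array([regression_z(X[:, j], Y) for j in range(m)])
    z_summary = badgers_univariate_z(
        W, snp_z, X.std(axis=0, ddof=1), np.cov(X, rowvar=False)
    )
    z_individual = regression_z(T, Y)
    return z_summary, z_individual
```

Calling `simulate_comparison()` returns the two statistics side by side; under small per-SNP effects they should agree closely, which mirrors the consistency reported in the simulations.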

This model can be further generalized to perform multivariate analysis. If $K$ imputed traits are included in the analysis, we use a similar notation as in univariate analysis:

{\hat{T}}^{*} = X W^{*}

Here, $W_{M \times K}^{*}$ is a matrix and each column of $W^{*}$ is the pre-calculated weight values on SNPs for each imputed trait. Then, the associations between $Y$ and $K$ imputed traits ${\hat{T}}_{i} (1 \leq i \leq K)$ are tested via a multivariate linear model.

Y = α^{*} + {\hat{T}}^{*} γ^{*} + δ^{*}

where $γ^{*} = {(γ_{1}, \dots, γ_{K})}^{T}$ is the vector of regression coefficients. The z-score for $γ_{i} (1 \leq i \leq K)$ can be denoted as:

Z_{i} = \frac{{\hat{γ}}_{i}}{\mathrm{se}({\hat{γ}}_{i})} \approx \frac{1}{\sqrt{U_{i i}}} I_{i}^{T} U {(W^{*})}^{T} Θ \tilde{Z}

where $U$ is the inverse variance-covariance matrix of ${\hat{T}}^{*}$; $I_{i}$ is the $K \times 1$ vector with the $i$-th element being 1 and all other elements equal to 0; $Θ$ is an $M \times M$ diagonal matrix with the $j$-th diagonal element being $\sqrt{\mathrm{var}(X_{j})}$; and $\tilde{Z}$ is defined as in the univariate case, i.e., the vector of SNP-level association z-scores for trait $Y$.
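The multivariate statistic can be sketched in the same way. This is an illustrative sketch under stated assumptions rather than the published code: the function name and the toy inputs are hypothetical, and the variance-covariance matrix of ${\hat{T}}^{*}$ is assumed to be estimated as ${(W^{*})}^{T} D W^{*}$ from a SNP covariance matrix $D$ taken from a reference panel.

```python
import numpy as np


def badgers_multivariate_z(weight_matrix, snp_z, snp_sd, ld_cov):
    """Approximate conditional z-scores for K imputed traits.

    weight_matrix: M x K matrix W* of PRS weights (one column per trait).
    snp_z:         length-M vector of SNP-level z-scores for trait Y.
    snp_sd:        length-M vector of SNP standard deviations (diagonal of Theta).
    ld_cov:        M x M SNP covariance matrix from a reference panel (assumed).
    """
    W = np.asarray(weight_matrix, dtype=float)
    snp_z = np.asarray(snp_z, dtype=float)
    snp_sd = np.asarray(snp_sd, dtype=float)
    cov_t = W.T @ np.asarray(ld_cov, dtype=float) @ W  # covariance of imputed traits
    U = np.linalg.inv(cov_t)                           # inverse variance-covariance
    score = U @ W.T @ (snp_sd * snp_z)                 # U (W*)^T Theta Z~
    return score / np.sqrt(np.diag(U))                 # divide row i by sqrt(U_ii)


# Toy example: trait 1 uses SNP 1 only; trait 2 uses SNP 1 and SNP 2.
W_toy = [[1.0, 1.0],
         [0.0, 1.0]]
z_toy = badgers_multivariate_z(W_toy, [2.0, 3.0], [1.0, 1.0], np.eye(2))
```

In the toy example, the second trait's conditional z-score reduces to the z-score of SNP 2, the only part of that trait not shared with the first trait, which is the behavior one would expect from a conditional analysis.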

Simulations

We used real genotype data from the Genetic Epidemiology Research on Adult Health and Aging (GERA) to conduct simulation analyses (Methods). First, we evaluated the performance of our method on data simulated under the null hypothesis. We tested the associations between randomly simulated traits and 1738 PRS from the UK biobank and did not observe inflation in type-I error (Supplementary file 1). Similar results were observed when we simulated traits that are heritable but not directly associated with any PRS. Since BADGERS only uses summary association statistics and externally estimated linkage disequilibrium (LD) as input, we also compared effect estimates in BADGERS with those of traditional regression analysis based on individual-level data. Regression coefficient estimates and association p-values from these two methods were highly consistent in both simulation settings (Figure 2A and Figure 2—figure supplements 1–3), indicating minimal information loss from using summary statistics in place of individual-level data. To evaluate the statistical power of BADGERS, we simulated traits by combining effects from randomly selected PRS and a polygenic background (Methods). We set the effect size of PRS to be 0.02, 0.015, 0.01, 0.008, and 0.005. BADGERS showed comparable statistical power to the regression analysis based on individual-level genotype and phenotype data (Figure 2B, Supplementary file 1). Overall, our results suggest that using summary association statistics and externally estimated LD as a proxy for individual-level genotype and phenotype data does not inflate the type-I error rate or decrease power.

We also studied whether more sophisticated polygenic risk prediction methods could lead to higher statistical power in downstream association tests. We compared the performance of PRS based on marginal effect sizes with that of LDpred, a more sophisticated PRS model that jointly estimates SNP effects via a Bayesian framework (Vilhjálmsson et al., 2015). Imputation models based on jointly estimated SNP effects indeed improved the effect estimates. When using marginal PRS to impute traits, the correlation between ${\hat{γ}}_{i}$ and $γ_{i}$ was 0.79. This correlation improved to 0.91 when using LDpred PRS (Figure 2—figure supplement 4). However, such improvement did not substantially affect the statistical power in association testing. Using marginal PRS, our analysis achieved a statistical power of 86% to identify associations at a type-I error rate of 0.05, and the power was 88% when using LDpred effect estimates to calculate PRS. These results suggest that while more sophisticated PRS methods can improve the results in BADGERS, simple PRS based on marginal effects also shows reasonably good performance.

Figure 2. — Biobank-wide Association Discovery using GEnetic Risk Scores (BADGERS) and regression analysis based on individual-level data showed (A) highly consistent effect size estimates for 1738 polygenic risk scores (PRS) in simulation and (B) comparable statistical power (setting 3).

Identify risk factors for late-onset AD among 1738 heritable traits in the UK biobank

We applied BADGERS to conduct a biobank-wide association scan (BWAS) for AD risk factors from 1738 heritable traits (p<0.05; Methods) in the UK biobank. We repeated the analysis on two independent GWAS datasets for AD and further combined the statistical evidence via meta-analysis (Figure 3—figure supplement 1). We used stage-I association statistics from the International Genomics of Alzheimer’s Project (IGAP; n=54,162) as the discovery phase, then replicated the findings using 7050 independent samples from the Alzheimer’s Disease Genetics Consortium (ADGC). We identified 50 significant trait-AD associations in the discovery phase after correcting for multiple testing, among which 14 had p<0.05 in the replication analysis. Despite the considerably smaller sample size in the replication phase, top traits identified in the discovery stage showed strong enrichment for p<0.05 in the replication analysis (enrichment = 2.5, p=2.2e-5; hypergeometric test). In the meta-analysis, a total of 48 traits reached Bonferroni-corrected statistical significance and showed consistent effect directions in the discovery and replication analyses (Figure 3 and Supplementary file 2).
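The multiple-testing threshold and the combination of discovery and replication evidence can be sketched as follows. The meta-analysis scheme is not specified in this section, so the sample-size-weighted z-score combination below is an assumption chosen for illustration, and all function names are hypothetical.

```python
import math


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests


def weighted_z_meta(z_scores, sample_sizes):
    """Sample-size-weighted z-score meta-analysis (assumed scheme)."""
    weights = [math.sqrt(n) for n in sample_sizes]
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    return numerator / math.sqrt(sum(w * w for w in weights))


def two_sided_p(z):
    """Two-sided p-value for a standard normal z-score."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def consistent_direction(z_discovery, z_replication):
    """True when both cohorts give the same sign of association."""
    return z_discovery * z_replication > 0


# Threshold for 1738 traits, and one hypothetical trait with made-up z-scores
# in the discovery (n=54,162) and replication (n=7050) cohorts.
threshold = bonferroni_threshold(0.05, 1738)
z_meta = weighted_z_meta([-4.8, -2.1], [54162, 7050])
is_hit = two_sided_p(z_meta) < threshold and consistent_direction(-4.8, -2.1)
```

With 1738 traits, the Bonferroni threshold is 0.05/1738, roughly 2.9e-5; the z-scores in the example are invented solely to show how the pieces fit together.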

Figure 3. — Meta-analysis p-values for 1738 heritable traits in the UK biobank are shown in the figure. p-values are truncated at 1e-15 for visualization purposes. The horizontal line marks the Bonferroni-corrected significance threshold (i.e. p=0.05/1738). Positive associations point upward, and negative associations point downward.

Unsurprisingly, many identified associations were related to dementia and cognition. Family history of AD and dementia showed the most significant associations with AD (p=3.7e-77 and 5.2e-28 for illnesses of mother and father, respectively). Having any dementia diagnosis was also strongly and positively associated with AD (p=8.5e-11). In addition, we observed consistent negative associations between better performance in cognition tests and AD risk. These traits include fluid intelligence score (p=2.4e-14), time to complete round in cognition test (p=2.8e-9), correct final attempt (p=9.1e-11), and many others. Consistently, educational attainment showed strong associations with AD. Age completed full time education (p=2.5e-7) was associated with lower AD risk. Four out of seven traits based on a survey about education and qualifications were significantly associated with AD (Figure 3—figure supplement 2). Higher education, such as having a university degree (p=4.4e-12), A levels/AS levels or equivalent (p=6.9e-9), and professional qualifications (p=7.1e-6), was associated with lower AD risk. In contrast, choosing ‘none of the above’ in this survey was associated with a higher risk (p=1.6e-11). Other notable strong associations include high cholesterol (p=2.5e-15; positive), lifestyle traits such as cheese intake (p=2.5e-10; negative), occupation traits such as job involving heavy physical work (p=2.7e-10; positive), anthropometric traits including height (p=5.3e-7; negative), and traits related to pulmonary function, e.g., forced expiratory volume in 1 s (FEV1; p=1.9e-6; negative). Detailed information on all associations is summarized in Supplementary file 2.

Multivariate conditional analysis identifies independently associated risk factors

Of note, associations identified in the marginal analysis are not guaranteed to be independent. We observed clear correlational structures among the identified traits (Figure 4). For example, PRS of various intelligence and cognition-related traits are strongly correlated, and consumption of cholesterol-lowering medication is correlated with self-reported high cholesterol. To account for the correlations among traits and identify risk factors that are independently associated with AD, we performed multivariate conditional analysis using GWAS summary statistics (Methods). First, we applied hierarchical clustering to the 48 traits we identified in marginal association analysis and divided these traits into 15 representative clusters. The traits showing the most significant marginal association in each cluster were included in the multivariate analysis (Figure 4—figure supplement 1). Similar to the marginal analysis, we analyzed IGAP and ADGC data separately and combined the results using meta-analysis (Supplementary file 2). All 15 representative traits remained nominally significant (p<0.05) and showed consistent effect directions between marginal and conditional analyses (Supplementary file 1). However, several traits showed substantially reduced effect estimates and inflated p-values in multivariate analysis, including fluid intelligence score, mother still alive, unable to work because of sickness or disability, duration of moderate activity, and intake of cholesterol-lowering spread. Interestingly, major trait categories that showed the strongest marginal associations with AD (i.e. family history, high cholesterol, and education/cognition) were independent from each other. Paternal and maternal family history also showed independent associations with AD, consistent with the low correlation between their genetic risk scores (correlation = 0.052).
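The selection of representative traits described above can be sketched with a small agglomerative clustering routine. This is a simplified illustration, not the authors' procedure: the linkage rule (average linkage), the distance (one minus absolute PRS correlation), and all names are assumptions, since this section only states that hierarchical clustering was used and that the most significant trait per cluster was kept.

```python
def average_linkage_clusters(corr, n_clusters):
    """Agglomerative clustering on distance 1 - |correlation| (assumed metric)."""
    clusters = [[i] for i in range(len(corr))]

    def distance(a, b):
        # Average pairwise distance between members of two clusters.
        pairs = [(i, j) for i in a for j in b]
        return sum(1.0 - abs(corr[i][j]) for i, j in pairs) / len(pairs)

    while len(clusters) > n_clusters:
        # Find and merge the closest pair of clusters.
        _, a, b = min(
            (distance(clusters[a], clusters[b]), a, b)
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


def pick_representatives(clusters, p_values):
    """Keep the trait with the smallest marginal p-value in each cluster."""
    return [min(cluster, key=lambda i: p_values[i]) for cluster in clusters]


# Toy example with four traits forming two correlated pairs.
toy_corr = [
    [1.0, 0.9, 0.1, 0.1],
    [0.9, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.8],
    [0.1, 0.1, 0.8, 1.0],
]
toy_p = [1e-5, 1e-9, 1e-3, 1e-4]
toy_clusters = average_linkage_clusters(toy_corr, 2)
toy_reps = pick_representatives(toy_clusters, toy_p)
```

In the study, the same idea would be applied to the 48 significant traits with 15 clusters, and the representatives would then enter the multivariate conditional model.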

Figure 4. — Trait categories and association directions with Alzheimer’s disease (AD) are annotated. The dendrogram indicates the results of hierarchical clustering. We used 1000 Genomes samples of European ancestry to calculate PRS and evaluate their correlations. Label ‘irnt’ means that trait values were standardized using rank-based inverse normal transformation in the genome-wide association study (GWAS) analysis.

Influence of the APOE region on identified associations

Furthermore, we evaluated the impact of APOE on identified associations. We removed the extended APOE region (chr19: 45,147,340–45,594,595; hg19) from summary statistics of the 48 traits showing significant marginal associations with AD and repeated the analysis. We observed a substantial drop in the significance level of many traits, especially family history of AD, dementia diagnosis, and high cholesterol (Figure 5, Figure 5—figure supplement 1, and Supplementary file 2). 38 out of 48 traits remained significant under stringent Bonferroni correction after APOE removal. Interestingly, the associations between AD and almost all cognition/intelligence traits were virtually unchanged, suggesting a limited role of APOE in these associations.
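Removing the extended APOE region amounts to a coordinate filter on the summary statistics. The sketch below uses the hg19 interval quoted above; the record layout (`chrom`, `pos`, `z` keys), the SNP identifiers, and the choice to treat both boundaries as inclusive are assumptions made for illustration.

```python
APOE_CHROM = "19"
APOE_START = 45_147_340  # hg19 coordinates quoted in the text
APOE_END = 45_594_595


def in_apoe_region(chrom, pos):
    """True if a SNP falls inside the extended APOE region (inclusive bounds)."""
    return str(chrom) == APOE_CHROM and APOE_START <= pos <= APOE_END


def remove_apoe_region(summary_stats):
    """Drop SNP records inside the extended APOE region."""
    return [
        record for record in summary_stats
        if not in_apoe_region(record["chrom"], record["pos"])
    ]


# Hypothetical records; identifiers and z-scores are made up.
example_stats = [
    {"snp": "snp_a", "chrom": "19", "pos": 45_400_000, "z": 12.3},
    {"snp": "snp_b", "chrom": "19", "pos": 44_000_000, "z": 1.1},
    {"snp": "snp_c", "chrom": "2", "pos": 45_400_000, "z": -0.4},
]
filtered_stats = remove_apoe_region(example_stats)
```

The association test is then rerun on the filtered statistics, and the p-values before and after filtering can be compared trait by trait.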

Figure 5. — The horizontal and vertical axes denote association p-values before and after removal of the *APOE* region, respectively. Original p-values (i.e. the x-axis) were truncated at 1e-20 for visualization purposes.

Causal inference via Mendelian randomization

Next, we investigated the evidence for causality among identified associations. We performed Mendelian randomization (MR-IVW; Methods) in IGAP and ADGC datasets separately and meta-analyzed the results on the complete set of 1738 heritable traits from the UK biobank. A total of 48 traits reached Bonferroni-corrected statistical significance and showed consistent effect directions in the discovery and replication analyses using BADGERS. In contrast, MR-IVW only identified nine traits with Bonferroni-corrected statistical significance. Among these nine traits, seven were also identified by BADGERS (Supplementary file 1). The signs of all significant causal effects identified by MR-IVW were consistent with results from BADGERS. The most significant effect was family history (p=1.1e-233 and 1.7e-69 for maternal and paternal history, respectively). Dementia diagnosis (p=9.1e-7), high cholesterol (p=4.1e-6), A levels/AS levels education (p=1.7e-4), and time spent watching television (p=2.4e-4) were also among the top significant effects. Of note, the fluid intelligence score, one of the most significant associations identified by BADGERS, did not reach statistical significance in MR (p=0.06), which may be explained by its polygenic genetic architecture. It is also worth noting that if we scan all 1738 traits using BADGERS and then apply MR-IVW on the 48 Bonferroni-corrected significant traits, 23 could reach nominal significance (p<0.05) in MR, and seven could reach significance under Bonferroni correction (p<0.05/48; Supplementary file 1).
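For readers unfamiliar with MR-IVW, the estimator can be sketched in a few lines. This is the standard fixed-effect inverse-variance-weighted form written from general knowledge of the method, not the exact pipeline used in the study (instrument selection and other details are described in the Methods); the function name and example numbers are hypothetical.

```python
import math


def mr_ivw(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse-variance-weighted MR estimate.

    Each instrument j contributes a ratio beta_outcome[j] / beta_exposure[j],
    weighted by beta_exposure[j]**2 / se_outcome[j]**2.
    Returns (estimate, standard error, z-score).
    """
    weights = [bx * bx / (se * se) for bx, se in zip(beta_exposure, se_outcome)]
    numerator = sum(
        bx * by / (se * se)
        for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome)
    )
    total_weight = sum(weights)
    estimate = numerator / total_weight
    std_error = math.sqrt(1.0 / total_weight)
    return estimate, std_error, estimate / std_error


# Two hypothetical instruments whose ratios both equal 0.5.
ivw_estimate, ivw_se, ivw_z = mr_ivw([0.1, 0.2], [0.05, 0.1], [0.01, 0.01])
```

Because only genome-wide significant instruments enter the estimate, traits with few strong instruments contribute little information, which is one way to think about the lower power of MR relative to a PRS-based scan.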

We also compared BADGERS with a more recent method, GSMR (Zhu et al., 2018). Due to the smaller sample size in the ADGC dataset, we only applied GSMR to the IGAP summary statistics. In total, 18 traits reached statistical significance under Bonferroni correction (Supplementary file 1). However, these results showed only moderate consistency with MR-IVW and BADGERS. Among the 18 significant traits, only one trait, maternal family history of Alzheimer’s disease and dementia, overlapped with significant traits identified by MR-IVW. Six out of 18 traits overlapped with significant traits identified by BADGERS. Among the 18 significant traits, eight are related to body fat mass and two are related to educational attainment. The most significant effect was illnesses of mother (p=2.4e-294). College or University degree (p=4.84e-6), education; none of the above (p=3.6e-4), A levels/AS levels education (p=3.8e-6), and time spent watching television (p=4.0e-3) were also among the top significant effects. Notably, GSMR failed to identify paternal family history or high cholesterol as risk factors for Alzheimer’s disease. If we only consider the 48 significant traits identified by BADGERS, 11 were nominally significant (p<0.05) in GSMR. However, 23 traits did not have enough significant SNPs to perform the GSMR analysis (at least 10 SNPs are required). The signs of all significant causal effects identified by GSMR were consistent with the association effects in BADGERS.

Additionally, we included GSMR analysis results after removing the APOE region from the 48 identified traits. Only maternal family history reached Bonferroni-corrected statistical significance, further demonstrating the lack of statistical power in MR when performing biobank-wide scans (Supplementary file 1).

Associations with AD subgroups, biomarkers, and pathologies

To further investigate the mechanistic pathways for the identified risk factors, we applied BADGERS to a variety of AD subgroups, biomarkers, and neuropathologic features (Supplementary file 1). Overall, 29 significant associations were identified under a false discovery rate (FDR) cutoff of 0.05, and these endophenotypes showed distinct association patterns with AD risk factors (Figure 6; Figure 6—figure supplement 1). First, we tested the associations between the 48 AD-associated traits and five AD subgroups defined in the Executive Prominent Alzheimer’s Disease (EPAD) study, i.e., memory, language, visuospatial, none, and mix (Methods) (Mukherjee et al., 2018; Crane et al., 2017). Maternal family history of AD and dementia was strongly and consistently associated with all five EPAD subgroups (Supplementary file 2), with memory subgroup showing the strongest association (p=3.3e-16), which is consistent with the higher frequency of APOE ε4 in this subgroup (Mukherjee et al., 2018). Paternal family history was not strongly associated with any subgroups, but the effect directions were consistent. Interestingly, intelligence and cognition-related traits such as correct final attempt in cognitive test (p=2.7e-5) and fluid intelligence score (p=6.8e-5) were specifically associated with the ‘none’ subgroup – AD samples without relative impairment in any of the four cognitive domains. High cholesterol and related traits were associated with language, memory, and mix (i.e. AD samples with relative impairment in two or more domains) subgroups but showed weaker associations with the visuospatial and none subgroups.
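Significance for the endophenotype analyses is defined by an FDR cutoff rather than Bonferroni correction. The specific FDR procedure is not named in this section; the sketch below assumes the Benjamini-Hochberg step-up procedure, and the function name and example p-values are hypothetical.

```python
def benjamini_hochberg(p_values, fdr=0.05):
    """Return a list of booleans marking tests rejected at the given FDR.

    Step-up rule: find the largest rank k such that p_(k) <= k / m * fdr,
    then reject the k smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            max_rank = rank
    rejected = [False] * m
    for idx in order[:max_rank]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values for four trait-endophenotype pairs.
flags_all = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
flags_one = benjamini_hochberg([0.001, 0.2, 0.03, 0.5])
```

Note that the step-up rule can reject a p-value that exceeds its own rank threshold when a larger p-value passes; this is why the procedure tracks the largest passing rank instead of stopping at the first failure.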

Figure 6. — Asterisks denote significant associations based on a false discovery rate (FDR) cutoff of 0.05. p-values are truncated at 1e-5 for visualization purposes.

Next, we extended our analysis to three biomarkers of AD in cerebrospinal fluid (CSF): amyloid beta (Aβ₄₂), tau, and phosphorylated tau (ptau₁₈₁) (Deming et al., 2017). Somewhat surprisingly, AD risk factors did not show strong associations with Aβ₄₂ and tau (Supplementary file 2). Maternal family history of AD and dementia was associated with ptau₁₈₁ (p=4.2e-4), but associations were absent for Aβ₄₂ and tau. It has been recently suggested that CSF biomarkers have a sex-specific genetic architecture (Deming et al., 2018). However, no association passed an FDR cutoff of 0.05 in our sex-stratified analyses (Supplementary file 2).

Furthermore, we applied BADGERS to a variety of neuropathologic features of AD and related dementias (Methods), including neuritic plaques (NPs), neurofibrillary tangles (NFTs), cerebral amyloid angiopathy (CAA), Lewy body disease (LBD), hippocampal sclerosis (HS), and vascular brain injury (VBI) (Beecham et al., 2014). Family history of AD/dementia (p=3.8e-8, maternal; p=1.4e-5, paternal) and high cholesterol (p=2.1e-5) were strongly associated with NFT Braak stages (Supplementary file 2). NP showed very similar association patterns with these traits (p=2.7e-19, maternal family history; p=2.6e-7, paternal family history; p=0.001, high cholesterol). The other neuropathologic features did not show strong associations. Of note, despite not being statistically significant, family history of AD/dementia was negatively associated with VBI, and multiple intelligence traits were positively associated with LBD, showing patterns distinct from those of other pathologies (Figure 6—figure supplement 2). We also note that various versions of the same pathologies all showed consistent associations in our analyses (Figure 6—figure supplement 2). The complete association results for all the endophenotypes and all the traits are summarized in Supplementary file 2. We further examined the influence of the APOE region on these results. The association results for all the endophenotypes after removal of the APOE region are summarized in Supplementary file 2.

Associations with cognitive traits in a pre-clinical cohort

Finally, we studied the associations between AD risk factors and pre-clinical cognitive phenotypes using 1198 samples from the Wisconsin Registry for Alzheimer’s Prevention (WRAP), a longitudinal study of initially dementia-free middle-aged adults (Johnson et al., 2018). Assessed phenotypes include mild cognitive impairment (MCI) status and three cognitive composite scores for executive function, delayed recall, and learning (Methods). A total of 12 significant associations reached an FDR cutoff of 0.05 (Supplementary file 2). Somewhat surprisingly, parental history and high cholesterol, the risk factors that showed the strongest associations with various AD endophenotypes, were not associated with MCI or cognitive composite scores in WRAP. Instead, education and intelligence-related traits strongly predicted pre-clinical cognition (Figure 7). A-levels education and no education both showed highly significant associations with delayed recall (p=4.0e-5 and 7.7e-7) and learning (p=7.6e-6 and 5.0e-8). No education was also associated with higher risk of MCI (p=2.5e-4). Additionally, fluid intelligence score was positively associated with the learning composite score (p=7.5e-4), and time to complete round in cognition test was negatively associated with the executive function (p=1.1e-5).

Figure 7. — Error bars denote the standard error of effect estimates. N=1,198.

Discussion

In this work, we introduced BADGERS, a new method to perform association scans at the biobank scale using genetic risk scores and GWAS association statistics. Through simulations, we demonstrated that our method provides consistent effect estimates and similar statistical power compared to regression analysis based on individual-level data. Additionally, we applied BADGERS to two large and independent GWAS datasets for late-onset AD. In our analyses, we used GWAS summary statistics from the UK biobank, one of the largest genetic cohorts in the world, to generate PRS for complex traits. We estimated heritability for all available traits in the UK biobank and only included traits with nominally significant heritability (p<0.05) in our analyses. The GWAS summary statistics for Alzheimer’s disease were obtained from the largest available study, the International Genomics of Alzheimer’s Project (IGAP), and we further sought replication using a large, independent dataset from the Alzheimer’s Disease Genetics Consortium (ADGC). Overall, we are confident that these quality control procedures largely controlled the false findings in our study. Among 1738 heritable traits in the UK biobank, we identified 48 traits showing statistically significant associations with AD. These traits covered a variety of categories, including family history, cholesterol, intelligence, education, occupation, and lifestyle. Although many of the identified traits are genetically correlated, multivariate conditional analysis confirmed multiple strong and independent associations for AD. The strong association between family history and AD is not a surprise, and many other associations are supported by the literature as well. The protective effect of higher educational and occupational attainment on the risk and onset of dementia is well studied (Valenzuela and Sachdev, 2006; Stern, 2012). Cholesterol buildup is also known to be associated with β-amyloid plaques in the brain and higher AD risk (Reed et al., 2014; Djelti et al., 2015; Simons et al., 2001).

More interestingly, these identified traits had distinct association patterns with various AD subgroups, biomarkers, pathologies, and pre-clinical cognitive traits. Five cognitively-defined AD subgroups were consistently associated with maternal family history, but only the group without substantial relative impairment in any domain (i.e. EPAD_none) was associated with intelligence and education. In addition, family history and high cholesterol were strongly associated with classic AD neuropathologies, including NP and NFT, while intelligence and educational attainment predicted pre-clinical cognitive scores and MCI. These results suggest that various AD risk factors may affect the disease course at different time points and via distinct biological processes, and genetically predicted risk factors for clinical AD include at least two separate components. While some risk factors (e.g. high cholesterol and APOE) may directly contribute to the accumulation of pathologies, other factors (e.g. intelligence and education) may buffer the adverse effect of brain pathology on cognition (Stern, 2012). One possible scenario is that family history and high cholesterol are fundamental causes of AD, while education level and intelligence modify the effects of these causes. Under this scenario, individuals without such causal factors are protected from AD, and individuals who carry them but have high educational attainment or intelligence may also be at reduced risk. We also investigated the influence of APOE on the identified associations. Effects of family history and high cholesterol were substantially reduced after APOE removal. In contrast, associations with cognition and education were virtually unchanged.

Furthermore, we note that the association results in BADGERS need to be interpreted with caution. Although PRS-based association analysis is sometimes treated as causal inference in the literature (Paternoster et al., 2017), we do not see BADGERS as a tool to identify causal factors. Key assumptions in causal inference are, in many cases, violated when analyzing complex, highly polygenic traits, which may complicate the interpretation of results. In our analysis, BADGERS showed higher statistical power than MR-IVW and GSMR: among 1738 heritable traits, 48 reached Bonferroni significance in BADGERS, whereas 9 and 18 traits reached Bonferroni significance in MR-IVW and GSMR, respectively. Among the 48 traits identified by BADGERS, 23 reached nominal statistical significance in MR-IVW and 11 were nominally significant in GSMR. BADGERS is a statistically powerful and computationally efficient method for identifying associations between a disease of interest and genetically imputed complex traits. Because it can use PRS with a large number of SNPs to impute complex traits, BADGERS has substantially improved statistical power compared to MR methods. It can therefore serve as a hypothesis-free method to screen for candidate risk factors from biobank-scale datasets with an overwhelming number of traits. We envision BADGERS as a tool to prioritize associations among a large number of candidate risk factors, so that robust causal inference methods such as MR can then be applied to carefully assess causal effects. In addition, as a summary statistics-based method, BADGERS requires a reference panel to provide LD estimates. If the population in the reference panel does not match that of the GWAS, the analysis may be biased. Our simulation results suggest that the 1000 Genomes European samples are sufficient for our analysis when the GWAS was also conducted on European samples. The BADGERS software is flexible in the choice of LD reference panel and allows users to change the reference dataset as they see fit.

Moreover, environmental factors may play a large role in the identified associations. There is little doubt that the environment influences many complex traits, including some of the traits identified in our analysis. However, this does not mean that these traits cannot also have a genetic component (i.e. be heritable). We summarized the heritability estimates for the 48 traits identified in our BADGERS meta-analysis of two independent datasets for Alzheimer’s disease (Supplementary file 1), and all of them have nominally significant heritability estimates (p<0.05) based on our selection criteria. Nevertheless, we acknowledge that the heritability of these traits may be influenced by correlations with other traits. For example, job involving heavy manual or physical work is genetically correlated with educational attainment (Figure 3), which indicates that the association between this trait and Alzheimer’s disease may not be direct. Therefore, association results from BADGERS need to be interpreted with caution.

Limited sample size in AD endophenotypes is another limitation of our study. We have used data from the largest available GWAS for CSF biomarkers and neuropathologies. Still, the small sample size made it challenging to assess the effects of traits that were weakly associated with AD. When an independent validation dataset is available, it would be of interest to assess the prediction accuracy of PRS on each trait. However, external validation datasets rarely exist in real applications. In that case, users may choose to use heritability estimates to filter traits with a substantial genetic component. Furthermore, in the BADGERS framework, PRS are independent variables in the regression analysis. If the PRS has limited predictive power, the resulting noise is similar to measurement error in standard regression analysis. This may decrease the statistical power in association tests but does not inflate the type-I error rate. Finally, emerging evidence has highlighted the sex-specific genetic architecture of AD (Deming et al., 2018; Hohman et al., 2018). In our analysis, maternal family history of AD showed stronger associations with various phenotypes than paternal family history. However, we note that this may be explained by the sample size difference in the UK biobank (N_case = 28,507 and 15,022 for samples with maternal and paternal family history, respectively). We also performed sex-stratified analyses for CSF biomarkers but identified limited associations, possibly due to the small sample size. Overall, sex-specific effects of risk factors remain to be investigated in the future using larger datasets. In addition, BADGERS requires the training data for genetic prediction models and the downstream disease GWAS to be independent but of similar genetic ancestry. Development of methods that are more robust to sample overlap and diverse genetic ancestry remains an open problem for future research.

In conclusion, BADGERS is a statistically powerful method to identify associated risk factors for complex diseases. Large-scale biobanks continue to provide rich data on various human traits that may be of interest in disease research. Our method uses GWAS to bridge large biobanks with studies on specific diseases, lessens the limitation of insufficient disease cases in biobanks and lack of risk factor measurements in disease studies, and provides a statistically justified approach to identifying risk factors for disease. We have demonstrated the effectiveness of BADGERS through extensive simulations, a two-stage BWAS for late-onset AD, and various follow-up analyses on identified risk factors. Our results provided new insights into the genetic basis of AD, and revealed distinct mechanisms for the involvement of risk factors in AD etiologies. The ever-growing sample size in GWAS and biobanks, in conjunction with increasingly accessible summary association statistics, makes BADGERS a powerful and valuable tool in human genetics research.

Methods

BADGERS framework

The goal of this method is to study the association between $Y$ , a measured trait in the study, and $\hat{T}$ , a trait imputed from genetic data via a linear prediction model:

\hat{T} = X W

Here, $X_{N \times M}$ is the genotype matrix for $N$ individuals in a study of trait $Y$ . $W_{M \times 1}$ is the pre-calculated weight values on SNPs in the imputation model. $M$ denotes the number of SNPs. We use $Y$ , a $N \times 1$ vector, to denote the trait values measured on the same group of individuals. We test the association between $Y$ and $\hat{T}$ via a linear model.

Y = α + \hat{T} γ + δ

where $α$ is the intercept, $δ$ is the term for random noise, and regression coefficient $γ$ is the parameter of interest. The ordinary least squares (OLS) estimator for $γ$ can be denoted as,

\hat{γ} = \frac{\mathrm{cov}(\hat{T}, Y)}{\mathrm{var}(\hat{T})} = \frac{\mathrm{cov}(X W, Y)}{\mathrm{var}(\hat{T})} = \frac{1}{\mathrm{var}(\hat{T})} W^{T} \begin{pmatrix} \mathrm{cov}(X_{1}, Y) \\ \vdots \\ \mathrm{cov}(X_{M}, Y) \end{pmatrix}

Here, $X_{j}$ is the $j$-th column of $X$. Additionally, we derive the formula for the standard error of $\hat{γ}$:

\mathrm{se}(\hat{γ}) = \sqrt{\frac{\mathrm{var}(δ)}{N \times \mathrm{var}(\hat{T})}} \approx \sqrt{\frac{\mathrm{var}(Y)}{N \times \mathrm{var}(\hat{T})}}

The approximation in this formula is based on the assumption that trait $Y$ has complex etiology and imputed trait $\hat{T}$ only explains a small proportion of its phenotypic variance. When an accurate estimate of $\mathrm{var}(δ)$ is difficult to obtain, this approximation approach provides conservative results and controls type-I error in the analysis.

In practice, individual-level genotype (i.e. $X$ ) and phenotype data (i.e. $Y$ ) may not be accessible due to policy and privacy concerns. Therefore, it is of practical interest to perform the aforementioned association analysis using summary association statistics. Standard genetic association analysis tests the association between trait $Y$ and each SNP via the following linear model:

Y = μ_{j} + X_{j} β_{j} + ε_{j} (1 \leq j \leq M)

The OLS estimator for $β_{j}$ and its standard error have the following forms.

\hat{β}_{j} = \frac{\mathrm{cov}(X_{j}, Y)}{\mathrm{var}(X_{j})}

\mathrm{se}(\hat{β}_{j}) = \sqrt{\frac{\mathrm{var}(ε_{j})}{N \times \mathrm{var}(X_{j})}} \approx \sqrt{\frac{\mathrm{var}(Y)}{N \times \mathrm{var}(X_{j})}}

Again, the approximation is based on the empirical observation in complex trait genetics – each SNP explains little variability of $Y$ (Manolio et al., 2009).

Next, we derive the test statistic (i.e. z-score) for $γ$ :

\begin{array}{ll} Z & = \frac{\hat{γ}}{\mathrm{se}(\hat{γ})} \\ & \approx \sqrt{\frac{N}{\mathrm{var}(Y) \times \mathrm{var}(\hat{T})}} \, W^{T} \begin{pmatrix} \mathrm{cov}(X_{1}, Y) \\ \vdots \\ \mathrm{cov}(X_{M}, Y) \end{pmatrix} \\ & \approx \sqrt{\frac{1}{\mathrm{var}(\hat{T})}} \, W^{T} \begin{pmatrix} \frac{\sqrt{\mathrm{var}(X_{1})} \, \hat{β}_{1}}{\mathrm{se}(\hat{β}_{1})} \\ \vdots \\ \frac{\sqrt{\mathrm{var}(X_{M})} \, \hat{β}_{M}}{\mathrm{se}(\hat{β}_{M})} \end{pmatrix} \\ & = W^{T} Γ \tilde{Z} \end{array}

where $Γ$ is a diagonal matrix with the $j$-th diagonal element being

Γ_{j j} = \sqrt{\frac{\mathrm{var}(X_{j})}{\mathrm{var}(\hat{T})}}

and $\tilde{Z}$ is the vector of SNP-level z-scores obtained from the GWAS of trait $Y$, i.e.,

\tilde{Z}_{j} = \frac{\hat{β}_{j}}{\mathrm{se}(\hat{β}_{j})}
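The step from the second to the third line of the z-score derivation is left implicit. A short worked version, using only the OLS expressions for $\hat{β}_{j}$ and its approximate standard error given earlier, might read as follows (this is an expansion for the reader, not an additional result):

```latex
\begin{aligned}
\mathrm{cov}(X_{j}, Y) &= \hat{β}_{j} \, \mathrm{var}(X_{j}), \\
\sqrt{\frac{N}{\mathrm{var}(Y)}} &\approx \frac{1}{\mathrm{se}(\hat{β}_{j}) \sqrt{\mathrm{var}(X_{j})}}, \\
\sqrt{\frac{N}{\mathrm{var}(Y)}} \, \mathrm{cov}(X_{j}, Y) &\approx \frac{\sqrt{\mathrm{var}(X_{j})} \, \hat{β}_{j}}{\mathrm{se}(\hat{β}_{j})} = \sqrt{\mathrm{var}(X_{j})} \, \tilde{Z}_{j}.
\end{aligned}
```

Multiplying each term by $W_{j}$, summing over SNPs, and dividing by $\sqrt{\mathrm{var}(\hat{T})}$ then gives $Z \approx W^{T} Γ \tilde{Z}$.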

Without access to individual-level genotype data, $\mathrm{var}(X_{j})$ and $\mathrm{var}(\hat{T})$ need to be estimated using an external panel with a similar ancestry background. We use $\tilde{X}$ to denote the genotype matrix from an external cohort, then $\mathrm{var}(X_{j})$ can be approximated using the sample variance of $\tilde{X}_{j}$. Variance of $\hat{T}$ can be approximated as follows

\mathrm{var}(\hat{T}) \approx W^{T} \tilde{D} W

where $\tilde{D}$ is the variance-covariance matrix of all SNPs estimated using $\tilde{X}$. However, when the number of SNPs is large in the imputation model for trait $T$, calculation of $\tilde{D}$ is computationally intractable. Instead, we use an equivalent but computationally more efficient approach. We first impute trait $T$ in the external panel using the same imputation model

\tilde{T} = \tilde{X} W

Then, $\mathrm{var}(\hat{T})$ can be approximated by sample variance $\mathrm{var}(\tilde{T})$.

Thus, we can test the association between $Y$ and $\hat{T}$ without having access to individual-level genotype and phenotype data from the GWAS. The required input variables for BADGERS include a linear imputation model for trait $T$ , SNP-level summary statistics from a GWAS of trait $Y$ , and an external panel of genotype data. With these, the association test can be performed.
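The univariate test can be illustrated with a minimal NumPy sketch. This is not the BADGERS software itself; the function name `badgers_univariate_z` and its argument layout are assumptions made for illustration, and population (ddof = 0) variances are used throughout for simplicity.

```python
import numpy as np

def badgers_univariate_z(weights, gwas_z, ref_genotypes):
    """Sketch of Z = W^T Γ Z~ with Γ_jj = sqrt(var(X_j) / var(T_hat)).

    weights       : length-M imputation weights W (hypothetical input layout)
    gwas_z        : length-M SNP-level z-scores from the GWAS of trait Y
    ref_genotypes : (n_ref x M) genotype matrix from an external reference panel
    """
    weights = np.asarray(weights, dtype=float)
    gwas_z = np.asarray(gwas_z, dtype=float)
    ref = np.asarray(ref_genotypes, dtype=float)

    # sqrt(var(X_j)) approximated by the per-SNP standard deviation in the panel
    snp_sd = ref.std(axis=0)
    # sqrt(var(T_hat)) approximated by imputing the trait in the panel,
    # which avoids forming the M x M covariance matrix
    imputed_sd = (ref @ weights).std()

    return float(weights @ (snp_sd * gwas_z) / imputed_sd)
```

When the reference panel is the GWAS sample itself, this quantity coincides with the z-score from regressing $Y$ on $\hat{T}$ under the variance approximation described above, which is a convenient sanity check on simulated data.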

Multivariate analysis in BADGERS

To adjust for potential confounding effects, it may be of interest to include multiple imputed traits in the same BADGERS model. We still use $Y$ to denote the measured trait of interest. The goal is to perform a multiple regression analysis using $K$ imputed traits (i.e. ${\hat{T}}_{1}$ ,..., ${\hat{T}}_{K}$ ) as predictor variables:

Y = {\hat{T}}^{*} γ^{*} + δ^{*}

Here, we use ${\hat{T}}^{*} = ({\hat{T}}_{1}, \dots, {\hat{T}}_{K})$ to denote a $N \times K$ matrix for $K$ imputed traits. Regression coefficients $γ^{*} = {(γ_{1}, \dots, γ_{K})}^{T}$ are the parameters of interest. To simplify algebra, we also assume trait $Y$ and all SNPs in the genotype matrix $X$ are centered so there is no intercept term in the model, but the conclusions apply to the general setting. Similar to univariate analysis, traits ${\hat{T}}_{1}, \dots, {\hat{T}}_{K}$ are imputed from genetic data via linear prediction models:

{\hat{T}}^{*} = X W^{*}

where $W_{M \times K}^{*}$ are imputation weights assigned to SNPs. The $i$-th column of $W^{*}$ denotes the imputation model for trait $T_{i}$. Then, the OLS estimator $\hat{γ}^{*}$ and its variance-covariance matrix can be denoted as follows:

\hat{γ}^{*} = \left( (\hat{T}^{*})^{T} \hat{T}^{*} \right)^{-1} (\hat{T}^{*})^{T} Y

\mathrm{cov}(\hat{γ}^{*}) \approx \mathrm{var}(Y) \left( (\hat{T}^{*})^{T} \hat{T}^{*} \right)^{-1}

The approximation is based on the assumption that imputed traits $\hat{T}_{1}, \dots, \hat{T}_{K}$ collectively explain little variance in $Y$, which is reasonable in complex trait genetics if $K$ is not too large. We further denote:

U := N \left( (\hat{T}^{*})^{T} \hat{T}^{*} \right)^{-1} = \begin{pmatrix} \mathrm{var}(\hat{T}_{1}) & \dots & \mathrm{cov}(\hat{T}_{1}, \hat{T}_{K}) \\ \vdots & \ddots & \vdots \\ \mathrm{cov}(\hat{T}_{K}, \hat{T}_{1}) & \dots & \mathrm{var}(\hat{T}_{K}) \end{pmatrix}^{-1}

All elements in matrix $U$ can be approximated using a reference panel $\tilde{X}$ (Dudbridge, 2013):

\mathrm{cov}(\hat{T}_{i}, \hat{T}_{j}) \approx \mathrm{cov}(\tilde{T}_{i}, \tilde{T}_{j})

Therefore, the z-score for $γ_{k} (1 \leq k \leq K)$ is

\begin{array}{ll} Z_{k} & = \frac{\hat{γ}_{k}}{\mathrm{se}(\hat{γ}_{k})} \\ & = \frac{I_{k}^{T} U (W^{*})^{T} X^{T} Y}{\sqrt{N U_{k k} \mathrm{var}(Y)}} \\ & = \frac{1}{\sqrt{U_{k k}}} I_{k}^{T} U (W^{*})^{T} Θ \tilde{Z} \end{array}

where $I_{k}$ is the $K \times 1$ vector with the $k$-th element being 1 and all other elements equal to 0, $Θ$ is an $M \times M$ diagonal matrix with the $i$-th diagonal element being $\sqrt{\mathrm{var}(X_{i})}$, and similar to the notation in univariate analysis, $\tilde{Z}$ is the vector of SNP-level z-scores from the GWAS of trait $Y$. Given imputation models for $K$ traits (i.e. $W^{*}$), GWAS summary statistics for trait $Y$ (i.e. $\tilde{Z}$), and an external genetic dataset to estimate $U$ and $Θ$, multivariate association analysis can be performed without genotype and phenotype data from the GWAS.
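The multivariate z-scores can be sketched in the same way. The snippet below is an illustrative NumPy version under stated assumptions, not the released implementation: `badgers_multivariate_z` is a hypothetical name, the reference panel is assumed to share SNP order with the weights and z-scores, and $U$ is obtained by directly inverting the $K \times K$ covariance of the imputed traits in the panel.

```python
import numpy as np

def badgers_multivariate_z(weight_matrix, gwas_z, ref_genotypes):
    """Sketch of Z_k = (1 / sqrt(U_kk)) * I_k^T U (W*)^T Θ Z~ for k = 1..K.

    weight_matrix : (M x K) imputation weights W*, one column per imputed trait
    gwas_z        : length-M SNP-level z-scores from the GWAS of trait Y
    ref_genotypes : (n_ref x M) reference-panel genotype matrix
    """
    W = np.asarray(weight_matrix, dtype=float)
    z = np.asarray(gwas_z, dtype=float)
    ref = np.asarray(ref_genotypes, dtype=float)

    # Diagonal of Θ: sqrt(var(X_i)) estimated from the reference panel
    theta = ref.std(axis=0)

    # Impute all K traits in the panel and centre them
    imputed = ref @ W
    imputed = imputed - imputed.mean(axis=0)

    # U is the inverse of the K x K covariance matrix of the imputed traits
    U = np.linalg.inv(imputed.T @ imputed / ref.shape[0])

    numerators = U @ (W.T @ (theta * z))
    return numerators / np.sqrt(np.diag(U))
```

With a single imputed trait ($K = 1$), the expression reduces to the univariate statistic $W^{T} Γ \tilde{Z}$.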

Genetic prediction

Any linear prediction model can be used in the BADGERS framework. With access to individual-level genotype and phenotype data, the users can train their preferred statistical learning models, e.g., penalized regression or linear mixed model. When only GWAS summary statistics are available for risk factors (i.e. $T$ ), PRS can be used for imputation. We used PRS to impute complex traits in all analyses throughout the paper. Of note, more advanced PRS methods that explicitly model LD (Vilhjálmsson et al., 2015) and functional annotations (Hu et al., 2017) to improve prediction accuracy have been developed. However, additional independent datasets may be needed if there are tuning parameters in PRS. In general, higher imputation accuracy will improve statistical power in association testing (Hu et al., 2018). The BADGERS software allows users to choose their preferred imputation model.

Simulation settings

We simulated quantitative traits using genotype data of 62,313 individuals from the GERA cohort (dbGap accession: phs000674). Summary association statistics were generated using PLINK (Purcell et al., 2007). We ran BADGERS on summary statistics based on the simulated traits and PRS of 1738 traits in the UK biobank. To compare BADGERS with the traditional approach that uses individual-level data as input, we also directly regressed simulated traits on the PRS of UK biobank traits to estimate association effects.

Setting 1

We simulated quantitative trait values as i.i.d. samples from normal distribution with mean 0 and variance 1. In this setting, simulated trait values were independent from genotype data.

Setting 2

We simulated quantitative trait values based on an additive random effect model commonly used in heritability estimation (Yang et al., 2015). We fixed heritability to be 0.1. In this setting, the simulated trait is associated with SNPs, but is not directly related to PRS of UK biobank traits.

Setting 3

We selected 100 traits from 1738 UK-Biobank traits to calculate PRS on GERA data. For each of these 100 PRS, we simulated a quantitative trait by summing up the effect of PRS, a polygenic genetic background, and a noise term.

Y = X β + ρ P + ε

Here, $X$ denotes the genotype of samples; $β$ is the effect size of each variant; $P$ is the PRS of one of the selected traits; $ρ$ is the effect size of PRS; and $ε$ is the error term following a standard normal distribution. The polygenic background and random noise (i.e. $X β + ε$ ) were simulated using the same model described in setting 2. This term and the PRS were normalized separately. The standardized effect size (i.e. $ρ$ ) was set as 0.02, 0.015, 0.01, 0.008, and 0.005 in our simulations. In this setting, simulated traits are directly associated with SNPs and PRS. For each value of $ρ$ , statistical power was calculated as the proportion of significant results (p<0.05) out of 100 traits.
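A rough sketch of how one trait in setting 3 could be generated is given below. Several details are assumptions made for illustration and are not taken from the paper: the function name, the choice of i.i.d. normal SNP effects with total variance equal to the heritability, a noise variance of one minus the heritability, and standardization with population standard deviations.

```python
import numpy as np

def simulate_setting3_trait(genotypes, prs, rho, h2=0.1, seed=0):
    """Sketch of Y = Xβ + ρP + ε with separately normalized background and PRS.

    genotypes : (N x M) genotype matrix X (assumed to have no monomorphic SNPs)
    prs       : length-N PRS of one selected trait
    rho       : standardized effect size of the PRS
    h2        : heritability of the polygenic background (0.1 in setting 2)
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(genotypes, dtype=float)
    n, m = X.shape

    # Additive random-effect background: standardized genotypes, random effects
    X_std = (X - X.mean(axis=0)) / X.std(axis=0)
    beta = rng.normal(0.0, np.sqrt(h2 / m), size=m)       # assumed effect distribution
    noise = rng.normal(0.0, np.sqrt(1.0 - h2), size=n)    # assumed noise variance
    background = X_std @ beta + noise

    # Normalize the background term and the PRS separately
    background = (background - background.mean()) / background.std()
    P = np.asarray(prs, dtype=float)
    P = (P - P.mean()) / P.std()

    return background + rho * P, background, P
```

Repeating this for each of the 100 PRS and each value of $ρ$, then counting how often the association test gives p<0.05, would yield the power estimate described above.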

Setting 4

We simulated 100 quantitative traits $T_{1}, \dots, T_{100}$ based on an additive random effect model commonly used in heritability estimation, with heritability fixed at 0.1. The response traits $Y_{1}, \dots, Y_{100}$ were then simulated by adding a noise term to the scaled $T_{i}$:

Y_{i} = γ_{i} T_{i} + ε_{i}

where $γ_{i} \sim N (0, 2)$ and $ε_{i} \sim N (0, \mathrm{var}(T_{i}))$. The dataset was split into two subsets, one with 31,162 (subset 1) and another with 31,163 samples (subset 2). Marginal summary statistics corresponding to the $T_{i}$’s and $Y_{i}$’s were derived using subset 1 and subset 2, respectively. We applied LDpred to jointly estimate all SNPs’ effects using marginal summary statistics from subset 1. Then, we ran BADGERS to identify associations between 100 pairs of $Y_{i}$ and $T_{i}$ using two methods to impute the $T_{i}$’s (i.e. marginal PRS and LDpred).

GWAS datasets

Summary statistics for 4357 UK biobank traits were generated by Dr. Benjamin Neale’s group and were downloaded from (http://www.nealelab.is/uk-biobank). AD summary statistics from the IGAP stage-I analysis were downloaded from the IGAP website (http://web.pasteur-lille.fr/en/recherche/u744/igap/igap_download.php). ADGC phase 2 summary statistics were generated by first analyzing individual datasets using logistic regression adjusting for age, sex, and the first three principal components in the program SNPTest v2 (Marchini et al., 2007). Meta-analysis of the individual dataset results was then performed using the inverse-variance weighted approach (Willer et al., 2010).

GWAS summary statistics for neuropathologic features of AD and related dementias were obtained from the ADGC. Details on these data have been previously reported (Beecham et al., 2014). We analyzed a total of 13 neuropathologic features, including four NP traits, two traits for NFT Braak stages, three traits for LBD, CAA, HS, and two VBI traits. Among different versions of the same pathology, we picked one dataset for each pathologic feature to show in our primary analyses. Six AD subgroups were defined in the recent EPAD paper (Mukherjee et al., 2018) on the basis of relative performance in memory, executive functioning, visuospatial functioning, and language at the time of Alzheimer’s diagnosis. Four subgroups include AD samples with an isolated substantial relative impairment in one of four domains; the ‘none’ subgroup includes samples without substantial relative impairment; the ‘mix’ subgroup includes samples with relative impairment in multiple domains. Each domain was compared with healthy controls in case-control association analyses. We did not include the executive functioning subgroup in our analysis due to its small sample size in cases. Detailed information about the design of CSF biomarker GWAS and the recent sex-stratified analysis has been described previously (Deming et al., 2017; Deming et al., 2018). Details on the association statistics for AD subgroups, CSF biomarkers, and neuropathological features are summarized in Supplementary file 2.

Analysis of GWAS summary statistics

We applied LD score regression implemented in the LDSC software (Bulik-Sullivan et al., 2015) to estimate the heritability of each trait. Among 4357 traits, we selected 1738 with nominally significant heritability (p<0.05) to include in our analyses. We removed SNPs with association p-values greater than 0.01 from each of the 1738 summary statistics files, clumped the remaining SNPs using an LD cutoff of 0.1 and a radius of 1 Mb in PLINK (Purcell et al., 2007), and built PRS for each trait using the effect size estimates of the remaining SNPs.

Throughout the paper, we used samples of European ancestry in the 1000 Genomes Project as a reference panel to estimate LD (Abecasis et al., 2012). In univariate analyses, we tested marginal associations between each PRS and AD using the IGAP stage-I dataset and replicated the findings using the ADGC summary statistics. Association results in two stages were combined using an inverse variance-weighted meta-analysis (Willer et al., 2010). A stringent Bonferroni-corrected significance threshold was used to identify AD-associated risk factors. For associations between identified risk factors and AD endophenotypes, we used an FDR cutoff of 0.05 to claim statistical significance. We applied hierarchical clustering to the covariance of the 48 traits identified from marginal association analysis, divided the result into 15 clusters, and selected the most significant trait from each cluster for multivariate conditional analysis. We analyzed IGAP and ADGC datasets separately, and combined the results using meta-analysis.
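For readers less familiar with the two-stage combination, a generic fixed-effect inverse variance-weighted meta-analysis can be sketched as follows. This is a textbook formulation rather than the exact pipeline used here; the function name is hypothetical, and the Bonferroni threshold of 0.05 divided by 1738 tested traits is an assumption about how the correction was applied.

```python
import math

def ivw_meta(betas, ses):
    """Fixed-effect inverse variance-weighted meta-analysis of per-study estimates.

    Returns the pooled effect, its standard error, the z-score,
    and a two-sided p-value under a normal approximation.
    """
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return beta, se, z, p

# Assumed Bonferroni threshold for 1738 tested traits
BONFERRONI_THRESHOLD = 0.05 / 1738
```

A trait would then be declared significant when its meta-analysis p-value falls below `BONFERRONI_THRESHOLD`.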

We used the MR-IVW approach (Burgess et al., 2013) implemented in the Mendelian Randomization R package (Yavorska and Burgess, 2017) to study the causal effects of 48 risk factors identified by BADGERS. For each trait, we selected instrumental SNP variables as the top 30 most significant SNPs after clumping all SNPs using an LD cutoff of 0.1.
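As a point of comparison with BADGERS, the fixed-effect form of the IVW estimator from summarized data can be written in a few lines. The sketch below is a simplified stand-in written in Python, not the R package used in the analysis, which offers further options (for example random-effects variants); the function name is hypothetical.

```python
import math

def mr_ivw(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW estimate of the causal effect from summarized data.

    beta_exposure : SNP effects on the risk factor (instrument strengths)
    beta_outcome  : SNP effects on the outcome
    se_outcome    : standard errors of the SNP-outcome effects
    """
    numerator = sum(bx * by / sy ** 2
                    for bx, by, sy in zip(beta_exposure, beta_outcome, se_outcome))
    denominator = sum(bx ** 2 / sy ** 2
                      for bx, sy in zip(beta_exposure, se_outcome))
    estimate = numerator / denominator
    se = math.sqrt(1.0 / denominator)
    return estimate, se
```

Because only the selected instrumental SNPs (here, 30 per trait) enter this estimator, it uses far fewer variants than a genome-wide PRS.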

Analysis of WRAP data

WRAP is a longitudinal study of initially dementia-free middle-aged adults that allows for the enrollment of siblings and is enriched for a parental history of AD. Details of the study design and methods used have been previously described (Johnson et al., 2018; Sager et al., 2005). After quality control, a total of 1198 participants whose genetic ancestry was primarily of European descent were included in our analysis. On average, participants were 53.7 years of age (SD = 6.6) at baseline and had a bachelor’s degree, and 69.8% (n=836) were female. Participants had two to six longitudinal study visits, with an average of 4.3 visits, leading to a total of 5184 observations available for analysis.

DNA samples were genotyped using the Illumina Multi-Ethnic Genotyping Array at the University of Wisconsin Biotechnology Center. Thirty-six blinded duplicate samples were used to calculate a concordance rate of 99.99%, and discordant genotypes were set to missing. Imputation was performed with the Michigan Imputation Server v1.0.3 (Das et al., 2016), using the Haplotype Reference Consortium (HRC) v. r1.1 2016 (McCarthy et al., 2016) as the reference panel and Eagle2 v2.3 (Loh et al., 2016) for phasing. Variants with a quality score R² <0.80, MAF <0.001, or that were out of HWE were excluded, leading to 10,499,994 imputed and genotyped variants for analyses. Data cleaning and file preparation were completed using PLINK v1.9 (Chang et al., 2015) and VCFtools v0.1.14 (Danecek et al., 2011). Coordinates are based on the hg19 genome build. Due to the sibling relationships present in the WRAP cohort, genetic ancestry was assessed and confirmed using Principal Components Analysis in Related Samples (PC-AiR), a method that makes robust inferences about population structure in the presence of relatedness (Conomos et al., 2015).

Composite scores were calculated for executive function, delayed recall, and learning based on a previous analysis (Clark et al., 2016). Each composite score was calculated from three neuropsychological tests, which were each converted to z-scores using baseline means and standard deviations. These z-scores were then averaged to derive executive function and delayed recall composite scores at each visit for each individual. Cognitive impairment status was determined based on a consensus review by a panel of dementia experts. Resulting cognitive statuses included cognitively normal, early MCI, clinical MCI, impairment that was not MCI, or dementia, as previously defined (Koscik et al., 2016). Participants were considered cognitively impaired if their worst consensus conference diagnosis was early MCI, clinical MCI, or dementia (n=387). Participants were considered cognitively stable if their consensus conference diagnosis was cognitively normal across all visits (n=803).

The 48 PRSs were developed within the WRAP cohort using PLINK v1.9 (Chang et al., 2015) and tested for associations with the three composite scores (i.e. executive function, delayed recall, and learning) and cognitive impairment statuses. MCI status was tested using logistic regression models in R, while all other associations, which utilized multiple study visits, were tested using linear mixed regression models implemented in the lme4 package in R (Bates et al., 2015). All models included fixed effects for age and sex, and cognitive composite scores additionally included a fixed effect for practice effect (using visit number). Mixed models included random intercepts for within-subject correlations due to repeated measures and within-family correlations due to the enrollment of siblings.

Software availability

The BADGERS software is freely available at https://github.com/qlu-lab/BADGERS, copy archived at qlu-lab, 2024.

Acknowledgements

This project was supported by the Clinical and Translational Science Award (CTSA) program, through the NIH National Center for Advancing Translational Sciences (NCATS), grant UL1TR000427. Support for this research was also provided by the University of Wisconsin-Madison Office of the Chancellor and the Vice Chancellor for Research and Graduate Education with funding from the Wisconsin Alumni Research Foundation. BFD was supported by an NLM training grant to the Computation and Informatics in Biology and Medicine Training Program [NLM 5T15LM007359]. This research was also supported by the NIH [grants R01AG054047, R01AG27161, UL1TR000427, and P2C HD047873], Helen Bader Foundation, Northwestern Mutual Foundation, Extendicare Foundation, and the State of Wisconsin. The authors thank the University of Wisconsin Madison Biotechnology Center Gene Expression Center for providing Illumina Infinium genotyping services. We thank the International Genomics of Alzheimer’s Project (IGAP) for providing summary results data for these analyses. The investigators within IGAP contributed to the design and implementation of IGAP and/or provided data but did not participate in analysis or writing of this report. IGAP was made possible by the generous participation of the subjects and their families. The i-Select chips were funded by the French National Foundation on Alzheimer’s disease and related disorders. EADI was supported by the LABEX (laboratory of excellence program investment for the future) DISTALZ grant, Inserm, Institut Pasteur de Lille, Université de Lille 2, and the Lille University Hospital. GERAD was supported by the Medical Research Council (Grant n° 503480), Alzheimer’s Research UK (Grant n° 503176), the Wellcome Trust (Grant n° 082604/2/07/Z), and German Federal Ministry of Education and Research (BMBF): Competence Network Dementia (CND) grant n° 01GI0102, 01GI0711, 01GI0420. 
CHARGE was partly supported by the NIH/NIA grant R01 AG033193 and the NIA AG081220 and AGES contract N01–AG–12100, the NHLBI grant R01 HL105756, the Icelandic Heart Association, and the Erasmus Medical Center and Erasmus University. ADGC was supported by the NIH/NIA grants: U01 AG032984, U24 AG021886, U01 AG016976, and the Alzheimer’s Association grant ADGC–10–196,728. We thank contributors who collected samples used in this study, as well as patients and their families, whose help and participation made this work possible; Data for this study were prepared, archived, and distributed by the National Institute on Aging Alzheimer’s Disease Data Storage Site (NIAGADS) at the University of Pennsylvania (U24-AG041689-01). We are also grateful for ADGC and its investigators for providing GWAS summary statistics for various AD phenotypes. The full acknowledgement to ADGC is included in the Supplementary file 1.

Funding Statement

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Contributor Information

Qiongshi Lu, Email: qlu@biostat.wisc.edu.

Nicholas Mancuso, University of Southern California, United States.

Timothy E Behrens, University of Oxford, United Kingdom.

Funding Information

This paper was supported by the following grants:

National Center for Advancing Translational Sciences Clinical and Translational Science Award (CTSA) program, UL1TR000427 to Bowen Hu, Yunling Wang, Qiongshi Lu, Donghui Yan.
U.S. National Library of Medicine Computation and Informatics in Biology and Medicine Training Program, NLM 5T15LM007359 to Burcu F Darst.
National Institutes of Health R01AG054047 to Corinne D Engelman, Qiongshi Lu.
National Institutes of Health R01AG27161 to Sterling C Johnson.
National Institutes of Health UL1TR000427 to Burcu F Darst.
National Institutes of Health P2C HD047873 to Corinne D Engelman.
NIH/NIA U01 AG032984.
NIH/NIA U01 AG016976.
Alzheimer’s Association ADGC–10–196,728.

Additional information

Competing interests

No competing interests declared.

Author contributions

Data curation, Software, Formal analysis, Validation, Visualization, Methodology, Writing – original draft, Writing – review and editing.

Data curation, Formal analysis, Validation, Visualization, Methodology, Writing – original draft, Writing – review and editing.

Data curation, Writing – review and editing.

Data curation, Formal analysis, Writing – review and editing.

Data curation, Writing – review and editing.

Data curation.

Conceptualization, Resources, Data curation, Formal analysis, Supervision, Funding acquisition, Validation, Investigation, Visualization, Methodology, Writing – original draft, Project administration, Writing – review and editing.

Additional files

Supplementary file 1. Simulation result; Result from Mendelian randomization and GSMR; Acknowledgements to Alzheimer’s Disease Genetics Consortium (ADGC).

elife-91360-supp1.docx (64.5KB, docx)

Supplementary file 2. Full association result between UK-biobank traits and Alzheimer’s disease/endophenotypes.

elife-91360-supp2.xlsx (2.5MB, xlsx)

MDAR checklist

elife-91360-mdarchecklist1.docx (99.6KB, docx)

Data availability

The current manuscript is a computational study, so no data have been generated for this manuscript. The modeling code is available at https://github.com/qlu-lab/BADGERS, copy archived at qlu-lab, 2024.

The following previously published datasets were used:

Lambert JC. 2013. NG00036 - IGAP Summary Statistics- Lambert et al. (2013) NIAGADS database. ng00036

Naj AC. 2011. Alzheimer's Disease Genetics Consortium (ADGC) Collection. NIAGADS database. NG00027

References

Abecasis GR, Auton A, Brooks LD, DePristo MA, Durbin RM, Handsaker RE, Kang HM, Marth GT, McVean GA. An integrated map of genetic variation from 1,092 human genomes. Nature. 2012;491:56–65. doi: 10.1038/nature11632.
Bates D, Mächler M, Bolker B, Walker S. Fitting linear mixed-effects models using lme4. Journal of Statistical Software. 2015;67:1–48. doi: 10.18637/jss.v067.i01.
Beecham GW, Hamilton K, Naj AC, Martin ER, Huentelman M, Myers AJ, Corneveaux JJ, Hardy J, Vonsattel JP, Younkin SG, Bennett DA, De Jager PL, Larson EB, Crane PK, Kamboh MI, Kofler JK, Mash DC, Duque L, Gilbert JR, Gwirtsman H, Buxbaum JD, Kramer P, Dickson DW, Farrer LA, Frosch MP, Ghetti B, Haines JL, Hyman BT, Kukull WA, Mayeux RP, Pericak-Vance MA, Schneider JA, Trojanowski JQ, Reiman EM, Schellenberg GD, Montine TJ. Genome-wide association meta-analysis of neuropathologic features of Alzheimer’s disease and related dementias. PLOS Genetics. 2014;10:e1004606. doi: 10.1371/journal.pgen.1004606.
Bulik-Sullivan BK, Loh P-R, Finucane HK, Ripke S, Yang J, Schizophrenia Working Group of the Psychiatric Genomics Consortium, Patterson N, Daly MJ, Price AL, Neale BM. LD Score regression distinguishes confounding from polygenicity in genome-wide association studies. Nature Genetics. 2015;47:291–295. doi: 10.1038/ng.3211.
Burgess S, Butterworth A, Thompson SG. Mendelian randomization analysis with multiple genetic variants using summarized data. Genetic Epidemiology. 2013;37:658–665. doi: 10.1002/gepi.21758.
Bycroft C, Freeman C, Petkova D, Band G, Elliott LT, Sharp K. Genome-wide genetic data on ~500,000 UK Biobank participants. bioRxiv. 2017. doi: 10.1101/166298.
Chang CC, Chow CC, Tellier LC, Vattikuti S, Purcell SM, Lee JJ. Second-generation PLINK: rising to the challenge of larger and richer datasets. GigaScience. 2015;4:7. doi: 10.1186/s13742-015-0047-8.
Clark LR, Racine AM, Koscik RL, Okonkwo OC, Engelman CD, Carlsson CM, Asthana S, Bendlin BB, Chappell R, Nicholas CR, Rowley HA, Oh JM, Hermann BP, Sager MA, Christian BT, Johnson SC. Beta-amyloid and cognitive decline in late middle age: Findings from the Wisconsin Registry for Alzheimer’s Prevention study. Alzheimer’s & Dementia. 2016;12:805–814. doi: 10.1016/j.jalz.2015.12.009.
Conomos MP, Miller MB, Thornton TA. Robust inference of population structure for ancestry prediction and correction of stratification in the presence of relatedness. Genetic Epidemiology. 2015;39:276–293. doi: 10.1002/gepi.21896.
Crane PK, Trittschuh E, Mukherjee S, Saykin AJ, Sanders RE, Larson EB, McCurry SM, McCormick W, Bowen JD, Grabowski T, Moore M, Bauman J, Gross AL, Keene CD, Bird TD, Gibbons LE, Mez J. Incidence of cognitively defined late-onset Alzheimer’s dementia subgroups from a prospective cohort study. Alzheimer’s & Dementia. 2017;13:1307–1316. doi: 10.1016/j.jalz.2017.04.011.
Danecek P, Auton A, Abecasis G, Albers CA, Banks E, DePristo MA, Handsaker RE, Lunter G, Marth GT, Sherry ST, McVean G, Durbin R, 1000 Genomes Project Analysis Group. The variant call format and VCFtools. Bioinformatics. 2011;27:2156–2158. doi: 10.1093/bioinformatics/btr330.
Das S, Forer L, Schönherr S, Sidore C, Locke AE, Kwong A, Vrieze SI, Chew EY, Levy S, McGue M, Schlessinger D, Stambolian D, Loh P-R, Iacono WG, Swaroop A, Scott LJ, Cucca F, Kronenberg F, Boehnke M, Abecasis GR, Fuchsberger C. Next-generation genotype imputation service and methods. Nature Genetics. 2016;48:1284–1287. doi: 10.1038/ng.3656.
Davey Smith G, Hemani G. Mendelian randomization: genetic anchors for causal inference in epidemiological studies. Human Molecular Genetics. 2014;23:R89–R98. doi: 10.1093/hmg/ddu328.
Deming Y, Li Z, Kapoor M, Harari O, Del-Aguila JL, Black K, Carrell D, Cai Y, Fernandez MV, Budde J, Ma S, Saef B, Howells B, Huang K-L, Bertelsen S, Fagan AM, Holtzman DM, Morris JC, Kim S, Saykin AJ, De Jager PL, Albert M, Moghekar A, O’Brien R, Riemenschneider M, Petersen RC, Blennow K, Zetterberg H, Minthon L, Van Deerlin VM, Lee VM-Y, Shaw LM, Trojanowski JQ, Schellenberg G, Haines JL, Mayeux R, Pericak-Vance MA, Farrer LA, Peskind ER, Li G, Di Narzo AF, Alzheimer’s Disease Neuroimaging Initiative (ADNI), Alzheimer Disease Genetic Consortium (ADGC), Kauwe JSK, Goate AM, Cruchaga C. Genome-wide association study identifies four novel loci associated with Alzheimer’s endophenotypes and disease modifiers. Acta Neuropathologica. 2017;133:839–856. doi: 10.1007/s00401-017-1685-y.
Deming Y, Dumitrescu L, Barnes LL, Thambisetty M, Kunkle B, Gifford KA, Bush WS, Chibnik LB, Mukherjee S, De Jager PL, Kukull W, Huentelman M, Crane PK, Resnick SM, Keene CD, Montine TJ, Schellenberg GD, Haines JL, Zetterberg H, Blennow K, Larson EB, Johnson SC, Albert M, Moghekar A, Del Aguila JL, Fernandez MV, Budde J, Hassenstab J, Fagan AM, Riemenschneider M, Petersen RC, Minthon L, Chao MJ, Van Deerlin VM, Lee VM-Y, Shaw LM, Trojanowski JQ, Peskind ER, Li G, Davis LK, Sealock JM, Cox NJ, Alzheimer’s Disease Neuroimaging Initiative (ADNI), Alzheimer Disease Genetics Consortium (ADGC), Goate AM, Bennett DA, Schneider JA, Jefferson AL, Cruchaga C, Hohman TJ. Sex-specific genetic predictors of Alzheimer’s disease biomarkers. Acta Neuropathologica. 2018;136:857–872. doi: 10.1007/s00401-018-1881-4.
Djelti F, Braudeau J, Hudry E, Dhenain M, Varin J, Bièche I, Marquer C, Chali F, Ayciriex S, Auzeil N, Alves S, Langui D, Potier M-C, Laprevote O, Vidaud M, Duyckaerts C, Miles R, Aubourg P, Cartier N. CYP46A1 inhibition, brain cholesterol accumulation and neurodegeneration pave the way for Alzheimer’s disease. Brain. 2015;138:2383–2398. doi: 10.1093/brain/awv166.
Dudbridge F. Power and predictive accuracy of polygenic risk scores. PLOS Genetics. 2013;9:e1003348. doi: 10.1371/journal.pgen.1003348.
Gamazon ER, Wheeler HE, Shah KP, Mozaffari SV, Aquino-Michaels K, Carroll RJ, Eyler AE, Denny JC, GTEx Consortium, Nicolae DL, Cox NJ, Im HK. A gene-based association method for mapping traits using reference transcriptome data. Nature Genetics. 2015;47:1091–1098. doi: 10.1038/ng.3367.
Gusev A, Ko A, Shi H, Bhatia G, Chung W, Penninx BWJH, Jansen R, de Geus EJC, Boomsma DI, Wright FA, Sullivan PF, Nikkola E, Alvarez M, Civelek M, Lusis AJ, Lehtimäki T, Raitoharju E, Kähönen M, Seppälä I, Raitakari OT, Kuusisto J, Laakso M, Price AL, Pajukanta P, Pasaniuc B. Integrative approaches for large-scale transcriptome-wide association studies. Nature Genetics. 2016;48:245–252. doi: 10.1038/ng.3506.
Harold D, Abraham R, Hollingworth P, Sims R, Gerrish A, Hamshere ML, Pahwa JS, Moskvina V, Dowzell K, Williams A, Jones N, Thomas C, Stretton A, Morgan AR, Lovestone S, Powell J, Proitsi P, Lupton MK, Brayne C, Rubinsztein DC, Gill M, Lawlor B, Lynch A, Morgan K, Brown KS, Passmore PA, Craig D, McGuinness B, Todd S, Holmes C, Mann D, Smith AD, Love S, Kehoe PG, Hardy J, Mead S, Fox N, Rossor M, Collinge J, Maier W, Jessen F, Schürmann B, Heun R, van den Bussche H, Heuser I, Kornhuber J, Wiltfang J, Dichgans M, Frölich L, Hampel H, Hüll M, Rujescu D, Goate AM, Kauwe JSK, Cruchaga C, Nowotny P, Morris JC, Mayo K, Sleegers K, Bettens K, Engelborghs S, De Deyn PP, Van Broeckhoven C, Livingston G, Bass NJ, Gurling H, McQuillin A, Gwilliam R, Deloukas P, Al-Chalabi A, Shaw CE, Tsolaki M, Singleton AB, Guerreiro R, Mühleisen TW, Nöthen MM, Moebus S, Jöckel K-H, Klopp N, Wichmann H-E, Carrasquillo MM, Pankratz VS, Younkin SG, Holmans PA, O’Donovan M, Owen MJ, Williams J. Genome-wide association study identifies variants at CLU and PICALM associated with Alzheimer’s disease. Nature Genetics. 2009;41:1088–1093. doi: 10.1038/ng.440.
Hohman TJ, Dumitrescu L, Barnes LL, Thambisetty M, Beecham G, Kunkle B, Gifford KA, Bush WS, Chibnik LB, Mukherjee S, De Jager PL, Kukull W, Crane PK, Resnick SM, Keene CD, Montine TJ, Schellenberg GD, Haines JL, Zetterberg H, Blennow K, Larson EB, Johnson SC, Albert M, Bennett DA, Schneider JA, Jefferson AL, Alzheimer’s Disease Genetics Consortium and the Alzheimer’s Disease Neuroimaging Initiative. Sex-specific association of apolipoprotein E with cerebrospinal fluid levels of tau. JAMA Neurology. 2018;75:989–998. doi: 10.1001/jamaneurol.2018.0821.
Hollingworth P, Harold D, Sims R, Gerrish A, Lambert JC, Carrasquillo MM, Abraham R, Hamshere ML, Pahwa JS, Moskvina V, Dowzell K, Jones N, Stretton A, Thomas C, Richards A, Ivanov D, Widdowson C, Chapman J, Lovestone S, Powell J, Proitsi P, Lupton MK, Brayne C, Rubinsztein DC, Gill M, Lawlor B, Lynch A, Brown KS, Passmore PA, Craig D, McGuinness B, Todd S, Holmes C, Mann D, Smith AD, Beaumont H, Warden D, Wilcock G, Love S, Kehoe PG, Hooper NM, Vardy E, Hardy J, Mead S, Fox NC, Rossor M, Collinge J, Maier W, Jessen F, Rüther E, Schürmann B, Heun R, Kölsch H, van den Bussche H, Heuser I, Kornhuber J, Wiltfang J, Dichgans M, Frölich L, Hampel H, Gallacher J, Hüll M, Rujescu D, Giegling I, Goate AM, Kauwe JSK, Cruchaga C, Nowotny P, Morris JC, Mayo K, Sleegers K, Bettens K, Engelborghs S, De Deyn PP, Van Broeckhoven C, Livingston G, Bass NJ, Gurling H, McQuillin A, Gwilliam R, Deloukas P, Al-Chalabi A, Shaw CE, Tsolaki M, Singleton AB, Guerreiro R, Mühleisen TW, Nöthen MM, Moebus S, Jöckel KH, Klopp N, Wichmann HE, Pankratz VS, Sando SB, Aasly JO, Barcikowska M, Wszolek ZK, Dickson DW, Graff-Radford NR, Petersen RC, van Duijn CM, Breteler MMB, Ikram MA, DeStefano AL, Fitzpatrick AL, Lopez O, Launer LJ, Seshadri S, CHARGE consortium, Berr C, Campion D, Epelbaum J, Dartigues JF, Tzourio C, Alpérovitch A, Lathrop M, EADI1 consortium, Feulner TM, Friedrich P, Riehle C, Krawczak M, Schreiber S, Mayhaus M, Nicolhaus S, Wagenpfeil S, Steinberg S, Stefansson H, Stefansson K, Snaedal J, Björnsson S, Jonsson PV, Chouraki V, Genier-Boley B, Hiltunen M, Soininen H, Combarros O, Zelenika D, Delepine M, Bullido MJ, Pasquier F, Mateo I, Frank-Garcia A, Porcellini E, Hanon O, Coto E, Alvarez V, Bosco P, Siciliano G, Mancuso M, Panza F, Solfrizzi V, Nacmias B, Sorbi S, Bossù P, Piccardi P, Arosio B, Annoni G, Seripa D, Pilotto A, Scarpini E, Galimberti D, Brice A, Hannequin D, Licastro F, Jones L, Holmans PA, Jonsson T, Riemenschneider M, Morgan K, Younkin SG, Owen MJ, O’Donovan M, Amouyel P, Williams J. Common variants at ABCA7, MS4A6A/MS4A4E, EPHA1, CD33 and CD2AP are associated with Alzheimer’s disease. Nature Genetics. 2011;43:429–435. doi: 10.1038/ng.803.
Hu Y, Lu Q, Powles R, Yao X, Yang C, Fang F, Xu X, Zhao H. Leveraging functional annotations in genetic risk prediction for human complex diseases. PLOS Computational Biology. 2017;13:e1005589. doi: 10.1371/journal.pcbi.1005589.
Hu Y, Li M, Lu Q, Weng H, Wang J, Zekavat SM, Yu Z, Li B, Muchnik S, Shi Y, Kunkle BW, Mukherjee S, Natarajan P, Naj A, Kuzma A, Zhao Y, Crane PK, Zhao H, Alzheimer’s Disease Genetics Consortium. A Statistical Framework for Cross-Tissue Transcriptome-Wide Association Analysis. bioRxiv. 2018. doi: 10.1101/286013.
Jack CR, Jr, Knopman DS, Jagust WJ, Petersen RC, Weiner MW, Aisen PS, Shaw LM, Vemuri P, Wiste HJ, Weigand SD, Lesnick TG, Pankratz VS, Donohue MC, Trojanowski JQ. Tracking pathophysiological processes in Alzheimer’s disease: an updated hypothetical model of dynamic biomarkers. The Lancet Neurology. 2013;12:207–216. doi: 10.1016/S1474-4422(12)70291-0.
Johnson SC, Koscik RL, Jonaitis EM, Clark LR, Mueller KD, Berman SE, Bendlin BB, Engelman CD, Okonkwo OC, Hogan KJ, Asthana S, Carlsson CM, Hermann BP, Sager MA. The Wisconsin Registry for Alzheimer’s Prevention: A review of findings and current directions. Alzheimer’s & Dementia. 2018;10:130–142. doi: 10.1016/j.dadm.2017.11.007.
Jun GR, Chung J, Mez J, Barber R, Beecham GW, Bennett DA, Buxbaum JD, Byrd GS, Carrasquillo MM, Crane PK, Cruchaga C, De Jager P, Ertekin-Taner N, Evans D, Fallin MD, Foroud TM, Friedland RP, Goate AM, Graff-Radford NR, Hendrie H, Hall KS, Hamilton-Nelson KL, Inzelberg R, Kamboh MI, Kauwe JSK, Kukull WA, Kunkle BW, Kuwano R, Larson EB, Logue MW, Manly JJ, Martin ER, Montine TJ, Mukherjee S, Naj A, Reiman EM, Reitz C, Sherva R, St George-Hyslop PH, Thornton T, Younkin SG, Vardarajan BN, Wang LS, Wendlund JR, Winslow AR, Haines J, Mayeux R, Pericak-Vance MA, Schellenberg G, Lunetta KL, Farrer LA. Transethnic genome-wide scan identifies novel Alzheimer’s disease loci. Alzheimer’s & Dementia. 2017;13:727–738. doi: 10.1016/j.jalz.2016.12.012.
Koscik RL, Berman SE, Clark LR, Mueller KD, Okonkwo OC, Gleason CE, Hermann BP, Sager MA, Johnson SC. Intraindividual cognitive variability in middle age predicts cognitive impairment 8-10 years later: Results from the Wisconsin Registry for Alzheimer’s Prevention. Journal of the International Neuropsychological Society. 2016;22:1016–1025. doi: 10.1017/S135561771600093X.
Lambert JC, Ibrahim-Verbaas CA, Harold D, Naj AC, Sims R, Bellenguez C, DeStafano AL, Bis JC, Beecham GW, Grenier-Boley B, Russo G, Thorton-Wells TA, Jones N, Smith AV, Chouraki V, Thomas C, Ikram MA, Zelenika D, Vardarajan BN, Kamatani Y, Lin CF, Gerrish A, Schmidt H, Kunkle B, Dunstan ML, Ruiz A, Bihoreau MT, Choi SH, Reitz C, Pasquier F, Cruchaga C, Craig D, Amin N, Berr C, Lopez OL, De Jager PL, Deramecourt V, Johnston JA, Evans D, Lovestone S, Letenneur L, Morón FJ, Rubinsztein DC, Eiriksdottir G, Sleegers K, Goate AM, Fiévet N, Huentelman MW, Gill M, Brown K, Kamboh MI, Keller L, Barberger-Gateau P, McGuiness B, Larson EB, Green R, Myers AJ, Dufouil C, Todd S, Wallon D, Love S, Rogaeva E, Gallacher J, St George-Hyslop P, Clarimon J, Lleo A, Bayer A, Tsuang DW, Yu L, Tsolaki M, Bossù P, Spalletta G, Proitsi P, Collinge J, Sorbi S, Sanchez-Garcia F, Fox NC, Hardy J, Deniz Naranjo MC, Bosco P, Clarke R, Brayne C, Galimberti D, Mancuso M, Matthews F, Cohorts for Heart and Aging Research in Genomic Epidemiology, Moebus S, Mecocci P, Del Zompo M, Maier W, Hampel H, Pilotto A, Bullido M, Panza F, Caffarra P, Nacmias B, Gilbert JR, Mayhaus M, Lannefelt L, Hakonarson H, Pichler S, Carrasquillo MM, Ingelsson M, Beekly D, Alvarez V, Zou F, Valladares O, Younkin SG, Coto E, Hamilton-Nelson KL, Gu W, Razquin C, Pastor P, Mateo I, Owen MJ, Faber KM, Jonsson PV, Combarros O, O’Donovan MC, Cantwell LB, Soininen H, Blacker D, Mead S, Mosley TH, Bennett DA, Harris TB, Fratiglioni L, Holmes C, de Bruijn RF, Passmore P, Montine TJ, Bettens K, Rotter JI, Brice A, Morgan K, Foroud TM, Kukull WA, Hannequin D, Powell JF, Nalls MA, Ritchie K, Lunetta KL, Kauwe JS, Boerwinkle E, Riemenschneider M, Boada M, Hiltuenen M, Martin ER, Schmidt R, Rujescu D, Wang LS, Dartigues JF, Mayeux R, Tzourio C, Hofman A, Nöthen MM, Graff C, Psaty BM, Jones L, Haines JL, Holmans PA, Lathrop M, Pericak-Vance MA, Launer LJ, Farrer LA, van Duijn CM, Van Broeckhoven C, Moskvina V, Seshadri S, Williams J, Schellenberg GD, Amouyel P. Meta-analysis of 74,046 individuals identifies 11 new susceptibility loci for Alzheimer’s disease. Nature Genetics. 2013;45:1452–1458. doi: 10.1038/ng.2802.
Larsson SC, Traylor M, Malik R, Dichgans M, Burgess S, Markus HS, CoSTREAM Consortium, on behalf of the International Genomics of Alzheimer’s Project. Modifiable pathways in Alzheimer’s disease: Mendelian randomisation analysis. BMJ. 2017;359:j5375. doi: 10.1136/bmj.j5375.
Loh P-R, Danecek P, Palamara PF, Fuchsberger C, Reshef YA, Finucane HK, Schoenherr S, Forer L, McCarthy S, Abecasis GR, Durbin R, Price AL. Reference-based phasing using the Haplotype Reference Consortium panel. Nature Genetics. 2016;48:1443–1448. doi: 10.1038/ng.3679.
Manolio TA, Collins FS, Cox NJ, Goldstein DB, Hindorff LA, Hunter DJ, McCarthy MI, Ramos EM, Cardon LR, Chakravarti A, Cho JH, Guttmacher AE, Kong A, Kruglyak L, Mardis E, Rotimi CN, Slatkin M, Valle D, Whittemore AS, Boehnke M, Clark AG, Eichler EE, Gibson G, Haines JL, Mackay TFC, McCarroll SA, Visscher PM. Finding the missing heritability of complex diseases. Nature. 2009;461:747–753. doi: 10.1038/nature08494.
Marchini J, Howie B, Myers S, McVean G, Donnelly P. A new multipoint method for genome-wide association studies by imputation of genotypes. Nature Genetics. 2007;39:906–913. doi: 10.1038/ng2088.
McCarthy S, Das S, Kretzschmar W, Delaneau O, Wood AR, Teumer A, Kang HM, Fuchsberger C, Danecek P, Sharp K, Luo Y, Sidore C, Kwong A, Timpson N, Koskinen S, Vrieze S, Scott LJ, Zhang H, Mahajan A, Veldink J, Peters U, Pato C, van Duijn CM, Gillies CE, Gandin I, Mezzavilla M, Gilly A, Cocca M, Traglia M, Angius A, Barrett JC, Boomsma D, Branham K, Breen G, Brummett CM, Busonero F, Campbell H, Chan A, Chen S, Chew E, Collins FS, Corbin LJ, Smith GD, Dedoussis G, Dorr M, Farmaki A-E, Ferrucci L, Forer L, Fraser RM, Gabriel S, Levy S, Groop L, Harrison T, Hattersley A, Holmen OL, Hveem K, Kretzler M, Lee JC, McGue M, Meitinger T, Melzer D, Min JL, Mohlke KL, Vincent JB, Nauck M, Nickerson D, Palotie A, Pato M, Pirastu N, McInnis M, Richards JB, Sala C, Salomaa V, Schlessinger D, Schoenherr S, Slagboom PE, Small K, Spector T, Stambolian D, Tuke M, Tuomilehto J, Van den Berg LH, Van Rheenen W, Volker U, Wijmenga C, Toniolo D, Zeggini E, Gasparini P, Sampson MG, Wilson JF, Frayling T, de Bakker PIW, Swertz MA, McCarroll S, Kooperberg C, Dekker A, Altshuler D, Willer C, Iacono W, Ripatti S, Soranzo N, Walter K, Swaroop A, Cucca F, Anderson CA, Myers RM, Boehnke M, McCarthy MI, Durbin R, Haplotype Reference Consortium. A reference panel of 64,976 haplotypes for genotype imputation. Nature Genetics. 2016;48:1279–1283. doi: 10.1038/ng.3643.
Mukherjee S, Mez J, Trittschuh E, Saykin AJ. Genetic Data and Cognitively-Defined Late-Onset Alzheimer’s Disease Subgroups. bioRxiv. 2018. doi: 10.1101/367615.
Naj AC, Jun G, Beecham GW, Wang L-S, Vardarajan BN, Buros J, Gallins PJ, Buxbaum JD, Jarvik GP, Crane PK, Larson EB, Bird TD, Boeve BF, Graff-Radford NR, De Jager PL, Evans D, Schneider JA, Carrasquillo MM, Ertekin-Taner N, Younkin SG, Cruchaga C, Kauwe JSK, Nowotny P, Kramer P, Hardy J, Huentelman MJ, Myers AJ, Barmada MM, Demirci FY, Baldwin CT, Green RC, Rogaeva E, St George-Hyslop P, Arnold SE, Barber R, Beach T, Bigio EH, Bowen JD, Boxer A, Burke JR, Cairns NJ, Carlson CS, Carney RM, Carroll SL, Chui HC, Clark DG, Corneveaux J, Cotman CW, Cummings JL, DeCarli C, DeKosky ST, Diaz-Arrastia R, Dick M, Dickson DW, Ellis WG, Faber KM, Fallon KB, Farlow MR, Ferris S, Frosch MP, Galasko DR, Ganguli M, Gearing M, Geschwind DH, Ghetti B, Gilbert JR, Gilman S, Giordani B, Glass JD, Growdon JH, Hamilton RL, Harrell LE, Head E, Honig LS, Hulette CM, Hyman BT, Jicha GA, Jin L-W, Johnson N, Karlawish J, Karydas A, Kaye JA, Kim R, Koo EH, Kowall NW, Lah JJ, Levey AI, Lieberman AP, Lopez OL, Mack WJ, Marson DC, Martiniuk F, Mash DC, Masliah E, McCormick WC, McCurry SM, McDavid AN, McKee AC, Mesulam M, Miller BL, Miller CA, Miller JW, Parisi JE, Perl DP, Peskind E, Petersen RC, Poon WW, Quinn JF, Rajbhandary RA, Raskind M, Reisberg B, Ringman JM, Roberson ED, Rosenberg RN, Sano M, Schneider LS, Seeley W, Shelanski ML, Slifer MA, Smith CD, Sonnen JA, Spina S, Stern RA, Tanzi RE, Trojanowski JQ, Troncoso JC, Van Deerlin VM, Vinters HV, Vonsattel JP, Weintraub S, Welsh-Bohmer KA, Williamson J, Woltjer RL, Cantwell LB, Dombroski BA, Beekly D, Lunetta KL, Martin ER, Kamboh MI, Saykin AJ, Reiman EM, Bennett DA, Morris JC, Montine TJ, Goate AM, Blacker D, Tsuang DW, Hakonarson H, Kukull WA, Foroud TM, Haines JL, Mayeux R, Pericak-Vance MA, Farrer LA, Schellenberg GD. Common variants at MS4A4/MS4A6E, CD2AP, CD33 and EPHA1 are associated with late-onset Alzheimer’s disease. Nature Genetics. 2011;43:436–441. doi: 10.1038/ng.801.
Norton S, Matthews FE, Barnes DE, Yaffe K, Brayne C. Potential for primary prevention of Alzheimer’s disease: an analysis of population-based data. The Lancet Neurology. 2014;13:788–794. doi: 10.1016/S1474-4422(14)70136-X.
Østergaard SD, Mukherjee S, Sharp SJ, Proitsi P, Lotta LA, Day F, Perry JRB, Boehme KL, Walter S, Kauwe JS, Gibbons LE, Alzheimer’s Disease Genetics Consortium, GERAD1 Consortium, EPIC-InterAct Consortium, Larson EB, Powell JF, Langenberg C, Crane PK, Wareham NJ, Scott RA. Associations between potentially modifiable risk factors and Alzheimer disease: a Mendelian randomization study. PLOS Medicine. 2015;12:e1001841. doi: 10.1371/journal.pmed.1001841.
Paternoster L, Tilling K, Davey Smith G. Genetic epidemiology and Mendelian randomization for informing disease therapeutics: Conceptual and methodological challenges. PLOS Genetics. 2017;13:e1006944. doi: 10.1371/journal.pgen.1006944.
Prince M, Bryce R, Albanese E, Wimo A, Ribeiro W, Ferri CP. The global prevalence of dementia: A systematic review and metaanalysis. Alzheimer’s & Dementia. 2013;9:63. doi: 10.1016/j.jalz.2012.11.007.
Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MAR, Bender D, Maller J, Sklar P, de Bakker PIW, Daly MJ, Sham PC. PLINK: a tool set for whole-genome association and population-based linkage analyses. American Journal of Human Genetics. 2007;81:559–575. doi: 10.1086/519795.
qlu-lab. Badgers. swh:1:rev:d6d1d181549d3df29639bd736be4a39d5a9d44aa. Software Heritage. 2024. https://archive.softwareheritage.org/swh:1:dir:e973443627f31ba29bcf4c4d176d59daeee8ee56;origin=https://github.com/qlu-lab/BADGERS;visit=swh:1:snp:cdc16e40d72e70869fb348b6bc08f67ad49b2bf8;anchor=swh:1:rev:d6d1d181549d3df29639bd736be4a39d5a9d44aa
Reed B, Villeneuve S, Mack W, DeCarli C, Chui HC, Jagust W. Associations between serum cholesterol levels and cerebral amyloidosis. JAMA Neurology. 2014;71:195–200. doi: 10.1001/jamaneurol.2013.5390.
Reitz C, Mayeux R. Alzheimer disease: epidemiology, diagnostic criteria, risk factors and biomarkers. Biochemical Pharmacology. 2014;88:640–651. doi: 10.1016/j.bcp.2013.12.024.
Sager MA, Hermann B, La Rue A. Middle-aged children of persons with Alzheimer’s disease: APOE genotypes and cognitive function in the Wisconsin Registry for Alzheimer’s Prevention. Journal of Geriatric Psychiatry and Neurology. 2005;18:245–249. doi: 10.1177/0891988705281882.
Seshadri S, Fitzpatrick AL, Ikram MA, DeStefano AL, Gudnason V, Boada M, Bis JC, Smith AV, Carassquillo MM, Lambert JC, Harold D, Schrijvers EMC, Ramirez-Lorca R, Debette S, Longstreth WT, Janssens ACJW, Pankratz VS, Dartigues JF, Hollingworth P, Aspelund T, Hernandez I, Beiser A, Kuller LH, Koudstaal PJ, Dickson DW, Tzourio C, Abraham R, Antunez C, Du Y, Rotter JI, Aulchenko YS, Harris TB, Petersen RC, Berr C, Owen MJ, Lopez-Arrieta J, Varadarajan BN, Becker JT, Rivadeneira F, Nalls MA, Graff-Radford NR, Campion D, Auerbach S, Rice K, Hofman A, Jonsson PV, Schmidt H, Lathrop M, Mosley TH, Au R, Psaty BM, Uitterlinden AG, Farrer LA, Lumley T, Ruiz A, Williams J, Amouyel P, Younkin SG, Wolf PA, Launer LJ, Lopez OL, van Duijn CM, Breteler MMB, CHARGE Consortium, GERAD1 Consortium, EADI1 Consortium. Genome-wide analysis of genetic loci associated with Alzheimer disease. JAMA. 2010;303:1832–1840. doi: 10.1001/jama.2010.574.
Simons M, Keller P, Dichgans J, Schulz JB. Cholesterol and Alzheimer’s disease: is there a link? Neurology. 2001;57:1089–1093. doi: 10.1212/wnl.57.6.1089.
Sleiman PMA, Grant SFA. Mendelian randomization in the era of genomewide association studies. Clinical Chemistry. 2010;56:723–728. doi: 10.1373/clinchem.2009.141564.
Stern Y. Cognitive reserve in ageing and Alzheimer’s disease. The Lancet Neurology. 2012;11:1006–1012. doi: 10.1016/S1474-4422(12)70191-6.
Valenzuela MJ, Sachdev P. Brain reserve and dementia: a systematic review. Psychological Medicine. 2006;36:441–454. doi: 10.1017/S0033291705006264.
Vilhjálmsson BJ, Yang J, Finucane HK, Gusev A, Lindström S, Ripke S, Genovese G, Loh PR, Bhatia G, Do R, Hayeck T, Won HH, Schizophrenia Working Group of the Psychiatric Genomics Consortium, Discovery, Biology, and Risk of Inherited Variants in Breast Cancer (DRIVE) study, Kathiresan S, Pato M, Pato C, Tamimi R, Stahl E, Zaitlen N, Pasaniuc B, Belbin G, Kenny EE, Schierup MH, De Jager P, Patsopoulos NA, McCarroll S, Daly M, Purcell S, Chasman D, Neale B, Goddard M, Visscher PM, Kraft P, Patterson N, Price AL. Modeling Linkage Disequilibrium Increases Accuracy of Polygenic Risk Scores. American Journal of Human Genetics. 2015;97:576–592. doi: 10.1016/j.ajhg.2015.09.001.
Willer CJ, Li Y, Abecasis GR. METAL: fast and efficient meta-analysis of genomewide association scans. Bioinformatics. 2010;26:2190–2191. doi: 10.1093/bioinformatics/btq340.
Yang J, Bakshi A, Zhu Z, Hemani G, Vinkhuyzen AAE, Lee SH, Robinson MR, Perry JRB, Nolte IM, van Vliet-Ostaptchouk JV, Snieder H, LifeLines Cohort Study, Esko T, Milani L, Mägi R, Metspalu A, Hamsten A, Magnusson PKE, Pedersen NL, Ingelsson E, Soranzo N, Keller MC, Wray NR, Goddard ME, Visscher PM. Genetic variance estimation with imputed variants finds negligible missing heritability for human height and body mass index. Nature Genetics. 2015;47:1114–1120. doi: 10.1038/ng.3390.
Yavorska OO, Burgess S. MendelianRandomization: an R package for performing Mendelian randomization analyses using summarized data. International Journal of Epidemiology. 2017;46:1734–1739. doi: 10.1093/ije/dyx034.
Zhu Z, Zheng Z, Zhang F, Wu Y, Trzaskowski M, Maier R, Robinson MR, McGrath JJ, Visscher PM, Wray NR, Yang J. Causal associations between risk factors and common diseases inferred from GWAS summary data. Nature Communications. 2018;9:224. doi: 10.1038/s41467-017-02317-2.

eLife. doi: 10.7554/eLife.91360.2.sa0

eLife assessment

Nicholas Mancuso ¹

In the last 15 years, genome-wide association studies (GWAS) have been used to estimate the association between common variants across the genome and a large number of disparate traits and diseases in humans. This valuable method provides a new way to find correlations between the genetic component of a phenotype of interest and this wealth of genetic information. The software adds a new tool for investigating genetic correlation between traits, generating new mechanistic hypotheses, and dissecting the role of the observed associations in disease heterogeneity. The results of the application of the method are solid and generally agree with what others have seen using similar AD and UKB data.

eLife. doi: 10.7554/eLife.91360.2.sa1

Reviewer #1 (Public Review):

Anonymous

The major aim of the paper was to present a method for determining genetic associations between two traits using common variants tested in genome-wide association studies. The work includes a software implementation and an application of the approach. The results of the application generally agree with what others have seen using similar AD and UKB data.

The paper has several distinct portions. The first is a method for testing genetic associations between two or more traits using genome-wide association test statistics. The second is a Python implementation of the method. The last is the set of results obtained by applying the method to GWAS from AD and the UK Biobank.

Regarding the method, it seems to have similarities to LDSC, and it is not clear how it differs from LDSC or other similar methods. The implementation used Python 2.7 (or at least was reportedly tested with that version), which was retired in 2020. The implementation was committed between Wed Oct 3 15:21:49 2018 and Mon Jan 28 09:18:09 2019 using data that existed at the time, so it was a bit surprising that it used Python 2.7, which was initially going to reach end-of-life in 2015. In any case, trying to run the package resulted in unmet dependency errors, which I think are related to an internal package not getting installed. I would expect that published software could be installed using standard tooling for the language and, ideally, that it would have automated testing of key portions.

Regarding the main results, they find what has largely been shown by others using the same or similar data, which adds prima facie validity to the work. The portions of the work dealing with AD subgroups, pathology, biomarkers, and cognitive traits are of interest. I was puzzled why the authors expressed surprise that parental history and high cholesterol were not associated with MCI or cognitive composite scores, since this would seem to be the likely fallout of the selection of the WRAP cohort. The discussion paragraph that started "What's more, environmental factors may play a big role in the identified associations." confused me. I think what the authors are referring to is how selection, especially in a biobank dataset, can induce correlations, which is not what I think of as an environmental effect.

Overall, the work has merit, but I am left without a clear impression of the improvement in the approach over similar methods. Likewise, the results are interesting, but similar findings are described with the data that was used in the study, which are over 5 years old at the time of this review.

eLife. doi: 10.7554/eLife.91360.2.sa2

Reviewer #2 (Public Review):

Anonymous

Summary:

Yan, Hu, and colleagues introduce BADGERS, a new method for biobank-wide scanning to find associations between a phenotype of interest and the genetic component of a battery of candidate phenotypes. Briefly, BADGERS capitalizes on publicly available weights of genetic variants for a myriad of traits to estimate polygenic risk scores for each trait, and then identifies associations with the trait of interest. Of note, the method works using summary statistics for the trait of interest, which is especially beneficial for running in population-based cohorts that are not enriched for any particular phenotype (i.e., with few actual cases of the phenotype of interest).
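To make the summary-statistics idea concrete, the following is a minimal sketch of how a polygenic score can be tested against a trait without individual-level data. It is not taken from the BADGERS code base: the function names `prs_summary_z` and `two_sided_p`, the toy inputs, and the specific approximation (the score's z-score as a weighted sum of per-SNP GWAS z-scores, scaled by SNP standard deviations and by the score variance from a reference LD covariance matrix, as in S-PrediXcan-style tests) are assumptions for illustration, and the exact statistic used by the authors may differ.

```python
import numpy as np
from math import erfc, sqrt


def prs_summary_z(weights, gwas_z, ld_cov):
    """Approximate z-score for a polygenic score versus a trait.

    weights : per-SNP score weights for the candidate trait (hypothetical input)
    gwas_z  : per-SNP z-scores from the GWAS of the trait of interest
    ld_cov  : SNP genotype covariance matrix from a reference panel
    """
    w = np.asarray(weights, dtype=float)
    z = np.asarray(gwas_z, dtype=float)
    d = np.asarray(ld_cov, dtype=float)

    snp_sd = np.sqrt(np.diag(d))   # per-SNP genotype standard deviation
    prs_var = w @ d @ w            # variance of the score in the reference panel
    if prs_var <= 0:
        raise ValueError("score variance must be positive")

    # Weighted sum of GWAS z-scores, standardized by the score's own SD.
    return float(np.sum(w * snp_sd * z) / np.sqrt(prs_var))


def two_sided_p(z):
    """Two-sided p-value of a z-score under the standard normal."""
    return erfc(abs(z) / sqrt(2.0))


if __name__ == "__main__":
    # Toy numbers only; they do not come from the paper.
    weights = [0.10, -0.05, 0.20]
    gwas_z = [1.2, -0.4, 2.5]
    ld_cov = np.array([
        [0.42, 0.10, 0.00],
        [0.10, 0.48, 0.05],
        [0.00, 0.05, 0.32],
    ])
    z = prs_summary_z(weights, gwas_z, ld_cov)
    print(z, two_sided_p(z))
```

In a scan of this kind, the same calculation would be repeated once per candidate trait, with a multiple-testing correction applied across the resulting p-values.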

Here, they apply BADGERS to Alzheimer's disease (AD) as the trait of interest and a battery of circa 2,000 phenotypes with publicly available precalculated genome-wide summary statistics from the UK Biobank. They run it on two AD cohorts and discover at least 14 significant associations between AD and traits. These include expected associations with dementia, cognition (educational attainment), and socioeconomic status-related phenotypes. Through multivariate modelling, they distinguish between (1) clearly independent components associated with AD and (2) by-product associations that are inflated in the original bivariate analysis. Analyses stratified according to APOE inclusion show that this region does not seem to play a role in the association of some of the identified phenotypes. Of note, they observe overlap, but also significant differences, between the associations identified with BADGERS and those identified with Mendelian randomization (MR), hinting at BADGERS being more powerful than classical top variant-based MR approaches. They then extend BADGERS to other AD-related phenotypes, which serves to refine the hypotheses about the underlying mechanisms accounting for the genetic correlation patterns originally identified for AD. Finally, they run BADGERS on a pre-clinical cohort with mild cognitive impairment. They observe important differences in the association patterns, suggesting that this preclinical phenotype (at least in this cohort) has a different genetic architecture than general AD.

Strengths:

BADGERS is an interesting new addition to a stream of attempts to "squeeze" biobank data beyond pure association studies for diagnosis. Increasingly available biobank cohorts do not usually focus on specific diseases. However, they tend to be data-rich, opening the door to deep explorations that can be useful to refine our knowledge of the latent factors that lead to diagnosis. Indeed, the possibility of running genetic correlation studies in specific sub-settings of interest (e.g. preclinical cohorts) is arguably the most interesting aspect of BADGERS. Classical methods like LDSC or two-sample MR capitalize on publicly available summary statistics from large cohorts, or on access to individual genotype data of large cohorts, to ensure statistical power. Seemingly, BADGERS provides a balanced opportunity to dissect the correlation between traits of interest in settings with small sample sizes in which other methods do not work well.

Weaknesses:

However, the increased statistical power is only hinted at; for instance, they do not explore whether LDSC would have identified these associations. Although I suspect that is the case, this evidence is important to ensure that the abovementioned balance is right. Finally, as discussed by the authors, the reliance on polygenic risk scoring necessarily undermines the causality evidence gained through BADGERS. In this sense, BADGERS provides an alternative to strict instrumental-variable based analysis, which can be particularly useful to generate new mechanistic hypotheses.

In summary, after 15 years of focus on diagnosis that would require having individual access to large patient cohorts, BADGERS can become an excellent tool to dig into trait heterogeneity, especially if it turns out to be more powerful than other available methodologies.

eLife. 2024 May 24;12:RP91360. doi: 10.7554/eLife.91360.2.sa3

Author response

Donghui Yan, Bowen Hu, Burcu F Darst, Shubhabrata Mukherjee, Brian W Kunkle, Yuetiva Deming, Logan Dumitrescu, Yunling Wang, Adam Naj, Amanda Kuzma, Yi Zhao, Hyunseung Kang, Sterling Johnson, Cruchaga Carlos, Timothy J Hohman, Paul K Crane, Corinne D Engelman, Qiongshi Lu

We thank eLife and the reviewers for the thoughtful summary and valuable review of our manuscript. We largely agree with the summary and review and have provided our responses to the comments below. We believe BADGERS is a significant new tool for identifying associated risk factors for complex diseases, and the associations we observed in the analysis provide insights into the genetic basis of Alzheimer's disease.

Reviewer #1 (Public Review):

The major aim of the paper was a method for determining genetic associations between two traits using common variants tested in genome-wide association studies. The work includes a software implementation and application of their approach. The results of the application of their method generally agree with what others have seen using similar AD and UKB data.

The paper has several distinct portions. The first is a method for testing genetic associations between two or more traits using genome-wide association test statistics. The second is a Python implementation of the method. The last portion is the results of their method using GWAS from AD and UK Biobank.

We thank the reviewer for the conclusion and positive comments.

Regarding the method, it seems like it has similarities to LDSC, and it is not clear how it differs from LDSC or other similar methods. The implementation of the method used Python 2.7 (or at least was reportedly tested using that version), which was retired in 2020. The implementation was committed between Wed Oct 3 15:21:49 2018 and Mon Jan 28 09:18:09 2019 using data that existed at the time, so it was a bit surprising that it used Python 2.7, since that version was initially going to be set for end-of-life in 2015. Anyway, trying to run the package resulted in unmet dependency errors, which I think are related to an internal package not getting installed. I would expect that published software could be installed using standard tooling for the language, and, ideally, software should have automated testing of key portions.

We thank the reviewer for their comments. To clarify, the primary difference between our proposed method, BADGERS, and LDSC lies in their respective objectives and applications. LDSC is designed to estimate heritability and genetic correlations between traits by utilizing GWAS summary statistics, thereby aiding in the elucidation of the genetic architecture of complex traits and diseases. Conversely, BADGERS is specifically developed to explore causal relationships between risk factors, such as biomarkers, and diseases of interest. It employs genetic variants as variables to deduce causality, thereby addressing the challenges of confounding and reverse causation that are common in observational studies. Although BADGERS utilizes the LD reference panel derived from LDSC, the LD reference panel is used to obtain the predicted trait expression. The ultimate goal is to focus on linking biobank traits with Alzheimer’s disease and building causal relationships instead of identifying genetic architecture.

Regarding the technical aspects mentioned, we acknowledge the concerns about the use of Python 2.7 and the issues encountered during the package installation. We are in the process of updating the software to ensure compatibility with current versions of Python and to enhance the installation process with standard tooling and automated testing for a more user-friendly experience. We have provided tests for each portion of the software so the user can test if the software is working properly.

Regarding the main results, they find what has largely been shown by others using the same data or similar data, which adds prima facie validity to the work. The portions of the work dealing with AD subgroups, pathology, biomarkers, and cognitive traits of interest. I was puzzled why the authors suggested surprise regarding parental history and high cholesterol not being associated with MCI or cognitive composite scores, since this would seem like the likely fallout of selection of the WRAP cohort. The discussion paragraph that started "What's more, environmental factors may play a big role in the identified associations." confused me. I think what the authors are referring to is how selection, especially in a biobank dataset, can induce correlations, which is not what I think of as an environmental effect.

We thank the reviewer very much for their comment. We're glad that our findings align with existing research using similar data, increasing the validity of our work and the proposed BADGERS algorithm. Your point about the lack of association between parental history, high cholesterol, and mild cognitive impairment (MCI) or cognitive composite scores in the WRAP cohort is well-taken. We agree that the selection criteria of the WRAP cohort may influence these findings, as it consists of individuals with a specific risk profile for Alzheimer's disease. This selection could indeed attenuate the observed association between these factors and cognitive outcomes, which we initially found surprising.

Regarding the environmental factors, we appreciate your clarification and understand the confusion. Our intention was to discuss the potential for selection bias and confounding factors in biobank datasets for the identified associations, which might not necessarily be direct environmental effects.

Overall, the work has merit, but I am left without a clear impression of the improvement in the approach over similar methods. Likewise, the results are interesting, but similar findings are described with the data that was used in the study, which are over 5 years old at the time of this review.

We thank the reviewer for their endorsement of the BADGERS framework. We believe that our method, BADGERS, improves on existing approaches by effectively linking genetic data with the detailed phenotypic information in biobanks and large disease GWAS. This enhances our ability to detect associations without needing individual-level data, offering clearer insights while reducing issues like reverse causality and confounding factors.

Even though the IGAP dataset is over five years old, it remains one of the largest publicly available datasets for Alzheimer’s disease. Likewise, the UK Biobank is one of the largest publicly available human trait datasets, which researchers continue to use. The continued utility of these datasets demonstrates their value to the research community. Additionally, the versatility of the BADGERS framework makes it suitable for future research investigating the relationship between human traits and various diseases using different datasets.

Reviewer #2 (Public Review):

Summary:

Yan, Hu, and colleagues introduce BADGERS, a new method for biobank-wide scanning to find associations between a phenotype of interest and the genetic component of a battery of candidate phenotypes. Briefly, BADGERS capitalizes on publicly available weights of genetic variants for a myriad of traits to estimate polygenic risk scores for each trait, and then identifies associations with the trait of interest. Of note, the method works using summary statistics for the trait of interest, which is especially beneficial for running in population-based cohorts that are not enriched for any particular phenotype (i.e., with few actual cases of the phenotype of interest).

Here, they apply BADGERS to Alzheimer's disease (AD) as the trait of interest, and to a battery of circa 2,000 phenotypes with publicly available precalculated genome-wide summary statistics from the UK Biobank. They run it on two AD cohorts, to discover at least 14 significant associations between AD and traits. These include expected associations with dementia, cognition (educational attainment), and socioeconomic status-related phenotypes. Through multivariate modelling, they distinguish between (1) clearly independent components associated with AD and (2) by-product associations that are inflated in the original bivariate analysis. Analyses stratified according to APOE inclusion show that this region does not seem to play a role in the association of some of the identified phenotypes. Of note, they observe overlap but significant differences between the associations identified with BADGERS and those identified with Mendelian randomization (MR), hinting at BADGERS being more powerful than classical top variant-based MR approaches. They then extend BADGERS to other AD-related phenotypes, which serves to refine the hypotheses about the underlying mechanisms accounting for the genetic correlation patterns originally identified for AD. Finally, they run BADGERS on a pre-clinical cohort with mild cognitive impairment. They observe important differences in the association patterns, suggesting that this preclinical phenotype (at least in this cohort) has a different genetic architecture than general AD.

We thank the reviewer for the summary and positive comments.

Strengths:

BADGERS is an interesting new addition to a stream of attempts to "squeeze" biobank data beyond pure association studies for diagnosis. Increasingly available biobank cohorts do not usually focus on specific diseases. However, they tend to be data-rich, opening the door to deep explorations that can be useful to refine our knowledge of the latent factors that lead to diagnosis. Indeed, the possibility of running genetic correlation studies in specific sub-settings of interest (e.g. preclinical cohorts) is arguably the most interesting aspect of BADGERS. Classical methods like LDSC or two-sample MR capitalize on publicly available summary statistics from large cohorts, or on access to individual genotype data of large cohorts, to ensure statistical power. Seemingly, BADGERS provides a balanced opportunity to dissect the correlation between traits of interest in settings with small sample sizes in which other methods do not work well.

We thank the reviewer for the positive comments.

Weaknesses:

However, the increased statistical power is only hinted at; for instance, they do not explore whether LDSC would have identified these associations. Although I suspect that is the case, this evidence is important to ensure that the abovementioned balance is right. Finally, as discussed by the authors, the reliance on polygenic risk scoring necessarily undermines the causality evidence gained through BADGERS. In this sense, BADGERS provides an alternative to strict instrumental-variable based analysis, which can be particularly useful to generate new mechanistic hypotheses.

We thank the reviewer for the comments. We understand the importance of comparing BADGERS to other methods. The comparison with LDSC, while not directly relevant to BADGERS's causal inference aims, is indeed an interesting aspect to consider for future studies. In this paper, we focused on comparing BADGERS with Mendelian randomization (MR), which shares its causal inference objective.
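
For readers less familiar with the MR comparator, one common estimator is the fixed-effect inverse-variance weighted (IVW) estimator, which combines per-variant ratio estimates into a single effect estimate. The sketch below is a minimal illustration under the usual simplifying assumptions (independent variants, first-order standard errors that ignore uncertainty in the exposure effects). The function name and toy numbers are hypothetical and do not come from the study.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW Mendelian randomization estimate.

    beta_exposure : per-variant effects on the risk factor
    beta_outcome  : per-variant effects on the disease
    se_outcome    : standard errors of beta_outcome
    Returns (estimate, standard_error, z_score).
    """
    if not (len(beta_exposure) == len(beta_outcome) == len(se_outcome)):
        raise ValueError("inputs must have the same length")

    # Each variant contributes with weight 1 / se_outcome^2.
    numerator = sum(bx * by / se ** 2
                    for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome))
    denominator = sum(bx ** 2 / se ** 2
                      for bx, se in zip(beta_exposure, se_outcome))
    if denominator == 0:
        raise ValueError("at least one variant must affect the exposure")

    estimate = numerator / denominator
    standard_error = math.sqrt(1.0 / denominator)
    return estimate, standard_error, estimate / standard_error


# Toy example (made-up numbers): outcome effects are half the exposure effects.
toy_estimate, toy_se, toy_z = ivw_estimate(
    beta_exposure=[0.1, 0.2],
    beta_outcome=[0.05, 0.1],
    se_outcome=[0.01, 0.01],
)
```

Because IVW typically uses only a handful of strongly associated variants as instruments, while a PRS aggregates many variants, a difference in the number of detected associations between the two approaches is plausible.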

In that comparison, BADGERS identified a total of 48 traits that reached Bonferroni-corrected statistical significance. In contrast, MR-IVW identified only nine traits with Bonferroni-corrected statistical significance. Among these nine traits, seven were also identified by BADGERS. This demonstrates that BADGERS has higher power in detecting causal relationships.
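
To make the significance criterion concrete, a Bonferroni correction divides the family-wise error rate by the number of tests performed. The sketch below uses the 1738 UK Biobank traits reported in the abstract; the 0.05 family-wise level, the function names, and the toy p-values are assumptions made for illustration.

```python
def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test p-value threshold under a Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


def significant_traits(p_values, n_tests, alpha=0.05):
    """Names of traits whose p-value falls below the corrected threshold.

    p_values : dict mapping trait name to p-value
    n_tests  : total number of traits scanned (not just those in p_values)
    """
    threshold = bonferroni_threshold(n_tests, alpha)
    return sorted(name for name, p in p_values.items() if p < threshold)


# Threshold for a scan of 1738 traits at an assumed family-wise level of 0.05.
scan_threshold = bonferroni_threshold(1738)

# Toy p-values (made up) for three hypothetical traits from that scan.
toy_p_values = {"trait_a": 1e-6, "trait_b": 1e-3, "trait_c": 2e-5}
toy_hits = significant_traits(toy_p_values, n_tests=1738)
```

Under these assumptions the per-trait threshold is roughly 2.9e-5, so a trait must reach a p-value below that level to be counted as significant in either method.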

Regarding the use of polygenic risk scoring, we agree that it holds challenges in directly inferring causality. While BADGERS offers an innovative way to explore genetic correlations and can help generate new hypotheses about disease mechanisms, it does not replace the causal inferences that can be drawn from instrumental-variable-based analyses. Instead, it should be viewed as a complementary tool that can illuminate potential genetic relationships and guide further causal investigations.

In summary, after 15 years of focus on diagnosis that would require having individual access to large patient cohorts, BADGERS can become an excellent tool to dig into trait heterogeneity, especially if it turns out to be more powerful than other available methodologies.

We thank the reviewer for the conclusion and positive comments.

Associated Data


Data Citations

Lambert JC. 2013. NG00036 - IGAP Summary Statistics - Lambert et al. (2013). NIAGADS database. NG00036
Naj AC. 2011. Alzheimer's Disease Genetics Consortium (ADGC) Collection. NIAGADS database. NG00027

Supplementary Materials

Supplementary file 1. Simulation result; Result from Mendelian randomization and GSMR; Acknowledgements to Alzheimer’s Disease Genetics Consortium (ADGC).

elife-91360-supp1.docx (64.5KB, docx)

Supplementary file 2. Full association result between UK-biobank traits and Alzheimer’s disease/endophenotypes.

elife-91360-supp2.xlsx (2.5MB, xlsx)

MDAR checklist

elife-91360-mdarchecklist1.docx (99.6KB, docx)


[bib1] Abecasis GR, Auton A, Brooks LD, DePristo MA, Durbin RM, Handsaker RE, Kang HM, Marth GT, McVean GA. An integrated map of genetic variation from 1,092 human genomes. Nature. 2012;491:56–65. doi: 10.1038/nature11632. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib2] Bates D, Mächler M, Bolker B, Walker S. Fitting linear mixed-effects models using lme4. Journal of Statistical Software. 2015;67:1–48. doi: 10.18637/jss.v067.i01. [DOI] [Google Scholar]

[bib3] Beecham GW, Hamilton K, Naj AC, Martin ER, Huentelman M, Myers AJ, Corneveaux JJ, Hardy J, Vonsattel JP, Younkin SG, Bennett DA, De Jager PL, Larson EB, Crane PK, Kamboh MI, Kofler JK, Mash DC, Duque L, Gilbert JR, Gwirtsman H, Buxbaum JD, Kramer P, Dickson DW, Farrer LA, Frosch MP, Ghetti B, Haines JL, Hyman BT, Kukull WA, Mayeux RP, Pericak-Vance MA, Schneider JA, Trojanowski JQ, Reiman EM, Schellenberg GD, Montine TJ. Genome-wide association meta-analysis of neuropathologic features of Alzheimer’s disease and related dementias. PLOS Genetics. 2014;10:e1004606. doi: 10.1371/journal.pgen.1004606. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib4] Bulik-Sullivan BK, Loh P-R, Finucane HK, Ripke S, Yang J, Schizophrenia Working Group of the Psychiatric Genomics Consortium. Patterson N, Daly MJ, Price AL, Neale BM. LD Score regression distinguishes confounding from polygenicity in genome-wide association studies. Nature Genetics. 2015;47:291–295. doi: 10.1038/ng.3211. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib5] Burgess S, Butterworth A, Thompson SG. Mendelian randomization analysis with multiple genetic variants using summarized data. Genetic Epidemiology. 2013;37:658–665. doi: 10.1002/gepi.21758. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib6] Bycroft C, Freeman C, Petkova D, Band G, Elliott LT, Sharp K. Genome-wide genetic data on~ 500,000 UK Biobank participants. bioRxiv. 2017 doi: 10.1101/166298. [DOI]

[bib7] Chang CC, Chow CC, Tellier LC, Vattikuti S, Purcell SM, Lee JJ. Second-generation PLINK: rising to the challenge of larger and richer datasets. GigaScience. 2015;4:7. doi: 10.1186/s13742-015-0047-8. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib8] Clark LR, Racine AM, Koscik RL, Okonkwo OC, Engelman CD, Carlsson CM, Asthana S, Bendlin BB, Chappell R, Nicholas CR, Rowley HA, Oh JM, Hermann BP, Sager MA, Christian BT, Johnson SC. Beta-amyloid and cognitive decline in late middle age: Findings from the Wisconsin Registry for Alzheimer’s Prevention study. Alzheimer’s & Dementia. 2016;12:805–814. doi: 10.1016/j.jalz.2015.12.009. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib9] Conomos MP, Miller MB, Thornton TA. Robust inference of population structure for ancestry prediction and correction of stratification in the presence of relatedness. Genetic Epidemiology. 2015;39:276–293. doi: 10.1002/gepi.21896. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib10] Crane PK, Trittschuh E, Mukherjee S, Saykin AJ, Sanders RE, Larson EB, McCurry SM, McCormick W, Bowen JD, Grabowski T, Moore M, Bauman J, Gross AL, Keene CD, Bird TD, Gibbons LE, Mez J. Incidence of cognitively defined late-onset Alzheimer’s dementia subgroups from a prospective cohort study. Alzheimer’s & Dementia. 2017;13:1307–1316. doi: 10.1016/j.jalz.2017.04.011. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib11] Danecek P, Auton A, Abecasis G, Albers CA, Banks E, DePristo MA, Handsaker RE, Lunter G, Marth GT, Sherry ST, McVean G, Durbin R, 1000 Genomes Project Analysis Group The variant call format and VCFtools. Bioinformatics. 2011;27:2156–2158. doi: 10.1093/bioinformatics/btr330. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib12] Das S, Forer L, Schönherr S, Sidore C, Locke AE, Kwong A, Vrieze SI, Chew EY, Levy S, McGue M, Schlessinger D, Stambolian D, Loh P-R, Iacono WG, Swaroop A, Scott LJ, Cucca F, Kronenberg F, Boehnke M, Abecasis GR, Fuchsberger C. Next-generation genotype imputation service and methods. Nature Genetics. 2016;48:1284–1287. doi: 10.1038/ng.3656. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib13] Davey Smith G, Hemani G. Mendelian randomization: genetic anchors for causal inference in epidemiological studies. Human Molecular Genetics. 2014;23:R89–R98. doi: 10.1093/hmg/ddu328. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib14] Deming Y, Li Z, Kapoor M, Harari O, Del-Aguila JL, Black K, Carrell D, Cai Y, Fernandez MV, Budde J, Ma S, Saef B, Howells B, Huang K-L, Bertelsen S, Fagan AM, Holtzman DM, Morris JC, Kim S, Saykin AJ, De Jager PL, Albert M, Moghekar A, O’Brien R, Riemenschneider M, Petersen RC, Blennow K, Zetterberg H, Minthon L, Van Deerlin VM, Lee VM-Y, Shaw LM, Trojanowski JQ, Schellenberg G, Haines JL, Mayeux R, Pericak-Vance MA, Farrer LA, Peskind ER, Li G, Di Narzo AF, Alzheimer’s Disease Neuroimaging Initiative (ADNI) Alzheimer Disease Genetic Consortium (ADGC) Kauwe JSK, Goate AM, Cruchaga C. Genome-wide association study identifies four novel loci associated with Alzheimer’s endophenotypes and disease modifiers. Acta Neuropathologica. 2017;133:839–856. doi: 10.1007/s00401-017-1685-y. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib15] Deming Y, Dumitrescu L, Barnes LL, Thambisetty M, Kunkle B, Gifford KA, Bush WS, Chibnik LB, Mukherjee S, De Jager PL, Kukull W, Huentelman M, Crane PK, Resnick SM, Keene CD, Montine TJ, Schellenberg GD, Haines JL, Zetterberg H, Blennow K, Larson EB, Johnson SC, Albert M, Moghekar A, Del Aguila JL, Fernandez MV, Budde J, Hassenstab J, Fagan AM, Riemenschneider M, Petersen RC, Minthon L, Chao MJ, Van Deerlin VM, Lee VM-Y, Shaw LM, Trojanowski JQ, Peskind ER, Li G, Davis LK, Sealock JM, Cox NJ, Alzheimer’s Disease Neuroimaging Initiative (ADNI) Alzheimer Disease Genetics Consortium (ADGC) Goate AM, Bennett DA, Schneider JA, Jefferson AL, Cruchaga C, Hohman TJ. Sex-specific genetic predictors of Alzheimer’s disease biomarkers. Acta Neuropathologica. 2018;136:857–872. doi: 10.1007/s00401-018-1881-4. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib16] Djelti F, Braudeau J, Hudry E, Dhenain M, Varin J, Bièche I, Marquer C, Chali F, Ayciriex S, Auzeil N, Alves S, Langui D, Potier M-C, Laprevote O, Vidaud M, Duyckaerts C, Miles R, Aubourg P, Cartier N. CYP46A1 inhibition, brain cholesterol accumulation and neurodegeneration pave the way for Alzheimer’s disease. Brain. 2015;138:2383–2398. doi: 10.1093/brain/awv166. [DOI] [PubMed] [Google Scholar]

[bib17] Dudbridge F. Power and predictive accuracy of polygenic risk scores. PLOS Genetics. 2013;9:e1003348. doi: 10.1371/journal.pgen.1003348. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib18] Gamazon ER, Wheeler HE, Shah KP, Mozaffari SV, Aquino-Michaels K, Carroll RJ, Eyler AE, Denny JC, GTEx Consortium. Nicolae DL, Cox NJ, Im HK. A gene-based association method for mapping traits using reference transcriptome data. Nature Genetics. 2015;47:1091–1098. doi: 10.1038/ng.3367. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib19] Gusev A, Ko A, Shi H, Bhatia G, Chung W, Penninx BWJH, Jansen R, de Geus EJC, Boomsma DI, Wright FA, Sullivan PF, Nikkola E, Alvarez M, Civelek M, Lusis AJ, Lehtimäki T, Raitoharju E, Kähönen M, Seppälä I, Raitakari OT, Kuusisto J, Laakso M, Price AL, Pajukanta P, Pasaniuc B. Integrative approaches for large-scale transcriptome-wide association studies. Nature Genetics. 2016;48:245–252. doi: 10.1038/ng.3506. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib20] Harold D, Abraham R, Hollingworth P, Sims R, Gerrish A, Hamshere ML, Pahwa JS, Moskvina V, Dowzell K, Williams A, Jones N, Thomas C, Stretton A, Morgan AR, Lovestone S, Powell J, Proitsi P, Lupton MK, Brayne C, Rubinsztein DC, Gill M, Lawlor B, Lynch A, Morgan K, Brown KS, Passmore PA, Craig D, McGuinness B, Todd S, Holmes C, Mann D, Smith AD, Love S, Kehoe PG, Hardy J, Mead S, Fox N, Rossor M, Collinge J, Maier W, Jessen F, Schürmann B, Heun R, van den Bussche H, Heuser I, Kornhuber J, Wiltfang J, Dichgans M, Frölich L, Hampel H, Hüll M, Rujescu D, Goate AM, Kauwe JSK, Cruchaga C, Nowotny P, Morris JC, Mayo K, Sleegers K, Bettens K, Engelborghs S, De Deyn PP, Van Broeckhoven C, Livingston G, Bass NJ, Gurling H, McQuillin A, Gwilliam R, Deloukas P, Al-Chalabi A, Shaw CE, Tsolaki M, Singleton AB, Guerreiro R, Mühleisen TW, Nöthen MM, Moebus S, Jöckel K-H, Klopp N, Wichmann H-E, Carrasquillo MM, Pankratz VS, Younkin SG, Holmans PA, O’Donovan M, Owen MJ, Williams J. Genome-wide association study identifies variants at CLU and PICALM associated with Alzheimer’s disease. Nature Genetics. 2009;41:1088–1093. doi: 10.1038/ng.440. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib21] Hohman TJ, Dumitrescu L, Barnes LL, Thambisetty M, Beecham G, Kunkle B, Gifford KA, Bush WS, Chibnik LB, Mukherjee S, De Jager PL, Kukull W, Crane PK, Resnick SM, Keene CD, Montine TJ, Schellenberg GD, Haines JL, Zetterberg H, Blennow K, Larson EB, Johnson SC, Albert M, Bennett DA, Schneider JA, Jefferson AL, Alzheimer’s Disease Genetics Consortium and the Alzheimer’s Disease Neuroimaging Initiative Sex-Specific association of apolipoprotein e with cerebrospinal fluid levels of Tau. JAMA Neurology. 2018;75:989–998. doi: 10.1001/jamaneurol.2018.0821. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib23] Hu Y, Lu Q, Powles R, Yao X, Yang C, Fang F, Xu X, Zhao H. Leveraging functional annotations in genetic risk prediction for human complex diseases. PLOS Computational Biology. 2017;13:e1005589. doi: 10.1371/journal.pcbi.1005589. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib24] Hu Y, Li M, Lu Q, Weng H, Wang J, Zekavat SM, Yu Z, Li B, Muchnik S, Shi Y, Kunkle BW, Mukherjee S, Natarajan P, Naj A, Kuzma A, Zhao Y, Crane PK, Zhao H, Alzheimer’s Disease Genetics Consortium A Statistical Framework for Cross-Tissue Transcriptome-Wide Association Analysis. bioRxiv. 2018 doi: 10.1101/286013. [DOI] [PMC free article] [PubMed]

[bib25] Jack CR, Jr, Knopman DS, Jagust WJ, Petersen RC, Weiner MW, Aisen PS, Shaw LM, Vemuri P, Wiste HJ, Weigand SD, Lesnick TG, Pankratz VS, Donohue MC, Trojanowski JQ. Tracking pathophysiological processes in Alzheimer’s disease: an updated hypothetical model of dynamic biomarkers. The Lancet Neurology. 2013;12:207–216. doi: 10.1016/S1474-4422(12)70291-0. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib26] Johnson SC, Koscik RL, Jonaitis EM, Clark LR, Mueller KD, Berman SE, Bendlin BB, Engelman CD, Okonkwo OC, Hogan KJ, Asthana S, Carlsson CM, Hermann BP, Sager MA. The Wisconsin Registry for Alzheimer’s Prevention: A review of findings and current directions. Alzheimer’s & Dementia. 2018;10:130–142. doi: 10.1016/j.dadm.2017.11.007. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib27] Jun GR, Chung J, Mez J, Barber R, Beecham GW, Bennett DA, Buxbaum JD, Byrd GS, Carrasquillo MM, Crane PK, Cruchaga C, De Jager P, Ertekin-Taner N, Evans D, Fallin MD, Foroud TM, Friedland RP, Goate AM, Graff-Radford NR, Hendrie H, Hall KS, Hamilton-Nelson KL, Inzelberg R, Kamboh MI, Kauwe JSK, Kukull WA, Kunkle BW, Kuwano R, Larson EB, Logue MW, Manly JJ, Martin ER, Montine TJ, Mukherjee S, Naj A, Reiman EM, Reitz C, Sherva R, St George-Hyslop PH, Thornton T, Younkin SG, Vardarajan BN, Wang LS, Wendlund JR, Winslow AR, Haines J, Mayeux R, Pericak-Vance MA, Schellenberg G, Lunetta KL, Farrer LA. Transethnic genome-wide scan identifies novel Alzheimer’s disease loci. Alzheimer’s & Dementia. 2017;13:727–738. doi: 10.1016/j.jalz.2016.12.012. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib28] Koscik RL, Berman SE, Clark LR, Mueller KD, Okonkwo OC, Gleason CE, Hermann BP, Sager MA, Johnson SC. Intraindividual cognitive variability in middle age predicts cognitive impairment 8-10 years later: Results from the wisconsin registry for alzheimer’s prevention. Journal of the International Neuropsychological Society. 2016;22:1016–1025. doi: 10.1017/S135561771600093X. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib30] Larsson SC, Traylor M, Malik R, Dichgans M, Burgess S, Markus HS, CoSTREAM Consortium, on behalf of the International Genomics of Alzheimer’s Project Modifiable pathways in Alzheimer’s disease: Mendelian randomisation analysis. BMJ. 2017;359:j5375. doi: 10.1136/bmj.j5375. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib31] Loh P-R, Danecek P, Palamara PF, Fuchsberger C, A Reshef Y, K Finucane H, Schoenherr S, Forer L, McCarthy S, Abecasis GR, Durbin R, L Price A. Reference-based phasing using the Haplotype Reference Consortium panel. Nature Genetics. 2016;48:1443–1448. doi: 10.1038/ng.3679. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib32] Manolio TA, Collins FS, Cox NJ, Goldstein DB, Hindorff LA, Hunter DJ, McCarthy MI, Ramos EM, Cardon LR, Chakravarti A, Cho JH, Guttmacher AE, Kong A, Kruglyak L, Mardis E, Rotimi CN, Slatkin M, Valle D, Whittemore AS, Boehnke M, Clark AG, Eichler EE, Gibson G, Haines JL, Mackay TFC, McCarroll SA, Visscher PM. Finding the missing heritability of complex diseases. Nature. 2009;461:747–753. doi: 10.1038/nature08494. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib33] Marchini J, Howie B, Myers S, McVean G, Donnelly P. A new multipoint method for genome-wide association studies by imputation of genotypes. Nature Genetics. 2007;39:906–913. doi: 10.1038/ng2088. [DOI] [PubMed] [Google Scholar]

[bib34] McCarthy S, Das S, Kretzschmar W, Delaneau O, Wood AR, Teumer A, Kang HM, Fuchsberger C, Danecek P, Sharp K, Luo Y, Sidore C, Kwong A, Timpson N, Koskinen S, Vrieze S, Scott LJ, Zhang H, Mahajan A, Veldink J, Peters U, Pato C, van Duijn CM, Gillies CE, Gandin I, Mezzavilla M, Gilly A, Cocca M, Traglia M, Angius A, Barrett JC, Boomsma D, Branham K, Breen G, Brummett CM, Busonero F, Campbell H, Chan A, Chen S, Chew E, Collins FS, Corbin LJ, Smith GD, Dedoussis G, Dorr M, Farmaki A-E, Ferrucci L, Forer L, Fraser RM, Gabriel S, Levy S, Groop L, Harrison T, Hattersley A, Holmen OL, Hveem K, Kretzler M, Lee JC, McGue M, Meitinger T, Melzer D, Min JL, Mohlke KL, Vincent JB, Nauck M, Nickerson D, Palotie A, Pato M, Pirastu N, McInnis M, Richards JB, Sala C, Salomaa V, Schlessinger D, Schoenherr S, Slagboom PE, Small K, Spector T, Stambolian D, Tuke M, Tuomilehto J, Van den Berg LH, Van Rheenen W, Volker U, Wijmenga C, Toniolo D, Zeggini E, Gasparini P, Sampson MG, Wilson JF, Frayling T, de Bakker PIW, Swertz MA, McCarroll S, Kooperberg C, Dekker A, Altshuler D, Willer C, Iacono W, Ripatti S, Soranzo N, Walter K, Swaroop A, Cucca F, Anderson CA, Myers RM, Boehnke M, McCarthy MI, Durbin R, Haplotype Reference Consortium A reference panel of 64,976 haplotypes for genotype imputation. Nature Genetics. 2016;48:1279–1283. doi: 10.1038/ng.3643. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib35] Mukherjee S, Mez J, Trittschuh E, Saykin AJ. Genetic Data and Cognitively-Defined Late-Onset Alzheimer’s Disease Subgroups. bioRxiv. 2018 doi: 10.1101/367615. [DOI] [PMC free article] [PubMed]

[bib36] Naj AC, Jun G, Beecham GW, Wang L-S, Vardarajan BN, Buros J, Gallins PJ, Buxbaum JD, Jarvik GP, Crane PK, Larson EB, Bird TD, Boeve BF, Graff-Radford NR, De Jager PL, Evans D, Schneider JA, Carrasquillo MM, Ertekin-Taner N, Younkin SG, Cruchaga C, Kauwe JSK, Nowotny P, Kramer P, Hardy J, Huentelman MJ, Myers AJ, Barmada MM, Demirci FY, Baldwin CT, Green RC, Rogaeva E, St George-Hyslop P, Arnold SE, Barber R, Beach T, Bigio EH, Bowen JD, Boxer A, Burke JR, Cairns NJ, Carlson CS, Carney RM, Carroll SL, Chui HC, Clark DG, Corneveaux J, Cotman CW, Cummings JL, DeCarli C, DeKosky ST, Diaz-Arrastia R, Dick M, Dickson DW, Ellis WG, Faber KM, Fallon KB, Farlow MR, Ferris S, Frosch MP, Galasko DR, Ganguli M, Gearing M, Geschwind DH, Ghetti B, Gilbert JR, Gilman S, Giordani B, Glass JD, Growdon JH, Hamilton RL, Harrell LE, Head E, Honig LS, Hulette CM, Hyman BT, Jicha GA, Jin L-W, Johnson N, Karlawish J, Karydas A, Kaye JA, Kim R, Koo EH, Kowall NW, Lah JJ, Levey AI, Lieberman AP, Lopez OL, Mack WJ, Marson DC, Martiniuk F, Mash DC, Masliah E, McCormick WC, McCurry SM, McDavid AN, McKee AC, Mesulam M, Miller BL, Miller CA, Miller JW, Parisi JE, Perl DP, Peskind E, Petersen RC, Poon WW, Quinn JF, Rajbhandary RA, Raskind M, Reisberg B, Ringman JM, Roberson ED, Rosenberg RN, Sano M, Schneider LS, Seeley W, Shelanski ML, Slifer MA, Smith CD, Sonnen JA, Spina S, Stern RA, Tanzi RE, Trojanowski JQ, Troncoso JC, Van Deerlin VM, Vinters HV, Vonsattel JP, Weintraub S, Welsh-Bohmer KA, Williamson J, Woltjer RL, Cantwell LB, Dombroski BA, Beekly D, Lunetta KL, Martin ER, Kamboh MI, Saykin AJ, Reiman EM, Bennett DA, Morris JC, Montine TJ, Goate AM, Blacker D, Tsuang DW, Hakonarson H, Kukull WA, Foroud TM, Haines JL, Mayeux R, Pericak-Vance MA, Farrer LA, Schellenberg GD. Common variants at MS4A4/MS4A6E, CD2AP, CD33 and EPHA1 are associated with late-onset Alzheimer’s disease. Nature Genetics. 2011;43:436–441. doi: 10.1038/ng.801. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib37] Norton S, Matthews FE, Barnes DE, Yaffe K, Brayne C. Potential for primary prevention of Alzheimer’s disease: an analysis of population-based data. The Lancet Neurology. 2014;13:788–794. doi: 10.1016/S1474-4422(14)70136-X.

[bib38] Østergaard SD, Mukherjee S, Sharp SJ, Proitsi P, Lotta LA, Day F, Perry JRB, Boehme KL, Walter S, Kauwe JS, Gibbons LE, Alzheimer’s Disease Genetics Consortium, GERAD1 Consortium, EPIC-InterAct Consortium, Larson EB, Powell JF, Langenberg C, Crane PK, Wareham NJ, Scott RA. Associations between potentially modifiable risk factors and Alzheimer disease: a Mendelian randomization study. PLOS Medicine. 2015;12:e1001841. doi: 10.1371/journal.pmed.1001841.

[bib39] Paternoster L, Tilling K, Davey Smith G. Genetic epidemiology and Mendelian randomization for informing disease therapeutics: Conceptual and methodological challenges. PLOS Genetics. 2017;13:e1006944. doi: 10.1371/journal.pgen.1006944.

[bib40] Prince M, Bryce R, Albanese E, Wimo A, Ribeiro W, Ferri CP. The global prevalence of dementia: A systematic review and metaanalysis. Alzheimer’s & Dementia. 2013;9:63. doi: 10.1016/j.jalz.2012.11.007.

[bib41] Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MAR, Bender D, Maller J, Sklar P, de Bakker PIW, Daly MJ, Sham PC. PLINK: a tool set for whole-genome association and population-based linkage analyses. American Journal of Human Genetics. 2007;81:559–575. doi: 10.1086/519795.

[bib42] qlu-lab. Badgers. swh:1:rev:d6d1d181549d3df29639bd736be4a39d5a9d44aa. Software Heritage. 2024. https://archive.softwareheritage.org/swh:1:dir:e973443627f31ba29bcf4c4d176d59daeee8ee56;origin=https://github.com/qlu-lab/BADGERS;visit=swh:1:snp:cdc16e40d72e70869fb348b6bc08f67ad49b2bf8;anchor=swh:1:rev:d6d1d181549d3df29639bd736be4a39d5a9d44aa

[bib43] Reed B, Villeneuve S, Mack W, DeCarli C, Chui HC, Jagust W. Associations between serum cholesterol levels and cerebral amyloidosis. JAMA Neurology. 2014;71:195–200. doi: 10.1001/jamaneurol.2013.5390.

[bib44] Reitz C, Mayeux R. Alzheimer disease: epidemiology, diagnostic criteria, risk factors and biomarkers. Biochemical Pharmacology. 2014;88:640–651. doi: 10.1016/j.bcp.2013.12.024.

[bib45] Sager MA, Hermann B, La Rue A. Middle-aged children of persons with Alzheimer’s disease: APOE genotypes and cognitive function in the Wisconsin Registry for Alzheimer’s Prevention. Journal of Geriatric Psychiatry and Neurology. 2005;18:245–249. doi: 10.1177/0891988705281882.

[bib46] Seshadri S, Fitzpatrick AL, Ikram MA, DeStefano AL, Gudnason V, Boada M, Bis JC, Smith AV, Carassquillo MM, Lambert JC, Harold D, Schrijvers EMC, Ramirez-Lorca R, Debette S, Longstreth WT, Janssens ACJW, Pankratz VS, Dartigues JF, Hollingworth P, Aspelund T, Hernandez I, Beiser A, Kuller LH, Koudstaal PJ, Dickson DW, Tzourio C, Abraham R, Antunez C, Du Y, Rotter JI, Aulchenko YS, Harris TB, Petersen RC, Berr C, Owen MJ, Lopez-Arrieta J, Varadarajan BN, Becker JT, Rivadeneira F, Nalls MA, Graff-Radford NR, Campion D, Auerbach S, Rice K, Hofman A, Jonsson PV, Schmidt H, Lathrop M, Mosley TH, Au R, Psaty BM, Uitterlinden AG, Farrer LA, Lumley T, Ruiz A, Williams J, Amouyel P, Younkin SG, Wolf PA, Launer LJ, Lopez OL, van Duijn CM, Breteler MMB, CHARGE Consortium, GERAD1 Consortium, EADI1 Consortium. Genome-wide analysis of genetic loci associated with Alzheimer disease. JAMA. 2010;303:1832–1840. doi: 10.1001/jama.2010.574.

[bib47] Simons M, Keller P, Dichgans J, Schulz JB. Cholesterol and Alzheimer’s disease: is there a link? Neurology. 2001;57:1089–1093. doi: 10.1212/wnl.57.6.1089.

[bib48] Sleiman PMA, Grant SFA. Mendelian randomization in the era of genomewide association studies. Clinical Chemistry. 2010;56:723–728. doi: 10.1373/clinchem.2009.141564.

[bib49] Stern Y. Cognitive reserve in ageing and Alzheimer’s disease. The Lancet Neurology. 2012;11:1006–1012. doi: 10.1016/S1474-4422(12)70191-6.

[bib50] Valenzuela MJ, Sachdev P. Brain reserve and dementia: a systematic review. Psychological Medicine. 2006;36:441–454. doi: 10.1017/S0033291705006264.

[bib51] Vilhjálmsson BJ, Yang J, Finucane HK, Gusev A, Lindström S, Ripke S, Genovese G, Loh PR, Bhatia G, Do R, Hayeck T, Won HH, Schizophrenia Working Group of the Psychiatric Genomics Consortium, Discovery, Biology, and Risk of Inherited Variants in Breast Cancer (DRIVE) study, Kathiresan S, Pato M, Pato C, Tamimi R, Stahl E, Zaitlen N, Pasaniuc B, Belbin G, Kenny EE, Schierup MH, De Jager P, Patsopoulos NA, McCarroll S, Daly M, Purcell S, Chasman D, Neale B, Goddard M, Visscher PM, Kraft P, Patterson N, Price AL. Modeling linkage disequilibrium increases accuracy of polygenic risk scores. American Journal of Human Genetics. 2015;97:576–592. doi: 10.1016/j.ajhg.2015.09.001.

[bib52] Willer CJ, Li Y, Abecasis GR. METAL: fast and efficient meta-analysis of genomewide association scans. Bioinformatics. 2010;26:2190–2191. doi: 10.1093/bioinformatics/btq340.

[bib53] Yang J, Bakshi A, Zhu Z, Hemani G, Vinkhuyzen AAE, Lee SH, Robinson MR, Perry JRB, Nolte IM, van Vliet-Ostaptchouk JV, Snieder H, LifeLines Cohort Study, Esko T, Milani L, Mägi R, Metspalu A, Hamsten A, Magnusson PKE, Pedersen NL, Ingelsson E, Soranzo N, Keller MC, Wray NR, Goddard ME, Visscher PM. Genetic variance estimation with imputed variants finds negligible missing heritability for human height and body mass index. Nature Genetics. 2015;47:1114–1120. doi: 10.1038/ng.3390.

[bib54] Yavorska OO, Burgess S. MendelianRandomization: an R package for performing Mendelian randomization analyses using summarized data. International Journal of Epidemiology. 2017;46:1734–1739. doi: 10.1093/ije/dyx034.

[bib55] Zhu Z, Zheng Z, Zhang F, Wu Y, Trzaskowski M, Maier R, Robinson MR, McGrath JJ, Visscher PM, Wray NR, Yang J. Causal associations between risk factors and common diseases inferred from GWAS summary data. Nature Communications. 2018;9:224. doi: 10.1038/s41467-017-02317-2.
