Dissecting Alzheimer’s disease heterogeneity by cross-trait polygenic prediction

William F Li; Nabil Mohammed; David A Bennett; Manolis Kellis; Yosuke Tanigawa

doi:10.64898/2026.05.15.725551

This is a preprint.

It has not yet been peer reviewed by a journal.

[Preprint]. 2026 May 15:2026.05.15.725551. [Version 1] doi: 10.64898/2026.05.15.725551

William F Li^1,2,3, Nabil Mohammed^4,5, David A Bennett^6,7, Manolis Kellis^1,2,†, Yosuke Tanigawa^1,2,4,5,†

PMCID: PMC13192582 PMID: 42182496

Abstract

Mapping the genetic basis of inter-individual heterogeneity in multifactorial diseases opens the door to mechanistic insights and opportunities for targeted intervention. In Alzheimer’s disease (AD), clinical and pathological heterogeneity is well recognized, but genetic dissection is limited by a lack of well-powered cohorts with deep phenotypic characterization. Here, we introduce a polygenic score (PGS) analysis strategy to address these limitations by leveraging the inherent pleiotropy in complex trait genetics. We perform a cross-cohort, cross-trait application of pre-trained PGS, integrating 713 UK Biobank-derived PGS with 36 deep AD phenotypes across 1678 ROSMAP participants. We identify 268 statistically significant (FDR<0.1) associations between 12 prioritized PGS and 36 AD phenotypes. Prioritized PGS include blood lipid measurements, inflammatory biomarkers, and cancer traits; observed AD phenotypes include cognition, amyloid, and tangles. Of the 268 associations, 49 persist with APOE-excluded PGS. Predictive models trained on multiple prioritized PGS outperform the AD PGS or APOE alone for predicting amyloid and cognition. Lastly, our approach identifies six individual-level AD polygenic subtypes supported by distinct pathological patterns. Overall, we combine large-scale biobank resources and deeply-phenotyped cohorts using PGS, reveal genetic features underlying AD heterogeneity, and provide a general model for stratifying heterogeneous disease-focused cohorts using genomics.

Introduction

Inter-individual heterogeneity is a fundamental feature of complex traits, where individuals with the same diagnosis can differ profoundly in their clinical course and underlying etiology^1,2. For instance, Alzheimer’s disease (AD), the most prevalent neurodegenerative disease worldwide³, presents with significant variability in neuropathological markers^4,5, cognitive decline patterns^6–8, and disease genetics^9,10. Notably, neuropathological markers, including neuritic plaques, diffuse plaques, and neurofibrillary tangles, differ in their regional distribution^11–13, molecular correlates^14,15, and clinical implications^16,17. Differences in AD neuropathology are associated with variable treatment responses to modern disease-modifying therapies^18–23. Mapping the genetic basis of AD phenotypes provides a scalable way to understand disease heterogeneity, predict individual-level manifestations, and inform clinical strategies tailored to patient subgroups.

Two complementary approaches provide promising insights into the biological basis of AD heterogeneity but fall short in directly connecting genetics with phenotypic variation. On the one hand, focused disease-specific cohort studies, such as the Religious Orders Study and Rush Memory and Aging Project (ROSMAP), have extensively profiled genetic variation^24,25, cell-type-resolved molecular signatures^26–33, neuropathologic measurements^25,34,35, and cognitive change³⁶ in thousands of aging individuals. On the other hand, large-scale genome-wide association studies (GWAS) on hundreds of thousands of individuals have revealed at least 75 AD risk variants and implicated biological processes^10,37–39. However, these two existing approaches have insufficient statistical power to directly map the genetic contributions to AD phenotypic heterogeneity. Due to the clinical and technical obstacles in acquiring and profiling human tissue at scale, even the largest focused cohorts, such as ROSMAP, have orders of magnitude fewer individuals compared to the hundreds of thousands used in modern GWAS meta-analysis^40–46.

Large-scale biobanks, such as the UK Biobank (UKB)^47–51, have accelerated the discovery of common variant associations across thousands of traits using GWAS^49,52,53. Polygenic scores (PGS) aggregate these variant-trait associations from hundreds to thousands of variants into a single score to estimate the genetic liability of a single trait^54,55. For AD, individuals in the highest decile of disease-specific PGS distributions have up to a 1.9-fold greater risk of disease compared to those in the lowest decile^10,56. However, single-trait PGS models capture only a fraction of the polygenic signal relevant to a given trait and cannot predict deep disease phenotypes not directly assessed in biobank-scale studies.

The pervasive pleiotropy in complex trait genetics offers a unique opportunity to investigate the genetic basis of phenotypes not directly measured in biobank-scale studies. Indeed, 20–60% of genetic associations are estimated to be pleiotropic^57–61. For disease case-control prediction, leveraging pleiotropy using cross-trait PGS has led to improved predictive performance beyond what any individual single-trait PGS achieves^62–67. Despite these advancements, little is known about whether cross-trait PGS approaches can be used to characterize the heterogeneity of a single disease. Motivated by this gap, we hypothesize that comprehensive PGS libraries trained on health phenotypes in biobank populations can be applied cross-trait to understand the heterogeneity of disease-related deep phenotypes in smaller, richly phenotyped cohorts.

Here, we address prior limitations and present an application of cross-cohort, cross-trait PGS analysis to leverage pleiotropy for the study of AD heterogeneity. Our analysis bridges 713 UKB-derived PGS⁶⁸ with 36 observed deep AD phenotypes across 1678 ROSMAP participants⁶⁹ to uncover cross-trait associations for deep phenotypes spanning the AD phenome. We report that many of the cross-trait associations reflect polygenic influences beyond the APOE locus. The genomic loci underlying prioritized PGS models are enriched for involvement in trait-relevant biological processes that facilitate the interpretation of the association results. We further show that combining information across multiple PGS improves the prediction of AD phenotypes and identifies six genetically defined subtypes marked by differing pathological patterns. Overall, our work highlights the contributions of polygenic signals to inter-individual variation in AD and demonstrates an application of PGS for dissecting the genetic basis of phenotypic heterogeneity in disease.

Methods

Compliance with ethical regulations and informed consent

We analyzed individual-level genetic and phenotypic data from the Religious Orders Study/Rush Memory and Aging Project (ROSMAP)⁶⁹, and we derived previously characterized polygenic score (PGS) models from prior work on the UK Biobank resource^50,51,68. The ROSMAP study consists of two longitudinal cohort studies of aging and dementia with extensive clinical and pathological phenotypic measurements (n=4067). Genotyping information is now available for nearly all autopsied participants⁷⁰; we used an early subset of 1681 genotyped individuals²⁴. APOE was directly genotyped⁷¹. Participants enroll without known dementia. All participants provided informed consent and repository consent and signed an Anatomic Gift Act. An Institutional Review Board of Rush University Medical Center approved both the Religious Orders Study and the Rush Memory and Aging Project⁶⁹.

This research has been conducted using the UK Biobank Resource under Application Number 21942, “Integrated models of complex traits in the UK Biobank” (https://www.ukbiobank.ac.uk/enable-your-research/approved-research/integrated-models-of-complex-traits-in-the-uk-biobank). All participants of the UK Biobank resource provided written informed consent. More information is available at https://www.ukbiobank.ac.uk/explore-your-participation/basis-of-your-participation/.

ROSMAP phenotype selection and processing

We obtained 126 clinical and pathological ROSMAP phenotypic variables described in the Rush Alzheimer’s Disease Center Research Resource Sharing Hub⁶⁹ (Table S1), now with a total of 4067 individuals profiled for these phenotypes with a complete clinical evaluation. We used the 85 quantitative phenotypes. We focused on a subset of the individuals (n=1678) with both genotype (n=1681) and phenotype (n=4067) data for all downstream analyses.

To focus on ROSMAP phenotypes relevant to AD, we evaluated associations with two AD diagnosis variables and kept 37 phenotypic variables with statistically significant associations. Using logistic regression, we tested each ROSMAP phenotype against a binary clinical diagnosis (made by a neurologist at the time of death following review of select clinical data without post-mortem data⁷²) and against a binary NIA Reagan score for pathologic AD⁷³. For each association, the diagnosis served as the dependent variable, while each ROSMAP phenotype was the independent variable. If a ROSMAP phenotype reached p<1 × 10⁻⁸ in both logistic regression models, we considered it an AD phenotype and used it for further analysis. The TDP-43 pathology phenotype was generated as previously described⁷⁴, resulting in a final count of 36 distinct AD-related quantitative phenotypes. For visualization, we performed Z-score standardization for each phenotype, ignoring missing values (Figure S1). For downstream analysis, we performed mean imputation for each of the resulting 36 ROSMAP phenotypes across all 1678 ROSMAP individuals in the joint dataset.
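The two preprocessing steps described above (standardization that ignores missing values, and mean imputation) can be illustrated with a minimal sketch. The function names and the toy phenotype column are hypothetical, and the use of the population standard deviation is an assumption; the text does not state which estimator was used.

```python
import math

def zscore_ignore_missing(values):
    """Z-score a phenotype column; missing entries (None) stay None."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    # Assumption: population SD over the observed entries only.
    sd = math.sqrt(sum((v - mean) ** 2 for v in observed) / len(observed))
    return [None if v is None else (v - mean) / sd for v in values]

def mean_impute(values):
    """Replace missing entries (None) with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]

# Hypothetical toy phenotype column with one missing value.
phenotype = [1.0, None, 3.0, 5.0]
z = zscore_ignore_missing(phenotype)   # used for visualization
imputed = mean_impute(phenotype)       # used for downstream analysis
```

In this sketch the missing entry is left untouched for visualization and replaced by the observed mean (3.0) for downstream analysis.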

To characterize the phenotypic heterogeneity, we performed a PCA on the 36 variables over the 1678 ROSMAP individuals; for the input data to the PCA, we centered and standardized all variables to have a mean of 0 and a standard deviation of 1 over the 1678 individuals. We then quantified the variance explained by each component and visualized latent components using biplot visualization^75,76, where we used scatter plots to show the ROSMAP individuals (observations) projected onto the PCs with overlaid arrows to indicate the PC loadings of each phenotype (variables) to the respective PCs (Figure S3).

ROSMAP genotype quality control, liftOver, and imputation

For the ROSMAP genotype data, we obtained raw, unimputed genotype data from the Synapse Alzheimer’s disease (AD) Knowledge portal (accession ID: syn17008939)⁷⁷. The genotyping was performed in two batches with the hg18 reference, one at the Broad Institute using Affymetrix GeneChip 6.0 for 1126 individuals with 749k SNPs per individual, and the other at the Translational Genomics Institute (TGen) for 582 individuals with 653k SNPs per individual²⁴, for 1708 total individuals with genotype data. We performed quality control, liftOver, and imputation to prepare the genetic data on the hg19 reference, which was used in the PGS models in the UKB⁶⁸ (Figure S2).

First, we applied a missingness filter on the individuals with a 10% threshold, which filtered out 7 individuals, for a total of 1701 individuals (1120 from the Broad, 581 from TGen). We applied a missingness filter on the variants with a 10% threshold.

Second, we converted the genomic coordinates from hg18 to hg19 using the variant ID (rsID) in dbSNP⁷⁸ (version 155) and aligned the alleles using the GRCh37 reference genome⁷⁹. For multi-allelic sites, we kept the variant at the site only if one of the alleles matched the hg19 reference sequence. We confirmed that 99.5% of the variants in the original hg18 dataset were mapped to the hg19 coordinates. We additionally checked our genotyping data against the Haplotype Reference Consortium, which showed 99.7% rsID match^80,81.

Third, we applied genotype imputation followed by an additional round of quality control. To that end, we split the dataset into two batches based on the SNP Array batch each individual belonged to (Broad or TGen). Within each batch, we imputed the genotype data through the Michigan Imputation Server with Eagle 2.4.1 phasing and Minimac4 imputation^82–84. The imputation pipeline defines about 160 sub-chromosomal chunks (161 for Broad and 157 for TGen) and checks the chunk-level missingness of the input genotype for each individual. We removed a total of 20 individuals (14 from Broad and 6 from TGen), each of whom had more than 50% variant missingness in at least one of the 160 chunks. We performed genotype imputation for autosomes and chromosome X separately: the 21 individuals with over 50% missingness in a chromosome X chunk only (10 from Broad and 11 from TGen) have no imputed chromosome X genotype data, but are retained in the dataset; there is additionally 1 retained individual from the Broad batch with no imputed chromosome 20 genotype data. Overall, after imputation, we have a total of 1681 individuals, consisting of 1106 from the Broad batch with 39.6 million imputed variants and 575 from the TGen batch with 38.2 million imputed variants. Of the 1681 individuals, 21 have no chromosome X genotype data, and 1 has no chromosome 20 genotype data.

To inspect the genetic ancestry of individuals, we used pre-computed principal component analysis (PCA) loadings of genetic variants from the Human Genome Diversity Project and the 1000 Genomes Project as a reference^85–87 and projected ROSMAP individuals passing genetic quality control into the latent principal components (PCs) (Figure S4).

Polygenic scoring of ROSMAP individuals

We used sparse UKB PGS models from our previous study, one of the largest collections of PGS models trained on the same cohort with the unified PGS training procedure⁶⁸. In the previous study, we applied the Batch Screening Iterative Lasso algorithm⁸⁸ to 269,704 unrelated white British individuals in the UKB resource, resulting in PGS models for a range of traits, including disease outcomes, cancer registry data, family history for disease, blood biomarkers, and imaging-derived measures, such as volumes of the gray matter in specific brain regions^50,68. We analyzed PGS models reported with statistically significant predictive performance (nominal p<2.5 × 10⁻⁵), except for those for time-to-event traits, resulting in 713 PGS models analyzed in the study (Table S2). Among the two PGS models trained for AD-related phenotypes, we focused on the PGS model trained for family history of AD (n_case=41,451 of 269,704 in UKB) as the primary PGS model for AD (“AD PGS”), given its superior statistical significance in the UKB hold-out test set (p=9.8 × 10⁻¹⁰⁷ vs p=6.3 × 10⁻²⁸) and broader genetic basis (139 underlying genetic variants vs 15 underlying variants). In contrast, “prostate cancer PGS” throughout this paper refers primarily to the PGS model trained for prostate cancer risk alone rather than the one for family history, given the superior statistical performance in this case without family history (p=8.9 × 10⁻⁸⁷ vs p=2.8 × 10⁻⁴⁰).

We computed PGS for each of the 1678 ROSMAP individuals using each of the 713 PGS models in our UKB-derived PGS library (Figure S5a). We performed the computation using PLINK v2.00⁸⁹ to find the average score over the alleles, with missing variants imputed by the mean allele frequency.
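As a rough illustration of the scoring step, the sketch below computes a per-individual score as the weighted allele dosage averaged over alleles, with missing genotypes replaced by the per-variant mean dosage. This is a simplified stand-in, not PLINK's implementation; the function name, the toy dosages, and the weights are hypothetical, and the denominator (two alleles per variant) is an assumption about how the average is taken.

```python
def polygenic_score(dosages, weights):
    """Average weighted dosage per allele for each individual.

    dosages: one list per individual, with per-variant effect-allele
             dosages in [0, 2] or None when the genotype is missing.
    weights: per-variant PGS model coefficients.
    """
    n_var = len(weights)
    # Per-variant mean dosage over individuals with an observed genotype,
    # used to fill in missing genotypes.
    mean_dosage = []
    for j in range(n_var):
        observed = [row[j] for row in dosages if row[j] is not None]
        mean_dosage.append(sum(observed) / len(observed))
    scores = []
    for row in dosages:
        total = 0.0
        for j, w in enumerate(weights):
            d = mean_dosage[j] if row[j] is None else row[j]
            total += w * d
        # Assumption: average over 2 alleles per variant.
        scores.append(total / (2 * n_var))
    return scores

# Hypothetical toy data: 3 individuals, 3 variants, one missing genotype.
toy_dosages = [[0, 1, 2], [2, None, 0], [1, 1, 1]]
toy_weights = [0.5, -0.2, 0.1]
toy_scores = polygenic_score(toy_dosages, toy_weights)
```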

Transferability analysis of UKB AD PGS for prediction within ROSMAP

To assess the transferability of UKB PGS models on ROSMAP, we assigned each of the ROSMAP individuals a binary label of AD (NIA Reagan score<=2) or non-AD (NIA Reagan score>2). We excluded individuals with missing NIA Reagan scores from this analysis (407 individuals excluded, with 1271 remaining for the analysis). We computed the receiver operating characteristic (ROC) curve for AD PGS in predicting the binary AD label as well as the area under the ROC curve (AUROC) (Figure S6).
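The labeling and AUROC computation can be sketched as follows. The AUROC is computed here through its rank interpretation (the probability that a randomly chosen case has a higher score than a randomly chosen control, with ties counted as one half), which is equivalent to the area under the ROC curve. The function names and toy values are hypothetical.

```python
def binarize_nia_reagan(score):
    """Return 1 (AD) for NIA Reagan <= 2, 0 (non-AD) for > 2, None if missing."""
    if score is None:
        return None
    return 1 if score <= 2 else 0

def auroc(labels, scores):
    """Probability that a random case outranks a random control (ties = 0.5)."""
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))

# Hypothetical toy data; the individual with a missing score is excluded.
nia_reagan = [1, 2, 3, 4, None, 2]
ad_pgs = [0.9, 0.1, 0.3, -0.5, 0.2, 0.4]
kept = [(binarize_nia_reagan(n), p) for n, p in zip(nia_reagan, ad_pgs)
        if n is not None]
toy_labels = [y for y, _ in kept]
toy_auroc = auroc(toy_labels, [p for _, p in kept])
```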

Covariates used for cross-trait association analysis

To adjust for covariates in the two-stage cross-trait association, we included age at baseline, age at death, sex, a binary variable for the batch in which genotyping was performed (TGen or Broad), and the 10 leading population PCs (Table S1). We computed population PCs using PLINK v2.00⁸⁹, the Human Genome Diversity Project and the 1000 Genomes Project reference weights^85,86, and publicly available scoring scripts⁸⁷.

Two-stage cross-trait polygenic association

We conducted cross-trait analysis using UKB-derived PGS and observed ROSMAP AD phenotypes using a two-stage procedure. First, to select PGS relevant to AD, we performed the first stage of cross-trait association. We selected five global observed AD phenotypes (amyloid density, tangle density, global pathology, global cognition level, and global cognition resilience slope). For each trait (both UKB PGS and observed in ROSMAP), we performed z-score standardization across the ROSMAP individuals. We then performed a pairwise linear regression between each of the 5 observed phenotypes as the dependent variable and each PGS in our entire library of 713 PGS as the independent variable, with the described covariates. For each pairwise linear regression, we recorded the effect size (standardized regression coefficient) of the association, the standard error in the effect size estimate, and the p-value of the linear relationship (Figure S7, Table S3). We computed the false discovery rate (FDR) over the p-values of all 3565 (= 5 ROSMAP phenotypes * 713 UKB PGS) pairwise associations using the Benjamini-Hochberg procedure⁹⁰. We retained PGS with at least one pairwise association at FDR<0.5, yielding a collection of 12 AD-relevant PGS (Figure 2a).
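The multiple-testing step of Stage 1 can be sketched as a Benjamini-Hochberg adjustment over all pairwise p-values, followed by retaining every PGS with at least one association below a chosen FDR cutoff. The function names and the toy associations are hypothetical, and the cutoff passed in the example is purely illustrative.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def select_pgs(associations, threshold):
    """Keep PGS names with at least one association below the FDR threshold.

    associations: list of (pgs_name, phenotype_name, p_value) tuples.
    """
    q = benjamini_hochberg([p for _, _, p in associations])
    return sorted({name for (name, _, _), qi in zip(associations, q)
                   if qi < threshold})

# Hypothetical toy associations; the threshold is illustrative only.
toy_assoc = [("pgs_a", "amyloid", 0.001), ("pgs_a", "tangles", 0.04),
             ("pgs_b", "amyloid", 0.03), ("pgs_c", "amyloid", 0.5)]
toy_q = benjamini_hochberg([p for _, _, p in toy_assoc])
toy_selected = select_pgs(toy_assoc, threshold=0.1)
```

Because the adjustment depends on the number of tests, the FDR is computed once over the full set of pairwise p-values rather than separately per phenotype.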

Figure 2. — **(a-b)** We show associations of the 12 most relevant PGS (x-axis) with 5 global **(a)** and 36 distinct **(b)** observed AD phenotypes (y-axis), each sorted by hierarchical clustering (dendrograms). The color represents the strength of association between pairs of PGS and observed AD phenotypes, with asterisks marking statistically significant associations. **(c-f)** We show ROSMAP individuals plotted by PGS z-scores (x-axis) and observed plaque burden (y-axis). We show the line of association through the centroid of the data points, where the slope of the line is given by the effect size computed in the stage 2 analysis, and the shaded error regions are 95% confidence intervals (Methods).

To thoroughly characterize AD-related phenotypes, we performed the second stage of cross-trait association. We calculated pairwise associations between the 12 PGS and 36 observed AD phenotypes using multivariable linear regression, with the covariates as described above. For each variable (both UKB PGS and observed ROSMAP phenotype), we performed z-score standardization across the ROSMAP individuals. For each pairwise linear regression, we recorded the effect size of the association, the standard error in the effect size calculation, and the p-value of the linear relationship (Figure 2b, Table S4). We computed the FDR over the p-values of 432 pairwise associations (= 36 ROSMAP phenotypes * 12 UKB PGS) using the Benjamini-Hochberg procedure.

To visualize specific pairwise associations from the second stage of cross-trait association, we plotted individuals by observed ROSMAP phenotype value and standardized z-score PGS (Figure 2c–f). We furthermore plotted a line through the centroid of the individual-level coordinates with slope calculated from the pairwise association effect size; in particular, because the phenotype values are not z-score normalized in this plot, the slope shown in the plot is the recorded effect size (i.e., standardized regression coefficient) from Table S4 multiplied by the standard deviation of the observed ROSMAP phenotype over the individuals. The shaded region around the plotted line was computed as a 95% confidence interval by multiplying the standard error from Table S4 by 1.96, then likewise multiplying by the standard deviation of the observed phenotype.
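The rescaling described above can be written out as a short sketch: the standardized coefficient and its standard error are both multiplied by the phenotype standard deviation, and the line is anchored at the centroid of the plotted points. The function name and toy values are hypothetical, and the use of the sample standard deviation is an assumption.

```python
import statistics

def line_and_interval(beta_std, se_std, phenotype, pgs_z):
    """Convert a standardized effect size into a plotted line on the raw scale.

    Returns (slope, intercept, (slope_low, slope_high)) where the interval
    is the 95% confidence interval of the slope.
    """
    sd_y = statistics.stdev(phenotype)      # assumption: sample SD
    slope = beta_std * sd_y                 # effect per 1 SD of PGS, raw units
    half_width = 1.96 * se_std * sd_y       # 95% CI half-width, raw units
    # The line passes through the centroid (mean PGS z-score, mean phenotype).
    x0 = statistics.fmean(pgs_z)
    y0 = statistics.fmean(phenotype)
    intercept = y0 - slope * x0
    return slope, intercept, (slope - half_width, slope + half_width)

# Hypothetical toy values.
toy_phenotype = [2.0, 4.0, 6.0, 8.0]
toy_pgs_z = [-1.5, -0.5, 0.5, 1.5]
toy_slope, toy_intercept, toy_ci = line_and_interval(0.1, 0.02,
                                                     toy_phenotype, toy_pgs_z)
```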

APOE-exclusion association analysis

To evaluate the impact of the APOE region on our findings, we excluded the genetic variants in the APOE locus (chr19:45,176,340–45,447,221, hg19) from each UKB PGS model, following the literature^91–94, and repeated the second stage of cross-trait association using the modified 12 PGS, recording effect sizes, standard errors, p-values, and FDR calculated with Benjamini-Hochberg over the 432 p-values (Figure 3a,c,e,g,i, Table S5). We additionally encoded APOE as a continuous gradient variable (APOE-gradient)⁹⁵ using the following genotype mapping: ε2/ε2 maps to −2, ε2/ε3 maps to −1, ε2/ε4 maps to −0.5, ε3/ε3 maps to 0, ε3/ε4 maps to 1, and ε4/ε4 maps to 2. Of the 1678 ROSMAP individuals analyzed, 5 had missing APOE genotypes; we imputed these using the mode genotype (ε3/ε3). We tested the association between APOE-gradient and each of the 36 AD-related observed ROSMAP phenotypes using multivariable linear regression with the described covariates. We recorded the effect sizes, standard errors, p-values, and Benjamini-Hochberg FDR over the 36 p-values (Figure 3k, Table S6).
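A minimal sketch of the two APOE-related steps follows: dropping PGS model variants that fall inside the stated hg19 interval, and encoding the APOE genotype as a gradient with mode imputation for missing values. The mapping and the interval are taken from the text; the function names, the tuple layout for variants, and the toy weights are hypothetical.

```python
from collections import Counter

# Genotype-to-gradient mapping as stated in the text.
APOE_GRADIENT = {
    ("e2", "e2"): -2.0, ("e2", "e3"): -1.0, ("e2", "e4"): -0.5,
    ("e3", "e3"): 0.0, ("e3", "e4"): 1.0, ("e4", "e4"): 2.0,
}
# APOE locus on hg19 as stated in the text (chromosome, start, end).
APOE_REGION = ("19", 45_176_340, 45_447_221)

def exclude_apoe_region(variants):
    """Drop (chrom, position, weight) entries that fall in the APOE locus."""
    chrom, start, end = APOE_REGION
    return [v for v in variants
            if not (v[0] == chrom and start <= v[1] <= end)]

def apoe_gradient(genotypes):
    """Map genotypes to the gradient; None is imputed with the mode genotype."""
    observed = [g for g in genotypes if g is not None]
    mode = Counter(observed).most_common(1)[0][0]
    return [APOE_GRADIENT[mode if g is None else g] for g in genotypes]

# Hypothetical toy PGS model: one variant inside the locus, two outside.
toy_model = [("19", 45_411_941, 1.2), ("19", 44_000_000, 0.1),
             ("1", 45_200_000, 0.3)]
toy_model_no_apoe = exclude_apoe_region(toy_model)

# Hypothetical toy genotypes with one missing value.
toy_genotypes = [("e3", "e3"), ("e3", "e4"), None, ("e2", "e3"), ("e3", "e3")]
toy_gradient = apoe_gradient(toy_genotypes)
```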

Figure 3. — **(left)** We show association profiles for individual PGS, showing the effect sizes (y-axis) between each genetic feature (a PGS or *APOE*) and a selection of 6 observed AD phenotypes (x-axis). Error bars show the 95% confidence interval. Each plotted effect size is colored by whether its respective genetic feature has an *APOE* contribution, and with shapes and opacity based on statistical significance. **(right)** We show effect sizes by model weights (y-axis) over the span of the genome (x-axis) for the respective PGS models. We annotate representative variants with large absolute effect sizes by rsID. Vertical red dotted lines demarcate the *APOE* region. Shapes of points in the plots are consequence groups. PTV, protein-truncating variant; PAV, protein-altering variant; PCV, proximal coding variant; UTR, untranslated region. Shown are PGS for AD **(a)**, ApoB **(c)**, use of sun/UV protection **(e)**, hayfever or allergic rhinitis risk **(g)**, prostate cancer **(i)**, as well as the *APOE* allele **(k)**. We also show the PGS model coefficients (y-axis) by genomic position (x-axis) for the prioritized PGS, with each variant annotated by consequence group (color). We show the UKB PGS models for AD **(b)**, Apolipoprotein B **(d)**, Use of sun/UV protection (always) **(f),** Doctor diagnosed hayfever or allergic rhinitis **(h)**, and Prostate cancer **(j).**

Biological characterization of PGS variants

To better understand the AD-relevant PGS, we downloaded the UKB PGS model weights⁶⁸ for prioritized PGS, and we plotted variant coefficients in these models by genomic position (Figure 3b,d,f,h,j). We queried the implicated biological functions of top loci using the Open Targets Platform⁹⁶.

For each of the prioritized AD-relevant PGS, we took the 1000 loci in the respective PGS model with the greatest absolute weight or all the loci if there are fewer than 1000 as the representative loci, and subsequently applied the genomic regions enrichment of annotations tool (GREAT, v4.0.4)^97,98 to evaluate the statistical enrichment of biological processes in Gene Ontology (GO)^99,100. We filtered the GO biological processes by hypergeometric FDR<0.05, then displayed them by binomial FDR and region-fold enrichment (Figure 4a–b, Table S7–S8).

Figure 4. — To evaluate the biological relevance of prioritized PGS models, we applied the Genomic Regions Enrichment of Annotations Tool (GREAT) on the top 1,000 loci from two example PGS models (Methods). (**a-b**) We show the binomial fold enrichment (x-axis) and the statistical significance (binomial FDR) (y-axis) for the PGS models for ApoB (a) and prostate cancer (b). We highlight the significant (FDR < 5%, horizontal dashed line) enrichment with >1.5 binomial fold-changes (vertical dashed line) in color. (c) Semantic clustering (color) of the top 25 most significantly (x-axis) enriched GO terms (y-axis) for the ApoB PGS model reveals eight key processes (Methods, Figure S8).

To group similar enriched GO terms, we performed semantic clustering using the Global Vectors for Word Representation (GloVe), a 50-dimensional word vector mapping pretrained on Wikipedia 2024 + Gigaword 5^101–103. Specifically, we focused on enriched GO terms (hypergeometric FDR<0.05 and binomial FDR<10⁻¹⁵) and removed words with limited semantic meaning (e.g., “of”, “to”, “−”). To obtain the semantic vector of a GO term, we averaged word-level vector representations for each component word. We applied k-means clustering on the semantic representation of GO terms, resulting in clusters of semantically similar GO terms (Figure 4c, Figure S8). In our ApoB PGS model analysis, we manually identified a unifying biological theme for the eight identified clusters.
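The term-level embedding step can be sketched as below: each GO term is tokenized, low-content words are dropped, and the remaining word vectors are averaged. The three-dimensional toy embedding table stands in for the pretrained 50-dimensional vectors and is entirely hypothetical, as are the function names; skipping out-of-vocabulary words is also an assumption. The resulting term vectors would then be passed to k-means clustering.

```python
import math

# Words with limited semantic meaning, following the examples in the text.
STOPWORDS = {"of", "to", "-", "−"}

def term_vector(term, embeddings):
    """Average the word vectors of a GO term's informative words."""
    dim = len(next(iter(embeddings.values())))
    # Assumption: words absent from the embedding table are skipped.
    words = [w for w in term.lower().split()
             if w not in STOPWORDS and w in embeddings]
    if not words:
        return [0.0] * dim
    return [sum(embeddings[w][d] for w in words) / len(words)
            for d in range(dim)]

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical 3-dimensional toy embeddings (real GloVe vectors are larger).
toy_embeddings = {
    "lipid": [1.0, 0.0, 0.0],
    "transport": [0.0, 1.0, 0.0],
    "regulation": [0.0, 0.0, 1.0],
    "cholesterol": [0.9, 0.1, 0.0],
}
vec_a = term_vector("regulation of lipid transport", toy_embeddings)
vec_b = term_vector("cholesterol transport", toy_embeddings)
similarity = cosine(vec_a, vec_b)
```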

Multiple-PGS predictive modeling and principal component genetic feature computation

To evaluate the potential of a multiple-PGS model in predicting observed ROSMAP AD phenotypes, we compared the performance of our selected PGS with known genetic features implicated in AD. We split the 1678 individuals randomly into training (70%, n=1174) and testing (30%, n=504) sets. To generate an additional set of genetic features and to better understand the space of PGS, we performed a PCA with centering and standardization of the 12 AD-relevant PGS for individuals in the training set to generate PC loadings (Figure 5a–b). We computed PGS principal components (PCs) among the test set by projecting test set individuals onto these PCs (Figure S5b). We performed a biplot visualization^75,76, showing the PC scores of the test set individuals as a scatterplot overlaid with arrows showing the contributions of each PGS to the respective PC (Figure 5c–e). We chose the specific PCs to show on the biplot first by variance explained (PC1, PC2) and then by identifying a PC that we noted to have a strong contribution from PGS with APOE-independent association findings (PC6).

Figure 5. — **(a)** For principal components (PCs) characterized from 12 prioritized PGS (PGS PCs) (x-axis), we show the proportion of variance explained by each component (y-axis) in the training set (n=1174, 70%). We show individual components in the bar plot and cumulative variance explained in the line plot. **(b)** For each PGS PC (x-axis), we show the relative importance of contributing PGS (y-axis) as absolute PCA loadings (color). We show hierarchical clustering as dendrograms. (c-e) Biplot representing the projected PC scores of test set (n=504, 30%) individuals (dot) and the PGS loadings for each PC (arrows). The color represents the phenotypic value of neuritic plaque burden. We show PC1 vs. PC2 in (c), PC1 vs. PC6 in (d), and PC2 vs. PC6 in (e), selected based on the overall variance explained and the relevance of the *APOE* loci in the PC (Methods). Violet outline in (e) emphasizes that PC2 separates AD PGS from cholesterol PGS. **(f)** We show PGS-based AD polygenic subtype assignments of all 1678 ROSMAP individuals (color) as a biplot of PGS PC1 (x-axis) and PGS PC2 (y-axis), with overlaid PGS arrows as in c. **(g)** We show the distribution of observed AD phenotype value (color) within each AD polygenic subtype for selected variables (x-axis). We used the Wilcoxon rank-sum test to compare the distributions for each variable (Methods). **(h)** For each PGS-based AD polygenic subtype (x-axis), we show the number of ROSMAP individuals (y-axis) stratified by the *APOE* allelotype (color).

We then compared five different modeling schemes for the prediction of the variation of six observed AD phenotypes and two overall AD diagnosis criteria using extreme gradient boosting (XGBoost)¹⁰⁴, training 40 total predictive models. In the first scheme, we trained the XGBoost predictors using only covariates. In the second, we used the quantified APOE allele. In the third, we trained the predictors using AD PGS only. In the fourth, we used all 12 AD-relevant PGS from the Stage 1 association. In the fifth, we used the top 8 PCs computed from PCA of PGS values in the training set. Each scheme included the previously described covariates as additional predictor features for the observed AD phenotypes. In each scheme, we trained 8 predictors, one for each global AD phenotype or AD diagnosis criterion, and used 200 rounds of boosting to train each predictor with default parameters otherwise.

To evaluate the predictors trained in each scheme, we first used each predictor to predict values for its global observed AD phenotype among test set individuals, and we then computed the Pearson correlation coefficient between predicted observed AD phenotype values and ground truth values (Table 1, Table S9). We computed Shapley values for the contribution of each genetic feature in the multiple-PGS prediction schemes (Figure S9).
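The evaluation metric itself is straightforward; a minimal sketch of the Pearson correlation between predicted and observed values in a held-out set is shown below. The function name and the toy values are hypothetical and do not correspond to any result in Table 1.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical toy test-set values.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
toy_r = pearson_r(predicted, observed)
```

Pearson's r is invariant to the scale and offset of the predictions, so it measures how well a predictor ranks and linearly tracks the phenotype rather than its calibration.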

Table 1.

Predictive performance of gradient-boosted models.

	Covariates only	APOE	AD PGS	All-PGS	PGS PCs
Global cognitive function (19 tests)	0.059	0.138	0.106	0.190	0.184
Global AD pathology burden	0.117	0.242	0.205	0.260	0.224
Neuritic plaque burden (5 regions)	0.078	0.241	0.211	0.257	0.291
Amyloid level (% cortex area, 8 brain regions)	0.235	0.275	0.219	0.265	0.305
Diffuse plaque burden (5 regions)	0.069	0.149	0.168	0.149	0.194
Tangle density (IHC, 8 brain regions)	0.061	0.159	0.063	0.027	0.109
Binarized NIA Reagan Score (neuropathologist postmortem tissue AD diagnosis)	0.126	0.228	0.198	0.160	0.194
Binarized Consensus Cognitive Diagnosis (neurologist/neuropsychologist clinical course AD diagnosis)	0.065	0.139	0.084	0.052	0.128

We evaluated the predictive performance (Pearson’s correlation r) of gradient-boosted models with different predictor variables (columns) across six observed AD phenotypes and two compiled AD diagnosis variables (rows). APOE indicates a continuous apolipoprotein E genotype variable (APOE gradient), and PGS PCs indicate the top 8 principal components of prioritized PGS (Methods).

PGS-based individual subtyping

We used genetic features derived from the 12 prioritized AD-relevant UKB PGS to propose genetic subtypes of AD liability. We weighted the PGS PCs by relevance by multiplying each individual's PGS PC values by the variance explained by the respective PC (Figure 5a). We applied k-means clustering to these relevance-weighted PCs to assign individuals to six individual-level subtypes (Figure 5f), termed “AD polygenic subtypes”. For each subtype, we calculated the average value of selected ROSMAP phenotypes (Figure S10), using the z-score-standardized phenotype values described for the Figure S1 visualization. We performed a Wilcoxon rank-sum test¹⁰⁵ on selected variable values between two chosen subtypes and indicated the computed p-values on the plot (Figure 5g). We additionally counted the individuals with each APOE allelotype within each subtype (Figure 5h). We stratified individuals by APOE allelotype and repeated the Wilcoxon rank-sum test between subtypes within selected allelotypes (Figure S11).
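The subtyping procedure can be sketched as variance-weighting of the PC scores followed by k-means. The sketch below uses a plain Lloyd's algorithm with a deterministic initialization (the first k points), which is a simplification chosen for reproducibility and not necessarily how the clustering was initialized in the analysis. All names and toy values are hypothetical, and the toy example uses two clusters rather than six.

```python
def weight_pcs(pc_scores, variance_explained):
    """Scale each PC score by the variance explained by that PC."""
    return [[s * w for s, w in zip(row, variance_explained)]
            for row in pc_scores]

def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, n_iter=50):
    """Lloyd's algorithm; initial centroids are the first k points."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k),
                      key=lambda c: squared_distance(p, centroids[c]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members)
                                for col in zip(*members)]
    return labels

# Hypothetical toy PC scores (4 individuals, 2 PCs) and variance explained.
toy_pc_scores = [[-2.0, 0.1], [-1.8, -0.2], [2.1, 0.3], [1.9, -0.1]]
toy_variance = [0.6, 0.2]
toy_weighted = weight_pcs(toy_pc_scores, toy_variance)
toy_subtypes = kmeans(toy_weighted, k=2)
```

Weighting by variance explained makes the leading PCs dominate the Euclidean distances used by k-means, so subtypes are driven mainly by the major axes of PGS variation.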

Results

Study design to investigate the genetic basis of AD phenotypic heterogeneity

We analyzed 1678 individuals with genotyping and phenotyping of 36 distinct observed AD-related cognitive and histopathological phenotype variables (the “AD phenome”) in ROSMAP (Figure S1–S3, Table S1). Those individuals exhibited substantial phenotypic heterogeneity in cognition, amyloid, and tangles. Applying principal component analysis (PCA), we found that different AD phenotypes have distinct loadings along the top 2 principal components (PCs), with global amyloid variables having opposite and weaker contributions to PC1 compared to global cognitive function (Figure S3b). Overall, the leading twelve phenotypic PCs explained 40.1% of the overall variation (Figure S3a).

To complement detailed phenotypic characterization in ROSMAP with genetic predictors characterized from much larger cohorts, we turned to 713 PGS models pre-trained over 269,704 individuals in the UK Biobank⁶⁸ (Figure 1b, Figure S5a, Table S2). Our approach leveraged both the depth of AD-specific phenotyping in ROSMAP and the statistical power of genetic information from the UKB by computing UKB PGS in ROSMAP individuals to generate biologically relevant genetic features (Figure 1c). To validate the application of UKB PGS in ROSMAP, we evaluated the cross-cohort predictive performance of the AD PGS based on family history (Methods). The AD PGS achieved an AUROC of 0.577 for observed NIA Reagan Score AD diagnosis in ROSMAP individuals (Figure S6), comparable to the AUROC of 0.558 in the UKB hold-out test set, supporting the transferability of the PGS across cohorts.
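For readers less familiar with the AUROC used in this validation, the following sketch computes it from its pairwise definition: the probability that a randomly chosen case has a higher score than a randomly chosen control, with ties counted as one half. The function name and the toy scores and labels are hypothetical and unrelated to the ROSMAP or UKB data.

```python
def auroc(scores, labels):
    """AUROC as the fraction of (case, control) pairs ranked correctly.

    labels: 1 for cases, 0 for controls. Ties contribute 0.5.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = sum((c > d) + 0.5 * (c == d) for c in cases for d in controls)
    return wins / (len(cases) * len(controls))


# Toy example: two controls followed by two cases (made-up PGS values).
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_labels = [0, 0, 1, 1]
print(auroc(toy_scores, toy_labels))  # 3 of 4 pairs ranked correctly -> 0.75
```

Under this definition a value of 0.5 corresponds to chance-level ranking, so AUROCs of 0.577 and 0.558 both indicate modest but similar discrimination.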

Figure 1. — **(a)** Alzheimer’s disease presents with cognitive and histopathological phenotypic heterogeneity. **(b)** We used polygenic score (PGS) models across 713 traits characterized from the UKB resource. **(c)** A cross-trait application of a PGS library pre-trained on the UKB enables characterization of the genetics of AD phenotypic heterogeneity across individuals. **(d)** Using computed PGS and observed ROSMAP cohort phenotypes, we performed a two-stage phenome-wide association analysis to characterize the genetic basis of phenotypic heterogeneity in AD. In Stage 1, we used the cross-trait association to prioritize relevant PGS as genetic features. In Stage 2, we use prioritized PGS to characterize observed AD phenotypes in ROSMAP. **(e)** We further characterized prioritized PGS via GREAT enrichment on leading PGS model loci to identify enriched biological pathways. **(f)** Principal components (PCs) were computed for prioritized PGS, and loadings of each PGS were used to visualize the distinct axes of prioritized PGS. **(g)** We performed individual-level genetic subtyping of the ROSMAP cohort using PGS PCs. **(h)** We performed predictive modeling to evaluate the predictive power of multiple-PGS genetic features for the prediction of observed AD phenotypes.

With evidence of transferability, we designed a two-stage cross-trait association procedure to characterize the genetic basis of the AD phenome using UKB PGS models (Figure 1d). In Stage 1, we screened all 713 PGS against five global observed AD phenotypes. We used a permissive threshold (FDR<0.5) to minimize false negatives in candidate selection. In Stage 2, we used the prioritized PGS from Stage 1 to characterize the full set of 36 observed AD phenotypes with multiple-testing correction. We further performed biological pathway enrichment analysis on the top genomic loci in prioritized PGS models (Figure 1e), characterized the genetic variance captured by those PGS via PCA (Figure 1f), and applied them to genetic prediction and individual-level subtyping of the AD phenome (Figure 1g–h).

Cross-trait UKB PGS associations to ROSMAP phenotypes reveal the genetic basis of phenotypic heterogeneity in AD

To assess the polygenic basis of AD-relevant multivariate phenotypes, we systematically evaluated the 713 UKB PGS for their predictive performance for 36 observed phenotypes among 1678 ROSMAP individuals, applying a two-stage procedure to maximize statistical power (Methods). Briefly, we selected five observed global AD phenotype variables (amyloid, tangles, combined pathology, cognition, and slope of cognitive decline) representing the major axes of phenotypic variation in ROSMAP (Figure S7, Table S3). The Stage 1 association analysis prioritized 12 PGS with suggestive associations (FDR<0.5 with at least one of the 5 phenotypes) (Figure 2a); we used this permissive threshold to minimize type II error. In the Stage 2 analysis, we assessed the pleiotropic associations of the prioritized PGS across the full set of 36 observed phenotypes. Overall, we found 268 statistically significant cross-trait associations (FDR<0.1) across 12 PGS and 36 observed phenotypes (Figure 2b). At more stringent FDR thresholds, we found 233 cross-trait pairs (FDR<0.05), 173 pairs (FDR<0.01), and 112 pairs (FDR<0.001). Hierarchical clustering of association summary statistics revealed three clusters among PGS (AD, AD concordant, and AD discordant) and three clusters among observed phenotypes (cognition, amyloid pathology, and tangle pathology).
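The two-stage logic can be sketched as follows. This is an illustration under stated assumptions rather than the authors' pipeline: it assumes Benjamini-Hochberg adjustment applied across all tests within each stage (the exact FDR procedure is described in Methods, not here), and the PGS names, phenotype names, and p-values are invented placeholders.

```python
def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def prioritize(stage1_p, threshold=0.5):
    """Stage 1: keep any PGS with at least one FDR < threshold association."""
    keys = list(stage1_p)
    q = bh_adjust([stage1_p[k] for k in keys])
    return sorted({pgs for (pgs, _), qv in zip(keys, q) if qv < threshold})


def significant_pairs(stage2_p, prioritized, threshold=0.1):
    """Stage 2: test only prioritized PGS, then apply the stricter threshold."""
    keys = [k for k in stage2_p if k[0] in prioritized]
    q = bh_adjust([stage2_p[k] for k in keys])
    return {k: qv for k, qv in zip(keys, q) if qv < threshold}


# Hypothetical p-values keyed by (PGS, phenotype); not results from the study.
toy_stage1 = {("PGS_A", "global_1"): 0.001, ("PGS_A", "global_2"): 0.2,
              ("PGS_B", "global_1"): 0.9, ("PGS_B", "global_2"): 0.8}
toy_stage2 = {("PGS_A", "pheno_1"): 0.0005, ("PGS_A", "pheno_2"): 0.002,
              ("PGS_A", "pheno_3"): 0.3, ("PGS_B", "pheno_3"): 0.001}

toy_prioritized = prioritize(toy_stage1)
toy_hits = significant_pairs(toy_stage2, toy_prioritized)
print(toy_prioritized, sorted(toy_hits))
```

The design point the sketch makes is that Stage 2 corrects only over the prioritized PGS, so the multiple-testing burden scales with 12 PGS rather than all 713.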

Our analysis revealed differential patterns of association for PGS across the AD phenome, with each cluster of observed phenotypes having a unique set of enriched genetic features. We noted strong correlations of the AD PGS across the AD phenome, with FDR<0.001 associations for all of the 36 observed AD phenotypes. Among observed phenotypes in the cognition cluster, the PGS for C-reactive protein and Apolipoprotein B (ApoB) showed the strongest correlations, with 10 and 7 distinct pairwise FDR<0.05 associations with observed cognition variables, respectively. For observed phenotypes in the amyloid pathology cluster, the most strongly associated PGS were cholesterol, LDL cholesterol, and ApoB, which each had FDR<0.05 associations with 14 out of 14 of the observed variables in the cluster. Finally, for the tangle pathology cluster, the strongest associations were with the PGS for LDL cholesterol (8 out of 11 associations with FDR<0.05), ApoB (8 out of 11), and C-reactive protein (7 out of 11). Notably, cholesterol PGS had an FDR=0.0497 association with the observed 5-region neurofibrillary tangle burden and an FDR=0.0414 association with the observed Braak stage, the only associations it had with FDR<0.05 in the tangle pathology cluster. The two plaque types provided a specific example of such a differential pattern. While ApoB PGS had FDR<0.01 associations with both observed diffuse plaque burden (Figure 2c, FDR=2×10⁻⁵) and observed neuritic plaque burden (Figure 2d, FDR=6×10⁻⁷), prostate cancer PGS had an FDR<0.01 association only with observed neuritic plaque burden (FDR=0.0015) and not with observed diffuse plaque burden (Figure 2e–f, FDR=0.375). This example contrasted a genetic correlate of both types of plaque burden (ApoB PGS) with a genetic correlate of neuritic plaque burden alone (prostate cancer PGS).

Distinguishing APOE-dependent and independent genetic correlations

To understand the effect of the APOE locus, we next compared cross-trait associations with and without APOE. Overall, we identified 49 cross-trait pairs that remained significantly associated (FDR<0.1) without the APOE locus. For example, associations that involved AD and ApoB PGS relied heavily on APOE contributions (Figure 3a–d, Table S5), whereas associations that involved sun/UV protection and prostate cancer PGS maintained associations independent of APOE (Figure 3e–j, Table S5).

Without APOE, only observed perceptual orientation retained an FDR<0.1 (FDR=0.07) association with AD PGS; by contrast, with the APOE region, AD PGS had FDR<0.001 associations across nearly the entire AD phenome. Similarly, without APOE, ApoB PGS had no FDR<0.1 associations for any observed AD phenotype. We additionally noted that the APOE allele itself shows an association pattern with the AD phenome resembling that of AD PGS risk with APOE, highlighting the strength of the APOE contribution to the AD PGS risk estimate (Figure 3k, Table S6).

Conversely, several other PGS identified in Stage 1 did not exhibit this dependence. Use of sun/UV protection and hayfever or allergic rhinitis PGS retained multiple FDR<0.1 associations over the AD phenome (Table S5). We noted that the use of sun/UV protection PGS retained FDR<0.05 associations across observed amyloid pathology variables (Figure 3e, e.g., FDR=0.02 for overall neuritic plaque burden and FDR=0.03 for overall diffuse plaque burden), and hayfever or allergic rhinitis PGS retained FDR<0.05 associations for tangle pathology variables (Figure 3g, e.g., FDR=0.02 for tangle density). Similarly, each of the PGS in the AD-discordant cluster, other than PGS for C-reactive protein, retained FDR<0.1 associations for 2 or more observed variables in the AD phenome (Table S5). Prostate cancer PGS retained its association strength across observed variables, particularly for neuritic plaque (Figure 3i, FDR=0.02 with overall neuritic plaque burden). Therefore, we found that both of the major pathological clusters (amyloid and tangles) of the observed AD phenome retained statistically significant cross-trait associations without the inclusion of APOE.

Because each PGS model is a linear sum of coefficients over individual variants, and we used sparse PGS models, we were able to interpret the coefficients for each PGS. Qualitatively, AD PGS (Figure 3b) and ApoB PGS (Figure 3d) were both enriched for loci in the APOE region, while prostate cancer PGS (Figure 3j), for instance, did not show similar dependence, consistent with the findings of our APOE exclusion analysis. At a more granular resolution, the strongest effect size outside of the APOE region for the prostate cancer PGS was from rs6983267^106–108.
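The additive structure of a PGS is what makes the APOE exclusion analysis straightforward, as the following sketch shows. Everything in it is hypothetical: the variant identifiers, positions, effect sizes, dosages, and especially the genomic window standing in for the APOE locus, whose actual definition is given in Methods and not reproduced here.

```python
# Hypothetical window standing in for "the APOE region" (chromosome, start, end).
# The boundaries used in the study are not stated in this section.
HYPOTHETICAL_APOE_WINDOW = ("19", 44_000_000, 46_000_000)


def polygenic_score(weights, dosages, exclude=None):
    """Sum of effect size x allele dosage over the variants in a sparse model.

    weights: {variant_id: (chrom, position, beta)}
    dosages: {variant_id: allele dosage in 0..2}
    exclude: optional (chrom, start, end); variants inside are skipped.
    """
    total = 0.0
    for variant, (chrom, pos, beta) in weights.items():
        if exclude and chrom == exclude[0] and exclude[1] <= pos <= exclude[2]:
            continue  # drop this variant's contribution entirely
        total += beta * dosages.get(variant, 0.0)
    return total


# Toy sparse model with three made-up variants.
toy_weights = {
    "variant_in_window": ("19", 45_000_000, 1.0),
    "variant_chr8": ("8", 1_000_000, 0.5),
    "variant_chr2": ("2", 5_000_000, -0.25),
}
toy_dosages = {"variant_in_window": 1, "variant_chr8": 2, "variant_chr2": 0}

full = polygenic_score(toy_weights, toy_dosages)
no_apoe = polygenic_score(toy_weights, toy_dosages, exclude=HYPOTHETICAL_APOE_WINDOW)
print(full, no_apoe)  # 2.0 with the window, 1.0 without it
```

Since each variant contributes an independent term, the difference between the two scores is exactly the contribution of the excluded region, which is why a PGS dominated by that region loses most of its associations when it is removed.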

Analysis of top variants for each PGS reveals enriched biological processes

We next asked whether aggregating the top genomic loci by effect size for each prioritized PGS could identify enriched biological pathways linking PGS to observed AD phenotypes. Using the genomic regions enrichment of annotations tool (GREAT)^97,98, we found that the genomic regions comprising the ApoB PGS model were enriched for annotations related to lipid metabolism and lipoprotein regulation (Figure 4a, Table S7), processes that have been noted to be implicated in amyloid pathology^109–114 and may provide a putative biological interpretation of the association results between ApoB PGS and observed amyloid pathology (Figure 2b). On the other hand, the genomic regions comprising the prostate cancer PGS model were enriched in annotations for kidney development, male gonad development, glial cell proliferation, and fibroblast proliferation (Figure 4b, Table S8).

To examine the composition of the Gene Ontology (GO) annotations enriched for the ApoB PGS model, we embedded each annotation by semantic meaning using a pre-trained word vector model (Methods) and clustered the GO terms by their semantic vectors. The clustered GO terms fell under eight biological themes; we show the 25 leading GO terms by statistical significance of enrichment, labeled by biological theme (Figure 4c, Figure S8). The four leading GO terms fell in the homeostasis and lipoprotein themes.

Multiple-PGS models enhance the prediction of multiple AD-relevant phenotypes

Having prioritized PGS associated with multiple AD phenotypes, we next asked whether joint modeling of these PGS could improve the prediction of observed deep AD phenotypes, an approach informed by prior work on multiple-PGS-based prediction^62,63. We trained nonlinear gradient-boosted models to predict six key AD phenotypes and two overall AD diagnosis variables. We evaluated four sets of genetic features alongside covariates, as well as a covariates-only baseline model. For the two benchmarks, we used the APOE allele and AD PGS. We then trained a model using all 12 prioritized PGS from Stage 1 (Figure 2a) as the genetic features. We also trained a model using the top 8 PGS PCs from a PCA of the training set (Figure 5a); test set PCs were computed by projecting onto the training set PCs. On the test set, the all-PGS model achieved the best predictive performance for global cognitive function and overall AD pathology, while the PGS-PCs model performed best for the three amyloid-related variables (Table 1, Table S9). APOE had the best performance for tangle density and overall AD diagnosis variables. Across AD phenotype prediction tasks, the multiple-PGS models relied on distinct sets of contributing PGS predictors (Figure S9), showing PC1 as the leading contributor for most phenotype predictions whereas PC2 was the leading contributor for consensus cognitive diagnosis and global cognitive function.

Identified PGS characterize heterogeneity at the individual level in an aging and dementia cohort

Having validated the utility of cross-trait PGS application in recovering the genetic basis of deep AD phenotypes at the population level, we investigated prioritized PGS for characterizing both the phenotypic and genetic heterogeneity among ROSMAP individuals. To account for shared genetic effects across the prioritized PGS, we randomly split the n=1678 ROSMAP individuals into a training set (n=1174, 70%) and a held-out test set (n=504, 30%) and performed a PCA of the 12 selected PGS scores in the 1174 training set individuals (Methods). We found that the top 8 PCs combined explained 96% of the variance. Moreover, each PC explained at least 5% of the inter-individual variance of the PGS (Figure 5a–b), indicating there were multiple orthogonal directions of genetic variance captured by the PCs.

We next jointly visualized phenotypic and genetic heterogeneity among ROSMAP individuals using multiple PGS. We projected the remaining 504 held-out test set individuals using the PCA loadings (Figure S5b, Methods). We showed these individuals as points on biplots^75,76 (Figure 5c–e). In these biplots, each point represents an individual, with position determined by projected genetic PC scores derived from multiple PGS and color indicating overall neuritic plaque burden.
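The split-then-project workflow described above can be sketched as follows. This is a simplified illustration, not the authors' implementation: it assumes the PGS are standardized using training-set means and standard deviations before PCA (an assumption), it extracts only the first PC by power iteration instead of a full decomposition, and the function names, seed, and toy matrix are hypothetical.

```python
import math
import random


def split_indices(n, train_frac=0.7, seed=0):
    """Shuffle 0..n-1 and cut at floor(train_frac * n)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = int(train_frac * n)
    return idx[:cut], idx[cut:]


def fit_scaler(rows):
    """Per-column mean and sample standard deviation from the training rows."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    sds = [math.sqrt(sum((x - m) ** 2 for x in c) / (n - 1))
           for c, m in zip(cols, means)]
    return means, sds


def standardize(rows, means, sds):
    """Apply training-set statistics; the test set never refits them."""
    return [[(x - m) / s for x, m, s in zip(r, means, sds)] for r in rows]


def first_pc(z_rows, n_iter=200):
    """Leading eigenvector of the sample covariance via power iteration.

    Returns (loading, proportion of variance explained). The uniform start
    vector would fail if the leading PC were orthogonal to it.
    """
    n, p = len(z_rows), len(z_rows[0])
    cov = [[sum(r[i] * r[j] for r in z_rows) / (n - 1) for j in range(p)]
           for i in range(p)]
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(n_iter):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(p))
                     for i in range(p))
    return v, eigenvalue / sum(cov[i][i] for i in range(p))


def project(z_rows, loading):
    """PC scores are dot products of standardized rows with the loading."""
    return [sum(x * w for x, w in zip(r, loading)) for r in z_rows]


train_idx, test_idx = split_indices(1678)
print(len(train_idx), len(test_idx))  # 1174 and 504 for a 70/30 cut

# Toy training matrix: four individuals, two perfectly correlated scores.
toy_train = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
toy_means, toy_sds = fit_scaler(toy_train)
toy_loading, toy_frac = first_pc(standardize(toy_train, toy_means, toy_sds))
toy_test_scores = project(standardize([[4.0, 8.0]], toy_means, toy_sds), toy_loading)
```

The key design choice is that the held-out individuals are transformed only with quantities learned from the training set (means, standard deviations, and loadings), so their PC scores do not leak test-set information into the fitted axes.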

The biplots recapitulated known relationships between neuritic plaque burden and AD genetic risk, with higher observed plaque burdens toward the right-hand side (higher PC1) of both plots (Figure 5c–d). PC1 had strong positive contributions from PGS for AD, cholesterol, LDL cholesterol, and ApoB and negative contributions from PGS for C-reactive protein and prostate cancer risk (Figure 5c–d). This visualization reflected our earlier quantitative finding that observed neuritic plaque burden correlates with the identified PGS (Figure 2b). By contrast, for the projection onto PC2 versus PC6 (Figure 5e), we noted no such qualitative pattern of neuritic plaque burden; indeed, based on the PCA loading arrows on the biplot and our quantitative results (Figure 2b), we would not expect one.

We next used the prioritized PGS to propose individual-level polygenic subtypes of AD liability. Using the 12 computed PCs weighted by proportion of variance explained (Methods), we clustered the 1678 ROSMAP individuals to propose six PGS-based AD polygenic subtypes (Figure 5f). We noted, as expected, that individuals within each of the six subtypes are qualitatively closely grouped when plotted on a PC1 vs PC2 biplot. We further showed the number of ROSMAP individuals assigned to each of the six subtypes, as well as the APOE allelotype compositions and selected observed phenotype averages for each subtype (Figure 5g–h, S10). Notably, we found that individuals within each subtype exhibit a diversity of APOE allelotypes, and hence, the APOE allelotype alone would not be sufficient to capture the heterogeneity proposed by these subtypes. While subtypes 4 and 6 differed in their observed diffuse plaque burden distributions with p<0.01 by the Wilcoxon rank-sum test, that statistically significant difference was not present for neuritic plaque burden (Figure 5g). On the other hand, with both observed diffuse and neuritic plaque burden, we found subtype 6 had a lower median than subtype 4, a trend maintained in three out of four comparisons after stratifying individuals within each subtype by APOE allelotype (Figure S11).
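The between-subtype comparison relies on the Wilcoxon rank-sum test. A minimal sketch of that test is given below, using mid-ranks for ties and a two-sided normal approximation without tie or continuity correction; these simplifications are assumptions of the sketch, and for small samples an exact test would normally be preferred. The function name and the toy values are hypothetical and are not plaque burden measurements from the study.

```python
import math


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test (normal approximation).

    Returns (U statistic for sample x, approximate p-value).
    """
    combined = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    # Assign mid-ranks so tied values share the average of their positions.
    ranks = [0.0] * len(combined)
    i = 0
    while i < len(combined):
        j = i
        while j + 1 < len(combined) and combined[j + 1][0] == combined[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1
        i = j + 1
    n1, n2 = len(x), len(y)
    rank_sum_x = sum(r for r, (_, g) in zip(ranks, combined) if g == 0)
    u = rank_sum_x - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u, p


# Toy comparison of two made-up groups of phenotype values.
toy_group_a = [1.0, 2.0, 3.0]
toy_group_b = [4.0, 5.0, 6.0]
toy_u, toy_p = rank_sum_test(toy_group_a, toy_group_b)
print(toy_u, toy_p)
```

Because the test depends only on ranks, it compares the two distributions without assuming normality, which suits skewed pathology measures.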

Discussion

Here, we present a genetic dissection of phenotypic heterogeneity in complex traits using a systematic application of cross-trait polygenic score (PGS) analysis, using Alzheimer’s disease (AD) as a case study. We combine 713 PGS models pre-trained on 269,704 UKB individuals and 36 deep AD phenotypes across 1678 ROSMAP individuals. We report 268 statistically significant associations between PGS and observed AD phenotypes (FDR<0.1). Gradient-boosted models integrating multiple PGS demonstrate improved predictive performance for amyloid and cognitive AD phenotypes relative to baseline genetic predictors. Finally, we group individuals into AD polygenic subtypes and observe subtype-specific differences in observed diffuse plaque pathology.

In the context of human genetics, our study illustrates how predictive modeling with pleiotropy can be used to map the genetic basis of phenotypic heterogeneity within disease. Previous studies have employed pleiotropy to enhance the predictive performance of PGS in case-control disease risk prediction^62,63. Here, we introduce an application for mapping the genetic basis of disease heterogeneity, leveraged through a large collection of biobank-trained PGS⁶⁸. The proposed cross-cohort, cross-trait strategy is particularly attractive when a densely phenotyped disease-focused cohort has a smaller sample size compared to the cohorts used in conventional case-control GWAS and meta-analyses, as is the case with ROSMAP. We envision similar approaches to help characterize the genetic basis of phenotypic heterogeneity in other multifactorial diseases^115,116.

Our cross-trait analyses demonstrated two specific promising use cases for applying PGS resources from large-scale cohorts to disease-focused cohorts with deep phenotypic characterizations: improved genetic prediction of disease phenotypic heterogeneity and a genetics-based subtyping of individuals. First, combining multiple AD-relevant PGS into a joint gradient-boosted predictive model outperformed single-score approaches for amyloid and cognition phenotypes, suggesting a generalizable way to more accurately predict deep AD phenotypes from genetics alone, consistent with prior findings^62,63. Second, the same set of PGS was used to cluster individuals into distinct genetic subtypes, which exhibit both phenotypic differentiation and diverse APOE allelotype compositions. Together, these results suggest that integrating cross-trait polygenic signals may provide a useful framework for refining genetic risk stratification and for exploring genetically defined subtypes of AD liability in future studies.

Biologically, our study identifies genetic factors associated with distinct aspects of the AD phenome. We identify genetic features beyond APOE and AD PGS: of the 268 significant PGS-phenotype associations, 49 (18.3%) remained significant after excluding the APOE locus entirely. Notably, significant associations with both observed amyloid and tangle pathologies persisted, indicating that non-APOE genetic variation informs both major neuropathological axes of AD. We reveal three main clusters of AD phenotypes arising from association patterns: cognition, amyloid pathology, and tangle pathology variables. Likewise, we reveal three main clusters of AD-relevant PGS: PGS that directly predict AD (“AD PGS”), PGS whose association patterns are concordant with those of AD PGS, and PGS whose association patterns are discordant with those of AD PGS. Significant associations emerge between PGS and AD phenotypes for every combination of PGS cluster and phenotype cluster, showing a diversity of genetic factors associated with each dimension of the AD phenome.

Several PGS-AD phenotype associations warrant follow-up analysis on their causal relevance. For example, we found prostate cancer PGS retained associations specific to observed neuritic, but not diffuse, plaque burden, suggesting genetic dimensions of heterogeneity between neuritic plaque formation and diffuse plaques. GREAT^97,98 enrichment of the prostate cancer PGS model loci showed pathways involved in glial cell and fibroblast proliferation regulation, suggesting these gene sets as directions for follow-up to evaluate their potential role in neuritic plaque formation. We also found that the hayfever or allergic rhinitis PGS retained FDR<0.05 associations specifically with observed tangle pathology variables, generating biological hypotheses for follow-up on whether neuroinflammatory pathways differentially modulate tangle burden^117–119.

We note several directions for future study. First, because each PGS model represents a weighted combination of genetic variant effects, cross-trait associations can highlight biological pathways for mechanistic follow-up, for example, GREAT-enriched lipid metabolism pathways in relation to neuritic plaque burden. Second, our results rely on genetic correlations between PGS and observed AD phenotypes and do not establish causality. The association between ApoB PGS and observed amyloid pathology, for example, does not demonstrate that LDL-related variants causally affect amyloid burden, as shared genetic architecture could reflect horizontal pleiotropy or correlated downstream effects. Causal inference through pleiotropy-aware Mendelian randomization or related methods will be needed to evaluate specific mechanistic hypotheses arising from these associations^120,121. Third, the PGS models used here were trained in European-ancestry individuals from the UKB⁶⁸, and ROSMAP is itself predominantly of European ancestry⁶⁹. Further studies are needed to generalize our findings to distinct genetic ancestry groups. Finally, further replication, preferably in prospective cohorts, is the next step to validate and refine our multiple-PGS predictive models and genetic subtypes.

Overall, our results establish an application of cross-trait PGS for pairing population-based genetic resources with deeply phenotyped disease cohorts to characterize disease heterogeneity. Our strategy addresses the statistical power limitations arising from the limited sample sizes of richly phenotyped disease cohorts. In our application to AD, the systematic mapping of APOE-dependent and -independent genetic contributions across amyloid pathology, tau pathology, and cognition domains provides a foundation for future mechanistic investigations of disease heterogeneity. The multiple-PGS predictive framework and individual-level genetic subtyping presented in our work have potential for developing genetic risk stratification efforts with larger and more diverse cohorts and deeper integration with molecular and functional data.

Supplementary Material

Supplement 1

media-1.pdf (1.9 MB, PDF)

Supplement 2

media-2.xlsx (450 KB, XLSX)

Acknowledgments

We acknowledge general support from National Institutes of Health (NIH) grants AG054012, AG058002, MH109978, AG062377, AG081017, NS129032, AG077227, NS110453, NS115064, AG062335, AG074003, NS127187, AG067151, MH119509, HG008155, DA053631, R01-AT011460, and R56AG081376 (to M.K.). ROSMAP is supported by P30AG10161, P30AG72975, R01AG17917, R01AG015819, U01AG072572, and U01AG046152. We thank Patricia Purcell, Amy Grayson, and the members of the Kellis lab for their scientific suggestions. Figure 1 was created with the use of BioRender.com. The content is solely the responsibility of the authors.

Footnotes

Declaration of Interests

Y.T. holds a visiting Associate Professorship at Kyoto University and a visiting researcher position at the University of Tokyo for collaboration; those affiliations have no role in study design, data collection, data analysis, the decision to publish, or the preparation of the manuscript.

Declaration of generative AI and AI-assisted technologies.

During the preparation of this work, the authors used ChatGPT and Claude to develop analysis scripts for the project and to refine text. After using these tools, the authors reviewed and edited the content as needed and take full responsibility for the results and content of the published article.

Data and Code Availability

We used individual-level data from the Religious Orders Study/Rush Memory and Aging Project (ROSMAP) through the Rush Alzheimer’s Disease Center Resource Sharing Hub (https://www.radc.rush.edu/).

We downloaded the pre-trained polygenic score models characterized from our previous study⁶⁸ from the PGS catalog¹²² (PGS Publication ID: PGP000244) https://www.pgscatalog.org/.

The code used to analyze the data and generate the figures in the manuscript is available at https://github.com/williamfli/AD_PGS_paper_code/tree/main

References

1.Johansson Å. et al. Precision medicine in complex diseases-Molecular subgrouping for improved prediction and treatment stratification. J Intern Med 294, 378–396 (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]
2.Woodward A. A., Urbanowicz R. J., Naj A. C. & Moore J. H. Genetic heterogeneity: Challenges, impacts, and methods through an associative lens. Genet Epidemiol 46, 555–571 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
3.Gustavsson A. et al. Global estimates on the number of persons across the Alzheimer’s disease continuum. Alzheimers. Dement. 19, 658–670 (2023). [DOI] [PubMed] [Google Scholar]
4.Duara R. & Barker W. Heterogeneity in Alzheimer’s Disease Diagnosis and Progression Rates: Implications for Therapeutic Trials. Neurotherapeutics 19, 8–25 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
5.Murray M. E. et al. Neuropathologically defined subtypes of Alzheimer’s disease with distinct clinical characteristics: a retrospective study. Lancet Neurol 10, 785–796 (2011). [DOI] [PMC free article] [PubMed] [Google Scholar]
6.Qian J., Betensky R. A., Hyman B. T. & Serrano-Pozo A. Association of Genotype With Heterogeneity of Cognitive Decline Rate in Alzheimer Disease. Neurology 96, e2414–e2428 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
7.Lam B., Masellis M., Freedman M., Stuss D. T. & Black S. E. Clinical, imaging, and pathological heterogeneity of the Alzheimer’s disease syndrome. Alzheimers Res Ther 5, 1 (2013). [DOI] [PMC free article] [PubMed] [Google Scholar]
8.Peter J. et al. Subgroups of Alzheimer’s disease: stability of empirical clusters over time. J Alzheimers Dis 42, 651–661 (2014). [DOI] [PubMed] [Google Scholar]
9.Sweet R. A. et al. Effect of Alzheimer’s disease risk genes on trajectories of cognitive function in the Cardiovascular Health Study. Am J Psychiatry 169, 954–962 (2012). [DOI] [PMC free article] [PubMed] [Google Scholar]
10.Bellenguez C. et al. New insights into the genetic etiology of Alzheimer’s disease and related dementias. Nat. Genet. 54, 412–436 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
11.Arnold S. E., Hyman B. T., Flory J., Damasio A. R. & Van Hoesen G. W. The topographical and neuroanatomical distribution of neurofibrillary tangles and neuritic plaques in the cerebral cortex of patients with Alzheimer’s disease. Cereb Cortex 1, 103–116 (1991). [DOI] [PubMed] [Google Scholar]
12.Trejo-Lopez J. A., Yachnis A. T. & Prokop S. Neuropathology of Alzheimer’s Disease. Neurotherapeutics 19, 173–185 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
13.Montine T. J. et al. National Institute on Aging-Alzheimer’s Association guidelines for the neuropathologic assessment of Alzheimer’s disease: a practical approach. Acta Neuropathol 123, 1–11 (2012). [DOI] [PMC free article] [PubMed] [Google Scholar]
14.Grothe M. J. et al. Molecular properties underlying regional vulnerability to Alzheimer’s disease pathology. Brain 141, 2755–2771 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
15.Mathys H. et al. Single-cell atlas reveals correlates of high cognitive function, dementia, and resilience to Alzheimer’s disease pathology. Cell 186, 4365–4385.e27 (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]
16.Graff-Radford J. et al. New insights into atypical Alzheimer’s disease in the era of biomarkers. Lancet Neurol. 20, 222–234 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
17.Malek-Ahmadi M., Perez S. E., Chen K. & Mufson E. J. Neuritic and Diffuse Plaque Associations with Memory in Non-Cognitively Impaired Elderly. J Alzheimers Dis 53, 1641–1652 (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]
18.Suzuki N., Hatta T., Ito M. & Kusakabe K.-I. Anti-amyloid-β Antibodies and Anti-tau Therapies for Alzheimer’s Disease: Recent Advances and Perspectives. Chem Pharm Bull (Tokyo) 72, 602–609 (2024). [DOI] [PubMed] [Google Scholar]
19.Sims J. R. et al. Donanemab in Early Symptomatic Alzheimer Disease: The TRAILBLAZER-ALZ 2 Randomized Clinical Trial. JAMA 330, 512–527 (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]
20.Chen S. et al. Lecanemab treatment for Alzheimer’s Disease of varying severities and associated plasma biomarkers monitoring: A multi-center real-world study in China. Alzheimers Dement 21, e70750 (2025). [DOI] [PMC free article] [PubMed] [Google Scholar]
21.van Dyck C. H. et al. Lecanemab in Early Alzheimer’s Disease. N Engl J Med 388, 9–21 (2023). [DOI] [PubMed] [Google Scholar]
22.Fox N. C. et al. Treatment for Alzheimer’s disease. Lancet 406, 1408–1423 (2025). [DOI] [PubMed] [Google Scholar]
23.Paczynski M. et al. Lecanemab Treatment in a Specialty Memory Clinic. JAMA Neurol 82, 655–665 (2025). [DOI] [PMC free article] [PubMed] [Google Scholar]
24.De Jager P. L. et al. A genome-wide scan for common variants affecting the rate of age-related cognitive decline. Neurobiol. Aging 33, 1017.e1–15 (2012). [Google Scholar]
25.Kapasi A. et al. High-throughput digital quantification of Alzheimer disease pathology and associated infrastructure in large autopsy studies. J Neuropathol Exp Neurol 82, 976–986 (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]
26.Liu Z. et al. Single-cell multiregion epigenomic rewiring in Alzheimer’s disease progression and cognitive resilience. Cell 188, 4980–5002.e29 (2025). [DOI] [PMC free article] [PubMed] [Google Scholar]
27. Sun N. et al. Human microglial state dynamics in Alzheimer’s disease progression. Cell 186, 4386–4403.e29 (2023).
28. Xiong X. et al. Epigenomic dissection of Alzheimer’s disease pinpoints causal variants and reveals epigenome erosion. Cell 186, 4422–4437.e21 (2023).
29. Dileep V. et al. Neuronal DNA double-strand breaks lead to genome structural variations and 3D genome disruption in neurodegeneration. Cell 186, 4404–4421.e20 (2023).
30. Mathys H. et al. Single-cell multiregion dissection of Alzheimer’s disease. Nature 632, 858–868 (2024).
31. Fujita M. et al. Cell subtype-specific effects of genetic variation in the Alzheimer’s disease brain. Nat Genet 56, 605–614 (2024).
32. Sun N. et al. Single-nucleus multiregion transcriptomic analysis of brain vasculature in Alzheimer’s disease. Nat Neurosci 26, 970–982 (2023).
33. Mathys H. et al. Single-cell transcriptomic analysis of Alzheimer’s disease. Nature 570, 332–337 (2019).
34. Bennett D. A. et al. Apolipoprotein E epsilon4 allele, AD pathology, and the clinical expression of Alzheimer’s disease. Neurology 60, 246–252 (2003).
35. Bennett D. A., Schneider J. A., Tang Y., Arnold S. E. & Wilson R. S. The effect of social networks on the relation between Alzheimer’s disease pathology and level of cognitive function in old people: a longitudinal cohort study. Lancet Neurol 5, 406–412 (2006).
36. Wilson R. S. et al. Temporal course and pathologic basis of unawareness of memory loss in dementia. Neurology 85, 984–991 (2015).
37. Lake J. et al. Multi-ancestry meta-analysis and fine-mapping in Alzheimer’s disease. Mol. Psychiatry 28, 3121–3132 (2023).
38. Rajabli F. et al. Multi-ancestry genome-wide meta-analysis of 56,241 individuals identifies known and novel cross-population and ancestry-specific associations as novel risk loci for Alzheimer’s disease. Genome Biol. 26, 210 (2025).
39. Wightman D. P. et al. A genome-wide association study with 1,126,563 individuals identifies new risk loci for Alzheimer’s disease. Nat. Genet. 53, 1276–1282 (2021).
40. Danner B. et al. Brain banking in the United States and Europe: Importance, challenges, and future trends. J Neuropathol Exp Neurol 83, 219–229 (2024).
41. Rush A. et al. The Experts Speak: Challenges in Banking Brain Tissue for Research. Biopreserv Biobank 22, 179–184 (2024).
42. Shepherd C. E., Alvendia H. & Halliday G. M. Brain Banking for Research into Neurodegenerative Disorders and Ageing. Neurosci Bull 35, 283–288 (2019).
43. Uffelmann E. et al. Genome-wide association studies. Nat. Rev. Methods Primers 1, (2021).
44. Politi C., Roumeliotis S., Tripepi G. & Spoto B. Sample Size Calculation in Genetic Association Studies: A Practical Approach. Life (Basel) 13, (2023).
45. Shade L. M. P. et al. GWAS of multiple neuropathology endophenotypes identifies new risk loci and provides insights into the genetic risk of dementia. Nat Genet 56, 2407–2421 (2024).
46. Cheruiyot E. K., Yang T. & McRae A. F. GWAS significance thresholds in large cohorts of European ancestry. Genetics 230, (2025).
47. UK Biobank Whole-Genome Sequencing Consortium. Whole-genome sequencing of 490,640 UK Biobank participants. Nature 645, 692–701 (2025).
48. Lieb W. et al. Population-Based Biobanking. Genes (Basel) 15, (2024).
49. Karczewski K. J. et al. Pan-UK Biobank genome-wide association analyses enhance discovery and resolution of ancestry-enriched effects. Nat. Genet. 57, 2408–2417 (2025).
50. Bycroft C. et al. The UK Biobank resource with deep phenotyping and genomic data. Nature 562, 203–209 (2018).
51. Sudlow C. et al. UK biobank: an open access resource for identifying the causes of a wide range of complex diseases of middle and old age. PLoS Med 12, e1001779 (2015).
52. Canela-Xandri O., Rawlik K. & Tenesa A. An atlas of genetic associations in UK Biobank. Nat Genet 50, 1593–1599 (2018).
53. Sakaue S. et al. A cross-population atlas of genetic associations for 220 human phenotypes. Nat. Genet. 53, 1415–1424 (2021).
54. Choi S. W., Mak T. S.-H. & O’Reilly P. F. Tutorial: a guide to performing polygenic risk score analyses. Nat Protoc 15, 2759–2772 (2020).
55. Kullo I. J. Clinical use of polygenic risk scores: current status, barriers and future directions. Nat Rev Genet 27, 246–263 (2026).
56. Nicolas A. et al. Transferability of European-derived Alzheimer’s disease polygenic risk scores across multiancestry populations. Nat Genet 57, 1598–1610 (2025).
57. Watanabe K. et al. A global overview of pleiotropy and genetic architecture in complex traits. Nat Genet 51, 1339–1348 (2019).
58. Jordan D. M., Verbanck M. & Do R. HOPS: a quantitative score reveals pervasive horizontal pleiotropy in human genetic variation is driven by extreme polygenicity of human traits and diseases. Genome Biol 20, 222 (2019).
59. Barbitoff Y. A., Bogaichuk P. M., Pavlova N. S., Malysheva P. V. & Predeus A. V. Functional Determinants and Evolutionary Consequences of Pleiotropy in Complex and Mendelian Traits. Mol Biol Evol 42, (2025).
60. Jee Y. H. et al. Dissecting pleiotropy to gain mechanistic insights into human disease. Nat Rev Genet (2025) doi: 10.1038/s41576-025-00908-0.
61. Sollis E. et al. The NHGRI-EBI GWAS Catalog: knowledgebase and deposition resource. Nucleic Acids Res 51, D977–D985 (2023).
62. Truong B. et al. Integrative polygenic risk score improves the prediction accuracy of complex traits and diseases. Cell Genom 4, 100523 (2024).
63. Sinnott-Armstrong N. et al. Genetics of 35 blood and urine biomarkers in the UK Biobank. Nat. Genet. 53, 185–194 (2021).
64. Li C., Yang C., Gelernter J. & Zhao H. Improving genetic risk prediction by leveraging pleiotropy. Hum Genet 133, 639–650 (2014).
65. Tian X. et al. PRISM: ancestry-aware integration of tissue-specific genomic annotations enhances the transferability of polygenic scores. bioRxiv (2025) doi: 10.1101/2025.11.13.688144.
66. Zheng Z. et al. Leveraging functional genomic annotations and genome coverage to improve polygenic prediction of complex traits within and between ancestries. Nat. Genet. 56, 767–777 (2024).
67. Tanigawa Y. & Kellis M. Power of inclusion: Enhancing polygenic prediction with admixed individuals. Am J Hum Genet 110, 1888–1902 (2023).
68. Tanigawa Y. et al. Significant sparse polygenic risk scores across 813 traits in UK Biobank. PLoS Genet. 18, e1010105 (2022).
69. Bennett D. A. et al. Religious Orders Study and Rush Memory and Aging Project. J. Alzheimers. Dis. 64, S161–S189 (2018).
70. Vialle R. A. et al. Structural variants linked to Alzheimer’s disease and other common age-related clinical and neuropathologic traits. Genome Med 17, 20 (2025).
71. Oveisgharan S. et al. Proteins linking APOE ε4 with Alzheimer’s disease. Alzheimers Dement 20, 4499–4511 (2024).
72. Bennett D. A. et al. Decision rules guiding the clinical diagnosis of Alzheimer’s disease in two community-based cohort studies compared to standard practice in a clinic-based cohort study. Neuroepidemiology 27, 169–176 (2006).
73. Schneider J. A., Arvanitakis Z., Leurgans S. E. & Bennett D. A. The neuropathology of probable Alzheimer disease and mild cognitive impairment. Ann Neurol 66, 200–208 (2009).
74. Nag S. et al. TDP-43 pathology in anterior temporal pole cortex in aging and Alzheimer’s disease. Acta Neuropathol. Commun. 6, 33 (2018).
75. Gower J. C., Lubbe S. G. & Le Roux N. J. Understanding Biplots. (John Wiley & Sons, 2011).
76. Gabriel K. R. The biplot graphic display of matrices with application to principal component analysis. Biometrika 58, 453–467 (1971).
77. Logsdon B. AMP AD target discovery data portal. Synapse; 10.7303/SYN2580853 (2015).
78. Sherry S. T. et al. dbSNP: the NCBI database of genetic variation. Nucleic Acids Res. 29, 308–311 (2001).
79. Church D. M. et al. Modernizing reference genome assemblies. PLoS Biol. 9, e1001091 (2011).
80. Rayner N. W., Robertson N., Mahajan A. & McCarthy M. I. A Suite Of Programs For Pre- And Postimputation Data Checking. in American Society of Human Genetics Posters (2016).
81. McCarthy S. et al. A reference panel of 64,976 haplotypes for genotype imputation. Nat Genet 48, 1279–1283 (2016).
82. Das S. et al. Next-generation genotype imputation service and methods. Nat. Genet. 48, 1284–1287 (2016).
83. Loh P.-R. et al. Reference-based phasing using the Haplotype Reference Consortium panel. bioRxiv (2016) doi: 10.1101/052308.
84. Fuchsberger C., Abecasis G. R. & Hinds D. A. Minimac2: Faster genotype imputation. Bioinformatics 31, 782–784 (2015).
85. 1000 Genomes Project Consortium et al. A global reference for human genetic variation. Nature 526, 68–74 (2015).
86. Cavalli-Sforza L. L. The Human Genome Diversity Project: past, present and future. Nat. Rev. Genet. 6, 333–340 (2005).
87. COVID-19 Host Genetics Initiative. Mapping the human genetic architecture of COVID-19. Nature 600, 472–477 (2021).
88. Qian J. et al. A fast and scalable framework for large-scale and ultrahigh-dimensional sparse regression with application to the UK Biobank. PLoS Genet. 16, e1009141 (2020).
89. Chang C. C. et al. Second-generation PLINK: rising to the challenge of larger and richer datasets. Gigascience 4, 7 (2015).
90. Benjamini Y. & Hochberg Y. Controlling the false discovery rate: A practical and powerful approach to multiple testing. J. R. Stat. Soc. Series B Stat. Methodol. 57, 289–300 (1995).
91. Skoog I. et al. A Non-APOE Polygenic Risk Score for Alzheimer’s Disease Is Associated With Cerebrospinal Fluid Neurofilament Light in a Representative Sample of Cognitively Unimpaired 70-Year Olds. J Gerontol A Biol Sci Med Sci 76, 983–990 (2021).
92. Ware E. B., Faul J. D., Mitchell C. M. & Bakulski K. M. Considering the APOE locus in Alzheimer’s disease polygenic scores in the Health and Retirement Study: a longitudinal panel study. BMC Med Genomics 13, 164 (2020).
93. Bakulski K. M. et al. A non-APOE Polygenic score for Alzheimer’s disease and APOE-ε4 have independent associations with dementia in the Health and Retirement Study. bioRxiv (2020) doi: 10.1101/2020.02.10.20021667.
94. Leonenko G. et al. Identifying individuals with high risk of Alzheimer’s disease using polygenic risk scores. Nat Commun 12, 4506 (2021).
95. Norgren J., Sindi S., Matton A., Kivipelto M. & Kåreholt I. APOE-Genotype and Insulin Modulate Estimated Effect of Dietary Macronutrients on Cognitive Performance: Panel Analyses in Nondiabetic Older Adults at Risk of Dementia. J Nutr 153, 3506–3520 (2023).
96. Buniello A. et al. Open Targets Platform: facilitating therapeutic hypotheses building in drug discovery. Nucleic Acids Res 53, D1467–D1475 (2025).
97. Tanigawa Y., Dyer E. S. & Bejerano G. WhichTF is functionally important in your open chromatin data? PLoS Comput Biol 18, e1010378 (2022).
98. McLean C. Y. et al. GREAT improves functional interpretation of cis-regulatory regions. Nat Biotechnol 28, 495–501 (2010).
99. Ashburner M. et al. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet 25, 25–29 (2000).
100. Gene Ontology Consortium et al. The Gene Ontology knowledgebase in 2023. Genetics 224, (2023).
101. Pennington J., Socher R. & Manning C. D. GloVe: Global Vectors for Word Representation. in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP) (2014). doi: 10.3115/v1/D14-1162.
102. Carlson R., Bauer J. & Manning C. D. A new pair of GloVes. arXiv [cs.CL] (2025) doi: 10.48550/ARXIV.2507.18103.
103. Parker R., Graff D., Kong J., Chen K. & Maeda K. English Gigaword Fifth Edition. Linguistic Data Consortium; 10.35111/WK4F-QT80 (2011).
104. Chen T. & Guestrin C. XGBoost: A Scalable Tree Boosting System. in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 785–794 (2016). doi: 10.1145/2939672.2939785.
105. Mann H. B. & Whitney D. R. On a Test of Whether one of Two Random Variables is Stochastically Larger than the Other. Ann. Math. Stat. 18, 50–60 (1947).
106. Yao H.-F. et al. CASC8 activates the pentose phosphate pathway to inhibit disulfidptosis in pancreatic ductal adenocarcinoma though the c-Myc-GLUT1 axis. J Exp Clin Cancer Res 44, 26 (2025).
107. Wu Q. et al. The m6A-induced lncRNA CASC8 promotes proliferation and chemoresistance via upregulation of hnRNPL in esophageal squamous cell carcinoma. Int J Biol Sci 18, 4824–4836 (2022).
108. Sang Y. et al. Long non-coding RNA CASC8 polymorphisms are associated with the risk of esophageal cancer in a Chinese population. Thorac Cancer 11, 2852–2857 (2020).
109. Darabi S., Gorgich E. A. C., Moradi F. & Rustamzadeh A. Lipidopathy disrupts peripheral and central amyloid clearance in Alzheimer’s disease: Where are our knowledge. IBRO Neurosci Rep 18, 191–199 (2025).
110. Di Paolo G. & Kim T.-W. Linking lipids to Alzheimer’s disease: cholesterol and beyond. Nat. Rev. Neurosci. 12, 284–296 (2011).
111. Yang L. G., March Z. M., Stephenson R. A. & Narayan P. S. Apolipoprotein E in lipid metabolism and neurodegenerative disease. Trends Endocrinol. Metab. 34, 430–445 (2023).
112. Wang H. et al. Regulation of beta-amyloid production in neurons by astrocyte-derived cholesterol. Proc. Natl. Acad. Sci. U. S. A. 118, e2102191118 (2021).
113. Morgado I. & Garvey M. Lipids in amyloid-β processing, aggregation, and toxicity. Adv. Exp. Med. Biol. 855, 67–94 (2015).
114. Sprenger K. G., Lietzke E. E., Melchior J. T. & Bruce K. D. Lipid and lipoprotein metabolism in microglia: Alzheimer’s disease mechanisms and interventions. J. Lipid Res. 66, 100872 (2025).
115. Boix C. A., James B. T., Park Y. P., Meuleman W. & Kellis M. Regulatory genomic circuitry of human disease loci by integrative epigenomics. Nature 590, 300–307 (2021).
116. Shi C. et al. Multifactorial Diseases of the Heart, Kidneys, Lungs, and Liver and Incident Cancer: Epidemiology and Shared Mechanisms. Cancers (Basel) 15, (2023).
117. Ising C. et al. NLRP3 inflammasome activation drives tau pathology. Nature 575, 669–673 (2019).
118. Dutta D. et al. Tau fibrils induce glial inflammation and neuropathology via TLR2 in Alzheimer’s disease-related mouse models. J Clin Invest 133, (2023).
119. Langworth-Green C. et al. Chronic effects of inflammation on tauopathies. Lancet Neurol 22, 430–442 (2023).
120. Sanderson E. et al. Mendelian randomization. Nat Rev Methods Primers 2, (2022).
121. Richmond R. C. & Davey Smith G. Mendelian Randomization: Concepts and Scope. Cold Spring Harb Perspect Med 12, (2022).
122. Lambert S. A. et al. Enhancing the Polygenic Score Catalog with tools for score calculation and ancestry normalization. Nat Genet 56, 1989–1994 (2024).

Associated Data

Supplementary Materials

Supplement 1

media-1.pdf (1.9 MB, pdf)

Supplement 2

media-2.xlsx (450 KB, xlsx)

Data Availability Statement

We used individual-level data from the Religious Orders Study/Rush Memory and Aging Project (ROSMAP) through the Rush Alzheimer’s Disease Center Resource Sharing Hub (https://www.radc.rush.edu/).

We downloaded the pre-trained polygenic score models characterized in our previous study⁶⁸ from the PGS Catalog¹²² (https://www.pgscatalog.org/; PGS Publication ID: PGP000244).

The code used to analyze the data and generate the figures in the manuscript is available at https://github.com/williamfli/AD_PGS_paper_code/tree/main


[R87] 87.COVID-19 Host Genetics Initiative. Mapping the human genetic architecture of COVID-19. Nature 600, 472–477 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]

[R88] 88.Qian J. et al. A fast and scalable framework for large-scale and ultrahigh-dimensional sparse regression with application to the UK Biobank. PLoS Genet. 16, e1009141 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]

[R89] 89.Chang C. C. et al. Second-generation PLINK: rising to the challenge of larger and richer datasets. Gigascience 4, 7 (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]

[R90] 90.Benjamini Y. & Hochberg Y. Controlling the false discovery rate: A practical and powerful approach to multiple testing. J. R. Stat. Soc. Series B Stat. Methodol. 57, 289–300 (1995). [Google Scholar]

[R91] 91.Skoog I. et al. A Non-APOE Polygenic Risk Score for Alzheimer’s Disease Is Associated With Cerebrospinal Fluid Neurofilament Light in a Representative Sample of Cognitively Unimpaired 70-Year Olds. J Gerontol A Biol Sci Med Sci 76, 983–990 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]

[R92] 92.Ware E. B., Faul J. D., Mitchell C. M. & Bakulski K. M. Considering the APOE locus in Alzheimer’s disease polygenic scores in the Health and Retirement Study: a longitudinal panel study. BMC Med Genomics 13, 164 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]

[R93] 93.Bakulski K. M. et al. A non-APOE Polygenic score for Alzheimer’s disease and APOE-ε4 have independent associations with dementia in the Health and Retirement Study. bioRxiv (2020) doi: 10.1101/2020.02.10.20021667. [DOI] [Google Scholar]

[R94] 94.Leonenko G. et al. Identifying individuals with high risk of Alzheimer’s disease using polygenic risk scores. Nat Commun 12, 4506 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]

[R95] 95.Norgren J., Sindi S., Matton A., Kivipelto M. & Kåreholt I. APOE-Genotype and Insulin Modulate Estimated Effect of Dietary Macronutrients on Cognitive Performance: Panel Analyses in Nondiabetic Older Adults at Risk of Dementia. J Nutr 153, 3506–3520 (2023). [DOI] [PubMed] [Google Scholar]

[R96] 96.Buniello A. et al. Open Targets Platform: facilitating therapeutic hypotheses building in drug discovery. Nucleic Acids Res 53, D1467–D1475 (2025). [DOI] [PMC free article] [PubMed] [Google Scholar]

[R97] 97.Tanigawa Y., Dyer E. S. & Bejerano G. WhichTF is functionally important in your open chromatin data? PLoS Comput Biol 18, e1010378 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]

[R98] 98.McLean C. Y. et al. GREAT improves functional interpretation of cis-regulatory regions. Nat Biotechnol 28, 495–501 (2010). [DOI] [PMC free article] [PubMed] [Google Scholar]

[R99] 99.Ashburner M. et al. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet 25, 25–29 (2000). [DOI] [PMC free article] [PubMed] [Google Scholar]

[R100] 100.Gene Ontology Consortium et al. The Gene Ontology knowledgebase in 2023. Genetics 224, (2023). [Google Scholar]

[R101] 101.Pennington J., Socher R. & Manning C. D. GloVe: Global Vectors for Word Representation. in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP) (2014). doi: 10.3115/v1/D14-1162. [DOI] [Google Scholar]

[R102] 102.Carlson R., Bauer J. & Manning C. D. A new pair of GloVes. arXiv [cs.CL] (2025) doi: 10.48550/ARXIV.2507.18103. [DOI] [Google Scholar]

[R103] 103.Parker R., Graff D., Kong J., Chen K. & Maeda K. English Gigaword Fifth Edition. Linguistic Data Consortium; 10.35111/WK4F-QT80 (2011). [DOI] [Google Scholar]

[R104] 104.Chen T. & Guestrin C. XGBoost: A Scalable Tree Boosting System. in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 785–794 (2016). doi: 10.1145/2939672.2939785. [DOI] [Google Scholar]

[R105] 105.Mann H. B. & Whitney D. R. On a Test of Whether one of Two Random Variables is Stochastically Larger than the Other. Ann. Math. Stat. 18, 50–60 (1947). [Google Scholar]

[R106] 106.Yao H.-F. et al. CASC8 activates the pentose phosphate pathway to inhibit disulfidptosis in pancreatic ductal adenocarcinoma though the c-Myc-GLUT1 axis. J Exp Clin Cancer Res 44, 26 (2025). [DOI] [PMC free article] [PubMed] [Google Scholar]

[R107] 107.Wu Q. et al. The m6A-induced lncRNA CASC8 promotes proliferation and chemoresistance via upregulation of hnRNPL in esophageal squamous cell carcinoma. Int J Biol Sci 18, 4824–4836 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]

[R108] 108.Sang Y. et al. Long non-coding RNA CASC8 polymorphisms are associated with the risk of esophageal cancer in a Chinese population. Thorac Cancer 11, 2852–2857 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]

[R109] 109.Darabi S., Gorgich E. A. C., Moradi F. & Rustamzadeh A. Lipidopathy disrupts peripheral and central amyloid clearance in Alzheimer’s disease: Where are our knowledge. IBRO Neurosci Rep 18, 191–199 (2025). [DOI] [PMC free article] [PubMed] [Google Scholar]

[R110] 110.Di Paolo G. & Kim T.-W. Linking lipids to Alzheimer’s disease: cholesterol and beyond. Nat. Rev. Neurosci. 12, 284–296 (2011). [DOI] [PMC free article] [PubMed] [Google Scholar]

[R111] 111.Yang L. G., March Z. M., Stephenson R. A. & Narayan P. S. Apolipoprotein E in lipid metabolism and neurodegenerative disease. Trends Endocrinol. Metab. 34, 430–445 (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]

[R112] 112.Wang H. et al. Regulation of beta-amyloid production in neurons by astrocyte-derived cholesterol. Proc. Natl. Acad. Sci. U. S. A. 118, e2102191118 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]

[R113] 113.Morgado I. & Garvey M. Lipids in amyloid-β processing, aggregation, and toxicity. Adv. Exp. Med. Biol. 855, 67–94 (2015). [DOI] [PubMed] [Google Scholar]

[R114] 114.Sprenger K. G., Lietzke E. E., Melchior J. T. & Bruce K. D. Lipid and lipoprotein metabolism in microglia: Alzheimer’s disease mechanisms and interventions. J. Lipid Res. 66, 100872 (2025). [DOI] [PMC free article] [PubMed] [Google Scholar]

[R115] 115.Boix C. A., James B. T., Park Y. P., Meuleman W. & Kellis M. Regulatory genomic circuitry of human disease loci by integrative epigenomics. Nature 590, 300–307 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]

[R116] 116.Shi C. et al. Multifactorial Diseases of the Heart, Kidneys, Lungs, and Liver and Incident Cancer: Epidemiology and Shared Mechanisms. Cancers (Basel) 15, (2023). [Google Scholar]

[R117] 117.Ising C. et al. NLRP3 inflammasome activation drives tau pathology. Nature 575, 669–673 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]

[R118] 118.Dutta D. et al. Tau fibrils induce glial inflammation and neuropathology via TLR2 in Alzheimer’s disease-related mouse models. J Clin Invest 133, (2023). [Google Scholar]

[R119] 119.Langworth-Green C. et al. Chronic effects of inflammation on tauopathies. Lancet Neurol 22, 430–442 (2023). [DOI] [PubMed] [Google Scholar]

[R120] 120.Sanderson E. et al. Mendelian randomization. Nat Rev Methods Primers 2, (2022). [Google Scholar]

[R121] 121.Richmond R. C. & Davey Smith G. Mendelian Randomization: Concepts and Scope. Cold Spring Harb Perspect Med 12, (2022). [Google Scholar]

[R122] 122.Lambert S. A. et al. Enhancing the Polygenic Score Catalog with tools for score calculation and ancestry normalization. Nat Genet 56, 1989–1994 (2024). [DOI] [PMC free article] [PubMed] [Google Scholar]
