Epigenomic subtypes of late-onset Alzheimer’s disease reveal distinct microglial signatures

Valentin T Laroche; Rachel Cavill; Morteza Kouhsar; Joshua Müller; Rick A Reijnders; Joshua Harvey; Adam R Smith; Jennifer Imm; Jarno Koetsier; Luke Weymouth; Lachlan MacBean; Giulia Pegoraro; Lars Eijssen; Byron Creese; Gunter Kenis; Betty M Tijms; Daniel van den Hove; Katie Lunnon; Ehsan Pishva

doi:10.21203/rs.3.rs-7232080/v1

This is a preprint.

It has not yet been peer reviewed by a journal.


[Preprint]. 2025 Aug 4:rs.3.rs-7232080. [Version 1] doi: 10.21203/rs.3.rs-7232080/v1

PMCID: PMC12340906 PMID: 40799738

Abstract

Growing evidence suggests that clinical, pathological, and genetic heterogeneity in late-onset Alzheimer’s disease (LOAD) contributes to variable therapeutic outcomes, potentially explaining many trial failures. Advances in molecular subtyping through proteomic and transcriptomic profiling have revealed distinct patient subgroups, highlighting disease complexity beyond amyloid-beta plaques and tau tangles. This underscores the need to extend subtyping to additional molecular layers in order to identify novel drug targets for different patient subgroups.

In this study, we analyzed genome-wide DNA methylation (DNAm) data from three independent postmortem brain cohorts (N = 831) to identify epigenetic subtypes of LOAD. Unsupervised clustering approaches were employed to identify distinct DNAm patterns, with subsequent cross-cohort validation. We assessed how subtype-specific methylation signatures map onto individual brain cell types by comparing them with DNAm profiles from purified cells. Next, we integrated bulk and single-cell RNA-seq data to determine each subtype’s functional impact on gene expression. Finally, we explored clinical and neuropathological correlates of the identified subtypes to elucidate biological and clinical significance.

We identified two distinct epigenomic subtypes of LOAD, consistently observed across three cohorts. Both subtypes exhibit significant yet distinct microglial methylation enrichment. Bulk transcriptomic analyses further highlighted distinct biological mechanisms underlying these subtypes: subtype 1 was enriched for immune-related processes, while subtype 2 was characterized by neuronal and synaptic pathways. Single-cell transcriptional profiling of microglia revealed subtype-specific inflammatory states: subtype 1 displayed chronic innate immune hyperactivation with impaired resolution, whereas subtype 2 exhibited a more dynamic inflammatory profile, balancing pro-inflammatory signaling with reparative and regulatory mechanisms.

These findings reveal distinct epigenetic and functional microglial states underlying LOAD subtypes, advancing our understanding of disease heterogeneity. This work lays the groundwork for targeted therapeutic strategies tailored to specific molecular and cellular disease profiles.

Keywords: Alzheimer’s disease, Epigenetics, DNA methylation, Subtyping, Microglia

Introduction

Late-onset Alzheimer’s disease (LOAD) is the most common form of dementia, typically developing after the age of 65 [1]. It primarily affects memory, cognitive function, and behavior. This condition is characterized by the gradual accumulation of amyloid-beta (Aβ) plaques and neurofibrillary tangles (NFTs) in the brain, resulting in the progressive destruction of neurons and brain atrophy [2]. However, significant heterogeneity has been observed among individuals with LOAD in terms of disease onset, progression, symptom variability, spreading of pathology, and their response to treatment [3–5].

In recent years, molecular subtyping of AD has increasingly leveraged omics data to identify distinct molecular profiles that may contribute to the observed heterogeneity in disease manifestation and mechanisms. This has been particularly transformative as traditional classification methods based on clinical or pathological features alone have proven insufficient. A recent study that used mass spectrometry proteomics to analyze CSF protein profiles identified five distinct molecular subtypes of AD [6]. These subtypes highlight various underlying mechanisms, including neuronal plasticity, innate immune activation, RNA dysregulation, and dysfunction in the choroid plexus and blood-brain barrier. Another study, which examined the molecular heterogeneity of AD across multiple brain regions, identified five major AD subtypes using transcriptomic data [7]. Importantly, the immune-related and synaptic pathway subtypes identified in the CSF proteomic study are consistent with the transcriptomic-based subtypes of AD. Additionally, the transcriptomic findings revealed subtypes associated with protein metabolism and the upregulation of organic acid–related genes, further enriching our understanding of LOAD’s diverse molecular landscape.

Epigenetic modifications, such as DNA methylation (DNAm) and histone modifications, can alter gene expression without changing the DNA sequence. These modifications are influenced by both genetic predisposition and environmental factors, exerting diverse effects among individuals. Owing to these characteristics, epigenetic modifications serve as valuable molecular markers for identifying the drivers of disease heterogeneity. This is particularly relevant in the context of LOAD, where a complex interplay of genetic, lifestyle, and environmental factors is believed to contribute to the onset, progression, and variability of symptoms [1]. Moreover, an increasing body of evidence suggests that modifiable lifestyle factors, such as physical activity, may delay the onset of dementia [8]. Notably, our previous study highlighted variation in methylation profile scores associated with lifestyle and environmental factors, such as physical inactivity and low educational attainment, as predictive markers for the prospective onset of dementia [9]. However, it remains unknown whether distinct DNAm subgroups exist within AD, which could provide further insights into disease heterogeneity.

In the present study, we used genome-wide DNAm data from three independent postmortem brain cohorts to identify epigenomic-based subtypes of LOAD. We employed data-driven methods to identify molecular subtypes of LOAD based on the highest degree of similarity in methylomic patterns within each cohort. The subtypes were further validated through rigorous cross-cohort comparisons using multiple clustering algorithms. To better understand the complex biological mechanisms underlying the observed heterogeneity, we conducted a comprehensive characterization of predicted subtypes at cell-type-specific DNAm levels. Additionally, we examined AD risk genes as potential contributors to methylomic heterogeneity and identified transcriptomic correlates for each subtype using both bulk and single-cell RNA sequencing data.

Materials and methods

Brain samples

We analyzed 840 cortical postmortem brain samples obtained from three independent cohorts to investigate epigenomic-based heterogeneity in LOAD. We included 249 prefrontal cortex (PFC) samples from the UK Brain Banks Network (UKBBN), 220 PFC samples from the University of Pittsburgh’s Alzheimer’s Disease Research Center (PITT-ADRC) [10], and 371 dorsolateral PFC (DLPFC) samples from the Religious Orders Study and Memory and Aging Project (ROSMAP) [11]. Samples were selected from donors with either no or minimal AD pathology or definite AD (see Diagnostic Criteria).

Diagnostic Criteria

The diagnosis of LOAD and control samples was established based on clinical and neuropathological data from donors aged over 65 at the time of death. Dementia was defined by an antemortem Mini-Mental State Examination (MMSE) score below 24 or a Clinical Dementia Rating (CDR) score of ≥ 1. Within the dementia group, samples were classified as AD if they met the criteria of a “probable” or “definite” Consortium to Establish a Registry for Alzheimer’s Disease (CERAD) score for neuritic plaques and a Braak neurofibrillary tangle (NFT) stage of ≥ 3. In cases where CERAD data were unavailable, samples with a Braak stage of ≥ 5 were included in the AD group.

Control samples were defined using stringent criteria specific to each brain bank. They were required to have a Braak stage of 0–2 for NFT pathology and negative for amyloid (CERAD neuritic plaque score 0). Additionally, when available from brain banks, both AD and control samples were screened for co-occurring other tau pathologies, including argyrophilic grain disease, as well as alpha-synuclein and TDP-43 pathologies. Only samples negative for these co-pathologies, with none to moderate cerebrovascular disease and rare microinfarcts, were included. Furthermore, samples from donors with known systemic hematopoietic malignancies, such as leukemia, were excluded to prevent the inclusion of malignant cells circulating in the brain.
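The diagnostic rules above can be summarized as a small decision function. The following is a minimal sketch rather than the authors' code: the function names, the string encoding of CERAD scores ("none", "possible", "probable", "definite"), and the "excluded" label are assumptions made for illustration, and the co-pathology, cerebrovascular, and malignancy screens are omitted.

```python
from typing import Optional


def has_dementia(mmse: Optional[float], cdr: Optional[float]) -> bool:
    """Dementia: antemortem MMSE below 24 or CDR of at least 1."""
    return (mmse is not None and mmse < 24) or (cdr is not None and cdr >= 1)


def classify_donor(age_at_death: float,
                   mmse: Optional[float],
                   cdr: Optional[float],
                   cerad: Optional[str],
                   braak: int) -> str:
    """Return 'AD', 'control', or 'excluded' (hypothetical labels)."""
    # Only donors aged over 65 at death are considered.
    if age_at_death <= 65:
        return "excluded"

    if has_dementia(mmse, cdr):
        if cerad is None:
            # CERAD unavailable: fall back to Braak stage >= 5.
            if braak >= 5:
                return "AD"
        elif cerad in ("probable", "definite") and braak >= 3:
            return "AD"

    # Controls: Braak 0-2 and amyloid-negative (CERAD score 0, encoded as "none").
    if braak <= 2 and cerad == "none":
        return "control"

    return "excluded"
```

For example, a demented donor with missing CERAD data and Braak stage 4 meets neither the AD nor the control rule and is therefore left out under this sketch.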

Methylomic profiling and data harmonization

For the UKBBN and PITT-ADRC cohorts, genomic DNA was extracted from 30 mg of fresh-frozen tissue using the AllPrep DNA/RNA/miRNA Universal Kit (QIAGEN), followed by bisulfite treatment with the EZ-96 DNA Methylation-Gold Kit (Zymo Research). The treated DNA was analyzed using Illumina’s Infinium MethylationEPIC v1.0 BeadChip arrays, quantifying DNAm at over 850,000 CpG sites. For the ROSMAP cohort, Illumina 450K methylation raw data files were obtained from the Accelerated Medicine Partnership (AMP-AD) portal (synID: syn7357283). Uniform and stringent quality control (QC) and normalization procedures were then applied across all three cohorts.

The raw methylation IDAT files were imported into R (version 4.3.0) using the ‘readEPIC’ function from the wateRmelon R package (v.2.6.0) [12], generating “MethylumiSet” objects. For quality control (QC), samples with a median signal intensity below 1000 were excluded. Additional outliers were removed using the ‘outlyx’ function, which detects anomalies based on the Inter-Quartile Range (IQR) and Mahalanobis distance. Bisulfite conversion efficiency was assessed with the ‘bscon’ function, discarding samples with efficiency below 0.8. Low-quality samples and probes were filtered using the ‘pfilter’ function, removing samples with a detection P-value > 0.05 in more than 5% of probes and CpGs with fewer than three bead counts in 5% of samples or a detection P-value > 0.05 in 1% of samples.
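The detection P-value thresholds described above can be illustrated with a short sketch. This is not the wateRmelon implementation: the function name and arguments are hypothetical, the bead-count rule is omitted, and the order of operations (samples filtered first, probes evaluated on the retained samples) is an assumption.

```python
def detection_filter(detp, p_cut=0.05, sample_frac=0.05, probe_frac=0.01):
    """Filter on detection P-values.

    detp is a probes x samples matrix given as a list of rows.
    Returns (kept_sample_indices, kept_probe_indices).
    """
    n_probes = len(detp)
    n_samples = len(detp[0])

    # Drop samples failing detection in more than `sample_frac` of probes.
    keep_samples = [
        j for j in range(n_samples)
        if sum(row[j] > p_cut for row in detp) / n_probes <= sample_frac
    ]
    if not keep_samples:
        return [], []

    # Drop probes failing detection in more than `probe_frac` of kept samples.
    keep_probes = [
        i for i, row in enumerate(detp)
        if sum(row[j] > p_cut for j in keep_samples) / len(keep_samples) <= probe_frac
    ]
    return keep_samples, keep_probes
```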

Sex was predicted from DNAm using the ‘estimateSex’ function based on CpG sites on chromosomes X and Y, and samples whose predicted sex did not match the recorded sex were excluded. Data normalization was performed using ‘BMIQ’ (Beta-Mixture Quantile intra-sample normalization), preserving inter-sample variance. Probe-Wise Outlier Detection (PWOD), run via the function of the same name (‘pwod’), removed probes associated with known variants to prevent downstream biases. Additional filtering excluded probes with SNP IDs, cross-hybridizing probes [13], and those on sex chromosomes.

Cell type composition in bulk PFC DNAm data was computed using CETYGO (v0.99.0) [14], employing a reference panel to distinguish inhibitory (GABAergic) neurons, excitatory (glutamatergic) neurons, oligodendrocytes, microglia, and astrocytes. To account for mutual technical and biological effects (e.g., age, sex, postmortem interval, plates, cell composition, and cohort-specific batches), a linear multiple regression model was applied to each methylation probe. Non-variant probes were filtered using the 90th percentile absolute deviation, resulting in 325,834 CpGs from EPIC arrays for subtyping. In the ROSMAP cohort, the CpG set was defined by the overlap between the processed 450K array and the available 450K probes within the final EPIC CpGs, yielding 148,951 CpGs.

Following pre-processing and QC, the final dataset included 240 UKBBN samples (129 LOAD, 111 controls), 220 PITT-ADRC samples (192 LOAD, 28 controls), and 371 ROSMAP samples (248 LOAD, 123 controls). This dataset, detailed in Supplementary Table 1, integrates diverse demographics, clinical profiles, and neuropathological information.

Bulk RNA sequencing

Total RNA quality in the UKBBN and PITT-ADRC cohorts was assessed using the Agilent 4200 TapeStation System, and samples with an RNA Integrity Number (RIN) below 3 were excluded. Library preparation was conducted using the Illumina Stranded mRNA Prep kit, followed by sequencing on the NovaSeq 6000 platform at the Exeter Sequencing Service. Raw sequencing data in FASTQ format underwent quality trimming to remove low-quality bases and adapter sequences. Processed reads were aligned to the human reference genome (GRCh37) using the STAR aligner (v2.7.3a) [15] with default parameters. After alignment, a raw count matrix was generated, and genes with low expression, defined as counts below 10 in over 80% of samples, were removed. Gene counts were normalized using the Trimmed Mean of M-values (TMM) method from the edgeR package (v3.42.4) [16]. The normalized counts were then transformed into log-transformed counts per million (logCPM) to stabilize variance and improve interpretability. Following quality control and processing, the RNA sequencing dataset comprised 483 cortical samples from three independent postmortem brain cohorts: UKBBN, PITT-ADRC, and ROSMAP. The dataset included 174 samples classified as LOAD-S1, with 63 from UKBBN, 57 from PITT-ADRC, and 54 from ROSMAP. LOAD-S2 consisted of 107 samples, with 45 from UKBBN, 20 from PITT-ADRC, and 42 from ROSMAP. The control group contained 202 samples, including 109 from UKBBN, 25 from PITT-ADRC, and 68 from ROSMAP.
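The low-expression filter and logCPM transformation described above can be sketched as follows. This is an illustrative simplification, not the edgeR workflow: TMM normalization is omitted, and the pseudo-count formula (0.5 added to counts, 1 added to library size) is an assumption chosen for the example. Function names are hypothetical.

```python
import math


def filter_low_expression(counts, min_count=10, max_low_frac=0.8):
    """Drop genes whose count is below `min_count` in over 80% of samples.

    counts maps gene name -> list of raw counts (one per sample).
    """
    kept = {}
    for gene, row in counts.items():
        low_frac = sum(c < min_count for c in row) / len(row)
        if low_frac <= max_low_frac:
            kept[gene] = row
    return kept


def log_cpm(counts):
    """log2 counts per million using raw library sizes (no TMM factors)."""
    genes = list(counts)
    n_samples = len(counts[genes[0]])
    lib_sizes = [sum(counts[g][j] for g in genes) for j in range(n_samples)]
    return {
        g: [math.log2((counts[g][j] + 0.5) / (lib_sizes[j] + 1.0) * 1e6)
            for j in range(n_samples)]
        for g in genes
    }
```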

Genotyping, imputation, and generation of polygenic scores

Genotyping was conducted using the Illumina Global Screening Array (GSA) for the UKBBN and PITT-ADRC cohorts. For the ROSMAP cohort, genotyping data from the Affymetrix GeneChip 6.0 (Affymetrix, Inc.) and the Illumina HumanOmniExpress chip array were obtained from Synapse (https://www.synapse.org/). Quality control was performed using PLINK (v2.0), excluding samples with > 5% missing values and SNPs with > 1% missing values. Additionally, SNPs with a minor allele frequency (MAF) < 0.05 and a Hardy-Weinberg equilibrium p-value < 1.0 × 10 were filtered out to retain the most reliable signals.

Processed genotype data were merged with the 1000 Genomes Project dataset, and principal component analysis (PCA) was performed to determine the genetic ancestry of each sample. Samples of non-European ancestry were removed based on PCA results. The quality-controlled genotype data were uploaded to the Michigan Imputation Server and imputed using Minimac4 with the 1000 Genomes reference panel (phase 3, version 5). The imputed data for each chromosome were compiled into a unified VCF file using BCFtools (v1.9) [17], with tri-allelic SNPs removed using VCFtools (v0.1.16) [18]. The final VCF file was converted into PLINK binary format.

Polygenic scores (PGS) were generated using the imputed genotype data and PRSice-2 software [19]. Genetic variants associated with AD were selected based on the genome-wide association study (GWAS) by Bellenguez et al., 2022 [20].

Data analysis

Clustering algorithms

To identify cortical brain subgroups with similar DNAm profiles, agglomerative hierarchical clustering using the Ward D2 method with Euclidean distance and K-means clustering as a nonhierarchical approach were applied to LOAD samples, with each cohort analyzed independently. The optimal number of clusters for both methods was determined using the Elbow method. To assess clustering consistency, Normalized Mutual Information (NMI) analysis was performed for each cohort using the Aricode R package (v1.0.3) [21] (https://github.com/jchiquet/aricode), comparing hierarchical and K-means clustering results. The cluster number with the highest NMI value was selected for further analysis.
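To make the agreement measure concrete, the sketch below computes NMI between two label vectors, such as hierarchical and K-means assignments for the same samples. It is a plain-Python illustration and not the aricode implementation. Normalizing by the mean of the two entropies is an assumption, since the normalization variant used in the study is not stated.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (natural log) of a label vector."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """Mutual information between two label vectors of equal length."""
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    joint = Counter(zip(a, b))
    return sum(
        (n_xy / n) * math.log(n_xy * n / (count_a[x] * count_b[y]))
        for (x, y), n_xy in joint.items()
    )


def nmi(a, b):
    """NMI normalized by the arithmetic mean of the two entropies."""
    h_a, h_b = entropy(a), entropy(b)
    if h_a == 0 and h_b == 0:
        return 1.0  # both partitions are a single cluster
    return 2 * mutual_information(a, b) / (h_a + h_b)
```

Because NMI depends only on how samples are grouped, two clusterings that agree up to a relabeling of clusters score 1, which is why it suits comparisons between algorithms that number their clusters arbitrarily.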

To assess cluster specificity, Sparse Partial Least Squares Discriminant Analysis (sPLS-DA) was performed using the mixOmics package (v6.24.0) [22] for each clustering algorithm to identify the most discriminative features. From each sPLS-DA model, 12,000 features corresponding to six components were extracted. To evaluate robustness, cluster labels were randomized 10 times for proportions of the sample population ranging from 100% down to 1%, followed by repeated sPLS-DA modeling for each iteration. The overlap between the 12,000 extracted features from each iteration and those from the original cluster labels was then examined. Technical batch effects, including plate and chip variations, were assessed using Spearman’s correlation test for continuous variables and Normalized Mutual Information (NMI) for discrete variables, the latter being particularly suited for evaluating clustering outcomes.

To verify the generalizability of findings across Illumina EPIC arrays (UKBBN and PITT-ADRC cohorts) and 450K arrays (ROSMAP cohort), Weighted Gene Co-expression Network Analysis (WGCNA) was employed. Co-methylation networks were constructed using the ‘blockwiseModules’ function in the WGCNA R package (v1.72–5) [23] based on selected EPIC array probes. Module eigenprobes were then calculated using the ‘moduleEigengenes’ function, and their associations with categorical cluster labels were assessed through ANOVA followed by post hoc Tukey’s tests. For modules showing significant differences between identified clusters, eigenprobes were recalculated using the subset of probes available on the 450K array. Preservation of cluster-associated co-methylated probes was confirmed by demonstrating that eigenvalue differences between cluster labels remained significant with the 450K probes and that eigenvalue differences for specific subtypes across both platforms were non-significant.

Cross-cohort replication

To identify matching clusters with maximum similarity in DNAm profiles across the UKBBN, PITT-ADRC, and ROSMAP cohorts, we employed two complementary approaches at both the probe and sample levels.

In the first approach, we calculated the median DNAm value per probe for each clustered brain sample within each cohort. Pearson correlation tests were then used to assess the relationship between these median values across the three cohorts.
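A minimal sketch of this probe-level comparison is given below: per-probe medians are computed within each cluster, and median profiles from different cohorts are then correlated. The function names and the data layout (a list of samples, each a list of probe values in a shared probe order) are assumptions for illustration.

```python
import math
from statistics import median


def cluster_medians(beta, labels):
    """Median DNAm value per probe within each cluster.

    beta is a list of samples, each a list of probe values;
    labels gives the cluster of each sample.
    """
    profiles = {}
    for lab in sorted(set(labels)):
        rows = [s for s, l in zip(beta, labels) if l == lab]
        # zip(*rows) iterates over probes (columns).
        profiles[lab] = [median(col) for col in zip(*rows)]
    return profiles


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)
```

Correlating, for example, the median profile of one UKBBN cluster with each PITT-ADRC cluster profile over the shared probes yields one row of a cluster-by-cluster correlation matrix.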

In the second approach, the UKBBN cohort was designated as the discovery dataset. Using the ‘sPLSDA’ function from the mixOmics package (v6.24.0) [22], we reduced data into a lower dimensional latent space while preserving class separation. The first two latent spaces of sPLSDA were mapped against the clusters identified in this cohort. Convex hulls were established around each UKBBN cluster by connecting the outermost points. DNAm profiles from a replication cohort (either PITT-ADRC or ROSMAP) were then projected onto these latent spaces, and the number of replication cohort samples falling within the UKBBN cluster hulls was recorded. To assess significance, contingency tables were constructed, and a hypergeometric test was performed to determine overlaps between the discovery and replication cohort clusters. This procedure was repeated with PITT-ADRC and ROSMAP serving as discovery sets, while the other two cohorts acted as replication cohorts. An epigenomic-based LOAD subtype was confirmed only when both approaches validated the DNAm profile similarities across all three cohorts, at both the probe and sample levels, using two clustering methods.
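The hull-based overlap step can be sketched in two dimensions as follows, assuming each sample has already been reduced to its coordinates on the first two latent components. The sPLS-DA projection itself is not reproduced here. The hull is built with Andrew's monotone chain algorithm, which is one standard choice and not necessarily the one used in the study. All function names are hypothetical.

```python
def cross(o, a, b):
    """Z-component of the cross product (a - o) x (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Convex hull vertices in counter-clockwise order (monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


def inside_hull(hull, p):
    """True if p lies inside or on the boundary of a counter-clockwise hull."""
    if len(hull) < 3:
        return False
    n = len(hull)
    return all(cross(hull[i], hull[(i + 1) % n], p) >= 0 for i in range(n))


def count_inside(hull, points):
    """Number of projected replication samples falling within the hull."""
    return sum(inside_hull(hull, p) for p in points)
```

Counting, for each pair of discovery and replication clusters, how many replication samples fall inside the discovery hull gives the cells of the contingency table that is then tested.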

Subtype-specific epigenome-wide association analysis

Epigenome-wide association studies (EWASs) were conducted using multiple linear regression models to identify differentially methylated positions (DMPs) associated with each predicted subtype in each cohort. Methylation data were adjusted for potential confounders, including age, sex, cell type composition, brain banks, and plates, ensuring consistency with previous clustering steps. Surrogate variable analysis (SVA) was applied to adjust for unwanted variation unrelated to the main outcome. P-value inflation was assessed using the inflation index for each EWAS, and estimates and standard errors were corrected where needed using the bacon package in R (v1.28.0) [24] before proceeding to the meta-analysis. The meta-analysis was performed using the inverse variance method via the rma.uni function in the metafor R package (v4.6–0) [25]. Subtype-specific DNAm signatures were defined as DMPs with a Bonferroni-adjusted p-value below 0.05. The overlap between DMPs identified across subtypes in the meta-analysis was assessed using the GeneOverlap R package (v1.40.0) (http://shenlab-sinai.github.io/shenlab-sinai/).
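For a single CpG, inverse-variance pooling across cohorts followed by Bonferroni correction can be sketched as below. A fixed-effect model and a two-sided normal-approximation p-value are assumptions here, since the text specifies only the inverse variance method. The function names are hypothetical and this is not the metafor implementation.

```python
import math


def inverse_variance_meta(betas, ses):
    """Fixed-effect inverse-variance pooling of per-cohort estimates.

    Returns (pooled_beta, pooled_se, two_sided_p).
    """
    weights = [1.0 / se ** 2 for se in ses]
    w_sum = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / w_sum
    se = math.sqrt(1.0 / w_sum)
    z = beta / se
    # Two-sided p-value under a standard normal reference distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return beta, se, p


def bonferroni_significant(pvalues, alpha=0.05):
    """Flag tests whose Bonferroni-adjusted p-value is below alpha."""
    m = len(pvalues)
    return [min(p * m, 1.0) < alpha for p in pvalues]
```

With 148,951 CpGs tested, the Bonferroni rule corresponds to an unadjusted threshold of roughly 3.4 × 10⁻⁷ per CpG.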

Cell-type specific DNAm enrichment analysis

To investigate the cell type specificity of the methylation signatures associated with the identified subtypes, we employed the CEAM (Cell-type Enrichment Analysis for Methylation) framework developed by Müller et al. [26] (https://um-dementia-systems-biology.shinyapps.io/CEAM/). CEAM uses fluorescence-activated nuclei sorting (FANS) derived methylation profiles from purified brain cell populations, neurons, oligodendrocytes, microglia, and astrocytes, to construct cell type-specific CpG sets. These CpG sets are stratified into three specificity levels, ranging from uniquely cell type-specific CpGs to those shared across multiple cell types, enabling nuanced enrichment analyses. For each LOAD subtype, meta-analysis-derived DMPs were independently assessed against these CpG panels.

Colocalization analysis

To investigate whether subtype-specific methylation signals share a common genetic basis with AD genetic risk loci, we performed colocalization analysis using methylation quantitative trait loci (mQTL) data from the ROSMAP cohort, accessed via Brain xQTLServe (https://mostafavilab.stat.ubc.ca/xqtl/). As input, we extracted cis-mQTLs associated with DMPs identified in the subtype-specific EWASs, using a suggestive nominal p-value threshold of 1.0 × 10⁻⁵. Bayesian colocalization was conducted using the coloc R package (v5.2.3) [27] for all pairwise comparisons between genomic regions associated with AD risk (as identified by Bellenguez et al.) [20] and variants linked to LOAD subtype DMPs. The coloc.abf function was used, and colocalization was considered significant when the combined posterior probability of hypotheses H3 and H4 exceeded 0.90.

Bulk transcriptomic analysis

Differential expression analysis (DEA) was performed using the limma R package (v3.55.10) [28], comparing LOAD subtypes to their respective controls across the three cohorts. To account for potential confounders, the analysis included covariates such as age, sex, RIN, brain banks, and surrogate variables, which were incorporated to adjust for unmeasured variation, including differences in cell type composition. Bacon correction was applied to estimates and standard errors to reduce bias. Genes were considered significantly differentially expressed if they showed a nominal p-value below 0.05 in all cohorts, a fold change greater than 1.5 in at least one cohort, and a consistent direction of effect across all cohorts. The lenient p-value threshold was chosen to balance sensitivity and reproducibility: requiring a consistent signal in every cohort guards against false positives arising from single-cohort noise, while the fold-change cutoff helps ensure biological relevance. This makes the approach suitable for exploratory downstream subtype analyses at the transcriptional level.
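The three-part selection rule for differentially expressed genes can be written as a single predicate. This sketch assumes that effect sizes are available as log2 fold changes, one per cohort, so that a fold change of 1.5 corresponds to an absolute log2 fold change of about 0.585. The function name is hypothetical.

```python
import math


def is_consensus_deg(log2_fold_changes, p_values, p_cut=0.05, fc_cut=1.5):
    """Apply the cross-cohort DEG rule to one gene.

    Requires nominal p < p_cut in every cohort, the same direction of
    effect in every cohort, and |fold change| > fc_cut in at least one.
    """
    if any(p >= p_cut for p in p_values):
        return False
    all_up = all(lfc > 0 for lfc in log2_fold_changes)
    all_down = all(lfc < 0 for lfc in log2_fold_changes)
    if not (all_up or all_down):
        return False
    threshold = math.log2(fc_cut)
    return any(abs(lfc) > threshold for lfc in log2_fold_changes)
```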

Single-cell transcriptomic analysis

Subtype-specific analyses of gene expression across 12 microglial states were conducted using single-cell RNA sequencing data from the ROSMAP study. This dataset comprises 427 samples, including 70 definite AD cases and 64 control samples that overlap with the subset of ROSMAP data utilized in this project. Subtype-specific differentially expressed genes (DEGs) were identified for each of the 12 microglial states characterized by Sun et al. [29].

The muscat R package (v1.14.0) was used with default parameters. Non-expressed or lowly expressed genes were filtered out, and outlier detection was performed at the cell level. To generate a total expression value for each sample, cells were aggregated in a pseudobulk manner by summing RNA expression from all cells within each microglial state. Pseudobulk DEA was then conducted for each microglial state, comparing each subtype both to healthy controls and to one another. Genes that were significantly differentially expressed in subtype-to-subtype comparisons, and that were also differentially expressed versus controls in at least one subtype, were subjected to GO enrichment analysis. For this analysis, a more lenient significance threshold of p < 0.05 was chosen to broaden the candidate gene set and uncover subtler, potentially biologically relevant pathways.
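The pseudobulk aggregation step amounts to summing counts over all cells that share a sample and a microglial state. The sketch below illustrates this with plain dictionaries. The input layout (one tuple per cell) and the function name are assumptions, and this is not the muscat implementation.

```python
from collections import defaultdict


def pseudobulk(cells):
    """Sum per-cell counts into one profile per (state, sample) pair.

    cells is an iterable of (sample_id, state, {gene: count}) tuples.
    Returns {(state, sample_id): {gene: summed_count}}.
    """
    totals = defaultdict(lambda: defaultdict(int))
    for sample_id, state, counts in cells:
        for gene, count in counts.items():
            totals[(state, sample_id)][gene] += count
    return {key: dict(genes) for key, genes in totals.items()}
```

Each resulting profile then acts as a single observation in the per-state differential expression model, so the unit of replication is the donor and not the individual cell.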

GO enrichment analysis

GO enrichment analysis was conducted using the clusterProfiler R package (v4.12.6) [30] to identify biological processes, molecular functions, and cellular components significantly associated with DEGs for each subtype. For subtype-specific microglial transcriptomic signatures, enrichment analysis focused exclusively on GO terms related to microglial activity, immune responses, and pro- and anti-inflammatory states [31].

Results

Data-driven clustering

The clustering analysis was conducted on 325,834 CpGs from EPIC arrays and 148,951 CpGs from 450K arrays, following stringent preprocessing and filtering steps. Clustering algorithms were applied exclusively to samples with a definite LOAD diagnosis, initially identifying three to five optimal clusters across the three cohorts when evaluated using the Elbow method (Supplementary Fig. 1). Further validation with the NMI test demonstrated high similarity between hierarchical and K-means clustering methods, with the strongest agreement observed when selecting three clusters (Supplementary Table 2). To maintain consistency across cohorts, clusters within each dataset were randomly labeled as A, B, and C.

The specificity of the detected clusters was further validated by randomizing cluster labels across varying proportions of the sample populations. Minimal overlap was observed between the features distinguishing the original clusters and those identified under randomized labels, particularly when randomization was applied to the entire dataset. This finding indicates that the original clusters are distinct and unlikely to result from random assignment, reinforcing the robustness and biological relevance of the detected clusters (Supplementary Fig. 2).

No significant association was observed between the identified clusters and known technical batch effects, including plate variations and brain sample sources (Supplementary Table 3). This confirms the reliability of the clustering results, suggesting that the detected clusters are more likely to reflect true biological differences rather than procedural inconsistencies. The distribution of samples assigned to each cluster using both clustering methods across the three cohorts is reported in Supplementary Table 4.

Illumina EPIC arrays were used for DNAm quantification in the UKBBN and PITT-ADRC cohorts, covering over 850K CpG sites, while the ROSMAP cohort utilized the 450K array. To assess the generalizability of findings across these two array types, WGCNA was applied. The significant relationship between module eigenvalues and the three identified clusters, as determined by ANOVA, remained preserved when reconstructing modules using only the subset of CpGs available on 450K arrays. Additionally, no significant changes in module eigenvalues within the same cluster were observed when transitioning from EPIC to 450K arrays, supporting the robustness and cross-platform reproducibility of the clustering results (Supplementary Figs. 3–6).

Cross-cohort replication of epigenomic-based subtypes

Cross-cohort replication analyses identified two distinct subtypes of LOAD based on DNAm profiles measured in bulk cortical brain samples. Replication was conducted using two complementary strategies at the CpG and sample levels.

CpG-level replication assessed the correlation between median DNAm values per CpG for each data-driven cluster across the three cohorts. This analysis revealed two distinct blocks of correlated clusters: the first block included UKBBN cluster A, PITT-ADRC cluster B, and ROSMAP cluster A, while the second block comprised UKBBN cluster B, PITT-ADRC cluster C, and ROSMAP cluster B. Detailed heatmap plots illustrating the associations between clusters using both methods are provided in Fig. 1a.

Fig. 1. Cross-cohort generalizability of DNAm-based clusters. (a) Visual representation of the correlation-based replication of the identified clusters across three cohorts using hierarchical (left) and K-means (right) methods. The first block (LOAD-S1) includes UKBBN cluster A, PITT-ADRC cluster B, and ROSMAP cluster A, while the second block (LOAD-S2) includes UKBBN cluster B, PITT-ADRC cluster C, and ROSMAP cluster B. (b) Spatial overlap analysis showing cluster assignments across the three cohorts. In this example, the clusters were confirmed through iterative projections, where the PITT-ADRC and ROSMAP samples were projected onto the first two latent spaces of the UKBBN cohort to verify spatial overlap across datasets.

The second replication analysis examined the spatial overlap of the distinct clusters across the three cohorts using latent spaces. Matching clusters across independent cohorts were confirmed by iteratively swapping the study samples, where the latent spaces from one cohort were used to project the other two. Notably, the two blocks of clusters identified through correlation analysis were further validated, showing significant overlap in latent spaces, as assessed by Fisher’s tests. An example projection of UKBBN latent spaces onto PITT-ADRC and ROSMAP datasets is illustrated in Fig. 1b, with detailed Fisher’s test statistics provided in Supplementary Table 5. In both analyses, no matching clusters were identified for UKBBN cluster C, PITT-ADRC cluster A, and ROSMAP cluster C.

As a result, these clusters were labeled as ‘Unassigned’ samples for further analysis. This classification remained consistent across both hierarchical and K-means clustering methods, reinforcing the robustness of the clustering results across different methodologies. To ensure subtype consistency, only samples consistently labeled as a subtype across both clustering methods were retained, with the final counts for each subtype reported in Supplementary Table 6. From this point forward, the first block of clusters is referred to as Subtype 1 (LOAD-S1) and the second block as Subtype 2 (LOAD-S2). Figure 2 presents a plot of the first two principal components (PCs) for the pooled dataset comprising UKBBN, PITT-ADRC, and ROSMAP cohorts, with samples labeled according to their confirmed LOAD subtypes.

Fig. 2. Visualization of the first two PCs for LOAD subtypes across all cohorts. Plot of the first two PCs (PC1 and PC2) for DNAm profiles from all samples in the UKBBN, PITT-ADRC, and ROSMAP cohorts. Samples are labeled according to their subtype assignments: Subtype 1 (LOAD-S1) and Subtype 2 (LOAD-S2). ‘Unassigned’ samples are also indicated, representing clusters without matching correlations across cohorts.

Subtype-specific EWAS

The distinct methylomic signatures of the newly defined LOAD subtypes (LOAD-S1 and LOAD-S2), the Unassigned group, and overall LOAD were characterized using EWAS across the UKBBN, PITT-ADRC, and ROSMAP cohorts, followed by meta-analyses. These analyses focused on 148,951 highly variable CpGs shared between the EPIC and 450K arrays, which were also used in the clustering algorithms. Comparing DNAm profiles of healthy controls (HC) with overall LOAD identified 267 DMPs. Additionally, 54 DMPs were associated with LOAD-S1, 202 with LOAD-S2, and 61 with the Unassigned group. DMPs were considered significant if they had a Bonferroni-adjusted p-value < 0.05 (Supplementary Tables 7–10).

LOAD-S1 and LOAD-S2 exhibited distinct DNAm profiles, with the least overlap in their DMPs (odds ratio [OR] = 109.0). The greatest overlap with overall LOAD DMPs was observed for LOAD-S2 (OR = 743.9), followed by the Unassigned group (OR = 694.6) and LOAD-S1 (OR = 125.2). These findings highlight that LOAD-S1 and LOAD-S2 represent the most distinct DNAm profiles, particularly in comparison to the Unassigned group. Similarities and differences between LOAD subtypes and the Unassigned group relative to overall LOAD DMPs are detailed in Supplementary Table 11, with CpG overlaps visualized in Supplementary Fig. 7.
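To make the overlap statistics concrete, the sketch below computes a sample odds ratio and a one-sided Fisher's exact (hypergeometric) p-value for the overlap of two CpG sets within a shared background. The toy sets and function names are hypothetical, and the ORs reported above may have been obtained with a different estimator (for example a conditional maximum-likelihood estimate), so this should be read as an illustration of the idea rather than a reproduction of the analysis.

```python
from math import comb


def overlap_table(set_a, set_b, universe_size):
    """2x2 table counts: (in both, only in A, only in B, in neither)."""
    both = len(set_a & set_b)
    only_a = len(set_a) - both
    only_b = len(set_b) - both
    neither = universe_size - both - only_a - only_b
    return both, only_a, only_b, neither


def odds_ratio(both, only_a, only_b, neither):
    """Sample odds ratio; infinite when an off-diagonal cell is empty."""
    if only_a == 0 or only_b == 0:
        return float("inf")
    return (both * neither) / (only_a * only_b)


def fisher_enrichment_p(both, only_a, only_b, neither):
    """One-sided p-value: chance of an overlap at least this large."""
    n_a = both + only_a
    n_b = both + only_b
    total = n_a + only_b + neither
    upper = min(n_a, n_b)
    numerator = sum(
        comb(n_a, k) * comb(total - n_a, n_b - k) for k in range(both, upper + 1)
    )
    return numerator / comb(total, n_b)


# Toy example: 1,000 background CpGs, two DMP sets sharing 10 probes.
dmps_a = set(range(0, 40))
dmps_b = set(range(30, 80))
table = overlap_table(dmps_a, dmps_b, universe_size=1000)
print(table, odds_ratio(*table), fisher_enrichment_p(*table))
```

An OR well above 1 together with a small p-value indicates that the two DMP sets share more probes than expected by chance given the background.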

To evaluate the relationship between subtype-specific methylomic signatures identified in this study and previously reported DMPs associated with AD pathology, we examined the overlap of significant DMPs linked to neurofibrillary tangles (NFTs) in the PFC as reported by our group [32]. Of the 236 Bonferroni-significant DMPs linked to AD pathology in that study, 150 CpGs were included in the subset analyzed in the current study. The greatest overlap was observed with overall LOAD (75 CpGs; OR = 767.5) and the Unassigned group (23 CpGs; OR = 703.7). Notably, LOAD-S2 displayed a greater overlap with previously reported DMPs from the PFC DNAm meta-analysis (43 CpGs; OR = 413.7) compared to LOAD-S1 (3 CpGs; OR = 51.5). These findings suggest that LOAD-S2 is more strongly associated with AD pathology than LOAD-S1 (Supplementary Table 12).

Cell type-specific methylation and LOAD subtypes

To investigate the relationship between LOAD subtypes and brain cell types, we used the CEAM framework to assess enrichment of DMPs in cell type-specific CpG sets derived from purified brain cells. Overlap analyses revealed significant enrichment of DMPs from both LOAD-S1 and LOAD-S2 subtypes in microglia, with odds ratios of 3.6 (lowest specificity level) and 8.66 (highest specificity level), respectively. Notably, there was minimal overlap between the microglial methylation signatures of LOAD-S1 and LOAD-S2, with only one DMP shared, suggesting distinct microglial epigenetic profiles despite their shared association with this cell type (Supplementary Table 13).

AD genetics and epigenomic-based subtypes

To investigate the contribution of AD genetic risk variants to the epigenomic-based heterogeneity in LOAD, we examined AD risk genomic regions in relation to subtype-specific DMPs. By mapping these subtype-specific DMPs to mQTL data from the ROSMAP database, we identified 6,008 cis-mQTLs associated with LOAD-S1 and 11,008 cis-mQTLs associated with LOAD-S2. Bayesian colocalization analysis revealed that methylation loci in these subtypes are influenced by distinct AD-associated genomic regions across different chromosomes. Specifically, LOAD-S1 showed colocalization with genetic regions linked to BIN1, while LOAD-S2 methylation signatures were associated with variants in SPI1. Detailed statistics are provided in Supplementary Table 14.

Distinct transcriptomic profiles and pathway enrichment in LOAD subtypes

Analysis of bulk PFC RNA-sequencing data identified 162 differentially expressed genes in LOAD-S1 and 277 in LOAD-S2 (nominal p-value < 0.05 in all three cohorts, FC > 1.5 in at least one, with a consistent direction of effect; Supplementary Tables 15–16). GO enrichment analysis revealed that LOAD-S1 is driven by immune dysregulation and inflammation, with enriched terms including ‘cell activation in immune response’ (GO:0002263), ‘positive regulation of cytokine production’ (GO:0001819), and ‘tumor necrosis factor production’ (GO:0032640). In contrast, LOAD-S2 was enriched in synaptic and neuronal processes, including ‘vesicle-mediated transport in synapse’ (GO:0099003), ‘inhibitory synapse assembly’ (GO:1904862), and ‘neurotransmitter transport’ (GO:0006836). These findings highlight distinct molecular pathways driving LOAD subtypes, reinforcing the biological heterogeneity of LOAD (Supplementary Tables 17–18, Fig. 3a).
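The gene selection rule stated above (nominal p < 0.05 in all three cohorts, FC > 1.5 in at least one, consistent direction of effect) can be written as a small filter. The sketch assumes that the fold-change cutoff is applied symmetrically, that is, as |log2 FC| > log2(1.5), which the text does not specify; gene names and values are hypothetical.

```python
from math import log2

P_MAX = 0.05
# Assumption: "FC > 1.5" is applied to both up- and down-regulated genes.
MIN_ABS_LOG2FC = log2(1.5)


def passes_filter(per_cohort):
    """per_cohort: list of (p_value, log2_fold_change), one tuple per cohort."""
    all_nominal = all(p < P_MAX for p, _ in per_cohort)
    same_direction = all(lfc > 0 for _, lfc in per_cohort) or all(
        lfc < 0 for _, lfc in per_cohort
    )
    large_in_one = any(abs(lfc) > MIN_ABS_LOG2FC for _, lfc in per_cohort)
    return all_nominal and same_direction and large_in_one


# Hypothetical results for three cohorts, for illustration only.
results = {
    "GENE_A": [(0.01, 0.70), (0.03, 0.40), (0.02, 0.30)],   # meets all criteria
    "GENE_B": [(0.01, 0.70), (0.20, 0.40), (0.02, 0.30)],   # p >= 0.05 in one cohort
    "GENE_C": [(0.01, 0.70), (0.03, -0.40), (0.02, 0.30)],  # direction flips
    "GENE_D": [(0.01, 0.20), (0.03, 0.40), (0.02, 0.30)],   # effect too small everywhere
}

selected = [gene for gene, stats in results.items() if passes_filter(stats)]
print(selected)
```

Requiring nominal significance in every cohort plus a consistent direction trades a formal multiple-testing correction for cross-cohort replication.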

(a) Heatmap showing Z-scores for the top 20 enriched Gene Ontology (GO) terms in LOAD-S1 and LOAD-S2 subtypes. Z-scores indicate the deviation of observed gene counts from expected values, normalized by standard deviation. LOAD-S1 is enriched in immune-related pathways, reflecting a dominant immune/inflammatory signature, while LOAD-S2 shows enrichment in synaptic and neuronal pathways. Immune pathways prevalent in LOAD-S1 show weak enrichment in LOAD-S2, and vice versa. (b) Comparative number of DEGs across seven microglial (MG) states (MG 0–6) highlights distinct gene expression profiles for LOAD-S1 and LOAD-S2, with minimal overlap, underscoring subtype-specific transcriptional landscapes. (c) GO term enrichment for the 269 DEGs identified across MG 0–MG 6 in ROSMAP single-cell pseudobulk profiles.

Subtype-specific microglial single-cell transcriptomes and immune profiles

Analysis of single-cell RNA sequencing data from the ROSMAP study explored the distinct microglial signatures associated with LOAD subtypes at the methylation level. This analysis focused on 12 microglial (MG) states, as characterized by Sun et al. [29], and was conducted by aggregating single-cell data into pseudobulk profiles for each MG state and subtype. DEA was then restricted to the seven MG states (MG 0–6) that met our minimum cell-count and read-depth criteria in both subtype and control samples. This analysis yielded a total of 269 unique genes meeting our lenient significance threshold (p < 0.05) in at least one subtype-to-subtype comparison and in a comparison versus healthy controls (Supplementary Table 19). Comparative analysis revealed minimal overlap in DEGs across MG states, with a maximum overlap of two genes in MG3 (Fig. 3b). GO enrichment analysis on these genes focused on microglial activity, immune responses, and pro- and anti-inflammatory states, as listed in Supplementary Table 20. We found 13 GO terms enriched at FDR-adjusted p < 0.05, with the highest number of genes observed in ‘immune response-activating signaling pathway’ (GO:0002757) (Fig. 3c). A detailed list of significant GO terms and the genes contributing to each pathway can be found in Supplementary Table 21. Additionally, we highlighted the genes that are ordinarily upregulated in each MG state based on the markers reported by Sun et al. (https://compbio.mit.edu/microglia_states/). No DEGs overlapped with the genes upregulated in MG0 or MG1; however, the DEGs shared TYMP and MGAT5 with MG2 (the inflammatory I state); TMSB4X, RPL28, and APOC1 with MG3 (ribosome biogenesis); SH3PXD2A and ALCAM with MG4 (lipid processing); IFI44L, UTRN, IFI44, and TRIM14 with MG5 (phagocytic); and ATP13A3, RALGPS2, and IRF8 with MG6 (stress signature).
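The pseudobulk step described above can be sketched as summing raw counts over all cells that share a grouping key and then discarding groups with too few cells. The sketch aggregates per donor within each MG state, which is a common convention but an assumption here; the cell records, gene names, and the `min_cells` value are hypothetical, since the study's actual cell-count and read-depth cutoffs are not given in this section.

```python
from collections import defaultdict


def pseudobulk(cells):
    """Sum raw counts over cells sharing the same (donor, state) pair.

    Returns the summed profiles and the number of cells behind each profile.
    """
    profiles = defaultdict(lambda: defaultdict(int))
    n_cells = defaultdict(int)
    for cell in cells:
        key = (cell["donor"], cell["state"])
        n_cells[key] += 1
        for gene, count in cell["counts"].items():
            profiles[key][gene] += count
    return {k: dict(v) for k, v in profiles.items()}, dict(n_cells)


def keep_profiles(profiles, n_cells, min_cells):
    """Drop pseudobulk profiles built from fewer than min_cells cells."""
    return {k: v for k, v in profiles.items() if n_cells[k] >= min_cells}


# Hypothetical cells, for illustration only.
cells = [
    {"donor": "D1", "state": "MG0", "counts": {"GENE_A": 3, "GENE_B": 0}},
    {"donor": "D1", "state": "MG0", "counts": {"GENE_A": 1, "GENE_B": 2}},
    {"donor": "D1", "state": "MG2", "counts": {"GENE_A": 5}},
    {"donor": "D2", "state": "MG0", "counts": {"GENE_A": 2, "GENE_B": 4}},
]

profiles, n_cells = pseudobulk(cells)
kept = keep_profiles(profiles, n_cells, min_cells=2)  # hypothetical cutoff
print(kept)
```

Aggregating before testing treats each donor, not each cell, as the unit of replication, which avoids inflating significance through the many correlated cells contributed by a single brain.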

Clinical and demographic characterization of LOAD subtypes

Pathological and clinical assessments revealed no significant differences between LOAD-S1 and LOAD-S2 in terms of APOE status, polygenic risk scores for AD, age of onset, last cognitive assessment, or key measures of NFT and amyloid pathology (Supplementary Table 22). These results suggest that the distinct epigenetic and transcriptomic signatures defining the subtypes, particularly their enrichment in specific cell-type DNAm patterns, operate independently of traditional clinical and pathological markers of AD.

Discussion

This study identified two distinct epigenetic subtypes of LOAD by analyzing genome-wide DNAm profiles across three large-scale, independent postmortem brain cohorts. While these subtypes showed no association with established clinicopathological markers of LOAD, cellular and molecular characterization revealed distinct DNAm and transcriptional profiles, highlighting their biological relevance and potential role in LOAD heterogeneity.

Both subtypes exhibited enriched DNAm signatures in microglia only, indicating an immune and inflammatory association in both LOAD-S1 and LOAD-S2. However, bulk RNA sequencing analysis revealed that LOAD-S1 is primarily linked to immune and inflammatory pathways, whereas LOAD-S2 showed stronger associations with synaptic dysfunction and neuronal communication. This highlights distinct molecular mechanisms underlying these subtypes, which may involve different functions within microglia.

Single-cell RNA sequencing analysis of microglial states further revealed that both subtypes share enrichment of innate immune pathways, but they engage largely non-overlapping gene modules, indicating different modes of activation, regulation, and resolution.

The transcriptional profile of LOAD-S1 microglia involves innate immune pathways and displays a mostly pro-inflammatory phenotype, characterized by the upregulation of key innate-immune sensors and interferon (IFN) mediators, alongside the loss of anti-inflammatory checkpoints. In particular, UNC93B1, which is essential for endosomal TLR3/7/9 signaling, and STAT2, a driver of type-I interferon–stimulated gene programs, are both elevated, promoting sustained IFN and cytokine production [33]. Concomitantly, the downregulation of PPAR-γ, a nuclear receptor that normally trans-represses NF-κB [34], and HSPA1B, which inhibits NLRP3 inflammasome assembly [35], removes critical brakes on microglial activation. Although PARP14 is modestly induced and may exert some anti-inflammatory transcriptional control [36], the aggregate profile of LOAD-S1 suggests microglia are skewed toward chronic inflammatory signaling and potential neurotoxicity.

LOAD-S2 microglia, by contrast, exhibit a subtle anti-inflammatory skew, marked by the induction of resolution and repair pathways alongside attenuation of pro-inflammatory coactivators. Upregulation of PRKCE supports Aβ clearance and dampens cytokine release [37], while increased VSIG4 expression promotes complement-mediated phagocytosis with reduced oxidative burst [38, 39]. Elevated TGFB1 further enforces a regulatory phenotype by suppressing microglial activation and facilitating tissue remodeling [40]. Meanwhile, decreased levels of EP300 and PTPN1 reduce transcriptional drive for inflammatory genes and lower Src-family kinase–mediated cytokine production [41]. The net effect in LOAD-S2 is a counter-regulatory program that tempers inflammatory damage and favors debris clearance and repair.

When we examined genes normally upregulated in specific microglial states, we found that TYMP, which is ordinarily elevated in MG2 and generates deoxyribose-mediated reactive oxygen species that amplify inflammation, remains unchanged in LOAD-S1, allowing a sustained inflammatory response, but is actively downregulated in LOAD-S2, consistent with suppression of inflammatory amplification. Conversely, MGAT5, a Golgi N-acetylglucosaminyltransferase that adds β1,6-branched N-glycans to immune receptors to enforce a galectin-mediated “lattice” that dampens receptor clustering and T-cell activation, serves as an anti-inflammatory brake [42]. In LOAD-S1, MGAT5 is downregulated, removing this glycan-dependent checkpoint and exacerbating inflammation, whereas LOAD-S2 preserves MGAT5 expression, supporting glycan-mediated regulation of immune responses.

In this study, we also found that distinct AD-associated genomic regions may underlie the methylomic heterogeneity of LOAD subtypes. Specifically, SNPs at the BIN1 locus colocalize with the methylation signature of LOAD-S1, consistent with its pro-inflammatory phenotype. Sudwarts et al. demonstrated that siRNA-mediated BIN1 knockdown in primary microglia blunted the induction of key inflammatory and disease-associated genes and reduced cytokine secretion, while conditional deletion of Bin1 in microglia in vivo impaired type I interferon and other pro-inflammatory responses [43]. However, demonstrating causality will require microglia-resolved strategies, including mapping microglia-specific mQTLs and splicing-QTLs to link genotype, CpG methylation, and BIN1 isoform usage within individual microglia.

In LOAD-S2, risk variants at the SPI1 locus colocalize with its DNAm signature. Mechanistically, PU.1, the transcription factor encoded by SPI1, controls key microglial programs. Higher PU.1 levels boost cell survival under cytotoxic stress and amplify pro-inflammatory signaling, thereby promoting A1-reactive astrocyte activation and exacerbating AD pathology. Conversely, lower PU.1 expression increases microglial susceptibility to cytotoxicity, dampens inflammatory cascades, and reduces A1 astrocyte signatures, consistent with the protective effect of certain SPI1 genotypes in AD [44].

The epigenomic-based LOAD subtypes identified in this study show notable correspondence with previously reported molecular subtypes of AD. The bulk RNA-seq characterization of the epigenomic subtypes, which highlighted LOAD-S1 as immune/inflammatory and LOAD-S2 as synaptic/neuronal, maps remarkably well onto the CSF proteomic subtypes of Tijms et al. [6]: their “innate immune activation” subtype (Tijms S2) mirrors our LOAD-S1, and their “hyperplasticity” subtype (Tijms S1) recapitulates our LOAD-S2. Additionally, two of the three categories identified among the transcriptomic-based subtypes of AD [7], those assigned to immune-related and synaptic pathways, share similar characteristics with the methylomic LOAD subtypes identified in this study. This cross-platform concordance not only strengthens the biological validity of these subtypes but also underscores that divergent microglial functions (pro-inflammatory versus synapse-supportive) are fundamental axes of heterogeneity in LOAD.

The presence of unassigned samples, which did not consistently align with LOAD-S1 or LOAD-S2 across cohorts, raises questions about their biological significance and whether they represent distinct, yet uncharacterized, subtypes, intermediate states, or result from other factors. Several factors could explain the unassigned samples. They may represent mixed or intermediate phenotypes, exhibiting overlapping features of both LOAD-S1 and LOAD-S2 without strongly matching either subtype. Another possibility is the presence of co-pathologies, such as other tauopathies, alpha-synucleinopathy, or TDP-43 proteinopathies, which could influence methylation patterns. While efforts were made to screen for these co-pathologies, data availability was not uniform across all samples, potentially introducing variability that obscured subtype-specific DNAm signals. Alternatively, these samples might reflect a cohort-specific subtype, driven by unique genetic or environmental factors that are less prevalent or absent in the replication cohorts. Further investigations integrating multi-omic approaches, larger datasets, and additional neuropathological assessments will be necessary to determine whether these unassigned cases represent a distinct LOAD subtype or reflect methodological limitations.

The lack of correlation between LOAD subtypes and clinicopathological markers raises questions about how these epigenetic subtypes relate to traditional AD classifications. However, prior research has demonstrated that neuroinflammation in AD is associated with disruptions in brain network connectivity, a key driver of disease progression that occurs independently of amyloid and NFT pathology or cortical atrophy [45]. Moreover, the lack of strong correlation with clinical or pathological markers in postmortem brain tissue has been consistently observed in transcriptomic-based AD subtypes [7]. This disconnect suggests that transcriptomic and epigenomic stratification may capture dimensions of disease biology that lie beyond the scope of current diagnostic criteria.

A key strength of this study is its large, cross-cohort design, analyzing postmortem brain samples from three independent biobanks (UKBBN, PITT-ADRC, and ROSMAP). This approach enhances the robustness of findings, ensuring validation across diverse populations while minimizing cohort-specific biases. Additionally, the study integrates genetic, transcriptomic, and cell-type-specific data, providing a comprehensive characterization of the epigenomic-based LOAD subtypes. This multi-omic approach strengthens the biological relevance of the findings and offers insight into subtype-specific molecular mechanisms underlying AD heterogeneity.

A major limitation of this study is that while it provides robust evidence of epigenetic heterogeneity in LOAD and its association with distinct biological pathways, it does not establish causality. The relationship between epigenetic heterogeneity and LOAD is likely bidirectional, with DNAm changes both contributing to and resulting from the disease process. Additionally, the lack of detailed environmental risk factor data limits the ability to determine whether observed epigenetic changes stem from environmental exposures, lifestyle factors, or disease-related processes, and therefore to assess the role of non-genetic factors in driving the methylomic heterogeneity observed across LOAD subtypes.

Conclusion

The findings of this study align with previous large-scale EWASs, which have shown that DNAm changes associated with AD predominantly occur in non-neuronal cells, particularly microglia [32, 46–48]. In this study, we expanded on these insights by identifying cell-type-specific DNAm signatures for the newly defined LOAD subtypes and establishing a clear link between these subtypes and distinct patterns of inflammatory microglial activity. We also showed that, although LOAD-S1 and LOAD-S2 samples are rigorously matched for age and overall AD pathology burden, they capture distinct microglial transcriptional states rather than successive disease stages. Even within brains carrying equivalent plaque and tangle loads, individual microglia can adopt divergent activation programs in response to local cues. These state-specific gene expression patterns likely reflect microenvironmental heterogeneity, such as regional differences in amyloid conformation, local cytokine gradients, or neuron–glia cross-talk, as well as cell-intrinsic epigenetic priming that persists despite similar pathological exposures. In other words, LOAD-S1 and LOAD-S2 define parallel “subpopulations” or activation phenotypes of microglia co-existing in the AD brain, not sequential snapshots of progression, which may explain why their opposing inflammatory and synaptic gene programs emerge even when age and pathology are held constant. This study further highlights the importance of subtype-specific analyses in uncovering heterogeneity within complex diseases like AD and provides a foundation for future research aimed at elucidating causal pathways and identifying potential therapeutic targets. While challenges remain regarding causality and environmental influences, these findings lay critical groundwork for refining precision medicine approaches in AD research and treatment.

Acknowledgments

E.P. was supported by a ZonMw Memorabel/Alzheimer Nederland Grant (733050516). V.L. received support through a PhD scholarship funded by the Mental Health and Neuroscience Research Institute (MHeNs), Maastricht University. The generation of DNAm data, bulk RNA sequencing of the PITT-ADRC samples, and the analysis of FANS-purified nuclei from the BDR samples were supported by a grant from the National Institute on Aging (NIA) of the National Institutes of Health (NIH) (R01AG067015) awarded to K.L., B.C., and E.P. The analysis of DNA methylation data and bulk RNA sequencing of the UKBBN samples was funded by a Medical Research Council (MRC) grant (MR/S011625/1) and an Alzheimer’s Society grant (AS-PG-16b-012) awarded to K.L.

Footnotes

Competing interests

The authors have no competing interests to declare.

Contributor Information

Valentin T. Laroche, Maastricht University.

Rachel Cavill, Maastricht University.

Morteza Kouhsar, University of Exeter, Royal Devon & Exeter Hospital.

Joshua Müller, Maastricht University.

Rick A. Reijnders, Maastricht University.

Joshua Harvey, University of Exeter, Royal Devon & Exeter Hospital.

Adam R. Smith, University of Exeter, Royal Devon & Exeter Hospital.

Jennifer Imm, University of Exeter, Royal Devon & Exeter Hospital.

Jarno Koetsier, Maastricht University.

Luke Weymouth, University of Exeter, Royal Devon & Exeter Hospital.

Lachlan MacBean, University of Exeter, Royal Devon & Exeter Hospital.

Giulia Pegoraro, University of Exeter, Royal Devon & Exeter Hospital.

Lars Eijssen, Maastricht University.

Byron Creese, Brunel University.

Gunter Kenis, Maastricht University.

Betty M. Tijms, Alzheimer Center Amsterdam, Vrije Universiteit Amsterdam, Amsterdam UMC.

Daniel van den Hove, Maastricht University.

Katie Lunnon, University of Exeter, Royal Devon & Exeter Hospital.

Ehsan Pishva, Maastricht University.

Data availability

The PITT-ADRC datasets used in this study are available on Synapse (https://www.synapse.org/) under Synapse ID: syn23538600. Access requires creating a Synapse user account and submitting a data access request. The UKBBN dataset is accessible via GEO under accession number GSE284764. ROSMAP datasets are also deposited on Synapse (Synapse IDs: syn7357283, syn23650893, syn3157325). Microglial single-cell RNA sequencing data and markers of microglial states were obtained from https://compbio.mit.edu/microglia_states/. The ROSMAP mQTL dataset is accessible at https://mostafavilab.stat.ubc.ca/xqtl/. All code used for DNA methylation and bulk transcriptomic analyses, clustering, replication, and cross-cohort validations is available at https://github.com/Dementia-Systems-Biology/LOAD_subtyping. For microglial single-cell DEG analyses, code from https://github.com/mathyslab7/ROSMAP_snRNAseq_PFC was used.

References

1. Scheltens P et al. (2021) Alzheimer’s disease. Lancet 397(10284):1577–1590
2. DeTure MA, Dickson DW (2019) The neuropathological diagnosis of Alzheimer’s disease. Mol Neurodegener 14(1):32
3. Mann UM, Mohr E, Gearing M, Chase TN (1992) Heterogeneity in Alzheimer’s disease: progression rate segregated by distinct neuropsychological and cerebral metabolic profiles. J Neurol Neurosurg Psychiatry 55(10):956–959
4. Mukherjee S et al. (2020) Genetic data and cognitively defined late-onset Alzheimer’s disease subgroups. Mol Psychiatry 25(11):2942–2951
5. Komarova NL, Thalhauser CJ (2011) High degree of heterogeneity in Alzheimer’s disease progression patterns. PLoS Comput Biol 7(11):e1002251
6. Tijms BM et al. (2024) Cerebrospinal fluid proteomics in patients with Alzheimer’s disease reveals five molecular subtypes with distinct genetic risk profiles. Nat Aging 4(1):33–47
7. Neff RA et al. (2021) Molecular subtyping of Alzheimer’s disease using RNA sequencing data reveals novel mechanisms and targets. Sci Adv 7(2)
8. Van Asbroeck S et al. (2024) Lifestyle and incident dementia: A COSMIC individual participant data meta-analysis. Alzheimers Dement 20(6):3972–3986
9. Koetsier J et al. (2024) Blood-based multivariate methylation risk score for cognitive impairment and dementia. Alzheimers Dement
10. DeMichele-Sweet MA et al. (2011) No association of psychosis in Alzheimer disease with neurodegenerative pathway genes. Neurobiol Aging 32(3):555e9–55511
11. Bennett DA et al. (2018) Religious Orders Study and Rush Memory and Aging Project. J Alzheimers Dis 64(s1):S161–S189
12. Pidsley R et al. (2013) A data-driven approach to preprocessing Illumina 450K methylation array data. BMC Genomics 14:293
13. McCartney DL et al. (2016) Identification of polymorphic and off-target probe binding sites on the Illumina Infinium MethylationEPIC BeadChip. Genom Data 9:22–24
14. Vellame DS et al. (2023) Uncertainty quantification of reference-based cellular deconvolution algorithms. Epigenetics 18(1):2137659
15. Dobin A et al. (2013) STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29(1):15–21
16. Robinson MD, McCarthy DJ, Smyth GK (2010) edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26(1):139–140
17. Danecek P et al. (2021) Twelve years of SAMtools and BCFtools. Gigascience 10(2)
18. Danecek P et al. (2011) The variant call format and VCFtools. Bioinformatics 27(15):2156–2158
19. Choi SW, O’Reilly PF (2019) PRSice-2: Polygenic Risk Score software for biobank-scale data. Gigascience 8(7)
20. Bellenguez C et al. (2022) New insights into the genetic etiology of Alzheimer’s disease and related dementias. Nat Genet 54(4):412–436
21. Vinh NX, Epps J, Bailey J (2010) Information Theoretic Measures for Clusterings Comparison: Variants, Properties, Normalization and Correction for Chance. J Mach Learn Res 11:2837–2854
22. Rohart F, Gautier B, Singh A, Le Cao KA (2017) mixOmics: An R package for ‘omics feature selection and multiple data integration. PLoS Comput Biol 13(11):e1005752
23. Langfelder P, Horvath S (2008) WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics 9:559
24. van Iterson M, van Zwet EW, Consortium B, Heijmans BT (2017) Controlling bias and inflation in epigenome- and transcriptome-wide association studies using the empirical null distribution. Genome Biol 18(1):19
25. Viechtbauer W (2010) Conducting Meta-Analyses in R with the metafor Package. J Stat Softw 36(3):1–48
26. Müller J et al. (2025) A Cell Type Enrichment Analysis Tool for Brain DNA Methylation Data (CEAM). bioRxiv 2025.07.08.663671
27. Wang G, Sarkar A, Carbonetto P, Stephens M (2020) A simple new approach to variable selection in regression, with application to genetic fine mapping. J R Stat Soc Ser B Stat Methodol 82(5):1273–1300
28. Ritchie ME et al. (2015) limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res 43(7):e47
29. Sun N et al. (2023) Human microglial state dynamics in Alzheimer’s disease progression. Cell 186(20):4386–4403e29
30. Yu GC, Wang LG, Han YY, He QY (2012) clusterProfiler: an R Package for Comparing Biological Themes Among Gene Clusters. Omics-a J Integr Biology 16(5):284–287
31. Miao J et al. (2023) Microglia in Alzheimer’s disease: pathogenesis, mechanisms, and therapeutic potentials. Front Aging Neurosci 15:1201982
32. Smith RG et al. (2021) A meta-analysis of epigenome-wide association studies in Alzheimer’s disease highlights novel differentially methylated loci across cortex. Nat Commun 12(1):3517
33. Hung Y et al. (2021) Macropinosomes host TLR9 signaling and regulation of inflammatory responses in microglia. bioRxiv 2021.02.11.430773
34. Gao C, Jiang JW, Tan YY, Chen SD (2023) Microglia in neurodegenerative diseases: mechanism and potential therapeutic targets. Signal Transduct Target Therapy 8(1)
35. Martine P et al. (2019) HSP70 is a negative regulator of NLRP3 inflammasome activation. Cell Death & Disease 10
36. Iwata H et al. (2016) PARP9 and PARP14 cross-regulate macrophage activation via STAT1 ADP-ribosylation. Nat Commun 7
37. Choi DS et al. (2006) PKCε increases endothelin converting enzyme activity and reduces amyloid plaque pathology in transgenic mice. Proc Natl Acad Sci USA 103(21):8215–8220
38. Lyu QP et al. (2020) Microglial V-set and immunoglobulin domain-containing 4 protects against ischemic stroke in mice by suppressing TLR4-regulated inflammatory response. Biochem Biophys Res Commun 522(3):560–567
39. Lu HF et al. (2025) VSIG4 Alleviates Intracranial Hemorrhage Injury by Regulating Oxidative Stress and Neuroinflammation in Macrophages via the NRF2/HO-1 Signaling Pathway. Front Bioscience-Landmark 30(4)
40. Kapoor M, Chinnathambi S (2023) TGF-β1 signalling in Alzheimer’s pathology and cytoskeletal reorganization: a specialized Tau perspective. J Neuroinflamm 20(1)
41. Vieira MNN, Silva NMLE, Ferreira ST, De Felice FG (2017) Protein Tyrosine Phosphatase 1B (PTP1B): A Potential Target for Alzheimer’s Therapy? Front Aging Neurosci 9
42. Shibui A et al. (2011) Alteration of immune responses by N-acetylglucosaminyltransferase V during allergic airway inflammation. Allergol Int 60(3):345–354
43. Sudwarts A et al. (2022) BIN1 is a key regulator of proinflammatory and neurodegeneration-related activation in microglia. Mol Neurodegeneration 17(1):33
44. Pimenova AA et al. (2021) Alzheimer’s-associated PU.1 expression levels regulate microglial inflammatory response. Neurobiol Dis 148:105217
45. Leng FD et al. (2023) Neuroinflammation is independently associated with brain network dysfunction in Alzheimer’s disease. Mol Psychiatry 28(3):1303–1311
46. Shireby G et al. (2022) DNA methylation signatures of Alzheimer’s disease neuropathology in the cortex are primarily driven by variation in non-neuronal cell-types. Nat Commun 13(1)
47. Zhang L et al. (2020) Epigenome-wide meta-analysis of DNA methylation differences in prefrontal cortex implicates the immune processes in Alzheimer’s disease. Nat Commun 11(1):6114
48. Müller J et al. (2025) A Cell Type Enrichment Analysis Tool for Brain DNA Methylation Data (CEAM). bioRxiv 2025.07.08.663671

