Unsupervised machine learning integrates genomic variants and EMR to unravel mechanisms of brain hemorrhage and epilepsy as early indicators of Alzheimer’s in down syndrome

Yichuan Liu; Hui-Qi Qu; Xiao Chang; Frank D Mentch; Haijun Qiu; Kenny Nguyen; Garnet Eister; Kayleigh Ostberg; Joseph Glessner; Hakon Hakonarson

doi:10.1186/s13195-025-01946-w

2025 Dec 19;18:20. doi: 10.1186/s13195-025-01946-w

PMCID: PMC12849591 PMID: 41419980

Abstract

Background

Comorbidity frequently manifests in pediatric diseases, especially in children with congenital defects due to chromosomal aberrations, such as Down Syndrome (DS). Despite their clinical significance for diagnosis and treatment, disease comorbidities are often overlooked in genetic studies because of their complexity. Recurrent cerebral microbleeds are observed in a subset of patients with DS and may represent an early indicator of cognitive decline and Alzheimer’s disease (AD), which affects at least 40% of all DS cases. Little attention has been directed toward the genomic factors that contribute to extensive brain hemorrhages and their subsequent connection with dementia, or toward other comorbidities, such as epilepsy, that may also serve as early indicators of dementia or AD.

Methods

In this study, 1,134 whole-genome sequencing (WGS) samples with DNA derived from blood were examined, including 709 patients diagnosed with DS and 425 healthy family members. Among the 709 cases, 20 DS patients had a documented history of brain hemorrhage and 83 exhibited severe epilepsy. Unsupervised machine learning algorithms were applied to the genomic variants identified in the WGS data to generate genotype clusters. In parallel, the cohort patients’ electronic medical records (EMR) were extracted, encompassing 443 self-reported medical symptoms, 2,206 abnormal lab tests, and 3,499 International Classification of Diseases (ICD) codes across 10 major pediatric disease categories. Association analysis was then conducted between the genotype clusters and each phenotype.

Results

For DS patients with brain hemorrhage, we identified exonic mutations in eight genes associated with cerebral hemorrhage (FDR = 0.04) and genomic variants of 11 genes for neovascularization (FDR = 0.003). Genomic variants associated with brain hemorrhage were also significantly enriched in pathways associated with AD and/or its early phase, such as brain inflammation, olfactory impairments, loss of melanin/neuromelanin, G protein-coupled receptor kinase, and casein kinase 1 gamma/epsilon activities. Of particular interest, the genotype clusters of brain hemorrhage and epilepsy overlapped significantly (p value < 1E-10), and the 217 overlapping genes showed enrichment in somatic diversification of immune receptors, a known genetic trait associated with cerebral hemorrhage, epilepsy, and early-stage AD.

Conclusion

This study applies unsupervised machine learning to whole-genome sequencing to identify genomic variants associated with pediatric comorbidities in Down syndrome, integrating these findings with longitudinal electronic health records from a large clinical cohort. These analyses highlight biologically plausible pathways and provide a hypothesis-generating framework for linking genotype and phenotype in this population. Although the associations for brain hemorrhage are based on a small number of cases and require replication in independent cohorts, these findings still set the stage for leveraging genetic insights to inform targeted interventions and treatments for pediatric DS patients.

Supplementary Information

The online version contains supplementary material available at 10.1186/s13195-025-01946-w.

Keywords: Alzheimer’s, Down syndrome, Disease comorbidity, Machine learning, Whole genome sequencing (WGS)

Statement of significance

Our study leverages unsupervised machine learning to analyze WGS variants, avoiding biased assumptions, and investigates the association between clustered genomic variants and the phenotypes extracted from EMR, including symptoms, lab tests, and medical diagnoses. This approach enables the investigation of disease comorbidities, particularly in cases where phenotypes are highly correlated. Our findings, illustrated through the analysis of brain hemorrhage and epilepsy, suggest that these potential early indicators of Alzheimer’s in DS patients share many genes and molecular pathways despite being distinct clinical diagnoses. These results provide novel insights into disease comorbidities in children with DS and offer a potential perspective for exploring therapeutic interventions that leverage these variants.

Introduction

Comorbidity in Down Syndrome (DS) significantly impacts the health and quality of life of affected individuals. Despite the critical importance of understanding these comorbid conditions, the intricate nature of their interconnected pathologies poses substantial obstacles to research methodologies. Particularly noteworthy is the high prevalence of Alzheimer’s disease (AD) within the DS population, affecting at least 40% of individuals [1]. Considering that the lifespan of individuals with DS often exceeds 65 years and that AD can begin as early as the late 30s to early 40s, early diagnosis and treatment have emerged as a critical focus within both academic and pharmacological spheres. However, potential early indicators of neurodegenerative processes, e.g., brain hemorrhage and severe epilepsy [2], remain underexplored in the context of DS.

While prior studies have applied supervised models or polygenic risk score (PRS) frameworks to connect genomics and clinical phenotypes in Alzheimer’s disease and related disorders, these approaches depend on established variant–phenotype associations and may miss unanticipated links. For example, Bracher-Smith et al. developed machine learning (ML) models on genome-wide data to recapitulate known AD loci and uncover new ones by combining gradient boosting machines (GBMs), neural networks, and multifactor dimensionality reduction, outperforming classical PRS in some contexts [3]. Likewise, Monk et al. used a neural-network method (netSNP) to prioritize rare SNPs associated with AD beyond standard GWAS signals [4]. These and other ML-based human genetics efforts remain largely supervised or prediction-driven in design. In contrast, our study adopts an unsupervised integration of genomic data and electronic medical records (EMR) at whole-genome scale: clustering recurrent DS-specific variants (≥ 3 occurrences in DS, absent in controls), merging correlated features, filtering by the Kaiser–Meyer–Olkin (KMO) measure, and applying promax-rotated factor analysis before testing associations across more than 4,000 EMR phenotypes. This strategy lets us detect pre-diagnostic, shared molecular signals across comorbidities like brain hemorrhage and epilepsy that might evade label-driven models. The approach aligns with recent work showing that conventional GWAS remain underpowered for rare outcomes such as intracerebral hemorrhage, where only two genome-wide significant loci (APOE and 1q22) have been identified to date despite high heritability [5, 6]. GWAS are optimized for common variants, require very large sample sizes, and face a substantial multiple-testing burden [7, 8], while rare and noncoding variants are underestimated [9]. In parallel, recent unsupervised deep-learning frameworks demonstrate that latent phenotypes can yield substantially more associated loci than conventional traits [10, 11]. These studies provide methodological support for our use of unsupervised factor analysis of WGS data as a complementary strategy to traditional GWAS for dissecting the genetic architecture of brain hemorrhage in DS.

In this study, we leveraged one of the most extensive datasets available, encompassing DS individuals within the Gabriella Miller Kids First program project (https://kidsfirstdrc.org/) established by investigators at the Children’s Hospital of Philadelphia (CHOP), along with detailed EMR abstracted by the Center for Applied Genomics (CAG) at CHOP. This resource enabled us to investigate genotype and clinical phenotype associations with a genome-wide, data-driven approach that does not rely on prior assumptions. Our primary objective was to dissect the genetic underpinnings and clinical expressions by identifying associations between genotypes and phenotypes. Specifically, we examined the biological mechanisms potentially leading to dementia in the DS population by studying the relationships between three distinct clinical phenotypes: AD, brain hemorrhage, and epilepsy. This endeavor, which aims to illuminate the biological networks driving these complex conditions, will contribute to our understanding of early dementia onset in DS. By integrating extensive genomic sequencing data with rich clinical records, our study underscores the potential of such comprehensive approaches to address the multifaceted challenges posed by comorbid diseases in genetic studies, paving the way for novel insights into the therapeutic avenues of comorbid genetic disorders.

Methods

Patient recruitment & EMR abstraction

Patients diagnosed with DS were recruited through CAG at CHOP. Diagnoses were based on International Classification of Diseases (ICD) codes ICD-9/ICD-10. All CAG patients were captured by the EMR at CHOP, established in 2003. CAG at CHOP maintains a de-identified abstraction of all clinical data from the CHOP EMR databases of research consented patients. This database contains longitudinal information about visits, diagnoses, medical history, prescriptions, procedures, and lab tests, with all information coded and de-identified. More details are provided in the Supplementary Methods file.

A total of 1,134 whole-genome sequencing (WGS) datasets were generated from blood DNA samples of participating individuals (Fig. 1A). Of those, 709 were DS patients and 425 were healthy family members of the probands. Among the 709 cases, 20 DS patients had a history of brain hemorrhage, and 83 DS patients had confirmed severe epilepsy based on EMR records from neurologists. Controls were first-degree relatives (parents and/or siblings) of the cases, and these relationships were confirmed by identity-by-descent (IBD) analysis, providing a family-based framework that mitigates confounding by ancestry and shared environment. To minimize the influence of any single family, we required retained variants to recur in at least three independent DS patients and to be absent in family controls; as expected, leave-one-family-out tests did not alter the retained variant set or clustering results. All WGS libraries were prepared and sequenced at a single center by the same technical team using a uniform protocol, reducing batch heterogeneity. Principal component analysis (PCA) of common variants confirmed no clustering by technical variables, and the inclusion of ancestry PCs as covariates did not materially change the results.

All patients were recruited during regular hospital visits, including emergency rooms, ambulatory settings, or surgical settings, through general pediatric clinics or CHOP’s pediatric specialty practices. The patients were in the age range of 0–21 years, receiving health care at CHOP. Parental consent was obtained for individuals under 18 years of age, and assent was also obtained for subjects aged 7–17 years. The informed consent allowed samples to be obtained and analyzed using genomic technologies in this study to address the proposed research questions.

Whole genome sequencing (WGS) & RNA sequencing data processing and variant detection

The data-processing workflow is shown in Fig. 1C. WGS was conducted at 30X coverage for 1,134 individuals as part of the Gabriella Miller Kids First project. The variant call format (VCF) files for WGS were generated using the Illumina DRAGEN (Dynamic Read Analysis for GENomics) Bio-IT Platform (Illumina, San Diego, CA), with reads aligned to the GRCh38/hg38 human genome assembly. Variant annotations were produced using the ANNOVAR software developed by our group with default parameters [12]. Variants were then classified into coding regions (encompassing nonsynonymous and synonymous variants), introns, 5’ untranslated regions (UTR), and non-coding RNA (ncRNA) regions. The distances of intronic variants to the closest exon sites were calculated based on the GRCh38/hg38 template, and ncRNA targets were determined using LncTarD version 2.0 [13].

Peripheral blood mononuclear cells (PBMCs) were isolated from whole blood by Ficoll density gradient centrifugation. Total RNA was extracted from PBMCs using the Maxwell RSC simplyRNA Cells Kit (Promega, cat#AS1390) following the manufacturer’s protocol. Final RNA-seq libraries were quantified with the Quant-iT dsDNA High Sensitivity assay, and insert size was evaluated using the HS NGS Fragment Kit (Agilent, cat#DNF474). Technical controls (K-562; Thermo Fisher Scientific, cat# AM7832) were included to monitor batch consistency. Final libraries were normalized to 10 nM and sequenced on the Illumina NovaSeq 6000 platform. Base calls were demultiplexed to produce unaligned BAM files using Picard, then converted to FASTQ format using Samtools [14]. Sequences were aligned to the GRCh38 reference transcriptome (GENCODE v39) using STAR [15], transcript-level quantification was performed using Salmon [16], and differential expression tests were performed using DESeq2 [17].

Clustering genomic variants using factor analysis

A variant is considered recurrent if it resides at the same genomic locus with the same alternative allele and occurs in more than one individual. The goal is to select genomic variants from WGS to form feature vectors that represent the genomic signatures of DS patients. In practice, we eliminate less informative variants and merge the remaining variants according to their biological and technical properties to avoid problems such as overfitting. To mitigate technical artifacts and person-specific calls while preserving true signals, candidate variants were required to recur in ≥ 3 independent individuals. We chose this threshold for three reasons. First, short-read WGS variant calls, especially in low-complexity regions and for small indels, are susceptible to artifacts; requiring that a site be observed in three or more individuals is a practical way to down-weight variants that more likely reflect caller or alignment noise than true biology. Second, it avoids private variants and excludes variants with occurrence = 2: because second- or third-degree relatedness is not consistently captured in the EMR, unrecorded kinship could otherwise introduce family-specific variants. Setting the threshold at ≥ 3 therefore helps prioritize signals seen in independent individuals. Third, to balance sensitivity and specificity, we did not raise the bar to ≥ 4 or ≥ 5, which would discard plausible, phenotype-consistent signals that recur in a modest number of unrelated individuals. In practice, the threshold is stringent enough to curb likely artifacts and family duplicates, yet permissive enough to retain true positives.

The variants’ loading matrix was constructed separately for each of chromosomes 1 to 22, with rows representing patients, columns representing genes containing at least one recurrent variant, and entries denoting the number of occurrences of each gene in the corresponding patient. Each column was treated as a feature vector; feature vectors with high correlation (Pearson correlation coefficient ≥ 0.5) were merged, and their entry values were summed. In the factor analysis step, Kaiser–Meyer–Olkin (KMO) tests were performed to check matrix adequacy, and any column with an item-level KMO value below 0.4 was removed before factor analysis because it showed insufficient shared variance with the rest of the feature set, indicating poor suitability for a common-factor model. Excluding these variables reduces noise, improves factor stability, and yields more interpretable latent genomic modules for downstream association with brain hemorrhage in DS. The factor analysis package from Python was deployed with the rotation function ‘promax’, and factors with an eigenvalue greater than 1 were kept for downstream analysis using WebGestalt (WEB-based Gene SeT AnaLysis Toolkit) [18] and the DAVID Bioinformatics platform [19]. Over-representation analyses were conducted using WebGestalt with background = “genome” and FDR (Benjamini–Hochberg) correction, and using DAVID with background = Homo sapiens (human genome) and Bonferroni correction.
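The pre-processing steps that precede factor analysis (recurrence filtering, gene-level matrix construction, and merging of correlated columns) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the study's pipeline: the function names, the toy sample IDs, the variant keys, and the greedy "compare against the first column of each group" merging rule are all hypothetical choices, since the text does not specify how groups of correlated columns were formed.

```python
from collections import defaultdict
from math import sqrt


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either column has no variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy) if sxx and syy else 0.0


def recurrent_ds_specific(calls, ds_ids, control_ids, min_carriers=3):
    """Keep variants carried by >= min_carriers DS patients and by no control."""
    kept = {}
    for variant, carriers in calls.items():
        ds_carriers = carriers & ds_ids
        if len(ds_carriers) >= min_carriers and not carriers & control_ids:
            kept[variant] = ds_carriers
    return kept


def gene_matrix(kept, variant_to_gene, ds_ids):
    """Return the patient order and per-gene columns counting retained variants."""
    patients = sorted(ds_ids)
    index = {pid: i for i, pid in enumerate(patients)}
    columns = defaultdict(lambda: [0] * len(patients))
    for variant, carriers in kept.items():
        for pid in carriers:
            columns[variant_to_gene[variant]][index[pid]] += 1
    return patients, dict(columns)


def merge_correlated(columns, threshold=0.5):
    """Greedy merge (an assumption): sum a column into the first group whose
    seed column correlates with it at r >= threshold, else start a new group."""
    groups = []  # each entry: [gene names, seed column, summed column]
    for gene, col in columns.items():
        for group in groups:
            if pearson(group[1], col) >= threshold:
                group[0].append(gene)
                group[2] = [a + b for a, b in zip(group[2], col)]
                break
        else:
            groups.append([[gene], col, list(col)])
    return {"+".join(names): summed for names, _, summed in groups}


# Hypothetical toy cohort: six DS patients and two family controls.
ds_ids = {"d1", "d2", "d3", "d4", "d5", "d6"}
control_ids = {"c1", "c2"}
calls = {
    ("1", 100, "A"): {"d1", "d2", "d3"},
    ("1", 200, "T"): {"d1", "d2", "d3", "d4"},
    ("1", 300, "C"): {"d1", "d2"},              # only two carriers: dropped
    ("1", 400, "G"): {"d4", "d5", "d6", "c1"},  # seen in a control: dropped
    ("1", 500, "T"): {"d4", "d5", "d6"},
}
variant_to_gene = {
    ("1", 100, "A"): "G1", ("1", 200, "T"): "G2", ("1", 300, "C"): "G1",
    ("1", 400, "G"): "G3", ("1", 500, "T"): "G3",
}

kept = recurrent_ds_specific(calls, ds_ids, control_ids)
patients, columns = gene_matrix(kept, variant_to_gene, ds_ids)
merged = merge_correlated(columns)
```

In this toy example the columns for G1 and G2 correlate at about 0.71 and are summed into one feature, while G3 stays separate. The KMO filter and promax-rotated factor extraction would then run on the merged matrix; they are omitted here because they depend on the specific factor analysis package used.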

Association analysis of genomic clusters and clinical phenotype from EMR

Disorder-relevant phenotypes were extracted from the EMR with the corresponding patient IDs, encompassing 443 self-reported medical symptoms, 2,206 abnormal lab tests, and 3,499 clinical diagnoses across 10 major pediatric disease categories (congenital defects, neoplasm, mental disorders, respiratory system diseases, digestive system diseases, muscular system diseases, endocrine/metabolic disorders, blood diseases, nervous system diseases, and circulatory system diseases, Fig. 1B). The patient IDs of each gene cluster from factor analysis were also identified, and association tests were performed between the feature vectors (the factors) generated from the genomic variant matrix and each phenotype in the EMR, including all self-reported medical symptoms, lab test reports, and clinical diagnoses. A gene cluster was considered associated with a phenotype if the test p value was ≤ 0.05.
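As an illustration of one factor–phenotype test, the sketch below computes a hypergeometric upper-tail p-value (equivalent to a one-sided Fisher exact test) for the overlap between the patients carrying a factor and the patients with a phenotype. The choice of test is an assumption for this example, as are the function names and the toy patient IDs; only the p ≤ 0.05 threshold comes from the text.

```python
from math import comb


def hypergeom_upper_tail(n_total, n_pheno, n_factor, n_overlap):
    """P(X >= n_overlap) when n_factor patients are drawn without replacement
    from n_total patients, n_pheno of whom have the phenotype."""
    upper = min(n_pheno, n_factor)
    favorable = sum(
        comb(n_pheno, k) * comb(n_total - n_pheno, n_factor - k)
        for k in range(n_overlap, upper + 1)
    )
    return favorable / comb(n_total, n_factor)


def factor_association(factor_ids, pheno_ids, cohort_ids, alpha=0.05):
    """Return (overlap, p-value, associated?) for one factor and one phenotype."""
    factor_ids = factor_ids & cohort_ids
    pheno_ids = pheno_ids & cohort_ids
    overlap = len(factor_ids & pheno_ids)
    p = hypergeom_upper_tail(len(cohort_ids), len(pheno_ids), len(factor_ids), overlap)
    return overlap, p, p <= alpha


# Hypothetical toy cohort of ten patients.
cohort = {f"p{i}" for i in range(10)}
phenotype = {"p0", "p1", "p2", "p3"}
factor = {"p0", "p1", "p9"}
overlap, p_value, associated = factor_association(factor, phenotype, cohort)
```

Because every factor is tested against thousands of phenotypes, an unadjusted threshold of this kind is best read alongside the FDR-adjusted values described in the next subsection.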

Checking for confounding impacts & effect sizes

For each primary association we report odds ratios (OR) with 95% confidence intervals (CI) and p-values together with FDR-adjusted q-values; where informative, we also provide risk differences (RD) with 95% CIs. Enrichment/overlap analyses include fold enrichment, the corresponding 2 × 2 OR with 95% CI, and the exact Fisher p-value. To contextualize null or borderline results, we computed the minimum OR detectable at 80% and 90% power (two-sided α = 0.05) for plausible factor+/factor− allocations (e.g., 50/50, 40/60, 30/70) using the observed cohort size. These thresholds indicate which effect magnitudes our design can reliably detect, particularly for rare hemorrhage events and moderate epilepsy prevalence. To evaluate potential confounding by demographics, we fit logistic regression models within the DS cohort using age, sex (male vs. female), and race (indicator variables; White as reference) as covariates. We report adjusted ORs with 95% CIs and p-values; for interpretability we summarize age effects per 10 years. Complete-case analyses were used for these models.
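The two reporting quantities named above, an odds ratio with a 95% CI and Benjamini–Hochberg q-values, can be sketched as follows. The Wald (log-scale) interval and the 0.5 continuity correction for zero cells are assumptions made for this example, since the text does not state which interval method was used, and the 2 × 2 counts are hypothetical rather than study results.

```python
from math import exp, log, sqrt


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with a Wald 95% CI for a 2 x 2 table.

    a: factor+ / phenotype+   b: factor+ / phenotype-
    c: factor- / phenotype+   d: factor- / phenotype-
    """
    if 0 in (a, b, c, d):
        # Haldane-Anscombe correction (an assumption) to avoid division by zero.
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    odds_ratio = (a * d) / (b * c)
    se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return odds_ratio, exp(log(odds_ratio) - z * se), exp(log(odds_ratio) + z * se)


def benjamini_hochberg(pvals):
    """Return BH-adjusted q-values in the same order as the input p-values."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    qvals = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        qvals[i] = running_min
    return qvals


# Hypothetical counts: 709 patients, 20 with the phenotype, 50 carrying a factor.
or_hat, ci_low, ci_high = odds_ratio_ci(5, 45, 15, 644)
q_values = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
```

With few phenotype-positive patients, as for hemorrhage, Wald intervals become wide and an exact or profile-likelihood interval may be preferable.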

Permutation-based empirical null for factor–phenotype enrichment

For each factor and phenotype, we calculated the observed overlap and hypergeometric p-value. To assess robustness to distributional artifacts, we generated an empirical null by randomly permuting phenotype labels (brain bleeding, epilepsy) while holding the cohort size, the number of phenotype positives, and the factor sizes fixed, repeating this 1,000 times. For factor f, the empirical p-value was computed as
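A minimal sketch of this permutation scheme follows. It assumes the common add-one estimator, p_emp = (1 + number of permutations whose overlap is at least the observed overlap) / (1 + B) with B = 1,000, which may differ from the exact expression used in the study. It also uses the raw overlap count as the test statistic; with cohort size, phenotype positives, and factor size all fixed, this ranks permutations in the same order as the hypergeometric p-value. Function names and patient IDs are hypothetical.

```python
import random


def empirical_p(factor_ids, pheno_ids, cohort_ids, n_perm=1000, seed=0):
    """Empirical p-value for factor-phenotype overlap under label permutation."""
    rng = random.Random(seed)
    cohort = sorted(cohort_ids)          # fixed order so the seed is reproducible
    observed = len(factor_ids & pheno_ids)
    n_positive = len(pheno_ids)          # number of phenotype positives is preserved
    hits = 0
    for _ in range(n_perm):
        # Reassign the phenotype to a random set of patients of the same size.
        permuted = set(rng.sample(cohort, n_positive))
        if len(factor_ids & permuted) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)     # add-one estimator (assumed)


# Hypothetical cohort of 200 patients.
cohort_ids = [f"p{i:03d}" for i in range(200)]
pheno_ids = set(cohort_ids[:10])
enriched_factor = set(cohort_ids[:20])       # contains every phenotype-positive patient
unrelated_factor = set(cohort_ids[100:120])  # contains none of them

p_enriched = empirical_p(enriched_factor, pheno_ids, set(cohort_ids))
p_unrelated = empirical_p(unrelated_factor, pheno_ids, set(cohort_ids))
```

With 1,000 permutations the smallest attainable value under this estimator is 1/1001, which bounds how small the q-values can become after Benjamini–Hochberg correction across many factors.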

Results

Gene clusters associated with brain hemorrhage and severe epilepsy

The number of genes selected by the unsupervised learning process is not uniformly distributed across chromosomes, and chromosome size is not necessarily proportional to the number of genes associated with the targeted phenotypes. More specifically, exonic variants associated with brain hemorrhage tend to localize to chromosomes 2, 3, 9, 16, and 17, as illustrated in Fig. 2A. Variants located in non-coding regions show patterns distinct from those of variants within coding regions (Fig. 2B and D). Notably, fewer genes are associated with epilepsy than with brain hemorrhage. The factor-number patterns for coding-region variants are consistent between brain hemorrhage and epilepsy (Fig. 2A), in contrast to the completely different patterns observed for non-coding-region variants (Fig. 2B and D). These findings suggest similar gene clusters but alternative functional regulatory mechanisms between these clinical phenotypes.

Fig. 2 — Number of corresponding genes and factor numbers from unsupervised machine learning selections A based on variants in exons; B based on variants in introns; C based on variants in ncRNA exonic regions; D based on variants in 5’ UTR regions

Within the DS analytic set, covariate-adjusted logistic regression showed that sex was associated with hemorrhage (male vs. female OR 0.26, 95% CI 0.09–0.74, p = 0.011), whereas age (per 10 years OR ≈ 0.56, 95% CI 0.28–1.11, p = 0.097) and race were not significant. For epilepsy, none of age, sex, or race reached statistical significance (age per 10 years OR ≈ 0.87, 95% CI 0.62–1.21, p = 0.40). Together with the reported effect sizes, CIs, and detectable-effect thresholds, these findings indicate that the principal associations are not driven by demographic confounding, and they delineate the range of effects that the study is powered to detect. In addition, we used permutation testing to evaluate whether each factor was associated with brain hemorrhage beyond what would be expected by chance. Because we tested many factors, we applied the Benjamini–Hochberg false discovery rate (FDR) procedure to control for false positives across all tests. In this dataset, none of the factors remained statistically significant after this correction (no factor met q < 0.05 under the empirical null; ranked empirical p-values and q-values are reported in Supplementary Table 1).

Functional enrichment of gene clusters for brain hemorrhage and severe epilepsy associated with Alzheimer’s traits

Significant statistical overlap was found among the selected genes associated with brain hemorrhage and with severe epilepsy, as illustrated in Fig. 3A. These overlaps persist across all types of variants, with p-values of < 1E-100, 9.2E-23, 1.3E-9, and 1.2E-28 for combined, exonic, intronic, and 5’ UTR variants, respectively. These overlaps suggest a functional relationship between the pathways involved. As shown in Fig. 3B and C, genes implicated in these distinct clinical phenotypes are enriched in gene sets associated with Alzheimer’s traits and may serve as early-stage signatures for AD.

Fig. 3 — Overlap & functional enrichments for selected genes in brain hemorrhage and severe epilepsy. A overlaps of corresponding genes selected between the two phenotypes; B functional enrichment analysis bar chart for brain hemorrhage on a -log 10 (FDR) scale; C functional enrichment analysis bar chart for epilepsy on a -log 10 (FDR) scale

Discussion

Genome-wide association studies often involve numerous assumptions [7]. On the genotype front, these studies typically rely on selecting variants based on existing knowledge and experience, potentially overlooking variants in non-coding regions such as introns, non-coding RNAs, and UTRs. These assumptions can be valuable because they incorporate insights from previous research and help to avoid overfitting. At the same time, reliance on previous knowledge of genomic variants can be misleading, resulting in biased outputs with reduced novelty. On the phenotype front, individual studies often focus on a specific disease trait, ignoring the linkage between multiple phenotypes that may be needed to fully understand the disease. This implicitly assumes that diseases are independent, which is not always valid, especially for pediatric disorders among children with genetic congenital defects such as DS. In these cases, it is important to consider broader disease connections and interactions.

To address the above deficiencies, we leveraged one of the largest WGS DS repositories for genetic variant analysis. Furthermore, we extracted detailed longitudinal information from the EMR of all patients. By applying unsupervised machine learning algorithms and integrating multiple pre-processing steps, we clustered the genomic variants derived from the WGS data. Then, a pure data-driven statistical association analysis between the genotype clusters and all possible phenotypes in the EMR was conducted to provide unbiased association output, avoiding preconceived assumptions and thereby facilitating a comprehensive exploration of genotype-phenotype relationships in the DS individuals. In contrast to single-phenotype GWAS and mostly supervised models, we took a discovery route. Our aim was to find structure that links genome-wide variation to the day-to-day clinical course of DS, not just to predict labels. We built an unsupervised, end-to-end genome–EMR framework: starting from recurrent DS-specific variants, we derived genotype factors and tested them against thousands of routine EMR phenotypes with appropriate multiple-testing control. Applied to brain hemorrhage and severe epilepsy, which are two early, clinically meaningful signals in DS, the framework recapitulated and extended AD-related mechanisms (e.g., neovascularization, inflammation, olfactory and kinase pathways) using real-world data, without requiring imaging or CSF.

The validity of this concept has been substantiated through the linkage of clinical disorders among individuals with DS, including brain hemorrhage, severe epilepsy, and AD. Between 40% and 80% of individuals with DS develop AD-like dementia by the fifth to sixth decade of life, a much younger age than is typically seen in sporadic AD. This has been attributed to the extra copy of the APP gene, which overproduces amyloid in the brain, leading to cerebral amyloid angiopathy (CAA) [1]. A recent study underscores that AD should be recognized as a critical medical priority in people with DS [20]. Moreover, increasing life expectancy and closing the gap with the general population for DS patients will require effective prevention and management of AD [21]. Intriguingly, brain hemorrhage in DS patients is caused by the so-called ‘gene dosage effect’ [22]. Notably, even micro-bleeding was significantly associated with CAA and AD in DS patients [23]. These findings suggest that brain hemorrhage may serve as a potential indicator for the early detection of AD, particularly given that individuals with DS can develop plaques as early as 12 years of age. Such insights shed light on the underlying mechanisms of CAA formation and underscore the potential of brain hemorrhage as a window into the pathophysiology of AD in DS individuals. Importantly, these links are based on observational associations and cannot establish that the implicated variants or pathways cause brain hemorrhage or AD-related neurodegeneration in DS. The observed overlap may partly reflect shared downstream consequences of trisomy 21 and residual confounding by vascular risk factors, treatment exposures, survival bias, and differences in clinical surveillance intensity.

Based on the selection model and ICD codes identified in the EMR, 455 genes, including non-coding RNAs, were identified with exonic variants. Enrichment of a gene set related to cerebral (including subarachnoid) hemorrhage (FDR = 0.04, Fig. 3B) was identified, comprising eight genes (ADAMTS15, AVP, DIP2C, PIGQ, SERPINA3, TNC, ZNF618, ZNF79). Additional gene sets significantly associated with brain hemorrhage include fibromuscular dysplasia (FDR = 0.031) and pathologic neovascularization (FDR = 0.003). The prevalence of intracranial aneurysm was 12.9% in fibromuscular dysplasia [24], and neovascularization following brain hemorrhage is an essential compensatory response that aims at brain repair, modulating the clinical outcome of stroke patients [25].

In addition to pathways related to brain hemorrhage, the genes carrying the selected variants are also significantly enriched in gene sets related to different AD mechanisms (Table 1). For example, genes associated with brain hemorrhage are enriched in the gene set related to olfactory impairments (FDR = 0.0096), and such symptoms often appear in the early phase of AD before cognitive impairment becomes apparent [26]. Another enriched gene set is inflammation (FDR = 0.031), which is a central mechanism in Alzheimer’s disease [27]. Brain hemorrhage genes were also targeted by several key AD kinases, including G protein-coupled receptor kinase 3 (FDR = 0.0059), which is associated with AD pathology [28], and casein kinase 1 gamma/epsilon (FDR = 0.0065), which are therapeutic targets for AD in a mouse model [29]; inhibition of casein kinase 1 gamma improves cognitive-affective behavior in AD [30]. Genomic variants in non-coding regions identified in DS patients with brain hemorrhage also show functions correlated with AD. For instance, LINC01551, which carries 59 exonic mutations (exon 4/4 or 2/2), targets the gene ADAM10, a biomarker and therapeutic target for AD [31]. For variants in 5’ UTR regions, the corresponding genes (ADCY4, CREBBP, DCT, EDNRB, MAPK3, TCF7L1, TCF7L2) were enriched in melanogenesis (FDR = 0.019), and the loss of melanin and/or neuromelanin, like AD, is increasingly considered to have infectious etiologies [32].

There is a bi-directional association between AD and epilepsy: epilepsy promotes amyloid deposits, leading to neurodegenerative processes, while AD is an independent risk factor for developing epilepsy [33]. In the DS population, a strong association between seizures and cognitive decline was observed in a previous study [34]. In this study, we found significant overlap between the genes selected for brain hemorrhage and those selected for severe epilepsy (Fig. 2), regardless of the variant location in the genome, indicating strong connections within the underlying biological networks of these distinct clinical diagnoses and potentially pointing to a highly correlated AD phenotype in the later life of DS patients. Five epilepsy-related genes (CBX4, IP6K1, MINAR1, SYN2, TENT4A) were targeted by microRNAs MIR-503 and MIR-410 (FDR = 0.1). MIR-503 was highly regulated (fold change > 1.5) in brain tissues from a previous RNA-seq study of brain hemorrhage [35]. The 217 genes overlapping between brain hemorrhage and epilepsy suggest an enrichment in the somatic diversification of immune receptors (FDR = 0.09), with previously identified mutations associated with brain hemorrhage [36], epilepsy [37], and AD [38], further supporting the close and integrative relationships of these conditions.

To provide replication and assess robustness, we performed whole-transcriptome RNA-seq from human PBMCs in a subset of the discovery cohort: 6 DS individuals with a documented history of brain hemorrhage and at least one epilepsy event and 6 DS controls without hemorrhage or epilepsy, all drawn from the WGS cohort analyzed here. Differential expression testing of the a priori candidate genes implicated by the genomic–EMR analyses in Table 1 confirmed 14 genes at FDR q < 0.10, with effect directions concordant with discovery signals (Table 2). These results support that the highlighted genes and pathways reflect disease-relevant biology rather than artifacts of variant calling or clustering, and they provide transcriptomic replication within the same clinical framework.

Table 1.

Functional enrichment pathways/terms identified in brain hemorrhage and epilepsy and their relevance to AD

Category	Associated phenotype	Mutations	Functional pathways/terms	Relevance	Gene List
Relevant to Brain hemorrhage	Brain hemorrhage	Exonic	Subarachnoid Hemorrhage	bleeding in the space between your brain and the membrane that covers it. Most often due to aneurysm	ADAMTS15, AVP, DIP2C, PIGQ, SERPINA3, TNC, ZNF618, ZNF79
Relevant to Brain hemorrhage	Brain hemorrhage	nonsynonymous	Fibromuscular Dysplasia	the prevalence of intracranial aneurysm was 12.9% in FD	DYNC2H1, OBSCN, RNF213, SERPINA1
Relevant to Brain hemorrhage	Brain hemorrhage	synonymous	Kinase target casein kinase 1 gamma/ epsilon	Inhibition of casein kinase 1 improves cognitive-affective behavior in AD and serve as therapeutic target	GLI1, GLI2, GLI3, PRKD2
Relevant to Brain hemorrhage	Brain hemorrhage	5’ UTR	Melanogenesis	body pigmentation as a risk factor for the formation of intracranial aneurysms	ADCY4, CREBBP, DCT, EDNRB, MAPK3, TCF7L1, TCF7L2
Relevant to Brain hemorrhage	Brain hemorrhage	Combined	Neovascularization	an essential compensatory response that mediates brain repair and modulates the clinical outcome of stroke patients	AGXT, CALCR, CASR, DGKH, FAM20A, GRHPR, HAVCR1, INMT, KL, ORAI1, SPP1
Relevant to Brain hemorrhage	Epilepsy	synonymous	MIR-503 targets	MIR-503 regulated (fold change > 1.5) in brain tissues for brain hemorrhage	CBX4, IP6K1, MINAR1, SYN2, TENT4A
Relevant to Epilepsy	Epilepsy	Combined	Protein Deficiency	Amino acid balance plays important roles in dietary treatments for epilepsy	ABCA1, AGXT, AMPD1, ATR, CUBN, DNM1L, DOCK2, DOCK8, DYSF, F11, F7, FBXW7, FCN2, FKRP, JAK3, KL, LAMA2, LIPA, MLH1, MLYCD, ORAI1, PML, PNP, PTEN, SERPINA1, SLC35C1, SPTAN1, SUCNR1, TP53BP1, TRAF3, UVSSA
Relevant to Epilepsy	Epilepsy	Combined	Stress	chronic stress promotes neuroinflammation and leads to a depressive state and promotes seizure occurrence	ATP5F1A, ATR, CASP2, CHFR, CLPP, CYP2E1, DIAPH1, DNM1L, EIF2AK3, EIF4G1, EP300, ERO1A, HIF1A, HMOX2, HSPD1, HSPH1, KL, KRT8, MTOR, NPM1, PIK3C3, PML, RELA, RPL21, RXFP3, SETD7, SLC6A4, SLC7A11, STK4, SYNPO2, TP53BP1, TUT1, UBQLN1, USP10
Relevant to Epilepsy	Epilepsy	Combined	Leukoencephalopathy with vanishing white matter (VWM)	neuroimaging studies confirm involvement of white matter in patients with epilepsy	EIF2B3, EIF2B5
Relevant to Epilepsy	Epilepsy	Exonic	ESCRT	the cup-shaped PAS membrane encloses a portion of the cytoplasm or damaged organelles through the endosomal sorting complex required for transport	CHMP6, CHMP7, HGS, VPS4B
Relevant to Epilepsy	Epilepsy	synonymous	MIR-410 targets	MIR-410 expanded targets for seizure suppression in temporal lobe epilepsy	CBX4, IP6K1, MINAR1, SYN2, TENT4A
Relevant to Epilepsy	Epilepsy	Intronic	Kinase target mitogen-activated protein kinase 3	mTOR and MAPK control to epilepsy from localized translation	REM, DNM1L, ETV6, IL2RB, KRT8, RPS6KA2, TTK, WWC1
Indicator of AD	Brain hemorrhage	nonsynonymous	Olfactory impairments	often appear in the early phase of AD, before cognitive impairment is apparent in patients	OR10J1, OR10J3, OR2A2, OR2AG2, OR2D3, OR2T33, OR2T34, OR4C11, OR4C3, OR4D2, OR4S2, OR51B4, OR52E4, OR52R1, OR5AR1, OR5AS1, OR5L1, OR5L2, OR5M11, OR5M8, OR6P1, OR8B3, OR8K1
Indicator of AD	Brain hemorrhage	nonsynonymous	Inflammation	Inflammation as a central mechanism in AD	CEACAM8, CHIA, F3, HPSE, IKBKB, IL1R2, IL20, IL5RA, ITGB2, MASP2, NGF, NLRC3, NLRP10, NLRP13, NOS2, PRG4, SERPINA1, SERPINA12, SERPINA4, SPP1, TICAM1, TLR5, TREM1, TREM2
Indicator of AD	Brain hemorrhage	intronic	Autism Spectrum Disorder	middle-aged adults with autism are 2.6 times more likely to be diagnosed with AD and other dementias than those without ASD	ARNT2, DRD4, GLO1, MET, RELN, RIT2, SCN2A, SLC7A5, SLC9A9, ST7, TPH2
Indicator of AD	Brain hemorrhage	5’ UTR	Melanogenesis	Loss of melanin and/or neuromelanin are increasingly thought to have infectious etiologies like AD	ADCY4, CREBBP, DCT, EDNRB, MAPK3, TCF7L1, TCF7L2
Indicator of AD	Overlaps of Brain hemorrhage & Epilepsy	Combined	Somatic diversification of immune receptors	Triggering innate immune receptors is the new therapies in AD	CCR6, EXOSC3, HSPD1, MLH1, POLM, TP53BP1
Indicator of AD	Epilepsy	Intronic	Neurodegenerative Diseases	AD is the most common type of neurodegenerative diseases cause of a decline in cognitive ability	CTNNA3, DMPK, DNM1L, EIF2B5, LIPA, LZTR1, RIT2, SACS, UVSSA, WWC1
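
As a small illustration of how gene lists from the two phenotypes can be compared, the following Python sketch intersects two rows of Table 1: the Neovascularization list (brain hemorrhage) and the Protein Deficiency list (epilepsy). The gene symbols are copied from the table; the helper name `shared_genes` is hypothetical, and this is not the authors' pipeline. The 217-gene overlap reported in the text was computed on the full gene selections, which are not reproduced here.

```python
# Gene lists copied from two "Combined" rows of Table 1.
neovascularization = {  # associated phenotype: brain hemorrhage
    "AGXT", "CALCR", "CASR", "DGKH", "FAM20A", "GRHPR",
    "HAVCR1", "INMT", "KL", "ORAI1", "SPP1",
}
protein_deficiency = {  # associated phenotype: epilepsy
    "ABCA1", "AGXT", "AMPD1", "ATR", "CUBN", "DNM1L", "DOCK2", "DOCK8",
    "DYSF", "F11", "F7", "FBXW7", "FCN2", "FKRP", "JAK3", "KL", "LAMA2",
    "LIPA", "MLH1", "MLYCD", "ORAI1", "PML", "PNP", "PTEN", "SERPINA1",
    "SLC35C1", "SPTAN1", "SUCNR1", "TP53BP1", "TRAF3", "UVSSA",
}


def shared_genes(first, second):
    """Return the alphabetically sorted genes present in both sets."""
    return sorted(first & second)


shared = shared_genes(neovascularization, protein_deficiency)
print(shared)
```

For these two rows the intersection contains AGXT, KL, and ORAI1, which shows at a small scale how a single gene can contribute to pathways flagged for both phenotypes.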


Table 2.

Differential expression testing of candidate genes

Gene ID	Relevant Phenotypes	log2FoldChange	adjusted p value (FDR)
IL20	Indicator of AD	−2.70	1.449E-05
TRAF3	epilepsy	−1.49	3.368E-03
SLC9A9	Indicator of AD	−3.69	4.618E-03
DNM1L	Indicator of AD	−1.49	8.897E-03
GLI2	brainbleeding	1.57	2.360E-02
ITGB2	Indicator of AD	−3.14	2.527E-02
MTOR	epilepsy	0.68	3.905E-02
SPP1	Indicator of AD	−1.21	4.514E-02
DOCK2	epilepsy	−2.86	5.215E-02
MET	Indicator of AD	0.95	7.516E-02
IL2RB	epilepsy	−1.14	8.057E-02
JAK3	epilepsy	−0.89	8.975E-02
TREM2	Indicator of AD	−1.15	9.685E-02
HMOX2	epilepsy	0.93	9.856E-02
EXOSC3	Indicator of AD	0.72	1.234E-01
RELA	epilepsy	−1.00	1.238E-01
SERPINA12	Indicator of AD	0.65	1.256E-01
DYSF	epilepsy	0.60	1.511E-01
SLC6A4	epilepsy	0.55	1.574E-01
NGF	Indicator of AD	−0.97	2.262E-01
F11	epilepsy	0.72	2.423E-01
CASP2	epilepsy	−1.20	2.608E-01
FBXW7	epilepsy	−1.29	2.758E-01
UVSSA	Indicator of AD	−0.57	2.939E-01
STK4	epilepsy	1.50	3.584E-01
CREBBP	Indicator of AD	0.42	3.622E-01
MLH1	Indicator of AD	0.43	3.822E-01
TPH2	Indicator of AD	−0.49	3.925E-01
OR2AG2	Indicator of AD	0.64	4.094E-01
DRD4	Indicator of AD	0.97	4.477E-01
NLRP13	Indicator of AD	−0.55	4.726E-01
INMT	brainbleeding	1.02	4.921E-01
DIP2C	brainbleeding	−0.81	5.001E-01
IP6K1	epilepsy	0.37	5.422E-01
HPSE	Indicator of AD	0.35	5.437E-01
TUT1	epilepsy	0.37	5.452E-01
DOCK8	epilepsy	−0.30	5.463E-01
MASP2	Indicator of AD	−1.20	5.536E-01
USP10	epilepsy	0.28	5.655E-01
TICAM1	Indicator of AD	0.47	5.917E-01
CASR	brainbleeding	0.37	5.923E-01
MAPK3	Indicator of AD	−0.34	6.101E-01
DGKH	brainbleeding	−0.44	6.119E-01
SLC7A5	Indicator of AD	0.51	6.198E-01
EDNRB	Indicator of AD	−0.37	6.666E-01
GRHPR	brainbleeding	0.25	6.754E-01
TCF7L1	Indicator of AD	0.27	6.890E-01
ARNT2	Indicator of AD	−0.48	6.991E-01
SPTAN1	epilepsy	0.39	7.194E-01
POLM	Indicator of AD	0.37	7.242E-01
TNC	brainbleeding	−0.19	7.399E-01
IKBKB	Indicator of AD	0.96	7.703E-01
CLPP	epilepsy	0.27	7.835E-01
ETV6	epilepsy	−0.31	7.964E-01
CHFR	epilepsy	0.79	7.999E-01
CEACAM8	Indicator of AD	0.19	8.021E-01
RELN	Indicator of AD	−0.46	8.061E-01
RPS6KA2	epilepsy	−0.25	8.067E-01
RNF213	brainbleeding	0.23	8.141E-01
TP53BP1	Indicator of AD	0.23	8.380E-01
CUBN	epilepsy	−0.11	9.038E-01
SERPINA1	Indicator of AD	0.24	9.185E-01
MLYCD	epilepsy	0.16	9.314E-01
IL5RA	Indicator of AD	−0.23	9.415E-01
PIGQ	brainbleeding	0.06	9.415E-01
SERPINA4	Indicator of AD	−0.06	9.468E-01
DYNC2H1	brainbleeding	−0.11	9.516E-01
OBSCN	brainbleeding	0.02	9.766E-01
CBX4	epilepsy	−0.02	9.931E-01
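
The count of 14 replicated genes can be checked directly against Table 2. The sketch below, written in Python purely for illustration, copies the first 16 rows of the table (the table is sorted by adjusted p value), applies the FDR < 0.10 cutoff, and splits the passing genes by the sign of log2FoldChange. The variable names are hypothetical, and the snippet does not perform the differential expression testing itself.

```python
# (gene, log2 fold change, FDR-adjusted p value) copied from the top of Table 2.
rows = [
    ("IL20", -2.70, 1.449e-05),
    ("TRAF3", -1.49, 3.368e-03),
    ("SLC9A9", -3.69, 4.618e-03),
    ("DNM1L", -1.49, 8.897e-03),
    ("GLI2", 1.57, 2.360e-02),
    ("ITGB2", -3.14, 2.527e-02),
    ("MTOR", 0.68, 3.905e-02),
    ("SPP1", -1.21, 4.514e-02),
    ("DOCK2", -2.86, 5.215e-02),
    ("MET", 0.95, 7.516e-02),
    ("IL2RB", -1.14, 8.057e-02),
    ("JAK3", -0.89, 8.975e-02),
    ("TREM2", -1.15, 9.685e-02),
    ("HMOX2", 0.93, 9.856e-02),
    ("EXOSC3", 0.72, 1.234e-01),  # first row above the cutoff
    ("RELA", -1.00, 1.238e-01),
]

FDR_THRESHOLD = 0.10

# Keep only genes whose adjusted p value passes the cutoff.
significant = [row for row in rows if row[2] < FDR_THRESHOLD]

# Split the passing genes by the sign of the log2 fold change.
positive = [gene for gene, lfc, _ in significant if lfc > 0]
negative = [gene for gene, lfc, _ in significant if lfc < 0]

print(len(significant), positive, negative)
```

Of the 14 genes that pass the cutoff, 4 have a positive log2 fold change (GLI2, MTOR, MET, HMOX2) and 10 have a negative one.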


Proteomic and metabolomic datasets matched to DS individuals with early cerebrovascular events or epilepsy are not currently available, whereas PBMC RNA-seq from the same patients provides a feasible orthogonal layer for functional validation of genomic findings. Our findings suggest a feasible path from discovery to practice in DS. Because factor scores are derived from routine EMR data and linked to genome-wide variation, they can be rendered as computable markers that flag individuals for closer screening and longitudinal monitoring of cognition, seizures, and cerebrovascular events. In a clinical setting, these scores could support structured care bundles that are already compatible with standard DS care, such as optimized seizure management plans, proactive vascular risk evaluation, and scheduled assessments for sleep and thyroid function, while avoiding reliance on specialized imaging or CSF tests. For therapeutic development, factor-enriched cohorts provide a principled way to stratify trials and to test pathway-informed interventions grounded in the gene sets highlighted by our analysis (for example, inflammation or neovascularization pathways). Any deployment would require prospective validation, clear decision thresholds tied to net clinical benefit, evaluation of fairness across ancestry and site, and integration with clinician workflows and caregiver communication. Framed this way, the genome–EMR factors serve not as stand-alone diagnostics, but as risk navigators that can sharpen screening, guide early interventions, and help design targeted trials in DS. From a translational perspective, implementing factor-based risk scores at scale will also require harmonization of EMR infrastructure, robust performance across health systems and ancestries that differ from our discovery cohort, and careful governance around data privacy and communication of probabilistic risk to families. 
Looking ahead, any clinical application of these genotype–phenotype factors will require rigorous prospective validation. This includes replication in independent DS cohorts, longitudinal studies to evaluate predictive performance for future seizures, hemorrhagic events, or cognitive decline, and the establishment of clinically actionable thresholds linked to measurable benefit. Validation across ancestries, healthcare settings, and EMR systems, together with assessment of calibration and fairness, will also be essential before integration into practice. Accordingly, these factors should be viewed as hypothesis-generating markers rather than clinical tools until such studies are completed. The practical challenges, together with the need for external validation and health-economic evaluation, mean that our work should be viewed as an early step toward clinical application rather than a ready-to-use precision-medicine tool.
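
To make the idea of a decision threshold tied to net clinical benefit concrete, the following Python sketch computes the standard decision-curve net benefit, TP/n − (FP/n) × p_t/(1 − p_t), for a chosen threshold probability p_t. It assumes that a factor score has already been converted into a calibrated predicted risk, which the present study does not do; the risks, outcomes, and function name are hypothetical and serve only to show the arithmetic.

```python
def net_benefit(risks, outcomes, threshold):
    """Decision-curve net benefit of flagging everyone with risk >= threshold.

    risks     -- predicted probabilities of the event (assumed calibrated)
    outcomes  -- 1 if the event occurred, 0 otherwise
    threshold -- threshold probability p_t, strictly between 0 and 1
    """
    n = len(risks)
    flagged = [(r >= threshold, y) for r, y in zip(risks, outcomes)]
    true_pos = sum(1 for is_flagged, y in flagged if is_flagged and y == 1)
    false_pos = sum(1 for is_flagged, y in flagged if is_flagged and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return true_pos / n - (false_pos / n) * (threshold / (1 - threshold))


# Hypothetical predicted risks and observed outcomes for six individuals.
risks = [0.05, 0.10, 0.20, 0.35, 0.60, 0.80]
outcomes = [0, 0, 0, 1, 0, 1]
p_t = 0.30

nb_model = net_benefit(risks, outcomes, p_t)
# Reference strategy: flag everyone, regardless of predicted risk.
nb_flag_all = net_benefit([1.0] * len(outcomes), outcomes, p_t)

print(round(nb_model, 4), round(nb_flag_all, 4))
```

A score-based rule is only worth deploying at a given threshold if its net benefit exceeds that of the simple alternatives of flagging everyone or flagging no one (net benefit 0), which is the comparison a prospective validation would need to report.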

Factor analysis is central to our approach, and it comes with trade-offs. We use variant-based factors to summarize high-dimensional genomic variation into interpretable constructs with subject-level scores that can then be linked to multiple EMR phenotypes. This design matches our scientific aim of explaining shared molecular structure across comorbidities, whereas patient-level clustering is better suited to grouping individuals than to clarifying what the variables represent. At the same time, factor models assume an approximately linear latent structure, depend on the choice of factor number and rotation, and can be sensitive to preprocessing. Our layered filters and QC steps, including gene-level aggregation, locus stratification, and chromosome-level separation, were selected to control noise and reduce overfitting in a cohort of about one thousand participants. These steps may down-weight long-range and cross-chromosomal interactions. EMR-derived diagnoses and procedure codes are susceptible to misclassification, under-ascertainment, and site- or era-specific practice patterns, so some of the observed clusters may partly mirror documentation or referral biases rather than underlying biology. It is also possible that the shared factors linking hemorrhage, epilepsy, and AD reflect broader frailty or care-intensity phenotypes in DS, and we cannot fully exclude these alternative explanations in the present design. On the other hand, the core biological signals, particularly those linked to brain hemorrhage and epilepsy, remain consistent. Taken together, this supports factor analysis as a pragmatic and interpretable choice for discovering genome–phenotype structure in DS.
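
The layered preprocessing named above (gene-level aggregation, locus stratification, chromosome-level separation) can be pictured with a minimal Python sketch. It is an assumption about one way such filters could be arranged, not the authors' implementation: variant calls are counted per subject and gene, and the counts are kept in separate matrices for each (chromosome, locus class) block, so that each block could be passed to a factor model on its own. The subject IDs, gene names, and the function `aggregate_by_block` are hypothetical.

```python
from collections import defaultdict


def aggregate_by_block(calls, subjects):
    """Build one subject-by-gene count matrix per (chromosome, locus class) block.

    calls    -- iterable of (subject, chromosome, gene, locus_class) tuples
    subjects -- ordered list of subject IDs (rows of every matrix)
    """
    counts = defaultdict(lambda: defaultdict(int))  # block -> (subject, gene) -> n
    genes = defaultdict(set)                        # block -> genes seen in that block
    for subject, chrom, gene, locus in calls:
        block = (chrom, locus)
        counts[block][(subject, gene)] += 1
        genes[block].add(gene)

    matrices = {}
    for block, gene_set in genes.items():
        columns = sorted(gene_set)
        matrices[block] = {
            "genes": columns,
            # Subjects without a variant in a gene get a count of 0.
            "rows": {s: [counts[block][(s, g)] for g in columns] for s in subjects},
        }
    return matrices


# Hypothetical variant calls for two subjects.
calls = [
    ("S1", "chr21", "GENE_A", "exonic"),
    ("S1", "chr21", "GENE_A", "exonic"),
    ("S1", "chr21", "GENE_A", "intronic"),
    ("S2", "chr21", "GENE_A", "exonic"),
    ("S2", "chr1", "GENE_B", "intronic"),
]
matrices = aggregate_by_block(calls, subjects=["S1", "S2"])
for block in sorted(matrices):
    print(block, matrices[block])
```

Keeping blocks separate in this way reduces the number of features entering any single model, and it also makes plain why interactions that span chromosomes or locus classes are down-weighted: they never appear in the same matrix.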

Conclusion

In this study, we have made a significant advance in understanding the intricate relationship between genetic variations and clinical comorbidities in DS, focusing on brain hemorrhage and epilepsy as early indicators of AD. By integrating whole-genome variants with detailed clinical data from EMR and employing an unbiased, data-driven approach, we have uncovered novel insights into the molecular underpinnings and potential impacts of these conditions. This analysis elucidates how specific genomic variants may underlie complex comorbidities, challenging traditional views on disease independence and highlighting new pathways for early detection of and intervention in DS comorbidities. Our findings not only offer a new perspective on the genetic landscape of DS comorbidities but also pave the way for developing targeted therapeutic interventions. By illuminating the genetic contributors to disease comorbidities in DS, this study opens promising avenues for future research and potential treatments, aiming to improve the lives of individuals with DS by addressing the complex interplay of genetics and disease manifestation in a comprehensive manner. Importantly, this work underscores the critical need for a paradigm shift in how we approach the complex phenotypes of genetic disorders, moving beyond single-gene/single-phenotype analyses to embrace the multifactorial and intertwined nature of disease comorbidities. In doing so, it sets a new benchmark for future studies in the field, potentially revolutionizing our approach to using genetic research to inform patient care. At the same time, several limitations apply: effective sample sizes within the DS subgroup are limited, particularly for brain hemorrhage, given the naturally low occurrence of this cerebrovascular event [39, 40]; clinical presentations are heterogeneous; and EMR coding can introduce misclassification and ambiguity. These factors constrain precision and discourage causal inference.
Accordingly, the associations should be regarded as exploratory and hypothesis-generating rather than definitive, and further replication in larger DS cohorts with deeper clinical adjudication is necessary.

Supplementary Information

Supplementary material 1 (38.8 KB, csv)

Acknowledgements

Sample collection and biobanking for this study were supported by the Institutional Development Funds from the Children’s Hospital of Philadelphia (CHOP) to the Center for Applied Genomics (CAG) and CHOP’s Endowed Chair in Genomic Research (CAG). The sequencing data were provided through the Gabriella Miller Kids First Pediatric Research Program consortium (Kids First), supported by the Common Fund of the Office of the Director of the National Institutes of Health (www.commonfund.nih.gov/KidsFirst), awarded to CAG. We express our gratitude to the CAG staff and the patients and families who generously contributed biosamples to CAG/CHOP.

Author contributions

Conceptualization and supervision, Y.L. and H.H.; literature search, Y.L.; data preparation & analysis, Y.L., H.Q.Q., C.X., F.D.M., H.Q., K.N., K.O., G.E.; data interpretation, Y.L., H.Q.Q., C.X., J.G., and H.H.; original draft writing, Y.L.; review and revision led by Y.L., H.Q.Q. and H.H., with all authors contributing and approving the manuscript.

Funding

The study was supported by the Institutional Development Funds from the Children’s Hospital of Philadelphia to the Center for Applied Genomics, and The Children’s Hospital of Philadelphia Endowed Chair in Genomic Research to HH.

Data availability

The Kids First data can be accessed at the Kids First Data Resource Portal (DRC, https://portal.kidsfirstdrc.org/login). The essential core Python code for the modeling is available at https://github.com/Edward0012/essential-Python-code-for-Down-Syndrome-Brain-Bleeding-Factor-Analysis.

Declarations

Ethics approval and consent to participate

We confirm that all methods were carried out in accordance with relevant guidelines and regulations. All experimental protocols were approved by the Institutional Review Board (IRB) of the Children’s Hospital of Philadelphia (CHOP) with the IRB number: IRB 16-013278. Informed consent was obtained from all subjects. For subjects under 18 years of age, consent was obtained from a parent and/or legal guardian, with assent from the child if 7 years or older.

Conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Contributor Information

Yichuan Liu, liuy5@email.chop.edu.

Hakon Hakonarson, hakonarson@email.chop.edu.

References

1. Salehi A, Ashford JW, Mufson EJ. The link between Alzheimer’s disease and Down syndrome. A historical perspective. Curr Alzheimer Res. 2016;13(1):2–6.
2. Kirmani BF, Shapiro LA, Shetty AK. Neurological and neurodegenerative disorders: novel concepts and treatment. Aging Dis. 2021;12(4):950–3.
3. Bracher-Smith M, Melograna F, Ulm B, Bellenguez C, Grenier-Boley B, Duroux D, et al. Machine learning in Alzheimer’s disease genetics. Nat Commun. 2025;16(1):6726.
4. Monk B, Rajkovic A, Petrus S, Rajkovic A, Gaasterland T, Malinow R. A machine learning method to identify genetic variants potentially associated with Alzheimer’s disease. Front Genet. 2021;12:647436.
5. Muiño E, Carcel-Marquez J, Llucià-Carol L, Gallego-Fabrega C, Cullell N, Lledós M, et al. Identification of genetic loci associated with intracerebral hemorrhage using a multitrait analysis approach. Neurology. 2024;103(8):e209666.
6. Wahab KW, Tiwari HK, Ovbiagele B, Sarfo F, Akinyemi R, Traylor M, et al. Genetic risk of spontaneous intracerebral hemorrhage: systematic review and future directions. J Neurol Sci. 2019;407:116526.
7. Tam V, Patel N, Turcotte M, Bossé Y, Paré G, Meyre D. Benefits and limitations of genome-wide association studies. Nat Rev Genet. 2019;20(8):467–84.
8. Mishra A, Malik R, Hachiya T, Jürgenson T, Namba S, Posner DC, et al. Stroke genetics informs drug discovery and risk prediction across ancestries. Nature. 2022;611(7934):115–23.
9. Auer PL, Lettre G. Rare variant association studies: considerations, challenges and opportunities. Genome Med. 2015;7(1):16.
10. Bonazzola R, Ferrante E, Ravikumar N, Xia Y, Keavney B, Plein S, et al. Unsupervised ensemble-based phenotyping enhances discoverability of genes related to left-ventricular morphology. Nat Mach Intell. 2024;6(3):291–306.
11. Sieliwonczyk E, Sau A, Patlatzoglou K, McGurk KA, Pastika L, Thami PK, et al. Unsupervised feature extraction using deep learning empowers discovery of genetic determinants of the electrocardiogram. Genome Med. 2025;17(1):118.
12. Wang K, Li M, Hakonarson H. ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Res. 2010;38(16):e164.
13. Zhao H, Shi J, Zhang Y, Xie A, Yu L, Zhang C, et al. LncTarD: a manually-curated database of experimentally-supported functional lncRNA-target regulations in human diseases. Nucleic Acids Res. 2020;48(D1):D118–26.
14. Danecek P, Bonfield JK, Liddle J, Marshall J, Ohan V, Pollard MO, et al. Twelve years of SAMtools and BCFtools. Gigascience. 2021;10(2).
15. Dobin A, Davis CA, Schlesinger F, Drenkow J, Zaleski C, Jha S, et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics. 2013;29(1):15–21.
16. Patro R, Duggal G, Love MI, Irizarry RA, Kingsford C. Salmon provides fast and bias-aware quantification of transcript expression. Nat Methods. 2017;14(4):417–9.
17. Love MI, Huber W, Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 2014;15(12):550.
18. Liao Y, Wang J, Jaehnig EJ, Shi Z, Zhang B. WebGestalt 2019: gene set analysis toolkit with revamped UIs and APIs. Nucleic Acids Res. 2019;47(W1):W199–205.
19. Sherman BT, Hao M, Qiu J, Jiao X, Baseler MW, Lane HC, et al. DAVID: a web server for functional enrichment analysis and functional annotation of gene lists (2021 update). Nucleic Acids Res. 2022;50(W1):W216–21.
20. Tsou AY, Bulova P, Capone G, Chicoine B, Gelaro B, Harville TO, et al. Medical care of adults with Down syndrome: A clinical guideline. JAMA. 2020;324(15):1543–56.
21. Iulita MF, Garzon Chavez D, Klitgaard Christensen M, Valle Tamayo N, Plana-Ripoll O, Rasmussen SA, et al. Association of Alzheimer disease with life expectancy in people with Down syndrome. JAMA Netw Open. 2022;5(5):e2212910.
22. Jastrzebski K, Kacperska MJ, Majos A, Grodzka M, Glabinski A. Hemorrhagic stroke, cerebral amyloid angiopathy, Down syndrome and the Boston criteria. Neurol Neurochir Pol. 2015;49(3):193–6.
23. Schoeppe F, Rossi A, Levin J, Reiser M, Stoecklein S, Ertl-Wagner B. Increased cerebral microbleeds and cortical superficial siderosis in pediatric patients with Down syndrome. Eur J Paediatr Neurol. 2019;23(1):158–64.
24. Lather HD, Gornik HL, Olin JW, Gu X, Heidt ST, Kim ESH, et al. Prevalence of intracranial aneurysm in women with fibromuscular dysplasia: A report from the US registry for fibromuscular dysplasia. JAMA Neurol. 2017;74(9):1081–7.
25. Rodriguez C, Sobrino T, Agulla J, Bobo-Jimenez V, Ramos-Araque ME, Duarte JJ, et al. Neovascularization and functional recovery after intracerebral hemorrhage is conditioned by the Tp53 Arg72Pro single-nucleotide polymorphism. Cell Death Differ. 2017;24(1):144–54.
26. Murphy C. Olfactory and other sensory impairments in Alzheimer disease. Nat Rev Neurol. 2019;15(1):11–24.
27. Kinney JW, Bemiller SM, Murtishaw AS, Leisgang AM, Salazar AM, Lamb BT. Inflammation as a central mechanism in Alzheimer’s disease. Alzheimers Dement (N Y). 2018;4:575–90.
28. Guimaraes TR, Swanson E, Kofler J, Thathiah A. G protein-coupled receptor kinases are associated with Alzheimer’s disease pathology. Neuropathol Appl Neurobiol. 2021;47(7):942–57.
29. Adler P, Mayne J, Walker K, Ning Z, Figeys D. Therapeutic targeting of casein kinase 1delta/epsilon in an Alzheimer’s disease mouse model. J Proteome Res. 2019;18(9):3383–93.
30. Sundaram S, Nagaraj S, Mahoney H, Portugues A, Li W, Millsaps K, et al. Inhibition of casein kinase 1delta/epsilon improves cognitive-affective behavior and reduces amyloid load in the APP-PS1 mouse model of Alzheimer’s disease. Sci Rep. 2019;9(1):13743.
31. Yuan XZ, Sun S, Tan CC, Yu JT, Tan L. The role of ADAM10 in Alzheimer’s disease. J Alzheimers Dis. 2017;58(2):303–22.
32. Berg SZ, Berg J. Melanin: a unifying theory of disease as exemplified by Parkinson’s, Alzheimer’s, and Lewy body dementia. Front Immunol. 2023;14:1228530.
33. Zhang D, Chen S, Xu S, Wu J, Zhuang Y, Cao W, et al. The clinical correlation between Alzheimer’s disease and epilepsy. Front Neurol. 2022;13:922535.
34. Lott IT, Doran E, Nguyen VQ, Tournay A, Movsesyan N, Gillen DL. Down syndrome and dementia: seizures and cognitive decline. J Alzheimers Dis. 2012;29(1):177–85.
35. Liu DZ, Tian Y, Ander BP, Xu H, Stamova BS, Zhan X, et al. Brain and blood microRNA expression profiling of ischemic stroke, intracerebral hemorrhage, and kainate seizures. J Cereb Blood Flow Metab. 2010;30(1):92–101.
36. Zhang W, Wu Q, Hao S, Chen S. The hallmark and crosstalk of immune cells after intracerebral hemorrhage: immunotherapy perspectives. Front Neurosci. 2022;16:1117999.
37. Aguilar-Castillo MJ, Cabezudo-Garcia P, Ciano-Petersen NL, Garcia-Martin G, Marin-Gracia M, Estivill-Torrus G, et al. Immune mechanism of epileptogenesis and related therapeutic strategies. Biomedicines. 2022;10(3).
38. Piec PA, Pons V, Rivest S. Triggering innate immune receptors as new therapies in Alzheimer’s disease and multiple sclerosis. Cells. 2021;10(8).
39. Sobey CG, Judkins CP, Sundararajan V, Phan TG, Drummond GR, Srikanth VK. Risk of major cardiovascular events in people with Down syndrome. PLoS ONE. 2015;10(9):e0137093.
40. Pedersen A, Nordenvall AS, Tettamanti G, Nordgren A. Age-related cardiovascular disease in Down syndrome: A population-based matched cohort study. J Intern Med. 2025;297(6):683–92.


[CR36] 36.Zhang W, Wu Q, Hao S, Chen S. The hallmark and crosstalk of immune cells after intracerebral hemorrhage: immunotherapy perspectives. Front Neurosci. 2022;16:1117999. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR37] 37.Aguilar-Castillo MJ, Cabezudo-Garcia P, Ciano-Petersen NL, Garcia-Martin G, Marin-Gracia M, Estivill-Torrus G et al. Immune mechanism of epileptogenesis and related therapeutic strategies. Biomedicines. 2022;10(3). [DOI] [PMC free article] [PubMed]

[CR38] 38.Piec PA, Pons V, Rivest S. Triggering Innate Immune Receptors as New Therapies in Alzheimer’s Disease and Multiple Sclerosis. Cells. 2021;10(8). [DOI] [PMC free article] [PubMed]

[CR39] 39.Sobey CG, Judkins CP, Sundararajan V, Phan TG, Drummond GR, Srikanth VK. Risk of major cardiovascular events in people with down syndrome. PLoS ONE. 2015;10(9):e0137093. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR40] 40.Pedersen A, Nordenvall AS, Tettamanti G, Nordgren A. Age-related cardiovascular disease in down syndrome: A population-based matched cohort study. J Intern Med. 2025;297(6):683–92. [DOI] [PMC free article] [PubMed] [Google Scholar]
