Scientific Reports. 2024 Nov 4;14:26668. doi: 10.1038/s41598-024-77829-1

Prediction and clustering of Alzheimer’s disease by race and sex: a multi-head deep-learning approach to analyze irregular and heterogeneous data

Chun Yin Chang, Diana Slowiejko, Nikki Win
PMCID: PMC11535522  PMID: 39496718

Abstract

Early detection of Alzheimer’s disease (AD) is crucial to maximize clinical outcomes. Most disease progression analyses include people with diagnoses of cognitive impairment, limiting understanding of AD risk among those with normal cognition. The objective was to establish AD progression models through a deep learning approach to analyze heterogeneous, multi-modal datasets, including clustering analyses of population subsets. A multi-head deep-learning architecture was built to process and learn from biomedical and imaging data from the National Alzheimer’s Coordinating Center. Shapley additive explanation algorithms for feature importance ranking and pairwise correlation analysis were used to identify predictors of disease progression. Four primary disease progression clusters (slow, moderate and rapid converters or non-converters) were subdivided into groups by race and sex, yielding 16 sub-clusters of participants with distinct progression patterns. A multi-head and early-fusion convolutional neural network achieved the most competitive performance and demonstrated superiority over a single-head deep learning architecture and conventional tree-based machine-learning methods, with 97% test accuracy, 96% F1 score and 0.19 root mean square error. From 447 features, 2 sets of 100 predictors of disease progression were extracted. Feature importance ranking, correlation analysis and descriptive statistics further enriched cluster analysis and validation of the heterogeneity of risk factors.

Supplementary Information

The online version contains supplementary material available at 10.1038/s41598-024-77829-1.

Keywords: Alzheimer’s disease, neural network model, early detection of disease, disease progression, race, ethnicity

Subject terms: Computational biology and bioinformatics, Risk factors, Alzheimer's disease

Introduction

Alzheimer’s disease (AD) is a neurodegenerative disease affecting cognition, memory and behavior. Early detection of disease is crucial to maximize treatment impact and clinical outcomes; however, most current disease progression studies have been conducted on people already diagnosed with mild cognitive impairment (MCI)1, and little research has been done on those who are classified as having asymptomatic normal cognition (NC) but who are at risk of progression to AD.

Compared with White people, Black/African American and Hispanic/Latino people are up to twice as likely to develop AD but are far less likely to be included in clinical trials for AD treatments2. Reasons for lack of inclusion are myriad and include geographic location of study, type of institute conducting study, insurance requirements, patient financial and logistical constraints associated with frequent study visits and distrust of clinical research3,4. Underrepresentation of certain racial and ethnic groups in clinical trials limits the collection of accurate safety and efficacy data for AD therapies.

Many recent clinical analyses in neurology that use real-world data have adopted machine learning to predict disease progression, because predictive models may lead to earlier risk detection, diagnosis and treatment5–10.

Conventional machine learning models can process only a single longitudinal dataset of fixed length at a time, which precludes the detection of complex patterns and relationships across multiple inputs and heterogeneous risk factors. Deep learning approaches have also been used to explore such datasets, but the number of variables and window sizes they accommodate have been limited11. Because biomedical data are growing exponentially in volume and dimension, a predictive model’s scalability is critical for extracting meaningful and accurate insights. To address this issue, we built and applied a multi-head predictive model based on the foundational concepts of recurrent neural network architecture. This integrative deep learning approach enables disease progression prediction, feature extraction and clustering at scale. Four major clusters and 16 sub-clusters of participants were identified and analyzed by disease progression rate, sex and race.

This analysis has 2 main hypotheses: (1) an integrative multi-head and multi-modal deep learning model enables learning patterns from multiple predictors at scale and (2) accurate identification of the top predictors, descriptive statistics and the combined depiction of disease trajectories are necessary to reflect the heterogeneity of the AD population. The objective of this analysis was to predict participants’ cognitive status at their last visit from their first visit while asymptomatic. We also performed clustering analysis to group participants based on disease progression speed, race and sex.

Results

Study population

The study population consisted of 6110 participants with 447 distinct features across multiple domains, including data points collected from a total of 36,632 annual clinic visits (Fig. 1). The 447 distinct features spanned 3 input datasets, each with a different number of dimensions and time steps (Table 1).

Fig. 1.

Study population selection criteria flowchart. MCI mild cognitive impairment, MRI magnetic resonance imaging, NACC National Alzheimer’s Coordinating Center, NC normal cognition, UDS uniform data set.

Table 1.

Study population and input datasets breakdown.

| Input dataset | Type of data | Unique participants (n) | Features (n) | Time steps (n) |
|---|---|---|---|---|
| NACC61 uniform data set | Time series | 6110 | 246 | 15 |
| NACC61 uniform data set | Cross-sectional | 6110 | 59 | 1 |
| NACC61 MRI imaging data | MRI time series | 1918 | 142 | 6 |

MRI magnetic resonance imaging.

Of the 6110 unique participants, 63% were female and 37% were male, 87% were White and 13% were Black/African American; approximately 5% of the White participants also identified themselves as Hispanic/Latino. At the initial visit, 85% of the study participants were 61 years of age or older. Approximately 2% of the participants had less than 12 years of education, 53% had 12 to 18 years and 45% had 18 or more years. A total of 62% of participants were married, 14% were widowed and 14% were divorced. Among the participants, 61% did not carry the apolipoprotein E4 (APOE4) risk allele, 27% carried 1 copy and 3% carried 2 copies; APOE4 status was unknown in 9%.

Disease progression prediction model performance

Of the 15 models evaluated, the multi-head, multi-modality, early-fusion, 2-layer convolutional neural network (CNN) model performed best or tied for best on every metric (accuracy, root mean square error, precision, recall, F1 score and run time), although the CNN late-fusion model (model 1) and the long short-term memory (LSTM) late-fusion model (model 4) came very close (Table 2). The loss and accuracy learning curves (Fig. S1) indicated that over- and underfitting were adequately controlled through cross-validation and hyperparameter tuning.
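The Table 2 metrics can be computed from true and predicted class codes roughly as follows. This is a minimal sketch, not the authors’ evaluation code: it assumes RMSE is computed on the ordinal codes 0 to 3 and that precision, recall and F1 are averaged with class-support weights. The weighting assumption is consistent with Table 2, where recall equals accuracy in every row; support-weighted recall always equals accuracy, because the sum over classes of (n_c/N)(TP_c/n_c) is the total number of correct predictions divided by N. The function name `classification_metrics` and the toy labels are hypothetical.

```python
import numpy as np

def classification_metrics(y_true, y_pred, n_classes=4):
    """Accuracy, RMSE on ordinal class codes, and support-weighted P/R/F1.

    Both the RMSE definition and the weighting scheme are assumptions.
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    accuracy = float(np.mean(y_true == y_pred))
    # Treat the cognitive status codes 0-3 as ordinal numbers for RMSE.
    rmse = float(np.sqrt(np.mean((y_true - y_pred) ** 2)))
    precision = recall = f1 = 0.0
    for c in range(n_classes):
        support = np.sum(y_true == c)
        if support == 0:
            continue  # class absent from the ground truth contributes nothing
        weight = support / len(y_true)
        tp = np.sum((y_true == c) & (y_pred == c))
        predicted = np.sum(y_pred == c)
        p = tp / predicted if predicted else 0.0
        r = tp / support
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        precision += weight * p
        recall += weight * r
        f1 += weight * f
    return {"accuracy": accuracy, "rmse": rmse, "precision": float(precision),
            "recall": float(recall), "f1": float(f1)}

# Toy labels (0 NC, 1 impaired-not-MCI, 2 MCI, 3 dementia), not study data.
m = classification_metrics([0, 0, 0, 1, 2, 3], [0, 0, 1, 1, 2, 2])
```

Running it on any labels shows the weighted recall matching the accuracy, which is the pattern visible across Table 2.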

Table 2.

Comparison of 15 predictive models of various architecture designs and configurations.

| Classifier | Accuracy | RMSE | Precision | Recall | F1 score | RT (minutes) |
|---|---|---|---|---|---|---|
| 1. CNN_Multihead_Late_Fusion | 0.96 | 0.22 | 0.95 | 0.96 | 0.95 | 0.9 |
| 2. CNN_Multihead_Early_Fusion_1_Layer | 0.95 | 0.25 | 0.95 | 0.95 | 0.95 | 0.1 |
| 3. CNN_Multihead_Early_Fusion_2_Layersᵃ | 0.97 | 0.19 | 0.96 | 0.97 | 0.96 | 0.1 |
| 4. LSTM_Multihead_Late_Fusion | 0.97 | 0.21 | 0.95 | 0.97 | 0.96 | 0.5 |
| 5. LSTM_Multihead_Early_Fusion_1_Layer | 0.96 | 0.20 | 0.95 | 0.96 | 0.95 | 0.2 |
| 6. LSTM_Multihead_Early_Fusion_2_Layers | 0.96 | 0.22 | 0.93 | 0.96 | 0.95 | 0.2 |
| 7. CNN_Single_Modal (TS) | 0.96 | 0.21 | 0.95 | 0.96 | 0.95 | 5.8 |
| 8. CNN_Single_Modal (CS) | 0.72 | 1.12 | 0.70 | 0.72 | 0.71 | 1.0 |
| 9. CNN_Single_Modal (MRI) | 0.81 | 0.99 | 0.65 | 0.81 | 0.72 | 1.2 |
| 10. LSTM_Single_Modal (TS) | 0.96 | 0.22 | 0.95 | 0.96 | 0.95 | 17.5 |
| 11. LSTM_Single_Modal (CS) | 0.73 | 1.11 | 0.69 | 0.73 | 0.71 | 2.1 |
| 12. LSTM_Single_Modal (MRI) | 0.80 | 1.01 | 0.67 | 0.80 | 0.72 | 3.0 |
| 13. XGB_Single_Modal (TS) | 0.95 | 0.23 | 0.96 | 0.95 | 0.95 | 0.8 |
| 14. XGB_Single_Modal (CS) | 0.70 | 1.15 | 0.71 | 0.70 | 0.70 | 0.1 |
| 15. XGB_Single_Modal (MRI) | 0.40 | 1.98 | 0.68 | 0.40 | 0.49 | 0.3 |

CNN convolutional neural network, CS cross-sectional, LSTM long short-term memory, MRI magnetic resonance imaging, RMSE root mean square error, RT run time, TS time series, XGB XGBoost.

ᵃWinning model.

The CNN model successfully distinguished 4 class clusters (Fig. 2). Each data point represents activation values extracted from the model’s hidden layers. The 4 clusters, labeled by cognitive status recoded from zero (0, NC; 1, impaired-not-MCI; 2, MCI; 3, dementia), are cleanly separable, which indicates that the model has learned complex patterns and can distinguish even extremely imbalanced classes.

Fig. 2.

Two-dimensional t-SNE plot. The 2D t-SNE plot uses activation values extracted from the hidden layers of the best CNN model. The 4 cleanly separated predicted cognitive statuses demonstrate the model’s effective predictive ability, even for imbalanced classes. 2D 2-dimensional, CNN convolutional neural network, MCI mild cognitive impairment, NC normal cognition, t-SNE t-distributed stochastic neighbor embedding.

Clustering analysis

Participants were grouped into 4 conversion clusters based on rate of disease progression: non-converters and slow (NC to MCI), moderate (NC to MCI to dementia) and rapid (rapid conversion from NC to MCI to dementia, or NC to dementia) converters. Non-converters (n = 4919) accounted for 80% of the study population (N = 6110), followed by slow converters (14%; n = 827), moderate converters (4%; n = 230) and rapid converters (2%; n = 134).
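One way to map a participant’s per-visit cognitive status codes (0, NC; 1, impaired-not-MCI; 2, MCI; 3, dementia) onto these conversion clusters is sketched below. The source does not state its exact assignment rule, so this is an illustration built on assumptions: impaired-not-MCI visits are treated as non-converted, and `rapid_max_mci_visits` is a hypothetical cutoff for how many MCI visits may precede dementia in a rapid trajectory (the default of 0 counts only direct NC-to-dementia conversion as rapid).

```python
# Cognitive status codes as recoded in the text.
NC, IMPAIRED_NOT_MCI, MCI, DEMENTIA = 0, 1, 2, 3

def converter_cluster(statuses, rapid_max_mci_visits=0):
    """Map one participant's per-visit status codes to a conversion cluster.

    Hypothetical rule: assumes the first visit is NC and treats code 1
    (impaired-not-MCI) as not converted. rapid_max_mci_visits is an
    illustrative cutoff, not a value taken from the study.
    """
    statuses = list(statuses)
    if DEMENTIA not in statuses:
        # Reaching MCI without dementia during follow-up = slow converter.
        return "slow" if MCI in statuses else "non-converter"
    first_dementia = statuses.index(DEMENTIA)
    mci_visits = statuses[:first_dementia].count(MCI)
    # Few or no MCI visits before dementia = rapid; otherwise moderate.
    return "rapid" if mci_visits <= rapid_max_mci_visits else "moderate"
```

With the default cutoff, `[0, 2, 3]` is classed as moderate and `[0, 0, 3]` as rapid. Raising the cutoff moves short MCI phases into the rapid cluster, which matches the text’s note that rapid conversion can pass through MCI.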

When examined by sex and race for participants with and without magnetic resonance imaging (MRI) scans, a similar pattern of results was observed: non-converters were the most common, followed by slow, moderate and rapid converters (Table 3). Among rapid converters, Black/African American female and male participants had a faster rate of disease progression; overall, Black/African American participants progressed at earlier visit cycles and with more variability than their White counterparts (Fig. 3).

Table 3.

Patient counts and % distribution breakdown by conversion type, race and sex.

Study population (N = 6110), n (%): female 3877 (63); male 2233 (37)

| Conversion type | Black/AA female | Black/AA male | White female | White male |
|---|---|---|---|---|
| Non-converters | 496 (8.1) | 158 (2.6) | 2700 (44.3) | 1565 (25.7) |
| NC to MCI | 82 (1.3) | 29 (0.5) | 388 (6.4) | 328 (5.4) |
| NC to MCI to dementia | 11 (0.2) | 3 (< 0.1) | 119 (2.0) | 97 (1.6) |
| NC to dementia | 8 (0.1) | 3 (< 0.1) | 73 (1.2) | 50 (0.8) |

Participants with MRI (n = 1918)ᵃ, n (%): female 1236 (64); male 682 (36)

| Conversion type | Black/AA female | Black/AA male | White female | White male |
|---|---|---|---|---|
| Non-converters | 147 (7.7) | 48 (2.5) | 887 (46.3) | 482 (25.1) |
| NC to MCI | 26 (1.4) | 10 (0.5) | 132 (6.9) | 112 (5.8) |
| NC to MCI to dementia | 5 (0.3) | 2 (0.1) | 29 (1.5) | 24 (1.3) |
| NC to dementia | 0 | 1 (< 0.1) | 10 (0.5) | 3 (0.2) |

AA African American, MCI mild cognitive impairment, MRI magnetic resonance imaging, NC normal cognition.

ᵃMRI participants (n = 1918) are a subset of the study population.

Fig. 3.

Mean and 95% CI point plots by race and sex for (A) NC-to-MCI converters cluster, (B) NC-to-MCI-to-dementia converters cluster and (C) NC-to-dementia converters cluster. AA African American, CI confidence interval, Cog cognition, MCI mild cognitive impairment, NC normal cognition, NIH National Institutes of Health.

Slow converter cluster

Within the slow converter cluster, disease progression patterns were similar between Black/African American and White participants, regardless of sex. By mean point values, Black/African American participants progressed from NC to MCI within 1.5 years and at earlier visit cycles than their White counterparts, whose progression took 1.5 years or slightly longer and occurred at later visit cycles (Fig. 3A). However, Black/African American male participants showed larger variation in the timing of conversion than all other subgroups in this cluster.
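The mean points and 95% CIs in Fig. 3 summarize time to conversion within each race and sex subgroup. A minimal sketch of that summary follows. It assumes a normal-approximation interval (mean ± 1.96 standard errors); the figure’s intervals may have been computed differently, for example by bootstrapping. The record layout, helper names and toy values are hypothetical, not study data.

```python
import math
import statistics
from collections import defaultdict

def mean_ci(values, z=1.96):
    """Mean with a normal-approximation 95% confidence interval."""
    m = statistics.fmean(values)
    if len(values) < 2:
        return m, m, m  # a single observation gives no spread estimate
    se = statistics.stdev(values) / math.sqrt(len(values))
    return m, m - z * se, m + z * se

def summarize_by_group(records, z=1.96):
    """records: iterable of (race, sex, years_to_conversion) tuples."""
    groups = defaultdict(list)
    for race, sex, years in records:
        groups[(race, sex)].append(years)
    return {key: mean_ci(vals, z) for key, vals in groups.items()}

# Hypothetical toy data, not values from the study.
records = [("Black/AA", "F", 1.0), ("Black/AA", "F", 2.0),
           ("White", "M", 2.0), ("White", "M", 2.0), ("White", "M", 2.0)]
summary = summarize_by_group(records)
```

Wider intervals for small subgroups, such as Black/African American males, follow directly from the standard error shrinking only with the square root of the group size.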

Investigation of within-cluster descriptive statistics (mean and standard deviation) using the National Alzheimer’s Coordinating Center (NACC) Uniform Data Set (UDS; 22 predictors) and the NACC Commercial MRI dataset (24 predictors) showed that this cohort did not perform more poorly than other cohorts by race and sex on the 22 UDS numeric predictors. However, Black/African American female participants and White male participants in this cluster had the smallest values (shrinkage) on 23 of the 24 MRI predictors compared with all other converter cohorts by race and sex. These 23 MRI predictors12–17 represent either gross brain volumes (cc), regional gray matter volumes (cc) or regional cortical thickness (mm) (Tables S1, S2).

Moderate converter cluster

Comparison of mean point values showed that female participants in the moderate converter cluster in both race cohorts shared almost identical trajectories of progression to MCI and dementia, but Black/African American females showed higher variability than White females in the timing of conversion (Fig. 3B). At approximately the fifth annual visit, female participants in both race groups converted from NC to MCI, and they progressed to dementia between the seventh and eighth annual visits.

Black/African American male participants in the moderate converter cluster showed a different pattern. On average, they took more than 4 years to convert from NC to MCI but then progressed rapidly to dementia, within a year on average. Both the MCI and dementia transitions showed wider variability, spanning 3 to 7 years. Mean point values occurred earlier for Black/African American males than for their White male counterparts.

White male participants had slower disease progression. The White male cohort in the moderate conversion cluster converted from NC to MCI within 2 years, by their fifth annual visit, and took approximately another 2 to 3 years to develop dementia, around year 7 on average, with less variability. Examination of descriptive statistics for key numeric predictors revealed differences by race. Both Black/African American female and male participants had lower means and standard deviations than White participants on several predictors of disease progression, including the Boston Naming Test, Clinical Dementia Rating (CDR) sum of boxes (CDRSUM)18, total geriatric depression scale score (NACCGDS)19 and total Mini-Mental State Examination (MMSE) score (NACCMMSE)20 (Table S1).

Rapid converter cluster

Black/African American female participants in the NC-to-dementia converters cluster (Fig. 3C) skipped the transitional MCI stage and progressed directly to dementia over 3 years, with high variability. In contrast, White female participants transitioned to an intermediate MCI phase within 2 years, also with high variability, and then progressed to dementia within another 1.5 years by mean point values. Compared with White female participants, Black/African American female participants showed much larger variability, faster deterioration and an earlier occurrence of dementia in their annual visit cycles.

Black/African American female participants in this cluster did not have MRI scan data available, but they had the largest number of risk factors with lower mean and standard deviation values compared with their female counterparts across all other clusters. Examples of these risk factors included, but were not limited to, the total number of animals named in 60 seconds (Animals); all 4 Craft Story 21 recall scores, delayed vs. immediate and verbatim vs. paraphrase (CRAFTDRE, CRAFTDVR, CRAFTURS, CRAFTVRS); Multilingual Naming Test (MINT) total correct without semantic cue (MINTTOTW) and MINT total score (MINTTOTS)21; total geriatric depression scale score (NACCGDS); Montreal Cognitive Assessment total score corrected for education (NACCMOCA); Trail Making Test Parts A and B (TRAILA, TRAILB)22; and total scores for the copy and the 10- to 15-minute delayed drawing of the Benson figure (UDSBENTC, UDSBENTD)23 (Table S1).

Male participants from both race cohorts in the same cluster showed similar trajectories of disease progression (Fig. 3C). Black/African American male participants progressed rapidly from NC to dementia, within 1.5 years by mean points. White male participants progressed to dementia later than Black/African American male participants but still showed a similarly rapid progression pattern by the sixth annual visit. In both male race cohorts, the MCI transition stage was rapid and short.

White male participants had low mean scores on CRAFTDRE, CRAFTDVR, CRAFTURS and CRAFTVRS, whereas Black/African American male participants had low mean scores on MINTTOTW, MINTTOTS, NACCMMSE and NACCMOCA. Black/African American male participants also had the highest body mass index (NACCBMI)24 and the largest total number of medications reported at each visit (NACCAMD) (Table S1).

Predictors importance analysis

Two sets of the top 100 predictors of disease progression were extracted from the 447 distinct features: the first ranked by contribution to predicting all conversion classes, and the second by contribution to predicting conversion to MCI and/or dementia. Predictors common to both sets spanned 16 clinical and non-clinical diagnosis domains; leading domains included Physical/Neurological Exam Findings, Clinician Diagnosis, Neuropsychological Battery Summary Scores, Subject Health History and Subject Demographics (Table 4). MRI predictors appeared within the top 100 of both sets but ranked only between 51st and 100th. Twenty MRI predictors contributed to prediction when all conversion classes were considered, whereas only 5 MRI predictors in the top 100 were specific to predicting the MCI/dementia classes; only 1 MRI predictor appeared in both sets. All 24 of these MRI numeric predictors and their descriptive statistics are listed in Table S2.
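The two top-k sets and their overlap can be derived from SHAP values roughly as follows. This sketch makes several assumptions. Importance is taken as the mean absolute SHAP value, a common convention. SHAP values are assumed to be stored as an array of shape (samples, features, classes). The all-classes ranking averages over classes 0 to 3, and the converter ranking averages over classes 2 (MCI) and 3 (dementia). In the study, k was 100 for each set and 50 for the intersection that produced the 28 common predictors. The helper names and toy values are hypothetical.

```python
import numpy as np

def rank_features(shap_values, feature_names, classes, k):
    """Top-k feature names by mean |SHAP| over samples and selected classes.

    shap_values is assumed to have shape (n_samples, n_features, n_classes).
    """
    vals = np.abs(np.asarray(shap_values))[:, :, classes]
    importance = vals.mean(axis=(0, 2))
    order = np.argsort(-importance, kind="stable")[:k]
    return [feature_names[i] for i in order]

def common_features(shap_values, feature_names, k):
    """Features in the top k for all classes and for MCI/dementia alone."""
    all_classes = rank_features(shap_values, feature_names, [0, 1, 2, 3], k)
    converters = set(rank_features(shap_values, feature_names, [2, 3], k))
    return [name for name in all_classes if name in converters]

# Toy SHAP values: rows are features, columns are classes 0-3 (hypothetical).
toy = np.array([[1.0, 1.0, 1.0, 1.0],
                [1.5, 1.5, 0.2, 0.2],
                [0.0, 0.0, 1.5, 1.5]])
shap_values = np.stack([toy, -toy])  # 2 samples x 3 features x 4 classes
names = ["f0", "f1", "f2"]
common = common_features(shap_values, names, k=2)
```

Here `f1` matters mostly for non-converting classes and `f2` only for converters, so only `f0` survives the intersection. This mirrors how class-specific predictors drop out of the common set.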

Table 4.

Top 28 common predictors’ groupings, acronyms and descriptions.

| Predictor group | Acronym | Descriptionᵃ |
|---|---|---|
| A1. Patient demographics | INDEPEND | Level of independence |
| | MARISTAT | Marital status |
| A5. Patient health history | B12DEF | Vitamin B12 deficiency |
| | CBSTROKE | Stroke |
| | CVHATT | Heart attack or cardiac arrest |
| | DIABETES | Diabetes |
| | PACKSPER | Average number of packs smoked per day |
| | PD | Parkinson’s disease |
| B4. CDR® Plus NACC FTLD | CDRSUM | CDR® sum of boxes |
| B8. Physical/neurological examination findings | ALSFIND | Findings suggesting ALS (e.g., muscle wasting, fasciculation, upper motor and/or lower motor neuron signs) |
| | BRADY | Bradykinesia |
| | NORMEXAM | Abnormal neurological exam findings |
| | PARKSIGN | Signs of Parkinson’s disease |
| | PSPCBS | Findings suggestive of progressive supranuclear palsy, corticobasal syndrome, or other related disorders |
| | RIGIDL | Rigidity in left arm |
| | SLOWINGR | Slowing of fine motor movements on right side |
| C. Neuropsychological battery summary scores | COGSTAT | Per clinician, based on the neuropsychological examination, the subject’s cognitive status is deemed |
| D. Clinician diagnosis | AMNDEM | Dementia syndrome – amnestic multidomain dementia syndrome |
| | DATSCAN | Dopamine transporter scan |
| | DEPTREAT | Depression |
| | NACCBVFT | Dementia syndrome – behavioral variant frontotemporal dementia syndrome |
| | NACCLBDS | Dementia syndrome – Lewy body dementia syndrome |
| | NACCMCIA | MCI domain affected – attention |
| | NACCMCIE | MCI domain affected – executive function |
| | NACCMCIL | MCI domain affected – language |
| | NACCMCIV | MCI domain affected – visuospatial |
| | NACCTMCI | Mild cognitive impairment type |
| | PCA | Dementia syndrome – posterior cortical atrophy syndrome (or primary visual presentation) |

ALS amyotrophic lateral sclerosis, CBS corticobasal syndrome, CDR Clinical Dementia Rating, FTLD frontotemporal lobar degeneration, MCI mild cognitive impairment, NACC National Alzheimer’s Coordinating Center.

ᵃMost are categorical features with yes/no labels.

When the two sets were narrowed to the top 50 predictors, only 28 predictors of disease progression were found in common (Fig. 4; Table 4). These 28 common predictors both validate the heterogeneity of risk factors influencing AD progression and suggest risk factors that can affect AD progression regardless of conversion class. Predictors highly correlated with the target label (NACCUDSD) used in the model included CDR sum of boxes, MCI type, cognitive status (per clinician judgment based on the neuropsychological examination), stroke, heart attack/cardiac arrest, depression and diabetes.
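Fig. 4 plots each predictor’s correlation with the target label. A sketch of one such correlation with a confidence interval follows. It assumes Pearson correlation, which reduces to the point-biserial correlation for the many yes/no predictors in Table 4, and a Fisher z-transform 95% interval. The source does not specify the correlation measure or the interval method, and the helper name and sample data are hypothetical.

```python
import math
import numpy as np

def correlation_with_ci(x, y, z_crit=1.96):
    """Pearson r between one predictor and the target, with a Fisher-z CI."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    r = float(np.corrcoef(x, y)[0, 1])
    if n <= 3 or abs(r) >= 1.0:
        return r, r, r  # interval undefined or degenerate
    z = math.atanh(r)             # Fisher transform
    se = 1.0 / math.sqrt(n - 3)   # standard error on the z scale
    return r, math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)

predictor = [0, 1, 0, 1, 1, 0, 1, 0, 1, 0]  # hypothetical yes/no predictor
status = [0, 2, 1, 3, 2, 0, 2, 1, 3, 0]     # hypothetical cognitive status codes
r, lo, hi = correlation_with_ci(predictor, status)
```

Each (r, lower, upper) triple maps to one row of a forest plot. The interval is asymmetric around r because it is computed on the z scale and transformed back.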

Fig. 4.

Pairwise correlation forest plot of the top 28 predictors. The 28 predictors are common to the 2 sets of top 50 predictors (all classes vs. MCI/dementia) extracted with Shapley additive explanation feature importance ranking. The plot shows the correlation between each of the 28 predictors and the response variable (cognitive status). See Table 4 for an explanation of each predictor. MCI mild cognitive impairment, NACCUDSD cognitive status target variable, as defined in the National Alzheimer’s Coordinating Center Uniform Data Set Researchers Data Dictionary.

Discussion

A CNN-based multi-head, early-fusion deep learning model achieved the most competitive performance among 15 experiments and validated the hypothesis that an integrative multi-head and multi-modal deep learning model allows for learning patterns from multiple predictors at scale. A total of 16 sub-clusters were identified, and the 12 converter sub-clusters were examined in depth to explore heterogeneous disease progression by rate of change, sex and race. Different sets of top predictors, pairwise correlation analysis and visualization, and associated descriptive statistics further enriched the cluster analysis and its findings.

The main advantage of the CNN early-fusion model is its capacity to ingest 3 input datasets simultaneously through a multi-head design. Its parsimonious early-fusion architecture allows fast model convergence on 447 predictors and learning of optimal patterns through the CNN’s feed-forward and backpropagation passes while retaining complete information (no dropout or global max pooling is used). In addition to its competitive performance, the model’s low run time (0.1 minutes; Table 2) demonstrates computational efficiency compared with the LSTM and late-fusion models. Both deep learning and machine learning single-modality models performed well on the UDS time-series input alone but poorly on the other single inputs (CS or MRI); even the widely used, tree-based XGBoost model performed poorly on the CS and MRI inputs. The major drawback of any single-modality model is that it cannot ingest all 447 predictors concurrently and learn their interactions at scale in one model.

Black/African American female and male participants had a faster disease trajectory and earlier occurrence of disease progression during visit cycles but more variability compared with their White counterparts, which may suggest a more variable disease presentation among Black/African American individuals.

Previous studies have shown differences by race and ethnicity in incidence and prevalence, timing of diagnosis, and AD presentation and course, although literature on underrepresented populations is sparse25. The prevalence of AD-related cognitive impairment is up to 2 times higher in Black/African American individuals than in individuals from other racial or ethnic groups, but it is unclear why Black/African American individuals bear this increased burden26. A recent analysis of over 1500 MRI scans showed that middle-aged Black/African American individuals were more likely than Hispanic and White individuals to have white matter lesions, which are markers of cerebrovascular disease associated with cognitive decline and AD27. Consistent with these results, this analysis showed a faster AD trajectory among Black participants than among White participants. Results from other longitudinal studies, however, have been mixed, with some showing equivalent rates of cognitive decline between Black/African American and White participants and others showing slower or faster rates for Black/African American participants than for White participants26. Collection of additional data from Black/African American individuals and other underrepresented populations is needed to better understand the causes of variability.

While the models performed well in predicting progression, further validation is needed in more diverse populations and over longer periods. Deep learning models for longitudinal biomedical data analysis remain an active research topic for clinicians, data scientists and researchers28. Existing challenges include high dimensionality, irregularity of longitudinal data, large numbers of predictors with complex, non-linear relationships not known a priori, and correlations among repeated measures. None of these challenges is trivial to solve. Additional methodological challenges included determining a 3-dimensional padding strategy, concatenating vectors and matrices with different window sizes, and handling missing data coded uniquely by the NACC schema so as to mitigate information loss while avoiding excessive manipulation that could skew the findings and their interpretation. Challenges inherent to AD itself include biological complexity, heterogeneity and difficulty of diagnosis. Black/African American female participants in the rapid converter cluster did not have MRI scan data available. Reasons for the lack of MRI scan data are not provided in the database, but health disparities may contribute. UDS data showed that the mean age (NACCAGEB) of these participants was 78 years, the oldest among the converter sub-clusters by race and sex; 88% were divorced, never married or widowed, and half required assistance with basic or complex tasks when their living independence was evaluated. Distinguishing normal from pathological aging on MRI measures such as reduced gross brain volume, regional gray matter volume and regional cortical thickness is another challenge that requires additional research. Finally, this case study was not meant to be exhaustive; additional scientific questions, alternative artificial intelligence methodologies and empirical experiments remain open for further exploration.
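The padding problem arises because participants contribute different numbers of visits, whereas a model head expects a fixed (participants × time steps × features) tensor. Table 1 gives such fixed lengths: 15 time steps for the UDS time series and 6 for MRI. A minimal sketch follows. It assumes post-padding with zeros plus a boolean mask marking real visits; the paper does not describe its actual padding strategy, and `pad_visits` is a hypothetical helper.

```python
import numpy as np

def pad_visits(visits_per_participant, max_steps, n_features, fill=0.0):
    """Stack variable-length visit histories into (N, max_steps, n_features).

    Returns the padded array and a boolean mask that is True for real visits.
    Histories longer than max_steps are truncated (an assumption).
    """
    n = len(visits_per_participant)
    out = np.full((n, max_steps, n_features), fill, dtype=float)
    mask = np.zeros((n, max_steps), dtype=bool)
    for i, seq in enumerate(visits_per_participant):
        seq = np.asarray(seq, dtype=float)[:max_steps]
        out[i, :len(seq)] = seq   # post-padding: real visits first
        mask[i, :len(seq)] = True
    return out, mask

# Two hypothetical participants with 2 and 3 visits, 2 features each.
seqs = [[[1, 2], [3, 4]], [[5, 6], [7, 8], [9, 10]]]
padded, mask = pad_visits(seqs, max_steps=4, n_features=2)
```

Keeping the mask alongside the tensor lets downstream code tell padded zeros apart from genuine zero-valued (for example, one-hot "no") entries.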

In this use case, we aimed to overcome the limitations of single-modality analysis for progression of AD, a complex disease. The multi-head deep learning model demonstrated superiority in complex disease progression prediction and clustering. This analysis validated high heterogeneity in the 12 converter sub-clusters characterized not only by risk factors but also by race and sex. Black/African American female and male participants consistently demonstrated earlier occurrence of disease progression but had more variability compared with their White counterparts. Insights generated from this use case may help to identify patients most at risk of AD progression.

Methods

Data source

This analysis used biomedical data compiled by the NACC, which is funded by the National Institute on Aging and the National Institutes of Health (grant U24 AG072122)29. The NACC dataset is one of the world’s largest and most comprehensive longitudinal, standardized datasets containing clinical and neuropathological data on AD. This use case processed 2 data sources: the NACC Commercial UDS and the NACC Commercial MRI data. The NACC Commercial UDS is the larger of the 2 and contains data collected across 31 AD centers from 2006 to 2023. The NACC Commercial imaging data include measurements extracted from MRI scans collected at 24 AD centers from 2000 to 2023.

The NACC Commercial UDS data offer rich, multi-domain, neurocognitive clinical and phenotypic data, including data from participants with NC, MCI and dementia. The disease stage nomenclature of NC, MCI, impaired-not-MCI and dementia used throughout this case study follows the NACC-derived variable definition and description for cognitive status rating, documented in the NACC Uniform Data Set Researchers Data Dictionary. The data were captured for participants before and after onset of dementia. The participating AD centers applied robust diagnostic criteria considered standard among US medical and clinical practitioners. The NACC data are high-dimensional and include both longitudinal and cross-sectional (CS) variables. The data represent more than 25 categories, including demographics, participant and family health history, genetic mutations, neuropsychological battery results, clinical diagnosis, biomarkers and MRI measurements (e.g., gross brain volume, regional gray matter volume, cortical thickness).

Study population and feature selection

NACC’s raw data are not limited to AD clinical diagnosis. In close consultation with clinical scientists specializing in AD and neurology, this analysis used iterative procedures to define the final study population of 6110 participants and 447 distinct features. These procedures included exploratory data analysis, data wrangling, handcrafting an initial list of features, creating 28 new features through feature engineering, and dummy coding of categorical features. The final list of features from the 3 input datasets covered multiple domains (neurocognitive clinical diagnosis, phenotypic data and MRI scan measurements). Data points were collected and compiled from a total of 36,632 annual clinic visits (Fig. 1).

Missing data handling

NACC coded all data points in numeric format, including missing, unknown or uncollected data points. For example, the raw data could have cell values coded as 8, not applicable; 88, not applicable; 9, other or unknown; 9, missing/unknown/not assessed; 99, unknown; −4, not available; −4, did not complete medications form; or 0, no/unknown. Most features classified by NACC as “numeric longitudinal” or “numeric cross-sectional” were not continuous numerical data in the mathematical sense. They were treated as multi-class categorical/nominal features and transformed prior to computing. Descriptive statistics were first used to address missing values by checking the percentage distribution of each class within a feature. If missing values exceeded 75%, the entire feature was dropped. For features with less than 75% missing values, one-hot/dummy encoding was applied. After each class level of a categorical feature had been converted into a stand-alone binary (0, 1) feature, the new features representing the original missing values were removed, and those representing all other valid and meaningful classes were retained. Although this significantly increased the dimensionality of the dataset, that was the compromise for retaining the maximum amount of valuable data.
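The drop-or-encode procedure described above can be sketched with pandas as follows. The column names and missing-value codes are hypothetical; real NACC codes vary by variable and come from the NACC data dictionary. The sketch drops any feature with more than 75% missing values and one-hot encodes the rest. It then removes the dummy columns that correspond to missing or unknown codes, so a participant with an unknown value ends up with all zeros for that feature.

```python
import pandas as pd

def encode_categorical(df, missing_codes, max_missing=0.75):
    """Drop mostly-missing features, one-hot encode the rest, drop missing dummies.

    missing_codes maps column name -> set of codes meaning missing/unknown
    (hypothetical values; the real codes are variable-specific).
    """
    kept = []
    for col in df.columns:
        codes = missing_codes.get(col, set())
        missing_frac = df[col].isin(codes).mean()
        if missing_frac > max_missing:
            continue  # more than 75% missing: drop the whole feature
        dummies = pd.get_dummies(df[col], prefix=col, dtype=int)
        # Remove the dummy columns that encode missing/unknown codes.
        drop = [f"{col}_{c}" for c in codes if f"{col}_{c}" in dummies.columns]
        kept.append(dummies.drop(columns=drop))
    return pd.concat(kept, axis=1)

df = pd.DataFrame({
    "FEATURE_A": [0, 1, 1, 9, 0],       # 9 = unknown (hypothetical code)
    "FEATURE_B": [-4, -4, -4, -4, 1],   # 80% missing -> dropped
})
encoded = encode_categorical(df, {"FEATURE_A": {9}, "FEATURE_B": {-4}})
```

Here the result keeps only `FEATURE_A_0` and `FEATURE_A_1`. This shows the dimensionality trade-off the text describes: every valid class becomes its own column.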

Unbalanced data handling

Exploratory data analysis revealed categorical data imbalances across multiple key features at an early stage. This analysis therefore applied algorithm-based class stratification on the cognitive status and window size features during data wrangling and train-test splitting, so that the validation and test sets had the same proportional class distribution as the study population. The input datasets were split into train and test sets at a 75%:25% ratio, and the training data were further split into train and validation sets at a 90%:10% ratio during learning.
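A stratified 75%/25% train-test split followed by a 90%/10% train-validation split could look like the sketch below. It assumes stratification on a joint key that combines cognitive status and window size (number of time steps). The source does not name its splitting algorithm, and `stratified_split` is a hypothetical helper. Because each stratum is rounded separately, the realized fractions can differ slightly from the nominal ones.

```python
import numpy as np

def stratified_split(labels, test_frac, seed=0):
    """Return (train_idx, test_idx) preserving per-stratum proportions."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    train, test = [], []
    for stratum in np.unique(labels):
        idx = np.flatnonzero(labels == stratum)
        rng.shuffle(idx)
        n_test = int(round(test_frac * len(idx)))
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return np.sort(np.array(train)), np.sort(np.array(test))

# Hypothetical data: imbalanced status codes and two window sizes.
status = np.array([0] * 80 + [2] * 12 + [3] * 8)
window = np.array([15, 10] * 50)
strata = np.array([f"{s}_{w}" for s, w in zip(status, window)])

train_idx, test_idx = stratified_split(strata, 0.25)
# Second split on the training portion; positions index into train_idx.
inner, val = stratified_split(strata[train_idx], 0.10, seed=1)
val_idx = train_idx[val]
```

Stratifying on the joint key keeps rare combinations, such as dementia with a short visit history, represented in every split rather than only in training.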

Model selection strategy

Model selection was driven primarily by how well a model could handle high-dimensional, irregular and multi-modal data such as NACC’s, with flexibility in architecture design. A second driver was a balance between fitness for purpose, parsimonious design and computational efficiency. Combining neural networks with multi-modal fusion strategies is promising for classification tasks, but the optimal fusion strategy for many applications is still under development. One study in human activity recognition applied a CNN with fusion strategies to multi-modal data; its results showed a clear performance improvement from multi-modal fusion and a substantial advantage for early fusion30.

This analysis designed a multi-head neural network to ingest 3 distinct inputs of different sizes and modalities. Each head first learns independently; all features are then fused and learned jointly, capturing patterns and interactions at different depths before the final multi-class classification step. The intent is to mirror, as closely as possible, how brain neurons and AD biology work in reality under the influence of numerous associated clinical factors, social determinants of health and their interactions. Variants of early or late fusion strategies allow performance and efficiency comparisons to find the optimal number of neurons (nodes), filter size and depth (numbers of input, hidden and output layers) of a deep learning model.
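The multi-head, early-fusion design can be sketched with the Keras functional API. This is a minimal illustration, not the authors' model: the three head names, the 4-step visit window, the per-head feature counts, kernel sizes, the Adam optimizer and the loss are all hypothetical choices. Each head applies its own `Conv1D` layer to one modality; the head outputs are then concatenated along the feature axis (early fusion), so later layers can learn cross-modality interactions.

```python
from tensorflow import keras

layers = keras.layers

# Hypothetical shapes: 4 visit time steps and illustrative feature counts
# per modality (these do not reproduce the study's 447 features).
TIME_STEPS = 4
HEAD_FEATURES = {"clinical": 120, "phenotypic": 60, "mri": 40}
N_CLASSES = 4  # NC, impaired-not-MCI, MCI, dementia


def build_early_fusion_cnn(filters=64, units=128, dropout=0.3, lr=1e-3):
    inputs, branches = [], []
    for name, n_feat in HEAD_FEATURES.items():
        inp = keras.Input(shape=(TIME_STEPS, n_feat), name=name)
        # Each head first learns modality-specific filters on its own.
        x = layers.Conv1D(filters, kernel_size=2, padding="same",
                          activation="relu")(inp)
        inputs.append(inp)
        branches.append(x)

    # Early fusion: concatenate head outputs along the feature axis,
    # then learn cross-modality interactions in shared layers.
    x = layers.Concatenate(axis=-1)(branches)
    x = layers.Conv1D(filters * 2, kernel_size=2, padding="same",
                      activation="relu")(x)
    x = layers.GlobalAveragePooling1D()(x)
    x = layers.Dense(units, activation="relu")(x)
    x = layers.Dropout(dropout)(x)
    outputs = layers.Dense(N_CLASSES, activation="softmax")(x)

    model = keras.Model(inputs=inputs, outputs=outputs)
    model.compile(optimizer=keras.optimizers.Adam(learning_rate=lr),
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

A late-fusion variant would instead give each head its own pooling and dense layers and concatenate only just before the softmax layer. Comparing where the concatenation happens is one way to explore the fusion point and depth.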

Disease progression prediction modeling

To validate the superiority of the multi-head, multi-modal deep learning model over single-modality models, this analysis built 15 variants of CNN, LSTM network and tree-based XGBoost predictive models. CNNs and LSTM networks are both neural networks and belong to the field of deep learning. Both arrange neurons (nodes) in layers of varying size and depth (numbers of input, hidden and output layers), but they differ in function, structure and use cases. The 15 variants differ in architecture and in configuration: multi-head with early or late fusion, or single-modality31–35.

Clinician-judged cognitive status (NACCUDSD), as described in the NACC Uniform Data Set Researchers Data Dictionary Version 3, March 2015 edition36, was selected as the response variable. The participant’s cognitive status was determined at every visit (time step). Participants assessed as having NC were defined as NACCUDSD = 1; participants who were cognitively impaired but did not meet the criteria for MCI as NACCUDSD = 2; participants with either amnestic or non-amnestic MCI as NACCUDSD = 3; and participants with a diagnosis of dementia as NACCUDSD = 4 (ref. 36). In supervised learning, and in this analysis, the response variable is also called the “label” or “target”, and these terms are used interchangeably. The label has 4 classes (NC, impaired-not-MCI, MCI and dementia) with an extremely imbalanced distribution in the study population. The last-visit label was used to evaluate the model’s predictive accuracy.
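Mapping NACCUDSD codes to class labels and picking each participant's last-visit label can be sketched as follows. The record layout `(participant_id, visit_number, naccudsd)` and the function name are hypothetical; the zero-based label order follows the 0 to 3 recoding (NC, impaired-not-MCI, MCI, dementia) used for the clustering analysis.

```python
# NACCUDSD codes 1-4 mapped to zero-based ordinal labels.
NACCUDSD_TO_LABEL = {1: 0, 2: 1, 3: 2, 4: 3}
LABEL_NAMES = ["NC", "impaired-not-MCI", "MCI", "dementia"]


def last_visit_labels(visits):
    """visits: iterable of (participant_id, visit_number, naccudsd).

    Returns {participant_id: zero-based label at that participant's last
    visit}, regardless of the order in which visits are listed.
    """
    last = {}
    for pid, visit_num, code in visits:
        if pid not in last or visit_num > last[pid][0]:
            last[pid] = (visit_num, NACCUDSD_TO_LABEL[code])
    return {pid: label for pid, (_, label) in last.items()}
```

Given visits `("b", 1, 1)`, `("b", 3, 4)` and `("b", 2, 3)`, participant `b` is labeled 3 (dementia) because visit 3 is the latest.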

In addition to test accuracy, 5 other performance metrics were selected to evaluate the model’s predictive performance and generalizability. A 2-dimensional t-distributed stochastic neighbor embedding (t-SNE) plot was constructed using activation values extracted from the best model’s hidden layers. These activation values represent features learned automatically by the deep learning model, rather than the manually engineered features associated with conventional machine learning methods.
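The text does not list the 5 additional metrics; the abstract reports F1 score and root mean square error (RMSE) alongside accuracy. A minimal sketch of those three is shown below. The macro-averaging of F1 and the computation of RMSE on the zero-based ordinal labels (0 to 3) are assumptions, not confirmed details of the study.

```python
import math


def evaluate(y_true, y_pred, n_classes=4):
    """Accuracy, macro-averaged F1 and RMSE for integer class labels.

    RMSE treats labels as ordinal values, so predicting NC (0) for a
    dementia case (3) is penalized more than predicting MCI (2).
    """
    pairs = list(zip(y_true, y_pred))
    n = len(pairs)
    accuracy = sum(t == p for t, p in pairs) / n

    f1_scores = []
    for c in range(n_classes):
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        denom = 2 * tp + fp + fn
        if denom:  # skip classes absent from both y_true and y_pred
            f1_scores.append(2 * tp / denom)
    macro_f1 = sum(f1_scores) / len(f1_scores)

    rmse = math.sqrt(sum((t - p) ** 2 for t, p in pairs) / n)
    return {"accuracy": accuracy, "macro_f1": macro_f1, "rmse": rmse}
```

For `y_true = [0, 1, 2, 3]` and `y_pred = [0, 1, 2, 2]`, accuracy is 0.75, macro F1 is 2/3 and RMSE is 0.5.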

Model hyper-parameters tuning approach

This analysis applied hyper-parameter tuning to mitigate the overfitting and underfitting common in machine learning and deep learning tasks and to achieve optimal model convergence. The tuning approach combined 5-fold cross-validation with search methods designed for deep learning models built with the TensorFlow and Keras frameworks: Keras Random Search Cross Validation and Keras Bayesian Search Cross Validation. A wide range of hyper-parameters was searched:

  1. Filter size: powers of 2 from 8 to 512 (8, 16, 32, 64, 128, 256, 512), each element double the preceding one, for each of the 3 layers.

  2. Node size: integers between 32 and 256 for each of the 3 layers.

  3. Dropout rate: uniform distribution between 0.1 and 0.5.

  4. Learning rate: uniform distribution between 1e−4 and 0.1.

  5. Activation function: Exponential Linear Unit (ELU), Rectified Linear Unit (ReLU) and hyperbolic tangent (tanh).

  6. Epochs: integers between 25 and 200.

  7. Batch size: integers between 32 and 512.

Convergence results were compared, and the final set of best parameters was plugged back into the deep learning models to produce the final outcomes. Fig. S1 shows the learning-curve convergence after hyper-parameter tuning.
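The search space in the list can be written down directly, which also makes the 5-fold cross-validation loop concrete. The sketch below implements plain random search in standard-library Python. It is not the Keras tooling named above and does not show the Bayesian variant. `score_fn` is a hypothetical callback standing in for training and validating a model. It samples the learning rate uniformly, as the list states; a log-uniform draw is a common alternative for ranges spanning several orders of magnitude.

```python
import random

FILTER_CHOICES = [8 * 2 ** k for k in range(7)]  # 8, 16, ..., 512
ACTIVATIONS = ["elu", "relu", "tanh"]


def sample_config(rng):
    """Draw one random configuration from the stated search space."""
    return {
        "filters": [rng.choice(FILTER_CHOICES) for _ in range(3)],
        "nodes": [rng.randint(32, 256) for _ in range(3)],
        "dropout": rng.uniform(0.1, 0.5),
        "learning_rate": rng.uniform(1e-4, 0.1),
        "activation": rng.choice(ACTIVATIONS),
        "epochs": rng.randint(25, 200),
        "batch_size": rng.randint(32, 512),
    }


def k_fold_indices(n, k=5):
    """Yield (train_idx, val_idx) for k contiguous folds over range(n)."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, val
        start += size


def random_search(score_fn, n, n_trials=10, seed=0):
    """Return (best_mean_score, best_config) over 5-fold CV.

    score_fn(config, train_idx, val_idx) -> float is a hypothetical
    callback that trains a model and returns a validation score.
    """
    rng = random.Random(seed)
    best = (float("-inf"), None)
    for _ in range(n_trials):
        cfg = sample_config(rng)
        scores = [score_fn(cfg, tr, va) for tr, va in k_fold_indices(n)]
        mean_score = sum(scores) / len(scores)
        if mean_score > best[0]:
            best = (mean_score, cfg)
    return best
```

The folds here are contiguous, so the rows would be shuffled (and ideally stratified) beforehand. A Bayesian search keeps the same cross-validation loop but replaces `sample_config` with proposals guided by the scores seen so far.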

Converters and non-converters clustering

This analysis calculated, for each cluster, the mean and standard deviation of participants’ disease progression rate by averaging the sum of the accumulated differences in cognitive rating between successive visits within the cluster. Each participant received a cognitive status rating from a clinician at each annual visit. For this use case, the class ratings were re-coded as ordinal values from 0 to 3 (0, NC; 1, impaired-not-MCI; 2, MCI; 3, dementia), consistent with Python’s zero-based indexing. Participants were grouped into conversion clusters based on their rate of disease progression: slow, moderate and rapid converters, and non-converters (Table S5 and Fig. S2). The non-converter cluster maintained NC status throughout follow-up. Slow converters were defined as participants whose disease progressed to the transitional MCI stage but not further. Moderate converters were defined as participants whose disease progressed to the transitional MCI stage and then to dementia. Rapid converters were defined as participants whose disease either progressed rapidly to the transitional MCI stage or skipped it and progressed directly to dementia.
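The cluster rules and the progression-rate statistic can be sketched as follows. This is one possible interpretation, not the authors' code. The progression rate is taken as the mean visit-to-visit change in the 0 to 3 rating, and the cluster mean and (population) standard deviation are computed over participants. The text gives no rate cut-off that separates a "rapid" MCI converter from a "moderate" one, so only the skip-MCI route to rapid conversion is implemented. Participants reaching only impaired-not-MCI are left unassigned because the text does not define them.

```python
import statistics

NC, IMPAIRED_NOT_MCI, MCI, DEMENTIA = 0, 1, 2, 3


def progression_rate(ratings):
    """Mean visit-to-visit change in the 0-3 cognitive rating."""
    diffs = [b - a for a, b in zip(ratings, ratings[1:])]
    return sum(diffs) / len(diffs) if diffs else 0.0


def assign_cluster(ratings):
    """Rule-based cluster label from one participant's visit ratings."""
    if all(r == NC for r in ratings):
        return "non-converter"
    if DEMENTIA in ratings:
        first_dementia = ratings.index(DEMENTIA)
        if MCI in ratings[:first_dementia]:
            return "moderate"  # MCI first, then dementia
        return "rapid"  # reached dementia without a recorded MCI visit
    if MCI in ratings:
        return "slow"  # reached MCI but not dementia
    return "unassigned"  # e.g. impaired-not-MCI only; not defined in text


def cluster_stats(participants):
    """participants: {id: [ratings]} -> {cluster: (mean_rate, sd_rate)}."""
    rates = {}
    for ratings in participants.values():
        cluster = assign_cluster(ratings)
        rates.setdefault(cluster, []).append(progression_rate(ratings))
    return {c: (statistics.mean(r), statistics.pstdev(r))
            for c, r in rates.items()}
```

For ratings `[0, 2, 3]` the participant is a moderate converter with a progression rate of (2 + 1) / 2 = 1.5 rating points per visit interval.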

Feature extraction and importance ranking

SHapley Additive exPlanations (SHAP) is a technique frequently used in machine learning and deep learning to quantify the contribution of each predictor to a model’s prediction for a given input. This analysis applied deep learning-oriented SHAP methods37,38 to extract 2 sets of predictors: the top 100 predictors contributing to prediction of all classes, and the top 100 predictors influencing prediction of MCI and dementia converters alone. Each set of ranked predictors was split into 2 groups (1st to 50th and 51st to 100th) to investigate whether predictors in the 2 sets differed according to class prediction. Subsequent pairwise correlation plotting and clustering analyses were used to validate our second hypothesis (i.e., that accurate identification of top risk factors and distinct depiction of disease trajectories are necessary to reflect the heterogeneity of the AD population) (Tables S3, S4).
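Once per-sample SHAP values are available (for a deep model, for example from the shap library's `DeepExplainer`), a global ranking and the 1st to 50th / 51st to 100th split can be sketched in plain Python. Ranking by mean absolute SHAP value is a common convention and is assumed here; the text does not state the exact aggregation. For a multi-class model, the SHAP values are assumed to have already been selected for the class or classes of interest (all classes, or MCI and dementia only).

```python
def rank_features(shap_values, feature_names, top_k=100):
    """Rank features by mean absolute SHAP value across samples.

    shap_values: list of rows, one per sample, each holding one SHAP
    value per feature. Returns (first_half, second_half) of the top_k
    feature names, i.e. the 1st-50th and 51st-100th groups when
    top_k is 100.
    """
    n_samples = len(shap_values)
    importance = [
        sum(abs(row[j]) for row in shap_values) / n_samples
        for j in range(len(feature_names))
    ]
    order = sorted(range(len(feature_names)),
                   key=lambda j: importance[j], reverse=True)
    top = [feature_names[j] for j in order[:top_k]]
    half = len(top) // 2
    return top[:half], top[half:]
```

Taking absolute values before averaging keeps predictors that push some predictions up and others down from cancelling out, which is why the ranking reflects overall influence rather than direction.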

Electronic Supplementary Material

Below is the link to the electronic supplementary material.

Supplementary Material 1 (544.4KB, pdf)

Acknowledgements

The authors thank Maxime Usdin, Research and Development Informatics at Genentech, Inc, Derrek Hibar of the Product Development at Genentech, Inc, and Suyash Mishra of Global Project Strategy at Roche Ltd, Basel, Switzerland, for their valuable feedback. The NACC database is funded by National Institute on Aging and National Institutes of Health Grant U24 AG072122. NACC data are contributed by the National Institute on Aging–funded ADRCs: P30 AG062429 (PI James Brewer, MD, PhD), P30 AG066468 (PI Oscar Lopez, MD), P30 AG062421 (PI Bradley Hyman, MD, PhD), P30 AG066509 (PI Thomas Grabowski, MD), P30 AG066514 (PI Mary Sano, PhD), P30 AG066530 (PI Helena Chui, MD), P30 AG066507 (PI Marilyn Albert, PhD), P30 AG066444 (PI John Morris, MD), P30 AG066518 (PI Jeffrey Kaye, MD), P30 AG066512 (PI Thomas Wisniewski, MD), P30 AG066462 (PI Scott Small, MD), P30 AG072979 (PI David Wolk, MD), P30 AG072972 (PI Charles DeCarli, MD), P30 AG072976 (PI Andrew Saykin, PsyD), P30 AG072975 (PI David Bennett, MD), P30 AG072978 (PI Neil Kowall, MD), P30 AG072977 (PI Robert Vassar, PhD), P30 AG066519 (PI Frank LaFerla, PhD), P30 AG062677 (PI Ronald Petersen, MD, PhD), P30 AG079280 (PI Eric Reiman, MD), P30 AG062422 (PI Gil Rabinovici, MD), P30 AG066511 (PI Allan Levey, MD, PhD), P30 AG072946 (PI Linda Van Eldik, PhD), P30 AG062715 (PI Sanjay Asthana, MD, FRCP), P30 AG072973 (PI Russell Swerdlow, MD), P30 AG066506 (PI Todd Golde, MD, PhD), P30 AG066508 (PI Stephen Strittmatter, MD, PhD), P30 AG066515 (PI Victor Henderson, MD, MS), P30 AG072947 (PI Suzanne Craft, PhD), P30 AG072931 (PI Henry Paulson, MD, PhD), P30 AG066546 (PI Sudha Seshadri, MD), P20 AG068024 (PI Erik Roberson, MD, PhD), P20 AG068053 (PI Justin Miller, PhD), P20 AG068077 (PI Gary Rosenberg, MD), P20 AG068082 (PI Angela Jefferson, PhD), P30 AG072958 (PI Heather Whitson, MD), and P30 AG072959 (PI James Leverenz, MD).

Author contributions

CYC was responsible for study design, deep learning analytics and manuscript authoring. DS and NW provided scientific expertise. All authors participated in manuscript reviews and approved the final submission to Scientific Reports.

Data availability

The data analyzed in this study were obtained from the NACC. The following licenses/restrictions apply: researchers at recognized research organizations can request data after signing a Data Use Agreement. Requests to access these datasets should be directed to NACC: https://naccdata.org/requesting-data/data-request-process.

Declarations

Competing interests

CYC, DS, and NW are employees of Genentech, Inc. and shareholders of F. Hoffmann-La Roche Ltd.

Footnotes

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

  • 1. Grueso, S. & Viejo-Sobera, R. Machine learning methods for predicting progression from mild cognitive impairment to Alzheimer’s disease dementia: a systematic review. Alzheimers Res. Ther. 13, 162 (2021).
  • 2. Reardon, S. Alzheimer’s drug trials plagued by lack of racial diversity. Nature 620, 256–257 (2023).
  • 3. Clark, L. T. et al. Increasing diversity in clinical trials: overcoming critical barriers. Curr. Probl. Cardiol. 44, 148–172 (2019).
  • 4. Schmotzer, G. L. Barriers and facilitators to participation of minorities in clinical trials. Ethn. Dis. 22, 226–230 (2012).
  • 5. Guo, A., Smith, S., Khan, Y. M., Langabeer, J. R. II & Foraker, R. E. Application of a time-series deep learning model to predict cardiac dysrhythmias in electronic health records. PloS One 16, e0239007. 10.1371/journal.pone.0239007 (2021).
  • 6. Wang, Y., Sun, L. & Peng, D. A multihead ConvLSTM for time series classification in eHealth industry 4.0. Wirel. e8773900. 10.1155/2022/8773900 (2022).
  • 7. Mouchet, J. et al. Classification, prediction, and concordance of cognitive and functional progression in patients with mild cognitive impairment in the united states: a latent class analysis. J. Alzheimers Dis. 82, 1667–1682 (2021).
  • 8. Joshi, P. S. et al. Temporal association of neuropsychological test performance using unsupervised learning reveals a distinct signature of Alzheimer’s disease status. Alzheimers Dement. 5, 964–973 (2019).
  • 9. Cohen, S., Cummings, J., Knox, S., Potashman, M. & Harrison, J. Clinical trial endpoints and their clinical meaningfulness in early stages of Alzheimer’s disease. J. Prev. Alzheimers Dis. 9, 507–522 (2022).
  • 10. Jarrett, D. et al. Clairvoyance: a pipeline toolkit for medical time series. Presented at: International Conference on Learning Representations. May 3–7, 2021; Virtual. (2021).
  • 11. Lee, G., Nho, K., Kang, B., Sohn, K-A. & Kim, D. Predicting Alzheimer’s disease progression using multi-modal deep learning approach. Sci. Rep. 9, 1952. 10.1038/s41598-018-37769-z (2019).
  • 12. Menary, K. et al. Associations between cortical thickness and general intelligence in children, adolescents and young adults. Intelligence 41, 597–606 (2013).
  • 13. Zarei, M. et al. Cortical thinning is associated with disease stages and dementia in Parkinson’s disease. J. Neurol. Neurosurg. Psychiatry 84, 875–882 (2013).
  • 14. Demirci, N. & Holland, M. A. Cortical thickness systematically varies with curvature and depth in healthy human brains. Hum. Brain Mapp. 43, 2064–2084 (2022).
  • 15. Thambisetty, M. et al. Longitudinal changes in cortical thickness associated with normal aging. Neuroimage 52, 1215–1223 (2010).
  • 16. Namkung, H., Kim, S-H. & Sawa, A. The insula: an underestimated brain area in clinical neuroscience, psychiatry, and neurology. Trends Neurosci. 40, 200–207 (2017).
  • 17. Churchwell, J. C. & Yurgelun-Todd, D. A. Age-related changes in insula cortical thickness and impulsivity: significance for emotional development and decision-making. Dev. Cogn. Neurosci. 6, 80–86 (2013).
  • 18. O’Bryant, S. E. et al. Staging dementia using clinical dementia rating scale sum of boxes scores. Arch. Neurol. 65, 1091–1095 (2008).
  • 19. Greenberg, S. A. The geriatric depression scale (GDS). Hartford Institute for Geriatric Nursing (2023). https://hign.org/consultgeri/try-this-series/geriatric-depression-scale-gds Accessed September 25, 2023.
  • 20. Gluhm, S. et al. Cognitive performance on the mini-mental state examination and the montreal cognitive assessment across the healthy adult lifespan. Cogn. Behav. Neurol. 26, 1–5 (2013).
  • 21. Stasenko, A., Jacobs, D. M., Salmon, D. P. & Gollan, T. H. The multilingual naming test (MINT) as a measure of picture naming ability in alzheimer’s disease. J. Int. Neuropsychol. Soc. 25, 821–833 (2019).
  • 22. Ashendorf, L. et al. Trail making test errors in normal aging, mild cognitive impairment, and dementia. Arch. Clin. Neuropsychol. 23, 129–137 (2008).
  • 23. Jiskoot, L. C. et al. The Benson Complex Figure Test detects deficits in visuoconstruction and visual memory in symptomatic familial frontotemporal dementia: a GENFI study. J. Neurol. Sci. 446, 120590. 10.1016/j.jns.2023.120590 (2023).
  • 24. Vidoni, E. D., Townley, R. A., Honea, R. A. & Burns, J. M. Alzheimer disease biomarkers are associated with body mass index. Neurology 77, 1913–1920 (2011).
  • 25. Babulal, G. M. et al. Perspectives on ethnic and racial disparities in Alzheimer’s disease and related dementias: update and areas of immediate need. Alzheimers Dement. 15, 292–312 (2019).
  • 26. Barnes, L. L. Alzheimer disease in African American individuals: increased incidence or not enough data? Nat. Rev. Neurol. 18, 56–62 (2022).
  • 27. Turney, I. C. et al. Brain aging among racially and ethnically diverse middle-aged and older adults. JAMA Neurol. 80, 73–81 (2023).
  • 28. Cascarano, A. et al. Machine and deep learning for longitudinal biomedical data: a review of methods and applications. Artif. Intell. Rev. 56, 1711–1771 (2023).
  • 29. About NACC data. National Alzheimer’s Coordinating Center (2024). https://naccdata.org/requesting-data/nacc-data. Accessed September 25, 2023.
  • 30. Gadzicki, K., Khamsehashari, R. & Zetzsche, C. Presented at: Institute of Electrical and Electronics Engineers 23rd International Conference on Information Fusion. 10.23919/FUSION45008.2020.9190246. July 6–9, 2020; Rustenburg, South Africa.
  • 31. Sun, C., Hong, H., Song, S. & Li, M. A review of deep learning methods for irregularly sampled medical time series data. arXiv. 10.48550/arXiv.2010.12493 (2020).
  • 32. Baytas, I. M. et al. Patient subtyping via time-aware LSTM networks. Presented at: Association for Computing Machinery Special Interest Group on Knowledge Discovery in Data 23rd International Conference on Knowledge Discovery and Data Mining; 10.1145/3097983.3097997. August 13–17, 2017; Halifax, Canada.
  • 33. Auffarth, B. Machine learning for time-series with python (Packt Publishing, 2021).
  • 34. Géron, A. Hands-on machine learning with Scikit-Learn, Keras, and TensorFlow 2nd edn (O’Reilly Media, Inc, 2019).
  • 35. Cao, L. Beyond i.i.d.: non-IID thinking, informatics, and learning. IEEE Intell. Syst. 37, 5–17 (2022).
  • 36. Uniform data set (UDS). v3. National Alzheimer’s Coordinating Center (2015). https://naccdata.org/data-collection/forms-documentation/uds-3 Accessed September 25, 2023.
  • 37. Welcome to the SHAP documentation. SHapley Additive exPlanations (2018). https://shap.readthedocs.io/en/latest/index.html# Accessed September 25, 2023.
  • 38. Lundberg, S. & Lee, S-I. A unified approach to interpreting model predictions. arXiv. 10.48550/arXiv.1705.07874 (2017).
