Author manuscript; available in PMC: 2026 Apr 14.
Published in final edited form as: J Alzheimers Dis. 2024;97(1):459–469. doi: 10.3233/JAD-230893

Predicting four-year Alzheimer's disease onset from longitudinal neurocognitive tests and MRI data using explainable deep convolutional neural networks

Rohan Bapat 1, Da Ma 2, Tim Q Duong 1,*
PMCID: PMC13071850  NIHMSID: NIHMS2153085  PMID: 38143361

Abstract

Background:

Prognosis of future risk of dementia from neuroimaging and cognitive data is important for optimizing clinical management of patients at an early stage of Alzheimer's disease. However, existing studies lack an efficient way to integrate longitudinal information from both modalities to improve prognosis performance.

Objective:

In this study, we aim to develop and evaluate an explainable deep learning-based framework to predict mild cognitive impairment (MCI) to AD conversion within four years using longitudinal whole-brain 3D MRI and neurocognitive tests.

Methods:

We proposed a two-stage framework that first uses a 3D convolutional neural network to extract single-timepoint MRI-based AD-related latent features, followed by multi-modal longitudinal feature concatenation and a 1D convolutional neural network to predict the risk of future dementia onset in four years.

Results:

The proposed deep learning framework showed promise in predicting MCI to AD conversion within four years using longitudinal whole-brain 3D MRI and cognitive data without extracting regional brain volumes or cortical thicknesses, reaching a balanced accuracy of 0.834, a significant improvement over models trained on a single timepoint or a single modality. Post hoc model explainability produced heatmaps indicating the regions that are important for predicting future risk of AD.

Conclusions:

The proposed framework sets the stage for future studies using multi-modal longitudinal data to achieve optimal prediction of AD onset, leading to better management of the disease and thereby improving quality of life.

Keywords: dementia, mild cognitive impairment, MRI, artificial intelligence, deep learning, amyloid, tau

Introduction

Alzheimer’s disease (AD), a neurodegenerative disease that affects cognitive function, including the ability to think, learn, and remember [1], is typically diagnosed through a combination of clinical evaluation, medical history review, and various diagnostic tests, although a definitive diagnosis of AD can only be made post-mortem. Neuroimaging methods, such as magnetic resonance imaging (MRI) to measure brain volume and positron emission tomography (PET), play important roles in the diagnosis and monitoring of AD. Other tests, such as blood or genetic tests, may also be used to help exclude other potential causes of cognitive decline. Mild cognitive impairment (MCI), characterized by mild but measurable declines in cognitive function, is a precursor to AD, but not all individuals with MCI will develop AD [2, 3]. While there is currently no effective treatment for AD, early diagnosis and prognosis may enable or encourage lifestyle changes and neurocognitive enrichment to improve symptoms or slow the rate of neurocognitive decline, thereby improving quality of life [4].

Deep learning algorithms are increasingly being used in medicine due to their ability to analyze spatiotemporal relationships in large datasets and extract meaningful information that can be used for disease classification [5, 6]. One of the most widely used deep learning algorithms for image classification is the convolutional neural network (CNN) [7–9]. In contrast to conventional feature-engineered machine learning approaches, which use extracted regional brain volumes or thicknesses, CNNs use whole-brain MRI data without the need to extract regional brain volumes or thicknesses a priori [10–12]. Many CNN studies have classified cognitively normal (CN), MCI, and AD subjects based on MRI images.

In addition to classification, deep learning models can also be used to predict outcomes. CNNs and recurrent neural networks (RNNs) have been applied to predicting future onset of dementia, including MCI to AD conversion from 2D MRI slices or FDG-PET scans [10–17]. In principle, prediction performs better when multiple timepoints of patient data are used [18–20], and transfer learning, which applies knowledge gained from solving one task to a related task, can be used to advantage on longitudinal data [21–25]. Most studies to date have used 2D images, in which multiple slices are used to augment sample size and reduce computation cost. Models using 3D images are more appropriate but are limited by small sample sizes and high computation cost [26, 27]. In addition, neurocognitive tests and other non-imaging clinical data can be incorporated into prediction models to improve accuracy [19, 28, 29]. Finally, only a few studies provide heatmaps to assist interpretability, identify neural correlates of outcomes (brain regions most relevant for prediction), and provide confidence in deep-learning results. To our knowledge, no studies to date have applied deep learning to longitudinal whole-brain 3D MRI together with neurocognitive data to predict MCI to AD conversion.

The goal of this study was thus to develop and evaluate a novel deep-learning algorithm to predict MCI to AD conversion four years post MCI diagnosis using longitudinal whole-brain 3D MRI and neurocognitive data. Whole-brain 3D MRI volumes were used to achieve improved prognosis performance without a priori segmentation of either regional structural volumes or cortical thicknesses.

Methods

Source of data

Data for this study were obtained from the publicly available Alzheimer’s Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu). The aggregated dataset was derived from subjects recruited in ADNI1, ADNIGO, ADNI2, and ADNI3.

Participants and cohort definition

This study comprised two separate training steps, each of which used a different cohort: i) the first step used single-timepoint cross-sectional data for a classification task (classification step), using all patients diagnosed as either CN or with dementia at their initial visit, and ii) the second step used longitudinal data for a prediction task (prediction step), using all MCI patients. For the prediction step, only patients with three visits, each six months apart, were considered. If a patient was diagnosed with MCI at the first visit and remained MCI four years after the first visit, they were classified as stable MCI (sMCI). Patients who were diagnosed with MCI at the first visit but with dementia four years later were classified as progressive MCI (pMCI). With these criteria, 797 CN patients and 582 patients with dementia were selected for the classification cohort. To prevent data imbalance and bias during training, we randomly selected 500 patients from each group to ensure a balanced dataset. For the prediction cohort, 129 sMCI and 86 pMCI patients were selected to form a dataset of 215 patients. Table 1 summarizes these two cohorts along with their demographic information.

Table 1.

Distribution of patient age and gender within each cohort.

Cohort  Age (μ ± σ)   N    Male  Female
AD      74.23 ± 5.9   500  224   256
CN      74.70 ± 7.5   500  280   220
sMCI    72.54 ± 7.5   129  90    39
pMCI    73.72 ± 6.41  86   54    32

Outcomes

The primary outcome was conversion of MCI to clinical diagnosis of AD within 4 years from the time of the current visit.

Input variables

For both cohorts, structural T1-weighted 3D MRI scans were used as inputs to 3D convolutional neural networks (CNNs). In the prediction cohort, we randomly selected one MRI scan from each of the three visits, and for the classification cohort we selected a single MRI scan from the first visit. To reduce variability in the dataset, all MRI scans were reoriented and then linearly registered to the standard 1 mm MNI152 image template using FSL FLIRT (cite). MRI scans were then skull-stripped using FSL BET (cite) to remove extraneous non-brain tissue. Raw volume intensities were rescaled into the range [0, 1] through min-max normalization.
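The intensity rescaling step can be sketched as follows (a minimal NumPy sketch; the function name and the toy volume are illustrative, not from the paper):

```python
import numpy as np

def minmax_normalize(volume):
    """Rescale raw MRI voxel intensities into the range [0, 1].

    Assumes the volume has already been registered to MNI152 space
    and skull-stripped, as described above.
    """
    v = volume.astype(np.float32)
    return (v - v.min()) / (v.max() - v.min())

# Toy 3D volume standing in for a T1-weighted scan
vol = np.random.default_rng(0).uniform(200, 1800, size=(8, 8, 8))
norm = minmax_normalize(vol)
```

Because the minimum and maximum are computed per volume, each scan fills the full [0, 1] range regardless of the scanner's raw intensity scale.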

In addition to the MRI scans, we also extracted the following clinical and cognitive tests for each patient: 1) Mini-Mental State Examination (MMSE), 2) Clinical Dementia Rating – Sum of Boxes (CDRSB), 3) Modified Preclinical Alzheimer Cognitive Composite with Trails test (mPACCtrailsB), 4) Modified Preclinical Alzheimer Cognitive Composite with Digit test (mPACCdigit), 5) Logical Memory Test – Delayed Total Recall (LDELTOTAL), 6) Alzheimer’s Disease Assessment Scale-Cognitive Subscale 11 (ADAS11), 7) Alzheimer’s Disease Assessment Scale-Cognitive Subscale 13 (ADAS13), 8) Rey’s Auditory Verbal Learning Test – forgetting (RAVLT_forgetting), 9) Rey’s Auditory Verbal Learning Test – immediate (RAVLT_immediate), 10) Rey’s Auditory Verbal Learning Test – learning (RAVLT_learning), 11) Functional Assessment Questionnaire (FAQ), 12) Trail Making Test-B (TRABSCOR), 13) ADAS-Cog Delayed Word Recall (Q4) and 14) APOE4. To minimize the effect of outliers during training, each variable was standardized to have a mean of 0 and standard deviation of 1 derived from the training set. We denoted these non-imaging variables as NI.
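The z-scoring of the 14 NI variables using training-set statistics can be sketched as (NumPy sketch; the random stand-in data and function names are illustrative):

```python
import numpy as np

def fit_standardizer(train_ni):
    """Compute per-variable mean and std on the training set only."""
    return train_ni.mean(axis=0), train_ni.std(axis=0)

def standardize(ni, mu, sigma):
    """Apply training-set statistics, so test data cannot leak into the scaling."""
    return (ni - mu) / sigma

rng = np.random.default_rng(1)
train_ni = rng.normal(20, 5, size=(100, 14))  # 100 patients x 14 NI variables
test_ni = rng.normal(20, 5, size=(30, 14))

mu, sigma = fit_standardizer(train_ni)
train_z = standardize(train_ni, mu, sigma)
test_z = standardize(test_ni, mu, sigma)  # reuses training-set mu/sigma
```

Reusing the training-set mean and standard deviation on held-out data is what keeps the scaling step free of information leakage.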

Model Development and Specification

Single-Timepoint Alzheimer Disease Classification

A modified DenseNet-BC model taking 3D MRI scans as input was used to classify CN or dementia (cite). The densely connected convolutional blocks labeled “Dense Block” consist of a series of convolution operations stacked on top of each other. An overview of this architecture is shown in Figure 1. The DenseNet used in this study had four densely connected convolutional blocks of lengths 6, 12, 32, and 24, with the usual pooling transition layer between the dense blocks. We set the compression factor between dense blocks to 0.5, the growth factor to 32, and removed any layer-wise dropout. A single dropout layer with a drop probability of 0.5 was placed before the final linear layer.

Figure 1.

Figure 1.

DenseNet architecture. (A) The DenseNet used in this study consists of 4 dense blocks (Figure 1B) connected by convolution and pooling operations. The final prediction is done by pooling the output of the last DenseBlock and feeding that through a linear layer. (B) The DenseBlocks used in the DenseNet architecture. It consists of several sequential convolutions, where the input to one convolution is the concatenation of all the previous outputs. The “+” operation in this figure denotes concatenation.

Multi-Timepoint Longitudinal Disease Progression Prediction

We trained a smaller model that uses the DenseNet CNN described above to extract salient information from each MRI scan for prediction. The smaller model predicts whether a patient will convert from MCI to dementia based on a feature embedding derived from the pretrained model and concatenated from multiple timepoints of data. To create the feature embedding, the MRI scan from each timepoint was fed into the pretrained DenseNet model. We then discarded the final classification layer and extracted only the latent feature maps from the last convolutional layer after global average pooling, right before the classification layer: a one-dimensional tensor of 288 values. We also concatenated the neurocognitive scores of each timepoint as additional clinical variables to the respective tensor and then stacked all three timepoints' tensors together. The shape of the final feature embedding was (3, 302), since 14 clinical variables were concatenated for each of the three timepoints.
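The construction of the final (3, 302) feature embedding can be sketched as follows (NumPy sketch; the random arrays stand in for the real DenseNet latent vectors and standardized NI scores):

```python
import numpy as np

def build_embedding(latent_per_tp, ni_per_tp):
    """Concatenate the 288-value DenseNet latent vector with the 14
    standardized NI variables at each timepoint, then stack the
    three timepoints into a (3, 302) embedding."""
    rows = [np.concatenate([z, c]) for z, c in zip(latent_per_tp, ni_per_tp)]
    return np.stack(rows)

rng = np.random.default_rng(2)
latents = [rng.normal(size=288) for _ in range(3)]  # per-visit DenseNet features
ni = [rng.normal(size=14) for _ in range(3)]        # per-visit standardized NI scores
embedding = build_embedding(latents, ni)            # shape (3, 302)
```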

We then constructed a 1D CNN (Figure 2) that takes the above-mentioned concatenated multi-timepoint feature embedding as input and computes whether the given patient will convert from MCI to dementia within four years. This 1D CNN consisted of two 1D convolutions followed by 1D max pooling, and finally a linear layer to predict class logits.
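A minimal PyTorch sketch of such a 1D CNN follows. The text specifies two 1D convolutions, 1D max pooling, and a final linear layer with the three timepoints as input channels; the channel widths and kernel sizes below are assumptions, not values from the paper:

```python
import torch
import torch.nn as nn

class Progression1DCNN(nn.Module):
    """Two 1D convolutions + max pooling + linear head over the (3, 302)
    multi-timepoint embedding. Layer widths/kernels are illustrative."""

    def __init__(self, n_timepoints=3, feat_len=302):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(n_timepoints, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv1d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool1d(kernel_size=2),
        )
        self.classifier = nn.Linear(32 * (feat_len // 2), 2)

    def forward(self, x):                 # x: (batch, 3, 302)
        h = self.features(x)              # -> (batch, 32, 151)
        return self.classifier(h.flatten(1))  # class logits: (batch, 2)

model = Progression1DCNN()
logits = model(torch.randn(4, 3, 302))    # batch of 4 patients
```

Treating the timepoints as channels lets each convolution mix information across all three visits at every feature position.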

Figure 2.

Figure 2.

High-level overview of how the prediction model works. MRI scans for each timepoint are run through a pretrained CNN, and the output is combined with clinical variables for each timepoint. All three timepoints are then stacked on top of each other and then run through a smaller CNN to predict if a patient will convert from MCI to Dementia.

Training Pipeline

For classification, data were split into 80% training and 20% testing sets. The model was optimized with a stochastic gradient descent (SGD) optimizer with a batch size of 5, learning rate of 0.01, weight decay of 0.001, and Nesterov momentum factor of 0.9. The model was implemented using the PyTorch Python library, and all training was done on Google Cloud Platform virtual instances with Tesla T4 GPU acceleration.
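The optimizer configuration above corresponds to the following PyTorch sketch (the `nn.Linear` stand-in replaces the actual 3D DenseNet for brevity; the loss choice is an assumption):

```python
import torch
from torch import nn, optim

model = nn.Linear(302, 2)  # stand-in for the 3D DenseNet, illustration only

# SGD with the hyperparameters reported in the text
optimizer = optim.SGD(
    model.parameters(),
    lr=0.01,            # learning rate
    weight_decay=0.001,
    momentum=0.9,
    nesterov=True,      # Nesterov momentum
)
criterion = nn.CrossEntropyLoss()  # assumed two-class training objective
batch_size = 5
```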

For prediction, a smaller model was trained to predict whether a patient will convert from MCI to dementia within four years using three timepoints of brain MRI scans and cognitive tests. After training the classification model, its weights were frozen and used as the feature extractor for this smaller model. To prevent data leakage between cohorts, the prediction cohort consisted only of patients who were diagnosed with MCI at their first visit. Nested cross-validation was used to train and optimize the model while ensuring it is robust enough to perform well on new data. Figure 3 shows how the cross-validation folds were acquired. The prediction cohort was split into five outer folds. In each iteration, four of these folds were combined to form the training set, and the fifth fold was used as an independent testing set. Each training set was split into three inner folds: two were used to train the model, and the third was used to evaluate model performance and tune hyperparameters. After optimal hyperparameters were selected in the inner folds, the model was retrained on the entire training set and evaluated on the independent testing set. To prevent data imbalances, all training data splits were stratified, so each subset preserved the same ratio of converters and non-converters. To comprehensively evaluate model performance, the balanced accuracies, sensitivities, and specificities of the trained model on each validation and test set were recorded. Because the feature embeddings were much smaller than the MRI scans, the batch size was increased to 8 and the learning rate was decreased to 0.0001; the other optimization parameters remained the same.
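The nested cross-validation scheme can be sketched with scikit-learn's `StratifiedKFold` (a sketch; the random seeds and the zero-filled placeholder arrays are illustrative):

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# 129 sMCI (label 0) and 86 pMCI (label 1), as in the prediction cohort
y = np.array([0] * 129 + [1] * 86)
X = np.zeros((len(y), 3 * 302))  # flattened placeholder feature embeddings

outer = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
inner = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)

outer_test_sizes = []
for train_idx, test_idx in outer.split(X, y):
    # Inner loop: 2 folds to train, 1 fold to tune hyperparameters
    for fit_idx, val_idx in inner.split(X[train_idx], y[train_idx]):
        pass  # train candidate models here
    # Retrain on the full training set with the chosen hyperparameters,
    # then evaluate once on the held-out outer test fold
    outer_test_sizes.append(len(test_idx))
```

Stratification keeps the sMCI/pMCI ratio roughly constant in every fold, so no split is dominated by non-converters.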

Figure 3.

Figure 3.

Training, testing, and validation splits acquired through nested cross-validation. The entire dataset is split into five folds; four of them are used as a training set and the fifth is used as an independent testing set to evaluate model performance. The training set is split into three inner folds, and two of them are used to train the model while the last set is used for hyperparameter tuning.

Note that phase-one training used only the CN and AD subjects, while phase-two training used only the MCI patients, so there was no data leakage from the phase-one validation data into phase-two training. To evaluate the effect of combining multiple data modalities, ablation studies were designed: one alternative model was trained using only MRI images, and another using only NI variables. To evaluate the effect of including multiple timepoints of data, all experiments were run using a single timepoint of patient data, and then again with three timepoints of patient data.

Saliency-map-based model explainability

After training, the best-performing model was re-evaluated on the testing set, and heatmaps were generated showing where the model focused most when making predictions. To visualize the most salient brain regions, the Grad-CAM technique was modified to work in three dimensions. Since the model reduced the resolution of the MRI volume to 2×3×2 voxels, the 3D Grad-CAM technique was applied to the output of the first convolutional block, which had an output resolution of 32×32×32, to obtain a better visualization. Finally, the heatmaps were averaged over all three timepoints. This approach enabled visual highlighting of the image regions that were most significant to the network.
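The core of the 3D Grad-CAM computation reduces to a gradient-weighted channel sum (NumPy sketch; `activations` and `gradients` stand in for the tensors that would be captured by forward/backward hooks on the first convolutional block, and the shapes are illustrative):

```python
import numpy as np

def grad_cam_3d(activations, gradients):
    """Compute a 3D Grad-CAM heatmap.

    activations: (C, D, H, W) feature maps from the chosen conv block.
    gradients:   (C, D, H, W) gradients of the class score w.r.t. those maps.
    """
    weights = gradients.mean(axis=(1, 2, 3))          # global-average-pool the gradients -> (C,)
    cam = np.tensordot(weights, activations, axes=1)  # channel-weighted sum -> (D, H, W)
    cam = np.maximum(cam, 0)                          # ReLU keeps positively contributing voxels
    if cam.max() > 0:
        cam = cam / cam.max()                         # normalize to [0, 1] for overlay
    return cam

rng = np.random.default_rng(3)
acts = rng.normal(size=(16, 32, 32, 32))   # e.g. a 32x32x32 conv block output
grads = rng.normal(size=(16, 32, 32, 32))
heatmap = grad_cam_3d(acts, grads)
```

The resulting volume can then be upsampled to the MRI resolution and overlaid on the scan, and, as described above, averaged over the three timepoints.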

Results

Model performance

Figure 4 shows the balanced accuracy of the classification model for each epoch as it trained. The CNN achieved an optimal accuracy of 0.88 around epoch 100 and had a loss of 0.38. After this point, the validation loss and accuracy began to level off and gains in performance decreased with increasing epochs.

Figure 4.

Figure 4.

Training accuracy and loss curves for the classification model. Data were split 80/20 into training and testing sets and used to train the DenseNet model to classify 3D MRI scans as CN or dementia. The model overfits as it approaches epoch 100 and achieves an optimal testing accuracy of 88%.

Figure 5 shows the average balanced accuracy of the prediction model for each epoch as it trained using three timepoints of input data. This model achieved its optimal average accuracy of 0.83 and optimal loss of 0.41 around epoch 500, at which point model performance plateaued. Multiple experiments were run to compare how model performance changes with input modality; the validation and testing performance from each of these experiments for one timepoint and three timepoints is recorded in Table 2. For single-timepoint experiments, the NI+MRI model outperformed both single-modality models with a testing balanced accuracy of 0.8075. Similarly, the three-timepoint NI+MRI model outperformed both single-modality models with a testing balanced accuracy of 0.8347. In all experiments, the three-timepoint models outperformed the single-timepoint models, and the NI+MRI models outperformed the individual NI and MRI models. This accuracy is an average of the optimal accuracy in each testing fold, rather than the average accuracy at a single epoch. In the experiments that achieved a balanced accuracy of 0.8347, the optimal accuracy in each testing fold occurred at epochs 127, 148, 429, 456, and 510.

Figure 5.

Figure 5.

Average training and validation accuracy of prediction model over all cross-validation folds for each epoch. The darker line represents the average accuracy over all folds during training, and the lighter shaded region is the lower and upper accuracy bounds encountered during training. The green curve corresponds to the performance on the training subset, and the red curve corresponds to the performance on the validation subset.

Table 2.

Model performance trained and evaluated on (A) one timepoint and (B) three timepoints of MRI scans and clinical data. Validation metrics were acquired by averaging model performance over all inner validation folds. Testing metrics were acquired by averaging model performance over all outer testing folds.

(A) 1 Timepoint   Validation Metrics (mean ± std dev)                                  Testing Metrics (mean ± std dev)
                  BA               Sensitivity      Specificity      AUC               BA               Sensitivity      Specificity      AUC
NI @ 1tp          0.7851 ± 0.0409  0.8494 ± 0.0912  0.8187 ± 0.0973  0.8457 ± 0.0333   0.7965 ± 0.0700  0.8510 ± 0.0890  0.7961 ± 0.0740  0.8434 ± 0.0597
MRI @ 1tp         0.6648 ± 0.0531  0.8222 ± 0.1075  0.6961 ± 0.1602  0.7087 ± 0.0959   0.6599 ± 0.0567  0.7823 ± 0.0669  0.6026 ± 0.1875  0.7048 ± 0.0974
NI + MRI @ 1tp    0.7987 ± 0.0547  0.9059 ± 0.0832  0.8354 ± 0.0809  0.8638 ± 0.0483   0.8075 ± 0.0572  0.8705 ± 0.0517  0.8307 ± 0.1035  0.8696 ± 0.0606
(B) 3 Timepoints  Validation Metrics (mean ± std dev)                                  Testing Metrics (mean ± std dev)
                  BA               Sensitivity      Specificity      AUC               BA               Sensitivity      Specificity      AUC
NI @ 3tp          0.8049 ± 0.0363  0.8486 ± 0.1055  0.9689 ± 0.0523  0.8938 ± 0.0248   0.8195 ± 0.0547  0.8366 ± 0.1139  0.9375 ± 0.0708  0.8967 ± 0.0535
MRI @ 3tp         0.7382 ± 0.0401  0.8232 ± 0.1789  1.0 ± 0.0        0.7506 ± 0.0732   0.7632 ± 0.0535  0.8235 ± 0.1644  1.0 ± 0.0        0.7745 ± 0.0686
NI + MRI @ 3tp    0.8100 ± 0.0325  0.9130 ± 0.0986  0.9499 ± 0.0705  0.9148 ± 0.0486   0.8347 ± 0.1206  0.8941 ± 0.1206  0.9920 ± 0.0179  0.9191 ± 0.0560

To visualize the important regions that the model focused on when making the prediction, heatmaps were generated using the 3D Grad-CAM algorithm. The average heatmap over all patients within the testing set is shown in Figure 6. The regions on MRI that were important for prediction were putamen, thalamus, amygdala, frontal pole, frontal gyrus, and planum polare.

Figure 6.

Figure 6.

Heatmap visualization for a single patient in the prediction task. The model predicted if the patient would convert from MCI to Dementia and used the Grad-CAM algorithm to compute a heatmap of the salient features it looked at from the MRI scan. This heatmap was superimposed onto the original MRI scan and then averaged over all three timepoints. Areas highlighted in red were most important in the prediction, and each row represents a different slice in the different anatomical planes.

Discussion

This study developed and evaluated deep-learning algorithms to predict which MCI patients would convert to AD within four years using longitudinal neurocognitive test scores and whole-brain 3D MRI, without a priori segmentation of regional brain volumes or cortical thicknesses. MRI data used for prediction were obtained at baseline, six months, and one year after MCI diagnosis. The best CNN model for predicting MCI conversion to AD at four years yielded a BA of 0.8347 ± 0.0261. Saliency maps highlighted the brain regions (putamen, thalamus, amygdala, frontal pole, frontal gyrus, and planum polare) that the best model deemed important for prediction.

Transfer learning

The neural network used for MRI feature extraction in this study was first trained to classify between normal cognition and dementia using MRI scans. This transfer learning approach has been found to be more effective than training a single large model on the final prediction task because it requires much less data for each cohort and shrinks the parameter space for the final task, resulting in faster training times [19]. However, this also means the performance of the prediction model is dependent on the quality of features extracted by the classification model. Earlier studies were able to train 3D CNNs with an optimal classification accuracy of 82% [24] and 86% [13] and use them as feature extractors, and thus our classification model was a good candidate for transfer learning.

Multiple time points

To make predictions that factor in the temporal relationship between multiple timepoints of feature embeddings, we trained a 1D CNN, which achieved improved performance in predicting the risk of dementia onset within four years. This novel approach decoupled the computationally intensive 3D spatial MRI analysis from the smaller model that focuses on the temporal relationships within the data. The 1D convolutions also allowed us to combine the MRI feature embeddings and NI values and use them for prediction without any bias towards one type of data or the other. To our knowledge, no other studies have used a 1D CNN for longitudinal analysis of dementia. In the current study, we chose a 1D CNN that takes the concatenated feature embeddings as multi-channel input, thereby effectively capturing the temporal information across timepoints. Alternative approaches can also be considered to capture temporal information from longitudinal data across multiple timepoints, such as recurrent neural networks (RNNs), specifically LSTMs [35–37]. These RNN-based approaches would be more effective for learning longitudinal patterns in longer sequences with more timepoints or with varying sequence lengths. On the other hand, the research question of the current study focuses on a fixed longitudinal time window of four years, for which a 1D CNN is well suited to capture inter-timepoint relationships through its convolution kernel.

Heatmaps

Deep learning models are often seen as “black boxes” that can be difficult to interpret. This is especially problematic in medical image diagnosis, where it is important to understand why a particular diagnosis was made. While efforts have been made to develop methods to explain the decision-making process of deep learning models, the explainability of these models is still limited.

Heatmaps enabled visualization of the distribution of the model's attention across different functional and anatomical regions of the brain. Such visualizations reflect the brain regions most relevant to the algorithm for predicting MCI to AD conversion and could improve the explainability of deep-learning models. In our study, the most salient brain structures shown on the heatmaps were the putamen, thalamus, amygdala, frontal pole, frontal gyrus, and planum polare. Although the brain structures implicated in the progression from MCI to AD in the literature vary depending on dataset, variables included in the models, and prediction models used, among others, most of these regions are known to be frequently affected by neurodegenerative diseases such as dementia. A meta-analysis by Zakzanis et al. found that the temporal lobes, amygdala, thalamus, and hippocampi exhibit the most significant differences when comparing normal brains to those with dementia [30]. Other studies using the ADNI dataset have also shown that the gyri, anterior hippocampus, and amygdala tend to show more pronounced deterioration within the pMCI cohort, which supports the results from our heatmaps [31]. Some studies that applied CNNs for classification in the ADNI dataset also found the same structures to be important in heatmaps generated through gradient- and occlusion-based algorithms [32]. Some regions, such as the frontal gyrus and planum polare, were consistently highlighted by our neural network but have not been reported to show significant deterioration in other studies of dementia. Other brain regions that have been associated with the development of AD, such as the default mode network and ventricles, were not consistently highlighted in the heatmaps. Differences in findings could be due to different dementia subtypes, heterogeneity of various cohorts, sample sizes, data types included, and methodologies, among others.

Prior studies

Previous studies have shown that models using multiple timepoints or multiple data modalities improve prediction of MCI to dementia conversion compared to those using a single timepoint or a single data modality. Grassi et al. trained several models to predict the onset of dementia from patient demographic information and clinical test scores and found that these models reached an optimal balanced accuracy of 79% [22]. Similarly, Zhang et al. obtained an accuracy of 78.79% for this task, but they trained a 3D CNN that used structural MRI scans as input rather than non-imaging data [21]. Both studies used only a single data modality as input to their models. Other studies by Spasov et al. and Huang et al. explored combining MRI data with non-imaging data and PET data to train more comprehensive models, achieving accuracies of 86% and 76%, respectively [23, 25]. These studies successfully combined multiple types of imaging and non-imaging data to make predictions but did not explore how changes in these data over time can aid prediction. Our study built on prior studies by including longitudinal MRI and non-imaging data to predict the onset of dementia.

Previous studies that used longitudinal data for this task have achieved performance levels comparable to those of single-timepoint models and have also produced useful visualizations of the brain regions involved in prediction. Ocasio et al. evaluated different CNN approaches and found that a sequential convolutional approach yielded slightly better performance than a residual-based architecture, that zero-shot transfer learning outperformed fine-tuning, and that a CNN using longitudinal data outperformed a CNN using a single-timepoint MRI in predicting MCI conversion to AD [18]. Their best CNN model for predicting MCI conversion to AD at three years yielded a balanced accuracy of 0.793. Heatmaps of their prediction model showed that the regions most relevant to the network included the lateral ventricles, periventricular white matter, and cortical gray matter. Our study built on Ocasio et al.'s work by replacing the Siamese network architecture with a two-phase training methodology, which allowed us to incorporate more than two timesteps of patient data. Additionally, we adopted a much larger CNN architecture and incorporated non-imaging data into the final model.

Limitations and future directions

The sample size of the MRI data was small relative to the feature space, which could result in overfitting, although the two-stage framework helped train the embedding model on the limited dataset. A drawback of the two-stage framework was that the prediction model learned the temporal relationships between lossy feature embeddings rather than full MRI scans. Future studies may investigate an end-to-end approach to predicting conversion from MCI to AD in which the model can learn temporal relationships from the entire MRI scan. Large and diverse datasets are needed to improve generalizability.

This study explored only a single CNN architecture and evaluated one way of creating a feature-extractor model. This model need not be created using a classification cohort of normal/dementia patients, so future studies should explore other ways of pretraining a feature-extracting network, such as convolutional autoencoders. Additionally, we used only a subset of the available NI data in ADNI. With more advanced data imputation and analysis, it may be possible to include more descriptive clinical variables or cognitive tests.

This study used only anatomical MRI data. Multiparametric MRI (such as diffusion-tensor imaging, task-based functional MRI, and resting-state MRI) will be incorporated into these models in the future. Similarly, other modalities such as PET and additional non-imaging clinical data can also be included in the model [20]. Other deep-learning methods, such as deep survival analysis [33, 34], should also be explored for the prediction of MCI conversion to AD.

Conclusions

We developed a deep-learning model to predict MCI to AD conversion within four years using longitudinal whole-brain 3D MRI along with neurocognitive test scores. This framework sets the stage for further studies incorporating additional data timepoints, different image types, and non-imaging data to further improve prediction accuracy of MCI to AD conversion. Accurate prognosis could lead to better management of the disease, thereby improving quality of life.

Acknowledgements and Funding

DM is funded by the Wake Forest University School of Medicine Alzheimer’s Disease Research Center (P30AG072947) and a Wake Forest Center for Artificial Intelligence, Center for Biomedical Informatics Pilot Award.

Footnotes

Conflict of Interest

The authors have no conflict of interest to report.

Data Availability

The raw data that were processed and used for model evaluation in this study are open and publicly available from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) and were acquired by completing an application at https://adni.loni.usc.edu/.

The source code for all experiments and the patient IDs used in this study are available at https://github.com/rbapat/predict-conversion.

References

  • [1] McKhann GM, Knopman DS, Chertkow H, Hyman BT, Jack CR Jr., Kawas CH, Klunk WE, Koroshetz WJ, Manly JJ, Mayeux R, Mohs RC, Morris JC, Rossor MN, Scheltens P, Carrillo MC, Thies B, Weintraub S, Phelps CH (2011) The diagnosis of dementia due to Alzheimer’s disease: recommendations from the National Institute on Aging-Alzheimer’s Association workgroups on diagnostic guidelines for Alzheimer’s disease. Alzheimers Dement 7, 263–269.
  • [2] Petersen RC, Smith GE, Waring SC, Ivnik RJ, Tangalos EG, Kokmen E (1999) Mild cognitive impairment: clinical characterization and outcome. Arch Neurol 56, 303–308.
  • [3] Jak AJ, Bondi MW, Delano-Wood L, Wierenga C, Corey-Bloom J, Salmon DP, Delis DC (2009) Quantification of five neuropsychological approaches to defining mild cognitive impairment. Am J Geriatr Psychiatry 17, 368–375.
  • [4] Epperly T, Dunay MA, Boice JL (2017) Alzheimer disease: pharmacologic and nonpharmacologic therapies for cognitive and functional symptoms. Am Fam Physician 95, 771–778.
  • [5] de Bruijne M (2016) Machine learning approaches in medical image analysis: from detection to diagnosis. Med Image Anal 33, 94–97.
  • [6] Erickson BJ, Korfiatis P, Akkus Z, Kline TL (2017) Machine learning for medical imaging. Radiographics 37, 505–515.
  • [7] Lecun Y, Bottou L, Bengio Y, Haffner P (1998) Gradient-based learning applied to document recognition. Proceedings of the IEEE 86, 2278–2324.
  • [8] Krizhevsky A, Sutskever I, Hinton GE (2012) In Advances in Neural Information Processing Systems, pp. 1097–1105.
  • [9] Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556.
  • [10] Lian C, Liu M, Zhang J, Shen D (2018) Hierarchical fully convolutional network for joint atrophy localization and Alzheimer’s disease diagnosis using structural MRI. IEEE Trans Pattern Anal Mach Intell 42, 880–893.
  • [11] Liu M, Zhang J, Adeli E, Shen D (2018) Landmark-based deep multi-instance learning for brain disease diagnosis. Med Image Anal 43, 157–168.
  • [12] Wen J, Thibeau-Sutre E, Diaz-Melo M, Samper-Gonzalez J, Routier A, Bottani S, Dormont D, Durrleman S, Burgos N, Colliot O, Alzheimer’s Disease Neuroimaging Initiative, Australian Imaging Biomarkers and Lifestyle flagship study of ageing (2020) Convolutional neural networks for classification of Alzheimer’s disease: overview and reproducible evaluation. Med Image Anal 63, 101694.
  • [13] Lin W, Tong T, Gao Q, Guo D, Du X, Yang Y, Guo G, Xiao M, Du M, Qu X, Alzheimer’s Disease Neuroimaging Initiative (2018) Convolutional neural networks-based MRI image analysis for the Alzheimer’s disease prediction from mild cognitive impairment. Front Neurosci 12.
  • [14] Shmulev Y, Belyaev M (2018) In Graphs in Biomedical Image Analysis and Integrating Medical Imaging and Non-Imaging Modalities, eds. Stoyanov D, Taylor Z, Ferrante E, Dalca AV, Martel A, Maier-Hein L, Parisot S, Sotiras A, Papiez B, Sabuncu MR, Shen L, Springer International Publishing, Cham, pp. 83–91.
  • [15] Basaia S, Agosta F, Wagner L, Canu E, Magnani G, Santangelo R, Filippi M, Alzheimer’s Disease Neuroimaging Initiative (2019) Automated classification of Alzheimer’s disease and mild cognitive impairment using a single MRI and deep neural networks. Neuroimage Clin 21, 101645.
  • [16] Liu M, Cheng D, Yan W, Alzheimer’s Disease Neuroimaging Initiative (2018) Classification of Alzheimer’s disease by combination of convolutional and recurrent neural networks using FDG-PET images. Front Neuroinform 12.
  • [17] Gorji HT, Kaabouch N (2019) A deep learning approach for diagnosis of mild cognitive impairment based on MRI images. Brain Sci 9, 217.
  • [18] Yue L, Gong X, Chen K, Mao M, Li J, Nandi AK, Li M, IEEE.
  • [19] Ocasio E, Duong TQ (2021) Deep learning prediction of mild cognitive impairment conversion to Alzheimer’s disease at 3 years after diagnosis using longitudinal and whole-brain 3D MRI. PeerJ Comput Sci 7, e560.
  • [20] Cao E, Ma D, Nayak S, Duong TQ (2022, submitted) Deep learning combining FDG-PET and neurocognitive data accurately predicts MCI conversion to Alzheimer’s disease 3-year post MCI diagnosis.
  • [21] Zhang J, Zheng B, Gao A, Feng X, Liang D, Long X (2021) A 3D densely connected convolution neural network with connection-wise attention mechanism for Alzheimer’s disease classification. Magn Reson Imaging 78, 119–126.
  • [22] Grassi M, Rouleaux N, Caldirola D, Loewenstein D, Schruers K, Perna G, Dumontier M, Alzheimer’s Disease Neuroimaging Initiative (2019) A novel ensemble-based machine learning algorithm to predict the conversion from mild cognitive impairment to Alzheimer’s disease using socio-demographic characteristics, clinical information, and neuropsychological measures. Front Neurol 10, 756.
  • [23] Spasov S, Passamonti L, Duggento A, Liò P, Toschi N (2019) A parameter-efficient deep learning approach to predict conversion from mild cognitive impairment to Alzheimer’s disease. NeuroImage 189, 276–287.
  • [24] Bae J, Stocks J, Heywood A, Jung Y, Jenkins L, Hill V, Katsaggelos A, Popuri K, Rosen H, Beg MF, Wang L (2021) Transfer learning for predicting conversion from mild cognitive impairment to dementia of Alzheimer’s type based on a three-dimensional convolutional neural network. Neurobiol Aging 99, 53–64.
  • [25] Huang Y, Xu J, Zhou Y, Tong T, Zhuang X, Alzheimer’s Disease Neuroimaging Initiative (2019) Diagnosis of Alzheimer’s disease via multi-modality 3D convolutional neural network. Front Neurosci 13, 509.
  • [26] Wang S, Wang H, Shen Y, Wang X (2018) In 2018 17th IEEE International Conference on Machine Learning and Applications (ICMLA), pp. 517–523.
  • [27] Korolev S, Safiullin A, Belyaev M, Dodonova Y (2017) In 2017 IEEE 14th International Symposium on Biomedical Imaging (ISBI 2017), pp. 835–838.
  • [28] Bhagwat N, Viviano JD, Voineskos AN, Chakravarty MM, Alzheimer’s Disease Neuroimaging Initiative (2018) Modeling and prediction of clinical symptom trajectories in Alzheimer’s disease using longitudinal data. PLoS Comput Biol 14, e1006376.
  • [29] Ostertag C, Beurton-Aimar M, Urruty T (2019) In 10th International Conference on Pattern Recognition Systems (ICPRS-2019), IET, pp. 18–23.
  • [30] Zakzanis KK, Graham SJ, Campbell Z (2003) A meta-analysis of structural and functional brain imaging in dementia of the Alzheimer’s type: a neuroimaging profile. Neuropsychol Rev 13, 1–18.
  • [31] Misra C, Fan Y, Davatzikos C (2009) Baseline and longitudinal patterns of brain atrophy in MCI patients, and their use in prediction of short-term conversion to AD: results from ADNI. NeuroImage 44, 1415–1422.
  • [32] Dyrba M, Hanzig M, Altenstein S, Bader S, Ballarini T, Brosseron F, Buerger K, Cantré D, Dechent P, Dobisch L, Düzel E, Ewers M, Fliessbach K, Glanz W, Haynes J-D, Heneka MT, Janowitz D, Keles DB, Kilimann I, Laske C, Maier F, Metzger CD, Munk MH, Perneczky R, Peters O, Preis L, Priller J, Rauchmann B, Roy N, Scheffler K, Schneider A, Schott BH, Spottke A, Spruth EJ, Weber M-A, Ertl-Wagner B, Wagner M, Wiltfang J, Jessen F, Teipel SJ, for the ADNI study group (2021) Improving 3D convolutional neural network comprehensibility via interactive visualization of relevance maps: evaluation in Alzheimer’s disease. Alzheimers Res Ther 13, 191.
  • [33] Ranganath R, Perotte A, Elhadad N, Blei D (2016) In Proceedings of the 1st Machine Learning for Healthcare Conference, eds. Finale D-V, Jim F, David K, Byron W, Jenna W, PMLR, Proceedings of Machine Learning Research, pp. 101–114.
  • [34] Nakagawa T, Ishida M, Naito J, Nagai A, Yamaguchi S, Onoda K, Alzheimer’s Disease Neuroimaging Initiative (2020) Prediction of conversion to Alzheimer’s disease using deep survival analysis of MRI images. Brain Commun 2, fcaa057.
  • [35] Cui R, Liu M, Alzheimer’s Disease Neuroimaging Initiative (2019) RNN-based longitudinal analysis for diagnosis of Alzheimer’s disease. Comput Med Imaging Graph 73, 1–10.
  • [36] Nguyen M, He T, An L, Alexander DC, Feng J, Yeo BT, Alzheimer’s Disease Neuroimaging Initiative. Predicting Alzheimer’s disease progression using deep recurrent neural networks. NeuroImage.
  • [37] Aqeel A, Hassan A, Khan MA, Rehman S, Tariq U, Kadry S, Majumdar A, Thinnukool O (2022) A long short-term memory biomarker-based prediction framework for Alzheimer’s disease. Sensors 22, 1475.
