Abstract
Computer-aided diagnosis (CAD) systems for medical images are seen as effective tools to improve the efficiency of diagnosis and prognosis of Alzheimer's disease (AD). The current state-of-the-art models for many image analysis tasks are based on Convolutional Neural Networks (CNN). However, the lack of training data is a common challenge in applying CNNs to the diagnosis of AD and its prodromal stages. Another challenge for CAD applications is the tension between the need for longitudinal cortical structural information, which yields higher diagnosis/prognosis accuracy, and the computational capacity required to process varied imaging features. To address these two challenges, we propose a novel computer-aided AD diagnosis system, CNN-MSCC, which integrates a CNN with a transfer learning strategy, a novel Multi-task Stochastic Coordinate Coding (MSCC) algorithm, and our effective AD-related biomarker, multivariate morphometry statistics (MMS). We applied the novel CNN-MSCC system to the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset to predict future cognitive clinical measures from baseline hippocampal/ventricle MMS features and cortical thickness. The experimental results showed that CNN-MSCC achieved superior prediction performance. The proposed system may aid in expediting the diagnosis of AD progression, facilitating earlier clinical intervention, and resulting in improved clinical outcomes.
Keywords: Computer-aided Diagnosis, Multi-task Dictionary Learning, Convolutional Neural Networks (CNN), Transfer Learning, Alzheimer’s Disease
1. Introduction
AD and its early stage, Mild Cognitive Impairment (MCI), are becoming the most prevalent neurodegenerative brain diseases in elderly people worldwide [1]. The Mini-Mental State Examination (MMSE) and the Alzheimer's Disease Assessment Scale cognitive subscale (ADAS-Cog) are well-known AD assessment scales [2, 3]. Clinicians and researchers use them to evaluate an individual's current cognitive decline caused by AD. Predicting future cognitive progression is therefore valuable for early intervention or prevention.
Brain changes due to AD occur even before amnestic symptoms appear [4]. AD studies on magnetic resonance (MR) images have shown that objective brain structure measures, such as hippocampal and ventricular structures and cortical thickness, can identify significant cortical structural deformations related to AD pathology [5, 6] even before obviously lower MMSE/ADAS-Cog scores are observed [7–9]. Our previous studies proposed a novel 3D surface measure, multivariate morphometry statistics (MMS), consisting of multivariate tensor-based morphometry (mTBM) and radial distance (the distance from the medial core to each surface point), and demonstrated that MMS of the ventricles and hippocampi are potential preclinical AD imaging biomarkers [10, 11]. Our previous work [12, 13] applied cortical thickness measures to predict future clinical scores related to AD symptoms. However, few existing CAD systems have considered the inherent correlations among these effective brain structure measures, which may be useful for robust and accurate prediction of MMSE/ADAS-Cog scales. This work therefore aims to develop a CAD system that uses these three structure measures to predict future MMSE/ADAS-Cog scales; by referring to the predicted scale trends, clinicians can provide early intervention or prevention to slow or even stop the degenerative trend.
Deep learning models [14, 15] are capable of learning the hierarchical structure of features extracted from real-world images. Convolutional Neural Networks (CNNs) are a class of multi-layer, fully trainable models that are able to capture highly nonlinear mappings between inputs and outputs [16]. Recently, CNNs have been successfully applied to the domain of brain imaging analysis, including image classification [17], segmentation [18], and autism diagnosis [19]. However, there are still few CNN studies on AD diagnosis; a key challenge is the lack of training data. Transfer learning (TF) can support feature learning in a data-scarce target domain by transferring knowledge from a similar data-rich source domain, and thus has the potential to address this challenge [15, 20].
After applying CNNs with TF, we confront another challenge: high-dimensional feature maps derived from a small number of individual MR images in the AD research domain. To address this so-called large p, small n problem, dictionary learning was proposed to use a small number of basis vectors, termed a dictionary, to represent high-dimensional features effectively and concisely [21–23]. However, most existing studies on dictionary learning focus on predicting the target from a single brain structure measure [24, 25]. In general, a joint analysis of tasks from multiple cortical measures is expected to improve performance, but it remains a challenging problem.
Recently, Multi-Task Learning (MTL) has been successfully used for regression across different cortical structure measures [26]. Further, there have been studies combining TF and MTL to address the small-sample-size issue; for example, Zhang et al. integrated CNN, transfer learning and MTL for biological image analysis [27]. These studies indicated that integrating abnormal features across different cortical structures performs better than using a single type of structural measure. Building on MTL, we designed a Multi-task Stochastic Coordinate Coding (MSCC) algorithm that partitions the dictionaries into common and individual parts, making it better suited to the individual features of varied structural measures. We then proposed a CAD system, CNN-MSCC, which utilizes a TF strategy to obtain an initial CNN model pre-trained on millions of images in the ImageNet dataset [28, 29], employs MSCC to refine and fuse features from the varied structural measures, and applies the Lasso [30] to predict future MMSE/ADAS-Cog scales representing AD progression. The proposed CNN-MSCC algorithm aims to resolve the CNN application challenges of limited sample size, refinement of high-dimensional feature maps, and integration of features from multiple sources.
Our main contributions can be summarized as follows:
- We employ transfer learning and CNNs to explore whether the transfer learning property of CNNs can be harnessed to generate features from the geometry meshes of biological images, since the current bottleneck for applying CNNs to many biological problems is the limited amount of available labeled training data. We pre-train the deep neural network on the ImageNet data and transfer the knowledge of natural images to generate neuroimaging features for this real-world application.
- We consider the variance of subjects across different cortical structure measures and propose a novel unsupervised dictionary learning method, termed Multi-task Stochastic Coordinate Coding (MSCC), which learns the different tasks simultaneously and utilizes shared and individual dictionaries to encode both consistent and varied imaging features. To the best of our knowledge, it is the first deep model to integrate multi-task learning with dictionary learning for brain imaging analysis.
- We test CNN-MSCC on three baseline brain structure measures to better predict future clinical cognitive scores. Specifically, we use multiple baseline structure measures as multi-task input to predict clinical scores at three future time points. Our new approach is able to boost prediction performance for subjects whose diagnoses range from cognitively unimpaired to AD.
2. Multi-task Dictionary Learning based Convolutional Neural Networks
Our first goal here is to explore whether this transfer learning framework for CNNs can be generalized to biological image studies. Specifically, we pre-train the CNN model using ImageNet data [28], containing millions of labeled natural images in thousands of categories, to obtain initial parameters, and subsequently generate features on the longitudinal data for each task. In the experiments, we apply AlexNet [17], which contains 7 layers, including convolutional layers with fixed filter sizes and different numbers of feature maps. We employ rectified linear non-linearities and max-pooling in our CNN model. We pre-train the CNN model on the ImageNet dataset and then remove the last fully-connected layer (whose outputs are the 1000 class scores of the ImageNet task). Finally, we treat the rest of the CNN as a fixed feature extractor for the publicly available Alzheimer's Disease Neuroimaging Initiative (ADNI) database [31].
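To make this fixed-feature-extractor step concrete, the sketch below loads an ImageNet-pretrained AlexNet, drops its final 1000-way classification layer and pushes a batch of patches through the remaining network. It uses PyTorch/torchvision purely for illustration (our implementation is built on the Caffe toolbox), and the patch tensor is a random placeholder.

```python
# Sketch: ImageNet-pretrained AlexNet as a fixed feature extractor (illustrative only).
import torch
import torchvision.models as models

alexnet = models.alexnet(pretrained=True)       # newer torchvision: weights="IMAGENET1K_V1"
alexnet.classifier = torch.nn.Sequential(
    *list(alexnet.classifier.children())[:-1]   # drop the 1000-way ImageNet output layer
)
alexnet.eval()
for p in alexnet.parameters():
    p.requires_grad = False                     # keep the transferred weights fixed

# Hypothetical batch of surface-measure image patches, resized to 224x224 RGB.
patches = torch.randn(8, 3, 224, 224)
with torch.no_grad():
    deep_features = alexnet(patches)            # -> (8, 4096) deep features per patch
print(deep_features.shape)
```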
We further propose to use a multi-task learning strategy to boost the accuracy of future clinical score regression. The entire pipeline of our method is illustrated in Fig. 1. To be specific, we first train the deep CNN model on the ImageNet dataset. Then we employ the pre-trained network as a feature extractor for the ADNI dataset across multiple baseline structure measures. AlexNet is a seven-layer deep neural network; as a result, we generate seven-layer-deep output features for each structure measure. We further employ MSCC to conduct multi-task learning, simultaneously generating the sparse features and dictionaries from the deep features of the different structure measures. In MSCC, we utilize shared and individual dictionaries to encode both consistent and varied imaging features across the multiple structure measures. In the end, we feed the sparse codes generated by MSCC into the Lasso [30] to predict future AD progression. MSCC is an online learning method; the advantage of online learning is that it handles cases where the input data is too large to fit into memory (up to 2,867,562 samples in this work) or arrives as a stream.
Fig. 1.
The pipeline of our proposed framework. We pre-train the deep CNN model on the ImageNet dataset and use the pre-trained model as a feature extractor for the ADNI dataset. We employ the extracted features from three cortical structure measures to conduct multi-task dictionary learning for AD progression prediction, generating sparse features for the different structure measures. Finally, we use Lasso regression on the learnt features to predict future MMSE and ADAS-Cog scores.
3. Multi-task Stochastic Coordinate Coding
3.1. Dictionary Learning
Given a finite training set of signals X = [x_1, …, x_n] ∈ ℝ^{p×n}, where each x_i ∈ ℝ^p is an image patch, dictionary learning aims to learn a dictionary D ∈ ℝ^{p×l} and a sparse code matrix Z ∈ ℝ^{l×n}. The original signals X are modeled by a sparse linear combination of D and Z as X ≈ DZ. Given one image patch x_i, we can formulate the following optimization problem:
$$\min_{D,\, z_i} \; \frac{1}{2}\left\| x_i - D z_i \right\|_2^2 + \lambda \left\| z_i \right\|_1, \quad \text{s.t. } \left\| D_j \right\|_2 \le 1,\; j = 1, \ldots, l, \qquad (1)$$

where D_j denotes the jth column of D, λ is the positive regularization parameter, and z_i ∈ ℝ^l is the learnt sparse code for x_i.
The optimization of Eq. (1) can be decomposed into an alternating learning process, as in the Online Dictionary Learning (ODL) method [22]. Given each image patch x_i, ODL keeps D fixed and learns z_i, then keeps z_i fixed and learns D. The learning process runs for κ (a fixed constant) iterations, until there are no more changes in D and Z.
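For readers who want to experiment with the classic formulation in Eq. (1), the snippet below runs an off-the-shelf dictionary learner on random placeholder patches. It is a stand-in for ODL rather than our MSCC implementation, and note that scikit-learn stores samples as rows, so it models X ≈ ZD instead of X ≈ DZ.

```python
# Sketch: single-task dictionary learning in the sense of Eq. (1), with placeholder data.
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning

rng = np.random.RandomState(0)
X = rng.randn(500, 2500)                 # 500 patches, each a flattened 50x50 patch

dl = MiniBatchDictionaryLearning(n_components=100, alpha=0.1,
                                 transform_algorithm="lasso_lars",
                                 transform_alpha=0.1, random_state=0)
Z = dl.fit(X).transform(X)               # sparse codes, shape (500, 100)
D = dl.components_                       # dictionary atoms, shape (100, 2500)

print(np.mean((X - Z @ D) ** 2))         # reconstruction error of X ~ Z D
```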
3.2. The Proposed Algorithm
Given features from T different tasks, X = {X^1, …, X^T} with X^t ∈ ℝ^{p_t×n}, our objective is to learn a set of sparse codes {Z^1, …, Z^T}, where Z^t ∈ ℝ^{l_t×n}. Here p_t is the dimension of the image patch features in X^t (i.e., one cortical measure), n is the number of subjects of X^t, and l_t is the dimension of each sparse code in Z^t. When employing ODL to learn the sparse codes Z^t from each X^t individually, we obtain a set of dictionaries {D^1, …, D^T}, but there is no relationship among the learnt dictionaries. Another solution is to stack the features into one matrix X and learn a single dictionary D. However, unless there is latent common information shared by the same subject across different cortical structure measures, a single dictionary D is not enough to capture the variation among features from different structure measures; this is reflected in the variance of the dictionary atoms and in the sparsity of the corresponding sparse code matrices. To address this challenge, we integrate the idea of multi-task learning into the online dictionary learning method and propose a novel dictionary learning algorithm, termed Multi-task Stochastic Coordinate Coding (MSCC), to learn the sparse codes of subjects from different structure measures. In this work, each task corresponds to one structure measure.
Algorithm 1.
Multi-task Stochastic Coordinate Coding

Require: Samples from the T structure measures, X^t for t = 1, …, T, and the regularization parameter λ
Ensure: Dictionaries D^t = [Φ, Ψ^t] and sparse codes Z^t for each structure measure
1: for k = 1 to κ do
2:  for t = 1 to T do
3:   for i = 1 to n_t do
4:    Get an image patch x^t(i) from sample X^t.
5:    Update D^t = [Φ, Ψ^t].
6:    Update z^t(i) and the index set I^t(i) by a few steps of CCD:
7:     (z^t(i), I^t(i)) ← CCD(x^t(i), Φ, Ψ^t, z^t(i), I^t(i)) (Algorithm 2).
8:    Update Φ and Ψ^t by one step of SGD:
9:     (Φ, Ψ^t) ← SGD(x^t(i), Φ, Ψ^t, z^t(i), I^t(i)) (Algorithm 3).
10:   Normalize Φ and Ψ^t based on the index set I^t(i).
11:   Update the shared dictionary Φ.
12:  end for
13: end for
14: end for
For the subjects' feature matrix X^t of a particular task, MSCC learns a dictionary D^t and sparse codes Z^t. D^t is composed of two parts, D^t = [Φ, Ψ^t], where Φ denotes the shared part and Ψ^t the individual part; Φ is the same among all the learnt dictionaries, while each Ψ^t is different and is learnt only from the corresponding subjects' feature matrix X^t. Therefore, the objective function of MSCC can be formulated as follows:

$$\min_{\Phi,\, \{\Psi^t\},\, \{Z^t\}} \; \sum_{t=1}^{T} \left( \frac{1}{2}\left\| X^t - [\Phi, \Psi^t]\, Z^t \right\|_F^2 + \lambda \left\| Z^t \right\|_1 \right), \quad \text{s.t. } \left\| D^t_j \right\|_2 \le 1, \; \forall j, t, \qquad (2)$$

where D^t = [Φ, Ψ^t] and D^t_j is the jth column of D^t.
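A minimal sketch of evaluating the objective in Eq. (2) is given below, assuming the dictionaries are stored as one shared block Phi plus per-task blocks Psi[t]; all shapes and data are illustrative placeholders.

```python
# Sketch: the MSCC objective of Eq. (2) with a shared dictionary part and individual parts.
import numpy as np

def mscc_objective(X, Phi, Psi, Z, lam):
    """X, Psi, Z are lists over tasks; the t-th dictionary is D^t = [Phi, Psi^t]."""
    total = 0.0
    for Xt, Psit, Zt in zip(X, Psi, Z):
        Dt = np.hstack([Phi, Psit])                            # D^t = [Phi, Psi^t]
        total += 0.5 * np.linalg.norm(Xt - Dt @ Zt, "fro") ** 2 \
                 + lam * np.abs(Zt).sum()                      # data fit + l1 sparsity
    return total

rng = np.random.RandomState(0)
p, n, ls = 4096, 50, 80                                        # feature dim, subjects, shared atoms
Phi = rng.randn(p, ls)
Psi = [rng.randn(p, k) for k in (20, 40, 60)]                  # individual parts of varying width
X = [rng.randn(p, n) for _ in range(3)]
Z = [rng.randn(ls + Psit.shape[1], n) for Psit in Psi]         # one sparse code matrix per task
print(mscc_objective(X, Phi, Psi, Z, lam=0.1))
```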
Fig. 2 illustrates the framework of MSCC with ADNI features from three different tasks (i.e., cortical structure measures), represented as X^1, X^2 and X^3, respectively. Through the multi-task learning process of MSCC, we obtain a dictionary and sparse codes for the features of each task t: D^t and Z^t. In MSCC, the dictionary D^t is composed of a shared part Φ and an individual part Ψ^t. For the individual part, MSCC learns a different Ψ^t only from the corresponding feature matrix X^t. We vary the number of columns in Ψ^t to introduce variation in the learnt sparse codes Z^t; as a result, the dimensions of the learnt sparse code matrices Z^t differ from each other.
Fig. 2.
Illustration of the learning process of MSCC.
The initialization of the dictionaries in MSCC is critical to the entire learning process. We propose a random patch method to initialize the dictionaries for the different tasks. The main idea of the random patch method is to randomly select l image patches from the n subjects to construct D ∈ ℝ^{p×l}. MSCC performs the random patch approach in a similar way: we initialize Φ by randomly selecting subjects' features from the feature matrices across the different tasks, and for the individual part of each dictionary we randomly select subjects' features from the corresponding matrix X^t to construct Ψ^t. After initializing the dictionary D^t for each task, we set all the sparse codes Z^t to zero at the beginning. The key steps of MSCC are summarized in Algorithm 1.
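The random patch initialization can be sketched as follows; the function name and the sampling without replacement are assumptions made only for illustration.

```python
# Sketch: random-patch initialization of the shared and individual dictionary parts.
import numpy as np

def init_dictionaries(X, ls, l_ind, rng):
    """X: list of (p, n_t) feature matrices; ls shared atoms; l_ind[t] individual atoms."""
    pooled = np.hstack(X)                                         # features pooled across tasks
    Phi = pooled[:, rng.choice(pooled.shape[1], ls, replace=False)].copy()
    Psi = [Xt[:, rng.choice(Xt.shape[1], k, replace=False)].copy()
           for Xt, k in zip(X, l_ind)]                            # each Psi^t drawn only from X^t
    Z = [np.zeros((ls + k, Xt.shape[1])) for Xt, k in zip(X, l_ind)]  # sparse codes start at zero
    return Phi, Psi, Z

rng = np.random.RandomState(0)
X = [rng.randn(4096, 40) for _ in range(3)]
Phi, Psi, Z = init_dictionaries(X, ls=30, l_ind=[10, 20, 30], rng=rng)
```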
In Algorithm 1, k denotes the epoch number, k = 1, …, κ. Φ represents the shared part of each dictionary D^t and is initialized by the random patch method. For each subject's feature x^t(i) extracted from X^t, we learn the ith sparse code of Z^t by several steps of Cyclic Coordinate Descent (CCD) [32]. Then we use the learnt sparse codes to update the dictionary parts Φ and Ψ^t by one step of Stochastic Gradient Descent (SGD) [33]. Since z^t(i) is very sparse, we use the index set I^t(i) to record the locations of its non-zero entries, which accelerates the update of the sparse codes and dictionaries. Φ is updated at the end of the kth iteration to ensure that the shared part remains the same among all the dictionaries.
The learning process of the sparse codes is shown in Algorithm 2. At first, we generate the non-zero index set I^t(i) by one step of CCD, recording the non-zero entries of z^t(i). Then we perform S steps of CCD that update the sparse codes only on the non-zero entries of z^t(i), accelerating the learning process significantly. Ω is a sparse matrix multiplication function that takes three input parameters. Taking Ω(A, b, I) as an example, A denotes a matrix, b is a vector and I is an index set that records the locations of the non-zero entries in b; the return value of Ω is defined as Ω(A, b, I) = Σ_{j∈I} A_j b_j, where A_j is the jth column of A. When multiplying A and b, we only manipulate the non-zero entries of b and the corresponding columns of A, based on the index set I, speeding up the calculation by utilizing the sparsity of b. Γ is the soft-thresholding shrinkage function [34], defined as Γ_λ(x) = sign(x) · max(|x| − λ, 0).
Algorithm 2.
Updating sparse codes
Require: The image patch x^t(i), dictionaries Φ and Ψ^t, sparse code z^t(i) and index set I^t(i)
Ensure: The updated sparse code z^t(i) and index set I^t(i)
1: for j = 1 to l_t do
2:  b ← z^t_j(i) + D^t_j · (x^t(i) − Ω(D^t, z^t(i), I^t(i)))
3:  z^t_j(i) ← Γ_λ(b)
4:  if z^t_j(i) ≠ 0 then
5:   Put j into the index set I^t(i).
6:  end if
7: end for
8: for s = 1 to S do
9:  for every element μ in the index set I^t(i) do
10:  b ← z^t_μ(i) + D^t_μ · (x^t(i) − Ω(D^t, z^t(i), I^t(i)))
11:  z^t_μ(i) ← Γ_λ(b)
12: end for
13: end for
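The sketch below renders this sparse-code update in NumPy: one full coordinate pass applying the soft-thresholding operator Γ, followed by S passes restricted to the active (non-zero) index set. Because the exact update expressions are not fully legible in the source, it assumes the standard coordinate-descent step for the lasso subproblem with unit-norm dictionary columns, and it recomputes the full residual instead of using the sparse product Ω.

```python
# Sketch of Algorithm 2: CCD sparse-code update with soft thresholding (assumed standard form).
import numpy as np

def soft_threshold(x, lam):
    # Gamma_lambda(x) = sign(x) * max(|x| - lambda, 0)
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def update_sparse_code(x, D, z, lam, S=3):
    """x: patch feature, D = [Phi, Psi^t], z: current sparse code (modified in place)."""
    active = set(np.flatnonzero(z).tolist())
    for j in range(D.shape[1]):                        # full pass: builds the non-zero index set
        residual = x - D @ z
        z[j] = soft_threshold(z[j] + D[:, j] @ residual, lam)
        if z[j] != 0.0:
            active.add(j)
    for _ in range(S):                                 # S refinement passes on the support only
        for j in active:
            residual = x - D @ z
            z[j] = soft_threshold(z[j] + D[:, j] @ residual, lam)
    return z, active
```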
Algorithm 3.
Updating the dictionaries Φ and Ψ^t

Require: The image patch x^t(i), dictionaries Φ and Ψ^t, sparse code z^t(i) and index set I^t(i).
Ensure: The updated dictionaries Φ and Ψ^t
1: Update the Hessian matrix H^t ← H^t + z^t(i) z^t(i)^⊤.
2: r ← x^t(i) − Ω(D^t, z^t(i), I^t(i)), with D^t = [Φ, Ψ^t]
3: for j = 1 to p do
4:  for every element μ in the index set I^t(i) do
5:   D^t_{jμ} ← D^t_{jμ} + (1 / H^t_{μμ}) r_j z^t_μ(i)
6:  end for
7: end for
The procedure for updating the dictionaries is shown in Algorithm 3. We perform one step of SGD to update the dictionaries Φ and Ψ^t. The learning rate is set to an approximation of the inverse of the Hessian matrix H^t = Σ_i z^t(i) z^t(i)^⊤, which is accumulated from the sparse codes in the kth iteration. For the μth column of the dictionary, we set the learning rate to the inverse of the corresponding diagonal element of the Hessian matrix, 1/H^t_{μμ}. Since Eq. (2) requires ||D^t_j||_2 ≤ 1, it is necessary to normalize the dictionaries after updating them. We only need to perform the normalization on the columns corresponding to the non-zero entries of z^t(i), because the dictionary update only occurs on these columns. Utilizing the non-zero information from I^t(i) accelerates the whole learning process.
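A simplified NumPy rendering of this dictionary step is shown below: accumulate the diagonal of the Hessian, take one SGD step on the active columns with step size 1/H_μμ, and re-normalize those columns. Variable names and the projection onto the unit ball are illustrative assumptions, not the authors' implementation.

```python
# Sketch of Algorithm 3: one-step SGD dictionary update with Hessian-diagonal learning rates.
import numpy as np

def update_dictionary(x, D, z, H_diag, active):
    """Update only the columns of D indexed by the non-zero set `active`."""
    idx = list(active)
    H_diag[idx] += z[idx] ** 2                          # diagonal of H^t <- H^t + z z^T
    residual = x - D @ z                                # grad of 0.5*||x - D z||^2 w.r.t. D is -residual z^T
    for mu in idx:
        D[:, mu] += residual * (z[mu] / H_diag[mu])     # SGD step with rate 1 / H_mu,mu
        D[:, mu] /= max(np.linalg.norm(D[:, mu]), 1.0)  # keep ||D_mu||_2 <= 1
    return D, H_diag
```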
4. Experiments
AD and its early stage, Mild Cognitive Impairment (MCI), are becoming the most prevalent neurodegenerative brain diseases in elderly people worldwide [1]. To this end, there have been many studies [5, 7, 11, 35] investigating the underlying biological or neurological mechanisms and discovering biomarkers for early diagnosis of AD and MCI. Data for testing the performance of our proposed framework were obtained from the ADNI database (adni.loni.usc.edu) [36], which has been considered a benchmark database for performance evaluation of various methods for AD diagnosis. The ADNI was launched in 2003 as a public-private partnership, led by Principal Investigator Michael W. Weiner, MD. The primary goal of ADNI is to test whether biological markers, such as serial MRI and positron emission tomography (PET), combined with clinical and neuropsychological assessment, can measure the progression of MCI and early AD. The structural MR images were acquired on 1.5T scanners. The raw MR images and MMSE/ADAS-Cog scales were downloaded from the public ADNI website (www.loni.ucla.edu/ADNI). Information about the subject criteria is available at www.adni-info.org. The experimental dataset contains 837 baseline subjects between 68 and 82 years of age.
4.1. Image Patches of Three Cortical Structure Measures
We utilized three baseline structural measures of the brain: hippocampal multivariate morphometry statistics (MMS), lateral ventricle MMS and cortical thickness. For the hippocampal surface features, we used the FIRST software [37] and the marching cubes method [38] to automatically segment and reconstruct hippocampal surfaces for each brain MR image. Then, we registered the surfaces and computed hippocampal MMS [39]. For each subject, we obtained 120,000-dimensional features of the hippocampal surfaces, and we used a 50 × 50 window to obtain a collection of image patches, yielding 220,968 baseline hippocampal image patches.
For the ventricular surface features, we did the following. First, we segmented the lateral ventricles and built the ventricular surface models using a level-set based topology-preserving method [40]. Then we computed surface registrations using the canonical holomorphic one-form method [41]. Finally, MMS [39] were computed, giving 308,247-dimensional features of the ventricular surfaces for each subject. The cortical thickness was computed by FreeSurfer [42], which deforms the white surface to the pial surface and measures the deformation distance as the cortical thickness. The spherical parameter surface and weighted spherical harmonic representation [43] were used to register the pial surfaces across subjects, so each subject has cortical thickness features of the same dimension (161,800). After preprocessing the data with the 50 × 50 patch window, we have 2,867,562 and 1,504,926 image patches for the ventricle and cortical thickness measures, respectively.
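The 50 × 50 windowing step can be pictured as below; how each subject's surface measures are arranged into a 2-D map before tiling is an assumption made purely for illustration.

```python
# Sketch: tiling one subject's 2-D feature map into non-overlapping 50x50 patches.
import numpy as np

def extract_patches(feature_map, size=50):
    h, w = feature_map.shape
    patches = [feature_map[i:i + size, j:j + size]
               for i in range(0, h - size + 1, size)
               for j in range(0, w - size + 1, size)]
    return np.stack(patches)                       # (num_patches, size, size)

hippocampal_map = np.random.randn(400, 300)        # placeholder for one subject's MMS map
print(extract_patches(hippocampal_map).shape)      # -> (48, 50, 50)
```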
4.2. CNN-MSCC Modeling and Evaluation
We build a prediction model on these multi-task geometric surface features from 837 baseline subjects in the AD, MCI and cognitively unimpaired (CU) categories, and we aim to predict future MMSE/ADAS-Cog scales (from the 6th month to the 24th month). In this study, we took the AlexNet structure [17] as the initial CNN model, which contains 7 layers, including convolutional layers with fixed filter sizes and different numbers of feature maps; the architecture of our CNN is shown in Tab. 1. We pre-trained the CNN model on the ImageNet dataset [28], containing millions of labeled natural images in thousands of categories, and then removed the last fully-connected layer (whose outputs are the 1000 class scores of the ImageNet task). Finally, the rest of the CNN was transferred and used to extract feature maps from the MMS image patches of the specific cortical structures and from the cortical thickness [36].
Table 1.
The architecture of CNN used in this work.
| Deep Layer | Function | # of neurons |
|---|---|---|
| 1 | Convolutional Layer | 253440 |
| 2 | Pooling Layer | 186624 |
| 3 | Convolutional Layer | 64896 |
| 4 | Convolutional Layer | 64896 |
| 5 | Convolutional Layer | 43264 |
|   | Pooling Layer | 9216 |
| 6 | Fully Connected Layer | 4096 |
| 7 | Fully Connected Layer | 4096 |
We implemented our CNN model using the Caffe toolbox [44]. The network was trained on an Intel(R) Xeon(R) 48-core machine with 2.50 GHz processors, 256 GB of globally addressable memory and a single Nvidia GeForce GTX TITAN Black GPU. In the experimental setting of MSCC, the sparsity parameter was λ = 0.1. We used 10 epochs with a batch size of 1 in Algorithm 1 and 3 iterations of CCD in Algorithm 2 (P is set to 1 and S is set to 3) in all the experiments. After obtaining the MSCC features, we used max-pooling for further dimension reduction, so the feature vector of each subject is 1 × 2000. To predict future clinical scores, we applied Lasso regression to the CNN-MSCC features. For parameter selection, 5-fold cross-validation was used to select model parameters on the training data (between 10^-3 and 10^3).
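The final regression stage can be sketched with scikit-learn's LassoCV, using a 5-fold search over the 10^-3 to 10^3 range mentioned above; the feature matrix and target scores here are random placeholders rather than the actual MSCC outputs.

```python
# Sketch: Lasso with 5-fold cross-validated regularization on the pooled MSCC features.
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.RandomState(0)
features = rng.randn(837, 2000)                    # one 1x2000 MSCC feature vector per subject
mmse_m24 = rng.randn(837)                          # placeholder future MMSE scores

lasso = LassoCV(alphas=np.logspace(-3, 3, 13), cv=5).fit(features, mmse_m24)
predicted = lasso.predict(features)
```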
In order to evaluate the CNN-MSCC model, we randomly split the data (image patches and their corresponding MMSE/ADAS-Cog scores) into training and testing sets using an 8:2 ratio and used 10-fold cross-validation to avoid data bias. Lastly, we evaluated the overall regression performance of the proposed system using the normalized mean square error (nMSE) and the weighted correlation coefficient (wR), and the task-specific regression performance using the root mean square error (rMSE) [45]. The three measures are defined as follows:

$$\mathrm{nMSE}(Y, \hat{Y}) = \frac{\sum_{i} \left\| Y_i - \hat{Y}_i \right\|_2^2 / \sigma(Y_i)}{\sum_{i} n_i}, \qquad \mathrm{wR}(Y, \hat{Y}) = \frac{\sum_{i} \mathrm{Corr}(Y_i, \hat{Y}_i)\, n_i}{\sum_{i} n_i}, \qquad \mathrm{rMSE}(y, \hat{y}) = \sqrt{\frac{\left\| y - \hat{y} \right\|_2^2}{n}}.$$

For nMSE and wR, Y_i is the ground truth target of task i and Ŷ_i is the corresponding predicted value, σ(Y_i) is the standard deviation of Y_i, Corr is the correlation coefficient between two vectors, and n_i is the number of subjects of task i. For rMSE, y is the ground truth target of a single task and ŷ is the corresponding prediction of the model. Smaller nMSE and rMSE, and larger wR, indicate better results. nMSE and wR are used to evaluate the overall performance of the proposed system across the three time points, while rMSE is used to evaluate the CNN-MSCC performance at each time point. We report the mean and standard deviation over 40 repetitions of the experiments on different splits of the data.
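For reference, the three measures can be computed as in the sketch below, which follows the definitions above; Y and Yhat are lists of per-time-point target and prediction vectors.

```python
# Sketch: nMSE, wR and per-task rMSE as defined above.
import numpy as np

def nmse(Y, Yhat):
    num = sum(np.sum((y - yh) ** 2) / np.std(y) for y, yh in zip(Y, Yhat))
    return num / sum(len(y) for y in Y)

def weighted_r(Y, Yhat):
    num = sum(np.corrcoef(y, yh)[0, 1] * len(y) for y, yh in zip(Y, Yhat))
    return num / sum(len(y) for y in Y)

def rmse(y, yhat):
    return np.sqrt(np.mean((y - yhat) ** 2))
```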
4.3. Performance Analysis
After we constructed the CNN-MSCC model with the image patches from the combination of the three kinds of features, we used Lasso to individually predict the 6-month (M06), 12-month (M12) and 24-month (M24) MMSE and ADAS-Cog scores with an 8:2 ratio of training to testing data. The prediction results are reported in Fig. 3(a) and Fig. 3(b). Fig. 3(a) shows the overall nMSE and wR measures of CNN-MSCC for predicting MMSE and ADAS-Cog scores across the three time points (M06, M12 and M24). For the MMSE and ADAS-Cog score predictions, CNN-MSCC achieves nMSE of 0.274±0.051 and 0.762±0.012, respectively, and wR of 0.751±0.083 and 0.862±0.045, respectively.
Fig. 3.
MMSE and ADAS-Cog prediction performance in terms of nMSE and wR (a) and in terms of rMSE (b) using the proposed CNN-MSCC model.
Fig. 3(b) shows the rMSE measures for MMSE and ADAS-Cog scores at M06, M12 and M24, respectively. For MMSE and ADAS-Cog at M06, CNN-MSCC achieves rMSE of 2.198±0.062 and 4.322±0.269; at M12, the proposed method achieves rMSE of 2.211±0.459 and 4.930±0.912; at M24, it achieves rMSE of 2.290±0.601 and 5.521±0.816. We can observe that the rMSE values for predicting MMSE and ADAS-Cog are stable across all three time points. In particular, the proposed CNN-MSCC shows a clear improvement at the later time points (12 and 24 months). This may be due to data sparseness at the later time points, where the proposed sparsity-inducing model is expected to achieve better prediction performance.
Zhou et al. proposed similar multi-task learning models to predict future MMSE and ADAS-Cog scores related to AD progression using baseline MRI features; their MRI data came from 648 subjects of the ADNI dataset, and 5 categories of MRI features were used, including cortical thickness, volume and surface measures [45]. The nMSE, wR and rMSE results of our proposed system are very comparable to the best results reported in [45]. There are three main reasons for the strong performance of the proposed multi-task model. First, the advanced cortical structure measure MMS was introduced; this effective biomarker has been well studied in our previous work and outperformed volume measures [10, 11]. Second, since deep models are good at modeling complex cortical structure in the medical imaging field [46], a pre-trained CNN was used to generate hierarchical structural features from the high-dimensional MMS and cortical thickness measures. Third, we developed a novel unsupervised learning method, MSCC, to improve the ability to process hierarchical structural features across multiple tasks.
5. Conclusions and Future Work
In this work, we proposed a deep learning model, CNN-MSCC, to model multiple cortical structure features for predicting future MMSE/ADAS-Cog scores. The proposed model is validated by extensive experimental studies and shown to perform favorably compared with similar studies. In future work, we will optimize our method and investigate its capability on brain multimodality imaging datasets.
6. Acknowledgement
Algorithm development and image analysis for this study was funded, in part, by the National Institute on Aging (RF1AG051710 to QD, JZ, PMT, JY and YW, R01EB025–032 to YW, R01HL128818 to QD and YW, R01AG031581 and P30AG19610 to RJC, U54EB020403 to PMT and YW), the National Science Foundation (IIS-1421165 to JZ and YW), and the Arizona Alzheimer's Consortium (JZ, RJC and YW). We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Tesla K40 GPU used for this research.
Data collection and sharing for this project was funded by the ADNI (National Institutes of Health Grant U01 AG024904) and DOD ADNI (Department of Defense award number W81XWH-12-2-0012). ADNI is funded by the National Institute on Aging, the National Institute of Biomedical Imaging and Bioengineering, and through generous contributions from the following: AbbVie; Alzheimer's Association; Alzheimer's Drug Discovery Foundation; Araclon Biotech; BioClinica, Inc.; Biogen; Bristol-Myers Squibb Company; CereSpir, Inc.; Cogstate; Eisai Inc.; Elan Pharmaceuticals, Inc.; Eli Lilly and Company; EuroImmun; F. Hoffmann-La Roche Ltd and its affiliated company Genentech, Inc.; Fujirebio; GE Healthcare; IXICO Ltd.; Janssen Alzheimer Immunotherapy Research & Development, LLC.; Johnson & Johnson Pharmaceutical Research & Development LLC.; Lumosity; Lundbeck; Merck & Co., Inc.; Meso Scale Diagnostics, LLC.; NeuroRx Research; Neurotrack Technologies; Novartis Pharmaceuticals Corporation; Pfizer Inc.; Piramal Imaging; Servier; Takeda Pharmaceutical Company; and Transition Therapeutics. The Canadian Institutes of Health Research is providing funds to support ADNI clinical sites in Canada. Private sector contributions are facilitated by the Foundation for the National Institutes of Health (www.fnih.org). The grantee organization is the Northern California Institute for Research and Education, and the study is coordinated by the Alzheimer's Therapeutic Research Institute at the University of Southern California. ADNI data are disseminated by the Laboratory for Neuro Imaging at the University of Southern California.
* Acknowledgments:
Data used in preparation of this article were obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu). As such, the investigators within the ADNI contributed to the design and implementation of ADNI and/or provided data but did not participate in analysis or writing of this report. A complete listing of ADNI investigators can be found at: http://adni.loni.usc.edu/wp-content/uploads/how_to_apply/ADNI_Acknowledgement_List.pdf
References
- 1. Brookmeyer Ron, Johnson Elizabeth, Ziegler-Graham Kathryn, and Arrighi H Michael. Forecasting the global burden of alzheimer's disease. Alzheimer's & dementia, 3(3):186–191, 2007.
- 2. Folstein ME. A practical method for grading the cognitive state of patients for the clinician. J Psychiatr res, 12:189–198, 1975.
- 3. Rosen Wilma G, Mohs Richard C, and Davis Kenneth L. A new rating scale for alzheimer's disease. The American journal of psychiatry, 1984.
- 4. Buckner Randy L. Memory and executive function in aging and ad: multiple factors that cause decline and reserve factors that compensate. Neuron, 44(1):195–208, 2004.
- 5. Thompson Paul M, Hayashi Kiralee M, De Zubicaray Greig I, Janke Andrew L, Rose Stephen E, Semple James, Hong Michael S, Herman David H, Gravano David, Doddrell David M, et al. Mapping hippocampal and ventricular change in alzheimer disease. Neuroimage, 22(4):1754–1766, 2004.
- 6. Chung Moo K, Robbins Steve, and Evans Alan C. Unified statistical approach to cortical thickness analysis. In Biennial International Conference on Information Processing in Medical Imaging, pages 627–638. Springer, 2005.
- 7. Frisoni Giovanni B, Fox Nick C, Jack Clifford R Jr, Scheltens Philip, and Thompson Paul M. The clinical use of structural mri in alzheimer disease. Nature Reviews Neurology, 6(2):67, 2010.
- 8. Cacciaglia Raffaele, Molinuevo José Luis, Falcón Carles, Brugulat-Serrat Anna, Sánchez-Benavides Gonzalo, Gramunt Nina, Esteller Manel, Morán Sebastián, Minguillón Carolina, Fauria Karine, et al. Effects of apoe-ε4 allele load on brain morphology in a cohort of middle-aged healthy individuals with enriched genetic risk for alzheimer's disease. Alzheimer's & Dementia, 14(7):902–912, 2018.
- 9. Operto Grégory, Cacciaglia Raffaele, Grau-Rivera Oriol, Falcon Carles, Brugulat-Serrat Anna, Ródenas Pablo, Ramos Rubén, Morán Sebastián, Esteller Manel, Bargalló Nuria, et al. White matter microstructure is altered in cognitively normal middle-aged apoe-ε4 homozygotes. Alzheimer's research & therapy, 10(1):48, 2018.
- 10. Dong Qunxi, Zhang Wen, Wu Jianfeng, Li Bolun, Schron Emily H, McMahon Travis, Shi Jie, Gutman Boris A, Chen Kewei, Baxter Leslie C, et al. Applying surface-based hippocampal morphometry to study apoe-e4 allele dose effects in cognitively unimpaired subjects. NeuroImage: Clinical, page 101744, 2019.
- 11. Shi Jie, Stonnington Cynthia M, Thompson Paul M, Chen Kewei, Gutman Boris, Reschke Cole, Baxter Leslie C, Reiman Eric M, Caselli Richard J, and Wang Yalin. Studying ventricular abnormalities in mild cognitive impairment with hyperbolic Ricci flow and tensor-based morphometry. NeuroImage, 104:1–20, 2015.
- 12. Fan Yonghui, Wang Gang, Lepore Natasha, and Wang Yalin. A tetrahedron-based heat flux signature for cortical thickness morphometry analysis. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 420–428. Springer, 2018.
- 13. Zhang Jie, Tu Yanshuai, Li Qingyang, Caselli Richard J, Thompson Paul M, Ye Jieping, and Wang Yalin. Multi-task sparse screening for predicting future clinical scores using longitudinal cortical thickness measures. In 2018 IEEE 15th International Symposium on Biomedical Imaging (ISBI 2018), pages 1406–1410. IEEE, 2018.
- 14. Razavian Ali Sharif, Azizpour Hossein, Sullivan Josephine, and Carlsson Stefan. Cnn features off-the-shelf: an astounding baseline for recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pages 806–813, 2014.
- 15. Zhang Jian. Deep transfer learning via restricted boltzmann machine for document classification. In Machine Learning and Applications and Workshops (ICMLA), 2011 10th International Conference on, volume 1, pages 323–326. IEEE, 2011.
- 16. LeCun Yann, Bottou Léon, Bengio Yoshua, and Haffner Patrick. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.
- 17. Krizhevsky Alex, Sutskever Ilya, and Hinton Geoffrey E. Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems, pages 1097–1105, 2012.
- 18. Turaga Srinivas C, Murray Joseph F, Jain Viren, Roth Fabian, Helmstaedter Moritz, Briggman Kevin, Denk Winfried, and Seung H Sebastian. Convolutional networks can learn to generate affinity graphs for image segmentation. Neural computation, 22(2):511–538, 2010.
- 19. Hazlett Heather Cody, Gu Hongbin, Munsell Brent C, Kim Sun Hyung, Styner Martin, Wolff Jason J, Elison Jed T, Swanson Meghan R, Zhu Hongtu, Botteron Kelly N, et al. Early brain development in infants at high risk for autism spectrum disorder. Nature, 542(7641):348, 2017.
- 20. Pan Sinno Jialin and Yang Qiang. A survey on transfer learning. IEEE Transactions on knowledge and data engineering, 22(10):1345–1359, 2010.
- 21. Donoho David L and Elad Michael. Optimally sparse representation in general (nonorthogonal) dictionaries via l1 minimization. Proceedings of the National Academy of Sciences, 100(5):2197–2202, 2003.
- 22. Mairal Julien, Bach Francis, Ponce Jean, and Sapiro Guillermo. Online dictionary learning for sparse coding. In Proceedings of the 26th Annual International Conference on Machine Learning, ICML '09, pages 689–696, New York, NY, USA, 2009. ACM.
- 23. Lin Binbin, Li Qingyang, Sun Qian, Lai Ming-Jun, Davidson Ian, Fan Wei, and Ye Jieping. Stochastic coordinate coding and its application for drosophila gene expression pattern annotation. arXiv preprint arXiv:1407.8147, 2014.
- 24. Zhang Jie, Shi Jie, Stonnington Cynthia, Li Qingyang, Gutman Boris A, Chen Kewei, Reiman Eric M, Caselli Richard, Thompson Paul M, Ye Jieping, et al. Hyperbolic space sparse coding with its application on prediction of alzheimer's disease in mild cognitive impairment. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 326–334. Springer, 2016.
- 25. Zhang Jie, Stonnington Cynthia, Li Qingyang, Shi Jie, Bauer Robert J, Gutman Boris A, Chen Kewei, Reiman Eric M, Thompson Paul M, Ye Jieping, et al. Applying sparse coding to surface multivariate tensor-based morphometry to predict future cognitive decline. In Biomedical Imaging (ISBI), 2016 IEEE 13th International Symposium on, pages 646–650. IEEE, 2016.
- 26. Zhang Daoqiang, Shen Dinggang, Alzheimer's Disease Neuroimaging Initiative, et al. Multi-modal multi-task learning for joint prediction of multiple regression and classification variables in alzheimer's disease. NeuroImage, 59(2):895–907, 2012.
- 27. Zhang Wenlu, Li Rongjian, Zeng Tao, Sun Qian, Kumar Sudhir, Ye Jieping, and Ji Shuiwang. Deep model based transfer and multi-task learning for biological image analysis. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1475–1484. ACM, 2015.
- 28. Deng Jia, Dong Wei, Socher Richard, Li Li-Jia, Li Kai, and Li Fei-Fei. Imagenet: A large-scale hierarchical image database. In Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on, pages 248–255. IEEE, 2009.
- 29. Kermany Daniel S, Goldbaum Michael, Cai Wenjia, Valentim Carolina CS, Liang Huiying, Baxter Sally L, McKeown Alex, Yang Ge, Wu Xiaokang, Yan Fangbing, et al. Identifying medical diagnoses and treatable diseases by image-based deep learning. Cell, 172(5):1122–1131, 2018.
- 30. Tibshirani Robert. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society. Series B (Methodological), pages 267–288, 1996.
- 31. Weiner Michael W, Veitch Dallas P, Aisen Paul S, Beckett Laurel A, Cairns Nigel J, Green Robert C, Harvey Danielle, Jack Clifford R, Jagust William, Liu Enchi, et al. The alzheimer's disease neuroimaging initiative: a review of papers published since its inception. Alzheimer's & Dementia, 9(5):e111–e194, 2013.
- 32. Canutescu Adrian A and Dunbrack Roland L. Cyclic coordinate descent: A robotics algorithm for protein loop closure. Protein science, 12(5):963–972, 2003.
- 33. Zhang Tong. Solving large scale linear prediction problems using stochastic gradient descent algorithms. In Proceedings of the twenty-first international conference on Machine learning, page 116. ACM, 2004.
- 34. Combettes Patrick L and Wajs Valérie R. Signal recovery by proximal forward-backward splitting. Multiscale Modeling & Simulation, 4(4):1168–1200, 2005.
- 35. Duchesne Simon, Caroli Anna, Geroldi Cristina, Collins D Louis, and Frisoni Giovanni B. Relating one-year cognitive change in mild cognitive impairment to baseline mri features. NeuroImage, 47(4):1363–1370, 2009.
- 36. Jack Clifford R Jr, Bernstein Matt A, Fox Nick C, Thompson Paul, Alexander Gene, Harvey Danielle, Borowski Bret, Britson Paula J, Whitwell Jennifer L, Ward Chadwick, et al. The alzheimer's disease neuroimaging initiative (adni): Mri methods. Journal of Magnetic Resonance Imaging: An Official Journal of the International Society for Magnetic Resonance in Medicine, 27(4):685–691, 2008.
- 37. Patenaude Brian, Smith Stephen M, Kennedy David N, and Jenkinson Mark. A bayesian model of shape and appearance for subcortical brain segmentation. Neuroimage, 56(3):907–922, 2011.
- 38. Lorensen William E and Cline Harvey E. Marching cubes: A high resolution 3d surface construction algorithm. In Proceedings of the 14th Annual Conference on Computer Graphics and Interactive Techniques, SIGGRAPH '87, pages 163–169, New York, NY, USA, 1987. ACM.
- 39. Wang Yalin, Song Yang, Rajagopalan Priya, An Tuo, Liu Krystal, Chou Yi-Yu, Gutman Boris, Toga Arthur W, Thompson Paul M, Alzheimer's Disease Neuroimaging Initiative, et al. Surface-based tbm boosts power to detect disease effects on the brain: an n=804 adni study. Neuroimage, 56(4):1993–2010, 2011.
- 40. Han Xiao, Xu Chenyang, and Prince Jerry L. A topology preserving level set method for geometric deformable models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(6):755–768, 2003.
- 41. Wang Yalin, Chan Tony F, Toga Arthur W, and Thompson Paul M. Multivariate tensor-based brain anatomical surface morphometry via holomorphic one-forms. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 337–344. Springer, 2009.
- 42. Fischl Bruce. Freesurfer. Neuroimage, 62(2):774–781, 2012.
- 43. Chung Moo K, Dalton Kim M, and Davidson Richard J. Tensor-based cortical surface morphometry via weighted spherical harmonic representation. IEEE transactions on medical imaging, 27(8):1143–1151, 2008.
- 44. Jia Yangqing, Shelhamer Evan, Donahue Jeff, Karayev Sergey, Long Jonathan, Girshick Ross, Guadarrama Sergio, and Darrell Trevor. Caffe: Convolutional architecture for fast feature embedding. arXiv preprint arXiv:1408.5093, 2014.
- 45. Zhou Jiayu, Liu Jun, Narayan Vaibhav A, and Ye Jieping. Modeling disease progression via multi-task learning. Neuroimage, 78:233–248, September 2013.
- 46. Suzuki Kenji. Overview of deep learning in medical imaging. Radiological physics and technology, 10(3):257–273, 2017.