Abstract
In this paper, we provide an extensive overview of machine learning techniques applied to structural magnetic resonance imaging (MRI) data to obtain clinical classifiers. We specifically address practical problems commonly encountered in the literature, with the aim of helping researchers improve the application of these techniques in future works. Additionally, we survey how these algorithms are applied to a wide range of diseases and disorders (e.g. Alzheimer's disease (AD), Parkinson's disease (PD), autism, multiple sclerosis, traumatic brain injury, etc.) in order to provide a comprehensive view of the state of the art in different fields.
Keywords: Neuroimaging, Structural magnetic resonance imaging, Machine learning, Predictive modeling, Alzheimer, Autism, Multiple sclerosis, Parkinson, SVMs, Ensembling, Cross-validation
Highlights
• Machine learning is incredibly useful as a tool to build predictive systems using structural MRI.
• The whole data analysis pipeline is not always correctly implemented, which may yield wrong results.
• We review how the field has applied these techniques with all its shortcomings, and propose improvements.
1. Introduction
Machine learning (ML) algorithms (Kotsiantis et al., 2007, Kotsiantis et al., 2006) are currently employed in an extensive range of fields, including e-mail filtering (Guzella and Caminhas, 2009), movie recommendations (Park et al., 2012), and energy grid maintenance (Rudin et al., 2012), to cite a few. In general, supervised ML consists of algorithms capable of generalizing rules or patterns from a labeled set of input data, and using that knowledge to generate predictions or classifications on data not seen before (Kotsiantis et al., 2007). The field of neuroscience has also greatly benefited from ML. For years, ML algorithms have been widely used to build classifiers or predictors for a wide range of diseases using magnetic resonance imaging (MRI) information as input features. These inputs can be structural gray matter (GM) readings, obtained from cortical thickness (CT) (Ad-Dab'bagh et al., 2006; Fischl and Dale, 2000) or GM density (GMd) values from voxel-based morphometry (VBM) (Ashburner and Friston, 2000), microstructural changes in the white matter (WM) from diffusion-weighted imaging (DWI), such as fractional anisotropy (FA) (Mandl et al., 2008), connectivity matrices (Iturria-Medina et al., 2007), parameters derived from network analyses (Iturria-Medina, 2013; Rubinov and Sporns, 2010; Zeighami et al., 2015), or resting/task state fMRI information (Pereira et al., 2009). These values can be obtained per voxel or averaged over anatomical regions using atlases to reduce feature dimensionality. Once the imaging features have been computed, they are fed into the ML algorithm of choice in order to learn disease patterns.
Here, we present a review of publications that use structural MRI data, including DWI techniques, to build classifiers aimed both at a) predicting a given clinical state and b) extracting brain regions related to the disease of interest. As certain generalizations can be made across modalities, in some cases we refer to fMRI studies, though they will not be the main subject of this work. Readers interested in the intersection between ML and fMRI should refer to (Haynes, 2015; Pereira et al., 2009; Schrouff, 2013). While other modalities (PET, EEG, MEG) can also be used either in isolation or in conjunction with MRI data, we only focus on structural MRI, as it already offers considerable morphological findings.
While there are many studies devoted to finding group-level differences, such differences do not necessarily imply accurate predictions and may not be very informative when it comes to predicting the clinical outcome of individual subjects (Davatzikos, 2004; Iturria-Medina, 2013; Lo et al., 2015). Furthermore, the clinical utility of imaging metrics should be assessed by their predictive power on new data samples (Gabrieli et al., 2015; Libero et al., 2015). As we want to center this review on studies that provide predictive classification, we do not include papers that only provide correlational analyses. Following the three different definitions of the term prediction detailed in Gabrieli et al. (2015) (section Analytic Approaches: From Correlation to Individualized Prediction), we focus on the third, in which the goodness of the method is tested on out-of-sample predictions (i.e. data that has not been used for training the model). This definition also includes cross-validation techniques, where the reported accuracy rates are more likely to generalize to out-of-sample data. In addition, this review focuses on ML techniques that work with relatively small feature sets (compared to the number of image voxels) which require feature extraction. We acknowledge that there are ML approaches that do not necessarily need this feature extraction step, such as deep learning classifiers (Deng et al., 2014; LeCun et al., 2015), in which both feature extraction and classifier learning are incorporated into a unified framework (Betechuoh et al., 2006; F. Li et al., 2014a, Li et al., 2014b; Liu et al., 2014; Payan and Montana, 2015; Suk et al., 2014, Suk et al., 2015; Suk and Shen, 2013; Vincent et al., 2008). However, such techniques generally require much larger datasets and more computational power, and present interpretability challenges such that they are typically regarded as black boxes. For these reasons, they are not included in this review.
Finally, a detailed explanation of the different ML algorithms is outside the scope of this paper. Support vector machines (SVMs) and linear discriminants have been explained in detail in existing reviews (Lemm et al., 2011; Pereira et al., 2009). For other algorithms such as logistic regression or random forests, and for ML techniques in general, refer to Hastie et al. (2009) (https://web.stanford.edu/~hastie/ElemStatLearn/). A more introductory version of that text (James et al., 2013) is also available at http://www-bcf.usc.edu/~gareth/ISL/.
2. From imaging to prediction: an overview
This section provides a brief summary of the steps involved in the development of a predictive ML model using raw imaging data as input features.
2.1. Image processing
Data coming from imaging studies needs to be processed in order to be used as input for ML systems. This process, here referred to as feature extraction, typically takes place in three steps (see Fig. 1 for a schematic diagram):
1. Raw images are processed to extract quantitative information. Structural T1 images can be used as the input for CIVET (Ad-Dab'bagh et al., 2006), FreeSurfer (Fischl, 2012), MINC (Aubert-Broche et al., 2013), or SPM (Penny et al., 2011) software packages, in order to extract CT per surface vertex (CIVET and FreeSurfer) or GMd per image voxel (SPM). Such processing steps generally include denoising (Manjón et al., 2010; Power et al., 2014; Wink and Roerdink, 2004), intensity inhomogeneity correction (Sled et al., 1998; Tustison et al., 2010; Vovk et al., 2007), and image intensity normalization. The images are then registered to an average brain atlas (e.g. MNI-ICBM152) (Fonov et al., 2011, Fonov et al., 2009). Tissue or structure segmentation or cortical surface extractions are then performed using these preprocessed and normalized images in the standardized space. DWI sequences (Iturria-Medina et al., 2007; Smith et al., 2004) can also be processed using available toolboxes to extract measurements of WM microstructural changes, such as FA, mean diffusivity, radial diffusivity, connectivity matrices, and network metrics (Bullmore and Sporns, 2012). In this step, a registration procedure is also typically performed. This registration involves obtaining a series of mathematical mappings to transform the images into the same spatial domain. In other words, regardless of individual morphological differences, registration ensures that region R for a given subject corresponds to the same voxels or vertices (i.e. same spatial locations) as region R for the rest of the population (Hill et al., 2001; Maintz and Viergever, 1998).
2. The computed results (e.g. 3D volumetric matrices, 2D connectivity matrices, 1D vectors of network metrics, etc.) are then flattened in order to obtain a single feature vector per subject by removing spatial information (x, y, z locations per data point) and extracting the numerical values. For instance, if CT values are computed for 40,000 vertex points, a 40,000 × 1 vector is generated, regardless of the position of the vertices within the computed surface. The necessary information to revert the values to their original spatial locations can be stored.
3. Feature vectors from all subjects are then aggregated into an N × M matrix, where N is the number of subjects in the study and M is the length of the feature vector, which can also include information from sources other than imaging (demographics, behavioral, etc.). Finally, the output label containing the clinical states of the subjects is used as the target variable.
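The flattening and aggregation steps can be illustrated with a minimal sketch. The toy 2 × 2 × 2 volumes, the age column, and the function names `flatten` and `build_data_matrix` are hypothetical and chosen only for illustration; real pipelines would typically rely on array libraries and the outputs of the software packages cited above.

```python
def flatten(volume):
    """Flatten a nested 3D list into a 1D feature vector (row-major order)."""
    return [value for plane in volume for row in plane for value in row]


def build_data_matrix(volumes, extra_features=None):
    """Stack one flattened vector per subject into an N x M matrix (list of rows)."""
    rows = []
    for i, volume in enumerate(volumes):
        row = flatten(volume)
        if extra_features is not None:
            # Non-imaging information (here, a hypothetical age) is appended as extra columns.
            row = row + list(extra_features[i])
        rows.append(row)
    return rows


# Hypothetical toy data: two subjects, one tiny "volume" each.
volumes = [
    [[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]],
    [[[2.0, 2.5], [3.5, 4.5]], [[5.5, 6.5], [7.5, 8.5]]],
]
ages = [[71.0], [64.0]]

X = build_data_matrix(volumes, ages)  # N = 2 subjects, M = 8 voxels + 1 demographic
y = [1, 0]                            # hypothetical clinical labels, one per subject
```

Because the flattening order is fixed, the index of each column is enough to map a feature back to its original spatial location.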
Fig. 1.
Image processing workflow, from the raw datasets to final input matrix for the ML system. This example assumes two different MRI modalities are used: structural T1 and DTI. The complete pipeline, from image to data input matrix, involves 3 steps: a) image processing to obtain quantitative information (e.g. CT surfaces, FA volumes, or connectivity matrices); b) removal of spatial information (flattening) to obtain single feature vectors per subject; and c) aggregation of all feature vectors into a single data matrix. A corresponding label output vector contains the classification target (e.g. the clinical state) for each subject. This process can involve more modalities such as PET, CSF, rs-fMRI, EEG, genetic, and behavioral information, but the final aggregated product would be similar.
2.2. Building a predictive model
This subsection provides a summary on how to apply ML algorithms to processed data, such as the data matrix obtained in the previous step, as well as brief comments on potential pitfalls/aspects that might prove useful in practice. For more extensive reviews, see (Lemm et al., 2011). An ML classification algorithm is an a priori unknown function that relates a set of inputs with an output label (in this case, the clinical status of the subjects that form the sample). That function is then trained on a set of known data to obtain the parameters that relate the input vectors to categorical output values, therefore producing a classification output. This process is not sufficient by itself, as the classifier needs to be tested on a dataset not used during the training phase. Since imaging data is generally scarce, it is not common to have testing data reserved. Instead, cross-validation (Duda et al., 2001; Hastie et al., 2009) is typically used: the full dataset is split into N different folds, of which N − 1 are assigned for training and the remaining one for testing. The algorithm is trained and an accuracy score (e.g. percent of correctly classified subjects, sensitivity, specificity or other suitable metrics) is reported on the test set. The process is repeated until each fold has been assigned once to the test set to obtain an overall accuracy score. If the number of folds is equal to the number of subjects in the sample, this process is called leave-one-out cross-validation, as each subject is tested individually.
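The cross-validation procedure can be sketched as follows. This is only an illustrative example: the nearest-centroid classifier stands in for whichever algorithm is actually used, and the toy data and all function names are assumptions made for this sketch.

```python
import random


def k_fold_indices(n_samples, n_folds, seed=0):
    """Shuffle subject indices and deal them into n_folds disjoint folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[f::n_folds] for f in range(n_folds)]


def fit_centroids(X, y):
    """Train a nearest-centroid classifier: one mean feature vector per class."""
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def predict(centroids, x):
    """Return the label of the closest class centroid (squared Euclidean distance)."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def cross_validate(X, y, n_folds):
    """Each fold is used exactly once as the test set; returns overall accuracy."""
    correct = 0
    for test_idx in k_fold_indices(len(X), n_folds):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        model = fit_centroids([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct += sum(predict(model, X[i]) == y[i] for i in test_idx)
    return correct / len(X)


# Hypothetical toy data: two well-separated groups of three subjects each.
X = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [1.0, 0.9], [0.9, 1.1], [1.1, 1.0]]
y = [0, 0, 0, 1, 1, 1]

acc_3fold = cross_validate(X, y, n_folds=3)
acc_loo = cross_validate(X, y, n_folds=len(X))  # leave-one-out
```

Setting the number of folds to the number of subjects yields the leave-one-out case described above.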
2.3. Model ensembling and stacking
Ensembling and stacking techniques make it possible to combine different models (and even several instances of the same model, with different initialization parameters) in order to achieve higher accuracies and, at the same time, reduce the probability of overfitting (Hastie et al., 2009). Ensembling refers to the combination of predictions by (weighted) averaging of their results, or by using a voting scheme (Caruana et al., 2004). On the other hand, model stacking uses the output from different classifiers as the input of another algorithm which yields the final classification score (Džeroski and Ženko, 2004). This last algorithm can be any of the ML algorithms whose results are being merged, or a completely different one.
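The difference between the two strategies can be made concrete with a short sketch. The base-model outputs below are hypothetical numbers rather than results from any reviewed study, and the function names are illustrative only.

```python
from collections import Counter


def majority_vote(model_predictions):
    """Ensembling by voting: most frequent label per subject across models."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*model_predictions)]


def weighted_average(model_probabilities, weights):
    """Ensembling by weighted averaging of per-subject class probabilities."""
    total = sum(weights)
    return [
        sum(w * p for w, p in zip(weights, probs)) / total
        for probs in zip(*model_probabilities)
    ]


def stacking_features(model_probabilities):
    """Stacking: base-model outputs become the input features of a second-level model."""
    return [list(probs) for probs in zip(*model_probabilities)]


# Hypothetical outputs of three base classifiers for three subjects.
labels = [[1, 0, 1], [1, 1, 0], [0, 0, 1]]
voted = majority_vote(labels)

# Hypothetical probabilities from two base classifiers for two subjects.
probabilities = [[0.9, 0.2], [0.6, 0.4]]
averaged = weighted_average(probabilities, weights=[3, 1])
meta_X = stacking_features(probabilities)  # one row per subject, one column per model
```

In the stacking case, the second-level model would then be trained on `meta_X`; in practice these inputs are usually produced from out-of-fold predictions so that the second-level model does not see outputs for subjects the base models were trained on.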
While these approaches may yield better and more robust results than any of the best models individually, it has to be taken into account that the interpretability of the resulting classifier might not be as straightforward as it would normally be with a single model. As mentioned previously, in this field, accuracy rates are important, but so is the interpretability of the biological causes of the different diseases or disorders, such as which regions are particularly relevant for a given classification task. As a result, one might opt for a single model with slightly lower accuracy in favor of higher interpretability. In some cases, an ensemble approach may also provide feature importance as the output. For instance, random forests are by themselves an ensembling approach (a combination of individual decision trees).
3. Practical issues
Missteps in performing cross-validation commonly lead to overly optimistic error rates (i.e. the classifier is reported to do better than it actually does). Thus, this step should be implemented with extensive care. In the following section, we comment on details that need to be taken into consideration when implementing cross-validation loops in ML pipelines. The optimal workflow for building a robust ML classifier is depicted in Fig. 2 (Gabrieli et al., 2015). For more information, see Appendix A from (Plitt et al., 2015).
Fig. 2.
Optimal workflow for constructing a classifier or predictor. Splitting the data into N folds using a cross-validation approach is not the only step required to ensure generalizability. Internal cross-validation loops are necessary to obtain a subset of relevant features (if feature selection is needed) and to tune model hyperparameters (e.g. C in Gaussian SVMs, number of neurons in neural networks, or number of trees in random forests). Performing these steps on the full sample will result in an excessively optimistic classifier. Additionally, the cross-validation evaluation could be enhanced by performing permutation tests (Golland and Fischl, 2003; Ojala and Garriga, 2010). Figure reproduced with permission from the original (Gabrieli et al., 2015).
3.1. Feature preprocessing
Once the data matrix has been formed, it can be beneficial to perform an initial feature preprocessing before proceeding with the main ML pipeline. As certain algorithms expect the features to represent data in the same scale and with a certain distribution, it is common to perform a centering and scaling operation: each continuous variable is replaced by new values, obtained from subtracting the original mean and dividing by the original standard deviation (i.e. creating variables with mean = 0 and standard deviation = 1). During this phase, dimensionality reduction algorithms can be used, such as principal component analysis (PCA) or independent component analysis (ICA) (Duda et al., 2001; Hastie et al., 2009). While ICA is frequently used in fMRI data analysis, few studies use these techniques in the literature included in this review. This may be due to the fact that PCA and similar methods yield new variables which are linear combinations of the original ones, and hence come at the cost of reduced interpretability of the features. Other more complex feature selection techniques such as sparse feature selection can also be used in this step, depending on the specific application and dataset (Ahsen et al., 2017; Z. Li et al., 2014a, Li et al., 2014b; Tan et al., 2010).
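The centering and scaling operation can be sketched as below. The sketch assumes that the means and standard deviations are estimated on one set of subjects and then reapplied unchanged to any other subject, that the population standard deviation is used, and that constant columns are left unscaled; these are illustrative choices, and the function names are hypothetical.

```python
import statistics


def fit_scaler(X_train):
    """Estimate per-feature mean and standard deviation from the given subjects."""
    columns = list(zip(*X_train))
    means = [statistics.fmean(col) for col in columns]
    # A constant column has zero deviation; divide by 1.0 instead to avoid errors.
    stds = [statistics.pstdev(col) or 1.0 for col in columns]
    return means, stds


def transform(X, means, stds):
    """Replace each value by (value - mean) / standard deviation."""
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in X]


# Hypothetical toy data: two training subjects, one new subject.
X_train = [[1.0, 10.0], [3.0, 10.0]]
X_new = [[5.0, 12.0]]

means, stds = fit_scaler(X_train)
Z_train = transform(X_train, means, stds)
Z_new = transform(X_new, means, stds)
```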
Depending on the application, more specific preprocessing steps may be performed, especially when a large confounding effect is encountered. Building classifiers to differentiate AD versus healthy controls, Dukart et al. found that misclassified patients were younger than misclassified control subjects (Dukart et al., 2011). Removing age-related effects from the input VBM data improved accuracy by approximately 2%. A slightly larger effect (5%) was later observed using the same technique applied to mild cognitive impairment (MCI) subjects when predicting their conversion status to AD (Moradi et al., 2015).
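One simple way to remove an age-related effect from a feature is linear detrending, sketched below. This is an illustrative assumption and not necessarily the exact procedure of the cited studies: the slope is estimated by ordinary least squares on a reference group, and the function names and toy values are hypothetical.

```python
def fit_age_slope(ages, values):
    """Least-squares slope of a feature against age, plus the mean (reference) age."""
    mean_age = sum(ages) / len(ages)
    mean_value = sum(values) / len(values)
    covariance = sum((a - mean_age) * (v - mean_value) for a, v in zip(ages, values))
    variance = sum((a - mean_age) ** 2 for a in ages)
    return covariance / variance, mean_age


def remove_age_effect(ages, values, slope, reference_age):
    """Shift each value to what the linear model predicts at the reference age."""
    return [v - slope * (a - reference_age) for a, v in zip(ages, values)]


# Hypothetical reference group: the feature decreases linearly with age.
ages = [60.0, 70.0, 80.0]
values = [3.0, 2.5, 2.0]

slope, reference_age = fit_age_slope(ages, values)
corrected = remove_age_effect(ages, values, slope, reference_age)
```

In this toy case the feature is perfectly explained by age, so all corrected values coincide; with real data, the residual variation is what the classifier would work with.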
3.2. Feature selection and hyperparameter tuning
The result of the image processing step typically consists of data matrices of relatively small numbers of rows (corresponding to subjects) with significantly larger numbers of columns (corresponding to different variables), sometimes several orders of magnitude higher (e.g. several hundreds of subjects, at best, and thousands or tens of thousands of variables). These variables can be CT, GMd, or VBM measures for each voxel, or FA values in the WM. For instance, CT values extracted using the CIVET software (Ad-Dab'bagh et al., 2006) consist of more than 160,000 vertices per subject if high-resolution surfaces are used. In order to initially reduce the number of features, from thousands to just a few hundreds, it is common to use ROI-based approaches: voxels or surfaces are averaged over regions defined by a brain atlas, such as AAL (Tzourio-Mazoyer et al., 2002) or DKT (Klein and Tourville, 2012). Note that this averaging might result in losing potential differences in cases where the defined regions are too large (Dyrba et al., 2015).
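The ROI-based reduction amounts to averaging all values that share an atlas label. A minimal sketch follows, assuming a flattened feature vector and a parallel list of atlas labels; the region names `roi_a` and `roi_b` are placeholders, not labels from AAL or DKT.

```python
from collections import defaultdict


def roi_average(values, atlas_labels):
    """Average voxel- or vertex-wise values within each atlas region."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for value, label in zip(values, atlas_labels):
        sums[label] += value
        counts[label] += 1
    return {label: sums[label] / counts[label] for label in sorted(sums)}


# Hypothetical toy data: five vertices assigned to two regions.
values = [2.0, 4.0, 1.0, 2.0, 3.0]
atlas_labels = ["roi_a", "roi_a", "roi_b", "roi_b", "roi_b"]

roi_features = roi_average(values, atlas_labels)
```

Five input features are reduced to two, at the cost of hiding any difference that exists only within a region.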
As is well known, not all diseases affect every brain region, and not always in the same way. Therefore, some of the input variables might not be related to the output labels and some of them may contain information already conveyed by other features. Reducing the number of irrelevant and redundant variables both reduces the computational time and improves generalization (Dash and Liu, 1997; Guyon and Elisseeff, 2003; Moradi et al., 2015). In the field of neuroscience, feature selection is relevant not only because it helps to achieve higher accuracy rates (Ad-Dab'bagh et al., 2006), but also, and mainly, because it makes it possible to investigate which features are relevant for the specific classification problem of interest, offering an insight into the underlying brain regions that account for group differences (Plitt et al., 2015). This interpretation can make ML results complementary to those obtained by more classical inferential approaches. From this point of view, it is also important to note that some ML algorithms (e.g. linear SVMs and random forests) assign to each variable a weight which is directly related to its importance within the model. These weights can then be used to rank the input variables and create maps of brain regions relevant for the classification task, even when no feature selection is performed a priori. If the spatial information for the features has been stored, it is possible to report this feature importance using a parametric map (Fig. 3). In that sense, certain ML algorithms can also be used as feature selection methods (Rakotomamonjy, 2003) in combination with techniques such as Recursive Feature Elimination (RFE) (Kuncheva and Rodríguez, 2010). While SVMs are capable of dealing with multiple irrelevant features (Lemm et al., 2011; Zarogianni et al., 2013), their accuracy is nonetheless diminished compared to an optimal situation in which only relevant features are used (Liu et al., 2012).
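Ranking variables by model weight can be sketched in a few lines. The weights and region names below are hypothetical; the sketch also assumes that features were put on a common scale beforehand, since otherwise the magnitudes of linear weights are not directly comparable.

```python
def rank_features(weights, names):
    """Order features by decreasing absolute weight."""
    order = sorted(range(len(weights)), key=lambda j: abs(weights[j]), reverse=True)
    return [(names[j], weights[j]) for j in order]


# Hypothetical weights of a trained linear model, one per region.
weights = [0.2, -1.5, 0.7]
names = ["roi_a", "roi_b", "roi_c"]

ranking = rank_features(weights, names)
```

The sign of each weight is kept in the output, as it indicates the direction in which the feature pushes the decision.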
Fig. 3.
A) Using a SVM multi-kernel approach, Zhang et al. (2011) found 11 relevant cortical regions for AD classification: left and right amygdala, left and right hippocampal formations, left and right uncus, left entorhinal cortex, left middle temporal gyrus, left temporal lobe, left perirhinal cortex and left parahippocampal gyrus. This assessment of the importance of different features supports the usage of ML techniques in order to understand the biological bases of diseases. B) It is also possible to report variable importance without using the spatial distribution. Figure reproduced with permission from the original (Westman et al., 2012; Zhang et al., 2011).
Within the same family of classifiers, the number of features used may also have an impact. Song et al. found that Gaussian SVMs behaved better than linear SVMs in lower dimensionality problems (fewer features) (Song et al., 2011). Non-linear SVMs can be more prone to overfitting (finding noisy patterns that do not improve generalization) (Wottschel et al., 2015). In such cases, they may behave better (i.e. higher validation accuracy) if the dimensionality of the problem is reduced.
3.2.1. Leakage in cross-validation techniques
Leakage (Johnston et al., 2013; Kuncheva and Rodríguez, 2010; Pereira et al., 2009) is the creation and usage, commonly by accident, of variables that carry information about the outcome of the problem (the classification labels, in our case). Leakage generally occurs during feature selection if the entire dataset is used to identify potentially informative variables outside of the cross-validation loop.
A rule of thumb can be established to detect leakage. Consider the case of leave-one-out cross-validation, in which the ith case, denoted Xi (intuitively, the ith row in the input matrix from Fig. 1), is kept as the test set and the rest is used as the training set for that case. In that scheme, involving the label for the test case (yi) in any step during training would be leakage; yi should only be used when evaluating the accuracy of the model (Lemm et al., 2011). This includes somewhat common procedures such as performing t-tests, correlations or more advanced feature selection techniques on the entire sample in order to identify features strongly related to the output label before proceeding with the cross-validation. Selecting variables with the highest variance from the whole sample, on the other hand, would not be considered leakage, since output labels are not used. Sections 7.10 (Cross-Validation) and 7.10.2 (The Wrong and Right Way to Do Cross-validation) of Hastie et al. (2009) provide an overview of cross-validation. Ambroise and McLachlan (2002) also provide extensive comments on feature selection for microarray gene-expression data, a quintessential example in which the number of samples is much lower than the number of features. This dimensionality problem is no different from the one encountered in the neuroimaging field, where similar precautions may be applied.
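The rule of thumb can be illustrated with a leave-one-out loop in which feature selection is repeated inside each iteration, so that the label of the held-out subject is never involved. The selection criterion (gap between class means), the nearest-class-mean classifier, the toy data, and all function names are assumptions made for this sketch.

```python
def class_mean_gap(X, y, j):
    """Absolute difference between the two class means of feature j."""
    pos = [row[j] for row, label in zip(X, y) if label == 1]
    neg = [row[j] for row, label in zip(X, y) if label == 0]
    return abs(sum(pos) / len(pos) - sum(neg) / len(neg))


def select_features(X_train, y_train, k):
    """Indices of the k features with the largest gap, using training data only."""
    ranked = sorted(range(len(X_train[0])),
                    key=lambda j: class_mean_gap(X_train, y_train, j),
                    reverse=True)
    return sorted(ranked[:k])


def nearest_mean_label(X_train, y_train, x, feats):
    """Predict the class whose mean over the selected features is closest to x."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(y_train)):
        rows = [row for row, t in zip(X_train, y_train) if t == label]
        dist = sum((x[j] - sum(r[j] for r in rows) / len(rows)) ** 2 for j in feats)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def loo_with_inner_selection(X, y, k):
    """Leave-one-out loop; y[i] is used only to score the prediction for subject i."""
    hits, chosen = 0, []
    for i in range(len(X)):
        X_tr, y_tr = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        feats = select_features(X_tr, y_tr, k)  # the held-out label is not seen here
        chosen.append(feats)
        hits += nearest_mean_label(X_tr, y_tr, X[i], feats) == y[i]
    return hits / len(X), chosen


# Hypothetical toy data: feature 0 is informative, features 1 and 2 are noise.
X = [[0.1, 0.5, 0.3], [0.0, 0.4, 0.6], [0.2, 0.6, 0.4],
     [1.0, 0.5, 0.5], [0.9, 0.3, 0.4], [1.1, 0.6, 0.2]]
y = [0, 0, 0, 1, 1, 1]

accuracy, chosen = loo_with_inner_selection(X, y, k=1)
```

Calling `select_features(X, y, k)` once on the whole sample before the loop would be the leaking variant, since every test label would then contribute to the choice of features. Recording `chosen` per iteration also allows the stability of the selected features to be reported.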
It is not rare to detect leakage, as we will explore in the following sections. While this does not invalidate the reported findings, it makes comparison of the results difficult, as the reported accuracies will likely be overly optimistic.
Similar considerations should also be applied to hyperparameter tuning (i.e. the inherent parameters of ML algorithms, such as C in Gaussian SVMs, number of neurons in neural networks, or number of trees in random forests). To avoid artificially inflating the accuracy, this procedure also has to be done in an inner loop within each cross-validation fold; otherwise, model selection would be based on the entire sample. As in the case of feature selection, it is also possible to report summary statistics about optimal hyperparameter values (Cuingnet et al., 2011).
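A nested loop for hyperparameter tuning can be sketched as follows. The k-nearest-neighbours classifier and its number of neighbours stand in for the algorithm and hyperparameter of interest; they, the toy data, and the function names are assumptions made for illustration.

```python
def knn_predict(X_train, y_train, x, k):
    """Majority label among the k training subjects closest to x."""
    nearest = sorted(
        range(len(X_train)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(X_train[i], x)),
    )[:k]
    votes = [y_train[i] for i in nearest]
    return max(sorted(set(votes)), key=votes.count)


def loo_accuracy(X, y, k):
    """Leave-one-out accuracy for a fixed hyperparameter value (inner loop)."""
    hits = 0
    for i in range(len(X)):
        hits += knn_predict(X[:i] + X[i + 1:], y[:i] + y[i + 1:], X[i], k) == y[i]
    return hits / len(X)


def nested_loo(X, y, candidate_ks):
    """Outer loop estimates accuracy; inner loop tunes k on the training part only."""
    hits, chosen = 0, []
    for i in range(len(X)):
        X_tr, y_tr = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        best_k = max(candidate_ks, key=lambda k: loo_accuracy(X_tr, y_tr, k))
        chosen.append(best_k)
        hits += knn_predict(X_tr, y_tr, X[i], best_k) == y[i]
    return hits / len(X), chosen


# Hypothetical toy data: one feature, two well-separated groups.
X = [[0.0], [0.1], [0.2], [0.3], [1.0], [1.1], [1.2], [1.3]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

accuracy, chosen_ks = nested_loo(X, y, candidate_ks=[1, 3])
```

The list `chosen_ks` holds the value selected in each outer iteration, which is the kind of information that can be summarized when reporting optimal hyperparameter values.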
3.2.2. Bias-variance trade-off
The relationship between the sample size (i.e. the number of subjects) and the dimensionality of the problem (i.e. the number of features) has been extensively studied in the literature (Hastie et al., 2009; Hughes, 1968; Kanal and Chandrasekaran, 1971; McKnight et al., 2002). As mentioned previously, the number of features extracted from MR images is generally much larger than the sample size. In such high-dimensional cases, if the model parameters are estimated to fit the data without any form of regularization (e.g. PCA), there will be a high likelihood of overfitting to the training data, and consequently poor generalization to out-of-sample test data (Hastie et al., 2009). On the other hand, too much regularization (e.g. using a very small number of features) might lead to underfitting, i.e. not using all the available information in the data. Determining the optimal amount of regularization is a bias-variance problem that depends on the sample size and the specific task of interest; a high variance leads to overfitting, while a high bias leads to underfitting (Raudys and Jain, 1991). For more information on model selection, see Varoquaux et al. (2017).
4. Machine learning applied to structural neuroimaging
In the following subsections, we will discuss in more depth works that report classifiers built for specific diseases or disorders. In a few cases, publications that deal with non-categorical output variables (such as ADOS scores in autism) have also been included, but they are the exception. Classification accuracy is defined as the percentage of correct predictions; i.e. the sum of true positive and true negative predictions divided by the total number of predictions. For consistency, all accuracy scores are reported as a number between 0 and 1 (e.g. 0.67) instead of a percentage.
4.1. Alzheimer's disease/mild cognitive impairment
Alzheimer's disease (AD) is a progressive neurodegenerative disorder leading to mild cognitive impairment (MCI) and dementia. The increased knowledge of the clinical manifestations and the complex biology of AD has led to the redefinition of the different disease stages in 2011 (Albert et al., 2011; McKhann et al., 2011).
Although MCI was traditionally considered a risk factor for developing AD (Boyle et al., 2006), it has now been proposed that MCI patients who progress to AD should be reclassified as prodromal AD. Conversely, patients who do not progress to dementia and do not show common biomarkers of AD should be considered MCI patients (Dubois et al., 2010). Here, in order to be consistent with the terminology used in the reviewed publications, we maintain the traditional terminology, referring to patients who progress to AD dementia as MCI converted (MCI-c) and to those who do not as MCI non-converted (MCI-nc). Also, only classification studies of sporadic AD have been reviewed.
Typically, works on this topic have been focused on three different classification targets: a) AD patients vs. healthy controls; b) AD or controls vs. patients with MCI; and c) identification of MCI patients that will progress to AD within a certain time period (MCI-nc vs. MCI-c). The first two classification tasks address disease diagnosis, whereas the third addresses prognosis of the likely course of the disease. AD vs. MCI classification is by itself a more difficult problem than AD vs. controls (see Fig. 2 from (Zhang et al., 2011)), as MCI diagnosis sometimes sits in a gray area (Iturria-Medina, 2013) and can be easily confounded with either mild AD or healthy controls. It is even more challenging to predict which MCI patients will progress to AD within a certain time window (typically ranging between 6 months and 3 years) and which will remain stable (Wee et al., 2012; Westman et al., 2012). Table 1 provides a summary of these papers, including input data modality, the algorithm used and achieved accuracies. In the following, relevant aspects from some of the listed works are discussed.
Table 1.
Summary of the classification papers in Alzheimer's disease. Unless otherwise noted, reported accuracy rates are the highest found in the paper for different groups, methods and input modalities.
| Reference | Groups (N) | Method | Input Modalities | Accuracy | Comments |
|---|---|---|---|---|---|
| (Klöppel et al., 2008) | Controls-AD (several groups) | SVM (linear) | T1 | Up to 0.964 | Independent samples for training and testing |
| (Moradi et al., 2015) | MCI-c (100) - MCI-nc (164) | SVM | T1, cognitive | 0.66 | Features selected on independent AD vs. Control samples. |
| | | LDS | | 0.745 | |
| (Dyrba et al., 2015) | Controls (25) – AD (28) | SVM (Gaussian) | rs-fMRI | 0.74 | AUC = 0.8 |
| | | | DTI | 0.85 | AUC = 0.87 |
| | | | T1 | 0.81 | AUC = 0.6 |
| | | SVM (multi-kernel) | rs-fMRI, DTI, T1 | 0.79 | AUC = 0.82 |
| | | | DTI, T1 | 0.85 | AUC = 0.89 |
| (Cuingnet et al., 2011) | Controls (162) – AD (137) | Multiple classifiers tested, linear SVM had best accuracy. | T1 | 0.82⁎ | MCI conversion or non-conversion at 18 months. Results for MCI-c vs. MCI-nc are non-significant. |
| | Controls (162) – MCI-c (76) | | | 0.76⁎ | |
| | MCI-c (76) – MCI-nc (134) | | | 0.62⁎ | |
| (Desikan et al., 2009) | Controls (143) – MCI (113) | Logistic regression | T1 | 0.823⁎ | Independent samples for training and testing (AUC = 0.95). |
| (Zhang et al., 2011) | Controls (52) – AD (51) | SVM (linear) | T1 | 0.862 | No hyperparameter search. Fixed C = 1. |
| | Controls (51) – MCI (99) | SVM (multikernel) | PET | 0.865 | |
| | | SVM (linear) | CSF | 0.821 | |
| | | SVM (multikernel) | T1, PET, CSF | 0.932 | |
| | | | T1 | 0.72 | |
| | | | PET | 0.716 | |
| | | | CSF | 0.714 | |
| | | | T1, PET, CSF | 0.745 | |
| (Wee et al., 2011) | Controls (17) – MCI (20) | SVM (linear) | DTI | 0.667 | Leakage (features selected on full dataset). |
| | | SVM (Gaussian) | | 0.889 | |
| (Schmitter et al., 2015) | Controls (229)-AD (188) | SVM (linear) | T1 | 0.883⁎ | Leakage (voxels excluded based on statistical assessment on full dataset). Data mixed 1.5 T and 3 T images and was processed with 3 different software packages (FreeSurfer, MorphoBox, SPM). SVM hyperparameter search performed outside cross-validation loop. |
| | Controls (229)-MCI (401) | | | 0.779⁎ | |
| | MCI (401)-AD (188) | | | 0.687⁎ | |
| | MCI-nc (130)-MCI-c (111) (2 y) | | | 0.688⁎ | |
| | MCI-nc (103)-MCI-c (137) (3 y) | | | 0.698⁎ | |
| (Beheshti and Demirel, 2015) | Controls (130) – AD (130) | SVM (linear) | T1 | 0.896 | |
| | | SVM (Gaussian) | | 0.893 | |
| (Dukart et al., 2011) | Controls (79) – AD (80) | SVM (linear) | T1 | 0.832 | Effect of age is removed from data. |
| (Gray et al., 2013) | Controls (35) – AD (37) | Random forest | T1, CSF, genetic, FDG-PET | 0.89 | |
| | Controls – MCI (75) | | | 0.746 | |
| | MCI-nc (41) – MCI-c (34) | | | 0.580 | |
| (Davatzikos et al., 2011) | MCI-nc (170) – MCI-c (69) | SVM (linear kernel) | T1, CSF | 0.734 | Part of the features come from a method trained on Controls vs. AD. |
| (Davatzikos et al., 2008) | Controls (15) – MCI (15) | SVM (linear kernel) | T1 | 0.9 | Longitudinal dataset. |
| (Aguilar et al., 2013) | Controls (110) – AD (116) | OPLS | T1, genetic, demographic | 0.876 | Accuracies for Controls – AD for different classifiers may use different input data. |
| | MCI-nc (98) – MCI-c (21) | SVM (Gaussian kernel) Decision trees | | 0.867 | |
| | | Neural networks | | 0.827 | |
| | | OPLS | | 0.872 | |
| | | SVM (Gaussian kernel) Decision trees | | 0.747 | |
| | | Neural networks | | 0.709 | |
| | | | | 0.674 | |
| | | | | 0.701 | |
| (Oliveira et al., 2010) | Controls (20) – AD (15) | SVM (Gaussian) | T1 | 0.882 | Hyperparameters optimized in the outer loop. Features selected on full dataset. |
| (Westman et al., 2011) | Controls (112) – AD (117) | OPLS | T1 | 0.92⁎ | |
| | Controls (112) – MCI (122) | | | 0.769⁎ | |
| | MCI (122) – AD (117) | | | 0.71⁎ | |
| (Westman et al., 2012) | Controls (111) – AD (96) | OPLS | T1, CSF | 0.918 | |
| | Controls (111) – MCI (162) | | | 0.776 | |
| | MCI-nc (81) – MCI-c (81) | | | 0.685 | |
| (Xu et al., 2015) | Controls (117) – AD (113) | wmSRC | T1, PET | 0.948 | |
| | Controls (117) – MCI (110) | | | 0.745 | |
| | MCI-nc (83) – MCI-c (27) | | | 0.778 | |
| (Wee et al., 2012) | Controls (17) – MCI (10) | SVM (multi-kernel) | DTI, fMRI | 0.963 | Features selected on full dataset. |
| (Nir et al., 2015) | Controls (50) – AD (37) | SVM (Gaussian) | DTI | 0.849 | Features selected on full dataset. |
| | Controls (50) – late MCI (39) | | | 0.79 | |
| (Fan et al., 2008a) | Controls (66) – AD (56) | SVM (linear) | T1 | 0.965 | |
| | Controls (66) – MCI (88) | | | 0.846 | |
| | MCI (88) – AD (56) | | | 0.759 | |
| (Young et al., 2013) | MCI-nc (96) – MCI-c (47) | Gaussian Process | T1, PET, APOE, CSF | 0.643 | Trained on healthy subjects + AD, tested on MCI cohort. |
| (Salvatore et al., 2016) | Controls (162) – AD (137) | SVM | T1 | 0.92 | |
| Controls (162) – MCI-c (76) | 0.86 | ||||
| MCI-nc (134) – MCI-c (76) | 0.73 | ||||
| (Duchesne et al., 2009) | Controls (75) – AD (75) | SVM (linear) | T1 | 0.92 | Other classifiers used (not reported here). |
| (Sørensen et al., 2017) | Controls (282) | LDA | T1 | 0.63 | Multi-class classification results. Winner of CADDementia challenge. More complex classifiers did not improve performance. |
| MCI (283) | |||||
| AD (154) | |||||
| CADDementia Test (354) | |||||
| (Ahmed et al., 2017) | Controls (52) – AD (45) | SVM (Gaussian) | T1, DTI | 0.902 | Use a multiple kernel learning method to combine features from T1, DTI, and CSF. |
| Controls (52) – MCI (58) | 0.794 | ||||
| MCI (58) – AD (45) | 0.766 | ||||
| (Vemuri et al., 2008) | Controls (190) – AD (190) | SVM (linear) | T1 | 0.885 | Model selection and optimization performed on 280 samples and validated on the remaining 100. |
| T1, APOE | 0.893 |
⁎ Indicates that accuracies have been computed from the sensitivity and specificity values reported in the paper (accuracy = sensitivity · prevalence + specificity · (1 − prevalence)). The value for prevalence has been obtained from the number of cases in each group.
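The conversion described in this footnote can be sketched in a few lines. This is only an illustration of the arithmetic: the function name and the numeric values below are hypothetical and are not taken from any of the reviewed papers.

```python
def accuracy_from_sens_spec(sensitivity, specificity, n_patients, n_controls):
    """Overall accuracy implied by sensitivity and specificity.

    Prevalence is taken as the fraction of patients in the sample,
    i.e. it is derived from the number of cases in each group.
    """
    prevalence = n_patients / (n_patients + n_controls)
    return sensitivity * prevalence + specificity * (1.0 - prevalence)


# Hypothetical values, chosen only to illustrate the calculation.
acc = accuracy_from_sens_spec(0.90, 0.80, n_patients=100, n_controls=100)
print(round(acc, 3))
```

With balanced groups the result is simply the average of sensitivity and specificity; with unbalanced groups the larger group dominates the overall accuracy.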
Dyrba et al. used a multimodal approach, with T1, DTI and rs-fMRI as inputs for SVM classifiers (Dyrba et al., 2015). Using only structural T1 information, an accuracy of 0.82 was obtained. The addition of DTI increased the AUC (0.89 vs. 0.86), but no improvement was observed by using all three modalities (accuracy 0.79, AUC 0.82). The authors suggest that this could be due to high levels of noise in the rs-fMRI data, which caused the SVMs to overfit during training. This is an example in which adding features does not necessarily improve validation accuracy. The authors also comment on the hypothetical existence of a ceiling effect which makes it impossible to obtain diagnostic accuracies significantly higher than 0.90. This observation is consistent with the increasingly accepted idea that AD can have a combined etiology (vascular and neuronal), which increases the variability in the burden of vascular or neuronal damage in patients with identical dementia ratings. This theoretical upper limit is well in line with the values obtained for all the other papers analyzed in this review.
Klöppel et al. obtained high accuracy rates (0.811–0.964) when comparing controls to AD patients, using training and testing datasets from different databases (Klöppel et al., 2008). As mixing images from different sites can potentially have a confounding effect (Auzias et al., 2016), this implies robustness of the selected approach and that SVMs are able to generalize well. The same approach was used in Desikan et al. (2009) for controls vs. MCI classification: 49 controls and 48 MCI patients (training) were obtained from the OASIS database (Marcus et al., 2007), and 94 controls and 57 MCI patients (test) came from ADNI (http://adni.loni.usc.edu/). They also obtained high accuracy scores on the test dataset (AUC = 0.95, sensitivity = 0.73, specificity = 0.94).
Changing the SVM kernels (linear vs. Gaussian) in Klöppel et al. had no effect on the outcome (Klöppel et al., 2008), whereas in Wee et al., a linear kernel obtained significantly lower accuracy than a Gaussian kernel (0.67 vs. 0.89) (Wee et al., 2011). However, Wee et al. (2011) suffers from leakage, as they selected features based on the entire dataset, as opposed to Klöppel et al. (2008). Similar leakage problems are also present in other studies, such as Haller et al. (2010) and Plant et al. (2010).
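The leakage problem described above can be illustrated with a small simulation. The sketch below is not the pipeline of any reviewed study: it assumes purely random features, a toy mean-difference feature ranking, and a nearest-centroid classifier (all hypothetical choices), and it compares leave-one-out accuracy when features are selected on the full dataset against selection repeated inside each training fold.

```python
import random


def select_features(X, y, k):
    """Rank features by absolute difference of class means; return top-k indices."""
    n_feat = len(X[0])
    scores = []
    for j in range(n_feat):
        a = [row[j] for row, lab in zip(X, y) if lab == 1]
        b = [row[j] for row, lab in zip(X, y) if lab == 0]
        scores.append(abs(sum(a) / len(a) - sum(b) / len(b)))
    return sorted(range(n_feat), key=lambda j: -scores[j])[:k]


def centroid_predict(X_train, y_train, x, feats):
    """Nearest-centroid prediction restricted to the selected features."""
    def centroid(label):
        rows = [r for r, lab in zip(X_train, y_train) if lab == label]
        return [sum(r[j] for r in rows) / len(rows) for j in feats]
    c0, c1 = centroid(0), centroid(1)
    d0 = sum((x[j] - c) ** 2 for j, c in zip(feats, c0))
    d1 = sum((x[j] - c) ** 2 for j, c in zip(feats, c1))
    return 1 if d1 < d0 else 0


def loo_accuracy(X, y, k, leaky):
    """Leave-one-out accuracy; leaky=True selects features using all subjects."""
    global_feats = select_features(X, y, k) if leaky else None
    hits = 0
    for i in range(len(X)):
        X_tr, y_tr = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        feats = global_feats if leaky else select_features(X_tr, y_tr, k)
        hits += centroid_predict(X_tr, y_tr, X[i], feats) == y[i]
    return hits / len(X)


random.seed(0)
n_subjects, n_features = 30, 500
# Random features: the labels carry no real information about X.
X = [[random.gauss(0, 1) for _ in range(n_features)] for _ in range(n_subjects)]
y = [i % 2 for i in range(n_subjects)]
leaky_acc = loo_accuracy(X, y, 10, leaky=True)
clean_acc = loo_accuracy(X, y, 10, leaky=False)
print("selection on full dataset:", leaky_acc)
print("selection inside each fold:", clean_acc)
```

Because the labels are unrelated to the features, any accuracy well above chance in the first setting is produced by the selection step having seen the held-out subject, which is the effect the text refers to as leakage.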
Another measure of robustness is mixing images obtained with different field strengths. Schmitter et al. mixed structural T1 images acquired at 1.5 T and 3 T and compared a wide range of conditions (controls vs. AD, controls vs. MCI, MCI vs. AD, MCI-nc vs. MCI-c at 2 years, MCI-nc vs. MCI-c at 3 years) (Schmitter et al., 2015). In line with the rest of the literature, they report the highest accuracy for controls vs. AD classification (0.883), and the lowest for MCI vs. AD (0.687). Similar accuracies were obtained for the MCI prognosis tasks (0.688 at 2 years, 0.698 at 3 years). The authors report that using the 1.5 T and 3 T datasets independently yielded similar accuracy scores.
Westman et al. (not reported in Table 1) assessed whether conversion of MCI-c patients to AD could be predicted, depending on the time window. For 12, 18, 24 and 36 months, 82.9%, 86.4%, 75.4% and 68% of MCI subjects were identified as AD patients, respectively (Westman et al., 2012).
As mentioned in the Introduction, application of ML in neuroimaging involves extracting features from the raw images. Cuingnet et al. explored the changes in accuracy when using different image processing tools for a variety of binary classification problems: AD vs. controls, controls vs. MCI-c, and controls vs. MCI-nc within an 18-month time frame (Cuingnet et al., 2011). They report a sensitivity difference of up to 0.3 in some cases due exclusively to the image processing technique employed.
Moradi et al. report that feature selection can improve accuracy rates by up to 5%. They also identify relevant features using a controls vs. AD classification task and then use those features for classifying MCI-nc vs. MCI-c, reaching accuracy scores of up to 0.745 (AUC = 0.766) (Moradi et al., 2015), effectively showing that regions affected by AD can be useful in MCI-nc vs. MCI-c classification. Similarly, Davatzikos et al. extract regions of importance from a cohort of AD patients and healthy controls (Fan et al., 2008b) and apply the obtained patterns to the MCI-nc vs. MCI-c classification task, also obtaining high accuracy scores (ACC = 0.734) (Davatzikos et al., 2011). This overlap in regions of importance has also been reported elsewhere (Aguilar et al., 2013; Cuingnet et al., 2011; Desikan et al., 2009; Westman et al., 2012).
Sørensen et al. won the CADDementia challenge by building a multi-class LDA classifier to differentiate controls, MCI, and AD simultaneously. They trained the classifier using data from more than 600 subjects from a combination of different datasets and obtained a multi-class accuracy of 0.63 on the unobserved CADDementia test dataset (Sørensen et al., 2017). They further report on the effect of the size of the training dataset, as well as the complexity of the classifier, on classification performance (Sørensen et al., 2017).
In the papers reviewed here, we found few examples of stacking/ensembling techniques. For instance, Moradi et al. first created a classifier with the imaging data and then used its output, along with age and behavioral data, as inputs to a random forest (Moradi et al., 2015). Zhang et al. combined different data sources with different SVM kernels (Zhang et al., 2011). Liu et al. also used multiple weak classifiers and combined their answers to produce a final result (Liu et al., 2012). Ingalhalikar et al. used this technique for a different application: to cope with missing data; different classifiers were created per subject, depending on the subset of data missing, and their outputs were merged afterwards (Ingalhalikar et al., 2014).
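The idea of merging the outputs of several classifiers, including the case where some modalities are missing for a subject, can be sketched as follows. This is a deliberately simplified illustration and not the method of Ingalhalikar et al. or any other reviewed paper: the per-modality "classifiers" are toy threshold functions, the modality names and values are hypothetical, and the outputs are combined by plain averaging.

```python
def ensemble_predict(subject, classifiers):
    """Average the outputs of all classifiers whose modality is available.

    `subject` maps a modality name to a feature value (None when missing);
    `classifiers` maps a modality name to a function returning P(patient).
    Returns the predicted label (1 = patient) and the averaged probability.
    """
    probs = [clf(subject[m]) for m, clf in classifiers.items()
             if subject.get(m) is not None]
    if not probs:
        raise ValueError("no modality available for this subject")
    mean_prob = sum(probs) / len(probs)
    return (1 if mean_prob >= 0.5 else 0), mean_prob


# Hypothetical single-feature classifiers, one per modality.
classifiers = {
    "t1": lambda v: 0.8 if v < 2.5 else 0.2,
    "dti": lambda v: 0.7 if v < 0.4 else 0.3,
}

complete = {"t1": 3.0, "dti": 0.35}   # both modalities available
partial = {"t1": 2.1, "dti": None}    # DTI missing for this subject
label_complete, prob_complete = ensemble_predict(complete, classifiers)
label_partial, prob_partial = ensemble_predict(partial, classifiers)
print(label_complete, label_partial)
```

In a stacking setup such as the one described for Moradi et al., the averaged output would instead be passed, together with other variables, to a second-level classifier; in either case the combination step must be fitted inside the cross-validation loop to avoid leakage.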
Table 2 summarizes the relevant GM and WM regions reported for the classification tasks in the reviewed literature. Note that different studies have followed different methodologies, some of which include selecting features based on the entire dataset, therefore creating variables that are informative at the group level, but not necessarily at the individual level. Having said that, this table paints a clear picture of AD: hippocampus, temporal lobes, amygdala, parahippocampal gyrus, middle temporal gyrus, entorhinal cortex and insula are the most important GM regions for the classification task. While fewer studies have used DTI, recent findings report microstructural WM changes and impaired connectivity as key factors leading to cognitive failure in AD. Changes in FA and mean diffusivity (MD) appear early in the disease and seem to be independent of GM changes in the medial temporal lobe (Fletcher et al., 2014; Lacalle-Aurioles et al., 2016). Decreased FA and increased MD have been described in preclinical phases of AD, when individuals are still cognitively normal; however, they have not been used in ML classification tasks at these stages (Fletcher et al., 2013).
Table 2.
Informative regions (GM and WM) for the classification tasks in AD. This table does not make any distinction regarding the cohorts involved in the classification (AD, MCI, controls), as it has been shown that affected regions are similar for AD and MCI.
4.2. Autism
Autism spectrum disorders (ASD) are a series of developmental brain disorders defined by impairment in social interaction, verbal and non-verbal communication and repetitive behavior (Lewis et al., 2013). A few of the works reviewed here use the autism diagnostic observation schedule (ADOS) as a continuous clinical score instead of a binary label (autistic/control). Table 3 shows a summary of the papers reviewed in this section.
Table 3.
Summary of the classification papers in autism. Unless otherwise noted, reported accuracy rates are the highest found in the paper for different groups, methods and input modalities.
| Ref | Groups (N) | Method | Input Modalities | Accuracy | Comments |
|---|---|---|---|---|---|
| (Zhou et al., 2014) | Controls (153)-Autism (127) | Multiple | T1, rs-fMRI | 0.7 | Leakage: features selected on the full dataset. Uses 67 different classifiers from the WEKA toolbox. |
| (Ecker et al., 2010a) | Controls (20)-Autism (20) | SVM (linear) | T1 | 0.9 | No hyperparameter search (fixed C = 1). |
| (Sato et al., 2013) | Controls (84)-Autism (82) | SVR (Gaussian) | T1 | r = 0.362 | Predict ADOS scores instead of clinical state as a binary class problem. No hyperparameter search (fixed γ). |
| (Uddin et al., 2011) | Controls (24) - Autism (24) | SVM (Gaussian) | T1 | 0.92 | Analysis is done per individual region |
| (Libero et al., 2015) | Controls (18) - Autism (19) | Decision tree | T1, DTI, spectroscopy | 0.919 | Possible leakage: […] data points included were the significant resulting values of the statistical analyses of separate neuroimaging modalities |
| (Ingalhalikar et al., 2014) | Controls (42)-Autism (93) | LDA ensemble | MEG, DTI | 0.83 | Final accuracy rates are the result of ensembling LDA classifiers that use different combinations of input data. |
| ASD/LI+ (36)-ASD/LI- (57) | 0.7 | ||||
| (Wee et al., 2014) | Controls (59)-Autism (58) | SVM (multi-kernel) | T1 | 0.963 | |
| (Ecker et al., 2010b) | Controls (22)-Autism (22) | SVM (linear) | T1 | 0.81 | No hyperparameter search (fixed C = 1). |
| (Lange et al., 2010) | Control (30 + 7)-Autism (30 + 12) | QDA | DTI | 0.916 | Independent test set |
Zhou et al. used T1 metrics and network measurements from functional connectivity (rs-fMRI studies) and achieved an accuracy of 0.70. The methodology suffers from leakage (the features have been extracted using the full dataset and not in the cross-validation loop) (Zhou et al., 2014). Ecker et al. report accuracy rates of up to 0.9 when using CT metrics for the left hemisphere. This accuracy drops to 0.6 for the right hemisphere (Ecker et al., 2010a). This significant lateralization is also seen in (Sato et al., 2013), where CT values in the left hemisphere are better predictors of ADOS scores than those of the right hemisphere (rLeft = 0.29 vs. rRight = 0.072, rBoth = 0.362). A similar effect is also reported in (Uddin et al., 2011), where analyses were made per individual region. In another related work by the same group, patients with higher ADOS scores were found to be further from the optimal hyperplane when a linear-kernel SVM was used for binary classification (Ecker et al., 2010a).
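The observation that more severely affected patients lie further from the separating hyperplane relies on the signed distance of a subject to the decision boundary of a linear SVM. A minimal sketch of that quantity is given below; the weight vector, bias and feature values are hypothetical and serve only to show the computation.

```python
import math


def signed_distance(w, b, x):
    """Signed distance of x from the hyperplane w . x + b = 0.

    The sign gives the predicted class and the magnitude tells how far
    the subject lies from the decision boundary.
    """
    norm = math.sqrt(sum(wi * wi for wi in w))
    return (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm


# Hypothetical weights of a trained linear classifier and one subject.
w, b = [3.0, 4.0], -5.0
subject = [3.0, 4.0]
print(signed_distance(w, b, subject))
```

Correlating this distance with a continuous clinical score such as ADOS is one way to relate a binary classifier to symptom severity.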
As ASD is a heterogeneous disorder, attempts have been made to refine the label definitions to include some of the most relevant symptoms, such as language impairment (ASD/LI+) (Ingalhalikar et al., 2014). However, similar to the case of MCI classification in AD, this task is much more challenging. They obtained an accuracy rate of 0.83 for ASD vs. controls, and 0.7 for ASD/LI- vs. ASD/LI+. Additionally, they use model ensemble methods to compensate for missing data.
We have not included a table of relevant regions for this classification problem, since these effects seem to be very broadly spread through the brain in ASD. In addition to the lateralization effect, Wee et al. found that GM values in subcortical regions achieve higher accuracies than cortical regions (Wee et al., 2014).
4.3. Multiple sclerosis
As with AD, there is a distinction between healthy controls, clinically isolated syndrome (CIS) (Miller et al., 2012) and fully developed multiple sclerosis (MS). Similarly, classification tasks involving CIS are more challenging. Furthermore, CIS patients have a certain probability of developing MS within a given time window, which is another element to consider. Few papers (summarized in Table 4) have used structural differences for classification in MS. Instead, ML applications have been more heavily focused on the automatic segmentation of WM lesions. This is probably due to the fact that MS diagnosis can be easily made by detecting WM lesions directly from images, and the automatic labeling of those regions is the most challenging part of the problem (García-Lorenzo et al., 2013; Lladó et al., 2012).
Table 4.
Summary of the classification papers in MS. Unless otherwise noted, reported accuracy rates are the highest found in each paper for different groups, methods and input modalities.
| Reference | Groups (N) | Method | Input Modalities | Accuracy | Comments |
|---|---|---|---|---|---|
| (Wottschel et al., 2015) | CIS (74) - longitudinal (1 year) | SVM (polynomial) | T2, PD, clinical, demographic | 0.714 | |
| CIS (74) - longitudinal (3 years) | 0.680 | ||||
| (Bendfeldt et al., 2012) | Early MS (17) - late MS (17) | SVM (linear) | T1, T2 | 0.85 | |
| Low lesion load MS (20)-High lesion load MS (20) | 0.83 | ||||
| Benign MS (13)-Non-benign MS (13) | 0.77 | ||||
| (Weygandt et al., 2011) | Controls (26) – MS (41) | SVM (linear) | T1, T2 | 0.96 | |
| (Weygandt et al., 2015) | Controls (15 + 15)-EOPMS (15 + 16) | Logistic regression | T2 | 0.867* | Each voxel individually tested. 2 groups of subjects matched differently (lesion load, gender, & disease duration or age). |
| Controls (15 + 15)-LOPMS (16+ 17) | 0.871* | ||||
| EOPMS (15 + 16)-LOPMS (16 + 17) | 0.807* |
Weygandt et al. used T1 and T2 images to segment the brain into three different regions (lesions and normal-appearing GM and WM) and obtained accuracy rates of up to 0.96 when using lesion information, but also of 0.84 and 0.91 when using normal-appearing regions (GM and WM, respectively) (Weygandt et al., 2011). In a later work, they also obtained high accuracy rates (0.87) when classifying healthy controls vs. early and late-onset pediatric MS (Weygandt et al., 2015). The classification accuracy when comparing the two MS groups was lower (0.807).
Bendfeldt et al. explored classifiers that distinguish between MS subgroups (early or late MS, low WM-lesion load or high WM-lesion load, and benign or non-benign MS) using T1 and T2 data as inputs for linear SVMs. They obtained accuracy rates of 0.85, 0.83, and 0.77, respectively, using GM information alone (Bendfeldt et al., 2012).
Wottschel et al. used 74 subjects at onset of CIS to predict which subjects would develop MS at 1 and 3 years using lesion metrics (count, load, intensity, …), imaging data and clinical and demographic features (Wottschel et al., 2015). Their results (accuracy scores of 0.714 and 0.68 for 1 and 3 years, respectively) show that the further the time horizon, the harder the classification problem. Also, the optimal feature combination at 1 year (lesion load, type of presentation, gender) was completely different from the optimal feature combination for the 3-year prediction task (lesion count, average lesion intensity on PD images, average distance of lesions from the center of the brain, shortest horizontal distance of a lesion from the vertical axis, age and Expanded Disability Status Scale (EDSS) at onset).
In terms of region importance, middle frontal gyrus was the most informative in Weygandt et al. (2015), whereas Bendfeldt et al. (2012) found relevant regions in cortical areas of all the cerebral lobes, as well as thalamus and caudate.
4.4. Parkinson's disease and related disorders
As in some of the previous cases, what initially looks like a binary problem can be further complicated by the introduction of intermediate states or other conditions that are commonly mistaken for the principal disease or disorder. In the case of idiopathic Parkinson's Disease (IPD, or PD), Progressive Supranuclear Palsy (PSP) and Multiple System Atrophy (MSA) have similar motor symptoms, but they also progress faster and are less responsive to treatment (Filippone et al., 2012; Salvatore et al., 2014). Collectively, these are referred to as Parkinsonian disorders or Parkinsonian Plus Syndromes (Duchesne et al., 2009). A Parkinsonian (MSA-P) and a cerebellar variant of MSA (MSA-C) are distinguished based on clinical presentations (Schulz et al., 1994; Wenning et al., 1994). Recently, another group referred to as SWEDD (Scans Without Evidence of Dopaminergic Deficit) has been added. SWEDD subjects show PD symptoms, but without any dopamine deficiency in their PET scan. Classification tasks in PD therefore include all these disorders as well as the possible combinations including healthy controls. Here we also review papers that use a multiclass approach (i.e. instead of binary classifications, more than two different labels are learned simultaneously) (Filippone et al., 2012; Marquand et al., 2013). Table 5 shows a summary of the relevant findings.
Table 5.
Summary of the classification papers in PD.
| Ref | Groups (N) | Method | Input Modalities | Accuracy | Comments |
|---|---|---|---|---|---|
| (Focke et al., 2011) | Controls (22) - PD (21) | SVM (linear) | T1 | 0.42 | Default C hyperparameter (C = 1). F-contrast computed using the whole sample applied as weight. |
| Controls (22) - PSP (10) | 0.937 | ||||
| Controls (22) - MSA (11) | 0.788 | ||||
| MSA (11) - PD (21) | 0.719 | ||||
| MSA (11) - PSP (10) | 0.762 | ||||
| PD (21) - PSP (10) | 0.968 | ||||
| (Cherubini et al., 2014) | PD (57) - PSP (21) | SVM (kernel not specified) | T1, T2, DTI | 1 | F-contrast computed using the whole sample applied as weight. |
| (Skidmore et al., 2015) | Controls (22) - PD (20) | Bootstrap | DTI | 0.901 | |
| (Marquand et al., 2013) | PSP (17), PD (14), MSA (19) | Multinomial logit | T1 | 0.917 | |
| Controls (19), PSP (17), PD (14), MSA (19) | 0.736 | ||||
| PSP (17), PD (14), MSA-C (7), MSA-P (12) | 0.845 | ||||
| Controls (19), PSP (17), PD (14), MSA-C (7), MSA-P (12) | 0.662 | ||||
| (Filippone et al., 2012) | Controls (14), PD (14), PSP (16), MSA (18) | Multinomial logit | T1, T2, DTI | Brier = 0.753 | Highest multiclass error score (Brier) obtained using GM only. |
| (Salvatore et al., 2014) | Controls (28) - PD (28) | SVM (linear kernel) | T1 | 0.927 | Not mentioned how hyperparameters were tuned. |
| Controls (28) - PSP (28) | 0.970 | ||||
| PD (28) - PSP (28) | 0.982 | ||||
| (Duchesne et al., 2009) | PD (16) - PSP (8) + MSA (8) | SVM | T1 | 0.906 | PCA transformation applied on 149 healthy controls. No mention on the type of kernel or how hyperparameters were tuned. |
| (Haller et al., 2012) | PD (17) - Other (23) | SVM (Gaussian kernel) | DTI | 0.975 | Heterogeneous “Other” containing patients with different diseases, including MSA and PSP. |
| (Haller et al., 2013) | PD (16) - Other (20) | SVM (Gaussian kernel) | SWI | 0.869 | Same considerations as for (Haller et al., 2012). |
Focke et al. obtained high accuracy rates for controls vs. PSP and PD vs. PSP classifications by using WM voxel values (processed with SPM) as input features (Focke et al., 2011). GM values yielded much lower accuracies. Similarly, in Cherubini et al., WM values alone achieved a perfect classification score (accuracy = 1) (Cherubini et al., 2014). However, it is important to note that in both cases, F-contrast values were applied as weights for the input voxels outside the cross-validation loop. This could be considered leakage, as this importance metric was computed using the whole sample. Also, the reported WM areas were mainly in the brainstem, where the GM appears as small nuclei surrounded by WM (e.g. substantia nigra pars compacta). When using VBM smoothing kernels, these nuclei can appear inside the WM probabilistic mask, since the WM signal includes information from both WM and these nuclei.
Both Filippone et al. (2012) and Marquand et al. (2013) directly build multiclass classifiers. Filippone et al. applied a multinomial logit classifier to a cohort of 62 subjects (14 healthy controls, 14 PDs, 16 PSPs, 18 MSAs) (Filippone et al., 2012). Marquand et al., from the same research group, applied it to a different population and with two variations: a) either healthy controls were included or not in the given classifiers; and b) the MSA cohort was further divided into MSA-P and MSA-C (Marquand et al., 2013). Including healthy controls in the multiclass environment lowered the overall accuracy scores (Marquand et al., 2013). Focke et al. attribute this to inconsistencies in VBM processing (Focke et al., 2011).
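Multiclass probabilistic classifiers such as the multinomial logit are often scored with the Brier score, which is the metric listed for Filippone et al. in Table 5. The sketch below uses one common definition (mean over subjects of the squared difference between the predicted class probabilities and the one-hot encoded true label); the exact normalization used in the original paper may differ, and the class ordering and probabilities here are hypothetical.

```python
def brier_score(probabilities, labels):
    """Multiclass Brier score: lower values indicate better predictions.

    `probabilities` holds one list of class probabilities per subject and
    `labels` holds the index of the true class for each subject.
    """
    total = 0.0
    for probs, label in zip(probabilities, labels):
        total += sum((p - (1.0 if k == label else 0.0)) ** 2
                     for k, p in enumerate(probs))
    return total / len(labels)


# Hypothetical class order: 0 = controls, 1 = PD, 2 = PSP, 3 = MSA.
perfect = brier_score([[0.0, 1.0, 0.0, 0.0]], [1])
uninformative = brier_score([[0.25, 0.25, 0.25, 0.25]], [2])
print(perfect, uninformative)
```

Under this definition a perfect prediction scores 0 and a uniform guess over four classes scores 0.75, so the score rewards well-calibrated probabilities rather than only the most likely label.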
In summary, the reviewed results imply that PD, PSP and MSA affect different brain regions, even if their symptoms are similar. Relevant regions are summarized in Table 6.
Table 6.
Informative regions (GM and WM) for PD classification tasks.
| Region | References | Number |
|---|---|---|
| Gray Matter | ||
| Rectal gyrus | (Skidmore et al., 2015) | 1 |
| Middle cingulate | (Skidmore et al., 2015) | 1 |
| Left Putamen | (Skidmore et al., 2015) | 1 |
| Right Putamen | (Skidmore et al., 2015) | 1 |
| Thalamus | (Haller et al., 2013; Salvatore et al., 2014; Skidmore et al., 2015) | 3 |
| Pons | (Salvatore et al., 2014) | 1 |
| Midbrain | (Marquand et al., 2013; Salvatore et al., 2014) | 2 |
| Brainstem | (Filippone et al., 2012; Marquand et al., 2013) | 2 |
| Caudate | (Filippone et al., 2012; Haller et al., 2013) | 2 |
| Putamen | (Filippone et al., 2012) | 1 |
| Precuneus | (Focke et al., 2011) | 1 |
| Basal ganglia | (Marquand et al., 2013) | 1 |
| Cerebellum | (Marquand et al., 2013) | 1 |
| White Matter | ||
| Corpus callosum | (Salvatore et al., 2014) | 1 |
| Brainstem | (Cherubini et al., 2014; Focke et al., 2011) | 2 |
| Mesencephalon | (Focke et al., 2011) | 1 |
| Right frontal WM | (Haller et al., 2012) | 1 |
4.5. Other
Here we have included diseases or disorders for which we have not found a high number of publications, or in some cases those for which monographic reviews have been published recently.
4.5.1. Attention deficit hyperactivity disorder
Iannaccone et al. used both functional and structural imaging to study differences in a cohort of 20 attention deficit hyperactivity disorder (ADHD) patients and 20 healthy controls (Iannaccone et al., 2015). Using only T1 data processed with SPM and a linear SVM (fixed C = 1), they did not obtain a statistically significant accuracy rate (0.611). Lim et al. also used GM information from T1 images (processed with SPM) and a Gaussian process classifier (GPC) and obtained an accuracy of 0.793 for a cohort of 29 ADHD patients and 29 healthy controls (Lim et al., 2013). Finally, Peng et al. achieved accuracy rates of up to 0.902 with an extreme learning machine (a neural network variant) applied to cortical features from T1 data (thickness, surface, folding, curvature, volume) in a cohort of 55 ADHD subjects and 55 healthy controls. However, their feature selection was performed outside of the cross-validation loop (Peng et al., 2013). See also Eloyan et al. for similar work on the same dataset (Eloyan et al., 2012) (for more information, see Section 6 in this paper).
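Whether an accuracy is statistically distinguishable from chance depends strongly on the sample size. A simple way to see this is a one-sided exact binomial test against the chance level, sketched below. This is an assumption-laden illustration and not the procedure of the cited studies, which may rely on permutation testing instead; the counts are hypothetical, and the test treats predictions as independent, which is only approximately true under cross-validation.

```python
from math import comb


def binomial_p_value(n_correct, n_total, chance=0.5):
    """One-sided p-value: P(X >= n_correct) for X ~ Binomial(n_total, chance)."""
    return sum(comb(n_total, k) * chance ** k * (1 - chance) ** (n_total - k)
               for k in range(n_correct, n_total + 1))


# Hypothetical counts for a balanced cohort of 40 subjects.
p_modest = binomial_p_value(24, 40)   # 60% of subjects classified correctly
p_strong = binomial_p_value(32, 40)   # 80% of subjects classified correctly
print(p_modest, p_strong)
```

With only 40 subjects, an accuracy of 0.6 does not reach the conventional 0.05 threshold, whereas an accuracy of 0.8 clearly does, which illustrates why small cohorts need large effects to yield significant classifiers.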
4.5.2. Depression
Johnston et al. studied 20 subjects with treatment-refractory depression (TRD) and 21 healthy controls (Johnston et al., 2015). A binary SVM (Gaussian kernel) classifier was able to obtain accuracy rates of 0.85 using T1 images as input. However, it was not possible to produce predictive systems for the level of resistance to treatment. Foland-Ross et al. also used GM information (CT) to separate healthy adolescent girls (n = 15) from those who suffered an initial onset of depression (n = 18) within a 5-year window using linear SVM and obtained an accuracy of 0.7 (Foland-Ross et al., 2015).
As for WM information, using DTI studies, Qin et al. studied network architecture from 29 depressive patients and 30 healthy controls (Qin et al., 2014). Nodal strength, local clustering coefficient, nodal betweenness centrality and nodal global efficiency, for nodes defined in the AAL atlas, were used as input features. Minimum-redundancy maximum-relevance (mRMR) feature selection was used to select relevant features in the whole sample (leakage). Under these conditions, a Gaussian SVM reached a maximum accuracy of 0.831. Using a similar approach, Sacchet et al. also used graph theory-related features (assortativity, global flow coefficient, global total flow, global efficiency, characteristic path length, transitivity and small-worldness) as inputs for a linear SVM to distinguish between 14 women with major depressive disorder and 18 healthy controls, obtaining an accuracy of 0.712 (Sacchet et al., 2015).
4.5.3. Schizophrenia
For schizophrenia, we refer to recently published reviews that analyze the use of ML algorithms in the context of this disorder in detail. Similar to AD, there are three prediction problems of interest in the context of schizophrenia: i) classifying schizophrenia patients versus healthy controls, ii) diagnosing schizophrenia in populations at high risk from baseline scan information, and iii) predicting disease progression, transition to schizophrenia, or response to treatment. Zarogianni et al. provide an extensive review on predictive classifiers for schizophrenia based on either structural or functional MRI, not only focusing on binary predictions, but also devoting a section to disease progression and treatment response (Zarogianni et al., 2013). They report accuracies in the range of 81–91.8% for classifying schizophrenia patients versus healthy controls using sMRI, with the majority of the studies using SVMs for classification. For diagnosing schizophrenia, the reviewed studies have used fMRI as well as sMRI, initially using ICA for dimensionality reduction and mostly SVM and Random Forests for classification, reporting accuracies in the range of 61.8–95%. Fewer studies have attempted to predict transition to schizophrenia and response to treatment, with one study reporting an accuracy of 85% in classifying responders using EEG data and a kernel partial least squares regression technique, and three studies reporting accuracies of 82–84.2% in differentiating transition to schizophrenia, all using SVMs. They conclude that the higher classification accuracy in the first problem (i.e. diagnosing schizophrenia versus healthy controls) is due to the more distinct differences in their neuroanatomical and functional patterns, which is not the case in within-group predictions in subjects that do or do not show a specific outcome of interest (Zarogianni et al., 2013).
In a more general review, Dazzan also includes a small section on how to use brain structure at illness onset to produce predictions at the individual level (Dazzan, 2014).
4.5.4. Traumatic brain injury
We found two studies that build predictive models for traumatic brain injury (TBI), both (Fagerholm et al., 2015; Lui et al., 2014) employing mRMR for feature selection on the entire sample prior to any cross-validation loop (leakage). Lui et al. used T1, DTI and rs-fMRI data for 23 TBI patients and 25 healthy controls, and tested several different classifying algorithms; they obtained an accuracy of 0.86 with a multilayer perceptron (neural network) using only relevant variables, and 0.80 with a Bayesian network using all variables (Lui et al., 2014). Fagerholm et al. used only DTI information, obtaining 24 different graph metrics and an accuracy of 0.934 with a linear SVM (Fagerholm et al., 2015).
4.5.5. Stroke
In the context of stroke, machine learning has been used to classify stroke patients versus normal controls, or to predict post-stroke functional impairment or treatment outcome. Rehme et al. used DTI and resting state fMRI data and a linear SVM to classify stroke patients vs. normal controls (accuracy = 0.826), and to predict motor impairment after stroke (accuracy = 0.876). They also used information from DWI lesion maps to differentiate stroke patients with or without hand motor impairment, but with a relatively low sensitivity (accuracy = 0.738, sensitivity = 0.50), concluding that resting state fMRI is more useful in predicting behavioral deficits than DTI (Rehme et al., 2014). Bentley et al. used computed tomography (CT) scans in combination with clinical variables and an SVM with a multi-layer perceptron kernel to predict whether or not to administer thrombolysis, a treatment that can result in better recovery or deterioration due to intracranial haemorrhage (AUC = 0.744) (Bentley et al., 2014).
4.5.6. Miscellanea
4.5.6.1. Anorexia nervosa (AN)
Lavagnino et al. used LASSO regression to classify 15 patients with AN and 15 healthy controls using T1 information (processed with FreeSurfer), obtaining an accuracy of 0.833 (Lavagnino et al., 2015).
4.5.6.2. Bipolar disorder (BD)
Hajek et al. obtained an accuracy of 0.689 with a linear SVM (fixed C = 1) when differentiating 45 healthy subjects from 45 high-risk offspring of subjects with BD (Hajek et al., 2015). Only WM intensities (T1 scans, processed with SPM) yielded significant accuracies. Similarly, 36 healthy controls were distinguished from 36 BD patients with an accuracy of 0.597. In all experiments, subjects were matched by age and sex.
Lastly, we reference Sabuncu and Konukoglu, an empirical review that applies several ML algorithms to different data sets for a variety of diseases and disorders in a standardized way to build a gold standard that can be used to compare the accuracy of future new approaches (Sabuncu et al., 2015).
5. Discussion
In this review, we compile an extensive summary on ML techniques in the field of neuroimaging, from cross-validation analyses to specific applications in different diseases or disorders using structural modalities. We have attempted to include a wide-reaching sample that will help the reader get a precise grasp of the current state of the art. ML applications in this field are different from applications in other areas such as spam (or credit card fraud) detection. In the clinical case, practical application of ML does not simply aim to achieve the highest accuracy scores possible, as is the case when filtering spam e-mails, for instance. While it is undoubtedly preferable to obtain higher accuracy rates, in neuroscience, it is more relevant to study which features are informative for the classification task of interest as well as their corresponding biological interpretations (see for instance (Carbonell et al., 2015)).
A common pattern seen in the literature is that classification tasks are almost never purely binary in nature. While two-class approximations are still relevant (AD vs. controls, MS vs. controls, PD vs. controls, etc.), including intermediate (MCI, CIS, etc.) or related (PSP, MSA, etc.) states can add another level of complexity. In practical terms, the common solution is to opt for multiple binary comparisons (AD vs. controls, controls vs. MCI, AD vs. MCI), each of which can be solved by developing a separate classifier whose performance is assessed individually. Only in a few cases (e.g. (Filippone et al., 2012; Marquand et al., 2013)) has a multiclass approach been used. While binary approaches provide useful information about underlying biological mechanisms, from a clinical point of view, multiclass approaches might be more insightful, as a binary classifier would require eliminating a priori all but two of the potential clinical labels, a process that is not always practical. This is further complicated by the fact that many disorders are spectrum disorders, and therefore a binary variable may not completely capture their underlying subtleties.
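The cost of decomposing a multi-label problem into pairwise binary comparisons grows quickly with the number of labels. The short sketch below enumerates the pairwise tasks for a given label set; the helper name is hypothetical and the label sets simply mirror the examples discussed in this review.

```python
from itertools import combinations


def binary_tasks(labels):
    """All pairwise comparisons needed to cover a set of clinical labels."""
    return list(combinations(labels, 2))


ad_tasks = binary_tasks(["controls", "MCI", "AD"])
pd_tasks = binary_tasks(["controls", "PD", "PSP", "MSA"])
print(len(ad_tasks), len(pd_tasks))
```

For k labels there are k(k − 1)/2 pairwise tasks, each requiring its own classifier and its own performance assessment, whereas a multiclass model handles all labels at once.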
The best classification performances were obtained when differentiating between normal controls and patients in various diseases (e.g. AD, autism, MS, PD), with accuracies higher than 0.9, suggesting the existence of brain patterns and structures identifiable on MRI that are significantly different between the diseased population and normal controls and can be reliably used for differentiating these groups (Table 7). Unfortunately, the accuracies were much lower (generally around 0.7) when attempting to differentiate between progressive and stable patients (e.g. MCI-c and MCI-nc in AD), although these problems are of higher clinical interest. While the reviewed studies provide valuable benchmarks for classification accuracy, in practice there is still a need for double-blind experiments. Clinical trials in which the predictions are made before the actual outcome (e.g. conversion to AD) has been observed can provide confirmatory evidence for the clinical use of the prognosis models. Additionally, challenges that are administered by a different research group and provide only the necessary MRI and clinical data, without the outcomes of interest, on a preserved test dataset (e.g. MICCAI conference TADPOLE challenge: https://tadpole.grand-challenge.org/) would also ensure that the results are not influenced by leakage or overfitting.
Table 7.
Summary of the studies differentiating between normal controls and patients.
| Disease | Methods | Input Modalities | Accuracy (Mean) | Accuracy (Min–Max) | Number of Studies |
|---|---|---|---|---|---|
| Alzheimer's Disease | SVM, OPLS, Random Forests | T1, PET, DTI, CSF | 0.897 | 0.82–0.965 | 19 |
| Autism | SVM, Decision Tree, LDA, QDA | T1, DTI, Spectroscopy | 0.867 | 0.70–0.963 | 8 |
| Multiple Sclerosis | SVM, Logistic Regression | T1, T2 | 0.915 | 0.871–0.96 | 2 |
| Parkinson's Disease | SVM, Bootstrap, Multinomial Logit | T1, T2, DTI | 0.7472 | 0.42–0.927 | 5 |
| Attention Deficit Hyperactivity Disorder | SVM, Gaussian Process Classifier | T1 | 0.8475 | 0.793–0.902 | 2 |
| Depression | SVM | T1, DTI | 0.7477 | 0.70–0.831 | 3 |
Table 7 compares the results of the studies that classify normal controls versus patients. While generalizations based on such small sample sizes and numbers of studies should be treated with care, the number of studies in each field seems to reflect the current view on the structural nature of these diseases (as can be detected on MRI). The neurodegeneration pattern that is characteristic of AD seems to be a very good indicator for clinical diagnosis. On the other hand, most studies that attempt to diagnose schizophrenia use functional modalities, which might hint at the insufficiency of structural MRI for such predictions. Another factor that needs to be considered is the very different sample sizes across studies in different diseases; e.g. AD studies generally have much larger sample sizes. While this is influenced by disease prevalence as well as funding allocations, the amount of available data on AD can considerably facilitate studies in this field.
Sometimes, certain processing techniques can be tuned for a specific disease or disorder in order to take into account some aspect that could improve the overall accuracy. For example, Dukart et al. removed age effects after noticing that the age of misclassified individuals differed depending on the cohort (AD or healthy subjects) (Dukart et al., 2011).
We acknowledge that there are limitations to the present study. First, this review focuses on analyses that employ structural MRI data (T1, T2, and DWI), although we have also included works that have used other imaging modalities, either in isolation or in combination with structural MRI. In practice, however, other imaging modalities as well as a battery of clinical tests and measurements are acquired, and these can provide informative features that might significantly improve the classifications. For example, in the case of converter versus non-converter MCI subjects in the context of AD prediction, using the baseline clinical information significantly improves the prediction accuracy (Moradi et al., 2015). Since different studies acquire different MRI modalities and clinical information, we were not able to compare them across all modalities and measures. However, we have reported the other modalities and measures that have been used (e.g. fMRI, PET, clinical measures, etc.) for each study. Additionally, it has to be noted that in some cases (e.g. Dyrba et al., 2015), the inclusion of additional modalities does not increase accuracy. More features can result in more information, but also in more noise and confounding factors (Duda et al., 2001).
It should be taken into account that the output labels (the clinical state for each subject) may be an approximation, as there may not be a precise one-to-one correspondence between a given metric (e.g. ADOS) and a binary clinical outcome. Also, the clinical diagnosis might contain errors (Cherubini et al., 2014), and therefore it would be impossible to obtain a perfect classification.
In some cases, diseases or disorders encapsulate a gradient of symptoms and causes. Additionally, the outcome of interest might be the amount or rate of change in a given metric (e.g. cognitive or motor function) rather than simply whether the subject declines or not. Such problems would be better studied using continuous regression techniques and not discrete classifications. A number of regression techniques have been used for estimating continuous clinical variables using neuroimaging data, such as linear regression and support vector regression (Duchesne et al., 2009, Duchesne et al., 2005; Hope et al., 2013; Rondina et al., 2016; Stonnington et al., 2010; Wang et al., 2010; Zhang et al., 2011). However, this review focuses on predictions that can be formulated as a binary classification task.
Another point worth mentioning is the performance measure of interest, which can differ across prediction problems and across populations in the clinical setting. For example, positive predictive value (the percentage of correct positive predictions over all positive predictions) or negative predictive value (the percentage of correct negative predictions over all negative predictions) might have more clinical relevance in specific cases. For the purpose of consistency, and since it is the most commonly used measure across papers, here we report classification accuracy, which reflects the percentage of correct predictions (both negative and positive) over all predictions.
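The three measures defined above can be illustrated with a minimal sketch. The function names (`confusion_counts`, `summary_metrics`) and the toy labels are hypothetical and are not taken from any of the reviewed studies; the code only restates the definitions in terms of confusion-matrix counts.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary task."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def summary_metrics(y_true, y_pred, positive=1):
    """Accuracy, positive predictive value and negative predictive value."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + tn + fn
    return {
        # correct predictions (positive and negative) over all predictions
        "accuracy": (tp + tn) / total,
        # correct positive predictions over all positive predictions
        "ppv": tp / (tp + fp) if (tp + fp) else float("nan"),
        # correct negative predictions over all negative predictions
        "npv": tn / (tn + fn) if (tn + fn) else float("nan"),
    }


# Toy example (illustrative labels only): 3 patients, 7 controls.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
print(summary_metrics(y_true, y_pred))
```

In this toy case the classes are unbalanced, so accuracy, PPV and NPV take different values for the same set of predictions, which is why the choice of measure matters when populations differ in disease prevalence.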
While there are hundreds of different ML algorithms (Fernández-Delgado et al., 2014), there is undoubtedly a preponderance of SVMs in the neuroimaging literature (Table 7) (Liu et al., 2012). This goes so far that some reviews (e.g. Salvatore et al., 2014; Veronese et al., 2013) are centered exclusively on using SVMs for predictive purposes. While this can be attributed in part to the fact that previous experience greatly influences the choice of algorithm, it is also true that SVMs behave robustly in the typical conditions of a neuroimaging problem, i.e. many more variables than available subjects (in some cases, the difference is of several orders of magnitude) (Lemm et al., 2011; Zarogianni et al., 2013). It should also be taken into account, however, that other techniques have been employed with comparable results (see Table 1, for instance) and that feature selection techniques, when correctly applied to avoid leakage, are extremely useful in reducing the dimensionality of the problem.
A fraction of the works included in this review report potentially overly-optimistic results (leakage), due to the fact that informative variables were selected outside of the cross-validation loop. This feature selection step was performed using statistical techniques that assess group differences in the entire sample, for instance, or used other types of filtering procedures that relied on the class labels of the test set to perform dimensionality reduction. As discussed before (Section 3.2.1), this should be avoided, as it might produce biased results.
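The effect described above can be reproduced with a small, self-contained sketch on pure-noise data. Everything in it is an assumption made for illustration: the synthetic data, the mean-difference filter, the nearest-centroid classifier and the function names are hypothetical and do not correspond to the pipeline of any reviewed study. The only difference between the two runs is whether features are selected on the whole sample (leaky) or on the training folds only.

```python
import random


def make_noise_data(n_subjects=40, n_features=1000, seed=0):
    """Pure-noise features with balanced, arbitrary labels (no true signal)."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_features)]
         for _ in range(n_subjects)]
    y = [i % 2 for i in range(n_subjects)]
    return X, y


def class_means(X, y, rows, feats):
    """Per-class mean of each feature in `feats`, using only `rows`."""
    means = {}
    for label in (0, 1):
        members = [i for i in rows if y[i] == label]
        means[label] = [sum(X[i][j] for i in members) / len(members)
                        for j in feats]
    return means


def select_features(X, y, rows, k):
    """Filter: keep the k features with the largest group mean difference."""
    all_feats = range(len(X[0]))
    m = class_means(X, y, rows, all_feats)
    scores = [abs(a - b) for a, b in zip(m[0], m[1])]
    return sorted(all_feats, key=lambda j: scores[j], reverse=True)[:k]


def predict(X, y, train_rows, feats, i):
    """Nearest-centroid prediction for subject i, centroids from train_rows."""
    m = class_means(X, y, train_rows, feats)
    dist = {c: sum((X[i][j] - mu) ** 2 for j, mu in zip(feats, m[c]))
            for c in (0, 1)}
    return 0 if dist[0] <= dist[1] else 1


def cv_accuracy(X, y, k_features, n_folds=5, leaky=False):
    """k-fold accuracy; leaky=True selects features once on ALL subjects."""
    rows = list(range(len(X)))
    global_feats = select_features(X, y, rows, k_features) if leaky else None
    correct = 0
    for f in range(n_folds):
        test = rows[f::n_folds]
        train = [i for i in rows if i not in test]
        # Correct procedure: redo the selection using training subjects only.
        feats = global_feats if leaky else select_features(X, y, train, k_features)
        correct += sum(predict(X, y, train, feats, i) == y[i] for i in test)
    return correct / len(rows)


X, y = make_noise_data()
leaky_acc = cv_accuracy(X, y, k_features=10, leaky=True)
nested_acc = cv_accuracy(X, y, k_features=10, leaky=False)
print(f"selection outside CV loop: {leaky_acc:.2f}")
print(f"selection inside CV loop:  {nested_acc:.2f}")
```

Because the labels carry no information about the features, any accuracy clearly above chance in the first run is an artifact of the selection step having seen the test subjects. In practice, the same safeguard is usually obtained by wrapping the selection and classification steps in a single pipeline that is refit within each fold.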
The leakage effect is especially pronounced when the sample size is small, and it diminishes as the sample size grows. Kohavi and John reported this effect to be less concerning when the dataset contains more than 250 instances (Kohavi and John, 1997). However, this number also depends on the choice of classifier and the number of features used. Almost all the studies reviewed here have sample sizes smaller than 250, which makes the leakage issue more prominent. It would be interesting to compare different studies in the same domain to assess whether the reported accuracies are significantly different in cases where leakage occurs. However, since different studies are based on different populations and features, drawing meaningful comparisons is not feasible in practice.
Another challenge that can reduce the generalizability of classification models to new data and consequently their applicability to clinical practice is the inherent heterogeneity in neuroimaging datasets. Imaging data from different scanners and acquisition protocols can sometimes have very different contrasts and parameters. As a result, the estimated performances and classifier accuracies may only be reliable when applied to data from similar scanners and with similar acquisition parameters as the training dataset. Several preprocessing pipelines have been developed to deal with such variabilities, such as the SPM, FSL, and MINC tools (Aubert-Broche et al., 2013; Jenkinson et al., 2012; Penny et al., 2011). In addition, to increase the generalizability of the results, models are generally trained on multi-site and multi-scanner datasets (such as ADNI, PPMI, etc.).
Throughout this article, we have reviewed papers typically written in research institutions by domain experts: either scientists that have a close contact with clinical environments or more technically-oriented individuals who, also in a clinical or biomedical context, find in these datasets the opportunity to apply and improve their current algorithms. In recent years, non-domain experts (pure ML engineers, mathematicians, etc.) have also had access to datasets already processed, and have attempted to solve these classification challenges. From this point of view, websites such as Kaggle (https://www.kaggle.com) do a great job in gathering ML experts around a very heterogeneous set of problems. This has been the case, for instance, with the IEEE International Workshop on Machine Learning for Signal Processing (MLSP) 2014 Schizophrenia Classification Challenge (https://www.kaggle.com/c/mlsp-2014-mri), the American Epilepsy Society Seizure Prediction Challenge (https://www.kaggle.com/c/seizure-prediction), or the Predict HIV progression Challenge (https://www.kaggle.com/c/hivprogression). These challenges typically work in the following way: datasets are provided for both a training set and a test set. The output labels (clinical classifications) are also provided for the training set, and the competitors have to produce their predictions for the test set, with any technique they wish; the only common restriction is to use technologies with open-source licenses. Submissions are evaluated according to a certain metric (for instance, the AUC score) and the different teams are ranked in a classification table (leaderboard). To prevent competitors from training their predictors to simply increase the final scores in the leaderboard, these scores are computed on an unknown portion of the test set. These competitions normally take several months to complete and some have associated monetary prizes for the highest ranked teams.
Iannaccone et al. present a similar application in their Introduction: the ADHD-200 Global Competition (http://fcon_1000.projects.nitrc.org/indi/adhd200/results.html) (Iannaccone et al., 2015). In this challenge, functional and structural imaging data, together with demographic and behavioral data, were provided with the aim of producing individual clinical predictions for ADHD subjects. Interestingly, the highest accuracy (0.625) was obtained by one of the competing teams using only age, sex, handedness, and IQ, and no imaging information. Other recent examples include the Medical Image Computing and Computer Assisted Intervention (MICCAI) 2014 Machine Learning Challenge: Predicting Binary and Continuous Phenotypes from Structural Brain MRI Data (https://competitions.codalab.org/competitions/1471) and its sister challenge CADDementia (Bron et al., 2015).
ML algorithms are powerful tools that can be used to solve many different problems. In the field of neuroscience, they can not only help us build predictive systems for diagnosis and prognosis, but can also be used to advance and deepen our knowledge of the biological mechanisms underlying diseases and disorders.
Acknowledgments
The authors have received funding from the Brain Canada Foundation (grant 238990) and Fondation Marcelle et Jean Coutu (grant 241177). The authors also wish to thank Dr. Penelope Kostopoulos for her insightful comments.
References
- Ad-Dab'bagh Y., Lyttelton O., Muehlboeck J.S., Lepage C., Einarson D., Mok K., Ivanov O., Vincent R.D., Lerch J., Fombonne E. The CIVET image-processing environment: a fully automated comprehensive pipeline for anatomical neuroimaging research. Proceedings of the 12th Annual Meeting of the Organization for Human Brain Mapping. Florence, Italy. 2006; p. 2266.
- Aguilar C., Westman E., Muehlboeck J.-S., Mecocci P., Vellas B., Tsolaki M., Kloszewska I., Soininen H., Lovestone S., Spenger C. Different multivariate techniques for automated classification of MRI data in Alzheimer's disease and mild cognitive impairment. Psychiatry Res. Neuroimaging. 2013;212:89–98. doi: 10.1016/j.pscychresns.2012.11.005.
- Ahmed O.B., Benois-Pineau J., Allard M., Catheline G., Amar C.B. Recognition of Alzheimer's disease and mild cognitive impairment with multimodal image-derived biomarkers and multiple kernel learning. Neurocomputing. 2017;220:98–110. (Recent Research in Medical Technology Based on Multimedia and Pattern Recognition).
- Ahsen M.E., Boren T.P., Singh N.K., Misganaw B., Mutch D.G., Moore K.N., Backes F.J., McCourt C.K., Lea J.S., Miller D.S., White M.A., Vidyasagar M. Sparse feature selection for classification and prediction of metastasis in endometrial cancer. BMC Genomics. 2017;18:233. doi: 10.1186/s12864-017-3604-y.
- Albert M.S., Dekosky S.T., Dickson D., Dubois B., Feldman H.H., Fox N.C., Gamst A., Holtzman D.M., Jagust W.J., Petersen R.C. The diagnosis of mild cognitive impairment due to Alzheimer's disease: recommendations from the National Institute on Aging-Alzheimer's Association workgroups on diagnostic guidelines for Alzheimer's disease. Alzheimers Dement. 2011;7:270–279. doi: 10.1016/j.jalz.2011.03.008.
- Ambroise C., McLachlan G.J. Selection bias in gene extraction on the basis of microarray gene-expression data. Proc. Natl. Acad. Sci. 2002;99:6562–6566. doi: 10.1073/pnas.102102699.
- Ashburner J., Friston K.J. Voxel-based morphometry—the methods. NeuroImage. 2000;11:805–821. doi: 10.1006/nimg.2000.0582.
- Aubert-Broche B., Fonov V.S., García-Lorenzo D., Mouiha A., Guizard N., Coupé P., Eskildsen S.F., Collins D.L. A new method for structural volume analysis of longitudinal brain MRI data and its application in studying the growth trajectories of anatomical brain structures in childhood. NeuroImage. 2013;82:393–402. doi: 10.1016/j.neuroimage.2013.05.065.
- Auzias G., Takerkart S., Deruelle C. On the influence of confounding factors in multisite brain morphometry studies of developmental pathologies: application to autism spectrum disorder. IEEE J. Biomed. Health Inform. 2016;20:810–817. doi: 10.1109/JBHI.2015.2460012.
- Beheshti I., Demirel H. Probability distribution function-based classification of structural MRI for the detection of Alzheimer's disease. Comput. Biol. Med. 2015;64:208–216. doi: 10.1016/j.compbiomed.2015.07.006.
- Bendfeldt K., Klöppel S., Nichols T.E., Smieskova R., Kuster P., Traud S., Mueller-Lenke N., Naegelin Y., Kappos L., Radue E.-W. Multivariate pattern classification of gray matter pathology in multiple sclerosis. NeuroImage. 2012;60:400–408. doi: 10.1016/j.neuroimage.2011.12.070.
- Bentley P., Ganesalingam J., Jones A.L.C., Mahady K., Epton S., Rinne P., Sharma P., Halse O., Mehta A., Rueckert D. Prediction of stroke thrombolysis outcome using CT brain machine learning. NeuroImage Clin. 2014;4:635–640. doi: 10.1016/j.nicl.2014.02.003.
- Betechuoh B.L., Marwala T., Tettey T. Autoencoder networks for HIV classification. Curr. Sci. 2006;91:1467.
- Boyle P.A., Wilson R.S., Aggarwal N.T., Tang Y., Bennett D.A. Mild cognitive impairment risk of Alzheimer disease and rate of cognitive decline. Neurology. 2006;67:441–445. doi: 10.1212/01.wnl.0000228244.10416.20.
- Bron E.E., Smits M., Van Der Flier W.M., Vrenken H., Barkhof F., Scheltens P., Papma J.M., Steketee R.M., Orellana C.M., Meijboom R. Standardized evaluation of algorithms for computer-aided diagnosis of dementia based on structural MRI: the CADDementia challenge. NeuroImage. 2015;111:562–579. doi: 10.1016/j.neuroimage.2015.01.048.
- Bullmore E., Sporns O. The economy of brain network organization. Nat. Rev. Neurosci. 2012;13:336–349. doi: 10.1038/nrn3214.
- Carbonell F., Zijdenbos A.P., Charil A., Grand'Maison M., Bedell B.J. Optimal target region for subject classification on the basis of amyloid PET images. J. Nucl. Med. 2015;56:1351–1358. doi: 10.2967/jnumed.115.158774.
- Caruana R., Niculescu-Mizil A., Crew G., Ksikes A. Ensemble selection from libraries of models. Proceedings of the Twenty-First International Conference on Machine Learning. ACM; 2004. p. 18.
- Cherubini A., Morelli M., Nisticó R., Salsone M., Arabia G., Vasta R., Augimeri A., Caligiuri M.E., Quattrone A. Magnetic resonance support vector machine discriminates between Parkinson disease and progressive supranuclear palsy. Mov. Disord. 2014;29:266–269. doi: 10.1002/mds.25737.
- Cuingnet R., Gerardin E., Tessieras J., Auzias G., Lehéricy S., Habert M.-O., Chupin M., Benali H., Colliot O., Alzheimer's Disease Neuroimaging Initiative. Automatic classification of patients with Alzheimer's disease from structural MRI: a comparison of ten methods using the ADNI database. NeuroImage. 2011;56:766–781. doi: 10.1016/j.neuroimage.2010.06.013.
- Dash M., Liu H. Feature selection for classification. Intell. Data Anal. 1997;1:131–156.
- Davatzikos C. Why voxel-based morphometric analysis should be used with great caution when characterizing group differences. NeuroImage. 2004;23:17–20. doi: 10.1016/j.neuroimage.2004.05.010.
- Davatzikos C., Fan Y., Wu X., Shen D., Resnick S.M. Detection of prodromal Alzheimer's disease via pattern classification of magnetic resonance imaging. Neurobiol. Aging. 2008;29:514–523. doi: 10.1016/j.neurobiolaging.2006.11.010.
- Davatzikos C., Bhatt P., Shaw L.M., Batmanghelich K.N., Trojanowski J.Q. Prediction of MCI to AD conversion, via MRI, CSF biomarkers, and pattern classification. Neurobiol. Aging. 2011;32:2322.e19. doi: 10.1016/j.neurobiolaging.2010.05.023.
- Dazzan P. Neuroimaging biomarkers to predict treatment response in schizophrenia: the end of 30 years of solitude? Dialogues Clin. Neurosci. 2014;16:491. doi: 10.31887/DCNS.2014.16.4/pdazzan.
- Deng L., Yu D. Deep learning: methods and applications. Found. Trends Signal Process. 2014;7:197–387.
- Desikan R.S., Cabral H.J., Hess C.P., Dillon W.P., Glastonbury C.M., Weiner M.W., Schmansky N.J., Greve D.N., Salat D.H., Buckner R.L. Automated MRI measures identify individuals with mild cognitive impairment and Alzheimer's disease. Brain. 2009:awp123. doi: 10.1093/brain/awp123.
- Dubois B., Feldman H.H., Jacova C., Cummings J.L., Dekosky S.T., Barberger-Gateau P., Delacourte A., Frisoni G., Fox N.C., Galasko D. Revising the definition of Alzheimer's disease: a new lexicon. Lancet Neurol. 2010;9:1118–1127. doi: 10.1016/S1474-4422(10)70223-4.
- Duchesne S., Caroli A., Geroldi C., Frisoni G.B., Collins D.L. Predicting clinical variable from MRI features: application to MMSE in MCI. International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer; 2005. pp. 392–399.
- Duchesne S., Rolland Y., Vérin M. Automated computer differential classification in parkinsonian syndromes via pattern analysis on MRI. Acad. Radiol. 2009;16:61–70. doi: 10.1016/j.acra.2008.05.024.
- Duda R.O., Hart P.E., Stork D.G. Pattern Classification. A Wiley-Interscience Publication, John Wiley & Sons Inc; 2001.
- Dukart J., Schroeter M.L., Mueller K., Alzheimer's Disease Neuroimaging Initiative. Age correction in dementia–matching to a healthy brain. PLoS One. 2011;6. doi: 10.1371/journal.pone.0022193.
- Dyrba M., Grothe M., Kirste T., Teipel S.J. Multimodal analysis of functional and structural disconnection in Alzheimer's disease using multiple kernel SVM. Hum. Brain Mapp. 2015;36:2118–2131. doi: 10.1002/hbm.22759.
- Džeroski S., Ženko B. Is combining classifiers with stacking better than selecting the best one? Mach. Learn. 2004;54:255–273.
- Ecker C., Marquand A., Mourão-Miranda J., Johnston P., Daly E.M., Brammer M.J., Maltezos S., Murphy C.M., Robertson D., Williams S.C. Describing the brain in autism in five dimensions—magnetic resonance imaging-assisted diagnosis of autism spectrum disorder using a multiparameter classification approach. J. Neurosci. 2010;30:10612–10623. doi: 10.1523/JNEUROSCI.5413-09.2010.
- Ecker C., Rocha-Rego V., Johnston P., Mourao-Miranda J., Marquand A., Daly E.M., Brammer M.J., Murphy C., Murphy D.G., MRC AIMS Consortium. Investigating the predictive value of whole-brain structural MR scans in autism: a pattern classification approach. NeuroImage. 2010;49:44–56. doi: 10.1016/j.neuroimage.2009.08.024.
- Eloyan A., Muschelli J., Nebel M.B., Liu H., Han F., Zhao T., Barber A.D., Joel S., Pekar J.J., Mostofsky S.H. Automated diagnoses of attention deficit hyperactive disorder using magnetic resonance imaging. Front. Syst. Neurosci. 2012;6. doi: 10.3389/fnsys.2012.00061.
- Fagerholm E.D., Hellyer P.J., Scott G., Leech R., Sharp D.J. Disconnection of network hubs and cognitive impairment after traumatic brain injury. Brain. 2015;138:1696–1709. doi: 10.1093/brain/awv075.
- Fan Y., Batmanghelich N., Clark C.M., Davatzikos C., Alzheimer's Disease Neuroimaging Initiative. Spatial patterns of brain atrophy in MCI patients, identified via high-dimensional pattern classification, predict subsequent cognitive decline. NeuroImage. 2008;39:1731–1743. doi: 10.1016/j.neuroimage.2007.10.031.
- Fan Y., Resnick S.M., Wu X., Davatzikos C. Structural and functional biomarkers of prodromal Alzheimer's disease: a high-dimensional pattern classification study. NeuroImage. 2008;41:277–285. doi: 10.1016/j.neuroimage.2008.02.043.
- Fernández-Delgado M., Cernadas E., Barro S., Amorim D. Do we need hundreds of classifiers to solve real world classification problems? J. Mach. Learn. Res. 2014;15:3133–3181.
- Filippone M., Marquand A.F., Blain C.R., Williams S.C., Mourão-Miranda J., Girolami M. Probabilistic prediction of neurological disorders with a statistical assessment of neuroimaging data modalities. Ann. Appl. Stat. 2012;6:1883. doi: 10.1214/12-aoas562.
- Fischl B. FreeSurfer. NeuroImage. 2012;62:774–781. doi: 10.1016/j.neuroimage.2012.01.021.
- Fischl B., Dale A.M. Measuring the thickness of the human cerebral cortex from magnetic resonance images. Proc. Natl. Acad. Sci. 2000;97:11050–11055. doi: 10.1073/pnas.200033797.
- Fletcher E., Raman M., Huebner P., Liu A., Mungas D., Carmichael O., Decarli C. Loss of fornix white matter volume as a predictor of cognitive impairment in cognitively normal elderly individuals. JAMA Neurol. 2013;70:1389–1395. doi: 10.1001/jamaneurol.2013.3263.
- Fletcher E., Carmichael O., Pasternak O., Maier-Hein K.H., Decarli C. Early brain loss in circuits affected by Alzheimer's disease is predicted by fornix microstructure but may be independent of gray matter. Front. Aging Neurosci. 2014;6:106. doi: 10.3389/fnagi.2014.00106.
- Focke N.K., Helms G., Scheewe S., Pantel P.M., Bachmann C.G., Dechent P., Ebentheuer J., Mohr A., Paulus W., Trenkwalder C. Individual voxel-based subtype prediction can differentiate progressive supranuclear palsy from idiopathic Parkinson syndrome and healthy controls. Hum. Brain Mapp. 2011;32:1905–1915. doi: 10.1002/hbm.21161.
- Foland-Ross L.C., Sacchet M.D., Prasad G., Gilbert B., Thompson P.M., Gotlib I.H. Cortical thickness predicts the first onset of major depression in adolescence. Int. J. Dev. Neurosci. 2015;46:125–131. doi: 10.1016/j.ijdevneu.2015.07.007.
- Fonov V., Evans A., McKinstry R., Almli C., Collins D. Unbiased nonlinear average age-appropriate brain templates from birth to adulthood. NeuroImage. 2009;47:S102. Organization for Human Brain Mapping 2009 Annual Meeting.
- Fonov V., Evans A.C., Botteron K., Almli C.R., McKinstry R.C., Collins D.L. Unbiased average age-appropriate atlases for pediatric studies. NeuroImage. 2011;54:313–327. doi: 10.1016/j.neuroimage.2010.07.033.
- Gabrieli J.D., Ghosh S.S., Whitfield-Gabrieli S. Prediction as a humanitarian and pragmatic contribution from human cognitive neuroscience. Neuron. 2015;85:11–26. doi: 10.1016/j.neuron.2014.10.047.
- García-Lorenzo D., Francis S., Narayanan S., Arnold D.L., Collins D.L. Review of automatic segmentation methods of multiple sclerosis white matter lesions on conventional magnetic resonance imaging. Med. Image Anal. 2013;17:1–18. doi: 10.1016/j.media.2012.09.004.
- Golland P., Fischl B. Permutation tests for classification: towards statistical significance in image-based studies. IPMI. Springer; 2003. pp. 330–341.
- Gray K.R., Aljabar P., Heckemann R.A., Hammers A., Rueckert D., Alzheimer's Disease Neuroimaging Initiative. Random forest-based similarity measures for multi-modal classification of Alzheimer's disease. NeuroImage. 2013;65:167–175. doi: 10.1016/j.neuroimage.2012.09.065.
- Guyon I., Elisseeff A. An introduction to variable and feature selection. J. Mach. Learn. Res. 2003;3:1157–1182.
- Guzella T.S., Caminhas W.M. A review of machine learning approaches to spam filtering. Expert Syst. Appl. 2009;36:10206–10222.
- Hajek T., Cooke C., Kopecek M., Novak T., Hoschl C., Alda M. Using structural MRI to identify individuals at genetic risk for bipolar disorders: a 2-cohort, machine learning study. J. Psychiatry Neurosci. 2015;40:316. doi: 10.1503/jpn.140142.
- Haller S., Nguyen D., Rodriguez C., Emch J., Gold G., Bartsch A., Lovblad K.O., Giannakopoulos P. Individual prediction of cognitive decline in mild cognitive impairment using support vector machine-based analysis of diffusion tensor imaging data. J. Alzheimers Dis. 2010;22:315–327. doi: 10.3233/JAD-2010-100840.
- Haller S., Badoud S., Nguyen D., Garibotto V., Lovblad K.O., Burkhard P.R. Individual detection of patients with Parkinson disease using support vector machine analysis of diffusion tensor imaging data: initial results. Am. J. Neuroradiol. 2012;33:2123–2128. doi: 10.3174/ajnr.A3126.
- Haller S., Badoud S., Nguyen D., Barnaure I., Montandon M.L., Lovblad K.O., Burkhard P.R. Differentiation between Parkinson disease and other forms of parkinsonism using support vector machine analysis of susceptibility-weighted imaging (SWI): initial results. Eur. Radiol. 2013;23:12–19. doi: 10.1007/s00330-012-2579-y.
- Hastie T., Tibshirani R., Friedman J. 2nd edition. Springer; New York: 2009. The Elements of Statistical Learning. [Google Scholar]
- Haynes J.-D. A primer on pattern-based approaches to fMRI: principles, pitfalls, and perspectives. Neuron. 2015;87:257–270. doi: 10.1016/j.neuron.2015.05.025. [DOI] [PubMed] [Google Scholar]
- Hill D.L., Batchelor P.G., Holden M., Hawkes D.J. Medical image registration. Phys. Med. Biol. 2001;46:R1. doi: 10.1088/0031-9155/46/3/201. [DOI] [PubMed] [Google Scholar]
- Hope T.M.H., Seghier M.L., Leff A.P., Price C.J. Predicting outcome and recovery after stroke with lesions extracted from MRI images. NeuroImage Clin. 2013;2:424–433. doi: 10.1016/j.nicl.2013.03.005. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Hughes G. On the mean accuracy of statistical pattern recognizers. IEEE Trans. Inf. Theory. 1968;14:55–63. [Google Scholar]
- Iannaccone R., Hauser T.U., Ball J., Brandeis D., Walitza S., Brem S. Classifying adolescent attention-deficit/hyperactivity disorder (ADHD) based on functional and structural imaging. Eur. Child Adolesc. Psychiatry. 2015;24:1279–1289. doi: 10.1007/s00787-015-0678-4. [DOI] [PubMed] [Google Scholar]
- Ingalhalikar M., Parker W.A., Bloy L., Roberts T.P., Verma R. Creating multimodal predictors using missing data: classifying and subtyping autism spectrum disorder. J. Neurosci. Methods. 2014;235:1–9. doi: 10.1016/j.jneumeth.2014.06.030. [DOI] [PubMed] [Google Scholar]
- Iturria-Medina Y. Anatomical brain networks on the prediction of abnormal brain states. Brain Connect. 2013;3:1–21. doi: 10.1089/brain.2012.0122. [DOI] [PubMed] [Google Scholar]
- Iturria-Medina Y., Canales-Rodriguez E.J., Melie-Garcia L., Valdes-Hernandez P.A., Martinez-Montes E., Alemán-Gómez Y., Sánchez-Bornot J.M. Characterizing brain anatomical connections using diffusion weighted MRI and graph theory. NeuroImage. 2007;36:645–660. doi: 10.1016/j.neuroimage.2007.02.012. [DOI] [PubMed] [Google Scholar]
- James G., Witten D., Hastie T., Tibshirani R. Springer; 2013. An Introduction to Statistical Learning. [Google Scholar]
- Jenkinson M., Beckmann C.F., Behrens T.E., Woolrich M.W., Smith S.M. Fsl. Neuroimage. 2012;62(2):782–790. doi: 10.1016/j.neuroimage.2011.09.015. [DOI] [PubMed] [Google Scholar]
- Johnston B.A., Mwangi B., Matthews K., Coghill D., Steele J.D. Predictive classification of individual magnetic resonance imaging scans from children and adolescents. Eur. Child Adolesc. Psychiatry. 2013;22:733–744. doi: 10.1007/s00787-012-0319-0. [DOI] [PubMed] [Google Scholar]
- Johnston B.A., Steele J.D., Tolomeo S., Christmas D., Matthews K. Structural MRI-based predictions in patients with treatment-refractory depression (TRD) PLoS One. 2015;10 doi: 10.1371/journal.pone.0132958. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Oliveira P.P. de M., Jr., Nitrini R., Busatto G., Buchpiguel C., Sato J.R., Amaro E., Jr. Use of SVM methods with surface-based cortical and volumetric subcortical measurements to detect Alzheimer's disease. J. Alzheimers Dis. 2010;19:1263–1272. doi: 10.3233/JAD-2010-1322. [DOI] [PubMed] [Google Scholar]
- Kanal L., Chandrasekaran B. On dimensionality and sample size in statistical pattern classification. Pattern Recogn. 1971;3:225–234. [Google Scholar]
- Klein A., Tourville J. 101 labeled brain images and a consistent human cortical labeling protocol. Front. Neurosci. 2012;6:171. doi: 10.3389/fnins.2012.00171. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Klöppel S., Stonnington C.M., Chu C., Draganski B., Scahill R.I., Rohrer J.D., Fox N.C., Jack C.R., Ashburner J., Frackowiak R.S. Automatic classification of MR scans in Alzheimer's disease. Brain. 2008;131:681–689. doi: 10.1093/brain/awm319. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Kohavi R., John G.H. Wrappers for feature subset selection. Artif. Intell. 1997;97:273–324. [Google Scholar]
- Kotsiantis S.B., Zaharakis I.D., Pintelas P.E. Machine learning: a review of classification and combining techniques. Artif. Intell. Rev. 2006;26:159–190. [Google Scholar]
- Kotsiantis S.B., Zaharakis I., Pintelas P. 2007. Supervised Machine Learning: A Review of Classification Techniques. [Google Scholar]
- Kuncheva L.I., Rodríguez J.J. Classifier ensembles for fMRI data analysis: an experiment. Magn. Reson. Imaging. 2010;28:583–593. doi: 10.1016/j.mri.2009.12.021. [DOI] [PubMed] [Google Scholar]
- Lacalle-Aurioles M., Navas-Sánchez F.J., Alemán-Gómez Y., Olazarán J., Guzmán-De-Villoria J.A., Cruz-Orduña I., Mateos-Pérez J.M., Desco M. The disconnection hypothesis in Alzheimer's disease studied through multimodal magnetic resonance imaging: structural, perfusion, and diffusion tensor imaging. J. Alzheimers Dis. 2016;50:1051–1064. doi: 10.3233/JAD-150288. [DOI] [PubMed] [Google Scholar]
- Lange N., Dubray M.B., Lee J.E., Froimowitz M.P., Froehlich A., Adluru N., Wright B., Ravichandran C., Fletcher P.T., Bigler E.D., Alexander A.L., Lainhart J.E. Atypical diffusion tensor hemispheric asymmetry in autism. Autism Res. 2010;3:350–358. doi: 10.1002/aur.162. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Lavagnino L., Amianto F., Mwangi B., D'Agata F., Spalatro A., Zunta-Soares G.B., Daga G.A., Mortara P., Fassino S., Soares J.C. Identifying neuroanatomical signatures of anorexia nervosa: a multivariate machine learning approach. Psychol. Med. 2015;45:2805–2812. doi: 10.1017/S0033291715000768. [DOI] [PubMed] [Google Scholar]
- LeCun Y., Bengio Y., Hinton G. Deep learning. Nature. 2015;521:436–444. doi: 10.1038/nature14539. [DOI] [PubMed] [Google Scholar]
- Lemm S., Blankertz B., Dickhaus T., Müller K.-R. Introduction to machine learning for brain imaging. NeuroImage. 2011;56:387–399. doi: 10.1016/j.neuroimage.2010.11.004. [DOI] [PubMed] [Google Scholar]
- Lewis J.D., Theilmann R.J., Townsend J., Evans A.C. 2013. Network Efficiency in Autism Spectrum Disorder and its Relation to Brain Overgrowth. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Li F., Tran L., Thung K.-H., Ji S., Shen D., Li J. International Workshop on Machine Learning in Medical Imaging. Springer; 2014. Robust deep learning for improved classification of AD/MCI patients; pp. 240–247. [Google Scholar]
- Li Z., Liu J., Yang Y., Zhou X., Lu H. Clustering-guided sparse structural learning for unsupervised feature selection. IEEE Trans. Knowl. Data Eng. 2014;26:2138–2150. [Google Scholar]
- Libero L.E., Deramus T.P., Lahti A.C., Deshpande G., Kana R.K. Multimodal neuroimaging based classification of autism spectrum disorder using anatomical, neurochemical, and white matter correlates. Cortex. 2015;66:46–59. doi: 10.1016/j.cortex.2015.02.008. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Lim L., Marquand A., Cubillo A.A., Smith A.B., Chantiluke K., Simmons A., Mehta M., Rubia K. Disorder-specific predictive classification of adolescents with attention deficit hyperactivity disorder (ADHD) relative to autism using structural magnetic resonance imaging. PLoS One. 2013;8 doi: 10.1371/journal.pone.0063660. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Liu M., Zhang D., Shen D., Alzheimer's Disease Neuroimaging Initiative. Ensemble sparse classification of Alzheimer's disease. NeuroImage. 2012;60:1106–1116. doi: 10.1016/j.neuroimage.2012.01.055. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Liu Siqi, Liu Sidong, Cai W., Pujol S., Kikinis R., Feng D. Biomedical Imaging (ISBI), 2014 IEEE 11th International Symposium on. IEEE; 2014. Early diagnosis of Alzheimer's disease with deep learning; pp. 1015–1018. [Google Scholar]
- Lladó X., Oliver A., Cabezas M., Freixenet J., Vilanova J.C., Quiles A., Valls L., Ramió-Torrentà L., Rovira À. Segmentation of multiple sclerosis lesions in brain MRI: a review of automated approaches. Inf. Sci. 2012;186:164–185. [Google Scholar]
- Lo A., Chernoff H., Zheng T., Lo S.-H. Why significant variables aren't automatically good predictors. Proc. Natl. Acad. Sci. 2015;112:13892–13897. doi: 10.1073/pnas.1518285112. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Lui Y.W., Xue Y., Kenul D., Ge Y., Grossman R.I., Wang Y. Classification algorithms using multiple MRI features in mild traumatic brain injury. Neurology. 2014;83:1235–1240. doi: 10.1212/WNL.0000000000000834. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Maintz J.B.A., Viergever M.A. A survey of medical image registration. Med. Image Anal. 1998;2:1–36. doi: 10.1016/s1361-8415(01)80026-8. [DOI] [PubMed] [Google Scholar]
- Mandl R.C., Schnack H.G., Zwiers M.P., van der Schaaf A., Kahn R.S., Pol H.E.H. Functional diffusion tensor imaging: measuring task-related fractional anisotropy changes in the human brain along white matter tracts. PLoS One. 2008;3 doi: 10.1371/journal.pone.0003631. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Manjón J.V., Coupé P., Martí-Bonmatí L., Collins D.L., Robles M. Adaptive non-local means denoising of MR images with spatially varying noise levels. J. Magn. Reson. Imaging. 2010;31:192–203. doi: 10.1002/jmri.22003. [DOI] [PubMed] [Google Scholar]
- Marcus D.S., Wang T.H., Parker J., Csernansky J.G., Morris J.C., Buckner R.L. Open access series of imaging studies (OASIS): cross-sectional MRI data in young, middle aged, nondemented, and demented older adults. J. Cogn. Neurosci. 2007;19:1498–1507. doi: 10.1162/jocn.2007.19.9.1498. [DOI] [PubMed] [Google Scholar]
- Marquand A.F., Filippone M., Ashburner J., Girolami M., Mourao-Miranda J., Barker G.J., Williams S.C., Leigh P.N., Blain C.R. Automated, high accuracy classification of parkinsonian disorders: a pattern recognition approach. PLoS One. 2013;8 doi: 10.1371/journal.pone.0069237. [DOI] [PMC free article] [PubMed] [Google Scholar]
- McKhann G.M., Knopman D.S., Chertkow H., Hyman B.T., Jack C.R., Kawas C.H., Klunk W.E., Koroshetz W.J., Manly J.J., Mayeux R. The diagnosis of dementia due to Alzheimer's disease: recommendations from the National Institute on Aging-Alzheimer's Association workgroups on diagnostic guidelines for Alzheimer's disease. Alzheimers Dement. 2011;7:263–269. doi: 10.1016/j.jalz.2011.03.005. [DOI] [PMC free article] [PubMed] [Google Scholar]
- McKnight L.K., Wilcox A., Hripcsak G. American Medical Informatics Association; 2002. The Effect of Sample Size and Disease Prevalence on Supervised Machine Learning of Narrative Data, in: Proceedings of the AMIA Symposium; p. 519. [PMC free article] [PubMed] [Google Scholar]
- Miller D.H., Chard D.T., Ciccarelli O. Clinically isolated syndromes. Lancet Neurol. 2012;11:157–169. doi: 10.1016/S1474-4422(11)70274-5. [DOI] [PubMed] [Google Scholar]
- Moradi E., Pepe A., Gaser C., Huttunen H., Tohka J., Alzheimer's Disease Neuroimaging Initiative. Machine learning framework for early MRI-based Alzheimer's conversion prediction in MCI subjects. NeuroImage. 2015;104:398–412. doi: 10.1016/j.neuroimage.2014.10.002. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Nir T.M., Villalon-Reina J.E., Prasad G., Jahanshad N., Joshi S.H., Toga A.W., Bernstein M.A., Jack C.R., Weiner M.W., Thompson P.M. Diffusion weighted imaging-based maximum density path analysis and classification of Alzheimer's disease. Neurobiol. Aging. 2015;36:S132–S140. doi: 10.1016/j.neurobiolaging.2014.05.037. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Ojala M., Garriga G.C. Permutation tests for studying classifier performance. J. Mach. Learn. Res. 2010;11:1833–1863. [Google Scholar]
- Park D.H., Kim H.K., Choi I.Y., Kim J.K. A literature review and classification of recommender systems research. Expert Syst. Appl. 2012;39:10059–10072. [Google Scholar]
- Payan A., Montana G. Predicting Alzheimer's disease: a neuroimaging study with 3D convolutional neural networks. arXiv preprint arXiv:1502.02506. 2015. [Google Scholar]
- Peng X., Lin P., Zhang T., Wang J. Extreme learning machine-based classification of ADHD using brain structural MRI data. PLoS One. 2013;8 doi: 10.1371/journal.pone.0079476. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Penny W.D., Friston K.J., Ashburner J.T., Kiebel S.J., Nichols T.E. Academic Press; 2011. Statistical Parametric Mapping: The Analysis of Functional Brain Images. [Google Scholar]
- Pereira F., Mitchell T., Botvinick M. Machine learning classifiers and fMRI: a tutorial overview. NeuroImage. 2009;45:S199–S209. doi: 10.1016/j.neuroimage.2008.11.007. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Plant C., Teipel S.J., Oswald A., Böhm C., Meindl T., Mourao-Miranda J., Bokde A.W., Hampel H., Ewers M. Automated detection of brain atrophy patterns based on MRI for the prediction of Alzheimer's disease. NeuroImage. 2010;50:162–174. doi: 10.1016/j.neuroimage.2009.11.046. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Plitt M., Barnes K.A., Martin A. Functional connectivity classification of autism identifies highly predictive brain features but falls short of biomarker standards. NeuroImage Clin. 2015;7:359–366. doi: 10.1016/j.nicl.2014.12.013. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Power J.D., Mitra A., Laumann T.O., Snyder A.Z., Schlaggar B.L., Petersen S.E. Methods to detect, characterize, and remove motion artifact in resting state fMRI. NeuroImage. 2014;84:320–341. doi: 10.1016/j.neuroimage.2013.08.048. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Qin J., Wei M., Liu H., Chen J., Yan R., Hua L., Zhao K., Yao Z., Lu Q. Abnormal hubs of white matter networks in the frontal-parieto circuit contribute to depression discrimination via pattern classification. Magn. Reson. Imaging. 2014;32:1314–1320. doi: 10.1016/j.mri.2014.08.037. [DOI] [PubMed] [Google Scholar]
- Rakotomamonjy A. Variable selection using SVM-based criteria. J. Mach. Learn. Res. 2003;3:1357–1370. [Google Scholar]
- Raudys S.J., Jain A.K. Small sample size effects in statistical pattern recognition: recommendations for practitioners. IEEE Trans. Pattern Anal. Mach. Intell. 1991;13:252–264. [Google Scholar]
- Rehme A.K., Volz L.J., Feis D.-L., Bomilcar-Focke I., Liebig T., Eickhoff S.B., Fink G.R., Grefkes C. Identifying neuroimaging markers of motor disability in acute stroke by machine learning techniques. Cereb. Cortex. 2014;25:3046–3056. doi: 10.1093/cercor/bhu100. [DOI] [PubMed] [Google Scholar]
- Rondina J.M., Filippone M., Girolami M., Ward N.S. 2016. Decoding Post-stroke Motor Function from Structural Brain Imaging. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Rubinov M., Sporns O. Complex network measures of brain connectivity: uses and interpretations. NeuroImage. 2010;52:1059–1069. doi: 10.1016/j.neuroimage.2009.10.003. [DOI] [PubMed] [Google Scholar]
- Rudin C., Waltz D., Anderson R.N., Boulanger A., Salleb-Aouissi A., Chow M., Dutta H., Gross P.N., Huang B., Ierome S. Machine learning for the New York City power grid. IEEE Trans. Pattern Anal. Mach. Intell. 2012;34:328–345. doi: 10.1109/TPAMI.2011.108. [DOI] [PubMed] [Google Scholar]
- Sabuncu M.R., Konukoglu E., Alzheimer's Disease Neuroimaging Initiative. Clinical prediction from structural brain MRI scans: a large-scale empirical study. Neuroinformatics. 2015;13:31–46. doi: 10.1007/s12021-014-9238-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Sacchet M.D., Prasad G., Foland-Ross L.C., Thompson P.M., Gotlib I.H. Support vector machine classification of major depressive disorder using diffusion-weighted neuroimaging and graph theory. Front. Psychiatry. 2015;6 doi: 10.3389/fpsyt.2015.00021. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Salvatore C., Cerasa A., Castiglioni I., Gallivanone F., Augimeri A., Lopez M., Arabia G., Morelli M., Gilardi M.C., Quattrone A. Machine learning on brain MRI data for differential diagnosis of Parkinson's disease and progressive Supranuclear palsy. J. Neurosci. Methods. 2014;222:230–237. doi: 10.1016/j.jneumeth.2013.11.016. [DOI] [PubMed] [Google Scholar]
- Salvatore C., Battista P., Castiglioni I. Frontiers for the early diagnosis of AD by means of MRI brain imaging and support vector machines. Curr. Alzheimer Res. 2016;13:509–533. doi: 10.2174/1567205013666151116141705. [DOI] [PubMed] [Google Scholar]
- Sato J.R., Hoexter M.Q., de Magalhães Oliveira P.P., Brammer M.J., Murphy D., Ecker C., MRC AIMS Consortium. Inter-regional cortical thickness correlations are associated with autistic symptoms: a machine-learning approach. J. Psychiatr. Res. 2013;47:453–459. doi: 10.1016/j.jpsychires.2012.11.017. [DOI] [PubMed] [Google Scholar]
- Schmitter D., Roche A., Maréchal B., Ribes D., Abdulkadir A., Bach-Cuadra M., Daducci A., Granziera C., Klöppel S., Maeder P. An evaluation of volume-based morphometry for prediction of mild cognitive impairment and Alzheimer's disease. NeuroImage Clin. 2015;7:7–17. doi: 10.1016/j.nicl.2014.11.001. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Schrouff J. University of Liege; Liege, Belgium: 2013. Pattern Recognition in NeuroImaging: What can Machine Learning Classifiers Bring to the Analysis of Functional Brain Imaging? [Google Scholar]
- Schulz J.B., Klockgether T., Petersen D., Jauch M., Müller-Schauenburg W., Spieker S., Voigt K., Dichgans J. Multiple system atrophy: natural history, MRI morphology, and dopamine receptor imaging with 123IBZM-SPECT. J. Neurol. Neurosurg. Psychiatry. 1994;57:1047–1056. doi: 10.1136/jnnp.57.9.1047. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Skidmore F.M., Spetsieris P.G., Anthony T., Cutter G.R., von Deneen K.M., Liu Y., White K.D., Heilman K.M., Myers J., Standaert D.G. A full-brain, bootstrapped analysis of diffusion tensor imaging robustly differentiates Parkinson disease from healthy controls. Neuroinformatics. 2015;13:7–18. doi: 10.1007/s12021-014-9222-9. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Sled J.G., Zijdenbos A.P., Evans A.C. A nonparametric method for automatic correction of intensity nonuniformity in MRI data. IEEE Trans. Med. Imaging. 1998;17:87–97. doi: 10.1109/42.668698. [DOI] [PubMed] [Google Scholar]
- Smith S.M., Jenkinson M., Woolrich M.W., Beckmann C.F., Behrens T.E., Johansen-Berg H., Bannister P.R., De Luca M., Drobnjak I., Flitney D.E. Advances in functional and structural MR image analysis and implementation as FSL. NeuroImage. 2004;23:S208–S219. doi: 10.1016/j.neuroimage.2004.07.051. [DOI] [PubMed] [Google Scholar]
- Song S., Zhan Z., Long Z., Zhang J., Yao L. Comparative study of SVM methods combined with voxel selection for object category classification on fMRI data. PLoS One. 2011;6 doi: 10.1371/journal.pone.0017191. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Sørensen L., Igel C., Pai A., Balas I., Anker C., Lillholm M., Nielsen M. Differential diagnosis of mild cognitive impairment and Alzheimer's disease using structural MRI cortical thickness, hippocampal shape, hippocampal texture, and volumetry. NeuroImage Clin. 2017;13:470–482. doi: 10.1016/j.nicl.2016.11.025. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Stonnington C.M., Chu C., Klöppel S., Jack C.R., Ashburner J., Frackowiak R.S., Alzheimer's Disease Neuroimaging Initiative. Predicting clinical scores from magnetic resonance scans in Alzheimer's disease. NeuroImage. 2010;51:1405–1413. doi: 10.1016/j.neuroimage.2010.03.051. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Suk H.-I., Shen D. International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer; 2013. Deep learning-based feature representation for AD/MCI classification; pp. 583–590. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Suk H.-I., Lee S.-W., Shen D., Alzheimer's Disease Neuroimaging Initiative. Hierarchical feature representation and multimodal fusion with deep learning for AD/MCI diagnosis. NeuroImage. 2014;101:569–582. doi: 10.1016/j.neuroimage.2014.06.077. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Suk H.-I., Lee S.-W., Shen D., Alzheimer's Disease Neuroimaging Initiative. Latent feature representation with stacked auto-encoder for AD/MCI diagnosis. Brain Struct. Funct. 2015;220:841–859. doi: 10.1007/s00429-013-0687-3. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Tan M., Wang L., Tsang I.W. Proceedings of the 27th International Conference on Machine Learning (ICML-10) 2010. Learning sparse SVM for feature selection on very high dimensional datasets; pp. 1047–1054. [Google Scholar]
- Tustison N.J., Avants B.B., Cook P.A., Zheng Y., Egan A., Yushkevich P.A., Gee J.C. N4ITK: improved N3 bias correction. IEEE Trans. Med. Imaging. 2010;29:1310–1320. doi: 10.1109/TMI.2010.2046908. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Tzourio-Mazoyer N., Landeau B., Papathanassiou D., Crivello F., Etard O., Delcroix N., Mazoyer B., Joliot M. Automated anatomical labeling of activations in SPM using a macroscopic anatomical parcellation of the MNI MRI single-subject brain. NeuroImage. 2002;15:273–289. doi: 10.1006/nimg.2001.0978. [DOI] [PubMed] [Google Scholar]
- Uddin L.Q., Menon V., Young C.B., Ryali S., Chen T., Khouzam A., Minshew N.J., Hardan A.Y. Multivariate searchlight classification of structural magnetic resonance imaging in children and adolescents with autism. Biol. Psychiatry. 2011;70:833–841. doi: 10.1016/j.biopsych.2011.07.014. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Varoquaux G., Raamana P.R., Engemann D.A., Hoyos-Idrobo A., Schwartz Y., Thirion B. Assessing and tuning brain decoders: cross-validation, caveats, and guidelines. NeuroImage. 2017;145:166–179. doi: 10.1016/j.neuroimage.2016.10.038. [DOI] [PubMed] [Google Scholar]
- Vemuri P., Gunter J.L., Senjem M.L., Whitwell J.L., Kantarci K., Knopman D.S., Boeve B.F., Petersen R.C., Jack C.R. Alzheimer's disease diagnosis in individual subjects using structural MR images: validation studies. NeuroImage. 2008;39:1186–1197. doi: 10.1016/j.neuroimage.2007.09.073. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Veronese E., Castellani U., Peruzzo D., Bellani M., Brambilla P. Machine learning approaches: from theory to application in schizophrenia. Comput. Math. Methods Med. 2013;2013 doi: 10.1155/2013/867924. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Vincent P., Larochelle H., Bengio Y., Manzagol P.-A. Proceedings of the 25th International Conference on Machine Learning. ACM; 2008. Extracting and composing robust features with denoising autoencoders; pp. 1096–1103. [Google Scholar]
- Vovk U., Pernus F., Likar B. A review of methods for correction of intensity inhomogeneity in MRI. IEEE Trans. Med. Imaging. 2007;26:405–421. doi: 10.1109/TMI.2006.891486. [DOI] [PubMed] [Google Scholar]
- Wang Y., Fan Y., Bhatt P., Davatzikos C. High-dimensional pattern regression using machine learning: from medical images to continuous clinical variables. NeuroImage. 2010;50:1519–1535. doi: 10.1016/j.neuroimage.2009.12.092. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Wee C.-Y., Yap P.-T., Li W., Denny K., Browndyke J.N., Potter G.G., Welsh-Bohmer K.A., Wang L., Shen D. Enriched white matter connectivity networks for accurate identification of MCI patients. NeuroImage. 2011;54:1812–1822. doi: 10.1016/j.neuroimage.2010.10.026. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Wee C.-Y., Yap P.-T., Zhang D., Denny K., Browndyke J.N., Potter G.G., Welsh-Bohmer K.A., Wang L., Shen D. Identification of MCI individuals using structural and functional connectivity networks. NeuroImage. 2012;59:2045–2056. doi: 10.1016/j.neuroimage.2011.10.015. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Wee C.-Y., Wang L., Shi F., Yap P.-T., Shen D. Diagnosis of autism spectrum disorders using regional and interregional morphological features. Hum. Brain Mapp. 2014;35:3414–3430. doi: 10.1002/hbm.22411. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Wenning G.K., Ben-Shlomo Y., Magalhaes M., Daniel S.E., Quinn N.P. Clinical features and natural history of multiple system atrophy. Brain. 1994;117:835–845. doi: 10.1093/brain/117.4.835. [DOI] [PubMed] [Google Scholar]
- Westman E., Simmons A., Zhang Y., Muehlboeck J.-S., Tunnard C., Liu Y., Collins L., Evans A., Mecocci P., Vellas B. Multivariate analysis of MRI data for Alzheimer's disease, mild cognitive impairment and healthy controls. NeuroImage. 2011;54:1178–1187. doi: 10.1016/j.neuroimage.2010.08.044. [DOI] [PubMed] [Google Scholar]
- Westman E., Muehlboeck J.-S., Simmons A. Combining MRI and CSF measures for classification of Alzheimer's disease and prediction of mild cognitive impairment conversion. NeuroImage. 2012;62:229–238. doi: 10.1016/j.neuroimage.2012.04.056. [DOI] [PubMed] [Google Scholar]
- Weygandt M., Hackmack K., Pfüller C., Bellmann-Strobl J., Paul F., Zipp F., Haynes J.-D. MRI pattern recognition in multiple sclerosis normal-appearing brain areas. PLoS One. 2011;6 doi: 10.1371/journal.pone.0021138. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Weygandt M., Hummel H.-M., Schregel K., Ritter K., Allefeld C., Dommes E., Huppke P., Haynes J., Wuerfel J., Gärtner J. MRI-based diagnostic biomarkers for early onset pediatric multiple sclerosis. NeuroImage Clin. 2015;7:400–408. doi: 10.1016/j.nicl.2014.06.015. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Wink A.M., Roerdink J.B. Denoising functional MR images: a comparison of wavelet denoising and Gaussian smoothing. IEEE Trans. Med. Imaging. 2004;23:374–387. doi: 10.1109/TMI.2004.824234. [DOI] [PubMed] [Google Scholar]
- Wottschel V., Alexander D.C., Kwok P.P., Chard D.T., Stromillo M.L., De Stefano N., Thompson A.J., Miller D.H., Ciccarelli O. Predicting outcome in clinically isolated syndrome using machine learning. NeuroImage Clin. 2015;7:281–287. doi: 10.1016/j.nicl.2014.11.021. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Xu L., Wu X., Chen K., Yao L. Multi-modality sparse representation-based classification for Alzheimer's disease and mild cognitive impairment. Comput. Methods Prog. Biomed. 2015;122:182–190. doi: 10.1016/j.cmpb.2015.08.004. [DOI] [PubMed] [Google Scholar]
- Young J., Modat M., Cardoso M.J., Mendelson A., Cash D., Ourselin S., Alzheimer's Disease Neuroimaging Initiative. Accurate multimodal probabilistic prediction of conversion to Alzheimer's disease in patients with mild cognitive impairment. NeuroImage Clin. 2013;2:735–745. doi: 10.1016/j.nicl.2013.05.004. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Zarogianni E., Moorhead T.W., Lawrie S.M. Towards the identification of imaging biomarkers in schizophrenia, using multivariate pattern classification at a single-subject level. NeuroImage Clin. 2013;3:279–289. doi: 10.1016/j.nicl.2013.09.003. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Zeighami Y., Ulla M., Iturria-Medina Y., Dadar M., Zhang Y., Larcher K.M.-H., Fonov V., Evans A.C., Collins D.L., Dagher A. Network structure of brain atrophy in de novo Parkinson's disease. eLife. 2015;4 doi: 10.7554/eLife.08440. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Zhang D., Wang Y., Zhou L., Yuan H., Shen D., Alzheimer's Disease Neuroimaging Initiative. Multimodal classification of Alzheimer's disease and mild cognitive impairment. NeuroImage. 2011;55:856–867. doi: 10.1016/j.neuroimage.2011.01.008. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Zhou Y., Yu F., Duong T. Multiparametric MRI characterization and prediction in autism spectrum disorder using graph theory and machine learning. PLoS One. 2014;9 doi: 10.1371/journal.pone.0090405. [DOI] [PMC free article] [PubMed] [Google Scholar]



