Abstract
Accurate diagnosis of Alzheimer’s disease and its prodromal stage, i.e., mild cognitive impairment, is very important for early treatment. Over the last decade, various machine learning methods have been proposed to predict disease status and clinical scores from brain images. It is worth noting that many features extracted from brain images are significantly correlated. In this case, feature selection combined with this additional correlation information among features can effectively improve classification/regression performance. Typically, the correlation information among features can be modeled by the connectivity of an undirected graph, where each node represents one feature and each edge indicates that the two connected features are significantly correlated. In this paper, we propose a new graph-guided multi-task learning method incorporating this undirected graph information to predict multiple response variables (i.e., class label and clinical scores) jointly. Specifically, based on the sparse undirected feature graph, we utilize a new latent group Lasso penalty to encourage correlated features to be selected together. Furthermore, this new penalty also encourages intrinsically correlated tasks to share a common feature subset. To validate our method, we have performed extensive numerical studies using simulated datasets and the Alzheimer’s Disease Neuroimaging Initiative dataset. Compared with other methods, our proposed method achieves very promising performance.
Keywords: Alzheimer’s disease, Group Lasso, Magnetic resonance imaging (MRI), Multi-task learning, Partial correlation, Positron emission tomography (PET), Undirected graph
Introduction
Alzheimer’s disease (AD) is one of the most common forms of dementia, characterized by progressive cognitive and memory deficits. It has been reported that by 2050 one in every 85 persons will likely be affected by this disease (Brookmeyer et al. 2007). The increasing incidence of AD makes this disease a very important health issue and also a huge financial burden for both patients and governments (Hebert et al. 2001; Bain et al. 2008). Thus, it is very important to develop methods for timely diagnosis of AD and its prodromal stage, i.e., mild cognitive impairment (MCI). Over the last decade, many machine learning methods have been used for early diagnosis of AD and MCI based on different modalities of biomarkers, e.g., structural brain atrophy delineated by structural magnetic resonance imaging (MRI) (Du et al. 2007; McEvoy et al. 2009; Fjell et al. 2010; Yu et al. 2014), metabolic alterations characterized by fluorodeoxyglucose positron emission tomography (FDG-PET) (De Santi et al. 2001; Morris et al. 2001), and pathological amyloid depositions measured by cerebrospinal fluid (CSF) (Bouwman et al. 2007; Fjell et al. 2010). Typically, these methods learn a binary classification model from training data and use this model to predict the disease status (i.e., class label) of testing subjects.
Besides classification of disease status, accurate prediction of clinical scores such as the Mini-Mental State Examination (MMSE) score and the Alzheimer’s Disease Assessment Scale-Cognitive Subscale (ADAS-Cog) is also important and useful, since these scores can help evaluate the stage of AD pathology and predict future progression. Specifically, as a brief 30-point questionnaire test, MMSE is commonly used to screen for cognitive impairment. It can be used to examine a patient’s arithmetic, memory and orientation (Folstein et al. 1975). As another important clinical score of AD, ADAS-Cog is a cognitive testing instrument widely used in clinical trials. It is designed to measure the severity of the most important symptoms of AD (Rosen et al. 1984). Several studies based on regression methods have been conducted to estimate MMSE and ADAS-Cog using features extracted from MRI and FDG-PET. For example, Duchesne et al. (2005) used linear regression models, Wang et al. (2010) developed a high-dimensional kernel-based regression method, and Cheng et al. (2013) proposed a semi-supervised multi-modal relevance vector regression method. However, almost all of these regression methods model different clinical scores separately and do not use the class label information that is often available in practice.
Although the classification of disease status and the prediction of clinical scores are different tasks, there exists inherent correlation among them since the underlying pathology is the same (Fan et al. 2010; Stonnington et al. 2010). In the literature, Zhang and Shen (2012) proposed multi-modal multi-task (M3T) learning to predict both class label and clinical scores jointly. M3T formulated the estimation of class label and clinical scores as different tasks. The l2,1 penalty was used to deliver sparse models with a common feature subset for all tasks. Their experimental results indicated that selecting a common feature subset for different correlated tasks could achieve better prediction of both class label and clinical scores than choosing the feature subset for each task separately. Although it benefits from the commonality among different correlated tasks, the M3T method does not incorporate the correlation information among features. Actually, many features extracted from brain images such as structural MRI are statistically significantly correlated. In this case, feature selection combined with the additional correlation information among features can improve classification/regression performance (Yang et al. 2012).
In this paper, we extract effective correlation information among features by constructing a sparse undirected feature graph. This undirected graph uses all features as nodes, and two features are connected by an edge if there is a statistically significant partial correlation between them. In practice, we can use many existing high-dimensional precision matrix estimation methods (Friedman et al. 2008; Cai et al. 2011) to construct this undirected graph. Based on this undirected feature graph, we propose a new graph-guided multi-task learning (GGML) method to predict both class label and clinical scores simultaneously. Specifically, we utilize a new latent group Lasso penalty to encourage the significantly correlated features to be in or out of the models together. This new penalty also encourages the intrinsically correlated tasks to share a common feature subset, which helps us acquire robust and accurate feature selection. Computationally, the optimization problem for our proposed GGML method can be solved efficiently by the traditional group Lasso algorithm (Yuan and Lin 2006). Theoretically, our proposed GGML method includes the M3T method as a special case. To validate our proposed GGML method, we have conducted extensive numerical studies using simulated datasets and the Alzheimer’s Disease Neuroimaging Initiative (ADNI) (http://www.loni.ucla.edu/ADNI) dataset. Compared with the other methods, our proposed GGML method acquired very promising results.
The remainder of this paper is organized as follows. In the “Materials” section, we introduce the ADNI dataset used in this study. In the “Method” section, we show how to extract useful correlation information among features and describe our proposed new method. In “Simulation study” and “Analysis of the ADNI dataset” sections, we compare our method with the other methods by simulation study and also the analysis of the ADNI dataset. In the “Discussion” section, we discuss some possible extensions of our proposed method. Finally, we conclude this paper in the “Conclusion” section.
Materials
Data
Data used in this paper were obtained from the ADNI database. As a $60 million, 5-year public–private partnership, the ADNI was launched in 2003 by the National Institute on Aging (NIA), the National Institute of Biomedical Imaging and Bioengineering (NIBIB), the Food and Drug Administration (FDA), private pharmaceutical companies and non-profit organizations. The main goal of ADNI was to test whether serial MRI, PET, other biological markers, and clinical and neuropsychological assessments could be combined to measure the progression of MCI and early AD. To that end, 800 adults aged between 55 and 90 were recruited from over 50 sites across the US and Canada. Approximately 200 cognitively normal controls (NC) and 400 MCI individuals were followed for 3 years, and 200 individuals with early AD were followed for 2 years (see http://www.adni-info.org for up-to-date information). The general inclusion/exclusion criteria of the subjects are described in Zhang and Shen (2012). In this paper, we use data from 199 subjects who have complete baseline MRI, FDG-PET, and CSF data. These 199 subjects include 50 AD subjects, 97 MCI subjects, and 52 NC subjects. The detailed demographic information about these 199 subjects is summarized in Table 1.
Table 1.
Demographic information of the 199 subjects used in this study
| Characteristics | AD (50 subjects) | MCI (97 subjects) | NC (52 subjects) |
|---|---|---|---|
| Gender (F/M) | 17/33 | 32/65 | 18/34 |
| Age (mean ± SD) | 75.2 ± 7.6 | 75.3 ± 7.0 | 75.1 ± 5.1 |
| Education (mean ± SD) | 14.7 ± 3.7 | 15.9 ± 2.9 | 15.8 ± 3.2 |
| MMSE (mean ± SD) | 23.7 ± 1.9 | 27.1 ± 1.7 | 29.0 ± 1.2 |
| ADAS (mean ± SD) | 18.5 ± 5.9 | 11.4 ± 4.4 | 7.36 ± 3.2 |
Data preprocessing
Imaging preprocessing was performed for MRI and PET. For MRI, the preprocessing steps include anterior commissure (AC)–posterior commissure (PC) correction, intensity inhomogeneity correction (Sled et al. 1998), skull stripping (Wang et al. 2011), cerebellum removal based on registration with an atlas, spatial segmentation (Zhang et al. 2001) and registration (Shen and Davatzikos 2002). After registration, we obtained the subject-labeled image based on the Jacob template (Kabani et al. 1998) with 93 manually labeled regions of interest (ROIs). For each of the 93 ROIs in the labeled MRI, we computed the volume of gray matter as a feature. For each PET image, we first aligned the PET image to its respective MRI using affine registration. Then, we obtained the skull-stripped PET image using the corresponding brain mask of the MRI and computed the average intensity of every ROI in the PET image as a feature. Besides MRI and PET, we used CSF Aβ42, CSF t-tau and CSF p-tau as CSF features. For each subject, we finally obtained 93 MRI features, 93 PET features, and 3 CSF features. We also had the class label, MMSE and ADAS-Cog scores for each subject.
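As a schematic of this feature-extraction step, here is a minimal numpy sketch; the array names and the voxel-sum volume proxy are our simplifications, not the exact pipeline described above:

```python
import numpy as np

def roi_features(gm_map, pet_img, atlas, n_rois=93):
    """Schematic ROI feature extraction (hypothetical arrays): `atlas` labels
    each voxel 1..n_rois following the Jacob template, `gm_map` is the
    segmented gray-matter map, and `pet_img` is the PET image aligned to the
    same space."""
    mri_feat = np.zeros(n_rois)
    pet_feat = np.zeros(n_rois)
    for r in range(1, n_rois + 1):
        mask = atlas == r
        mri_feat[r - 1] = gm_map[mask].sum()    # gray-matter volume of ROI r
        pet_feat[r - 1] = pet_img[mask].mean()  # average PET intensity in ROI r
    return mri_feat, pet_feat
```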
Methods
In this section, after introducing some notation, we first discuss how to extract the correlation information among features. Next, to show clearly how this correlation information is utilized, we introduce the graph-guided single-task learning (GGSL) method. Then, as an extension of this method, our proposed graph-guided multi-task learning method is described.
Notation
For a set $\mathcal{A}$, we denote $|\mathcal{A}|$ as the number of elements in $\mathcal{A}$. For a matrix B, we denote $B^T$ and $B^{-1}$ as the transpose and the inverse of B, respectively. We also denote $\|\cdot\|_F$ as the Frobenius norm.
Suppose we have n samples and p features. Let $X = (X_1, X_2, \ldots, X_p) = (x_1, x_2, \ldots, x_n)^T$ denote the n × p training data matrix of features, where $x_1, x_2, \ldots, x_n$ are i.i.d. samples generated from a p-dimensional multivariate distribution with mean vector $0_{p\times 1}$ and covariance matrix $\Sigma$. Also, let $\Omega = (\omega_{ij}) = \Sigma^{-1}$ denote the precision matrix. Furthermore, suppose we have q response variables. Let $Y = (Y_1, Y_2, \ldots, Y_q) = (y_1, y_2, \ldots, y_n)^T$ denote the n × q training data matrix of response variables, where the response variables can be binary (for classification) or continuous (for regression). Note that, for the ADNI dataset used in our study, we have three response variables: class label, MMSE score, and ADAS-Cog score. The class labels are coded as +1 and −1 for the binary classification problem considered in this paper.
Extract the correlation information among features
The correlation information is often measured by the Pearson correlation between each pair of features. We can use sample Pearson correlation coefficients to identify the statistically significantly correlated features. One issue with this approach is that it only estimates the marginal linear dependence between a pair of features, without considering the influence of other features and common driving influences. Such an issue can be overcome by using partial correlation, which measures the linear dependence between each pair of features after eliminating the linear effect of the other features. In practice, we can compute the sample partial correlation coefficient between features i and j, denoted as $\hat{\rho}_{ij}$, which is defined as the sample Pearson correlation coefficient between the residuals $R_i$ and $R_j$ resulting from the linear regression of the feature $X_i$ on the features $\{X_k : k \neq i, j\}$ and of the feature $X_j$ on the features $\{X_k : k \neq i, j\}$, respectively. The resulting $\hat{\rho}_{ij}$ can be further used to identify features which are statistically significantly partially correlated.
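For concreteness, here is a minimal numpy sketch of this residual-based definition (the function name is ours; as noted in the next paragraph, the regressions are only well posed when n > p):

```python
import numpy as np

def sample_partial_corr(X, i, j):
    """Residual-based sample partial correlation between columns i and j of
    the n x p matrix X, as defined in the text."""
    n, p = X.shape
    others = [k for k in range(p) if k not in (i, j)]
    Z = np.column_stack([np.ones(n), X[:, others]])
    # residuals of X_i and X_j after regressing out the remaining features
    r_i = X[:, i] - Z @ np.linalg.lstsq(Z, X[:, i], rcond=None)[0]
    r_j = X[:, j] - Z @ np.linalg.lstsq(Z, X[:, j], rcond=None)[0]
    return np.corrcoef(r_i, r_j)[0, 1]
```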
When the number of features p is small and the sample size n is large enough (larger than p), it is easy to get good estimates of the partial correlation coefficients. In this case, many previous studies (Hampson et al. 2002; Lee et al. 2011) have used partial correlations to identify the significantly correlated features. However, in the high-dimensional case with the number of features p larger than the sample size n, the conventional methods for estimating partial correlation may result in over-fitting of the data (Ryali et al. 2012). In this case, it is difficult to get accurate estimates of the partial correlation coefficients.
For our proposed method introduced in the next section, in order to incorporate the correlation information among features, instead of requiring accurate estimation of each $\rho_{ij}$, we only need to estimate which pairs of features are partially correlated, i.e., estimate the set $S = \{(i,j) : i < j \text{ and } \rho_{ij} \neq 0\}$. It is well known that the partial correlation coefficients are proportional to the off-diagonal entries of the precision matrix Ω (Meinshausen and Bühlmann 2006). Thus, estimating $S$ is equivalent to estimating the set $\{(i,j) : i < j \text{ and } \omega_{ij} \neq 0\}$. In this way, many existing methods (Meinshausen and Bühlmann 2006; Friedman et al. 2008; Cai et al. 2011) can be used to estimate $S$ effectively.
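For completeness, the exact identity behind this equivalence (a standard fact about partial correlations, stated here to make the reasoning step explicit) is
$$\rho_{ij} = -\frac{\omega_{ij}}{\sqrt{\omega_{ii}\,\omega_{jj}}},$$
so $\rho_{ij} = 0$ if and only if $\omega_{ij} = 0$: the support of Ω carries exactly the edge information we need.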
In this paper, we will use the graphical Lasso (Friedman et al. 2008) or the neighborhood selection method (Meinshausen and Bühlmann 2006) to estimate $S$ and denote its estimate as $\hat{S}$. Furthermore, we represent $\hat{S}$ as an undirected graph G with p nodes and $|\hat{S}|$ edges, where each node represents one feature and each edge indicates that the two involved features are significantly partially correlated. Figure 1 shows an example of how to transform the estimated precision matrix $\hat{\Omega}$ into the estimated undirected graph G. In the graph G, features i and j are connected if and only if $(i,j) \in \hat{S}$.
Fig. 1.
Transforming the precision matrix $\hat{\Omega}$ (left) into the undirected graph G (right). Features i and j are connected if and only if $\hat{\omega}_{ij} \neq 0$
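A minimal sketch of this construction, using scikit-learn's GraphicalLassoCV as one concrete implementation of the graphical Lasso (the helper name and the zero-threshold `tol` are our choices):

```python
import numpy as np
from sklearn.covariance import GraphicalLassoCV

def estimate_feature_graph(X, tol=1e-8):
    """Estimate a sparse precision matrix with the graphical Lasso
    (regularization chosen by internal cross-validation) and return the
    adjacency of G: features i and j are connected iff omega_ij != 0."""
    omega = GraphicalLassoCV().fit(X).precision_  # X: n x p, columns centered
    adj = np.abs(omega) > tol                     # binary map, as in Fig. 1
    np.fill_diagonal(adj, False)                  # drop self-loops
    return adj
```

The returned boolean matrix plays the role of $\hat{S}$: its True entries are exactly the edges of G.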
Graph-guided single-task learning (GGSL) method
In this section, we assume that the undirected feature graph G has been constructed. For each i = 1, 2, …, p, denote $\mathcal{N}_i$ as the set including the ith feature and its neighbors in the graph G, i.e., $\mathcal{N}_i = \{i\} \cup \{j : \text{features } i \text{ and } j \text{ are connected in } G\}$.
To show how to use the correlation information represented by G, we consider the single-task learning first and then generalize this idea to multi-task learning. Without loss of generality, considering the tth task, we want to use the following linear model to predict the response variable Yt,
$$Y_t = XB_t + \epsilon_t, \qquad (1)$$
where $B_t = (b_{1t}, b_{2t}, \ldots, b_{pt})^T \in R^p$ is the coefficient vector of interest and $\epsilon_t = (\epsilon_{1t}, \epsilon_{2t}, \ldots, \epsilon_{nt})^T \in R^n$ is the error vector with $E(\epsilon_{st}) = 0$ and $\mathrm{Var}(\epsilon_{st}) = \sigma_t^2$ for each 1 ≤ s ≤ n.
Suppose the feature matrix X is independent of the error vector $\epsilon_t$. Denote $C_t$ as the marginal correlation vector between the p features and the response variable $Y_t$, i.e., $C_t = E(X^TY_t/n) = (c_{1t}, c_{2t}, \ldots, c_{pt})^T \in R^p$. Then by (1), we have
$$C_t = E(X^TX/n)\,B_t = \Sigma B_t. \qquad (2)$$
Thus, the true coefficient vector Bt can be represented as
$$B_t = \Sigma^{-1}C_t = \Omega C_t, \qquad (3)$$
where Ω shows the partial correlations among different features, and Ct reflects the marginal correlations between the features and the tth response variable Yt.
Furthermore, Eq. (3) can be expanded as follows:
$$B_t = \Omega C_t = \sum_{i=1}^{p} c_{it}\,(\omega_{1i}, \omega_{2i}, \ldots, \omega_{pi})^T. \qquad (4)$$
We observe that the coefficient vector $B_t = (b_{1t}, b_{2t}, \ldots, b_{pt})^T$ is the sum of p parts, where the ith part, $(\omega_{1i}c_{it}, \omega_{2i}c_{it}, \ldots, \omega_{pi}c_{it})^T$, is the ith term on the right side of the above equation (4). In addition, for each i, if there is no marginal correlation between the ith feature and the response variable $Y_t$, i.e., $c_{it} = 0$, then the components in the ith part $(\omega_{1i}c_{it}, \omega_{2i}c_{it}, \ldots, \omega_{pi}c_{it})^T$ will be zero simultaneously due to the common factor $c_{it}$. Furthermore, if the ith feature and the response variable $Y_t$ are marginally correlated, then $c_{it} \neq 0$ and the set of candidate nonzero components in the ith part is $\{j : \omega_{ji} \neq 0\}$, which can be well estimated by the set $\mathcal{N}_i$ including the ith feature and its neighbors in the estimated undirected graph G.
Motivated by the decomposition shown in Eq. (4), we assume that there is a latent decomposition of the coefficient vector $B_t$ into p parts, $V_{1t}, \ldots, V_{it}, \ldots, V_{pt}$, where $V_{it}$ is a p-dimensional latent vector representing the ith term on the right side of Eq. (4). In order to incorporate the correlation information represented by the undirected graph G, a group penalty term will be used to encourage the ith latent vector $V_{it}$ to be zero or to have nonzero components only for the indices in the set $\mathcal{N}_i$. Hence, we use the following GGSL method to estimate $B_t$:
$$\hat{B}_t = \arg\min_{B_t}\; \frac{1}{2n}\big\|Y_t - XB_t\big\|_2^2 + \lambda\sum_{i=1}^{p}\tau_{it}\|V_{it}\|_2, \qquad (5)$$
subject to $B_t = \sum_{i=1}^{p} V_{it}$ and $\mathrm{supp}(V_{it}) \subseteq \mathcal{N}_i$ for each 1 ≤ i ≤ p, where $\mathrm{supp}(V_{it})$ is the index set of the nonzero components in the vector $V_{it}$.
In the optimization problem (5), $\tau_{it}$ is a positive weight for the ith part and tth task. Similar to the methods for the adaptive Lasso (Zou 2006) and group Lasso (Yuan and Lin 2006), we can set $\tau_{it} = |\hat{b}_{it}|^{-\gamma}$, where γ is a positive parameter and $\hat{b}_{it}$ is an initial estimate of $b_{it}$. In our experiments, we chose $\hat{b}_{it}$ as the sample correlation coefficient between $X_i$ and $Y_t$. Both the positive parameter γ and the tuning parameter λ were chosen by cross-validation. Our experimental results indicate that this method can acquire good performance in general.
Theoretically, the GGSL method is very general and covers the popular Lasso method as a special case. Specifically, if we ignore the correlation information among features, we can set the undirected graph G as an empty graph with no edges. In this case, if setting constant weights $\tau_{it} \equiv \tau$, we can show that $\sum_{i=1}^{p}\tau_{it}\|V_{it}\|_2 = \tau\|B_t\|_1$, and the GGSL method is the same as the Lasso method (Tibshirani 1996). In general, we can estimate a sparse undirected graph G for modeling the significant partial correlation information among features. The GGSL method can utilize this correlation information effectively and thus acquires good prediction performance.
Graph-guided multi-task learning (GGML) method
For multi-task learning, we aim at estimating q response variables simultaneously. Similar to the above GGSL method, for each t, we assume that the coefficient vector $B_t$ can be decomposed as $B_t = \sum_{i=1}^{p} V_{it}$, where each $V_{it}$ is a p-dimensional latent vector satisfying $\mathrm{supp}(V_{it}) \subseteq \mathcal{N}_i$. Furthermore, in order to make use of the intrinsic correlation among these q tasks (response variables), we also assume that the decompositions of the q coefficient vectors $B_1, B_2, \ldots, B_q$ have the same pattern, i.e., $\mathrm{supp}(V_{i1}) = \mathrm{supp}(V_{i2}) = \cdots = \mathrm{supp}(V_{iq})$ for each 1 ≤ i ≤ p. That is, for each i = 1, 2, …, p, we assume that, if both the ith feature and its partially correlated features are useful for prediction of one response variable, they are also useful for prediction of the other response variables.
Based on the above assumption, denoting $B = (B_1, B_2, \ldots, B_q) \in R^{p\times q}$ and $V_i = (V_{i1}, V_{i2}, \ldots, V_{iq}) \in R^{p\times q}$ for each 1 ≤ i ≤ p, we generalize the GGSL method to the following GGML method:
$$\hat{B} = \arg\min_{B}\; \frac{1}{2n}\big\|Y - XB\big\|_F^2 + \lambda\sum_{i=1}^{p}\tau_i\|V_i\|_F, \qquad (6)$$
subject to $B = \sum_{i=1}^{p} V_i$ and $V_i^j = 0_{1\times q}$ for each $j \notin \mathcal{N}_i$ and each 1 ≤ i ≤ p, where $V_i^j$ is the jth row of the matrix $V_i$.
Similar to the GGSL method discussed in the “Graph-guided single-task learning (GGSL) method” section, we can set the weight $\tau_i$ adaptively based on initial estimates of the coefficients. The cross-validation method can be used to choose the best γ and the best tuning parameter λ for different tasks separately. Note that the penalty term in (6), along with the additional constraints, not only encourages the significantly partially correlated features to be in or out of the model jointly, but also chooses a common feature subset for different tasks. Due to the use of both the correlation information among features and the intrinsic commonality among different related tasks, our proposed GGML method can acquire better prediction performance than methods not using, or only using part of, these two kinds of information.
As an interesting remark, we note that the M3T method (Zhang and Shen 2012) is a special case of our proposed GGML method. In particular, when we ignore the correlation information among features, we can set the undirected graph G as an empty graph with no edges. In this case, if setting constant weights $\tau_i \equiv \tau$, we can show that $\sum_{i=1}^{p}\tau_i\|V_i\|_F = \tau\sum_{i=1}^{p}\|B^i\|_2$, where $B^i$ is the ith row of the coefficient matrix B. Thus, our proposed GGML method is exactly the same as the M3T method using the l2,1 penalty.
Objective function optimization
For our proposed GGML method, we need to solve the optimization problem (6). We can transform this constrained optimization problem into a simple unconstrained optimization problem by feature duplication.
Denote $X_{\mathcal{N}_i}$ as the sub-matrix of X with column indices in $\mathcal{N}_i$, and denote $\tilde{V}_i$ as the sub-matrix of $V_i$ with row indices in $\mathcal{N}_i$. Furthermore, denote $\tilde{X} = (X_{\mathcal{N}_1}, X_{\mathcal{N}_2}, \ldots, X_{\mathcal{N}_p})$ as the duplicated feature matrix and $\tilde{V} = (\tilde{V}_1^T, \tilde{V}_2^T, \ldots, \tilde{V}_p^T)^T$ as the corresponding coefficient matrix. Then, we can check that $XB = \tilde{X}\tilde{V}$, and (6) is equivalent to the following unconstrained optimization problem:
$$\min_{\tilde{V}}\; \frac{1}{2n}\big\|Y - \tilde{X}\tilde{V}\big\|_F^2 + \lambda\sum_{i=1}^{p}\tau_i\|\tilde{V}_i\|_F. \qquad (7)$$
The above problem (7) is a traditional group Lasso problem which can be solved efficiently by the blockwise majorization descent algorithm (Yang and Zou 2013). Denote the resulting estimate of B as $\hat{B} = (\hat{B}_1, \hat{B}_2, \ldots, \hat{B}_q)$. In the application stage, given a testing subject $x^*$, for the tth task, we can estimate the response $y_t^*$ by $\mathrm{sign}(x^{*T}\hat{B}_t)$ if $Y_t$ is a class label and by $x^{*T}\hat{B}_t$ if $Y_t$ is a continuous response variable.
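To make the feature-duplication trick concrete, below is an illustrative Python sketch (our own, not the authors' implementation): it builds $\tilde{X}$ and the group indices from the adjacency of G, solves (7) with a plain proximal-gradient group Lasso in place of the cited blockwise majorization descent algorithm, and maps the solution back to $B = \sum_i V_i$.

```python
import numpy as np

def duplicate_features(X, adj):
    """Build the duplicated matrix X~ = (X_{N_1}, ..., X_{N_p}) from the
    feature adjacency (diagonal assumed False), recording for each duplicated
    column its latent group i and its original feature index."""
    p = X.shape[1]
    cols, groups, feat = [], [], []
    for i in range(p):
        Ni = [i] + list(np.flatnonzero(adj[i]))  # feature i and its neighbours
        cols.append(X[:, Ni])
        groups += [i] * len(Ni)
        feat += Ni
    return np.hstack(cols), np.array(groups), np.array(feat)

def group_lasso(Xt, Y, groups, lam, tau, n_iter=500):
    """Proximal-gradient solver for problem (7); a simple stand-in for the
    blockwise majorization descent algorithm of Yang and Zou (2013)."""
    n = Xt.shape[0]
    V = np.zeros((Xt.shape[1], Y.shape[1]))
    step = 1.0 / np.linalg.eigvalsh(Xt.T @ Xt / n)[-1]  # 1 / Lipschitz constant
    for _ in range(n_iter):
        Z = V - step * (Xt.T @ (Xt @ V - Y) / n)        # gradient step
        for i in np.unique(groups):
            idx = groups == i
            norm = np.linalg.norm(Z[idx])
            scale = max(0.0, 1.0 - step * lam * tau[i] / norm) if norm > 0 else 0.0
            V[idx] = scale * Z[idx]                     # groupwise soft-threshold
    return V

def recover_B(V, feat, p):
    """Map the duplicated coefficients back to B = sum_i V_i (p x q)."""
    B = np.zeros((p, V.shape[1]))
    for row, j in enumerate(feat):
        B[j] += V[row]
    return B
```

A typical call would be `Xt, groups, feat = duplicate_features(X, adj)`, followed by `V = group_lasso(Xt, Y, groups, lam, tau)` and `B = recover_B(V, feat, X.shape[1])`.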
Simulation study
In this section, we perform numerical studies using simulated examples. For each example, we compare our proposed GGML method with (1) the Lasso method, which learns different tasks separately; (2) the GGSL method, which uses the correlation information among features and learns different tasks separately; and (3) the M3T method, which learns different tasks jointly while ignoring the correlation information among features. We implement the Lasso, GGSL, and M3T methods as described in the “Objective function optimization” section to predict the response variables.
Similar to the measures used in Zhang and Shen (2012), the classification accuracy and the Pearson’s correlation coefficient (CC) are also used here to evaluate the classification and regression performances, respectively. In addition, we also use the root-mean-square error (RMSE) to evaluate the regression performance.
Simulated examples
We study three simulated examples. Each example has one classification task and two regression tasks. We set p = 100, $B_1 = (2, 2, \ldots, 2, 0, 0, \ldots, 0)^T$, $B_2 = B_3 = (1, 1, \ldots, 1, 0, 0, \ldots, 0)^T$, where only the first 15 elements of each $B_t$ (t = 1, 2, 3) are nonzero. For each t, the errors $\epsilon_{st}$ are generated as i.i.d. normal random variables. For s = 1, 2, …, n, the feature vector $(x_{s1}, x_{s2}, \ldots, x_{sp})^T$ is generated as follows.
Example 1
For 1 ≤ j ≤ 5, . For 6 ≤ j ≤ 10, . For 11 ≤ j ≤ 15, . For 16 ≤ j ≤ p, . Here, .
Example 2
The features $(x_{s1}, x_{s2}, \ldots, x_{sp})^T \sim N(0, \Sigma)$ with $\sigma_{ij} = 0.5^{|i-j|}$. For this example, we have $\omega_{ii} = 1.333$, $\omega_{ij} = -0.667$ if $|i - j| = 1$, and $\omega_{ij} = 0$ if $|i - j| > 1$.
Example 3
The features $\{x_{sj} : 1 \le j \le 15\}$ are generated from the same model as shown in Example 1. In addition, the features $\{x_{sj} : 16 \le j \le p\} \sim N(0, \Sigma^*)$, where $\Sigma^* = M + \delta I$. Each off-diagonal entry in M is generated independently and equals 0.5 with probability 0.05 or 0 with probability 0.95. The diagonal entries of M are 0. Here, δ is chosen such that the condition number of $\Sigma^*$ is equal to p − 15. Finally, $\Sigma^*$ is standardized to have unit diagonals.
After generating each column of the response matrix Y by model (1), we replace the elements in the first column of Y by their signs (positive or negative) to simulate class labels. For all examples, we generate 40 training samples, 40 validation samples, and 400 testing samples. All the models are fitted on the training data. The validation data are used to choose the tuning parameters and the testing data are used to evaluate different methods. For each example, we repeat the simulation 30 times.
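As an illustration, the data for Example 2 can be generated in a few lines (the unit noise variance is our assumption; the text does not state the error scale):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, q = 40, 100, 3

# Example 2 design: AR(1) covariance, sigma_ij = 0.5^{|i-j|}
idx = np.arange(p)
Sigma = 0.5 ** np.abs(idx[:, None] - idx[None, :])
X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)

# true coefficients: only the first 15 entries of each B_t are nonzero
B = np.zeros((p, q))
B[:15, 0] = 2.0          # task 1 (classification)
B[:15, 1:] = 1.0         # tasks 2 and 3 (regression)

Y = X @ B + rng.standard_normal((n, q))  # unit noise variance assumed here
Y[:, 0] = np.sign(Y[:, 0])               # first column -> class labels
```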
Figure 2 shows the binary maps of the true precision matrices and Fig. 3 shows the corresponding feature graphs of these three examples. All these three graphs are sparse. For Examples 1 and 3, useful features (i.e., features with nonzero regression coefficients) are only connected with useful features. For Example 2, one useful feature is connected with one useless feature. In addition, for each example, different tasks are highly correlated since they share the same useful features. It is very interesting to study whether correlation information among features represented by the feature graph and the correlation information among tasks can be incorporated to improve the prediction performance.
Fig. 2.
Binary maps of the true precision matrices corresponding to these three simulated examples: left (Example 1), middle (Example 2), and right (Example 3). Each red dot represents a nonzero element in the precision matrix
Fig. 3.
True feature graphs corresponding to these three simulated examples: left (Example 1), middle (Example 2), and right (Example 3). Each blue dot indicates a feature
Simulation results
Table 2 shows the comparison of different methods using these three simulated examples. As shown in Table 2, for all these three examples, the GGSL method and GGML method acquire better performance than the Lasso method and the M3T method, respectively. This indicates that the extracted partial correlation information from features can be utilized to improve the prediction performance. In addition, the GGML method and M3T method also acquire better performance than the GGSL method and the Lasso method, respectively. It indicates that learning different correlated tasks jointly can also improve the prediction performance. For these three simulated examples, since our proposed GGML method incorporates both the partial correlation information among features and the intrinsic correlation information among different related tasks, it delivers the best performance in all cases. In the next section, we will further compare these four methods using the ADNI dataset.
Table 2.
Comparison of different methods using the simulated examples
| Example | Method | Accuracy | CC1 | CC2 | RMSE1 | RMSE2 |
|---|---|---|---|---|---|---|
| 1 | Lasso | 0.828 (0.007) | 0.909 (0.004) | 0.910 (0.003) | 4.091 (0.070) | 4.106 (0.064) |
| GGSL | 0.848 (0.009) | 0.932 (0.003) | 0.933 (0.002) | 3.548 (0.062) | 3.620 (0.057) | |
| M3T | 0.840 (0.006) | 0.918 (0.002) | 0.917 (0.002) | 3.916 (0.059) | 4.005 (0.059) | |
| GGML | 0.872 (0.006) | 0.938 (0.002) | 0.936 (0.001) | 3.402 (0.043) | 3.488 (0.039) | |
| 2 | Lasso | 0.765 (0.008) | 0.781 (0.010) | 0.767 (0.012) | 4.567 (0.084) | 4.596 (0.089) |
| GGSL | 0.800 (0.008) | 0.823 (0.008) | 0.810 (0.010) | 4.134 (0.075) | 4.213 (0.089) | |
| M3T | 0.796 (0.008) | 0.814 (0.008) | 0.807 (0.008) | 4.261 (0.075) | 4.290 (0.075) | |
| GGML | 0.816 (0.008) | 0.839 (0.007) | 0.838 (0.007) | 3.966 (0.069) | 3.981 (0.073) | |
| 3 | Lasso | 0.821 (0.005) | 0.910 (0.004) | 0.903 (0.005) | 3.995 (0.066) | 4.163 (0.096) |
| GGSL | 0.846 (0.008) | 0.932 (0.003) | 0.927 (0.004) | 3.506 (0.063) | 3.633 (0.084) | |
| M3T | 0.843 (0.006) | 0.918 (0.003) | 0.913 (0.004) | 3.907 (0.049) | 3.992 (0.073) | |
| GGML | 0.872 (0.006) | 0.938 (0.002) | 0.934 (0.002) | 3.388 (0.045) | 3.464 (0.050) |
Bold values represent the best performance for a particular measure
CC1 (CC2) is the Pearson’s correlation coefficient of the first (second) regression task; RMSE1 (RMSE2) is the root-mean-square error of the first (second) regression task. The values in the parenthesis are standard deviations
Analysis of the ADNI dataset
For the ADNI dataset, we estimate one class label and two clinical scores (i.e., MMSE and ADAS-Cog) using the MRI, FDG-PET and/or CSF features. Since there are two binary classification problems (AD vs. NC, and MCI vs. NC), we perform two sets of experiments. The first set of experiments uses the AD/NC dataset including only AD and NC subjects. The second set of experiments uses the MCI/NC dataset including only MCI and NC subjects. For each set of experiments, we consider four cases: (I) use only MRI features; (II) use only PET features; (III) use both MRI and PET features (denoted as MRI + PET); (IV) use all MRI, PET and CSF features (denoted as MRI + PET + CSF).
To evaluate the performance of different methods, we used a tenfold cross-validation (CV) strategy. Specifically, the whole sample was randomly partitioned into ten subsets. Each time, nine subsets were used for training and the remaining one was used for testing. We repeated this process ten times, with each of the ten subsets used exactly once as the testing data. Furthermore, in consideration of possible bias due to the random partition in the tenfold CV, we repeated the whole 10-fold CV process 30 times. In the training process, each column of the training data was normalized to have mean 0 and standard deviation 1. For all methods, we performed another inner fivefold CV on the training data to choose the tuning parameters.
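A schematic of this evaluation protocol is sketched below, with a placeholder `fit_fn` standing in for any of the four methods and classification accuracy on the first task as the example metric (our simplification; the inner CV here tunes only λ):

```python
import numpy as np
from sklearn.model_selection import KFold

def repeated_nested_cv(X, Y, fit_fn, lambdas, n_repeats=30):
    """30 repetitions of 10-fold CV, with an inner 5-fold CV on each training
    split to pick the tuning parameter. `fit_fn(X, Y, lam) -> B` is a
    hypothetical interface returning a p x q coefficient matrix."""
    acc = []
    for rep in range(n_repeats):
        outer = KFold(n_splits=10, shuffle=True, random_state=rep)
        for tr, te in outer.split(X):
            mu, sd = X[tr].mean(0), X[tr].std(0)   # training statistics only
            Xtr, Xte = (X[tr] - mu) / sd, (X[te] - mu) / sd
            inner = KFold(n_splits=5, shuffle=True, random_state=rep)
            cv_err = []
            for lam in lambdas:                    # inner CV over lambda
                errs = []
                for t2, v in inner.split(Xtr):
                    B = fit_fn(Xtr[t2], Y[tr][t2], lam)
                    errs.append(np.mean((Y[tr][v] - Xtr[v] @ B) ** 2))
                cv_err.append(np.mean(errs))
            B = fit_fn(Xtr, Y[tr], lambdas[int(np.argmin(cv_err))])
            acc.append(np.mean(np.sign(Xte @ B[:, 0]) == Y[te, 0]))
    return float(np.mean(acc))
```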
Partial correlation among different features
In the first step of the GGSL and GGML methods, we need to extract the effective correlation information from features. Note that only the training data matrix of features was used to estimate the sparse undirected graph G representing the significant partial correlations among features. Figure 4 shows the binary maps of the estimated precision matrices. The binary maps in the first two columns indicate that many features within the same modality (e.g., MRI or PET) are statistically significantly partially correlated. However, as shown by the binary maps in the third column, the partial correlations between MRI features and PET features are not statistically significant in most cases. Furthermore, the comparison between the binary maps in the first row and the second row indicates that the partial correlation information extracted from the AD/NC data is similar to that of the MCI/NC data. Similar to the example shown in Fig. 1, we can transform the estimated precision matrices into undirected graphs. The feature graphs corresponding to the estimated precision matrices are shown in Fig. 5. This graph information will be used in the GGML and GGSL methods.
Fig. 4.
Binary maps of the estimated precision matrices. First row uses AD/NC data; second row uses MCI/NC data. First column uses only MRI features; second column uses only PET features; third column uses both MRI and PET features. Each red dot in the plot represents a nonzero element
Fig. 5.
Feature graphs corresponding to the estimated precision matrices. First row uses AD/NC data; second row uses MCI/NC data. First column uses only MRI features; second column uses only PET features; third column uses both MRI and PET features. Each blue dot represents an MRI feature and each green dot represents a PET feature
Classification results
The classification accuracies of different methods are shown in Table 3. All methods deliver higher classification accuracy for the AD/NC dataset than for the MCI/NC dataset. For the AD/NC dataset, when we use only MRI features or only PET features, the GGSL method and GGML method acquire better classification performance than the Lasso method and the M3T method, respectively. This indicates that the extracted partial correlation information among features can be utilized to improve the classification performance. In addition, when we use both MRI and PET features or all the MRI, PET, and CSF features, since it is relatively easy to discriminate AD subjects from NC subjects in this case, all four methods acquire similarly high classification accuracies.
Table 3.
Comparison of the classification performance on the ADNI dataset
| Data | Method | MRI | PET | MRI + PET | MRI + PET+CSF |
|---|---|---|---|---|---|
| AD/NC | Lasso | 0.878 (0.003) | 0.823 (0.003) | 0.903 (0.003) | 0.917 (0.003) |
| GGSL | 0.896 (0.003) | 0.830 (0.003) | 0.911 (0.002) | 0.915 (0.002) | |
| M3T | 0.884 (0.002) | 0.821 (0.002) | 0.914 (0.002) | 0.918 (0.002) | |
| GGML | 0.906 (0.003) | 0.832 (0.003) | 0.919 (0.002) | 0.926 (0.002) | |
| MCI/NC | Lasso | 0.722 (0.003) | 0.677 (0.003) | 0.737 (0.004) | 0.750 (0.004) |
| GGSL | 0.737 (0.004) | 0.688 (0.004) | 0.755 (0.005) | 0.769 (0.003) | |
| M3T | 0.738 (0.003) | 0.655 (0.003) | 0.775 (0.003) | 0.776 (0.003) | |
| GGML | 0.751 (0.003) | 0.696 (0.003) | 0.784 (0.003) | 0.800 (0.003) |
Bold values represent the best performance for a particular measure
The reported values are the averaged classification accuracy with standard deviation.
For the MCI/NC dataset, on the one hand, the comparison between GGSL and Lasso (or GGML and M3T) indicates that using the extracted partial correlation information among features improves the classification performance significantly. On the other hand, the comparison between GGML and GGSL (or M3T and Lasso) shows that joint classification and regression could provide better classification performance than separate classification. Since our proposed GGML method incorporates both the partial correlation information among features and the intrinsic correlation information among different related tasks, it delivers the best classification performance.
Regression results
For regression tasks, we need to predict both the MMSE score and the ADAS-Cog score. Tables 4 and 5 show the comparison of regression performance on the AD/NC data and the MCI/NC data, respectively. As shown in Tables 4 and 5, our proposed GGML method acquires promising performance in most cases. For example, when we use all the features to predict the MMSE score, for the AD/NC data, our proposed GGML method achieves the highest correlation coefficient 0.745 while the corresponding correlation coefficients for Lasso, GGSL, and M3T are 0.709, 0.723 and 0.724, respectively. For the MCI/NC data, GGML also has the best performance with correlation coefficient 0.382 while the corresponding correlation coefficients for Lasso, GGSL, and M3T are 0.303, 0.325 and 0.364, respectively. In addition, when we use all the features to predict the ADAS-Cog scores, for the AD/NC data, our proposed GGML method achieves the highest correlation coefficient 0.740 while the corresponding correlation coefficients for Lasso, GGSL, and M3T are 0.664, 0.719 and 0.718, respectively. For the MCI/NC data, GGML also has the best performance with correlation coefficient 0.472 while the corresponding correlation coefficients for Lasso, GGSL, and M3T are 0.336, 0.464 and 0.426, respectively.
Table 4.
Comparison of the regression performance on the AD/NC dataset
| Response | Method | MRI | PET | MRI + PET | MRI + PET + CSF |
|---|---|---|---|---|---|
| MMSE | Lasso | 0.601 (0.005) | 0.601 (0.004) | 0.688 (0.003) | 0.709 (0.003) |
| GGSL | 0.656 (0.003) | 0.611 (0.003) | 0.698 (0.003) | 0.723 (0.003) | |
| M3T | 0.651 (0.004) | 0.585 (0.003) | 0.693 (0.002) | 0.724 (0.002) | |
| GGML | 0.671 (0.002) | 0.598 (0.003) | 0.712 (0.002) | 0.745 (0.002) | |
| ADAS-Cog | Lasso | 0.695 (0.003) | 0.611 (0.004) | 0.652 (0.004) | 0.664 (0.004) |
| GGSL | 0.703 (0.002) | 0.632 (0.004) | 0.708 (0.003) | 0.719 (0.002) | |
| M3T | 0.703 (0.002) | 0.635 (0.003) | 0.709 (0.003) | 0.718 (0.002) | |
| GGML | 0.705 (0.002) | 0.644 (0.003) | 0.721 (0.002) | 0.740 (0.002) |
Bold values represent the best performance for a particular measure
The reported values are the averaged correlation coefficient with standard deviation.
Table 5.
Comparison of the regression performance on the MCI/NC dataset
| Response | Method | MRI | PET | MRI + PET | MRI + PET + CSF |
|---|---|---|---|---|---|
| MMSE | Lasso | 0.326 (0.006) | 0.168 (0.010) | 0.303 (0.007) | 0.303 (0.007) |
| GGSL | 0.313 (0.007) | 0.181 (0.004) | 0.323 (0.005) | 0.325 (0.005) | |
| M3T | 0.382 (0.004) | 0.182 (0.007) | 0.379 (0.004) | 0.364 (0.004) | |
| GGML | 0.394 (0.004) | 0.213 (0.005) | 0.392 (0.005) | 0.382 (0.004) | |
| ADAS-Cog | Lasso | 0.355 (0.006) | 0.427 (0.006) | 0.343 (0.006) | 0.336 (0.006) |
| GGSL | 0.378 (0.005) | 0.451 (0.005) | 0.462 (0.004) | 0.464 (0.003) | |
| M3T | 0.354 (0.004) | 0.406 (0.006) | 0.429 (0.003) | 0.426 (0.003) | |
| GGML | 0.391 (0.004) | 0.469 (0.005) | 0.462 (0.003) | 0.472 (0.003) |
Bold values represent the best performance for a particular measure.
The reported values are the averaged correlation coefficient with standard deviation.
It is interesting to note that for the MCI/NC dataset, the PET and CSF data seem not to be useful for the prediction of the MMSE score. All four methods acquire poor prediction of the MMSE scores when only the PET data are used. In addition, compared with the cases using only MRI data, both the M3T and GGML methods acquire worse performance when the additional PET/CSF data are used. Similar to the previous discussion about classification performance, the comparison between GGSL and Lasso (or GGML and M3T) indicates that using the extracted partial correlation information among features improves the prediction of the MMSE and ADAS-Cog scores significantly. In addition, the comparison between GGML and GGSL (or M3T and Lasso) shows that joint classification and regression could deliver better prediction performance than the separate regression of MMSE (or ADAS-Cog) on the features. Since our GGML method incorporates both the partial correlation information among features and the intrinsic correlation information among different tasks, it delivers the best prediction of the MMSE and ADAS-Cog scores.
Most discriminative brain regions
In this subsection, we investigate the most discriminative brain regions for the diagnosis of disease status and the prediction of the MMSE and ADAS-Cog scores. For each method, we repeated the whole 10-fold CV process 30 times and acquired 300 different models using different training datasets. Figure 6 shows the selection frequency of each of the 93 ROIs for the AD/NC classification task using only MRI features, where the selection frequency for each ROI is defined as the proportion of the 300 fitted models in which that ROI receives a nonzero estimated coefficient.
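Given the 300 fitted models, this quantity is straightforward to compute; a small sketch, with `masks` as a hypothetical 300 × 93 boolean array of nonzero-coefficient indicators:

```python
import numpy as np

# masks[m, r] is True when ROI r received a nonzero coefficient in the m-th
# fitted model (30 repeats x 10 folds = 300 models)
def selection_frequency(masks):
    return masks.mean(axis=0)  # per-ROI fraction of the 300 models

# top ten ROIs by frequency, as 1-based indices matching Tables 6-8:
# top10 = np.argsort(selection_frequency(masks))[::-1][:10] + 1
```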
For each method, some ROIs are always selected while some ROIs are seldom selected. Compared with Lasso and M3T, the GGSL and GGML methods tend to select more ROIs since they use the feature graph information and encourage the significantly partially correlated features to be selected jointly. According to the selection frequency, we compare the top ten selected ROIs of different methods for different tasks. Tables 6, 7 and 8 show the indices of the top ten selected ROIs of the four methods for different tasks (classification or regression), different datasets (AD/NC or MCI/NC) and different modalities (MRI or PET). Table 9 contains the full names of the ROIs.
Fig. 6.
Selection frequency of 93 ROIs for the AD/NC classification task using only MRI features
Table 6.
Comparison of the top ten selected ROIs for the classification task
| MRI | PET | |
|---|---|---|
| AD/NC | ||
| Lasso | 18, 22, 38, 44, 46, 69, 80, 83, 84, 90 | 12, 18, 23, 26, 41, 68, 69, 73, 81, 87 |
| GGSL | 18, 22, 30, 44, 58, 69, 80, 83, 84, 90 | 12, 18, 26, 35, 41, 68, 69, 73, 79, 87 |
| M3T | 9, 18, 22, 46, 48, 69, 80, 83, 84, 90 | 12, 23, 26, 35, 62, 68, 69, 73, 81, 87 |
| GGML | 18, 22, 30, 44, 48, 67, 80, 83, 84, 90 | 7, 12, 23, 26, 35, 62, 68, 69, 73, 87 |
| MCI/NC | ||
| Lasso | 17, 28, 40, 48, 63, 64, 69, 83, 86, 92 | 2, 37, 39, 41, 54, 55, 63, 68, 81, 87 |
| GGSL | 17, 22, 30, 40, 46, 64, 69, 76, 83, 92 | 11, 12, 23, 26, 28, 29, 38, 40, 41, 87 |
| M3T | 17, 40, 46, 48, 53, 63, 64, 69, 83, 86 | 12, 35, 41, 62, 64, 68, 73, 79, 81, 87 |
| GGML | 22, 40, 45, 46, 61, 64, 69, 76, 83, 86 | 11, 12, 26, 29, 38, 40, 41, 47, 79, 87 |
Table 7.
Comparison of the top ten selected ROIs for the regression task (MMSE)
| MRI | PET | |
|---|---|---|
| AD/NC | ||
| Lasso | 9, 15, 18, 19, 22, 40, 80, 83, 84, 90 | 12, 18, 23, 26, 62, 63, 68, 69, 73, 79 |
| GGSL | 19, 22, 48, 58, 62, 67, 80, 83, 84, 85 | 7, 12, 23, 26, 35, 41, 62, 68, 69, 73 |
| M3T | 9, 18, 22, 46, 48, 69, 80, 83, 84, 90 | 12, 23, 26, 35, 62, 68, 69, 73, 81, 87 |
| GGML | 18, 22, 30, 44, 48, 67, 80, 83, 84, 90 | 7, 12, 23, 26, 35, 62, 68, 69, 73, 87 |
| MCI/NC | ||
| Lasso | 17, 33, 40, 44, 48, 53, 62, 64, 69, 86 | 4, 23, 24, 33, 41, 61, 62, 68, 84, 87 |
| GGSL | 22, 45, 46, 48, 61, 64, 69, 76, 83, 86 | 11, 12, 23, 26, 28, 29, 38, 40, 41, 87 |
| M3T | 17, 40, 46, 48, 53, 63, 64, 69, 83, 86 | 12, 35, 41, 62, 64, 68, 73, 79, 81, 87 |
| GGML | 22, 40, 45, 46, 61, 64, 69, 76, 83, 86 | 11, 12, 26, 29, 38, 40, 41, 47, 79, 87 |
Table 8.
Comparison of the top ten selected ROIs for the regression task (ADAS-Cog)
| MRI | PET | |
|---|---|---|
| AD/NC | ||
| Lasso | 9, 18, 46, 48, 61, 62, 80, 83, 84, 90 | 12, 23, 26, 30, 35, 62, 73, 76, 81, 92 |
| GGSL | 18, 30, 48, 58, 62, 67, 80, 83, 84, 85 | 7, 12, 23, 26, 30, 35, 62, 69, 73, 92 |
| M3T | 9, 18, 22, 46, 48, 69, 80, 83, 84, 90 | 12, 23, 26, 35, 62, 68, 69, 73, 81, 87 |
| GGML | 18, 22, 30, 44, 48, 67, 80, 83, 84, 90 | 7, 12, 23, 26, 35, 62, 68, 69, 73, 87 |
| MCI/NC | ||
| Lasso | 10, 17, 18, 38, 45, 46, 69, 72, 83, 87 | 10, 12, 14, 19, 35, 39, 41, 62, 64, 88 |
| GGSL | 17, 45, 46, 61, 62, 69, 72, 76, 83, 87 | 11, 12, 28, 29, 35, 38, 41, 71, 79, 87 |
| M3T | 17, 40, 46, 48, 53, 63, 64, 69, 83, 86 | 12, 35, 41, 62, 64, 68, 73, 79, 81, 87 |
| GGML | 22, 40, 45, 46, 61, 64, 69, 76, 83, 86 | 11, 12, 26, 29, 38, 40, 41, 47, 79, 87 |
Table 9.
Names of the selected ROIs in this study
| ROI index | ROI name |
|---|---|
| 2 | Middle frontal gyrus right |
| 4 | Insula right |
| 7 | Cingulate region right |
| 9 | Medial frontal gyrus left |
| 10 | Superior frontal gyrus right |
| 11 | Globus pallidus right |
| 12 | Globus pallidus left |
| 14 | Inferior frontal gyrus left |
| 15 | Putamen right |
| 17 | Parahippocampal gyrus left |
| 18 | Angular gyrus right |
| 19 | Temporal pole right |
| 22 | Uncus right |
| 23 | Cingulate region left |
| 24 | Fornix left |
| 26 | Precuneus right |
| 28 | Cerebral peduncle left |
| 29 | Cerebral peduncle right |
| 30 | Hippocampal formation right |
| 33 | Caudate nucleus left |
| 35 | Anterior limb of internal capsule left |
| 37 | Middle frontal gyrus left |
| 38 | Superior parietal lobule left |
| 39 | Caudate nucleus right |
| 40 | Cuneus left |
| 41 | Precuneus left |
| 44 | Supramarginal gyrus right |
| 45 | Superior temporal gyrus left |
| 46 | Uncus left |
| 47 | Middle occipital gyrus right |
| 48 | Middle temporal gyrus left |
| 53 | Postcentral gyrus left |
| 54 | Inferior frontal gyrus right |
| 55 | Precentral gyrus left |
| 58 | Perirhinal cortex right |
| 61 | Perirhinal cortex left |
| 62 | Inferior temporal gyrus left |
| 63 | Temporal pole left |
| 64 | Entorhinal cortex left |
| 67 | Lateral occipitotemporal gyrus right |
| 68 | Entorhinal cortex right |
| 69 | Hippocampal formation left |
| 71 | Parietal lobe WM right |
| 72 | Insula left |
| 73 | Postcentral gyrus right |
| 76 | Amygdala left |
| 79 | Anterior limb of internal capsule right |
| 80 | Middle temporal gyrus right |
| 81 | Occipital pole right |
| 83 | Amygdala right |
| 84 | Inferior temporal gyrus right |
| 85 | Superior temporal gyrus right |
| 86 | Middle occipital gyrus left |
| 87 | Angular gyrus left |
| 88 | Medial occipitotemporal gyrus right |
| 90 | Lateral occipitotemporal gyrus left |
| 92 | Occipital pole left |
As shown in Tables 6, 7 and 8, for different tasks, the top ten selected ROIs of the single-task learning methods such as Lasso and GGSL are different, while the top ten selected ROIs of the multi-task learning methods such as M3T and GGML are the same. We can also observe that the top ten selected ROIs for the cases using MRI features are not very similar to the top ten selected ROIs for the cases using PET features. One possible reason is that MRI features and PET features provide complementary information for the diagnosis of AD. However, for each case, the top ten selected ROIs of the four methods are similar. For example, for the AD/NC classification task using MRI features, Table 6 indicates that the ROIs with indices 18, 80, 83, 84, and 90 are frequently selected by all four methods. It is interesting to point out that both the GGML and M3T methods also select the 48th ROI frequently for the AD/NC classification task, while this ROI is not one of the top ten selected ROIs of Lasso and GGSL for this task. However, as shown in Table 8, the 48th ROI is frequently selected by Lasso and GGSL for the regression task (ADAS-Cog) using AD/NC data. This indicates that the multi-task learning methods such as GGML and M3T incorporate the clinical score information for the classification task. On the other hand, as shown in Table 8, both the GGML and M3T methods select the 22nd ROI frequently for the regression task (ADAS-Cog) using AD/NC data, while this ROI is not one of the top ten selected ROIs of Lasso and GGSL for this task. However, as shown in Table 6, the 22nd ROI is frequently selected by Lasso and GGSL for the classification task (AD vs NC). This indicates that the multi-task learning methods such as GGML and M3T incorporate the class label information for the regression task.
Furthermore, as shown in Tables 6, 7 and 8, for the study using AD/NC data and MRI features, the common top ten selected ROIs of Lasso for different tasks are the ROIs with indices 18, 80, 83, 84 and 90. The common top ten selected ROIs of the GGSL method for different tasks are the ROIs with indices 58, 80, 83, and 84. Most of these ROIs are among the top ten selected ROIs of our proposed GGML method. In Figs. 7 and 8, we visualize the top ten selected ROIs of our proposed GGML method when different datasets (AD/NC or MCI/NC) and different modalities (MRI or PET) are used. Most of the selected regions, e.g., uncus right (22), hippocampal formation right (30), uncus left (46), middle temporal gyrus left (48), hippocampal formation left (69), middle temporal gyrus right (80) and amygdala right (83), are known to be highly correlated with AD and MCI by many studies using group comparison methods (Jack et al. 1999; Misra et al. 2009; Zhang and Shen 2012).
Fig. 7.
Top ten most discriminative brain regions selected by GGML method using AD/NC dataset
Fig. 8.
Top ten most discriminative brain regions selected by GGML method using MCI/NC dataset
Discussion
In this section, we first discuss some issues about constructing the undirected feature graph G. Then, some possible extensions of our proposed method will be discussed.
Construction of the undirected feature graph G
Before performing our proposed GGML method, we need to construct an undirected feature graph G representing the significant correlation information among features. In the “Extract the correlation information among features” section, we proposed to use the graphical Lasso method to construct this graph. For some datasets, the constructed graph G may include many edges corresponding to weak or even spurious partial correlations due to poor estimation of the precision matrix. In this case, by thresholding the estimated precision matrix, we can construct a sparse undirected graph representing only the most reliable partial correlations.
Furthermore, besides partial correlation information among features, we can also combine other useful information (e.g., some prior information about features) to construct this graph G. Our proposed GGML method can be used for any given undirected feature graph G representing the relationships among different features.
Use of the structure information among different subjects
Our proposed GGML method utilizes both the correlation information among features and the intrinsic correlation information among different response variables. Actually, we can also generalize the GGML method to incorporate the structure information among different subjects. Similar to the locality preserving projection (LPP) method (He and Niyogi 2004), we can model the structure information among different training subjects as another sparse undirected graph S. Here, S has n nodes and each node represents one subject. The connectivity of the graph S can be defined by the k nearest neighbors, i.e., subjects $x_s$ and $x_l$ are connected by an edge if $x_s$ is among the k nearest neighbors of $x_l$, or $x_l$ is among the k nearest neighbors of $x_s$. In order to use the structure information among different training subjects represented by S, we can preserve the neighborhood structure of the subjects, i.e., encourage the predicted response variables $\hat{y}_s$ and $\hat{y}_l$ to be close if the sth and the lth subjects are connected in the undirected graph S.
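A sketch of this kNN connectivity rule (Euclidean distance and the helper name are our assumptions):

```python
import numpy as np

def knn_subject_graph(X, k=5):
    """Symmetric kNN connectivity for the subject graph S described above:
    subjects s and l are connected if either is among the other's k nearest
    neighbours."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(D, np.inf)               # exclude self-matches
    S = np.zeros_like(D, dtype=bool)
    for s in range(X.shape[0]):
        S[s, np.argsort(D[s])[:k]] = True     # s -> its k nearest neighbours
    return S | S.T                            # symmetrize the relation
```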
Conclusion
In summary, we propose a new graph-guided multi-task learning method to incorporate the correlation information among features and the intrinsic correlation information among different tasks. To use the correlation information among features, our proposed GGML method encourages the partially correlated features to be in or out of the model jointly. Furthermore, in order to acquire more robust and accurate feature selection, our proposed GGML method encourages different tasks to share a common useful feature subset. Theoretically, our proposed GGML method is very general and includes the M3T method as a special case. The experimental results on the simulated examples and the ADNI dataset also show the advantage of the proposed GGML method over the existing methods.
Acknowledgments
This work was supported in part by NIH Grants AG041721, EB006733, EB008374, EB009634, NSF DMS-1407241 and NIH/NCI Grant R01 CA-149569. Data collection and sharing for this project was funded by the Alzheimer’s Disease Neuroimaging Initiative (ADNI) National Institutes of Health Grant U01 AG024904. ADNI is funded by the National Institute on Aging, the National Institute of Biomedical Imaging and Bioengineering, and through generous contributions from the following: Abbott, AstraZeneca AB, Amorfix, Bayer Schering Pharma AG, Bioclinica Inc., Biogen Idec, Bristol-Myers Squibb, Eisai Global Clinical Development, Elan Corporation, Genentech, GE Healthcare, Innogenetics, IXICO, Janssen Alzheimer Immunotherapy, Johnson and Johnson, Eli Lilly and Co., Medpace, Inc., Merck and Co., Inc., Meso Scale Diagnostics, LLC, Novartis AG, Pfizer Inc, F. Hoffman-La Roche, Servier, Synarc, Inc., and Takeda Pharmaceuticals, as well as non-profit partners the Alzheimer’s Association and Alzheimer’s Drug Discovery Foundation, with participation from the US Food and Drug Administration. Private sector contributions to ADNI are facilitated by the Foundation for the National Institutes of Health (http://www.fnih.org). The grantee organization is the Northern California Institute for Research and Education, and the study is coordinated by the Alzheimer’s Disease Cooperative Study at the University of California, San Diego. ADNI data are disseminated by the Laboratory for Neuro Imaging at the University of California, Los Angeles.
References
- Bain LJ, Jedrziewski K, Morrison-Bogorad M, Albert M, Cotman C, Hendrie H, Trojanowski JQ. Healthy brain aging: a meeting report from the Sylvan M. Cohen annual retreat of the University of Pennsylvania Institute on Aging. Alzheimer’s Dement J Alzheimer’s Assoc. 2008;4:443–446. doi: 10.1016/j.jalz.2008.08.006.
- Bouwman FH, van der Flier WM, Schoonenboom NS, van Elk EJ, Kok A, Rijmen F, Blankenstein MA, Scheltens P. Longitudinal changes of CSF biomarkers in memory clinic patients. Neurology. 2007;69(10):1006–1011. doi: 10.1212/01.wnl.0000271375.37131.04.
- Brookmeyer R, Johnson E, Ziegler-Graham K, Arrighi HM. Forecasting the global burden of Alzheimer’s disease. Alzheimer’s Dement. 2007;3:186–191. doi: 10.1016/j.jalz.2007.04.381.
- Cai T, Liu W, Luo X. A constrained l1 minimization approach to sparse precision matrix estimation. J Am Stat Assoc. 2011;106(494):594–607.
- Cheng B, Zhang D, Chen S, Kaufer DI, Shen D. Semi-supervised multimodal relevance vector regression improves cognitive performance estimation from imaging and biological biomarkers. Neuroinformatics. 2013;11(3):339–353. doi: 10.1007/s12021-013-9180-7.
- De Santi S, de Leon MJ, Rusinek H, Convit A, Tarshish CY, Roche A, Tsui WH, Kandil E, Boppana M, Daisley K, et al. Hippocampal formation glucose metabolism and volume losses in MCI and AD. Neurobiol Aging. 2001;22(4):529–539. doi: 10.1016/s0197-4580(01)00230-5.
- Du AT, Schuff N, Kramer JH, Rosen HJ, Gorno-Tempini ML, Rankin K, Miller BL, Weiner MW. Different regional patterns of cortical thinning in Alzheimer’s disease and frontotemporal dementia. Brain. 2007;130:1159–1166. doi: 10.1093/brain/awm016.
- Duchesne S, Caroli A, Geroldi C, Frisoni GB, Collins DL. Predicting clinical variable from MRI features: application to MMSE in MCI. In: Medical image computing and computer-assisted intervention—MICCAI 2005. Springer; 2005. pp. 392–399.
- Fan Y, Kaufer D, Shen D. Joint estimation of multiple clinical variables of neurological diseases from imaging patterns. In: 2010 IEEE international symposium on biomedical imaging: from nano to macro. IEEE; 2010. pp. 852–855.
- Fjell AM, Walhovd KB, Fennema-Notestine C, McEvoy LK, Hagler DJ, Holland D, Brewer JB, Dale AM. CSF biomarkers in prediction of cerebral and clinical change in mild cognitive impairment and Alzheimer’s disease. J Neurosci. 2010;30:2088–2101. doi: 10.1523/JNEUROSCI.3785-09.2010.
- Folstein MF, Folstein SE, McHugh PR. Mini-mental state: a practical method for grading the cognitive state of patients for the clinician. J Psychiatr Res. 1975;12(3):189–198. doi: 10.1016/0022-3956(75)90026-6.
- Friedman J, Hastie T, Tibshirani R. Sparse inverse covariance estimation with the graphical lasso. Biostatistics. 2008;9:432–441. doi: 10.1093/biostatistics/kxm045.
- Hampson M, Peterson BS, Skudlarski P, Gatenby JC, Gore JC. Detection of functional connectivity using temporal correlations in MR images. Hum Brain Mapp. 2002;15(4):247–262. doi: 10.1002/hbm.10022.
- He X, Niyogi P. Locality preserving projections. In: Thrun S, Saul LK, editors. Neural information processing systems. Vol. 16. MIT Press; Cambridge: 2004. p. 153.
- Hebert LE, Beckett LA, Scherr PA, Evans DA. Annual incidence of Alzheimer disease in the United States projected to the years 2000 through 2050. Alzheimer Dis Assoc Disord. 2001;15:169–173. doi: 10.1097/00002093-200110000-00002.
- Jack C, Petersen RC, Xu YC, O’Brien PC, Smith GE, Ivnik RJ, Boeve BF, Waring SC, Tangalos EG, Kokmen E. Prediction of AD with MRI-based hippocampal volume in mild cognitive impairment. Neurology. 1999;52(7):1397. doi: 10.1212/wnl.52.7.1397.
- Kabani N, MacDonald D, Holmes C, Evans A. A 3D atlas of the human brain. NeuroImage. 1998;7:S717.
- Lee H, Lee DS, Kang H, Kim BN, Chung MK. Sparse brain network recovery under compressed sensing. IEEE Trans Med Imaging. 2011;30(5):1154–1165. doi: 10.1109/TMI.2011.2140380.
- McEvoy LK, Fennema-Notestine C, Roddey JC, Hagler DJ Jr, Holland D, Karow DS, Pung CJ, Brewer JB, Dale AM. Alzheimer disease: quantitative structural neuroimaging for detection and prediction of clinical and structural changes in mild cognitive impairment. Radiology. 2009;251:195–205. doi: 10.1148/radiol.2511080924.
- Meinshausen N, Bühlmann P. High-dimensional graphs and variable selection with the lasso. Ann Stat. 2006;34(3):1436–1462.
- Misra C, Fan Y, Davatzikos C. Baseline and longitudinal patterns of brain atrophy in MCI patients, and their use in prediction of short-term conversion to AD: results from ADNI. Neuroimage. 2009;44(4):1415–1422. doi: 10.1016/j.neuroimage.2008.10.031.
- Morris JC, Storandt M, Miller JP, McKeel DW, Price JL, Rubin EH, Berg L. Mild cognitive impairment represents early-stage Alzheimer disease. Arch Neurol. 2001;58(3):397. doi: 10.1001/archneur.58.3.397.
- Rosen WG, Mohs RC, Davis KL. A new rating scale for Alzheimer’s disease. Am J Psychiatry. 1984;141(11):1356–1364. doi: 10.1176/ajp.141.11.1356.
- Ryali S, Chen T, Supekar K, Menon V. Estimation of functional connectivity in FMRI data using stability selection-based sparse partial correlation with elastic net penalty. Neuroimage. 2012;59(4):3852–3861. doi: 10.1016/j.neuroimage.2011.11.054.
- Shen D, Davatzikos C. HAMMER: hierarchical attribute matching mechanism for elastic registration. IEEE Trans Med Imaging. 2002;21(11):1421–1439. doi: 10.1109/TMI.2002.803111.
- Sled JG, Zijdenbos AP, Evans AC. A nonparametric method for automatic correction of intensity nonuniformity in MRI data. IEEE Trans Med Imaging. 1998;17(1):87–97. doi: 10.1109/42.668698.
- Stonnington CM, Chu C, Klöppel S, Jack CR Jr, Ashburner J, Frackowiak RS. Predicting clinical scores from magnetic resonance scans in Alzheimer’s disease. Neuroimage. 2010;51(4):1405–1413. doi: 10.1016/j.neuroimage.2010.03.051.
- Tibshirani R. Regression shrinkage and selection via the lasso. J R Stat Soc Ser B. 1996;58(1):267–288.
- Wang Y, Fan Y, Bhatt P, Davatzikos C. High-dimensional pattern regression using machine learning: from medical images to continuous clinical variables. NeuroImage. 2010;50(4):1519–1535. doi: 10.1016/j.neuroimage.2009.12.092.
- Wang Y, Nie J, Yap PT, Shi F, Guo L, Shen D. Robust deformable-surface-based skull-stripping for large-scale studies. Med Image Comput Comput Assist Interv. 2011;6893:635–642. doi: 10.1007/978-3-642-23626-6_78.
- Yang S, Yuan L, Lai YC, Shen X, Wonka P, Ye J. Feature grouping and selection over an undirected graph. In: Proceedings of the 18th ACM SIGKDD international conference on knowledge discovery and data mining. ACM; 2012. pp. 922–930.
- Yang Y, Zou H. gglasso: group lasso penalized learning using a unified BMD algorithm (version 1.1). 2013. R package, http://CRAN.R-project.org/package=gglasso.
- Yu G, Liu Y, Thung KH, Shen D. Multi-task linear programming discriminant analysis for the identification of progressive MCI individuals. PLoS One. 2014;9(5):e96458. doi: 10.1371/journal.pone.0096458.
- Yuan M, Lin Y. Model selection and estimation in regression with grouped variables. J R Stat Soc Ser B. 2006;68:49–67.
- Zhang D, Shen D. Multi-modal multi-task learning for joint prediction of multiple regression and classification variables in Alzheimer’s disease. NeuroImage. 2012;59(2):895–907. doi: 10.1016/j.neuroimage.2011.09.069.
- Zhang Y, Brady M, Smith S. Segmentation of brain MR images through a hidden Markov random field model and the expectation-maximization algorithm. IEEE Trans Med Imaging. 2001;20(1):45–57. doi: 10.1109/42.906424.
- Zou H. The adaptive lasso and its oracle properties. J Am Stat Assoc. 2006;101(476):1418–1429.