BioMed Research International. 2021;2021:5531940. Published online 2021 Sep 2. doi: 10.1155/2021/5531940

Biomarker Extraction Based on Subspace Learning for the Prediction of Mild Cognitive Impairment Conversion

Ying Li 1,2, Yixian Fang 3, Jiankun Wang 4, Huaxiang Zhang 2, Bin Hu 2,5
PMCID: PMC8429015  PMID: 34513992

Abstract

Accurate recognition of progressive mild cognitive impairment (MCI) is helpful to reduce the risk of developing Alzheimer's disease (AD). However, it is still challenging to extract effective biomarkers from multivariate brain structural magnetic resonance imaging (MRI) features to accurately differentiate progressive MCI from stable MCI. We develop novel biomarkers by combining subspace learning methods with the information of AD as well as normal control (NC) subjects for the prediction of MCI conversion using multivariate structural MRI data. Specifically, we first learn two projection matrices to map multivariate structural MRI data into a common label subspace for AD and NC subjects, where the original data structure and the one-to-one correspondence between multiple variables are kept as much as possible. Afterwards, the multivariate structural MRI features of MCI subjects are mapped into the common subspace according to the projection matrices. We then perform a self-weighted operation and weighted fusion on the features in the common subspace to extract the novel biomarkers for MCI subjects. The proposed biomarkers are tested on the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset. Experimental results indicate that our proposed biomarkers outperform the competing biomarkers in discriminating between progressive MCI and stable MCI. Furthermore, the improvement from the proposed biomarkers is not limited to a particular classifier. Moreover, the results also confirm that the information of AD and NC subjects is conducive to predicting conversion from MCI to AD. In conclusion, we find a good representation of brain features from high-dimensional MRI data, which exhibits promising performance for predicting conversion from MCI to AD.

1. Introduction

Alzheimer's disease (AD), characterized by memory loss and cognitive decline, is the most prevalent neurodegenerative disease [1, 2]. Mild cognitive impairment (MCI) is regarded as the prodromal stage of AD, with a high possibility of developing into AD. Individuals with MCI can carry out daily activities, but their thinking abilities show mild and measurable changes [3]. On average, 32 percent of individuals with MCI will convert to AD within 5 years [4]. Therefore, it is critical to identify MCI as early as possible, so that the progress of AD can be delayed through well-targeted treatment. The development of neuroimaging techniques provides powerful tools for the early prediction of AD. Structural magnetic resonance imaging (MRI), with its high spatial resolution, high availability, noninvasive nature, and moderate cost, is an extensively used neuroimaging modality. Numerous structural MRI-based biomarkers have been extracted for AD detection at different stages [5–13]. For instance, in [6], spatial frequency components of cortical thickness were used for individual AD identification based on incremental learning. In [13], an individual network was constructed using six types of morphological features to improve the accuracy of AD and MCI diagnoses. However, since the pathological variations are subtle at the MCI stage, it is still challenging to develop more advanced biomarkers to accurately predict the conversion from MCI to AD.

According to whether or not the MCI subjects convert to AD within a given time period (e.g., 3 years), they are separated into two categories: progressive MCI (pMCI) and stable MCI (sMCI). Previous studies [14, 15] have shown that subjects with pMCI are similar to AD subjects while subjects with sMCI are more like normal controls (NC). As a result, the classification between AD and NC is a simpler version of that between pMCI and sMCI. Due to the high heterogeneity of the MCI population, it is effective to take advantage of AD and NC information in MCI conversion prediction, such as in feature selection and classifier training. Studies [14–22] have also demonstrated that the information of AD and NC subjects is helpful in distinguishing pMCI subjects from sMCI subjects. In [16, 17], the data of AD and NC subjects were used to build classifiers for the discrimination between pMCI and sMCI subjects. In [18–20], the AD and NC subjects were regarded as labeled samples while MCI subjects were taken as unlabeled samples, and a semisupervised learning approach was applied to divide MCI subjects into normal-like and AD-like categories. In [14], to distinguish pMCI from sMCI, a semisupervised low-density separation (LDS) method was used to integrate AD and NC information. In [21], a novel domain transfer learning method drawing support from AD and NC subjects was used for MCI conversion prediction. Besides, some studies extracted novel biomarkers for MCI conversion prediction by propagating information from AD and NC subjects to MCI subjects. For instance, in [22], the information was propagated from AD and NC subjects to MCI subjects by a weighting function, and the average grading value was computed for MCI classification. In [15], the disease labels of AD and NC subjects were propagated to MCI subjects using the elastic net technique, and a global grading biomarker was developed.

Owing to the high dimensionality of MRI features, it is difficult to find a good representation of brain features that reveals their subtle pathological variations for MCI conversion prediction [23]. The subspace learning method, as a dimension reduction approach, has become a hot topic in many fields [24–30]. In the field of AD diagnosis, several subspace learning methods, such as canonical correlation analysis (CCA) [31, 32], independent component analysis (ICA) [33, 34], partial least squares (PLS) [35, 36], locality preserving projection (LPP) [37, 38], linear discriminant analysis (LDA) [38, 39], and locally linear embedding (LLE) [23, 40], have demonstrated promising performance. For instance, in [23], multivariate MRI data were transformed into a locally linear space by the LLE algorithm, and the embedded features were used to predict the conversion from MCI to AD. In [34], the risk factors associated with MCI conversion were investigated by combining ICA with a multivariate Cox proportional hazards regression model. In [38], a sparse least squares regression framework with LDA and LPP was proposed for feature selection in AD diagnosis. The experimental results verified that subspace learning methods outperformed feature selection methods. Although many subspace learning methods have been applied to the early detection of AD, it remains challenging to map MRI data into a low-dimensional subspace and find representative brain features for detecting the differences between pMCI and sMCI. In addition, it is interesting to investigate how the AD and NC data can provide auxiliary information in this procedure and enhance the performance of MCI classification.

In this work, we propose a method to extract biomarkers of MCI subjects based on subspace learning for predicting conversion from MCI to AD. Specifically, we first learn two projection matrices to map multivariate MRI data of regional cortical thickness (CT) and cortical volume (CV) into a common label subspace with lower dimensions for AD and NC subjects, where the correlation of multiple variables and the original data structure are kept as much as possible. We then use the projection matrices to map the CT and CV data of the MCI subjects into the common subspace to obtain the CT- and CV-based features for MCI subjects accordingly. After that, we perform self-weighted operation and weighted fusion on the CT- and CV-based features in common subspace and extract the novel biomarkers for MCI subjects.

2. Materials and Method

2.1. Image Data and Preprocessing

Data used in this work are acquired from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database (http://adni.loni.usc.edu/). We use baseline MRI scans (1.5 T, 1.25 mm × 1.25 mm in-plane spatial resolution, 1.2 mm thick slices) of 528 subjects, which include 142 AD subjects, 165 NC subjects, and 221 MCI subjects. Moreover, the 221 MCI subjects comprise 126 pMCI and 95 sMCI subjects. The characteristics of the participants are shown in Table 1.

Table 1.

Characteristics of the subjects.

Variables                       NC             sMCI           pMCI           AD
No. of subjects (male/female)   165 (78/87)    95 (63/32)     126 (73/53)    142 (72/70)
Age                             76.40 ± 5.37   74.94 ± 7.32   73.40 ± 9.25   76.10 ± 7.51
CDR                             0              0.5            0.5            0.5/1
MMSE                            29.19 ± 0.96   27.69 ± 1.73   26.49 ± 1.70   23.20 ± 2.01

CDR: Clinical Dementia Rating; MMSE: Mini-Mental State Examination.

The image preprocessing involved the following steps: motion correction, nonbrain tissue removal, coordinate transformation, gray matter (GM) segmentation, and reconstruction of the GM/white matter boundaries [41–43]. All preprocessing steps were conducted with FreeSurfer v5.3.0 (http://surfer.nmr.mgh.harvard.edu). The reconstruction and segmentation errors were visually checked using the FreeView software and manually corrected. After that, surface inflation and registration were performed, followed by the calculation of cortical thickness and volume measurements [44]. Finally, the images were smoothed with a 30 mm full width at half maximum Gaussian kernel [45]. The images were segmented into 90 regions according to the automated anatomical labeling atlas [46], and then 12 subcortical regions were removed owing to the lack of thickness features. The average cortical thickness and cortical volume of each region were calculated and used as features.

2.2. Method

Schematic representation of our proposed method is provided in Figure 1. The method includes three steps: (1) Taking AD and NC subjects as auxiliary data, we learn two projection matrices. (2) The MCI subjects are mapped into subspace according to the projection matrices. (3) Self-weighted operation and weighted fusion are performed on the features in the subspace, and the biomarkers are extracted.

Figure 1. Schematic representation of the proposed method.

2.2.1. Learning Projection Matrices Using Auxiliary Data

In this subsection, with AD and NC subjects as auxiliary data, we learn two projection matrices to map multivariate structural MRI data of regional cortical thickness and volume into a common label subspace, where the original data structure and the one-to-one correspondence between multiple variables are kept as much as possible. Let $X_{CT} = [x_1^{CT}, x_2^{CT}, \cdots, x_n^{CT}] \in \mathbb{R}^{d \times n}$ and $X_{CV} = [x_1^{CV}, x_2^{CV}, \cdots, x_n^{CV}] \in \mathbb{R}^{d \times n}$ denote the cortical thickness and cortical volume feature matrices, respectively, where $n$ is the number of AD and NC subjects and $d$ is the number of feature dimensions. Let $Y \in \mathbb{R}^{n \times c}$ represent the class indicator matrix with 0-1 encoding, where $c$ is the number of classes. To learn the two projection matrices $U \in \mathbb{R}^{d \times c}$ and $V \in \mathbb{R}^{d \times c}$, the objective function is defined as follows:

$$\min_{U,V} Q(U,V) = \lambda\, l(U,V) + (1-\lambda)\, f(U,V) + \alpha\, g(U,V) + \beta\, r(U,V). \quad (1)$$

The first term l(U, V) is the linear regression from the feature space to the label space, and it guarantees that samples are close to their labels after projection. l(U, V) is expressed as follows:

$$l(U,V) = \|Y - X_{CT}^{T}U\|_F^2 + \|Y - X_{CV}^{T}V\|_F^2. \quad (2)$$

The second term maintains the correlation between the CT features and CV features of the same image. It is well known that different morphological features of the same image reflect the same label information from different views. They should be close to each other after projection. Therefore, f(U, V) is defined as follows:

$$f(U,V) = \|X_{CT}^{T}U - X_{CV}^{T}V\|_F^2. \quad (3)$$

The third term $g(U, V)$ is the graph regularization term, which is used to better exploit the local structural information of the data. We aim to preserve the neighborhood relationship between samples of each single morphological feature. Here, we first introduce the graph regularization term for the cortical thickness feature $X_{CT}$. We define an undirected and symmetric graph $G_{CT} = (V_{CT}, W_{CT})$, where $V_{CT}$ is the collection of samples in $X_{CT}$ and $W_{CT}$ encodes the relations between samples. Each element $w_{ij}^{CT}$ of $W_{CT}$ is defined as follows:

$$w_{ij}^{CT} = \begin{cases} \exp\left(-\dfrac{\|x_i^{CT} - x_j^{CT}\|^2}{2\sigma^2}\right), & \text{if } x_i^{CT} \in N_k(x_j^{CT}),\ i \neq j, \\ 0, & \text{otherwise}, \end{cases} \quad (4)$$

where $N_k(x_j^{CT})$ denotes the $k$-nearest neighbors of $x_j^{CT}$. Let $a_i$ denote the $i$-th column of $U^{T}X_{CT}$; then, the graph regularization term for the cortical thickness data is formulated as follows:

$$g_{CT}(U) = \frac{1}{2}\sum_{i,j=1}^{n}\|a_i - a_j\|_2^2\, w_{ij}^{CT} = \mathrm{tr}\left(U^{T}X_{CT}L_{CT}X_{CT}^{T}U\right), \quad (5)$$

where $L_{CT} = D_{CT} - W_{CT}$ is the graph Laplacian matrix and $D_{CT} \in \mathbb{R}^{n \times n}$ is a diagonal matrix with diagonal elements $D_{ii}^{CT} = \sum_j w_{ij}^{CT}$.
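As a concrete illustration, the heat-kernel weights of Eq. (4) and the Laplacian $L = D - W$ can be built as follows; this is a minimal NumPy sketch on hypothetical toy data (function name and array sizes are illustrative, not from the paper):

```python
import numpy as np

def knn_heat_kernel_laplacian(X, k=3, sigma=1.0):
    """Build the heat-kernel weight matrix of Eq. (4) and the Laplacian L = D - W.

    X: d x n matrix whose columns are samples. The k-NN relation is
    symmetrized so that the resulting graph is undirected, as the paper assumes.
    """
    n = X.shape[1]
    # pairwise squared Euclidean distances ||x_i - x_j||^2 between columns
    sq = ((X[:, :, None] - X[:, None, :]) ** 2).sum(axis=0)
    W = np.zeros((n, n))
    for j in range(n):
        nn = np.argsort(sq[:, j])[1:k + 1]   # k nearest neighbors of x_j, excluding itself
        W[nn, j] = np.exp(-sq[nn, j] / (2.0 * sigma ** 2))
    W = np.maximum(W, W.T)                   # symmetrize the neighborhood relation
    D = np.diag(W.sum(axis=1))               # diagonal degree matrix, D_ii = sum_j w_ij
    return W, D - W

# hypothetical toy thickness data: 4 features, 10 samples
rng = np.random.default_rng(0)
X_ct = rng.standard_normal((4, 10))
W_ct, L_ct = knn_heat_kernel_laplacian(X_ct, k=3)
```

By construction each row of the Laplacian sums to zero, which is what makes the quadratic form in Eq. (5) penalize differences between neighboring projected samples.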

Similarly, for the cortical volume data $X_{CV}$, let $b_i$ denote the $i$-th column of $V^{T}X_{CV}$. The graph regularization term for the volume data is formulated as follows:

$$g_{CV}(V) = \frac{1}{2}\sum_{i,j=1}^{n}\|b_i - b_j\|_2^2\, w_{ij}^{CV} = \mathrm{tr}\left(V^{T}X_{CV}L_{CV}X_{CV}^{T}V\right), \quad (6)$$

where $w_{ij}^{CV}$ and $L_{CV}$ are defined analogously. The full graph regularization term is then given by the following:

$$g(U,V) = g_{CT}(U) + g_{CV}(V) = \mathrm{tr}\left(U^{T}X_{CT}L_{CT}X_{CT}^{T}U\right) + \mathrm{tr}\left(V^{T}X_{CV}L_{CV}X_{CV}^{T}V\right). \quad (7)$$

The last term r(U, V) controls the scale of projection matrices and avoids overfitting:

$$r(U,V) = \|U\|_F^2 + \|V\|_F^2. \quad (8)$$

Besides, λ, α, and β are the three balancing parameters. Based on Equations (2), (3), (7), and (8), we can obtain the final objective function as follows:

$$\begin{aligned} \min_{U,V} Q(U,V) = {} & \lambda\left(\|Y - X_{CT}^{T}U\|_F^2 + \|Y - X_{CV}^{T}V\|_F^2\right) + (1-\lambda)\|X_{CT}^{T}U - X_{CV}^{T}V\|_F^2 \\ & + \alpha\left(\mathrm{tr}\left(U^{T}X_{CT}L_{CT}X_{CT}^{T}U\right) + \mathrm{tr}\left(V^{T}X_{CV}L_{CV}X_{CV}^{T}V\right)\right) + \beta\left(\|U\|_F^2 + \|V\|_F^2\right). \quad (9) \end{aligned}$$
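The four terms of Eq. (9) can also be evaluated directly, which is useful for monitoring convergence of the optimization. A minimal NumPy sketch on hypothetical toy data (all names and sizes are illustrative):

```python
import numpy as np

def objective(U, V, X_ct, X_cv, Y, L_ct, L_cv, lam, alpha, beta):
    """Evaluate Q(U, V) of Eq. (9) term by term."""
    l = np.linalg.norm(Y - X_ct.T @ U, "fro") ** 2 + np.linalg.norm(Y - X_cv.T @ V, "fro") ** 2
    f = np.linalg.norm(X_ct.T @ U - X_cv.T @ V, "fro") ** 2
    g = np.trace(U.T @ X_ct @ L_ct @ X_ct.T @ U) + np.trace(V.T @ X_cv @ L_cv @ X_cv.T @ V)
    r = np.linalg.norm(U, "fro") ** 2 + np.linalg.norm(V, "fro") ** 2
    return lam * l + (1 - lam) * f + alpha * g + beta * r

# toy data: d = 4 features, n = 6 AD/NC subjects, c = 2 classes
rng = np.random.default_rng(0)
d, n, c = 4, 6, 2
X_ct, X_cv = rng.standard_normal((d, n)), rng.standard_normal((d, n))
Y = np.zeros((n, c)); Y[np.arange(n), [0, 0, 0, 1, 1, 1]] = 1.0  # 0-1 class indicator
L0 = np.zeros((n, n))  # zero Laplacians stand in for the k-NN graphs of Eq. (5) here
```

With $U = V = 0$ only the regression term survives, so $Q = 2\lambda\|Y\|_F^2$, which gives a quick sanity check of the implementation.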

2.2.2. Optimization Algorithm

Both U and V are initialized as zero matrices. We then iteratively update each variable while fixing the other. Taking the partial derivative of $Q(U, V)$ with respect to $U$ and setting it to zero, we have the following:

$$\frac{\partial Q(U,V)}{\partial U} = 2\lambda\left(X_{CT}X_{CT}^{T}U - X_{CT}Y\right) + 2(1-\lambda)\left(X_{CT}X_{CT}^{T}U - X_{CT}X_{CV}^{T}V\right) + 2\alpha X_{CT}L_{CT}X_{CT}^{T}U + 2\beta U = 0. \quad (10)$$

We can get the following:

$$U = \left(X_{CT}X_{CT}^{T} + \alpha X_{CT}L_{CT}X_{CT}^{T} + \beta I\right)^{-1}\left(\lambda X_{CT}Y + (1-\lambda)X_{CT}X_{CV}^{T}V\right). \quad (11)$$

Similarly, by fixing U and updating V, we can obtain the following:

$$V = \left(X_{CV}X_{CV}^{T} + \alpha X_{CV}L_{CV}X_{CV}^{T} + \beta I\right)^{-1}\left(\lambda X_{CV}Y + (1-\lambda)X_{CV}X_{CT}^{T}U\right). \quad (12)$$
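The two closed-form updates of Eqs. (11) and (12) can be alternated for a fixed number of sweeps; a minimal NumPy sketch on hypothetical toy data (function name, iteration count, and the zero-Laplacian default are illustrative assumptions, not from the paper):

```python
import numpy as np

def learn_projections(X_ct, X_cv, Y, lam=0.5, alpha=0.1, beta=1.0,
                      L_ct=None, L_cv=None, n_iter=50):
    """Alternate the closed-form updates of Eqs. (11)-(12).

    X_ct, X_cv: d x n feature matrices of the AD/NC auxiliary subjects;
    Y: n x c class indicator matrix; L_ct, L_cv: n x n graph Laplacians
    (zero matrices here stand in for the k-NN Laplacians of Eq. (5)).
    """
    d, n = X_ct.shape
    c = Y.shape[1]
    L_ct = np.zeros((n, n)) if L_ct is None else L_ct
    L_cv = np.zeros((n, n)) if L_cv is None else L_cv
    U, V = np.zeros((d, c)), np.zeros((d, c))   # zero initialization, as in the paper
    # the left-hand matrices of Eqs. (11)-(12) do not change across iterations
    A_ct = X_ct @ X_ct.T + alpha * X_ct @ L_ct @ X_ct.T + beta * np.eye(d)
    A_cv = X_cv @ X_cv.T + alpha * X_cv @ L_cv @ X_cv.T + beta * np.eye(d)
    for _ in range(n_iter):
        U = np.linalg.solve(A_ct, lam * X_ct @ Y + (1 - lam) * X_ct @ X_cv.T @ V)  # Eq. (11)
        V = np.linalg.solve(A_cv, lam * X_cv @ Y + (1 - lam) * X_cv @ X_ct.T @ U)  # Eq. (12)
    return U, V

# hypothetical toy data: 4 features, 6 auxiliary subjects, 2 classes (AD vs. NC)
rng = np.random.default_rng(0)
X_ct, X_cv = rng.standard_normal((4, 6)), rng.standard_normal((4, 6))
Y = np.zeros((6, 2)); Y[np.arange(6), [0, 0, 0, 1, 1, 1]] = 1.0
U, V = learn_projections(X_ct, X_cv, Y)
```

Using `np.linalg.solve` instead of an explicit matrix inverse is a standard numerical choice; the fixed-point reached is the same.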

The procedure of projection matrices learning with auxiliary data is described in Algorithm 1.

Algorithm 1. Projection matrices learning algorithm based on auxiliary data.

2.2.3. Feature Extraction of MCI Subjects

Let $Z_{CT} = [z_1^{CT}, z_2^{CT}, \cdots, z_m^{CT}] \in \mathbb{R}^{d \times m}$ and $Z_{CV} = [z_1^{CV}, z_2^{CV}, \cdots, z_m^{CV}] \in \mathbb{R}^{d \times m}$ denote the cortical thickness and cortical volume feature matrices of the $m$ images of MCI subjects, respectively. The feature representations of MCI subjects in the subspace are denoted by $\mathrm{Fea}_{CT} \in \mathbb{R}^{m \times c}$ and $\mathrm{Fea}_{CV} \in \mathbb{R}^{m \times c}$, which are computed as follows:

$$\mathrm{Fea}_{CT} = Z_{CT}^{T}U, \quad (13)$$
$$\mathrm{Fea}_{CV} = Z_{CV}^{T}V. \quad (14)$$

To make the projected features of pMCI and sMCI subjects more discriminative, as well as to balance the contributions of the features derived from thickness and volume data, we perform a self-weighted operation and weighted fusion on the features in the subspace to obtain the final features. The biomarkers for MCI subjects are thus defined as follows:

$$\mathrm{Fea} = \eta\, \mathrm{Fea}_{CT} \odot |\mathrm{Fea}_{CT}| + (1-\eta)\, \mathrm{Fea}_{CV} \odot |\mathrm{Fea}_{CV}|, \quad (15)$$

where $\eta$ is the weight parameter, $\odot$ denotes the element-wise product, and $|\mathrm{Fea}_{CT}|$ represents the matrix of absolute values of all elements of $\mathrm{Fea}_{CT}$.
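The mapping of Eqs. (13)-(14) and the self-weighted fusion of Eq. (15) amount to a few matrix operations; a minimal NumPy sketch on hypothetical toy data (function name and sizes are illustrative):

```python
import numpy as np

def extract_biomarkers(Z_ct, Z_cv, U, V, eta=0.03):
    """Map MCI data into the subspace (Eqs. (13)-(14)) and fuse them (Eq. (15)).

    Z_ct, Z_cv: d x m MCI feature matrices; U, V: d x c projection matrices.
    The self-weighting multiplies each projected feature by its own magnitude,
    amplifying large responses relative to small ones.
    """
    fea_ct = Z_ct.T @ U                          # Eq. (13)
    fea_cv = Z_cv.T @ V                          # Eq. (14)
    # Eq. (15): element-wise self-weighting, then weighted fusion
    return eta * fea_ct * np.abs(fea_ct) + (1 - eta) * fea_cv * np.abs(fea_cv)

# hypothetical toy data: 4 features, 5 MCI subjects, 2-dimensional subspace
rng = np.random.default_rng(1)
Z_ct, Z_cv = rng.standard_normal((4, 5)), rng.standard_normal((4, 5))
U, V = rng.standard_normal((4, 2)), rng.standard_normal((4, 2))
fea = extract_biomarkers(Z_ct, Z_cv, U, V, eta=0.03)
```

Note that the self-weighting preserves the sign of each projected feature while squaring its magnitude, so discriminative directions are stretched rather than flipped.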

3. Experiments and Results

We first evaluated the performance of the proposed biomarkers by carrying out pairwise classifications with three classifiers, i.e., a decision tree classifier, a support vector machine (SVM) with RBF kernel, and an SVM with linear kernel. To verify the efficacy of the feature reduction, the proposed method was also compared with four commonly used feature reduction methods. Second, we compared the performance of the proposed biomarkers with that of state-of-the-art methods. Third, the effectiveness of learning projection matrices using AD and NC information was validated. Finally, the discrimination ability of the proposed biomarkers was illustrated. To make fair comparisons, we repeated 10-fold cross-validation 20 times and report the average results for each method. The 10-fold cross-validation strategy partitioned all samples into 10 subsets, leaving one subset for testing and the others for training until each of the 10 subsets had been tested. Four measures, including accuracy (ACC), sensitivity (SEN), specificity (SPE), and area under the receiver operating characteristic curve (AUC), were used to comprehensively evaluate the performance of all methods. Moreover, to assess whether the differences between two competing methods were statistically significant, paired t-tests at the 95% confidence level were performed on the classification accuracies of the 20 runs.
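The repeated stratified 10-fold protocol described above can be sketched as follows. This is an illustrative scikit-learn version, not the paper's MATLAB/LIBSVM implementation; it tracks accuracy only, and the function name and toy data are assumptions:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import SVC

def repeated_cv_accuracy(features, labels, n_repeats=20, n_splits=10, seed=0):
    """Average accuracy over n_repeats runs of stratified n_splits-fold CV.

    The paper reports ACC/SEN/SPE/AUC over 20 runs of 10-fold CV; this sketch
    computes accuracy and uses scikit-learn's linear SVM in place of LIBLINEAR.
    """
    accs = []
    for r in range(n_repeats):
        skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed + r)
        fold_accs = []
        for tr, te in skf.split(features, labels):
            clf = SVC(kernel="linear").fit(features[tr], labels[tr])
            fold_accs.append((clf.predict(features[te]) == labels[te]).mean())
        accs.append(np.mean(fold_accs))   # one accuracy per repetition
    return float(np.mean(accs))

# hypothetical, well-separated toy data: two classes of 20 samples each
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.1, (20, 3)), rng.normal(3.0, 0.1, (20, 3))])
y = np.array([0] * 20 + [1] * 20)
acc = repeated_cv_accuracy(X, y, n_repeats=2)
```

The per-repetition accuracies collected in `accs` are exactly the paired samples one would feed to a paired t-test when comparing two biomarkers, as done in the paper.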

We conducted all the experiments under MATLAB R2016b. Specifically, the decision tree classifier was implemented with MATLAB built-in functions. The SVMs with RBF and linear kernels were adopted from the LIBSVM toolbox [47] and the LIBLINEAR toolbox [48], respectively. For the three balancing parameters in Equation (9), λ was tested in the range {0.1, 0.2, ⋯, 0.9}, the parameter α was tested at the logarithmic scale of 10^i with i ∈ {−3, −2, ⋯, 1}, and the parameter β was determined at the logarithmic scale of 10^j with j ∈ {−1, 0, 1}. The number of nearest neighbors k was tested over the set {3, 5, 7, 9, 11, 13, 15}. Besides, the parameter η in Equation (15) was determined in a specific range (η ∈ {q × 10^−2, q × 10^−1}, where q ∈ {1, 2, ⋯, 9}). Note that we also conducted parameter optimization for each compared method so that it reached its best performance.
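For concreteness, the full search grid described above can be enumerated as follows (values copied from the text; in the actual experiments the search would additionally loop over classifiers and cross-validation folds):

```python
import itertools

# grids exactly as stated in the text
lam_grid = [round(0.1 * i, 1) for i in range(1, 10)]      # 0.1, 0.2, ..., 0.9
alpha_grid = [10.0 ** i for i in range(-3, 2)]            # 1e-3, ..., 1e1
beta_grid = [10.0 ** j for j in range(-1, 2)]             # 0.1, 1, 10
k_grid = [3, 5, 7, 9, 11, 13, 15]                         # nearest neighbors
eta_grid = [q * s for s in (1e-2, 1e-1) for q in range(1, 10)]  # q x 10^-2, q x 10^-1

# every (lambda, alpha, beta, k, eta) combination to be evaluated
grid = list(itertools.product(lam_grid, alpha_grid, beta_grid, k_grid, eta_grid))
```

This gives 9 × 5 × 3 × 7 × 18 = 17,010 combinations, which makes clear why the paper reports only the single best setting found (Section 3.2).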

3.1. Evaluation of Classification Performance

In this subsection, we first compared the classification performance of the proposed biomarkers with that of the global grading biomarker in [15], based on three different classifiers, i.e., the decision tree classifier, SVM with RBF kernel, and SVM with linear kernel. In [15], elastic net was used to propagate the information of AD and NC subjects to the target MCI subject, and a global grading biomarker was extracted for each MCI subject. We used the same method as proposed in [15] but calculated the grading biomarkers based on regionwise features. The sparse coding process of the elastic net [49] was implemented via the SPAMS toolbox [50]. Table 2 shows the group classification results of the proposed biomarkers and the global grading biomarker developed in [15], separately for the three classifiers. The classification performance of our proposed biomarkers was significantly better (p < 0.05) than that of the global grading biomarker in [15] under the decision tree classifier and SVM with linear kernel. There was no significant difference in classification performance between the two competing biomarkers using SVM with RBF kernel, although the classification accuracy, sensitivity, specificity, and AUC of the proposed biomarkers were slightly higher. In conclusion, the proposed biomarkers were superior to, or at least as good as, the global grading biomarker in [15] under different classifiers. The proposed biomarkers achieved the highest accuracy of 69.37% when using the SVM classifier with linear kernel.

Table 2.

Classification results of two competing biomarkers using different classifiers.

Feature                            Classifier                          ACC (%)   p value   SEN (%)   SPE (%)   AUC
Global grading biomarker in [15]   Decision tree classifier            64.72     0.0082    69.30     58.56     0.6097
Proposed biomarkers                Decision tree classifier            66.35               70.97     60.33     0.6216
Global grading biomarker in [15]   SVM classifier with RBF kernel      68.22     0.5169    83.35     48.03     0.6790
Proposed biomarkers                SVM classifier with RBF kernel      68.37               83.57     48.06     0.6800
Global grading biomarker in [15]   SVM classifier with linear kernel   67.43     <0.0001   70.04     63.96     0.6981
Proposed biomarkers                SVM classifier with linear kernel   69.37               75.39     61.23     0.6951

As mentioned above, the proposed method can reduce the feature dimensions and extract meaningful biomarkers. To verify its performance on dimensionality reduction, we further compared the proposed method with four commonly used feature reduction methods, i.e., minimum redundancy and maximum relevance (mRMR) [51], t-test, principal component analysis (PCA) [52], and ICA. The mRMR method selects features according to the minimum redundancy and maximum relevance criterion based on mutual information. The t-test is a statistical hypothesis testing technique that has been successfully used for supervised feature selection in neuroimaging studies [53]. Both PCA and ICA are subspace learning methods. PCA captures most of the variance in the data by linearly transforming correlated features into a smaller number of uncorrelated features. ICA separates data into a set of independent and relevant features. We compared the above feature reduction methods using the three aforementioned classifiers. The best number of features for each competing method was found by grid search optimization. As can be seen from Figure 2, the proposed method outperformed the other feature reduction methods with all three classifiers. The proposed method improved the classification accuracy on average by 9.21%, 8.38%, 7.97%, and 6.43% compared to mRMR, t-test, PCA, and ICA, respectively.

Figure 2. Comparison of different feature reduction methods.

3.2. Comparison with State-of-the-Art Methods

In this subsection, we compared the best classification performance of the proposed biomarkers with that of the feature extraction methods presented in [13, 15] on the same dataset. In [13], the MFN features were extracted, and then a two-step feature selection, mRMR followed by SVM-based recursive feature elimination (SVM-RFE) [54], was employed to find the optimal MFN feature subset. Finally, the SVM classifier with RBF kernel was used for pMCI and sMCI classification. In [15], grading biomarkers were calculated using the elastic net technique, and then the SVM classifier with linear kernel was used for classification. To show the validity of our feature extraction strategy, the original morphological features were also added for comparison using the same feature selection strategy and classifier as in [13].

Table 3 summarizes the classification results of all competing methods. It is notable from Table 3 that all feature extraction methods outperformed the method exploiting the original morphological features in terms of ACC, SPE, and AUC, which implies that the extraction of effective features can improve classification performance. By virtue of subspace learning, our proposed method achieved the highest classification accuracy and sensitivity among all competing methods. Specifically, compared with the methods proposed in [13, 15], our method improved the classification accuracy by 3.76% and 1.94% and the sensitivity by 4.76% and 5.35%, respectively. Therefore, it is reasonable to integrate subspace learning into feature extraction, which can enhance the classification power of the features.

Table 3.

Comparison of classification results of all competing methods.

Feature Classifier ACC (%) SEN (%) SPE (%) AUC
Original morphological features SVM classifier with RBF kernel 62.92 71.23 51.72 0.6508
MFN in [13] SVM classifier with RBF kernel 65.61 70.63 58.95 0.6670
Global grading biomarker in [15] SVM classifier with linear kernel 67.43 70.04 63.96 0.6981
Proposed biomarker SVM classifier with linear kernel 69.37 75.39 61.23 0.6951

The best parameter combination found by experiments was λ = 0.1, α = 0.1, β = 10, and η = 0.03. The numbers of nearest neighbors for the cortical thickness and volume data were 11 and 3, respectively. For the classification of pMCI and sMCI, the number of classes was c = 2.

3.3. Effectiveness of Learning Projection Matrices Using AD and NC Information

In this subsection, we examined the effectiveness of learning the projection matrices using AD and NC data. For comparison, we also learned projection matrices from pMCI and sMCI data. The same MCI feature extraction procedure as in Section 2.2 was conducted. Three different classifiers, i.e., the decision tree classifier, SVM with RBF kernel, and SVM with linear kernel, were tested in turn. We also conducted 10-fold cross-validation 20 times to obtain the average results. To be specific, we randomly divided the MCI dataset into 10 subsets and then iteratively left one subset for testing and the remaining 9 subsets for training until each of the 10 subsets had been validated. The two projection matrices were learned from the training subsets, and then all the data of the training and testing subsets were projected from the original space into the subspace by the two projection matrices. At last, the biomarkers were computed according to Equation (15). All the parameters of the competing methods were optimized over the same ranges as in our proposed method.

Table 4 shows the classification results of learning projection matrices from different data. Compared with pMCI and sMCI data, the projection matrices learned from AD and NC data obtained better classification performance regardless of which classifier was used. In particular, compared to learning the projection matrices from pMCI and sMCI data, the proposed method achieved significant improvements of 4.59% in classification accuracy and 7.8% in sensitivity when using the SVM classifier with linear kernel. These results confirmed the efficacy of adopting AD and NC data for subspace learning in our method. Meanwhile, this also validated that the inclusion of AD and NC information is beneficial for the classification between pMCI and sMCI [14, 15, 17, 19, 21, 22, 55].

Table 4.

Classification performance of learning projection matrices using different data.

Data for learning projection matrices   Classifier                          ACC (%)   p value   SEN (%)   SPE (%)   AUC
pMCI and sMCI data                      Decision tree classifier            60.89     <0.0001   67.22     52.27     0.5318
AD and NC data                          Decision tree classifier            66.35               70.97     60.33     0.6216
pMCI and sMCI data                      SVM classifier with RBF kernel      65.17     <0.0001   73.95     53.46     0.6495
AD and NC data                          SVM classifier with RBF kernel      68.37               83.57     48.06     0.6800
pMCI and sMCI data                      SVM classifier with linear kernel   64.78     <0.0001   67.59     60.91     0.6628
AD and NC data                          SVM classifier with linear kernel   69.37               75.39     61.23     0.6951

3.4. Visualization

In this subsection, we illustrated the distributions of MCI samples in the original morphological feature space and the projected subspace, respectively, to visually exhibit the distinguishing ability of different features. For the original morphological features, PCA was applied to convert the original thickness and volume features into uncorrelated features, respectively. Here, we employed the first principal component with the largest variance for each type of morphological feature and displayed the sample distribution in the two-dimensional space. In the original feature space (Figure 3(a)), the distributions of pMCI and sMCI samples overlapped severely and samples in each class were scattered. Thus, the classification performance of the original features was very limited. In contrast, the interclass distance between pMCI and sMCI samples in the subspace is large while the intraclass distance is small (Figure 3(b)). Therefore, the proposed biomarkers derived from morphological features exhibited superiority over their original form; that is, our proposed biomarker extraction method was effective. Moreover, from Figures 3(c) and 3(d), we can see that the differences between pMCI and sMCI along the two subspace dimensions were significant.

Figure 3. Visualization of all MCI samples in the original feature space and the subspace. (a) Distributions of pMCI and sMCI samples in the original morphological feature space. (b) Distributions of pMCI and sMCI samples in the subspace. (c, d) Distribution of subject locations by group along the first and second subspace dimensions, respectively.

4. Discussion

In this work, we presented a novel biomarker extraction method based on subspace learning for the prediction of MCI-to-AD conversion. The developed biomarkers outperformed the competing biomarkers in discriminating between pMCI and sMCI subjects. Moreover, the improvement from the developed biomarkers was not limited to a particular classifier but held across three different classifiers. In summary, this work provides a promising biomarker for the early diagnosis of AD.

4.1. Effectiveness Analysis of the Proposed Method

The good performance of our proposed method can be attributed to three reasons: (1) Effective subspace learning. We have demonstrated that the MCI subjects in the original morphological feature space were high-dimensional and severely overlapped with each other. The subspace learning method therefore mapped the multivariate MRI data of MCI subjects into a common subspace with fewer dimensions, where they were much easier to distinguish. Figure 3 clearly exhibits the efficacy of this space transformation. (2) The information of AD and NC subjects was employed. Compared with MCI subjects, the intraclass distances are small while the interclass distances are large for AD and NC subjects. Thus, it is easier to keep the neighborhood relationship between intraclass samples in subspace learning using AD and NC data. In addition, using AD and NC subjects instead of MCI subjects during subspace learning avoids the double-dipping problem [56] in the classification of sMCI and pMCI. Therefore, it is reasonable to learn the projection matrices for MCI data from AD and NC data, which was verified by the results in Table 4. (3) The self-weighted operation and weighted fusion were conducted. According to the projection matrices learned from AD and NC data, we mapped the thickness and volume data of MCI subjects into a common subspace and obtained their feature representations, i.e., $\mathrm{Fea}_{CT}$ and $\mathrm{Fea}_{CV}$. We then conducted the self-weighted operation on $\mathrm{Fea}_{CT}$ and $\mathrm{Fea}_{CV}$ to further amplify the differences between pMCI and sMCI. Although cortical thickness and volume provide complementary information for the discrimination between pMCI and sMCI, their effects on classification are imbalanced; the more discriminative the morphological features are, the larger weights they should possess. Thus, we performed weighted fusion on the thickness- and volume-based features to obtain the final biomarkers. The results in Section 3 imply the effectiveness of the extracted biomarkers.

4.2. Influence of the Number of Auxiliary Data on Classification Accuracy

To study the influence of the number of auxiliary data on classification accuracy, we first used different numbers of auxiliary data to calculate the grading biomarker in [15] and the proposed biomarker, respectively, and then compared their performance using the SVM classifier with linear kernel. The number of auxiliary data varied from 50 to 250 with an increment of 50. For each specific number, we resampled the AD and NC subjects with a proportion of 1 : 1 ten times and calculated the average classification accuracy to avoid sampling bias. The same 10-fold cross-validation and parameter optimization procedure as in Section 3 was conducted in the classification. The classification accuracies of the two competing biomarkers with respect to different numbers of auxiliary data are illustrated in Figure 4. For comparison, we also plotted the classification accuracies of biomarkers computed from all auxiliary data. As shown in Figure 4, the classification performance of both methods improves gradually as the number of auxiliary data increases, which verifies that the number of auxiliary data has an impact on the classification performance of the biomarkers. In addition, the proposed biomarker outperforms the grading biomarker in [15] for all numbers of auxiliary data, which confirms the effectiveness of our proposed method.

Figure 4. Classification accuracy using different numbers of auxiliary data.

4.3. Limitations

There are several limitations that should be addressed in future work. Firstly, in our work, CCA was adopted to maintain the correlation between the thickness features and volume features of the same image, and the graph regularization term was used to preserve the neighborhood relationship of samples in the subspace. However, other subspace learning methods, such as ICA, LDA, and LLE, should be further explored and validated for biomarker extraction. Secondly, to map the MCI subjects into the subspace, we learned the projection matrices using only the information of AD and NC subjects. It remains to be explored whether the performance can be improved by integrating the information of AD, NC, and MCI subjects during the projection matrix learning process. Thirdly, the proposed method used only a limited set of morphological features, i.e., thickness and volume. In fact, different morphological features can reflect abnormal alterations of the brain from different perspectives, so they may provide complementary information for the early recognition of disease. More morphological measures such as surface area [57], gyrus height [58], and local gyrification index [59] could be adopted to improve the classification performance.

5. Conclusion

In this paper, we developed novel biomarkers based on subspace learning and the integration of information from AD and NC subjects, yielding a good feature representation of high-dimensional MRI data for predicting conversion from MCI to AD. The extracted biomarkers exhibited promising performance in discriminating pMCI from sMCI, which validates the effectiveness of our proposed method. In addition, the experimental results showed that subspace learning is an effective approach for finding satisfactory biomarkers and that integrating the information of AD and NC subjects is beneficial for predicting MCI-to-AD conversion.

Acknowledgments

This work was supported in part by the National Key Research and Development Program of China (Grant No. 2019YFA0706200), in part by the National Natural Science Foundation of China (Grant Nos. 61632014 and 61627808), in part by the Key Research and Development Program of Shandong Province (Grant No. 2019GGX101056), and in part by the Natural Science Foundation of Shandong Province (Grant No. ZR2019MG022). Data collection and sharing for this project were funded by the Alzheimer's Disease Neuroimaging Initiative (ADNI) (National Institutes of Health, Grant U01 AG024904) and DOD ADNI (Department of Defense, award number W81XWH-12-2-0012).

Contributor Information

Huaxiang Zhang, Email: huaxzhang@163.com.

Bin Hu, Email: bh@lzu.edu.cn.

Data Availability

The data comes from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database (http://adni.loni.usc.edu/).

Conflicts of Interest

The authors declare no competing interests.

References

  • 1.Prince M. J., Wimo A., Guerchet M. M., Ali G. C., Wu Y. T., Prina M. World Alzheimer Report 2015-The global impact of dementia: an analysis of prevalence, incidence, cost and trends. 2015
  • 2.Brookmeyer R., Johnson E., Ziegler‐Graham K., Arrighi H. M. Forecasting the global burden of Alzheimer’s disease. Alzheimer's & Dementia. 2007;3(3):186–191. doi: 10.1016/j.jalz.2007.04.381. [DOI] [PubMed] [Google Scholar]
  • 3.Morris J. C., Cummings J. Mild cognitive impairment (MCI) represents early-stage Alzheimer’s disease. Journal of Alzheimer's Disease. 2005;7(3):235–239. doi: 10.3233/JAD-2005-7306. [DOI] [PubMed] [Google Scholar]
  • 4.Ward A., Tardiff S., Dye C., Arrighi H. M. Rate of conversion from prodromal Alzheimer’s disease to Alzheimer’s dementia: a systematic review of the literature. Dementia & Geriatric Cognitive Disorders Extra. 2013;3(1):320–332. doi: 10.1159/000354370. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 5.Westman E., Simmons A., Zhang Y., et al. Multivariate analysis of MRI data for Alzheimer's disease, mild cognitive impairment and healthy controls. NeuroImage. 2011;54(2):1178–1187. doi: 10.1016/j.neuroimage.2010.08.044. [DOI] [PubMed] [Google Scholar]
  • 6.Cho Y., Seong J. K., Jeong Y., Shin S. Y., Alzheimer's Disease Neuroimaging Initiative Individual subject classification for Alzheimer's disease based on incremental learning using a spatial frequency representation of cortical thickness data. NeuroImage. 2012;59(3):2217–2230. doi: 10.1016/j.neuroimage.2011.09.085. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7.Coupé P., Eskildsen S. F., Manjón J. V., Fonov V. S., Collins D. L., Alzheimer's Disease Neuroimaging Initiative Simultaneous segmentation and grading of anatomical structures for patient's classification: Application to Alzheimer's disease. NeuroImage. 2012;59(4):3736–3747. doi: 10.1016/j.neuroimage.2011.10.080. [DOI] [PubMed] [Google Scholar]
  • 8.Guerrero R., Wolz R., Rao A. W., Rueckert D. Manifold population modeling as a neuro-imaging biomarker: application to ADNI and ADNI-GO. NeuroImage. 2014;94(6):275–286. doi: 10.1016/j.neuroimage.2014.03.036. [DOI] [PubMed] [Google Scholar]
  • 9.Zheng W., Yao Z., Hu B., Gao X., Cai H., Moore P. Novel cortical thickness pattern for accurate detection of Alzheimer’s disease. Journal of Alzheimer's Disease. 2015;48(4):995–1008. doi: 10.3233/JAD-150311. [DOI] [PubMed] [Google Scholar]
  • 10.Thung K. H., Wee C. Y., Yap P. T., Shen D. Identification of progressive mild cognitive impairment patients using incomplete longitudinal MRI scans. Brain Structure & Function. 2016;221(8):3979–3995. doi: 10.1007/s00429-015-1140-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 11.Rathore S., Habes M., Iftikhar M. A., Shacklett A., Davatzikos C. A review on neuroimaging-based classification studies and associated feature extraction methods for Alzheimer's disease and its prodromal stages. NeuroImage. 2017;155:530–548. doi: 10.1016/j.neuroimage.2017.03.057. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 12.Kim H. J., Shin J. H., Han C. E., et al. Using individualized brain network for analyzing structural covariance of the cerebral cortex in Alzheimer’s patients. Frontiers in Neuroscience. 2016;10:p. 394. doi: 10.3389/fnins.2016.00394. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 13.Zheng W., Yao Z., Xie Y., Fan J., Hu B. Identification of Alzheimer's Disease and Mild Cognitive Impairment Using Networks Constructed Based on Multiple Morphological Brain Features. Biological Psychiatry: Cognitive Neuroscience and Neuroimaging. 2018;3(10):887–897. doi: 10.1016/j.bpsc.2018.06.004. [DOI] [PubMed] [Google Scholar]
  • 14.Moradi E., Pepe A., Gaser C., Huttunen H., Tohka J., Alzheimer's Disease Neuroimaging Initiative Machine learning framework for early MRI-based Alzheimer's conversion prediction in MCI subjects. NeuroImage. 2015;104:398–412. doi: 10.1016/j.neuroimage.2014.10.002. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15.Tong T., Gao Q., Guerrero R., et al. A novel grading biomarker for the prediction of conversion from mild cognitive impairment to Alzheimer’s disease. IEEE Transactions on Biomedical Engineering. 2017;64(1):155–165. doi: 10.1109/TBME.2016.2549363. [DOI] [PubMed] [Google Scholar]
  • 16.Fan Y., Batmanghelich N., Clark C. M., Davatzikos C., Alzheimer’s Disease Neuroimaging Initiative Spatial patterns of brain atrophy in MCI patients, identified via high-dimensional pattern classification, predict subsequent cognitive decline. NeuroImage. 2008;39(4):1731–1743. doi: 10.1016/j.neuroimage.2007.10.031. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 17.Young J., Modat M., Cardoso M. J., et al. Accurate multimodal probabilistic prediction of conversion to Alzheimer’s disease in patients with mild cognitive impairment. NeuroImage Clinical. 2013;2(1):735–745. doi: 10.1016/j.nicl.2013.05.004. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 18.Ye D. H., Pohl K. M., Davatzikos C. Semi-supervised pattern classification: application to structural MRI of Alzheimer’s disease. 2011 International Workshop on Pattern Recognition in NeuroImaging; 2011; Seoul, Korea (South). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 19.Filipovych R., Davatzikos C. Semi-supervised pattern classification of medical images: application to mild cognitive impairment (MCI) NeuroImage. 2011;55(3):1109–1119. doi: 10.1016/j.neuroimage.2010.12.066. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 20.Batmanghelich K. N., Dong H. Y., Pohl K. M., Taskar B., Davatzikos C. Disease classification and prediction via semi-supervised dimensionality reduction. 2011 IEEE International Symposium on Biomedical Imaging: From Nano to Macro; 2011; Chicago, IL, USA. pp. 1086–1090. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 21.Cheng B., Liu M., Zhang D., Munsell B. C., Shen D. Domain transfer learning for MCI conversion prediction. IEEE Transactions on Biomedical Engineering. 2015;62(7):1805–1817. doi: 10.1109/TBME.2015.2404809. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 22.Coupé P., Eskildsen S. F., Manjón J. V., et al. Scoring by nonlocal image patch estimator for early detection of Alzheimer’s disease. Neuroimage Clinical. 2012;1(1):141–152. doi: 10.1016/j.nicl.2012.10.002. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 23.Liu X., Tosun D., Weiner M. W., Schuff N., Alzheimer's Disease Neuroimaging Initiative Locally linear embedding (LLE) for MRI based Alzheimer's disease classification. NeuroImage. 2013;83:148–157. doi: 10.1016/j.neuroimage.2013.06.033. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 24.Yan J., Zhang H., Sun J., et al. Joint graph regularization based modality-dependent cross-media retrieval. Multimedia Tools & Applications. 2017;77:3009–3027. [Google Scholar]
  • 25.Zhang M., Zhang H., Li J., Wang L., Fang Y., Sun J. Supervised graph regularization based cross media retrieval with intra and inter-class correlation. Journal of Visual Communication and Image Representation. 2019;58:1–11. [Google Scholar]
  • 26.Zhang M., Zhang H., Li J., Fang Y., Wang L., Shang F. Multi-modal graph regularization based class center discriminant analysis for cross modal retrieval. Multimedia Tools and Applications. 2019;78(19):28285–28307. doi: 10.1007/s11042-019-07909-2. [DOI] [Google Scholar]
  • 27.Ji Z., Li S., Pang Y. Fusion-attention network for person search with free-form natural language. Pattern Recognition Letters. 2018;116:205–211. doi: 10.1016/j.patrec.2018.10.020. [DOI] [Google Scholar]
  • 28.Zhan L., Zhou J., Wang Y., et al. Comparison of nine tractography algorithms for detecting abnormal structural brain networks in Alzheimer’s disease. Frontiers in Aging Neuroscience. 2015;7:p. 48. doi: 10.3389/fnagi.2015.00048. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 29.Sui J., Adali T., Yu Q., Chen J., Calhoun V. D. A review of multivariate methods for multimodal fusion of brain imaging data. Journal of Neuroscience Methods. 2012;204(1):68–81. doi: 10.1016/j.jneumeth.2011.10.031. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 30.Liang C., Yu S., Luo J. Adaptive multi-view multi-label learning for identifying disease-associated candidate miRNAs. PLoS Computational Biology. 2019;15(4):p. e1006931. doi: 10.1371/journal.pcbi.1006931. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 31.Hardoon D. R., Szedmak S., Shawe-Taylor J. Canonical correlation analysis: an overview with application to learning methods. Neural Computation. 2004;16(12):2639–2664. doi: 10.1162/0899766042321814. [DOI] [PubMed] [Google Scholar]
  • 32.Zhu X., Huang Z., Shen H. T., Cheng J., Xu C. Dimensionality reduction by mixed kernel canonical correlation analysis. Pattern Recognition. 2012;45(8):3003–3016. doi: 10.1016/j.patcog.2012.02.007. [DOI] [Google Scholar]
  • 33.Prasad P. S. Independent Component Analysis. Cambridge University Press; 2001. [Google Scholar]
  • 34.Liu K., Chen K., Yao L., Guo X. Prediction of mild cognitive impairment conversion using a combination of independent component analysis and the Cox model. Frontiers in Human Neuroscience. 2017;11:p. 33. doi: 10.3389/fnhum.2017.00033. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 35.Wold H. Partial least squares. Encyclopedia of Statistical Sciences; 2006. [Google Scholar]
  • 36.Zhu X., Li X., Zhang S. Block-row sparse multiview multilabel learning for image classification. IEEE Transactions on Cybernetics. 2016;46(2):450–461. doi: 10.1109/TCYB.2015.2403356. [DOI] [PubMed] [Google Scholar]
  • 37.He X., Cai D., Niyogi P. Laplacian score for feature selection. Advances in Neural Information Processing Systems. 2005;18:507–514. [Google Scholar]
  • 38.Zhu X., Suk H. I., Lee S. W., Shen D. Subspace regularized sparse multitask learning for multiclass neurodegenerative disease identification. IEEE Transactions on Biomedical Engineering. 2016;63(3):607–618. doi: 10.1109/TBME.2015.2466616. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 39.Fisher R. A. The use of multiple measurements in taxonomic problems. Annals of Eugenics. 1936;7(2):179–188. doi: 10.1111/j.1469-1809.1936.tb02137.x. [DOI] [Google Scholar]
  • 40.Roweis S. T., Saul L. K. Nonlinear dimensionality reduction by locally linear embedding. Science. 2000;290(5500):2323–2326. doi: 10.1126/science.290.5500.2323. [DOI] [PubMed] [Google Scholar]
  • 41.Fischl B., Sereno M. I., Dale A. M. Cortical Surface-Based Analysis: II: Inflation, Flattening, and a Surface- Based Coordinate System. NeuroImage. 1999;9(2):195–207. doi: 10.1006/nimg.1998.0396. [DOI] [PubMed] [Google Scholar]
  • 42.Dale A. M., Fischl B., Sereno M. I. Cortical Surface-Based Analysis: I. Segmentation and Surface Reconstruction. NeuroImage. 1999;9(2):179–194. doi: 10.1006/nimg.1998.0395. [DOI] [PubMed] [Google Scholar]
  • 43.Fischl B., Liu A., Dale A. M. Automated manifold surgery: constructing geometrically accurate and topologically correct models of the human cerebral cortex. IEEE Transactions on Medical Imaging. 2001;20(1):70–80. doi: 10.1109/42.906426. [DOI] [PubMed] [Google Scholar]
  • 44.Fischl B., Dale A. M. Measuring the thickness of the human cerebral cortex from magnetic resonance images. Proceedings of the National Academy of Sciences of the United States of America. 2000;97(20):11050–11055. doi: 10.1073/pnas.200033797. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 45.Hogstrom L. J., Westlye L. T., Walhovd K. B., Fjell A. M. The structure of the cerebral cortex across adult life: age-related patterns of surface area, thickness, and gyrification. Cerebral Cortex. 2013;23(11):2521–2530. doi: 10.1093/cercor/bhs231. [DOI] [PubMed] [Google Scholar]
  • 46.Tzourio-Mazoyer N., Landeau B., Papathanassiou D., et al. Automated anatomical labeling of activations in SPM using a macroscopic anatomical parcellation of the MNI MRI single-subject brain. NeuroImage. 2002;15(1):273–289. doi: 10.1006/nimg.2001.0978. [DOI] [PubMed] [Google Scholar]
  • 47.Chang C.-C., Lin C.-J. LIBSVM: a library for support vector machines. ACM Transactions on Intelligent Systems Technology. 2011;2(3):p. 27. [Google Scholar]
  • 48.Fan R. E., Chang K. W., Hsieh C. J., Wang X. R., Lin C. J. LIBLINEAR: a library for large linear classification. The Journal of machine Learning research. 2008;9(9):1871–1874. [Google Scholar]
  • 49.Hui Z., Hastie T. Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society. 2005;67(5):768–768. [Google Scholar]
  • 50.Mairal J., Bach F., Ponce J., Sapiro G. Online dictionary learning for sparse coding. Proceedings of the 26th Annual International Conference on Machine Learning; 2009; Montreal, Quebec, Canada. pp. 689–696. [Google Scholar]
  • 51.Peng H., Long F., Ding C. Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy. IEEE Transactions on Pattern Analysis & Machine Intelligence. 2005;27(8):1226–1238. doi: 10.1109/TPAMI.2005.159. [DOI] [PubMed] [Google Scholar]
  • 52.Jolliffe I. T. Principal component analysis. Journal of Marketing Research. 2002;87(4):p. 513. [Google Scholar]
  • 53.Mwangi B., Tian S. T., Soares J. C. A review of feature reduction techniques in neuroimaging. Neuroinformatics. 2014;12(2):229–244. doi: 10.1007/s12021-013-9204-3. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 54.Guyon I., Weston J., Barnhill S., Vapnik V. Gene selection for cancer classification using support vector machines. Machine Learning. 2002;46(1/3):389–422. doi: 10.1023/A:1012487302797. [DOI] [Google Scholar]
  • 55.Thung K. H., Wee C. Y., Yap P. T., Shen D. Neurodegenerative disease diagnosis using incomplete multi-modality data via matrix shrinkage and completion. NeuroImage. 2014;91(2):386–400. doi: 10.1016/j.neuroimage.2014.01.033. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 56.Kriegeskorte N., Simmons W. K., Bellgowan P. S., Baker C. I. Circular analysis in systems neuroscience: the dangers of double dipping. Nature Neuroscience. 2009;12(5):535–540. doi: 10.1038/nn.2303. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 57.Rogers J., Kochunov P., Lancaster J., et al. Heritability of brain volume, surface area and shape: an MRI study in an extended pedigree of baboons. Human Brain Mapping. 2007;28(6):576–583. doi: 10.1002/hbm.20407. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 58.Destrieux C., Fischl B., Dale A., Halgren E. Automatic parcellation of human cortical gyri and sulci using standard anatomical nomenclature. NeuroImage. 2010;53(1):1–15. doi: 10.1016/j.neuroimage.2010.06.010. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 59.Schaer M., Cuadra M. B., Tamarit L., Lazeyras F., Eliez S., Thiran J. P. A surface-based approach to quantify local cortical gyrification. IEEE Transactions on Medical Imaging. 2008;27(2):161–170. doi: 10.1109/TMI.2007.903576. [DOI] [PubMed] [Google Scholar]



Articles from BioMed Research International are provided here courtesy of Wiley
