Human Brain Mapping. 2017 Mar 27;38(6):3081–3097. doi: 10.1002/hbm.23575

Multi‐task diagnosis for autism spectrum disorders using multi‐modality features: A multi‐center study

Jun Wang 1,2, Qian Wang 3, Jialin Peng 2, Dong Nie 2, Feng Zhao 2, Minjeong Kim 2, Han Zhang 2, Chong‐Yaw Wee 4, Shitong Wang 1, Dinggang Shen 2,5
PMCID: PMC5427005  NIHMSID: NIHMS859091  PMID: 28345269

Abstract

Autism spectrum disorder (ASD) is a neurodevelopmental disorder characterized by impairments of social interaction, language, behavior, and cognitive functions. To date, many imaging‐based methods for ASD diagnosis have been developed. For example, one may extract abundant features from multi‐modality images and then derive a discriminant function to map the selected features toward the disease label. Many recent works, however, are limited to a single imaging center. To this end, we propose a novel multi‐modality multi‐center classification (M3CC) method for ASD diagnosis. We treat the classification for each imaging center as one task. By introducing task‐task and modality‐modality regularizations, we solve the classification for all imaging centers simultaneously. Meanwhile, the optimal feature selection and the modeling of the discriminant functions are jointly conducted for highly accurate diagnosis. We also present an efficient iterative optimization solution to our formulated problem and further investigate its convergence. Our comprehensive experiments on the ABIDE database show that the proposed method significantly improves the performance of ASD diagnosis compared to existing methods. Hum Brain Mapp 38:3081–3097, 2017. © 2017 Wiley Periodicals, Inc.

Keywords: multitask learning, multi‐modality data, feature selection, task‐task relation, modality‐modality relation, autism spectrum disorders

INTRODUCTION

Autism spectrum disorder (ASD) is characterized as a syndrome of poor social communication abilities in combination with repetitive behaviors or restricted interests [Anagnostou and Taylor, 2011]. According to data released by the Centers for Disease Control and Prevention, 1 out of 68 American children is affected by some form of ASD. The prevalence has increased by 78% compared to one decade ago, which makes ASD an important health issue and a financial burden for both families and society. Thus, it is urgent to develop methods for precise early diagnosis of ASD.

The diagnosis of ASD is traditionally behavior‐based. Specifically, ASD is identified by comparing an individual's abnormal behaviors with those of normally developing children of the same age [Lord and Jones, 2012]. As many behavioral phenotypes are associated with psychological and psychiatric disorders other than ASD, behavior‐based diagnosis is not always effective. Nowadays, many neuroimaging tools have been widely applied to ASD diagnosis [Anagnostou and Taylor, 2011; Jin et al., 2015; Wang et al., 2016; Wee et al., 2014]. Among them, structural magnetic resonance imaging (MRI) is commonly used. Because of its powerful capability of detecting neuroanatomical abnormalities and capturing regional morphological features, structural MRI provides a feasible alternative to the traditional behavior‐based ASD diagnosis. Moreover, ASD is found to be associated with disrupted brain connectivity patterns [Shi et al., 2013]; thus, resting‐state functional MRI (rs‐fMRI) can reveal brain networks of functional activity and contribute to understanding the pathophysiology of the disease [Van Den Heuvel and Pol, 2010]. In general, MRI has become an important tool in clinical ASD diagnosis.

Structural MRI and rs‐fMRI provide different views of the same brain. It is necessary to fuse the two modalities to discover hidden evidence of ASD that may not be available from a single imaging modality. Therefore, in this article we aim to combine these two modalities effectively and build multi‐modality classifiers for ASD diagnosis. Previous studies have indicated that multi‐modality classifiers can achieve better performance than single‐modality ones in diagnosing degenerative neural disorders [Fan et al., 2008; Hinrichs et al., 2011; Liu et al., 2014; Vemuri et al., 2009; Walhovd et al., 2010; Zhang et al., 2011]. However, most of them are limited to datasets acquired from a single imaging center, which does not always provide enough data to train an accurate diagnosis classifier. In practice, it is common to acquire images from multiple imaging centers and handle them simultaneously. As different centers may use different scanners and imaging parameters, it is hard to learn a classifier from the imaging data of one center and then directly apply it to another center. Meanwhile, although multi‐center data can be used to train the classifier jointly, the inconsistency across the images of multiple centers is still very challenging to resolve. To this end, it is necessary to develop a novel ASD diagnosis method for multi‐modality, multi‐center imaging data.

Regarding ASD diagnosis, the classifiers are expected to tag disease labels and also to regress clinical scores, given the images of individual subjects. It is also important to identify the disease‐related features that contribute not only to accurate image‐based ASD diagnosis but also to a better understanding of the disease. For multi‐modality multi‐center data, selecting the optimal features is challenging because of the huge number of feature candidates and the large variation across the data of multiple centers. Automated feature selection methods based on sparse learning have been adopted in related studies [Jie et al., 2015; Liu et al., 2015; Zhang and Shen, 2012; Zhang et al., 2015]. For example, Zhang and Shen proposed a multi‐task feature selection method under the framework of M3T [Zhang and Shen, 2012]. Liu et al. proposed a view‐centralized multi‐atlas feature selection method [Liu et al., 2015]. Jie et al. proposed a manifold regularized multi‐task feature learning method, which preserves both the intrinsic distribution of each image modality and the correlation across different modalities [Jie et al., 2015]. However, these methods rely on carefully designed feature selection strategies and suffer from the fact that feature selection and classifier learning are treated separately. In addition, they handle single‐center images only, leaving open the question of how to tackle multi‐center data.

Recently, multi‐task learning has attracted increasing attention in neuroimaging [Jie et al., 2015; Wang et al., 2012, 2016; Zhang and Shen, 2012; Zhu et al., 2017]. The main goal of multi‐task learning is to encode the intrinsic relationship among different tasks, as the tasks under consideration (although seemingly independent) are highly related to each other. To this end, compared to treating the tasks separately, multi‐task learning learns multiple tasks simultaneously and often obtains better results [Argyriou and Evgeniou, 2007; Argyriou et al., 2008; Jiang et al., 2015; Liu et al., 2009; Zhang et al., 2010]. Regarding multi‐task learning in neuroimaging, there are usually two scenarios: (1) each image modality is regarded as the input to a certain task, such that multi‐modality data can be handled by multi‐task learning [Jie et al., 2015]; (2) each response variable is regarded as the output of a certain learning task, such that multiple tasks can be learned altogether [Wang et al., 2012; Zhang and Shen, 2012; Zhu et al., 2017]. However, these works do not consider the important scenario of multi‐center classification for multi‐modality data, which is addressed in this work.

Taking advantage of the open‐access Autism Brain Imaging Data Exchange (ABIDE) database and the multi‐task learning technique, we aim to develop a multi‐modality multi‐center classification (abbreviated as M3CC) framework for ASD diagnosis in this work. Specifically, ASD classification for multiple imaging centers is solved by multi‐task learning, with each task corresponding to one imaging center. A small number of disease‐related features are selected from the large set of image‐based feature candidates under a sparsity constraint, and then used for the classification of the disease. Meanwhile, the inconsistency between multiple modalities/centers is addressed in both feature selection and the modeling of the classifiers, as only the common disease‐related features are preserved.

The contributions of this article can be summarized as follows. (1) A classification method for ASD diagnosis is proposed to handle multi‐modality multi‐center neuroimaging data. With multi‐task learning and sparse feature selection, the task‐task and modality‐modality relations are considered for joint classification of multi‐modality multi‐center images. (2) An efficient iterative optimization scheme is adopted, and the convergence of the proposed method is investigated. (3) Quantitative evaluations are conducted on the multi‐center data of ABIDE, which yield satisfactory outcomes and promising comparisons.

The rest of the article is organized as follows. The Image Acquisition and Pre‐Processing section provides information on the image dataset, including data acquisition and preprocessing. The Method section details the proposed classification method for ASD diagnosis. The Experiments section gives the experimental results. The Conclusions section concludes the article.

IMAGE ACQUISITION AND PRE‐PROCESSING

In this study, we consider both T1‐weighted MRI and rs‐fMRI scans acquired from multiple imaging centers. The classification for the images of a specific center is treated as one task, and multiple tasks are combined to provide multi‐modality multi‐center classification capability. The original images and the demographic data were collected from ABIDE. All subjects included in our study are under 15 years old and were recruited from four imaging centers, that is, NYU, STANFORD, UM_1, and YALE. Note that different centers in ABIDE acquired images with different scanners and imaging parameters. Table 1 summarizes the demographic information and scanning parameters of the data used in this study.

Table 1.

The demographic information and acquisition parameters of multi‐modality multi‐center images used in this article

Center NYU STANFORD UM_1 YALE
Male/female 88/24 32/8 49/16 31/11
Age 11.0484 ± 2.28 9.9581 ± 1.58 11.6667 ± 1.65 11.5276 ± 2.20
Patients/controls 54/58 20/20 34/31 20/22
T1‐weighted MRI
Sequence 3D MPRAGE 3D SPGR 3D SPGR 3D MPRAGE
Make (model) Siemens Magnetom (Allegra) GE (Signa) GE (Signa) Siemens Magnetom (TrioTim)
Voxel size (mm3) 1.3 × 1 × 1.3 0.86 × 1.5 × 0.86 1.2 × 1 × 1 1 × 1 × 1.2
Flip angle (deg) 7 15 15 9
TR (ms) 2530 8.4 500 2300
TE (ms) 3.25 1.8 1.8 2.91
TI (ms) 1100 NA NA 624
Bandwidth (Hz/Px) 200 NA 15.63 240
rs‐fMRI
Make (model) Siemens Magnetom (Allegra) GE (Signa) GE (Signa) Siemens Magnetom (TrioTim)
Voxel size (mm3) 3.0 × 3.0 × 4.0 3.125 × 3.125 × 4.5 3.438 × 3.438 × 3.0 3.4 × 3.4 × 4.0
Flip angle (deg) 90 80 90 60
TR (ms) 2000 2000 2000 2000
TE (ms) 15 30 30 25
Bandwidth (Hz/Px) 3906 NA NA 2520

Figure 1 shows the preprocessing of the multi‐modality data. Regional morphological features were extracted from the T1‐weighted MRI in an automated manner using the standard FreeSurfer pipeline, which performs volumetric segmentation and cortical surface reconstruction. We used multiple atlases with different sets of regions of interest (ROIs) to extract abundant features. In particular, the cerebral cortical gray matter (GM) volumes, subcortical white matter (WM) volumes, and mean cortical thickness measures were extracted for the ROIs of the Desikan‐Killiany cortical atlas [Desikan et al., 2006]. The subcortical structure volumes (SSV) were also extracted with the subcortical structural atlas in FreeSurfer [Fischl et al., 2002]. In addition, the volumes and thickness measures of the Brodmann areas (including BA1, BA2, BA3a, BA3b, BA4a, BA4p, BA6, BA44, BA45, V1, V2, MT, and entorhinal_exvivo) were extracted for both hemispheres. In total, there were 303 regional morphological features for each T1 MR image in our study.

Figure 1. Preprocessing of multi‐modality imaging data, including T1‐weighted MRI and rs‐fMRI.

We further extracted features from rs‐fMRI using the standard pipeline provided by ABIDE with AFNI (https://afni.nimh.nih.gov/afni/). Specifically, the first 10 acquired rs‐fMRI volumes of each subject were discarded before any further processing. Then, slice timing and head motion correction were performed. All rs‐fMRI images were normalized to the MNI space at a resolution of 3 × 3 × 3 mm3. Nuisance variable regression was further conducted [Friston et al., 1996]. The resulting rs‐fMRI images were parcellated into 116 ROIs according to the Automated Anatomical Labeling (AAL) template [Tzourio‐Mazoyer et al., 2002]. Band‐pass filtering (0.005–0.1 Hz) was applied to the rs‐fMRI time series of each ROI. After that, scrubbing was performed: the volumes at time points with displacement equal to or larger than 0.5 mm were removed, along with the two volumes before and the one volume after each time point with excessive motion. Finally, the subjects with fewer than three volumes left after scrubbing were excluded. To measure functional connectivity between ROIs, pairwise Pearson correlation coefficients were computed, yielding values between −1 and 1 for the individual ROI pairs under consideration. This processing resulted in a 116 × 116 correlation matrix for each subject. As the matrix is symmetric, we treat the correlation measures in the upper triangle of the matrix as the inter‐regional features.
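As a concrete illustration of this last step, the sketch below (Python/NumPy; the time‐series length and random data are toy assumptions, and this is not the authors' AFNI pipeline) computes the ROI‐wise Pearson correlation matrix and vectorizes its upper triangle into the 6,670 inter‐regional features:

```python
import numpy as np

def fc_features(ts):
    """Functional-connectivity features from a (timepoints x ROIs) series."""
    corr = np.corrcoef(ts.T)                 # 116 x 116 Pearson correlations, in [-1, 1]
    iu = np.triu_indices_from(corr, k=1)     # upper triangle, diagonal excluded
    return corr[iu]                          # 116 * 115 / 2 = 6670 features

ts = np.random.randn(150, 116)               # toy parcellated time series (post-scrubbing)
assert fc_features(ts).size == 6670
```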

METHOD

Figure 2 shows the schematic diagram of the proposed method. The regional morphological features and the inter‐regional functional features are extracted from T1‐weighted MRI and rs‐fMRI, respectively. In this way, multi‐modality representations can be generated for the subjects coming from multiple centers. Such representations of the subjects are fed into the proposed M3CC method. In the training stage, the task‐task relation and the modality‐modality relation are incorporated to construct the classifiers. Specifically, the classifiers, which correspond to individual tasks of multiple imaging centers, are trained jointly. At the same time, the common disease‐related features are selected across different imaging centers. In the testing stage, the features of the testing subject from a specific imaging center are obtained and fed into the corresponding classifier, which yields the diagnosis for the testing subject under consideration.

Figure 2. Schematic overview of the proposed framework.

The notations used in this article are summarized as follows. We denote matrices with boldface uppercase letters, vectors with boldface lowercase letters, and scalars with normal italic letters. Specifically, we denote the identity matrix as $\mathbf{I}$, and its $i$-th column vector as $\mathbf{e}_i$, where the $i$-th element is 1 and all others are 0. Obviously, we can obtain the $i$-th row of a matrix $\mathbf{X}$ by the operation $(\mathbf{e}_i)'\mathbf{X}$, and the $j$-th column of $\mathbf{X}$ by the operation $\mathbf{X}\mathbf{e}_j$. We further denote the transpose, the trace, and the inverse of a matrix $\mathbf{X}$ as $\mathbf{X}'$, $\operatorname{tr}(\mathbf{X})$, and $\mathbf{X}^{-1}$, respectively. We also use $\operatorname{vec}(\mathbf{X})$ to stack the columns of $\mathbf{X}$ into a single column vector.
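The following small NumPy snippet illustrates this notation (indices are 0-based in code but 1-based in the text):

```python
import numpy as np

X = np.arange(12.0).reshape(3, 4)
e2 = np.eye(3)[:, 1]               # e_2: the second column of the 3 x 3 identity I
print(e2 @ X)                      # (e_2)' X: the second row of X
print(X @ np.eye(4)[:, 0])         # X e_1: the first column of X
print(X.flatten(order="F"))        # vec(X): the columns of X stacked into one vector
```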

Multi‐Task Learning for Multi‐Center Disease Classification

Assume that there are $T$ centers and thus $T$ supervised learning tasks for the feature data with $M$ modalities. Each task corresponds to the classification for a specific imaging center. Denote $N_t$ as the number of subjects (in the $t$-th center) for the $t$-th task, $D_m$ as the number of features for the $m$-th modality, $\mathbf{X}_t^m \in \mathbb{R}^{N_t \times D_m}$ as the training data matrix of the $t$-th task for the $m$-th modality, and $\mathbf{y}_t=[y_1,y_2,\ldots,y_{N_t}]' \in \mathbb{R}^{N_t}$ as the vector of training labels (e.g., "−1" for patients and "1" for healthy controls). Let $\mathbf{w}_t^m \in \mathbb{R}^{D_m}$ parameterize a linear discriminant function for the $m$-th modality of the $t$-th task.

Our goal is to learn a specific classifier capable of diagnosing subjects from each imaging center, while the learning of all classifiers is conducted jointly in a multi‐task manner. Traditionally, the classifiers of different centers are learned separately; that is, the supervised learning on each center is treated as an independent task. Equation (1) formulates $T$ learning tasks, each of which corresponds to an independent center:

$$\min_{\mathbf{w}_t}\left(\left\|\mathbf{y}_t-\mathbf{X}_t\mathbf{w}_t\right\|_2^2+\gamma\left\|\mathbf{w}_t\right\|_1\right),\quad t=1,2,\ldots,T \tag{1}$$

where $D$ is the number of features, $\mathbf{X}_t \in \mathbb{R}^{N_t \times D}$ denotes the single‐modality data of the $t$-th task, and $\mathbf{w}_t \in \mathbb{R}^{D}$ parameterizes a linear discriminant function of the $t$-th task. In Eq. (1), $\|\mathbf{y}_t-\mathbf{X}_t\mathbf{w}_t\|_2^2$ is the loss function measuring the fitness of the learned model on the training data, $\|\mathbf{w}_t\|_1$ is the $\ell_1$-norm regularization term for feature selection, and $\gamma>0$ is the parameter trading off the loss function against the regularization term. Note that this strategy solves $T$ optimization problems separately and ignores the relations between them.
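A minimal sketch of this single‐task baseline follows (Python with scikit‐learn; the data shapes and alpha value are toy assumptions, and note that sklearn's Lasso scales the squared loss by $1/(2N_t)$, so its alpha corresponds to $\gamma$ only up to that constant):

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
# Four "centers" with different subject counts but the same feature dimension D.
tasks = [(rng.standard_normal((n, 703)), rng.choice([-1.0, 1.0], size=n))
         for n in (112, 40, 65, 42)]

models = [Lasso(alpha=0.1).fit(X_t, y_t) for X_t, y_t in tasks]
# Each model is fit on one center in isolation -- exactly the limitation
# that the joint multi-task formulation below removes.
```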

To fully utilize the relationship among tasks, the classifiers of different tasks should be learned jointly, which can be regarded as a multi‐task learning problem. The multi‐task learning process with sparse feature selection can then be formulated as the following optimization problem:

$$\min_{\mathbf{W}}\left(\sum_{t=1}^{T}\left\|\mathbf{y}_t-\mathbf{X}_t\mathbf{w}_t\right\|_2^2+\gamma\left\|\mathbf{W}\right\|_{2,1}\right) \tag{2}$$

where $\mathbf{W}=[\mathbf{w}_1,\mathbf{w}_2,\ldots,\mathbf{w}_T] \in \mathbb{R}^{D\times T}$. Each column of $\mathbf{W}$ is the coefficient vector for classifying the subjects of a specific task, while each row records the contributions of the same feature toward different tasks. Here, $\|\mathbf{W}\|_{2,1}$ is the $\ell_{2,1}$-norm of the matrix $\mathbf{W}$, that is, the sum of the $\ell_2$-norms of its rows. It ensures that a small number of features are jointly selected for all imaging centers; at the same time, the coefficients are encouraged to be similar across different centers for joint feature selection. The parameter $\gamma$ is a regularization parameter; a larger $\gamma$ leads to fewer features being selected for classification.
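In code, the $\ell_{2,1}$-norm is simply the sum of the row‐wise $\ell_2$-norms; the snippet below (NumPy) shows why penalizing it drops a feature for all tasks at once:

```python
import numpy as np

def l21_norm(W):
    """Sum of the l2-norms of the rows of W (one row per feature)."""
    return np.linalg.norm(W, axis=1).sum()

W = np.array([[0.0, 0.0],      # a zero row: this feature is dropped for all tasks
              [1.0, -2.0],
              [3.0, 4.0]])
print(l21_norm(W))             # 0 + sqrt(5) + 5
```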

Task‐Task Regularization

Although the $\ell_{2,1}$-norm of the matrix $\mathbf{W}$ guides the joint selection of the common features across different tasks, the task‐task relationship might still be easily ignored. It is reasonable to presume that the respective coefficients for two closely related imaging centers are highly correlated. To this end, we devise the task‐task regularization to ensure that, if the data from two centers are closely related, their corresponding coefficient vectors are also similar. As we regard the classification problem of each imaging center as one task, the task‐task relation can be formulated as follows:

$$R_{\text{task-task}}(\mathbf{W})=\frac{1}{2}\sum_{i\neq j}^{T}g_{i,j}\left\|\mathbf{w}_i-\mathbf{w}_j\right\|_2^2 \tag{3}$$

where $g_{i,j}$ is an element of $\mathbf{G} \in \mathbb{R}^{T\times T}$, which encodes the relationship between all pairs of tasks. The task similarity matrix $\mathbf{G}$ can be determined from the training data. In particular, we follow:

$$g_{i,j}=\exp\left(-\left\|\bar{\mathbf{x}}_i-\bar{\mathbf{x}}_j\right\|_2^2/\sigma^2\right) \tag{4}$$

where $\bar{\mathbf{x}}_i$ is the mean vector of the training data in the $i$-th task and $\sigma^2=\sum_{i=1}^{T}\sum_{j=1}^{T}\|\bar{\mathbf{x}}_i-\bar{\mathbf{x}}_j\|_2^2/T^2$.
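A sketch of this construction in NumPy is given below; `means` (the $T$ per‐center mean vectors) is a toy assumption here:

```python
import numpy as np

def task_similarity(means):
    """Gaussian task-similarity matrix G of Eq. (4) from per-center means (T x D)."""
    diffs = means[:, None, :] - means[None, :, :]   # (T, T, D) pairwise differences
    sq = np.sum(diffs ** 2, axis=-1)                # ||x_bar_i - x_bar_j||_2^2
    sigma2 = sq.sum() / means.shape[0] ** 2         # sigma^2 as defined above
    return np.exp(-sq / sigma2)                     # g_{i,j}

G = task_similarity(np.random.randn(4, 703))        # T = 4 centers, toy means
```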

With the task‐task regularization added, the classifiers for different imaging centers can be learned simultaneously, as the modeling of one classifier may contribute to better performance of the others. That is, by incorporating the task‐task relation into Eq. (2), we have the following model:

$$\min_{\mathbf{W}}\left(\sum_{t=1}^{T}\left\|\mathbf{y}_t-\mathbf{X}_t\mathbf{w}_t\right\|_2^2+\gamma\left\|\mathbf{W}\right\|_{2,1}+\eta\sum_{i\neq j}^{T}g_{i,j}\left\|\mathbf{w}_i-\mathbf{w}_j\right\|_2^2\right) \tag{5}$$

Modality‐Modality Regularization

By further considering the multi‐modality setting, Eq. (5) can be rewritten as:

$$\min_{\mathbf{W}}\left(\sum_{t=1}^{T}\left\|\mathbf{y}_t-\sum_{m=1}^{M}\mathbf{X}_t^m\mathbf{w}_t^m\right\|_2^2+\gamma\sum_{m=1}^{M}\left\|\mathbf{W}^m\right\|_{2,1}+\eta\sum_{i\neq j}^{T}\sum_{m=1}^{M}g_{i,j}^m\left\|\mathbf{w}_i^m-\mathbf{w}_j^m\right\|_2^2\right) \tag{6}$$

where $\mathbf{W}^m=[\mathbf{w}_1^m,\mathbf{w}_2^m,\ldots,\mathbf{w}_T^m] \in \mathbb{R}^{D_m\times T}$. Also, we have

$$g_{i,j}^m=\exp\left(-\left\|\bar{\mathbf{x}}_i^m-\bar{\mathbf{x}}_j^m\right\|_2^2/\sigma_m^2\right) \tag{7}$$

where $\bar{\mathbf{x}}_i^m$ is the mean vector of the training subjects in the $i$-th task for the $m$-th modality and $\sigma_m^2=\sum_{i=1}^{T}\sum_{j=1}^{T}\|\bar{\mathbf{x}}_i^m-\bar{\mathbf{x}}_j^m\|_2^2/T^2$.

Apparently, Eq. (6) does not take into account the high variation across the features of multiple modalities. To utilize the complementary information of multiple modalities, we also consider the modality‐modality relation in this article. Given a subject, the discriminant functions for different modalities should tend to yield the same label for that subject. That is,

$$R_{\text{modality-modality}}(\mathbf{W})=\sum_{t=1}^{T}\sum_{p,q=1}^{M}\left\|\mathbf{X}_t^p\mathbf{w}_t^p-\mathbf{X}_t^q\mathbf{w}_t^q\right\|_2^2 \tag{8}$$

should be minimized. By including Eq. (8) into Eq. (6), we can get the final M3CC model as:

$$\min_{\mathbf{W}}\left(\sum_{t=1}^{T}\left\|\mathbf{y}_t-\sum_{m=1}^{M}\mathbf{X}_t^m\mathbf{w}_t^m\right\|_2^2+\gamma\sum_{m=1}^{M}\left\|\mathbf{W}^m\right\|_{2,1}+\eta\sum_{i\neq j}^{T}\sum_{m=1}^{M}g_{i,j}^m\left\|\mathbf{w}_i^m-\mathbf{w}_j^m\right\|_2^2+\theta\sum_{t=1}^{T}\sum_{p,q=1}^{M}\left\|\mathbf{X}_t^p\mathbf{w}_t^p-\mathbf{X}_t^q\mathbf{w}_t^q\right\|_2^2\right) \tag{9}$$

where $\gamma$, $\eta$, and $\theta$ control the respective regularization terms. Figure 3 illustrates $\mathbf{X}_t^m$, $\mathbf{w}_t^m$, and $\mathbf{y}_t$ of Eq. (9) when $T=4$ and $M=2$.
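For reference, a direct (unoptimized) evaluation of the objective in Eq. (9) can be written as follows; this is a sketch useful for sanity‐checking an implementation (e.g., the monotone decrease of Algorithm 1 below), and the list‐of‐lists data layout is our own assumption:

```python
import numpy as np

def m3cc_objective(Xs, ys, ws, Gs, gamma, eta, theta):
    """Eq. (9): Xs[t][m] is N_t x D_m, ws[t][m] is D_m, ys[t] is N_t, Gs[m] is T x T."""
    T, M = len(Xs), len(Xs[0])
    loss = sum(np.sum((ys[t] - sum(Xs[t][m] @ ws[t][m] for m in range(M))) ** 2)
               for t in range(T))
    Wms = [np.column_stack([ws[t][m] for t in range(T)]) for m in range(M)]
    sparsity = sum(np.linalg.norm(Wm, axis=1).sum() for Wm in Wms)      # l2,1-norms
    task = sum(Gs[m][i, j] * np.sum((ws[i][m] - ws[j][m]) ** 2)
               for m in range(M) for i in range(T) for j in range(T) if i != j)
    modality = sum(np.sum((Xs[t][p] @ ws[t][p] - Xs[t][q] @ ws[t][q]) ** 2)
                   for t in range(T) for p in range(M) for q in range(M))
    return loss + gamma * sparsity + eta * task + theta * modality
```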

Figure 3. An exemplar illustration of the data used in our method.

When there is a new test subject $\mathbf{x}_t$ from the $t$-th task and its features of all $M$ modalities are available, the classification result is given by:

$$f(\mathbf{x}_t)=\operatorname{sign}\left(\sum_{m=1}^{M}(\mathbf{x}_t^m)'\mathbf{w}_t^m\right)$$
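In code, this decision rule is a modality‐wise score sum followed by thresholding at zero (a sketch with illustrative names):

```python
def predict(x_ms, w_ms):
    """x_ms, w_ms: per-modality feature and coefficient vectors for one center."""
    score = sum(float(x @ w) for x, w in zip(x_ms, w_ms))
    return -1.0 if score < 0 else 1.0     # -1: ASD patient, 1: healthy control
```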

Solution to the Objective Function

Let $\boldsymbol{\Lambda}^m$ ($m=1,\ldots,M$) be a $D_m\times D_m$ diagonal matrix with the $i$-th diagonal element $\lambda_{ii}^m$ computed as

$$\lambda_{ii}^m=\frac{1}{2\left\|(\mathbf{e}_i)'\mathbf{W}^m\right\|_2} \tag{10}$$

According to [Nie et al., 2010], the solution to $\min_{\mathbf{W}}\|\mathbf{W}\|_{2,1}$ is identical to the solution to $\min_{\mathbf{W}}\operatorname{tr}(\mathbf{W}'\boldsymbol{\Lambda}\mathbf{W})$; accordingly, optimizing $\min_{\mathbf{W}}\|\mathbf{W}\|_{2,1}$ can be transformed into iteratively optimizing $\boldsymbol{\Lambda}$ and $\mathbf{W}$. Therefore, minimizing the objective function in Eq. (9) can be attained by optimizing $\boldsymbol{\Lambda}^m$ and $\mathbf{W}^m$ in the following Eq. (11):

$$\min_{\mathbf{W}}\left(\sum_{t=1}^{T}\left\|\mathbf{y}_t-\sum_{m=1}^{M}\mathbf{X}_t^m\mathbf{w}_t^m\right\|_2^2+\eta\sum_{i\neq j}^{T}\sum_{m=1}^{M}g_{i,j}^m\left\|\mathbf{w}_i^m-\mathbf{w}_j^m\right\|_2^2+\theta\sum_{t=1}^{T}\sum_{p,q=1}^{M}\left\|\mathbf{X}_t^p\mathbf{w}_t^p-\mathbf{X}_t^q\mathbf{w}_t^q\right\|_2^2+\gamma\sum_{m=1}^{M}\operatorname{tr}\left((\mathbf{W}^m)'\boldsymbol{\Lambda}^m\mathbf{W}^m\right)\right) \tag{11}$$

As $\mathbf{W}^m$ and $\boldsymbol{\Lambda}^m$ are coupled, it is non‐trivial to find the optimal solution of Eq. (11). To this end, we apply an iterative approach by alternately optimizing $\boldsymbol{\Lambda}^m$ and $\mathbf{W}^m$. That is, we first update $\boldsymbol{\Lambda}^m$ while fixing $\mathbf{W}^m$, and then update $\mathbf{W}^m$ while fixing $\boldsymbol{\Lambda}^m$. The algorithm is summarized as follows:

Algorithm 1. Multi‐modality multi‐center classification (M3CC).

Input: $\mathbf{X}_t^m$ ($m=1,\ldots,M$, $t=1,\ldots,T$); $\mathbf{y}_t$ ($t=1,\ldots,T$); $\gamma$; $\eta$; $\theta$; $\varepsilon$

Output: $\mathbf{W}$

  1. $l=0$; initialize each $\mathbf{w}_t^m$ ($m=1,\ldots,M$, $t=1,\ldots,T$) with random values to generate the initial $\mathbf{W}$;

  2. while true

  3. $l=l+1$;

  4. Compute $\boldsymbol{\Lambda}^{m(l)}$ using Eq. (10);

  5. Compute $\mathbf{P}_t^m$ and $\mathbf{P}$:
$$\mathbf{P}_t^m=2\theta M(\mathbf{X}_t^m)'\mathbf{X}_t^m+\gamma\boldsymbol{\Lambda}^m,\qquad \mathbf{P}=\begin{pmatrix}\mathbf{P}_1&&\mathbf{0}\\&\ddots&\\\mathbf{0}&&\mathbf{P}_T\end{pmatrix},\qquad \mathbf{P}_t=\begin{pmatrix}\mathbf{P}_t^1&&\mathbf{0}\\&\ddots&\\\mathbf{0}&&\mathbf{P}_t^M\end{pmatrix} \tag{12}$$

  6. Compute $\mathbf{S}_t^m$ and $\mathbf{S}$:
$$\mathbf{S}_t^m=(\mathbf{X}_t^m)'-2\theta(\mathbf{X}_t^m)'=(1-2\theta)(\mathbf{X}_t^m)',\qquad \mathbf{S}=\begin{pmatrix}\mathbf{S}_1&&\mathbf{0}\\&\ddots&\\\mathbf{0}&&\mathbf{S}_T\end{pmatrix},\qquad \mathbf{S}_t=\begin{pmatrix}\mathbf{S}_t^1\mathbf{X}_t^1&\cdots&\mathbf{S}_t^1\mathbf{X}_t^M\\\vdots&\ddots&\vdots\\\mathbf{S}_t^M\mathbf{X}_t^1&\cdots&\mathbf{S}_t^M\mathbf{X}_t^M\end{pmatrix} \tag{13}$$

  7. Compute $\mathbf{q}_t^m$ and $\mathbf{Q}$ ($\mathbf{L}_{\mathbf{G}^m}$ is the Laplacian matrix of $\mathbf{G}^m$):
$$\mathbf{q}_t^m=2\eta\left(\mathbf{L}_{\mathbf{G}^m}\mathbf{e}_t\right),\qquad \mathbf{Q}=\begin{pmatrix}\mathbf{Q}_{11}&\cdots&\mathbf{Q}_{1T}\\\vdots&\ddots&\vdots\\\mathbf{Q}_{T1}&\cdots&\mathbf{Q}_{TT}\end{pmatrix},\qquad \mathbf{Q}_{ij}=\begin{pmatrix}(\mathbf{q}_i^1)_j\mathbf{I}&&\mathbf{0}\\&\ddots&\\\mathbf{0}&&(\mathbf{q}_i^M)_j\mathbf{I}\end{pmatrix} \tag{14}$$

  8. Compute $\mathbf{r}_t^m$ and $\mathbf{r}$:
$$\mathbf{r}_t^m=(\mathbf{X}_t^m)'\mathbf{y}_t,\qquad \mathbf{R}=\begin{pmatrix}\mathbf{r}_1^1&\mathbf{r}_2^1&\cdots&\mathbf{r}_T^1\\\mathbf{r}_1^2&\mathbf{r}_2^2&\cdots&\mathbf{r}_T^2\\\vdots&\vdots&\ddots&\vdots\\\mathbf{r}_1^M&\mathbf{r}_2^M&\cdots&\mathbf{r}_T^M\end{pmatrix},\qquad \mathbf{r}=\operatorname{vec}(\mathbf{R}) \tag{15}$$

  9. Compute $\mathbf{W}$ with $\operatorname{vec}(\mathbf{W})=(\mathbf{P}+\mathbf{S}+\mathbf{Q})^{-1}\mathbf{r}$;

  10. Compute $J(\mathbf{W}^{(l)})$ with Eq. (9);

  11. if $|J(\mathbf{W}^{(l)})-J(\mathbf{W}^{(l-1)})|>\varepsilon$, continue; else, stop;

  12. end

In Algorithm 1, both multiple tasks and multiple modalities are considered together in the objective function, which distinguishes our method from existing ones, such as [Zhu et al., 2017]. More details of our algorithm, including the derivation of the solution and the proof of its convergence, are provided in the Appendix. Note that $\mathbf{P}$, $\mathbf{Q}$, and $\mathbf{S}$ are block matrices with $TM\times TM$ blocks. In our experiments ($T=4$ and $M=2$), it takes less than 2 s to compute $(\mathbf{P}+\mathbf{S}+\mathbf{Q})^{-1}\mathbf{r}$ on a computer with an Intel i5‐3230M CPU and 4 GB of memory under MATLAB R2015b.
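The two alternating updates at the core of Algorithm 1 can be sketched as follows (NumPy). The assembly of P, S, and Q per Eqs. (12)-(14) is assumed to be given; the small eps guarding all‐zero rows in the Lambda update is our own numerical safeguard rather than part of the paper's formulation:

```python
import numpy as np

def update_Lambda(Wm, eps=1e-10):
    """Eq. (10): diagonal Lambda^m with entries 1 / (2 ||(e_i)' W^m||_2)."""
    row_norms = np.linalg.norm(Wm, axis=1)
    return np.diag(1.0 / (2.0 * np.maximum(row_norms, eps)))

def update_W(P, S, Q, r):
    """Step 9: vec(W) = (P + S + Q)^{-1} r, via a linear solve instead of an explicit inverse."""
    return np.linalg.solve(P + S + Q, r)
```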

Difference from Existing Works

It is worth noting that there have been several previous studies on multi‐modality multi‐task learning in neuroimaging. Zhang and Shen proposed a multi‐modal multi‐task (M3T) learning method with two separate steps, that is, multi‐task feature selection and multi‐modal classification with a support vector machine, to predict multiple response variables (e.g., clinical scores) from multi‐modality image data [Zhang and Shen, 2012]. Our M3CC method is different from M3T, although both fall in the category of multi‐modality multi‐task diagnosis methods for neural disorders. First, in M3T, the learning of each response variable is treated as one task, and the numbers of subjects must be identical for all tasks. In M3CC, the classification of each imaging center is treated as one task, and the numbers of subjects can vary across tasks, which makes our method more flexible than M3T. Second, the multi‐task feature selection and multi‐modality classification in M3T are treated separately, so the connection between feature selection and classification is essentially ignored. In our proposed M3CC method, feature selection and classification are integrated into a unified learning process, in which they interact with each other for better overall performance. That is, the selected features are optimal for the classifiers, and the classifiers are optimally tuned for the selected features. Third, the center‐center relation is not considered in M3T. In M3CC, both the center‐center and the modality‐modality relations are integrated seamlessly, so that both tasks and modalities become complementary and contribute toward feature selection and classification simultaneously.

Note that Wang et al. proposed a sparse multi‐task learning model to identify disease‐sensitive and quantitative trait‐relevant biomarkers from multi‐modality data [Wang et al., 2012]. It regards the disease classification and cognitive score regression as different learning tasks and thus requires the same number of subjects in each task. It therefore cannot be directly applied to the multi‐center data studied in this work. Moreover, neither the center‐center nor the modality‐modality relation was considered by Wang et al. All these characteristics make their method quite different from the M3CC proposed in this work.

EXPERIMENTS

Experimental Settings

In our study, ASD diagnosis was performed based on the regional morphological features and inter‐regional functional features, which were extracted from T1‐weighted MRI and rs‐fMRI, respectively. All the regional morphological features were concatenated into a feature vector with 303 elements. For the inter‐regional functional features, only the upper triangle of the correlation matrix was utilized due to the symmetry of the matrix; these measures were reshaped into a vector with 6,670 elements. Prior to training, a simple feature selection was conducted on the functional inter‐regional measures using the training set. Specifically, 400 functional features related to rs‐fMRI were selected according to the correlation between features and labels. All features (303 regional features and 400 inter‐regional features) were further z‐score normalized and used for subsequent evaluation. In the testing stage, the same 400 selected features were applied to the testing data.
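A sketch of this rough pre‐selection and normalization is shown below (NumPy; the helper name and the small constants guarding division by zero are our own). The key point is that both the selected indices and the z‐score statistics come from the training set only and are then reused on the test set:

```python
import numpy as np

def preselect_and_normalize(X_tr, y_tr, X_te, k=400):
    # Rank features by |Pearson correlation| with the training labels.
    Xc = X_tr - X_tr.mean(axis=0)
    yc = y_tr - y_tr.mean()
    corr = (Xc * yc[:, None]).sum(axis=0) / (
        np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc) + 1e-12)
    idx = np.argsort(-np.abs(corr))[:k]            # keep the top-k features
    # z-score with training-set statistics, applied identically to the test set.
    mu = X_tr[:, idx].mean(axis=0)
    sd = X_tr[:, idx].std(axis=0) + 1e-12
    return (X_tr[:, idx] - mu) / sd, (X_te[:, idx] - mu) / sd, idx
```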

We considered ASD diagnosis as a binary classification problem with ASD patients labeled as "−1" and healthy controls labeled as "1." A nested cross‐validation was adopted to evaluate the performances of different methods [Meng et al., 2017]. Specifically, the subjects of each imaging center were randomly divided into 10 disjoint subsets. One subset was then selected in turn from each center to generate the test set, and the remaining subsets were utilized for training. In the training procedure of each fold, another round of cross‐validation was performed to find the optimal parameters. In this way, the training procedure generated the classifiers with the optimal parameters corresponding to the individual imaging centers. In the testing stage, each test subject was fed to the respective classifier, trained for the corresponding imaging center, to estimate the diagnosis result. To avoid bias caused by the fold selection, we repeated the nested cross‐validation 20 times, and the statistics over all 20 repetitions are reported.
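The outer loop of this protocol can be sketched as follows (Python with scikit‐learn; the inner parameter search over γ, η, and θ is elided, and the per‐center data container is a toy assumption):

```python
from sklearn.model_selection import StratifiedKFold

def outer_folds(center_data, n_splits=10, seed=0):
    """Yield one outer fold at a time: the k-th (train, test) split of every center."""
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    per_center = [list(skf.split(X, y)) for X, y in center_data]
    for k in range(n_splits):
        yield [folds[k] for folds in per_center]   # one (train_idx, test_idx) per center
```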

To quantitatively evaluate the performances of all competing methods, we used the metrics of Accuracy (ACC), Sensitivity (SEN), and Specificity (SPE). Letting TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives, respectively, ACC, SEN, and SPE are defined as:

$$\text{ACC}=\frac{TP+TN}{TP+TN+FP+FN} \tag{16}$$
$$\text{SEN}=\frac{TP}{TP+FN} \tag{17}$$
$$\text{SPE}=\frac{TN}{TN+FP} \tag{18}$$
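A direct implementation of Eqs. (16)-(18) is given below; since the paper labels patients as "−1", the positive class defaults to −1 here (an assumption about which class counts as "positive"):

```python
import numpy as np

def acc_sen_spe(y_true, y_pred, pos=-1):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == pos) & (y_pred == pos))
    tn = np.sum((y_true != pos) & (y_pred != pos))
    fp = np.sum((y_true != pos) & (y_pred == pos))
    fn = np.sum((y_true == pos) & (y_pred != pos))
    acc = (tp + tn) / len(y_true)   # Eq. (16)
    sen = tp / (tp + fn)            # Eq. (17)
    spe = tn / (tn + fp)            # Eq. (18)
    return acc, sen, spe
```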

Summary of Competing Methods

We conducted comprehensive experiments to validate the effectiveness of our M3CC. Table 2 summarizes the methods under comparison. In the Comparisons with State‐of‐the‐Art Methods section, we report the results of M3CC compared with state‐of‐the‐art methods. Note that some existing multi‐task neuroimaging methods (such as M3T [Zhang and Shen, 2012] and the relational regularization feature selection method [Zhu et al., 2017]) cannot be directly applied to our case, as they require equal numbers of subjects across different imaging centers. Specifically, we considered the following experimental settings. (1) To show the advantage of utilizing multi‐center image data, we applied CSVC [Chang and Lin, 2017] to the data of each imaging center independently (denoted as CSVC‐1). We also pooled the data from the four imaging centers and applied CSVC (denoted as CSVC‐4). (2) One of the main contributions of this work is to perform feature selection and classification jointly. To this end, we compared our method to direct classification without feature selection, done by M2SVC [Zhang et al., 2011] with all T1 and rs‐fMRI features utilized. We also compared with the sequential scheme of "first feature selection and then classification." For this, we integrated both the regional morphological features and inter‐regional functional features, and then utilized the function lasso() in MATLAB for feature selection. Notice that, although LASSO provides both classification and feature selection capabilities, we only utilized its feature selection capability in this experiment to observe its contribution. After that, the LibSVM implementation of the CSVC procedure was utilized for classification. This strategy is denoted as "LASSO + CSVC." (3) We compared the proposed method with other popular multi‐task learning algorithms, including LeastL21 in the MALSAR package [Zhou et al., 2011]. (4) We further compared our proposed method with Yahata's method [Yahata et al., 2016], a representative ASD classifier using a small number of brain connections, denoted as "SCCA + LR."

Table 2.

Summary of the methods for comparison

Method Description
M3CC The proposed M3CC method on multi‐modality multi‐center data. In total, 303 regional morphological features and 400 (roughly pre‐selected) inter‐regional functional features were utilized.
M3CC ( η = 0, θ = 0) M3CC with η = 0 and θ = 0, such that the task‐task and modality‐modality regularizations were disabled.
CSVC‐1 C‐SVC in LibSVM. The image data from each center were treated separately. Linear kernel was adopted in CSVC.
CSVC‐4 C‐SVC in LibSVM. The image data from four centers were combined and treated together. Linear kernel was adopted in CSVC.
M2SVC Multi‐modal classification proposed by Zhang et al. [2011]. The image data from all centers were utilized together. No feature selection was involved.
LASSO+CSVC All features were selected with LASSO. After that, the LibSVM implementation of C‐SVC was used for classification.
LeastL21 The features of all modalities were first concatenated to generate a long feature vector. After that, the multi‐task feature learning algorithm in the MALSAR 1.1 package was called with the function Least_L21(). Each imaging center corresponded to one task.
SCCA+LR Yahata's method run on multi‐modality data collected from multiple centers. Following Yahata et al. [2016], L1‐regularized sparse canonical correlation analysis was utilized to identify a small number of features, and logistic regression was utilized as the classifier.
SMMCC (SSV) Run SMMCC on features of subcortical structure volumes.
SMMCC (GMV) Run SMMCC on features of cortical GM volumes.
SMMCC (GMT) Run SMMCC on features of mean cortical thickness.
SMMCC (WMV) Run SMMCC on features of subcortical WM volumes.
SMMCC (BAV) Run SMMCC on features of Brodmann area volumes.
SMMCC (BAT) Run SMMCC on features of Brodmann area thickness.
SMMCC (T1) Concatenate all the regional morphological features and run SMMCC on them.
SMMCC (fMRI) Extract 400 functional features from rs‐fMRI and run SMMCC on them.
MMSCC‐1 Run MMSCC on each imaging center individually.
MMSCC‐4 Run MMSCC on four imaging centers jointly.

In the Comparisons with SMMCC and Comparisons with MMSCC sections, we investigate the superiority of M3CC over its single‐modality version and single‐task version, respectively. In this way, we further demonstrate the importance of incorporating multi‐modal and multi‐center image data for ASD diagnosis. Specifically, we derive single‐modality multi‐center classification (SMMCC; for a single modality) and multi‐modality single‐center classification (MMSCC; for a single center) from M3CC and show their performances with various inputs.

Comparisons with State‐of‐the‐Art Methods

Tables 3, 4, 5, and 6 show the overall classification performances of the compared methods (in terms of ACC, SEN, SPE, and AUC) after 20 repetitions of the 10‐fold cross‐validation. To confirm whether our method performed statistically better than the competing methods, we also performed two‐sample t‐tests on the classification accuracies achieved by our method and the other methods, with the corresponding P‐values reported in these tables. The best results are in bold.

Table 3.

Classification results for NYU dataset

Methods ACC SEN SPE AUC P‐value
CSVC‐1 0.7348 0.7083 0.7595 0.7339 <1e-4
CSVC‐4 0.7194 0.7707 0.6716 0.7212 <1e-4
M2SVC 0.7076 0.7824 0.6379 0.7102 <1e-4
LeastL21 0.6857 0.7000 0.6724 0.6862 <1e-4
LASSO + CSVC 0.7170 0.7565 0.6802 0.7183 <1e-4
LASSO + M2SVC 0.7254 0.7509 0.7017 0.7263 <1e-4
SCCA + LR 0.6607 0.7069 0.6111 0.7554 <1e-4
M3CC (η = 0, θ = 0) 0.6813 0.6565 0.7043 0.6804 <1e-4
M3CC 0.7651 0.7846 0.7469 0.7658

Table 4.

Classification results for STANFORD dataset

Methods ACC SEN SPE AUC P‐value
CSVC‐1 0.5062 0.6400 0.3725 0.5062 <1e-4
CSVC‐4 0.5821 0.6429 0.5214 0.5821 <1e-4
M2SVC 0.6137 0.6525 0.5750 0.6137 <1e-4
LeastL21 0.5563 0.5975 0.5150 0.5563 <1e-4
LASSO + CSVC 0.6225 0.5950 0.6500 0.6225 <1e-4
LASSO + M2SVC 0.6287 0.6225 0.6350 0.6287 <1e-4
SCCA + LR 0.5250 0.6500 0.4000 0.6175 <1e-4
M3CC (η = 0, θ = 0) 0.5250 0.4875 0.5625 0.5250 <1e-4
M3CC 0.6826 0.6695 0.6958 0.6826

Table 5.

Classification results for UM‐1 dataset

Methods ACC SEN SPE AUC P‐value
CSVC‐1 0.5069 0.6750 0.3226 0.4988 <1e-4
CSVC‐4 0.6344 0.6863 0.5776 0.6319 <1e-4
M2SVC 0.6408 0.6941 0.5823 0.6382 <1e-4
LeastL21 0.5369 0.6618 0.4000 0.5309 <1e-4
LASSO + CSVC 0.6469 0.6588 0.6339 0.6463 <1e-4
LASSO + M2SVC 0.6615 0.6985 0.6210 0.6597 <1e-4
SCCA + LR 0.6462 0.7097 0.5882 0.6148 <1e-4
M3CC (η = 0, θ = 0) 0.5369 0.6647 0.3968 0.5307 <1e-4
M3CC 0.6840 0.6794 0.6889 0.6842

Table 6.

Classification results for YALE dataset

Methods ACC SEN SPE AUC P‐value
CSVC‐1 0.5250 0.4675 0.5773 0.5224 <1e-4
CSVC‐4 0.6122 0.6024 0.6212 0.6118 <1e-4
M2SVC 0.6464 0.6650 0.6295 0.6473 0.0003
LeastL21 0.5524 0.5500 0.5545 0.5523 <1e-4
LASSO + CSVC 0.6452 0.6225 0.6659 0.6442 0.0005
LASSO + M2SVC 0.6619 0.6700 0.6545 0.6623 0.0385
SCCA + LR 0.5476 0.5909 0.5000 0.5614 <1e-4
M3CC (η = 0, θ = 0) 0.5393 0.3475 0.7136 0.5306 <1e-4
M3CC 0.6704 0.6800 0.6616 0.6708

It can be observed from these results that our M3CC achieves mean classification accuracies of 0.7651, 0.6826, 0.6840, and 0.6704 on NYU, STANFORD, UM‐1, and YALE, respectively, which are much better than those of its rivals on the same tasks. Both CSVC‐4 and LASSO + CSVC are kernel methods based on LibSVM [Chang and Lin, 2017]. CSVC‐4 applied CSVC to the data from the four imaging centers together without feature selection, while LASSO + CSVC selected features with LASSO before CSVC was called. Comparing their performances, we observe that the LASSO‐based feature selection helped improve the classification results for CSVC. Different from both of these methods, M3CC integrates feature selection into learning and obtained much better results, demonstrating the superiority of joint feature selection and classification. Moreover, CSVC‐1, CSVC‐4, LASSO + CSVC, and LeastL21 treated multi‐modal data together yet ignored the modality‐modality relationship, whereas M3CC treats multiple modalities jointly by introducing the modality‐modality regularization and thus outperforms its rivals accordingly.

Comparisons with SMMCC

To demonstrate the superiority of utilizing multi‐modality data for ASD diagnosis, we ran M3CC on each individual modality only. For ease of description, we use the abbreviation "SMMCC" to denote this special case of M3CC with M = 1 and θ = 0.

We ran SMMCC on different types of regional morphological features, including SSV, cortical gray matter volumes (GMV), mean cortical thickness (GMT), subcortical white matter volumes (WMV), Brodmann area volumes (BAV), and Brodmann area thickness (BAT). We also concatenated all regional morphological features and fed them into SMMCC (T1). Meanwhile, we used the 400 functional features from rs‐fMRI and ran SMMCC (fMRI).

Figure 4 summarizes the classification accuracy of SMMCC on each individual modality, compared with M3CC on multi‐modality data. One can observe that M3CC always achieved the best classification accuracy, which demonstrates the superiority of utilizing multi‐modality data in ASD diagnosis. Meanwhile, SMMCC with different types of regional morphological features obtained comparable results. Notably, SMMCC (T1), which utilized all types of regional morphological features, always achieved better results than using a single type of regional morphological features. This observation implies that different types of regional morphological features provide complementary information and help improve the accuracy of ASD diagnosis.

Figure 4. Classification accuracies for M3CC and SMMCC.

Comparisons with MMSCC

In this subsection, we further investigated how multi‐center images contribute to better ASD diagnosis. We use the abbreviation "MMSCC" to denote the special case of M3CC with T = 1 and η = 0. In the experiment, we fed the multi‐modality data of each imaging center into MMSCC‐1 independently, that is, we trained and tested a classifier for each individual center. We also fed the multi‐modality data from all four imaging centers into MMSCC‐4, without distinguishing between centers or considering their relationship; in this way, the four imaging centers shared the same classifier.

Figure 5 summarizes the classification accuracies of MMSCC and compares them with M3CC. We can observe that M3CC outperformed MMSCC for all imaging centers. Although both MMSCC‐4 and M3CC were trained with the multi‐modality data acquired from all four imaging centers, M3CC considered the center‐center relation and thus attained better outcomes. Comparing the classification accuracies of MMSCC‐4 and MMSCC‐1, one may observe that they obtained comparable results on NYU, while MMSCC‐1 was worse than MMSCC‐4 on the other three centers. This phenomenon might be related to the fact that NYU has enough training data, while the other centers are relatively limited in terms of their numbers of subjects.

Figure 5. Classification accuracies for M3CC and MMSCC.

Discriminative Regional and Inter‐Regional Features

We report the most discriminative features that were selected from both T1‐weighted MRI and rs‐fMRI to identify ASD patients from healthy controls. Checking the features selected by M3CC at each cross‐validation fold, we found that the features selected across different imaging centers could be slightly different. However, the common features, which were closely related to the disease, were always selected and assigned large discriminant coefficients in different imaging centers. Quantitatively, we selected the top 50 regional morphological features and the top 100 inter‐regional functional features for each imaging center over all cross‐validation folds. A feature was finally deemed discriminative if it was commonly selected by all four centers.

Table 7 shows the discriminative regional morphological features selected by M3CC. One can observe that the selected features included measures from SSV, mean cortical thickness, cortical GMV, and subcortical WMV. The diversity of the sources of discriminative features indicates that different types of regional morphological features were complementary to each other when identifying ASD from healthy controls. It is also observable that the selected features were distributed in various brain regions, which indicates the spread of morphological abnormalities over the whole brain in ASD patients. It is worth noting that 12 out of the 24 features were from the cortical GM, which implies the significance of detecting anomalies in cortical GM for ASD diagnosis.

Table 7.

Top regional morphological features selected from T1 MRI

Type ROI
1 SSV Left‐Inf‐Lat‐Vent
2 SSV Left‐Thalamus‐Proper
3 SSV 3rd‐Ventricle
4 SSV 4th‐Ventricle
5 SSV CSF
6 SSV 5th‐Ventricle
7 SSV CC_Central
8 GMT Fusiform_L
9 GMV InferiorTemporal_L
10 GMT LateralOrbitofrontal_L
11 GMT Parahippocampal_L
12 GMT Precentral_L
13 GMV RostralAnteriorCingulate_L
14 GMV FrontalPole_L
15 BAV BA3a_L
16 BAT BA3a_L
17 BAT BA6_L
18 BAV V1_L
19 GMV Cuneus_R
20 GMV Entorhinal_R
21 GMT LateralOrbitofrontal_R
22 GMT MedialOrbitofrontal_R
23 GMT Paracentral_R
24 GMV Parsorbitalis_R
25 GMT Parsorbitalis_R
26 GMT ParsTriangularis_R
27 GMV PosteriorCingulate_R
28 GMT SuperiorParietal_R
29 GMT TransverseTemporal_R
30 BAT BA4a_R
31 BAT MT_R
32 WMV Cuneus_L
33 WMV MedialOrbitofrontal_L
34 WMV FrontalPole_R

Abbreviations: L = left hemisphere; R = right hemisphere; SSV = subcortical structure volumes; GMV = cortical GM volumes; GMT = mean cortical thickness; WMV = subcortical WM volumes; BAV = Brodmann area volumes; BAT = Brodmann area thickness.

Table 8 shows the most discriminative inter‐regional functional features selected from rs‐fMRI. The numbers in parentheses indicate the indices of the functional regions in the AAL template. Figure 6 further visualizes the common discriminative connections shared by the four imaging centers using a connectogram. The intra‐hemisphere connections in the left and right hemispheres are plotted in green and red, respectively, while the inter‐hemisphere connections are plotted in black. The thickness of each line indicates the total weight of the feature across the four centers.

Table 8.

Top inter‐regional functional features selected from rs‐fMRI

ROI 1 ROI 2
1 Precentral_L (1) Rectus_R (28)
2 Precentral_L (1) ParaHippocampal_R (40)
3 Precentral_L (1) Temporal_Inf_R (90)
4 Precentral_R (2) Vermis_1_2 (109)
5 Frontal_Sup_R (4) SupraMarginal_L (63)
6 Frontal_Sup_R (4) SupraMarginal_R (64)
7 Frontal_Sup_Orb_L (5) Cingulum_Mid_L (33)
8 Frontal_Sup_Orb_R (6) Cuneus_L (45)
9 Frontal_Mid_L (7) Parietal_Sup_R (60)
10 Frontal_Mid_L (7) Cerebelum_3_R (96)
11 Frontal_Inf_Oper_R (12) Cerebelum_8_R (104)
12 Frontal_Inf_Orb_L (15) Caudate_L (71)
13 Rolandic_Oper_R (18) Angular_R (66)
14 Olfactory_L (21) Occipital_Sup_R (50)
15 Olfactory_L (21) Precuneus_R (68)
16 Frontal_Med_Orb_L (25) Cuneus_R (46)
17 Frontal_Med_Orb_R (26) Frontal_Inf_Tri_L (13)
18 Rectus_R (28) Insula_L (29)
19 Cingulum_Ant_L (31) Frontal_Mid_R (8)
20 Cingulum_Mid_L (33) Frontal_Mid_Orb_R (10)
21 Cingulum_Mid_L (33) Cingulum_Mid_R (34)
22 Cingulum_Mid_R (34) Fusiform_R (56)
23 Cingulum_Mid_R (34) Caudate_L (71)
24 Hippocampus_R (38) Hippocampus_L (37)
25 ParaHippocampal_L (39) Postcentral_L (57)
26 Amygdala_R (42) Calcarine_R (44)
27 Calcarine_L (43) Frontal_Sup_Orb_R (6)
28 Cuneus_L (45) Frontal_Med_Orb_R (26)
29 Occipital_Inf_R (54) Occipital_Sup_L (49)
30 Occipital_Inf_R (54) Parietal_Sup_L (59)
31 Fusiform_L (55) Calcarine_R (44)
32 Parietal_Sup_R (60) Occipital_Mid_L (51)
33 SupraMarginal_L (63) ParaHippocampal_L (39)
34 Angular_L (65) Cingulum_Post_R (36)
35 Thalamus_L (77) Precentral_R (2)
36 Heschl_L (79) Cingulum_Post_L (35)
37 Heschl_R (80) Precentral_R (2)
38 Temporal_Sup_L (81) Cingulum_Mid_L (33)
39 Cerebelum_3_R (96) Frontal_Inf_Tri_L (13)

Figure 6. Connectogram of common discriminative rs‐fMRI connections selected by our method. The intra‐hemisphere connections in the left and right hemispheres are plotted in green and red, respectively, while the inter‐hemisphere connections are plotted in black. The thickness of each line indicates the total weight of the feature across the four centers.

The connections that contribute to accurate ASD diagnosis are not restricted within the same hemisphere or the same lobe, but span both hemispheres and all lobes. Although the number of intra‐hemisphere connections in the left hemisphere is similar to that in the right hemisphere, the intra‐hemisphere connections in the left hemisphere have larger weights in Figure 6, which is consistent with the left‐hemisphere hypothesis for ASD [Chandana et al., 2005; Chugani et al., 1997]. Besides, the selected connections involve multiple cortical regions and subcortical structures that have been related to ASD in the literature. For example, the temporal lobes, occipital lobes, and calcarine sulcus are involved in processing auditory and visual stimuli, language, and nonlinguistic social stimuli; previous studies have identified abnormalities in all these regions for ASD patients [Johnson et al., 2005; Redcay, 2008; Schultz et al., 2000; Wetherby et al., 2004]. It is also interesting to notice the asymmetry between the left and right hemispheres, such as in the connections between the bilateral median cingulate regions and the bilateral hippocampus regions. These observations provide important biomarker information that helps understand ASD.

In summary, M3CC can find the most discriminative features, including regions and their connections. This evidence is captured through the proposed task‐task and modality‐modality regularization terms, and the selection of these features is effective for the joint multi‐modality multi‐center classification in clinical ASD diagnosis.

CONCLUSIONS

This article proposes a multi‐center classification framework for multi‐modality ASD diagnosis. Specifically, two types of features, that is, regional morphological features and inter‐regional functional features, are extracted from T1‐weighted MRI and rs‐fMRI, respectively. To make full use of the complementary information shared by different imaging centers, the classification problem for each center is treated as one task. Both the task‐task relation and the modality‐modality relation are added as regularization terms in the learning process, where all centers are jointly classified by multi‐task learning.

In summary, the merits of the proposed method are as follows:

  1. Both "task‐task" and "modality‐modality" relations are integrated. We devise the task‐task relation with the assumption that, if the image data from two imaging centers are related to each other, the feature representations of ASD should be highly similar; that is, for corresponding features, their discriminant coefficients toward the disease labels should be similar. Meanwhile, we devise the modality‐modality relation with the assumption that the discriminant functions for different modalities of one subject tend to reach the same disease label. As different imaging centers have different numbers of subjects and different modalities have different numbers of features, the proposed framework can be flexibly generalized to real clinical applications.

  2. In the proposed framework, feature selection and classification are jointly considered, allowing them to interact with each other to improve the classification performance. In this way, the selected features are the most suitable ones for the classifier, and the classifier parameters are optimally tuned for the selected features. This is quite different from most existing methods for multi‐task multi‐modality disease diagnosis, which consider feature selection and classifier modeling as two separate steps [Jie et al., 2015; Zhang and Shen, 2012]. One obvious disadvantage of those methods is that the processes of feature selection and classifier modeling follow different learning criteria, such that their inherent correlation is ignored.

APPENDIX

We prove that Algorithm 1 makes the value of the objective function in Eq. (11) decrease monotonically.

Theorem 1.

Given that $\boldsymbol{\Lambda}^m$, $m=1,\ldots,M$, are fixed, Eq. (11) is minimized if and only if $\mathbf{W}$ is computed from $\operatorname{vec}(\mathbf{W})=(\mathbf{P}+\mathbf{S}+\mathbf{Q})^{-1}\mathbf{r}$, where $\mathbf{P}$, $\mathbf{S}$, $\mathbf{Q}$, and $\mathbf{r}$ are computed with Eqs. (12)–(15).

By taking the derivative of the objective function in Eq. (11) with respect to $\mathbf{w}_t^m$ and setting it to zero, we have:

$$\mathbf{P}_t^m\mathbf{w}_t^m+\mathbf{W}^m\mathbf{q}_t^m+\mathbf{S}_t^m\sum_{i=1}^{M}\mathbf{X}_t^i\mathbf{w}_t^i=\mathbf{r}_t^m \tag{19}$$

where $\mathbf{P}_t^m$, $\mathbf{q}_t^m$, $\mathbf{S}_t^m$, and $\mathbf{r}_t^m$ are computed with Eqs. (12)–(15). For each modality in the $t$-th task, we have an equation in the form of Eq. (19). To estimate $\mathbf{w}_t^m$ for the $m$-th modality of the $t$-th task, we should also learn $\mathbf{w}_t^{m'}$ for the other modalities $m'\neq m$ of the $t$-th task, as well as $\mathbf{w}_{t'}^{m}$ for the other tasks $t'\neq t$ on the $m$-th modality. Eventually, we have to learn all $\mathbf{w}_t^m$ jointly as:

$$(\mathbf{P}+\mathbf{S}+\mathbf{Q})\operatorname{vec}(\mathbf{W})=\mathbf{r} \tag{20}$$

where $\mathbf{P}$, $\mathbf{Q}$, $\mathbf{S}$, and $\mathbf{r}$ are computed as in Eqs. (12)–(15). The analytic solution of $\mathbf{w}_t^m$ can be easily obtained from Eq. (20) by taking the inverse of the matrix $\mathbf{P}+\mathbf{Q}+\mathbf{S}$, that is,

$$\operatorname{vec}(\mathbf{W})=(\mathbf{P}+\mathbf{S}+\mathbf{Q})^{-1}\mathbf{r} \tag{21}$$

Moreover, the Hessian matrix of the objective function in Eq. (11) has the form:

$$\mathbf{H}=\begin{pmatrix}\mathbf{H}_1&&\mathbf{0}\\&\ddots&\\\mathbf{0}&&\mathbf{H}_T\end{pmatrix},\qquad \mathbf{H}_t=\begin{pmatrix}\mathbf{H}_t^1&&\mathbf{0}\\&\ddots&\\\mathbf{0}&&\mathbf{H}_t^M\end{pmatrix} \tag{22}$$

where $\mathbf{H}_t^m=(4\theta(M-1)+2)(\mathbf{X}_t^m)'\mathbf{X}_t^m+2\gamma\boldsymbol{\Lambda}^m+4\eta(\mathbf{L}_\mathbf{G})_{tt}\mathbf{I}$, which is positive definite for $\gamma>0$, $\eta>0$, $\theta>0$, and $M\geq 1$. Thus, the theorem is proved.

Lemma 1

[Nie et al., 2010]. For any nonzero vectors $\mathbf{u}, \mathbf{u}_t \in \mathbb{R}^d$, the following inequality holds:

$$\left\|\mathbf{u}\right\|_2-\frac{\left\|\mathbf{u}\right\|_2^2}{2\left\|\mathbf{u}_t\right\|_2}\leq\left\|\mathbf{u}_t\right\|_2-\frac{\left\|\mathbf{u}_t\right\|_2^2}{2\left\|\mathbf{u}_t\right\|_2} \tag{23}$$

Theorem 2.

In each iteration, Algorithm 1 monotonically decreases the objective function value in Eq. (11).

In the $l$-th iteration, $l=1,2,\ldots$, Eq. (11) can be rewritten as follows:

$$J(\mathbf{W}^{(l)},\boldsymbol{\Lambda}^{(l)})=\sum_{t=1}^{T}\left\|\mathbf{y}_t-\sum_{m=1}^{M}\mathbf{X}_t^m\mathbf{w}_t^{m(l)}\right\|_2^2+\eta\sum_{i\neq j}^{T}\sum_{m=1}^{M}g_{i,j}^m\left\|\mathbf{w}_i^{m(l)}-\mathbf{w}_j^{m(l)}\right\|_2^2+\theta\sum_{t=1}^{T}\sum_{p,q=1}^{M}\left\|\mathbf{X}_t^p\mathbf{w}_t^{p(l)}-\mathbf{X}_t^q\mathbf{w}_t^{q(l)}\right\|_2^2+\gamma\sum_{m=1}^{M}\operatorname{tr}\left((\mathbf{W}^{m(l)})'\boldsymbol{\Lambda}^{m(l)}\mathbf{W}^{m(l)}\right)=J_1(\mathbf{W}^{(l)})+\gamma\sum_{m=1}^{M}\operatorname{tr}\left((\mathbf{W}^{m(l)})'\boldsymbol{\Lambda}^{m(l)}\mathbf{W}^{m(l)}\right) \tag{24}$$

where

$$J_1(\mathbf{W}^{(l)})=\sum_{t=1}^{T}\left\|\mathbf{y}_t-\sum_{m=1}^{M}\mathbf{X}_t^m\mathbf{w}_t^{m(l)}\right\|_2^2+\eta\sum_{i\neq j}^{T}\sum_{m=1}^{M}g_{i,j}^m\left\|\mathbf{w}_i^{m(l)}-\mathbf{w}_j^{m(l)}\right\|_2^2+\theta\sum_{t=1}^{T}\sum_{p,q=1}^{M}\left\|\mathbf{X}_t^p\mathbf{w}_t^{p(l)}-\mathbf{X}_t^q\mathbf{w}_t^{q(l)}\right\|_2^2 \tag{25}$$

From Theorem 1, we can infer that $\operatorname{vec}(\mathbf{W})=(\mathbf{P}+\mathbf{S}+\mathbf{Q})^{-1}\mathbf{r}$ is the minimum of $J(\mathbf{W})$ when $\boldsymbol{\Lambda}$ is fixed. Thus, we have:

$$J(\mathbf{W}^{(l+1)},\boldsymbol{\Lambda}^{(l)})\leq J(\mathbf{W}^{(l)},\boldsymbol{\Lambda}^{(l)}),$$

i.e.,

$$J_1(\mathbf{W}^{(l+1)})+\gamma\sum_{m=1}^{M}\operatorname{tr}\left((\mathbf{W}^{m(l+1)})'\boldsymbol{\Lambda}^{m(l)}\mathbf{W}^{m(l+1)}\right)\leq J_1(\mathbf{W}^{(l)})+\gamma\sum_{m=1}^{M}\operatorname{tr}\left((\mathbf{W}^{m(l)})'\boldsymbol{\Lambda}^{m(l)}\mathbf{W}^{m(l)}\right)$$

Substituting Eq. (10) into it, we have:

$$J_1(\mathbf{W}^{(l+1)})+\gamma\sum_{m=1}^{M}\sum_{i=1}^{D_m}\frac{\left\|(\mathbf{e}_i)'\mathbf{W}^{m(l+1)}\right\|_2^2}{2\left\|(\mathbf{e}_i)'\mathbf{W}^{m(l)}\right\|_2}\leq J_1(\mathbf{W}^{(l)})+\gamma\sum_{m=1}^{M}\sum_{i=1}^{D_m}\frac{\left\|(\mathbf{e}_i)'\mathbf{W}^{m(l)}\right\|_2^2}{2\left\|(\mathbf{e}_i)'\mathbf{W}^{m(l)}\right\|_2}$$

With a simple modification, we have:

$$J_1(\mathbf{W}^{(l+1)})+\gamma\sum_{m=1}^{M}\sum_{i=1}^{D_m}\left(\frac{\left\|(\mathbf{e}_i)'\mathbf{W}^{m(l+1)}\right\|_2^2}{2\left\|(\mathbf{e}_i)'\mathbf{W}^{m(l)}\right\|_2}-\left\|(\mathbf{e}_i)'\mathbf{W}^{m(l+1)}\right\|_2+\left\|(\mathbf{e}_i)'\mathbf{W}^{m(l+1)}\right\|_2\right)\leq J_1(\mathbf{W}^{(l)})+\gamma\sum_{m=1}^{M}\sum_{i=1}^{D_m}\left(\frac{\left\|(\mathbf{e}_i)'\mathbf{W}^{m(l)}\right\|_2^2}{2\left\|(\mathbf{e}_i)'\mathbf{W}^{m(l)}\right\|_2}-\left\|(\mathbf{e}_i)'\mathbf{W}^{m(l)}\right\|_2+\left\|(\mathbf{e}_i)'\mathbf{W}^{m(l)}\right\|_2\right) \tag{26}$$

With Eqs. (23) and (26), we can easily arrive at:

$$J_1(\mathbf{W}^{(l+1)})+\gamma\sum_{m=1}^{M}\sum_{i=1}^{D_m}\left\|(\mathbf{e}_i)'\mathbf{W}^{m(l+1)}\right\|_2\leq J_1(\mathbf{W}^{(l)})+\gamma\sum_{m=1}^{M}\sum_{i=1}^{D_m}\left\|(\mathbf{e}_i)'\mathbf{W}^{m(l)}\right\|_2,$$

i.e.,

$$J(\mathbf{W}^{(l+1)},\boldsymbol{\Lambda}^{(l+1)})\leq J(\mathbf{W}^{(l)},\boldsymbol{\Lambda}^{(l)})$$

That is, Algorithm 1 monotonically decreases the objective function value in Eq. (9), and the proposed iterative algorithm converges to a local optimum.

Contributor Information

Qian Wang, Email: wang.qian@sjtu.edu.cn.

Dinggang Shen, Email: dgshen@med.unc.edu.

REFERENCES

  1. Anagnostou E, Taylor MJ (2011): Review of neuroimaging in autism spectrum disorders: What have we learned and where we go from here. Mol Autism 2:1–9. [DOI] [PMC free article] [PubMed] [Google Scholar]
  2. Argyriou A, Evgeniou T (2007): Multi‐task feature learning In: Advances in neural information processing systems 19. The MIT Press, Cambridge MA, USA. pp 41–48.
  3. Argyriou A, Evgeniou T, Pontil M (2008): Convex multi‐task feature learning. Mach Learn 73:243–272. [Google Scholar]
  4. Chandana SR, Behen ME, Juhász C, Muzik O, Rothermel RD, Mangner TJ, Chakraborty PK, Chugani HT, Chugani DC (2005): Significance of abnormalities in developmental trajectory and asymmetry of cortical serotonin synthesis in autism. Int J Dev Neurosci 23:171–182. [DOI] [PubMed] [Google Scholar]
  5. Chang C‐C, Lin C‐J (2017): LIBSVM: A library for support vector machines. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm
  6. Chugani DC, Muzik O, Rothermel R, Behen M, Chakraborty P, Mangner T, Da Silva EA, Chugani HT (1997): Altered serotonin synthesis in the dentatothalamocortical pathway in autistic boys. Ann Neurol 42:666–669. [DOI] [PubMed] [Google Scholar]
  7. Desikan RS, Ségonne F, Fischl B, Quinn BT, Dickerson BC, Blacker D, Buckner RL, Dale AM, Maguire RP, Hyman BT (2006): An automated labeling system for subdividing the human cerebral cortex on MRI scans into gyral based regions of interest. Neuroimage 31:968–980. [DOI] [PubMed] [Google Scholar]
  8. Fan Y, Resnick SM, Wu X, Davatzikos C (2008): Structural and functional biomarkers of prodromal Alzheimer's disease: A high‐dimensional pattern classification study. Neuroimage 41:277–285. [DOI] [PMC free article] [PubMed] [Google Scholar]
  9. Fischl B, Salat DH, Busa E, Albert M, Dieterich M, Haselgrove C, Van Der Kouwe A, Killiany R, Kennedy D, Klaveness S (2002): Whole brain segmentation: Automated labeling of neuroanatomical structures in the human brain. Neuron 33:341–355. [DOI] [PubMed] [Google Scholar]
  10. Friston KJ, Williams S, Howard R, Frackowiak RS, Turner R (1996): Movement‐related effects in fMRI time‐series. Magn Reson Med 35:346–355. [DOI] [PubMed] [Google Scholar]
  11. Hinrichs C, Singh V, Xu G, Johnson SC (2011): Predictive markers for AD in a multi‐modality framework: An analysis of MCI progression in the ADNI population. Neuroimage 55:574–589. [DOI] [PMC free article] [PubMed] [Google Scholar]
  12. Jiang Y, Chung F‐L, Ishibuchi H, Deng Z, Wang S (2015): Multitask TSK fuzzy system modeling by mining intertask common hidden structure. IEEE Trans Cybern 45:534–547. [DOI] [PubMed] [Google Scholar]
  13. Jie B, Zhang D, Cheng B, Shen D (2015): Manifold regularized multitask feature learning for multimodality disease classification. Hum Brain Mapp 36:489–507. [DOI] [PMC free article] [PubMed] [Google Scholar]
  14. Jin Y, Wee CY, Shi F, Thung KH, Ni D, Yap PT, Shen D (2015): Identification of infants at high‐risk for autism spectrum disorder using multiparameter multiscale white matter connectivity networks. Hum Brain Mapp 36:4880–4896. [DOI] [PMC free article] [PubMed] [Google Scholar]
  15. Johnson MH, Griffin R, Csibra G, Halit H, Farroni T, De Haan M, Tucker LA, Baron‐Cohen S, Richards J (2005): The emergence of the social brain network: Evidence from typical and atypical development. Dev Psychopathol 17:599–619. [DOI] [PMC free article] [PubMed] [Google Scholar]
  16. Liu F, Wee C‐Y, Chen H, Shen D (2014): Inter‐modality relationship constrained multi‐modality multi‐task feature selection for Alzheimer's Disease and mild cognitive impairment identification. NeuroImage 84:466–475. [DOI] [PMC free article] [PubMed] [Google Scholar]
  17. Liu J, Ji S, Ye J (2009): Multi‐task feature learning via efficient ℓ2,1‐norm minimization In: The Twenty‐Fifth Conference on Uncertainty in Artificial Intelligence. Montreal, QC, Canada: AUAI Press; pp 339–348. [Google Scholar]
  18. Liu M, Zhang D, Shen D (2015): View‐centralized multi‐atlas classification for Alzheimer's disease diagnosis. Hum Brain Mapp 36:1847–1865. [DOI] [PMC free article] [PubMed] [Google Scholar]
  19. Lord C, Jones RM (2012): Annual Research Review: Re‐thinking the classification of autism spectrum disorders. J Child Psychol Psychiatry 53:490–509. [DOI] [PMC free article] [PubMed] [Google Scholar]
  20. Meng X, Jiang R, Lin D, Bustillo J, Jones T, Chen J, Yu Q, Du Y, Zhang Y, Jiang T (2017): Predicting individualized clinical measures by a generalized prediction framework and multimodal fusion of MRI data. NeuroImage 145:218–229. [DOI] [PMC free article] [PubMed] [Google Scholar]
  21. Nie F, Huang H, Cai X, Ding CH (2010): Efficient and robust feature selection via joint ℓ2,1‐norms minimization. In: Advances in neural information processing systems 23. The MIT Press, Cambridge MA, USA. pp 1813–1821.
  22. Redcay E (2008): The superior temporal sulcus performs a common function for social and speech perception: Implications for the emergence of autism. Neurosci Biobehav Rev 32:123–142. [DOI] [PubMed] [Google Scholar]
  23. Schultz RT, Gauthier I, Klin A, Fulbright RK, Anderson AW, Volkmar F, Skudlarski P, Lacadie C, Cohen DJ, Gore JC (2000): Abnormal ventral temporal cortical activity during face discrimination among individuals with autism and Asperger syndrome. Arch Gen Psychiatry 57:331–340. [DOI] [PubMed] [Google Scholar]
  24. Shi F, Wang L, Peng Z, Wee C‐Y, Shen D (2013): Altered modular organization of structural cortical networks in children with autism. PLoS One 8:e63131. [DOI] [PMC free article] [PubMed] [Google Scholar]
  25. Tzourio‐Mazoyer N, Landeau B, Papathanassiou D, Crivello F, Etard O, Delcroix N, Mazoyer B, Joliot M (2002): Automated anatomical labeling of activations in SPM using a macroscopic anatomical parcellation of the MNI MRI single‐subject brain. Neuroimage 15:273–289. [DOI] [PubMed] [Google Scholar]
  26. Van Den Heuvel MP, Pol HEH (2010): Exploring the brain network: A review on resting‐state fMRI functional connectivity. Eur Neuropsychopharmacol 20:519–534. [DOI] [PubMed] [Google Scholar]
  27. Vemuri P, Wiste H, Weigand S, Shaw L, Trojanowski J, Weiner M, Knopman DS, Petersen RC, Jack C (2009): MRI and CSF biomarkers in normal, MCI, and AD subjects predicting future clinical change. Neurology 73:294–301. [DOI] [PMC free article] [PubMed] [Google Scholar]
  28. Walhovd K, Fjell A, Brewer J, McEvoy L, Fennema‐Notestine C, Hagler D, Jennings R, Karow D, Dale A (2010): Combining MR imaging, positron‐emission tomography, and CSF biomarkers in the diagnosis and prognosis of Alzheimer disease. Am J Neuroradiol 31:347–354. [DOI] [PMC free article] [PubMed] [Google Scholar]
  29. Wang H, Nie F, Huang H, Risacher SL, Saykin AJ, Shen L (2012): Identifying disease sensitive and quantitative trait‐relevant biomarkers from multidimensional heterogeneous imaging genetics data via sparse multimodal multitask learning. Bioinformatics 28:i127–i136. [DOI] [PMC free article] [PubMed] [Google Scholar]
  30. Wang L, Wee C‐Y, Tang X, Yap P‐T, Shen D (2016): Multi‐task feature selection via supervised canonical graph matching for diagnosis of autism spectrum disorder. Brain Imaging Behav 10:33–40. [DOI] [PMC free article] [PubMed] [Google Scholar]
  31. Wee CY, Wang L, Shi F, Yap PT, Shen D (2014): Diagnosis of autism spectrum disorders using regional and interregional morphological features. Hum Brain Mapp 35:3414–3430. [DOI] [PMC free article] [PubMed] [Google Scholar]
  32. Wetherby AM, Woods J, Allen L, Cleary J, Dickinson H, Lord C (2004): Early indicators of autism spectrum disorders in the second year of life. J Autism Dev Disord 34:473–493. [DOI] [PubMed] [Google Scholar]
  33. Yahata N, Morimoto J, Hashimoto R, Lisi G, Shibata K, Kawakubo Y, Kuwabara H, Kuroda M, Yamada T, Megumi F (2016): A small number of abnormal brain connections predicts adult autism spectrum disorder. Nat Commun 7:11254. [DOI] [PMC free article] [PubMed] [Google Scholar]
  34. Zhang D, Shen D (2012): Multi‐modal multi‐task learning for joint prediction of multiple regression and classification variables in Alzheimer's disease. Neuroimage 59:895–907. [DOI] [PMC free article] [PubMed] [Google Scholar]
  35. Zhang D, Wang Y, Zhou L, Yuan H, Shen D (2011): Multimodal classification of Alzheimer's disease and mild cognitive impairment. Neuroimage 55:856–867. [DOI] [PMC free article] [PubMed] [Google Scholar]
  36. Zhang K, Gray JW, Parvin B (2010): Sparse multitask regression for identifying common mechanism of response to therapeutic targets. Bioinformatics 26:i97–i105. [DOI] [PMC free article] [PubMed] [Google Scholar]
  37. Zhang Y, Zhou G, Jin J, Zhao Q, Wang X, Cichocki A (2015): Sparse Bayesian classification of EEG for brain‐computer interface. IEEE Trans Neural Networks Learn Syst 27:2256–2267. [DOI] [PubMed] [Google Scholar]
  38. Zhou J, Chen J, Ye J (2011): Malsar: Multi‐Task Learning via Structural Regularization. Arizona State University, AZ, USA. [Google Scholar]
  39. Zhu X, Suk H‐I, Wang L, Lee S‐W, Shen D (2017): A novel relational regularization feature selection method for joint regression and classification in AD diagnosis. Med Image Anal 38:205–214. [DOI] [PMC free article] [PubMed] [Google Scholar]
