Published in final edited form as: IEEE Trans Biomed Eng. 2019 Jun 5;67(3):796–806. doi: 10.1109/TBME.2019.2921207

A Manifold Regularized Multi-Task Learning Model for IQ Prediction From Two fMRI Paradigms

Li Xiao 1, Julia M Stephen 2, Tony W Wilson 3, Vince D Calhoun 4, Yu-Ping Wang 5
PMCID: PMC7883481  NIHMSID: NIHMS1667073  PMID: 31180835

Abstract

Objective:

Multi-modal brain functional connectivity (FC) data have shown great potential for providing insights into individual variations in behavioral and cognitive traits. The joint learning of multi-modal data can exploit their intrinsic associations and thus boost learning performance. Although several multi-task based learning models have already been proposed by viewing feature learning on each modality as one task, most of them ignore the structural information inherent across the modalities, which may play an important role in extracting discriminative features.

Methods:

In this paper, we propose a new manifold regularized multi-task learning model by simultaneously considering between-subject and between-modality relationships. Specifically, the l2,1-norm (i.e., group-sparsity) regularizer is enforced to jointly select a few common features across different modalities. A newly designed manifold regularizer is further imposed as a crucial underpinning to preserve the structural information both within and between modalities. These regularizers make our model better suited to realistic neuroimaging data, which typically have small sample sizes but high-dimensional features.

Results:

Our model is validated on the Philadelphia Neurodevelopmental Cohort dataset, where the modalities are two types of functional MRI (fMRI) data collected under two task paradigms. We conduct experimental studies on fMRI-based FC network data in two task conditions for intelligence quotient (IQ) prediction. The results show that our proposed model can not only achieve improved prediction performance, but also yield a set of IQ-relevant biomarkers.

Conclusion and Significance:

This paper develops a new multi-task learning model, enabling the discovery of significant biomarkers that may account for a proportion of the variance in human intelligence.

Keywords: Functional connectivity, functional MRI, geometry, intelligence, multi-modal, multi-task learning

I. Introduction

In recent decades, the human brain functional connectome has emerged as an important “fingerprint” to provide insights into individual variations in behavioral and cognitive traits [1]–[3]. The functional connectome is quantitatively characterized by a functional connectivity network (FCN) based on graph theory, where the spatially distributed but functionally linked regions-of-interest (ROIs) in the brain represent the nodes and the functional connectivities (FCs) defined as the correlations between the time courses of ROIs represent the edges. The Pearson correlation is widely adopted to measure the FC for its efficiency. It is also worth noting that among neuroimaging studies, functional magnetic resonance imaging (fMRI) is one of the most popular modalities to analyze brain FCNs due to its non-invasiveness, high spatial resolution, and good temporal resolution [4]–[6].

Functional connectome-based analyses using fMRI have offered great potential for understanding the brain-behavior and -cognition relationship, while accounting for variables such as age, gender, intelligence, and disease [7]–[17]. For instance, Meier et al. [7] constructed FCNs from resting-state fMRI, and then based on these resting-state FCNs, healthy younger and older adults were discriminated by a support vector machine (SVM) classifier. Tian et al. [9] investigated gender-related differences in the topological organization of resting-state FCNs within the hemispheres on the basis of typical statistical tests. While FCNs are usually constructed from resting-state fMRI, task fMRI based FCs can better explore how individual traits are influenced by brain activity changes induced by trait-related tasks [18], [19]. Calhoun et al. [14] used independent component analysis to study fMRI based FCNs from a large group of schizophrenia patients, individuals with bipolar disorder, and healthy controls while performing an auditory oddball task, followed by a multivariate statistical testing framework to infer group differences in properties of identified FCNs. Greene and Gao et al. [16], [17] showed that predictive models built from task fMRI based FC data (e.g., working memory or emotion) can lead to better predictions of fluid intelligence than models built from resting-state fMRI based FC data, through experiments on two large, independent datasets. As such, certain tasks may bring about meaningful findings across subjects with different traits, essentially facilitating biomarker identification beyond what can be found in the resting state.

The majority of previous work has focused on one imaging modality (e.g., resting-state or task fMRI). In neuroimaging research studies, it is common to acquire multi-modal imaging from the same experimental subjects to provide complementary information. It has also been suggested in [20], [21] that there is a commonality between different modalities (here a brain imaging modality can refer to different functional tasks or different imaging modalities) implicated by the same underlying pathology. To this end, it is highly desirable to develop an approach for a joint analysis of multiple modalities to boost learning performance. Recently, there have been notable efforts to incorporate multiple modalities in a multi-task learning framework for the prediction of cognitive scores and for the diagnostic classification of schizophrenia and Alzheimer's disease (AD) [22]–[29]. Specifically, Adeli et al. [22] presented a multi-task multi-linear regression model for the prediction of multiple cognitive scores with incomplete longitudinal imaging data, where the combination of the l1 and nuclear norm regularizers not only enforced the learned mapping weights to be smooth across the time points and across the tasks, but also accounted for feature selection. Zhang et al. [26] proposed a multi-modal multi-task learning model, where multi-task feature learning jointly selected a small number of common features from multiple modalities, and then a multi-modal SVM fused these selected features for both classification and regression. Jie et al. [27] and Lei et al. [28] studied a manifold regularized multi-task learning model by viewing the feature learning on each modality as one task. In addition to the group-sparsity regularizer, which ensures that a few common features are jointly selected across multiple modalities (tasks), it included a manifold regularizer that preserves the structural information of the data (also called the subject-subject relation) within each single modality. Zhu et al. [29] extended the model in [27] by imposing another two manifold regularizers that preserve the feature-feature relation and response-response relation, respectively. However, all these multi-task based models ignore the subject-subject relation between modalities, which could otherwise improve the final performance. It is worth pointing out [30], [31] that, although different modalities have different feature representations, they describe the same underlying phenomenon and therefore share information, which we call the co-occurrence information. Examining the relation of subjects between different modalities, as we do in this paper, can reveal such important co-occurrence information.

In this paper, motivated by the work in [27], we propose a new manifold regularized multi-task learning model, which considers not only the relation of subjects within each single modality but also the relation of subjects between modalities. We extend the model in [27] by replacing the manifold regularizer with a novel one which defines the similarity (or the relation) of subjects using the Gaussian radial basis function; specifically, the similarity of subjects between different modalities is calculated by propagating the similarity information of subjects within each individual modality based on a weighted graph diffusion process. Motivation for this idea is derived from multi-view spectral clustering studied in [30], [31], and we introduce it in detail in the next section. From the machine learning point of view, this well-designed manifold regularizer can extract more discriminative features and thereby improve the performance of subsequent prediction. To validate the efficiency and effectiveness of our proposed model, we perform extensive experiments on the publicly available Philadelphia Neurodevelopmental Cohort (PNC) dataset [32], [33]. Here we predict the continuous-valued intelligence quotient (IQ) scores of subjects by using fMRI data in two task conditions (working memory and emotion), with the goal of investigating which common FCs from the two functional imaging modalities (here our modalities refer to fMRI data collected under two paradigms) contribute most to individual variations in IQ. To be specific, we first construct two FCNs for each subject from the two corresponding task fMRI datasets. We then regard these FCs as features extracted from the fMRI data and input them into our proposed model for subsequent analysis. It is shown that our proposed model yields improved performance in comparison to the competing models under the metrics of root mean square error and correlation coefficient.

The main contributions of this paper are twofold. First, we propose a new manifold regularized multi-task learning model that has two apparent advantages: 1) it incorporates complementary information from multiple modalities by jointly learning a small number of common features; and 2) it employs a novel manifold regularizer to preserve the structural information of the data both within and between modalities. Second, we apply the proposed model to the real PNC dataset to identify relevant FC biomarkers for IQ prediction using two sets of task fMRI data, and the experimental results show that the proposed model can not only outperform the existing state-of-the-art models, but also discover IQ-relevant predictors that are in accordance with prior studies.

The remainder of this paper is organized as follows. Section II describes the existing multi-task based learning models and our proposed new model, respectively. Section III presents the experimental results on the PNC data and some discussions. Finally, we conclude this paper in Section IV.

Notations:

Throughout this paper, uppercase boldface, lowercase boldface, and normal italic letters are used to denote matrices, vectors, and scalars, respectively. The superscript $T$ denotes the transpose of a vector or a matrix. For a matrix $A$, we denote its $i$-th row, $j$-th column, $(i,j)$-th entry, and trace as $A_i$, $A^j$, $A_{i,j}$, and $\mathrm{tr}(A)$, respectively. For a vector $a$, its $i$-th entry is denoted as $a(i)$. We further denote the Frobenius norm and $\ell_{2,1}$-norm of a matrix $A$ as $\|A\|_F = \sqrt{\sum_{i,j} A_{i,j}^2}$ and $\|A\|_{2,1} = \sum_i \|A_i\|_2 = \sum_i \sqrt{\sum_j A_{i,j}^2}$, respectively. Let $\mathbb{R}$ denote the set of real numbers.

II. Methods

Multi-task learning (MTL) aims to improve the performance of multiple tasks by exploiting their relationships, particularly when these tasks have some relatedness or commonality [34], [35]. In [27], a manifold regularized multi-task learning model was recently proposed for jointly selecting a small number of common features from multiple modalities, where each modality was viewed as one task, and it achieved superior performance in AD classification. Importantly, compared with the classical multi-task learning model, this model considered the structure information of the data within each single modality by adding a manifold regularizer. Motivated by the approach in [27], in this paper we propose a new manifold regularized multi-task learning model, which includes our newly designed manifold regularizer that considers the structure information of the data both within each single modality and between modalities. In this section, we first briefly introduce the existing multi-task based learning models, and subsequently present our proposed model as well as the optimization algorithm.

A. Classical Multi-Task Learning (MTL)

Assume that there are $M$ different modalities (i.e., tasks). We denote the $m$-th modality as $X^{(m)} = [x_1^{(m)}, x_2^{(m)}, \ldots, x_N^{(m)}]^T \in \mathbb{R}^{N \times d}$ for $m = 1, 2, \ldots, M$, where $x_i^{(m)} \in \mathbb{R}^d$ represents the feature vector of the $i$-th subject in the $m$-th modality, and $d$ and $N$ respectively stand for the numbers of features and subjects. Let $y \in \mathbb{R}^N$ be the response vector from these subjects, and $w^{(m)} \in \mathbb{R}^d$ be the regression coefficient vector for the $m$-th modality. Then, the MTL model solves the following optimization problem:

$\min_{W} \; \frac{1}{2}\sum_{m=1}^{M} \|y - X^{(m)} w^{(m)}\|_2^2 + \beta \|W\|_{2,1}, \qquad (1)$

where $W = [w^{(1)}, w^{(2)}, \ldots, w^{(M)}] \in \mathbb{R}^{d \times M}$ denotes the regression coefficient matrix and $\beta$ is a regularization parameter that balances the tradeoff between the residual error and sparsity. The $\ell_{2,1}$-norm encourages the multiple predictors from different modalities to share similar parameter sparsity patterns, through which the MTL model can achieve better performance for the modality-specific models than training them separately. It is readily seen that (1) reduces to the least absolute shrinkage and selection operator (LASSO) problem [36] when the number of modalities equals one.
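For concreteness, the objective in (1) can be evaluated in a few lines of NumPy. The following is a minimal sketch; the function names `l21_norm` and `mtl_objective` are ours for illustration and are not part of any released code:

```python
import numpy as np

def l21_norm(W):
    """l2,1-norm of W: the sum of the l2-norms of its rows."""
    return np.sum(np.linalg.norm(W, axis=1))

def mtl_objective(X_list, y, W, beta):
    """Value of the MTL objective in (1).

    X_list : list of M modality matrices X^(m), each of shape (N, d).
    y      : response vector of shape (N,).
    W      : coefficient matrix of shape (d, M); column m is w^(m).
    """
    residual = sum(0.5 * np.linalg.norm(y - X_list[m] @ W[:, m])**2
                   for m in range(len(X_list)))
    return residual + beta * l21_norm(W)
```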

B. Manifold Regularized Multi-Task Learning (M2TL)

In the classical MTL model above, only the relation between data and the response values is considered, while ignoring the structure information of data, which most likely leads to large deviations. With the expectation that similar subjects should have similar response values, a manifold regularizer that takes into account the subject-subject relation within each single modality is therefore introduced as follows:

$\frac{1}{2}\sum_{i,j}^{N} S_{i,j}^{(m)} \big(\hat{y}^{(m)}(i) - \hat{y}^{(m)}(j)\big)^2, \qquad (2)$

where $\hat{y}^{(m)} = X^{(m)} w^{(m)} = [\hat{y}^{(m)}(1), \hat{y}^{(m)}(2), \ldots, \hat{y}^{(m)}(N)]^T \in \mathbb{R}^N$ is the estimated response vector and $S^{(m)} = [S_{i,j}^{(m)}] \in \mathbb{R}^{N \times N}$ is the similarity matrix that defines the similarity for each pair of subjects in the $m$-th modality. As for the similarity matrix $S^{(m)}$, we construct an adjacency graph by regarding each subject as a node and using the $K$-nearest neighbor rule along with the Gaussian radial basis function to calculate the edge weights as the similarities. If $x_i^{(m)}$ is among the $K$ nearest neighbors of $x_j^{(m)}$ or $x_j^{(m)}$ is among the $K$ nearest neighbors of $x_i^{(m)}$, their similarity $S_{i,j}^{(m)}$ is defined as

$S_{i,j}^{(m)} = \exp\!\left(-\frac{\|x_i^{(m)} - x_j^{(m)}\|_2^2}{\sigma^{(m)}}\right), \qquad (3)$

where $\sigma^{(m)}$ is a free parameter fixed empirically as the mean of $\{\|x_i^{(m)} - x_j^{(m)}\|_2^2\}_{i \neq j}$; otherwise, $S_{i,j}^{(m)}$ is set to zero, i.e., $S_{i,j}^{(m)} = 0$. Let $L^{(m)} = D^{(m)} - S^{(m)}$ be the Laplacian matrix of the graph, where $D^{(m)}$ is a diagonal matrix with diagonal elements $D_{i,i}^{(m)} = \sum_{j=1}^{N} S_{i,j}^{(m)}$ for $1 \le i \le N$. Then, (2) can be simplified as

$\frac{1}{2}\sum_{i,j}^{N} S_{i,j}^{(m)} \big(\hat{y}^{(m)}(i) - \hat{y}^{(m)}(j)\big)^2 = (\hat{y}^{(m)})^T L^{(m)} \hat{y}^{(m)} = (X^{(m)} w^{(m)})^T L^{(m)} (X^{(m)} w^{(m)}). \qquad (4)$

Based on (4), the M2TL model was developed and successfully applied to AD classification in [27], [29]:

$\min_{W} \; \frac{1}{2}\sum_{m=1}^{M} \|y - X^{(m)} w^{(m)}\|_2^2 + \beta \|W\|_{2,1} + \gamma \sum_{m=1}^{M} (X^{(m)} w^{(m)})^T L^{(m)} (X^{(m)} w^{(m)}), \qquad (5)$

where β and γ are two regularization parameters.
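The graph construction in (3) and the Laplacian used in (4)-(5) can be sketched as follows. This is an illustrative NumPy/SciPy implementation under our reading of the K-nearest-neighbor rule; zeroing the self-similarities is our own implementation choice, not something the paper specifies:

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

def modality_similarity(X, K=10):
    """Within-modality similarity matrix S^(m) following (3).

    X : (N, d) feature matrix of one modality.
    Edges are kept only between K-nearest-neighbor pairs (symmetrized), and
    sigma^(m) is set to the mean of the squared pairwise distances.
    """
    d2 = squareform(pdist(X, metric="sqeuclidean"))      # ||x_i - x_j||_2^2
    sigma = d2[np.triu_indices_from(d2, k=1)].mean()
    S = np.exp(-d2 / sigma)
    order = np.argsort(d2, axis=1)                        # column 0 is the point itself
    mask = np.zeros_like(S, dtype=bool)
    rows = np.arange(X.shape[0])[:, None]
    mask[rows, order[:, 1:K + 1]] = True                  # K nearest neighbors per subject
    mask |= mask.T                                        # i near j OR j near i
    S = np.where(mask, S, 0.0)
    np.fill_diagonal(S, 0.0)                              # our choice: no self-edges
    return S

def graph_laplacian(S):
    """Unnormalized graph Laplacian L^(m) = D^(m) - S^(m) used in (4)."""
    return np.diag(S.sum(axis=1)) - S
```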

C. Proposed New M2TL (NM2TL)

Compared with the MTL model, one appealing property of the M2TL model is that the introduced manifold regularizer $\gamma \sum_{m=1}^{M} (X^{(m)} w^{(m)})^T L^{(m)} (X^{(m)} w^{(m)})$ in (5) can preserve the structure information of the data. However, it only considers the relation of subjects within each single modality separately, while the important mutual relation of subjects between modalities is ignored. Motivated by this, in this subsection we propose a new M2TL (NM2TL) model that effectively considers both the relation of subjects within the same modality and that between modalities.

We first design the following novel manifold regularizer

$\mathcal{R}(W, \gamma, \lambda) = \frac{1}{2}\sum_{p,q}^{M} \sum_{i,j}^{N} \theta_{p,q}\, S_{i,j}^{(p,q)} \big(\hat{y}^{(p)}(i) - \hat{y}^{(q)}(j)\big)^2, \qquad (6)$

where $\theta_{p,q}$ is a constant such that $\theta_{p,q} = \gamma$ when $p = q$, and $\theta_{p,q} = \lambda$ when $p \neq q$. Similarly, $S^{(p,q)} = [S_{i,j}^{(p,q)}] \in \mathbb{R}^{N \times N}$ is the similarity matrix for each pair of subjects between the $p$-th and $q$-th modalities, i.e., $S_{i,j}^{(p,q)}$ denotes the similarity of the $i$-th subject in the $p$-th modality and the $j$-th subject in the $q$-th modality. Note that $\mathcal{R}(W, \gamma, \lambda)$ in (6) is composed of two parts: the first part $\gamma \cdot \frac{1}{2}\sum_{p}^{M}\sum_{i,j}^{N} S_{i,j}^{(p,p)} \big(\hat{y}^{(p)}(i) - \hat{y}^{(p)}(j)\big)^2$ preserves the relation of subjects within each single modality, and the second part $\lambda \cdot \frac{1}{2}\sum_{p \neq q}^{M}\sum_{i,j}^{N} S_{i,j}^{(p,q)} \big(\hat{y}^{(p)}(i) - \hat{y}^{(q)}(j)\big)^2$ preserves the relation of subjects between modalities. The two free parameters $\gamma$ and $\lambda$ respectively control the effects of the two corresponding parts. In Fig. 1, the difference between the manifold regularizers in the M2TL and NM2TL models can readily be recognized.

Fig. 1.

The illustration of the relation of data among modalities when M = 2 and N = 3. Circles and rectangles respectively represent the subjects in two modalities. Blue connections denote the relation of subjects within each single modality, and orange connections denote the relation of subjects between modalities. (a) and (b) characterize the manifold regularizers in the M2TL model and our proposed NM2TL model, respectively. The M2TL model overlooks the inter-modal connections.

A natural question is how to define the similarity of subjects between two modalities (or nodes from two graphs). We expect that if $x_i^{(q)}$ and $x_j^{(q)}$ (i.e., two subjects in the same modality) are similar, then the co-occurring subject $x_i^{(p)}$ corresponding to $x_i^{(q)}$ should also be similar to $x_j^{(q)}$. As presented in [30], [31], the similarity of $x_i^{(p)}$ and $x_j^{(q)}$ is calculated in a smooth way by summing over all $N$ co-occurrences, $x_k^{(p)}$ and $x_k^{(q)}$ for $1 \le k \le N$, i.e.,

$S_{i,j}^{(p,q)} = \sum_{k=1}^{N} S_{i,k}^{(p)} S_{k,j}^{(q)}, \qquad (7)$

or in matrix form

$S^{(p,q)} = S^{(p)} S^{(q)}, \qquad (8)$

where $S^{(p)}$ and $S^{(q)}$ are the similarity matrices for the $p$-th and $q$-th modalities, respectively, each calculated by (3). We then put these matrices into a large $MN \times MN$ matrix of the following block-wise form:

$S = \begin{bmatrix} \gamma S^{(1,1)} & \lambda S^{(1,2)} & \cdots & \lambda S^{(1,M)} \\ \lambda S^{(2,1)} & \gamma S^{(2,2)} & \cdots & \lambda S^{(2,M)} \\ \vdots & \vdots & \ddots & \vdots \\ \lambda S^{(M,1)} & \lambda S^{(M,2)} & \cdots & \gamma S^{(M,M)} \end{bmatrix}, \qquad (9)$

such that along the diagonal, $\gamma$ is used to tune the within-modality similarity, and off the diagonal, $\lambda$ is used to tune the between-modality similarity. It is obvious that $S$ is still symmetric. Accordingly, by calculating the diagonal matrix $D$ whose diagonal elements are $D_{i,i} = \sum_{j} S_{i,j}$ for $1 \le i \le MN$, we get

$L = D - S. \qquad (10)$

Therefore, it is not hard to verify that (6) can be equivalently expressed as

$\mathcal{R}(W, \gamma, \lambda) = \begin{bmatrix} \hat{y}^{(1)} \\ \hat{y}^{(2)} \\ \vdots \\ \hat{y}^{(M)} \end{bmatrix}^T L \begin{bmatrix} \hat{y}^{(1)} \\ \hat{y}^{(2)} \\ \vdots \\ \hat{y}^{(M)} \end{bmatrix} = \begin{bmatrix} X^{(1)} w^{(1)} \\ X^{(2)} w^{(2)} \\ \vdots \\ X^{(M)} w^{(M)} \end{bmatrix}^T L \begin{bmatrix} X^{(1)} w^{(1)} \\ X^{(2)} w^{(2)} \\ \vdots \\ X^{(M)} w^{(M)} \end{bmatrix}. \qquad (11)$

Based on the new manifold regularizer in (11), the NM2TL model is proposed as follows:

$\min_{W} \; \frac{1}{2}\sum_{m=1}^{M} \|y - X^{(m)} w^{(m)}\|_2^2 + \beta \|W\|_{2,1} + \mathcal{R}(W, \gamma, \lambda), \qquad (12)$

where $\beta$, $\gamma$, and $\lambda$ denote the control parameters of the respective regularizers. In our NM2TL model (12), the $\ell_{2,1}$-norm regularizer ensures that a sparse set of common features is jointly learned from multiple modalities, and the manifold regularizer attempts to preserve the structural information of the data both within each single modality and between modalities. Thus, it may extract more discriminative features.
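A small sketch of how the block matrix $S$ in (9) and the joint Laplacian $L$ in (10) could be assembled is given below. It follows our literal reading of (7)-(9), in which every block (including the diagonal ones) is the diffusion product of two within-modality similarity matrices; the function name `joint_laplacian` is ours for illustration:

```python
import numpy as np

def joint_laplacian(S_list, gamma, lam):
    """Block similarity matrix S in (9) and joint Laplacian L = D - S in (10).

    S_list : list of M within-modality similarity matrices S^(m), each (N, N),
             e.g., built with the K-NN/Gaussian rule of (3).
    Every (p, q) block is the diffusion product S^(p) @ S^(q) from (7)-(8),
    scaled by gamma on the diagonal (p = q) and by lam off the diagonal.
    """
    M, N = len(S_list), S_list[0].shape[0]
    S = np.zeros((M * N, M * N))
    for p in range(M):
        for q in range(M):
            weight = gamma if p == q else lam
            S[p*N:(p+1)*N, q*N:(q+1)*N] = weight * (S_list[p] @ S_list[q])
    D = np.diag(S.sum(axis=1))                 # D_{i,i} = sum_j S_{i,j}
    return D - S
```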

Remark 1:

More recently, a similar model has been developed in [37] for identifying the associations between genetic risk factors and multiple neuroimaging modalities under the guidance of the a priori diagnosis information (i.e., AD status). Specifically, a diagnosis-aligned regularizer was introduced to fully explore the relation of subjects with the class level diagnosis information in multi-modal imaging such that subjects from the same class will be close to each other after being mapped into the label space, i.e.,

$\mathcal{R}(W) = \sum_{p \neq q}^{M} \sum_{i,j}^{N} S_{i,j}^{(p,q)} \big(\hat{y}^{(p)}(i) - \hat{y}^{(q)}(j)\big)^2, \qquad (13)$

where the similarity $S_{i,j}^{(p,q)}$ is defined as

$S_{i,j}^{(p,q)} = \begin{cases} 1, & \text{if the } i\text{-th and } j\text{-th subjects are from the same class} \\ 0, & \text{otherwise.} \end{cases} \qquad (14)$

In this way, one can identify a set of common features that are associated with both genetic risk factors and disease status in order to better understand the biological pathway specific to AD. Our manifold regularizer in the NM2TL model can be clearly distinguished from the above diagnosis-aligned regularizer in a number of aspects: 1) Our proposed manifold regularizer aims to preserve the geometric structure across modalities such that if the distance between subjects is small, their mapped response values in the label space will also be close. In contrast, the diagnosis-aligned regularizer (13) aims to preserve the class-level diagnosis information. 2) Different from the binary similarity measure used as a quantitative description of class-level diagnosis in (13), we calculate the similarity of subjects using the Gaussian radial basis function in our proposed manifold regularizer, and in particular the similarity of subjects between different modalities is obtained by propagating the similarity information of subjects within each individual modality based on a weighted graph diffusion process. This similarity measure has been proven to be effective in preserving the structural information of the original data [27], [30], [31]. 3) We use two different parameters $\gamma$ and $\lambda$ in our proposed manifold regularizer to balance the relative contributions of the structural information of the data within a single modality and between modalities, resulting in a better fit to realistic data, while only one parameter is used to control the effect of the diagnosis-aligned regularizer (13) in [37].

D. Optimization Algorithm

Clearly, the objective function in (12) is convex but non-differentiable with respect to $W \in \mathbb{R}^{d \times M}$. We can write it as the sum of two functions:

$f(W) = \frac{1}{2}\sum_{m=1}^{M} \|y - X^{(m)} w^{(m)}\|_2^2 + \mathcal{R}(W, \gamma, \lambda), \qquad (15)$
$g(W) = \beta \|W\|_{2,1}, \qquad (16)$

where f(W) is convex and differentiable, while g(W) is convex but non-differentiable. In this scenario, we optimize W in (12) by the commonly used accelerated proximal gradient method [37]–[40].

We iteratively update W with the following procedure:

$W(t+1) = \arg\min_{W} \; \Omega_l\big(W, W(t)\big), \qquad (17)$

where

$\Omega_l\big(W, W(t)\big) = f\big(W(t)\big) + \big\langle W - W(t), \nabla f(W(t)) \big\rangle_F + \frac{1}{2l}\|W - W(t)\|_F^2 + g(W), \qquad (18)$

$W(t)$ stands for the value of $W$ obtained at the $t$-th iteration, $\langle W - W(t), \nabla f(W(t)) \rangle_F = \mathrm{tr}\big((W - W(t))^T \nabla f(W(t))\big)$ denotes the Frobenius inner product of two matrices, $\nabla f(W(t)) = [\nabla f(w^{(1)}(t)), \nabla f(w^{(2)}(t)), \ldots, \nabla f(w^{(M)}(t))] \in \mathbb{R}^{d \times M}$ is the gradient of $f(W)$ at point $W(t)$, and $l$ is a step size. As a result of simple calculation, we get

$\nabla f\big(w^{(m)}(t)\big) = (X^{(m)})^T \big(X^{(m)} w^{(m)}(t) - y\big) + 2\sum_{k=1}^{M} (X^{(m)})^T L_{m,k}\, X^{(k)} w^{(k)}(t), \qquad (19)$

where $L_{m,k} \in \mathbb{R}^{N \times N}$ denotes the $(m,k)$-th block of $L$ in (10), i.e., $L_{m,k} = [L_{i,j}]_{1+(m-1)N \le i \le mN,\; 1+(k-1)N \le j \le kN}$.

Since $\|W - W(t) + l\nabla f(W(t))\|_F^2 = \|W - W(t)\|_F^2 + 2l\,\langle W - W(t), \nabla f(W(t))\rangle_F + l^2\|\nabla f(W(t))\|_F^2$, we can rewrite $\Omega_l(W, W(t))$ in (18) as $\Omega_l(W, W(t)) = f(W(t)) - \frac{l}{2}\|\nabla f(W(t))\|_F^2 + \frac{1}{2l}\|W - W(t) + l\nabla f(W(t))\|_F^2 + g(W)$. Then, by ignoring the terms (i.e., $f(W(t)) - \frac{l}{2}\|\nabla f(W(t))\|_F^2$) independent of $W$ in (17), the update procedure can be written as

$W(t+1) = \arg\min_{W} \; \frac{1}{2}\|W - V(t)\|_F^2 + l\, g(W), \qquad (20)$

where $V(t) = W(t) - l\nabla f(W(t))$. In fact, (20) can be equivalently expressed as

$W(t+1) = \mathrm{prox}_{lg}\big(V(t)\big), \qquad (21)$

where $\mathrm{prox}_{lg}$ denotes the proximal operator [39] of the scaled function $lg$; i.e., given a function $u : \chi \to \mathbb{R}$, the proximal operator of $u$ is defined by $\mathrm{prox}_u(x) = \arg\min_{z \in \chi} \frac{1}{2}\|z - x\|^2 + u(z)$ for any $x \in \chi$. Due to the separability of $W(t+1)$ over its rows, i.e., $W_i(t+1)$, in (20), we can solve the optimization problem for each row individually:

$W_i(t+1) = \arg\min_{W_i} \; \frac{1}{2}\|W_i - V_i(t)\|_2^2 + l\beta\|W_i\|_2. \qquad (22)$

In (22), the closed-form solution of $W_i(t+1)$ can be easily obtained [39]:

$W_i(t+1) = \begin{cases} \left(1 - \dfrac{l\beta}{\|V_i(t)\|_2}\right) V_i(t), & \text{if } \|V_i(t)\|_2 > l\beta \\ 0, & \text{otherwise.} \end{cases} \qquad (23)$
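The row-wise shrinkage in (23) is the proximal operator of the (scaled) $\ell_{2,1}$-norm. A minimal NumPy sketch is given below; the small constant guarding against division by zero is our own implementation detail:

```python
import numpy as np

def prox_l21(V, tau):
    """Proximal operator of tau * ||.||_{2,1}, i.e., the row-wise shrinkage (23).

    Rows of V with l2-norm at most tau are set to zero; the remaining rows
    are shrunk toward zero by tau (here tau = l * beta).
    """
    norms = np.linalg.norm(V, axis=1, keepdims=True)
    scale = np.maximum(0.0, 1.0 - tau / np.maximum(norms, 1e-12))
    return scale * V
```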

Furthermore, in order to accelerate the proximal gradient method, we introduce an auxiliary variable

$Q(t) = W(t) + \frac{\alpha(t-1) - 1}{\alpha(t)} \big(W(t) - W(t-1)\big), \qquad (24)$

and compute the gradient descent based on Q(t) instead of W(t), where the coefficient α(t) is set as

$\alpha(t) = \frac{1 + \sqrt{1 + 4\,\alpha(t-1)^2}}{2}. \qquad (25)$

The pseudocode of the proposed optimization algorithm is summarized in Algorithm 1. In the next section, we apply our model to IQ prediction using fMRI data collected under two paradigms in a real-world dataset. The flowchart of our proposed framework is outlined in Fig. 2.

Fig. 2.

The flowchart of the proposed framework in this study.

Algorithm 1:

Input: the data {X^(m)}_{m=1}^M and the response vector y;
Output: W;
1: Initialization: t = 1, α(0) = 1, l_0 = 1, σ = 0.5, W(0) = W(1) = 0, β, γ, λ;
2: for t = 1 to Max-Iteration do
3:   Compute Q(t) by (24);
4:   l = l_{t−1};
5:   while f(W(t+1)) + g(W(t+1)) > Ω_l(W(t+1), Q(t)), where W(t+1) is computed by (20) with the gradient evaluated at Q(t), do
6:     l = σ·l;
7:   end while
8:   l_t = l;
9:   if convergence then
10:    W = W(t+1); terminate;
11:  end if
12: end for
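Putting (19), (20), and (23)-(25) together, Algorithm 1 can be sketched in NumPy as below. This is our illustrative reconstruction, not the authors' released code: the joint Laplacian L is assumed to already absorb γ and λ as in (9)-(10), the function and variable names are ours, and the stopping test simply monitors the change in W:

```python
import numpy as np

def nm2tl_fit(X_list, y, L, beta, max_iter=200, tol=1e-6):
    """Accelerated proximal gradient sketch of Algorithm 1.

    X_list : list of M modality matrices X^(m), each of shape (N, d).
    y      : response vector of shape (N,).
    L      : joint MN x MN Laplacian from (10), with gamma/lambda absorbed via (9).
    beta   : group-sparsity regularization level.
    """
    M = len(X_list)
    N, d = X_list[0].shape
    # (p, q)-th N x N block of L, as used in the gradient (19)
    Lb = [[L[p*N:(p+1)*N, q*N:(q+1)*N] for q in range(M)] for p in range(M)]

    def smooth(W):
        # f(W) in (15): squared loss plus the manifold regularizer (11)
        val = sum(0.5 * np.linalg.norm(y - X_list[m] @ W[:, m])**2 for m in range(M))
        yh = [X_list[m] @ W[:, m] for m in range(M)]
        val += sum(yh[p] @ (Lb[p][q] @ yh[q]) for p in range(M) for q in range(M))
        return val

    def grad(W):
        # gradient (19), assembled column by column
        G = np.zeros((d, M))
        for m in range(M):
            G[:, m] = X_list[m].T @ (X_list[m] @ W[:, m] - y)
            G[:, m] += 2 * sum(X_list[m].T @ (Lb[m][k] @ (X_list[k] @ W[:, k]))
                               for k in range(M))
        return G

    def prox(V, tau):
        # row-wise shrinkage (23), the proximal operator of tau * ||.||_{2,1}
        nrm = np.maximum(np.linalg.norm(V, axis=1, keepdims=True), 1e-12)
        return np.maximum(0.0, 1.0 - tau / nrm) * V

    W = W_prev = np.zeros((d, M))
    alpha_prev = alpha = 1.0
    step = 1.0
    for _ in range(max_iter):
        Q = W + (alpha_prev - 1.0) / alpha * (W - W_prev)       # momentum step (24)
        G, fQ = grad(Q), smooth(Q)
        while True:                                             # backtracking, sigma = 0.5
            W_new = prox(Q - step * G, step * beta)
            Dm = W_new - Q
            if smooth(W_new) <= fQ + np.sum(Dm * G) + np.sum(Dm * Dm) / (2.0 * step):
                break
            step *= 0.5
        if np.linalg.norm(W_new - W) < tol:                     # simple convergence test
            return W_new
        W_prev, W = W, W_new
        alpha_prev, alpha = alpha, (1.0 + np.sqrt(1.0 + 4.0 * alpha**2)) / 2.0  # (25)
    return W
```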

III. Experimental Results

A. Data Preprocessing

In this study, we used the Philadelphia Neurodevelopmental Cohort (PNC) dataset [32], [33] for performance evaluation. The PNC is a large-scale collaborative research project between the Brain Behavior Laboratory at the University of Pennsylvania and the Center for Applied Genomics at the Children's Hospital of Philadelphia. The primary objective of the PNC project was to characterize brain and behavior interaction with genetics by combining neuroimaging, diverse clinical and cognitive phenotypes, and genomics. In this project, nearly 900 adolescents aged 8–22 years underwent multimodal neuroimaging, including resting-state fMRI and fMRI during working memory and emotion identification tasks (called nback fMRI and emotion fMRI, respectively). All data acquired as part of the PNC can be freely downloaded from the public dbGaP site (www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs000607.v1.p1).

The nback fMRI and emotion fMRI tasks are well known to be associated with differences in intelligence [16]. The nback fMRI assesses working memory, which is important for reasoning and other complex cognitive tasks related to IQ [41]. The emotion fMRI involves both perceptual processing and recognition of the emotional meaning of a facial stimulus, which combines visual sensory input with retrievable memory and thus could be related to IQ as well [42], [43]. In this work we investigated the relationship between individual differences in IQ and brain activity during the engagement of two cognitive abilities, i.e., working memory and emotion identification. The IQ scores of subjects were assessed with the Wide Range Achievement Test (WRAT) from a 1-hour computerized neurocognitive battery (CNB) administered by the PNC. The WRAT is a standardized achievement test that measures an individual's learning ability, e.g., reading recognition, spelling, and math computation [44], and hence provides a reliable estimate of IQ. To mitigate the influence of age on the final results, we excluded subjects whose ages were below 16 years [45]. As a consequence, we were left with 355 subjects (age: 16–22 and mean = 18.21 years; WRAT score: 70–145 and mean = 100.57; female/male: 204/151) with both nback fMRI and emotion fMRI available. The distribution of IQ scores of these subjects is shown in Fig. 3.

Fig. 3.

The IQ score distribution among the 355 subjects.

All MRI scans were performed on a single 3T Siemens TIM Trio whole-body scanner. In the fractal n-back task to probe working memory, subjects were required to respond to a presented fractal only when it was the same as the one presented on a previous trial. In the emotion identification task, subjects were asked to identify 60 faces displaying neutral, happy, sad, angry, or fearful expressions. All image data were acquired with a single-shot, interleaved multi-slice, gradient-echo, echo planar imaging sequence. We preprocessed the nback fMRI and emotion fMRI data of the selected 355 subjects. The preprocessing procedures were similar to those used in [45]–[48]. Specifically, standard preprocessing steps were applied using SPM12 (www.fil.ion.ucl.ac.uk/spm/), which primarily consisted of motion correction, co-registration, spatial normalization to standard MNI space, and spatial smoothing with a 3 mm FWHM Gaussian kernel. The functional time courses were subsequently band-pass filtered at 0.01–0.1 Hz. We utilized a 264-region parcellation [49] to investigate whole-brain connectivity. These 264 ROIs spanned the cerebral cortex, subcortical structures, and the cerebellum. The BOLD data were averaged within 10 mm diameter spheres surrounding each of the 264 ROI coordinates. We then calculated the Pearson correlation between the time courses of each pair of ROIs, resulting in a 264 × 264 correlation matrix (FC matrix) for each subject in each single fMRI modality (here we regarded our modalities as fMRI data collected under the two paradigms). To avoid redundant information, only the lower triangular portion of the symmetric correlation matrix was reshaped into a vector of 34716 correlation values. Fisher's z-transform was applied to these correlations to ensure normality. The 34716 FCs (Fisher's z-transformed values) were the features used in all subsequent analysis. As a result, we extracted 34716 features from nback fMRI and 34716 features from emotion fMRI for each subject.
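As an illustration of this feature-extraction step, the following sketch computes the 34716 Fisher z-transformed FC features for one subject and one paradigm; the clipping constant that guards against unit correlations is our own implementation detail:

```python
import numpy as np

def fc_features(time_series):
    """Vectorized FC features for one subject and one fMRI paradigm.

    time_series : (T, R) array of preprocessed BOLD time courses averaged
                  within R ROIs (R = 264 here).
    Returns the Fisher z-transformed lower-triangular Pearson correlations,
    a vector of length R*(R-1)/2 (34716 when R = 264).
    """
    corr = np.corrcoef(time_series.T)                 # R x R Pearson FC matrix
    fc = corr[np.tril_indices_from(corr, k=-1)]       # strict lower triangle
    fc = np.clip(fc, -0.999999, 0.999999)             # guard against |r| = 1
    return np.arctanh(fc)                             # Fisher z-transform
```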

B. Experimental Settings

In our experiments, we compared the performance of the proposed NM2TL model with three other competing models: (1) SM (a single-modality model with LASSO [36], used to detect a significant subset of FCs from nback or emotion fMRI alone); (2) MTL [26]; and (3) M2TL [27]. We used a 5-fold cross-validation (CV) technique to evaluate the IQ prediction performance of all these predictive models. That is, the whole set of subjects was first randomly partitioned into 5 disjoint subsets of approximately equal size; then each subset was successively selected as the test set and the other 4 subsets were used for training the predictive model; and finally the trained model was applied to predict IQ scores of the subjects in the test set. This process was repeated 10 times to reduce the effect of sampling bias in the CV. All regularization parameters in the models, including the group-sparsity level β and the manifold regularization parameters γ and λ, were tuned by a 5-fold inner CV on the training set through a grid search within their respective ranges, i.e., β, γ, λ ∈ {10^{-3}, 3 × 10^{-3}, 10^{-2}, 3 × 10^{-2}, 10^{-1}, 0.3, 1, 3, 10, 30}. The parameter K in the K-nearest neighbor rule for the graph similarity matrix calculation was empirically set to 10, following the empirical analyses in [50], [51] on the choice of the number of nearest neighbors. Moreover, we searched K over {10, 20, 30, 40} and found that 10 worked best in our experiments.

One of the challenges encountered when using these predictive models is that whole-brain FC data consist of a large number of features (i.e., FCs) and a relatively small number of samples (i.e., subjects). This gives rise to various issues, such as proneness to overfitting, difficult interpretability, and computational burden. To this end, we used a simple univariate feature filtering technique to reduce the number of features before feeding them into the predictive models. Specifically, we discarded features for which the p-values of the correlation with IQ scores of subjects in the nback and emotion fMRI training sets were both greater than or equal to 0.05, and then trained the predictive models. All the remaining features of training subjects were normalized to have zero mean and unit norm, and the estimated mean and norm values of training subjects were used to normalize the corresponding features of testing subjects. Accordingly, we also mean-centered the IQ scores of training subjects and then used the mean IQ value of training subjects to normalize the IQ scores of testing subjects. The model performance on each modality was quantified as the root mean square error (RMSE) and the correlation coefficient (CC) between predicted and actual IQ scores of subjects in the test set.
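A possible realization of this filtering-and-normalization step is sketched below; the helper names are ours, and only the stated rule (discard a feature if its p-value is ≥ 0.05 in both modalities, then reuse training statistics on the test set) is taken from the text:

```python
import numpy as np
from scipy.stats import pearsonr

def filter_and_normalize(X_nb_tr, X_em_tr, y_tr, X_nb_te, X_em_te, p_thresh=0.05):
    """Univariate feature filtering and normalization, as described above.

    A feature is discarded only if its correlation with the training IQ scores
    has p >= p_thresh in BOTH modalities; training statistics are reused to
    normalize the corresponding test features.
    """
    def pvalues(X):
        return np.array([pearsonr(X[:, j], y_tr)[1] for j in range(X.shape[1])])

    keep = (pvalues(X_nb_tr) < p_thresh) | (pvalues(X_em_tr) < p_thresh)

    def normalize(X_tr, X_te):
        X_tr, X_te = X_tr[:, keep], X_te[:, keep]
        mean = X_tr.mean(axis=0)                                  # zero mean (training stats)
        X_tr, X_te = X_tr - mean, X_te - mean
        norm = np.maximum(np.linalg.norm(X_tr, axis=0), 1e-12)    # unit norm per feature
        return X_tr / norm, X_te / norm

    return normalize(X_nb_tr, X_nb_te), normalize(X_em_tr, X_em_te)
```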

C. Regression Results

Table I summarizes the regression performance of all competing models for IQ prediction. As we can see from Table I, the proposed NM2TL model consistently outperformed the other predictive models in terms of both the RMSE and the CC. Specifically, our proposed NM2TL model achieved the best CCs of 0.3472 for nback fMRI and 0.3443 for emotion fMRI, and the best RMSEs of 14.8251 for nback fMRI and 14.8658 for emotion fMRI. The next best performance was obtained by the M2TL model, i.e., 0.3348 for nback fMRI and 0.3308 for emotion fMRI in terms of the CC, and 14.9881 for nback fMRI and 15.0056 for emotion fMRI in terms of the RMSE. As shown in Table I, the MTL model, which utilized multi-task learning for a joint analysis of the two modalities (tasks), achieved mostly better regression performance than the single-task based model (i.e., the SM model). This suggests that it is beneficial to use multi-task learning for integrating complementary information from multiple modalities by jointly selecting a sparse set of common features. In addition, the manifold regularizers in the M2TL and NM2TL models, which exploit the structural information of the data, further improved the performance. Specifically, the proposed NM2TL model outperformed the MTL model, improving the performance by 0.0255 and 0.0221 in the CCs, and by 0.4812 and 0.3987 in the RMSEs, for nback fMRI and emotion fMRI, respectively. Meanwhile, in Table I we report the p-values of pairwise t-tests based on the results of the 5-fold CV to show the statistically significant improvement of our proposed model. In light of the fact that the best performance over the IQ regressions was always obtained by our proposed NM2TL model, we can conclude that the designed manifold regularizer in our proposed model was effective in identifying more discriminative features associated with IQ. Therefore, from the machine learning point of view, properly using different regularizers in the least-squares regression model is a valid way to circumvent the overfitting problem and find a compact solution, especially in high feature-dimension and low sample-size scenarios (e.g., in the field of neuroimaging analysis).

TABLE I.

The Comparison of Regression Performance of nback fMRI and Emotion fMRI by Different Predictive Models

Model   Modality   CC (mean ± std)     p-value   RMSE (mean ± std)     p-value
SM      nback      0.3181 ± 0.0187     <0.001    15.3882 ± 0.1546      <0.001
SM      emotion    0.3240 ± 0.0144     0.0033    15.3060 ± 0.1051      <0.001
MTL     nback      0.3217 ± 0.0183     0.0026    15.3063 ± 0.1507      <0.001
MTL     emotion    0.3222 ± 0.0175     0.0043    15.2645 ± 0.1669      <0.001
M2TL    nback      0.3348 ± 0.0118     0.0458    14.9881 ± 0.1238      0.0070
M2TL    emotion    0.3308 ± 0.0139     0.0337    15.0056 ± 0.1635      0.0353
NM2TL   nback      0.3472 ± 0.0141     –         14.8251 ± 0.0714      –
NM2TL   emotion    0.3443 ± 0.0125     –         14.8658 ± 0.1047      –
1 p-values were calculated by pairwise t-test comparisons between the regression accuracy of our NM2TL model and the other competing models for each modality.
2 std denotes the standard deviation.

We next investigated the parameters' sensitivity by varying the values of β, γ, and λ in (12). The results in Fig. 4 show that the three parameters interactively affected the final performance, and our model was sensitive to them only within a small range. To better understand the effect of these parameters, we also present the performance of the MTL model as a baseline that does not include any manifold regularization term. It is worth noting that when γ = λ = 0, our proposed NM2TL model degrades to the MTL model. As we can observe from Fig. 5, our proposed NM2TL model and the M2TL model both consistently outperformed the MTL model (baseline) for all values of β. This further demonstrates the advantage of adding the manifold regularization term on top of the classical MTL model. Moreover, Fig. 5 shows that for each selected value of γ and/or λ, the curve of performance with respect to β was very smooth as long as β ≤ 10^{-1}, which indicates that our proposed NM2TL model and the M2TL model were very robust to β when β lies in the range of small values.

Fig. 4.

The regression performance of the proposed NM2TL model under different parameter settings, i.e., β, γ, λ ∈ {10^{-3}, 3 × 10^{-3}, 10^{-2}, 3 × 10^{-2}, 10^{-1}, 0.3, 1, 3, 10, 30}. (a) nback fMRI (CCs). (b) nback fMRI (RMSEs). (c) emotion fMRI (CCs). (d) emotion fMRI (RMSEs).

Fig. 5.

The regression performance with respect to the values of β, i.e., β ∈ {10^{-3}, 3 × 10^{-3}, 10^{-2}, 3 × 10^{-2}, 10^{-1}, 0.3, 1, 3, 10, 30}, and the selection of γ and λ. (a) The performance of nback fMRI and emotion fMRI in terms of the CC. (b) The performance of nback fMRI and emotion fMRI in terms of the RMSE.

Because feature learning on each modality was viewed as one task in the above multi-task based learning models, we obtained an individual prediction result for each modality. To achieve even better performance, we simply combined the two predicted IQ score vectors, $\hat{y}_{\mathrm{nback}}$ and $\hat{y}_{\mathrm{emotion}}$, for nback fMRI and emotion fMRI as follows:

$\hat{y} = \alpha\, \hat{y}_{\mathrm{nback}} + (1 - \alpha)\, \hat{y}_{\mathrm{emotion}}, \qquad (26)$

where $\alpha$ is a non-negative parameter with $0 \le \alpha \le 1$. It is obvious that when $\alpha = 0$ and $\alpha = 1$, $\hat{y}$ reduces to the individual-modality result for emotion fMRI and nback fMRI, respectively. We then tested all other values of $\alpha$, ranging from 0.1 to 0.9 at a step size of 0.1. Fig. 6 presents the regression results of our proposed model, including the CC and RMSE, with respect to different values of $\alpha$. As we can see from Fig. 6, the regression performance of combining the two modalities as in (26) was better than that of the individual modalities.
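For reference, the fusion in (26) amounts to a one-line weighted average. A small sketch of the α sweep described above is given below; the variable names (`y_nback`, `y_emotion`, `y_true`) are illustrative placeholders for the modality-specific predictions and the actual test-set IQ scores:

```python
import numpy as np

def combine_predictions(y_nback, y_emotion, alpha):
    """Weighted fusion of the two modality-specific predictions, as in (26)."""
    return alpha * y_nback + (1.0 - alpha) * y_emotion

# Example sweep over alpha (y_true, y_nback, y_emotion would be test-set vectors):
# for alpha in np.arange(0.0, 1.01, 0.1):
#     cc = np.corrcoef(combine_predictions(y_nback, y_emotion, alpha), y_true)[0, 1]
#     print(f"alpha = {alpha:.1f}, CC = {cc:.4f}")
```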

Fig. 6.

The regression results of combining nback fMRI and emotion fMRI as in (26) with respect to different values of α.

D. Discussion and Future Work

Human intelligence can be broadly defined as the ability to comprehend and successfully respond to a wide variety of factors in the external environment [52]. Also, IQ scores can be related to performance on cognitive tasks. Therefore, it is reasonable to examine the relationship between individual variations in IQ and brain activity during the engagement of the two cognitive tasks (i.e., working memory and emotion identification) in this paper. In the following, based on our proposed NM2TL model, we investigate the potential of both brain FCs and ROIs as biomarkers that are highly related to IQ.

To reduce the risk of overfitting in the above 5-fold CV and to demonstrate the reproducibility of the identified biomarkers, we performed an additional experiment using 10-fold CV and then examined the overlap between the biomarkers identified using these two CV techniques. Specifically, we ran 10-fold CV in the same manner as the 5-fold CV, and the proposed model achieved CCs and RMSEs of 0.3512 ± 0.0221 and 14.7882 ± 0.1038 for nback fMRI and 0.3506 ± 0.0152 and 14.8011 ± 0.1054 for emotion fMRI. First, to identify the most discriminative FCs, we averaged the absolute values (hereafter referred to as the weights) of the regression coefficients W obtained over the 5-fold CV and 10-fold CV trials, respectively. These averaged weights were used to measure the relative importance of the corresponding FC features in predicting IQ. With respect to nback fMRI, for ease of visualization, we selected the 150 FCs with the largest averaged weights for each CV technique, and visualized their overlapping FCs using the BrainNet Viewer [53]. The same procedure was performed on emotion fMRI. We found that there were 99 overlapping FCs between the top 150 FCs selected by the two CV techniques for nback fMRI, and 108 overlapping FCs for emotion fMRI. We visualized these overlapping FCs separately in Fig. 7; they were mainly within or across the frontal, parietal, temporal, and occipital lobes, which is in accordance with previous studies in the literature. For instance, in [54], [55], temporal lobe dysfunction has been shown to be related to attention-deficit/hyperactivity disorder (ADHD), which is significantly correlated with IQ impairments. Several regions within the frontal, parietal, temporal, and occipital lobes have been identified as significant predictors of IQ in [16], [17], [56]. Furthermore, to extract the most discriminative ROIs, we computed the ROI weights by summing the weights across all FCs for each ROI. Similarly, for ease of visualization, we selected the 100 ROIs with the largest weights for each CV technique and for each of nback fMRI and emotion fMRI, respectively. We found that there were 85 and 86 overlapping ROIs between the top 100 ROIs selected by the two CV techniques, for nback fMRI and emotion fMRI, respectively, as shown separately in Fig. 8. The result demonstrates again that a large majority of these overlapping ROIs were located in the frontal, parietal, temporal, and occipital lobes.

Fig. 7.

The visualization of the overlapping FCs between the top 150 FCs selected separately by the 5-fold CV and 10-fold CV techniques, for (a) nback and (b) emotion modalities, respectively. On the left are brain plots of the functional graphs in anatomical space, where the selected FCs are represented as edges. The thicknesses of the edges reflect the weights of the corresponding FCs. The ROIs are color-coded according to the cortical lobes: frontal (FRO), parietal (PAR), temporal (TEM), occipital (OCC), limbic (LIM), cerebellum (CER), and sub-lobar (SUB). On the right are matrix plots that show the total number of overlapping edges connecting the ROIs across the cortical lobes.

Fig. 8.

The visualization of the overlapping ROIs between the top 100 ROIs selected separately by the 5-fold CV and 10-fold CV techniques, for (a) nback and (b) emotion modalities, respectively. On the left are brain plots of the functional graphs in anatomical space, where the selected ROIs are represented as nodes. The sizes of the nodes reflect the weights of the corresponding ROIs. The ROIs are color-coded according to the cortical lobes. On the right are bar plots that show the total number of overlapping ROIs in each cortical lobe.

In this paper, we focused on only two functional imaging modalities (here our modalities refer to different types of fMRI data collected under multiple paradigms), i.e., nback fMRI and emotion fMRI collected under two paradigms. The PNC dataset also includes resting-state fMRI. An interesting direction for future work is to incorporate all three modalities (i.e., three types of fMRI data from different paradigms) together by means of the proposed NM2TL model or its variants, which may extract more discriminative information across modalities and further improve the IQ regression performance [24]. Another important note is that the similarity measure of the data, whether within a single modality or between modalities, could largely affect the contribution of the manifold regularizer to the regression performance. Therefore, in order to reveal the intrinsic structural information inherent in multiple modalities, finding an effective and powerful strategy to learn the similarity of the data would be a high priority for improving our model.

IV. Conclusion

In this paper, based on the general linear regression model, we proposed a new manifold regularized multi-task learning model for the joint analysis of multiple datasets. Instead of including all high-dimensional features for prediction, our proposed model was devised to extract significant ones, resulting in better accuracy of subsequent prediction. In our proposed model, besides employing the group-sparsity regularizer to jointly select a small set of common features across multiple modalities (tasks), we designed a novel manifold regularizer to preserve the structural information both within and between modalities. Furthermore, we validated the effectiveness of our proposed model on the PNC dataset by using fMRI-based FC networks in two task conditions for IQ prediction. The experimental results demonstrated that our proposed model achieved superior performance in IQ prediction compared with other competing models. Moreover, we discovered IQ-relevant biomarkers, supported by previous reports, which may account for a proportion of the variance in human intelligence.

Acknowledgments

This work was supported in part by the National Institutes of Health under Grants R01GM109068, R01MH104680, R01MH107354, R01AR059781, R01EB006841, R01EB005846, R01MH103220, R01MH116782, and P20GM103472, and in part by the National Science Foundation under Grant 1539067.

Contributor Information

Li Xiao, Department of Biomedical Engineering, Tulane University.

Julia M. Stephen, Mind Research Network.

Tony W. Wilson, Department of Neurological Sciences, University of Nebraska Medical Center.

Vince D. Calhoun, Tri-institutional Center for Translational Research in Neuroimaging and Data Science (TReNDS), Georgia State University, Georgia Institute of Technology, Emory University, and also with the Department of Electrical and Computer Engineering, University of New Mexico.

Yu-Ping Wang, Department of Biomedical Engineering, Tulane University, New Orleans, LA 70118 USA.

References

  • [1].Sporns O, “The human connectome: A complex network,” Ann. NY Acad. Sci, vol. 1224, pp. 109–125, 2011. [DOI] [PubMed] [Google Scholar]
  • [2].Cao M et al. , “Topological organization of the human brain functional connectome across the lifespan,” Develop. Cogn. Neurosci, vol. 7, pp. 76–93, 2014. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [3].Zuo X-N et al. , “Network centrality in the human functional connectome,” Cerebral Cortex, vol. 22, pp. 1862–1875, 2012. [DOI] [PubMed] [Google Scholar]
  • [4].Allen EA et al. , “Tracking whole-brain connectivity dynamics in the resting state,” Cerebral Cortex, vol. 24, pp. 663–676, 2014. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [5].Calhoun VD et al. , “The chronnectome: Time-varying connectivity networks as the next frontier in fMRI data discovery,” Neuron, vol. 84, pp. 262–274, 2014. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [6].Yu Q et al. , “Modular organization of functional network connectivity in healthy controls and patients with schizophrenia during the resting state,” Frontiers Syst. Neurosci, vol. 5, 2012, Art. no. 103. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [7].Meier TB et al. , “Support vector machine classification and characterization of age-related reorganization of functional brain networks,” NeuroImage, vol. 60, pp. 601–613, 2012. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [8].Qiu A et al. , “Manifold learning on brain functional networks in aging,” Med. Image Anal, vol. 20, pp. 52–60, 2015. [DOI] [PubMed] [Google Scholar]
  • [9].Tian L et al. , “Hemisphere- and gender-related differences in small-world brain networks: A resting-state functional MRI study,” NeuroImage, vol. 54, pp. 191–202, 2011. [DOI] [PubMed] [Google Scholar]
  • [10].Pezoulas VC et al. , “Resting-state functional connectivity and network analysis of cerebellum with respect to IQ and gender,” Frontiers Human Neurosci, vol. 11, 2017, Art. no. 189. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [11].Finn ES et al. , “Functional connectome fingerprinting: Identifying individuals using patterns of brain connectivity,” Nature Neurosci, vol. 18, pp. 1664–1671, 2015. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [12].Song M et al. , “Brain spontaneous functional connectivity and intelligence,” NeuroImage, vol. 41, pp. 1168–1176, 2008. [DOI] [PubMed] [Google Scholar]
  • [13].Barch DM et al. , “Function in the human connectome: Task-fMRI and individual differences in behavior,” NeuroImage, vol. 80, pp. 169–189, 2013. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [14].Calhoun VD et al. , “Exploring the psychosis functional connectome: Aberrant intrinsic networks in schizophrenia and bipolar disorder,” Frontiers Psychiatry, vol. 2, 2012, Art. no. 75. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [15].Beaty RE et al. , “Robust prediction of individual creative ability from brain functional connectivity,” Proc. Nat. Acad. Sci, 2018, vol. 115, pp. 1087–1092. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [16].Greene AS et al. , “Task-induced brain state manipulation improves prediction of individual traits,” Nature Commun, vol. 9, 2018, Art. no. 2807. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [17].Gao S et al. , “Task integration for connectome-based prediction via canonical correlation analysis,” in Proc. IEEE 15th Int. Symp. Biomed. Imag., 2018, pp. 87–91. [Google Scholar]
  • [18].Buckner RL, Krienen FM, and Yeo BT, “Opportunities and limitations of intrinsic functional connectivity MRI,” Nature Neurosci, vol. 16, pp. 832–837, 2013. [DOI] [PubMed] [Google Scholar]
  • [19].Finn ES et al. , “Can brain state be manipulated to emphasize individual differences in functional connectivity?” NeuroImage, vol. 160, pp. 140–151, 2017. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [20].Kaufmann T et al. , “Delayed stabilization and individualization in connectome development are related to psychiatric disorders,” Nature Neurosci, vol. 20, pp. 513–515, 2017. [DOI] [PubMed] [Google Scholar]
  • [21].Calhoun VD and Sui J, “Multimodal fusion of brain imaging data: A key to finding the missing link(s) in complex mental illness,” Biol. Psychiatry Cogn. Neurosci. Neuroimag, vol. 1, pp. 230–244, 2016. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [22].Adeli E et al. , “Multi-task prediction of infant cognitive scores from longitudinal incomplete neuroimaging data,” NeuroImage, vol. 185, pp. 783–792, 2019. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [23].Calhoun VD, Kiehl KA, and Pearlson GD, “Modulation of temporally coherent brain networks estimated using ICA at rest and during cognitive tasks,” Human Brain Mapping, vol. 7, pp. 828–838, 2008. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [24].Çetin MS et al. , “Thalamus and posterior temporal lobe show greater inter-network connectivity at rest and across sensory paradigms in schizophrenia,” NeuroImage, vol. 97, pp. 117–126, 2014. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [25].Michael AM et al. , “A method to fuse fMRI tasks through spatial correlations: Applied to schizophrenia,” Human Brain Mapping, vol. 30, pp. 2512–2529, 2009. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [26].Zhang D and Shen D, “Multi-modal multi-task learning for joint prediction of multiple regression and classification variables in Alzheimer’s disease,” NeuroImage, vol. 59, pp. 895–907, 2012. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [27].Jie B et al. , “Manifold regularized multitask feature learning for multimodality disease classification,” Human Brain Mapping, vol. 36, pp. 489–507, 2015. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [28].Lei B et al. , “Neuroimaging retrieval via adaptive ensemble manifold learning for brain disease diagnosis,” IEEE J. Biomed. Health Inform, vol. 23, no. 4, pp. 1661–1673, July 2019. [DOI] [PubMed] [Google Scholar]
  • [29].Zhu X et al. , “A novel relational regularization feature selection method for joint regression and classification in AD diagnosis,” Med. Imag. Anal, vol. 38, pp. 205–214, 2017. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [30].De Sa VR, “Spectral clustering with two views,” in Proc. ICML Workshop Learn. Multiple Views, 2005, pp. 20–27. [Google Scholar]
  • [31].Lindenbaum O, Yeredora A, Salhovb M, and Averbuchb A, “Multi-view diffusion maps,” Inf. Fusion, vol. 55, pp. 127–149, March 2020. [Google Scholar]
  • [32].Satterthwaite TD et al. , “Neuroimaging of the Philadelphia neurodevelopmental cohort,” NeuroImage, vol. 86, pp. 544–553, 2014. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [33].Satterthwaite TD et al. , “The Philadelphia neurodevelopmental cohort: A publicly available resource for the study of normal and abnormal brain development in youth,” NeuroImage, vol. 124, pp. 1115–1119, 2016. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [34].Caruana R, “Multitask learning,” Mach. Learn, vol. 28, pp. 41–75, 1997. [Google Scholar]
  • [35].Argyriou A and Evgeniou T, “Multi-task feature learning,” in Proc. Adv. Neural Inf. Process. Syst, 2007, pp. 41–48. [Google Scholar]
  • [36].Tibshirani R, “Regression shrinkage and selection via the lasso: A retrospective,” J. Roy. Statist. Soc, vol. 73, pp. 273–282, 2011. [Google Scholar]
  • [37].Wang M et al. , “Discovering network phenotype between genetic risk factors and disease status via diagnosis-aligned multi-modality regression method in Alzheimer’s disease,” Bioinformatics, vol. 35, pp. 1948–1957, doi: 10.1093/bioinformatics/bty911. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [38].Nesterov Y, “A method of solving a convex programming problem with convergence rate O(1/k2),” Sov. Math. Doklady, vol. 27, pp. 372–376, 1983. [Google Scholar]
  • [39].Parikh N and Boyd S, “Proximal algorithms,” Found. Trends Optim, vol. 1, pp. 123–231, 2014. [Google Scholar]
  • [40].Zhu X et al. , “Subspace regularized sparse multitask learning for multiclass neurodegenerative disease identification,” IEEE Trans. Biomed. Eng, vol. 63, no. 3, pp. 607–618, March 2016. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [41].Conway A, Kane MJ, and Engle RW, “Working memory capacity and its relation to general intelligence,” Trends Cogn. Sci, vol. 7, pp. 547–552, 2003. [DOI] [PubMed] [Google Scholar]
  • [42].Barbato M et al. , “Theory of mind, emotion recognition and social perception in individuals at clinical high risk for psychosis: Findings from the NAPLS-2 cohort,” Schizophrenia Res., Cogn, vol. 2, pp. 133–139, 2015. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [43].Fusar-Poli P et al. , “Functional atlas of emotional faces processing: A voxel-based meta-analysis of 105 functional magnetic resonance imaging studies,” J. Psychiatry Neurosci, vol. 34, pp. 418–432, 2009. [PMC free article] [PubMed] [Google Scholar]
  • [44].Wilkinson GS and Robertson GJ, Wide Range Achievement Test 4 (WRAT4). Lutz, FL, USA: Pearson, 2006. [Google Scholar]
  • [45].Zille P, Calhoun VD, and Wang YP, “Enforcing co-expression within a brain-imaging genomics regression framework,” IEEE Trans. Med. Imag, vol. 37, no. 12, pp. 2561–2571, December 2018. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [46].Fang J et al. , “Fast and accurate detection of complex imaging genetics associations based on greedy projected distance correlation,” IEEE Trans. Med. Imag, vol. 37, no. 4, pp. 860–870, April 2018. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [47].Hu W et al. , “Multi-modal brain connectivity study using deep collaborative learning,” in Graphs in Biomedical Image Analysis and Integrating Medical Imaging and Non-Imaging Modalities. Cham, Switzerland: Springer, 2018, vol. 37, pp. 66–73. [Google Scholar]
  • [48].Xiao L et al. , “Alternating diffusion map based fusion of multimodal brain connectivity networks for IQ prediction,” IEEE Trans. Biomed. Eng, vol. 66, no. 8, pp. 2140–2151, August 2019. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [49].Power JD et al. , “Functional network organization of the human brain,” Neuron, vol. 72, pp. 665–678, 2011. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [50].von Luxburg U, “A tutorial on spectral clustering,” Statist. Comput, vol. 17, pp. 395–416, 2007. [Google Scholar]
  • [51].Cheng H, Liu Z, and Yang J, “Sparsity induced similarity measure for label propagation,” in Proc. IEEE 12th Int. Conf. Comput. Vis., 2009, pp. 317–324. [Google Scholar]
  • [52].Neisser U et al. , “Intelligence: Knowns and unknowns,” Amer. Psychol, vol. 51, pp. 77–101, 1996. [Google Scholar]
  • [53].Xia M, Wang J, and He Y, “BrainNet viewer: A network visualization tool for human brain connectomics,” PloS One, vol. 8, no. 7, 2013. Art. no. e68910. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [54].Rubia K et al. , “Temporal lobe dysfunction in medication-naive boys with attention-deficit/hyperactivity disorder during attention allocation and its relation to response variability,” Biol. Psychiatry, vol. 62, pp. 999–1006, 2007. [DOI] [PubMed] [Google Scholar]
  • [55].Haier RJ et al. , “The neuroanatomy of general intelligence: Sex matters,” NeuroImage, vol. 25, pp. 320–327, 2005. [DOI] [PubMed] [Google Scholar]
  • [56].Hearne LJ, Mattingley JB, and Cocchi L, “Functional brain networks related to individual differences in human intelligence at rest,” Sci. Rep, vol. 6, 2016, Art. no. 32328. [DOI] [PMC free article] [PubMed] [Google Scholar]
