Abstract
Objective:
To explain individual differences in development, behavior, and cognition, most previous studies have projected resting-state functional MRI (fMRI) based functional connectivity (FC) data into a low-dimensional space via linear dimensionality reduction techniques before carrying out subsequent analyses. However, linear dimensionality reduction techniques may fail to capture the nonlinearity of brain neural activity. Moreover, besides resting-state FC, FC based on task fMRI can be expected to provide complementary information. Motivated by these two observations, we propose to nonlinearly fuse resting-state and task-based FC networks (FCNs) to seek a better representation.
Methods:
We propose a framework based on the alternating diffusion map (ADM), which extracts geometry-preserving low-dimensional embeddings that successfully parameterize the intrinsic variables driving the phenomenon of interest. Specifically, we first represent the resting-state and task-based FCNs separately by symmetric positive definite matrices using sparse inverse covariance estimation for each subject, and then utilize the ADM to fuse them and extract significant low-dimensional embeddings, which are used as fingerprints to identify individuals.
Results:
The proposed framework is validated on the Philadelphia Neurodevelopmental Cohort data, where we conduct an extensive experimental study on resting-state and fractal n-back task fMRI for the classification of intelligence quotient (IQ). The fusion of resting-state and n-back task fMRI by the proposed framework achieves better classification accuracy than either fMRI dataset alone, and the proposed framework is shown to outperform several other data fusion methods.
Conclusion and Significance:
This paper is the first to demonstrate a successful extension of the ADM to fuse resting-state and task-based fMRI data for automatic prediction of IQ.
Keywords: Alternating diffusion map, classification, data fusion, dimensionality reduction, fMRI, functional connectivity, networks
I. Introduction
Over the past few decades, data fusion techniques and their applications in various fields have attracted great attention; see, e.g., [1]–[5] and references therein. Integrating multiple datasets acquired by different sensors for a phenomenon of interest may yield more informative knowledge than any individual dataset does, because the multiple datasets can provide complementary information about the observed phenomenon from several different views. A straightforward approach is to simply concatenate the feature vectors from multiple datasets into a single feature vector. However, such a concatenation scheme is very sensitive to the scaling of the data. Multivariate approaches, such as canonical correlation analysis (CCA) [6], independent component analysis (ICA) [7], and partial least squares (PLS) [8], have been independently developed by maximizing the correlation between linear combinations of features from two datasets. Their penalized versions for high-dimensional settings and extensions to multiple datasets have also been proposed in [9]–[12]. To analyze the joint information between different tasks and different brain regions in multiple functional MRI (fMRI) datasets, Calhoun and his collaborators [13]–[17] have proposed many ICA-based multitask data fusion approaches (e.g., joint ICA and multimodal CCA+joint ICA) under various optimization assumptions. All the aforementioned approaches are based on linear mixture models, so they cannot optimally handle datasets that have nonlinear structures and relations. To overcome this issue, many kernel-based data fusion approaches have been studied in recent years [18]–[25], where each dataset is individually used to construct a kernel matrix, and the obtained kernel matrices are then combined in linear or nonlinear ways to seek a unified kernel matrix that best represents all available information. A typical kernel-based approach is multiple kernel learning [18], [19], which finds the unified kernel matrix by linearly combining the multiple kernel matrices. However, this approach assumes that the complementary information from multiple data sources combines linearly, which is not necessarily true in practice. Moreover, to learn an optimal unified kernel matrix, tuning the weight coefficient assigned to each single kernel matrix is computationally intensive. In [20]–[25], nonlinear kernel fusion processes have been proposed, which exploit the complementary information nonlinearly to recover the intrinsic low-dimensional geometry, while avoiding the assignment of weight coefficients to the single kernel matrices.
An approach of particular interest in this paper is the alternating diffusion map (ADM), which was proposed more recently in [23]–[25]. The ADM is based on the framework of the diffusion map (DM) [26], one class of manifold learning algorithms [27], and achieves nonlinear dimensionality reduction in such a way that the intrinsic common structures underlying multiple high-dimensional datasets are maintained. More concretely, the ADM takes advantage of the product of the kernel matrices constructed separately from each dataset based on a stochastic Markov matrix to produce a unified representation, which can be interpreted as running diffusion processes on each dataset in an alternating manner. This allows the extraction of the common latent variables across multiple datasets that are assumed to drive the observed phenomenon, while filtering out other hidden variables that are sensor-specific and thus nuisance, irrelevant to the phenomenon. Hence, the ADM can provide a more reliable description of the phenomenon. So far the ADM has proven to be a powerful tool in voice detection from audio-visual signals [28], [29], Alzheimer's disease classification from multiple electroencephalography (EEG) signals [30], and sleep stage classification from EEG and respiration signals [31]. Here we show that the ADM can also be adapted to multimodal fMRI data. To our knowledge, this paper is the first to demonstrate a successful extension of the ADM to fuse resting-state and task-based fMRI data for the prediction of intelligence quotient (IQ).
The proposed framework in this paper begins with a preprocessing stage in which a brain functional connectivity network (FCN) is individually extracted for each subject from fMRI data. Specifically, the brain is graphically depicted as a network with regions of interest (ROIs) as the nodes and functional connectivities (FCs) as the edges, where the FC between two nodes is defined as the statistical dependence between the blood oxygenation level-dependent (BOLD) fMRI time series in the two ROIs. Rather than conventionally representing the FCN by a sample covariance matrix of the multi-ROI time series, we represent it by a symmetric positive definite (SPD) matrix [32]–[34], which is computed based on sparse inverse covariance estimation using the graphical least absolute shrinkage and selection operator (GLASSO) algorithm [35]. Accordingly, two sets of SPD matrices are derived from the resting-state and task-based fMRI datasets, respectively. The FCN organization varies between individuals, and accordingly acts as a "fingerprint" of a subject [36]. Recent works [37]–[39] have also studied the relations between functional and structural brain connectivity patterns to improve the reliability of individual "fingerprints".
We therefore store the SPD matrices of all subjects and treat them as new features derived from the fMRI data for subsequent analysis. However, the dimension of the SPD matrix is usually much larger than the number of subjects. For example, there are 34716 FCs among 264 ROIs in our study. If we directly use the SPD matrices to train a classifier, it will suffer from the curse of dimensionality, which often leads to overfitting and poor generalization performance. Fortunately, despite individual variation, human brains do in fact share common connectivity patterns across different subjects, i.e., variations of the SPD matrices representing brain connectivity are driven by a small subset of unknown parameters. This suggests adopting nonlinear dimensionality reduction (e.g., manifold learning) algorithms to extract the intrinsic variables of the SPD matrices prior to training a classifier. In this paper, based on the two sets of SPD matrices derived from the two fMRI datasets, we use the ADM to fuse them and find meaningful low-dimensional embeddings, so that their shared source of variability is well kept while nuisance specific to either single set of SPD matrices is reduced. These low-dimensional embeddings are then used as significant fingerprints to classify individuals of different behaviors and cognitions (e.g., IQ).
As the set of SPD matrices is known to form a Riemannian manifold instead of a full Euclidean space, geometric distances, such as the affine-invariant Riemannian distance [40] and the root Stein divergence [41], have been proposed to measure the similarities of SPD matrices by considering the underlying manifold where they reside. These distances can better reveal the Riemannian geometry than the traditional Euclidean distance, and have been used successfully to characterize FC differences [32]–[34], [42]. In this paper, we adopt a geodesic distance on SPD matrices, namely the log-Euclidean distance [43], to measure the similarities of SPD matrices in the ADM because of its computational efficiency. The Euclidean distance and the Cholesky distance [44] are tested for comparison.
We finally validate our proposed framework by fusing two fMRI datasets (i.e., resting-state and fractal n-back task fMRI) from the publicly available Philadelphia Neurodevelopmental Cohort (PNC) data [47], [48] to build a predictor for subjects with different IQs. The subjects' IQ scores were assessed by the Wide Range Achievement Test (WRAT) administered in the PNC. The WRAT is an achievement test that measures an individual's learning ability, including reading, spelling, and arithmetic [49], and can provide a reliable estimate of IQ. A large body of clinical studies has argued that distinct patterns of brain functional activity account for a substantial proportion of the IQ differences among individuals [50], [51]. These findings suggest that the ADM-based fusion of multiple sets of FCNs in this paper has the potential to automate the task of classifying populations of low and high IQ. As will be seen experimentally, the classification results well demonstrate the advantage of our proposed framework. Specifically, the ADM achieves superior classification performance over that of the DM (using any single set of FCNs) and several existing fusion methods. In addition, the effectiveness of incorporating the log-Euclidean distance into the DM and the ADM is verified in comparison to the Euclidean and Cholesky distances.
The rest of this paper is organized as follows. In Section II, the proposed framework is presented, including the brain FCN construction and two manifold learning methods (i.e., the DM and the ADM). In Section III, a simulation example is first illustrated, and then the experimental results on the PNC data are shown. The discussions are in Section IV, followed by the conclusion in Section V.
Notations: Uppercase boldface, lowercase boldface, and normal italic letters are used to denote matrices, vectors, and scalars, respectively. The superscript T denotes the transpose of a vector or matrix. $A_{i,j}$ denotes the (i, j)-th entry of matrix A, and $a(i)$ denotes the i-th entry of vector a. We denote the set of real numbers as $\mathbb{R}$.
II. Methods
In this section, we give an overview of the proposed framework, as outlined in Fig. 1. There are three major steps: (1) brain FCNs are extracted as SPD matrices from each fMRI dataset; (2) the ADM is applied to fuse the two sets of FCNs derived from the two fMRI datasets and find a meaningful low-dimensional representation; and (3) support vector machine (SVM) classification is carried out based on the low-dimensional embeddings. In the following, we present the details of these steps.
Fig. 1:
Flowchart of the proposed framework. $\{F_i^{(1)}\}_{i=1}^{n}$ and $\{F_i^{(2)}\}_{i=1}^{n}$ denote two fMRI datasets for the same n subjects. $R_i^{(j)}$ is an SPD matrix that represents the brain FCN of subject i based on the j-th fMRI dataset. The ADM is applied to fuse the brain FCNs derived from the two fMRI datasets into a low-dimensional representation Z.
A. Brain FCN representation using SPD matrices
The BOLD fMRI signal, as a time series, measures neural activity by detecting changes in blood flow at many spatial locations of the brain. fMRI studies can be conducted during specific tasks as well as at rest, and brain networks are usually built from the BOLD signals to describe FC across brain regions. The network nodes are brain ROIs, and the FC between two nodes is defined as the temporal covariance or correlation of the fMRI time series in the two nodes.
Let $F = [f_1, f_2, \cdots, f_m] \in \mathbb{R}^{p \times m}$ be a BOLD fMRI time series for a subject, where m is the number of time points and $f_i \in \mathbb{R}^p$ is a p-dimensional vector corresponding to an observation of p brain ROIs at the i-th time point. Assume that F has been normalized to have zero mean and unit variance along each row. As described above, the FCN is represented by a covariance matrix R of the multi-ROI time series. To estimate R, we obtain an estimate of its inverse $S = R^{-1}$ by maximizing the penalized log-likelihood over the space of all p × p SPD matrices:
$$\hat{S} = \underset{S \succ 0}{\arg\max}\;\Big[\log\det(S) - \operatorname{tr}(CS) - \lambda\,\|S\|_1\Big], \tag{1}$$
where $C = \frac{1}{m} F F^{T}$ is the sample covariance matrix, and det(·), tr(·), and ∥·∥1 denote the determinant, the trace, and the sum of the absolute values of the entries of a matrix, respectively. In (1), the regularization parameter λ > 0 controls the tradeoff between the degree of sparsity and the log-likelihood estimation of S. In this paper, we use the Bayesian Information Criterion (BIC) [52] to select the optimum λ, and the maximization problem (1) can be efficiently solved via the graphical LASSO (GLASSO) algorithm [35] (its Matlab software package: http://statweb.stanford.edu/~tibs/glasso/).
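For concreteness, the following is a minimal Python sketch of this step. The function name and array shapes are illustrative, and scikit-learn's GraphicalLassoCV is used here, which selects λ by cross-validation rather than the BIC criterion adopted in this paper.

```python
import numpy as np
from sklearn.covariance import GraphicalLassoCV

def estimate_fcn(F):
    """Estimate one subject's FCN as an SPD matrix from a (p, m) BOLD array."""
    # Normalize each ROI's time series to zero mean and unit variance.
    F = (F - F.mean(axis=1, keepdims=True)) / F.std(axis=1, keepdims=True)
    # Sparse inverse covariance estimation (lambda chosen by cross-validation).
    model = GraphicalLassoCV().fit(F.T)  # rows of F.T are time points
    S = model.precision_                 # sparse estimate of S = R^{-1}
    R = np.linalg.inv(S)                 # SPD matrix representing the FCN
    return R
```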
B. Nonlinear dimensionality reduction of FCNs
From (1), we can individually compute the SPD matrices $R_i = \hat{S}_i^{-1}$, i = 1, 2, ⋯ , n, to represent the FCNs of the n subjects from one fMRI dataset. In what follows, we shall use the terms "SPD matrices" and "FCNs" interchangeably. The SPD matrices are treated as the features extracted from the subjects' fMRI data for subsequent analysis, and are considered as points distributed in a high-dimensional space. We have to reduce the dimension of these SPD matrices by finding the significant features, since many of the features may be noninformative or redundant while increasing the computational cost and classification complexity. In spite of individual variation, brains do share some common FC patterns across different subjects. Therefore, the SPD matrices used to represent FCNs shall have some similar structures [53], and their variations depend only on a small subset of unknown parameters. Inspired by this evidence, we aim to generate a low-dimensional representation of the SPD matrices. Since brain activity involves multiple nonlinear neural dynamics, we adopt here a nonlinear dimensionality reduction algorithm for best representing the high-dimensional SPD matrices by their low-dimensional embeddings, where the intrinsic geometry of the SPD matrices is well preserved in the embedding coordinates. The details are elaborated as follows.
1). Gaussian kernel function:
In machine learning, kernel functions are often used to define similarity measures for learning the relations among subjects via the kernel trick, and the Gaussian kernels in particular are widely used. In this paper, we calculate a similarity matrix by applying the Gaussian kernel function to a distance between SPD matrices, i.e.,
$$W_{i,j} = \exp\!\left(-\,\frac{d(R_i, R_j)^2}{\sigma^2}\right), \quad 1 \le i, j \le n, \tag{2}$$
where σ > 0 is the bandwidth of the Gaussian kernel function and d(·, ·) is a distance chosen by the user to compare two SPD matrices. This construction defines a weighted graph, in which the nodes correspond to the n subjects $\{R_i\}_{i=1}^{n}$, and $W \in \mathbb{R}^{n \times n}$ is the weight matrix of the graph.
Different definitions of d(·, ·) lead to different similarity matrices. An appropriate distance is crucial for the subsequent dimensionality reduction to reveal the intrinsic geometry of the SPD matrices, since the set of SPD matrices is restricted to a Riemannian manifold, not a full Euclidean space. For ease of computation, we investigate one commonly used geodesic distance, i.e., the log-Euclidean distance (LEU) [43], which respects the specific geometry of the manifold. The LEU between $R_i$ and $R_j$ is given by
$$d_{\mathrm{LEU}}(R_i, R_j) = \big\|\log(R_i) - \log(R_j)\big\|_F, \tag{3}$$
where, for an SPD matrix R with eigenvalue decomposition $R = U \cdot \mathrm{Diag}(\mu_1, \cdots, \mu_p) \cdot U^T$, the matrix logarithm of R is defined by $\log(R) = U \cdot \mathrm{Diag}(\log(\mu_1), \cdots, \log(\mu_p)) \cdot U^T$, and ∥·∥F denotes the Frobenius matrix norm. For comparison, we also consider the Cholesky distance (CK) [44] and the traditional Euclidean distance (EU). The CK is given by
$$d_{\mathrm{CK}}(R_i, R_j) = \big\|(R_i)_{\mathrm{low}} - (R_j)_{\mathrm{low}}\big\|_F, \tag{4}$$
where $R_{\mathrm{low}}$ denotes the lower triangular matrix obtained by the Cholesky decomposition of R, i.e., $R = R_{\mathrm{low}} R_{\mathrm{low}}^{T}$. The EU is given by
$$d_{\mathrm{EU}}(R_i, R_j) = \big\|R_i - R_j\big\|_F. \tag{5}$$
The LEU (3) is one of the most widely adopted distances for SPD matrices, because it is a geodesic distance induced by a Riemannian metric and provides a more accurate distance measure than the EU (5). Apart from such geodesic distances, a number of other distances (e.g., the CK (4)) that do not necessarily arise from Riemannian metrics can also be used to capture the nonlinearity among SPD matrices. Different from the LEU, which is derived from the matrix logarithm, the CK induces a reparameterization of SPD matrices based on matrix decomposition, because an SPD matrix has a unique Cholesky decomposition. It is shown in [44] that the Gaussian kernels (2) with the LEU, the CK, and the EU are all positive semidefinite on manifolds for any σ > 0, so one can freely tune σ to reflect the data distribution.
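The three distances (3)–(5) are straightforward to compute. Below is a minimal Python sketch, assuming SPD inputs given as NumPy arrays; the matrix logarithm is implemented via the eigendecomposition described after (3).

```python
import numpy as np
from scipy.linalg import cholesky, eigh

def log_spd(R):
    """Matrix logarithm of an SPD matrix via its eigendecomposition."""
    mu, U = eigh(R)
    return (U * np.log(mu)) @ U.T

def d_leu(R1, R2):
    """Log-Euclidean distance (3)."""
    return np.linalg.norm(log_spd(R1) - log_spd(R2), 'fro')

def d_ck(R1, R2):
    """Cholesky distance (4) between lower triangular Cholesky factors."""
    return np.linalg.norm(cholesky(R1, lower=True) - cholesky(R2, lower=True), 'fro')

def d_eu(R1, R2):
    """Euclidean distance (5)."""
    return np.linalg.norm(R1 - R2, 'fro')
```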
2). DM for single FCN dataset:
Considering that the data points $\{R_i\}_{i=1}^{n}$ lie on an intrinsically low-dimensional manifold embedded in $\mathbb{R}^D$, we use the DM [26] to obtain their low-dimensional embeddings $\{y_i\}_{i=1}^{n} \subset \mathbb{R}^d$ with d ≪ D. The DM is a graph-based nonlinear dimensionality reduction method, which extends and enhances ideas from other manifold learning methods by deploying a stochastic Markov matrix, based on the similarities between data points in the high-dimensional space, to identify a low-dimensional representation that captures the intrinsic geometry of the dataset. The procedure of the DM is demonstrated in Fig. 2 and described in detail below.
Fig. 2:
The DM for a single dataset $\{R_i\}_{i=1}^{n}$, where K is the normalized kernel matrix (6), and U, L are the eigendecomposition factors of K (7). Assume that the original high-dimensional data points approximately reside on a low-dimensional manifold embedded in $\mathbb{R}^D$. With the DM, they are mapped into the geometry-preserving low-dimensional embeddings $\{y_i\}_{i=1}^{n} \subset \mathbb{R}^d$.
Based on the similarity matrix W calculated in (2), we first get a normalized kernel matrix K by
$$K = Q^{-1} W, \tag{6}$$
such that each row of K sums to 1, where $Q \in \mathbb{R}^{n \times n}$ is a diagonal matrix with $Q_{i,i} = \sum_{j=1}^{n} W_{i,j}$. Hence, we can imagine a Markov chain on the graph with the transition matrix K, in the sense that the (i, j)-th entry $K_{i,j}$ represents the transition probability from node i to node j.
It is easy to check that K is similar to the positive semidefinite matrix $Q^{-1/2} W Q^{-1/2}$. As such, let $\lambda_0 \ge \lambda_1 \ge \cdots \ge \lambda_{n-1} \ge 0$ and $\psi_0, \psi_1, \cdots, \psi_{n-1}$ denote the ordered eigenvalues and corresponding normalized eigenvectors of K, i.e.,
$$K\,U = U\,L, \tag{7}$$
where $U = [\psi_0, \psi_1, \cdots, \psi_{n-1}]$ and $L = \mathrm{diag}(\lambda_0, \lambda_1, \cdots, \lambda_{n-1})$. Moreover, we can readily verify that the largest eigenvalue $\lambda_0$ is equal to 1 and its associated eigenvector $\psi_0$ is a constant vector. Then, a compact representation, referred to as the DM, is achieved by keeping only the d largest non-trivial eigenvalues and eigenvectors of K, i.e.,
$$y_i = \big(\lambda_1\,\psi_1(i),\; \lambda_2\,\psi_2(i),\; \cdots,\; \lambda_d\,\psi_d(i)\big)^T \in \mathbb{R}^d, \quad i = 1, \ldots, n, \tag{8}$$
where d is an estimated dimension of the embedding space.
The key idea in the DM is that the Euclidean distance between two embeddings (e.g., yi and yj) is approximately equal to the diffusion distance between the two corresponding data points (e.g., Ri and Rj) in the original space. The diffusion distance between the i-th and j-th subjects is defined as the weighted L2 distance between the transition probabilities of node i and node j, i.e.,
$$D^2(i, j) = \sum_{l=1}^{n} \frac{\big(K_{i,l} - K_{j,l}\big)^2}{\varphi(l)}, \tag{9}$$
where φ stands for the stationary distribution of K, calculated by $\varphi(l) = Q_{l,l} \big/ \sum_{k=1}^{n} Q_{k,k}$ for 1 ≤ l ≤ n. The diffusion distance is a metric that can reveal the intrinsic geometry among data points. It is robust to noise as well, since the diffusion can be viewed as a nonlinear process that averages all possible connectivity between pairs of data points on the graph.
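Putting (2) and (6)–(8) together, the DM can be sketched in a few lines of Python. This is a schematic implementation under the conventions assumed above: the eigenpairs of K are computed through the symmetric conjugate matrix $Q^{-1/2} W Q^{-1/2}$ mentioned earlier for numerical stability, and dist is any of the distances (3)–(5).

```python
import numpy as np

def diffusion_map(R_list, dist, sigma, d):
    """DM embeddings (8) of n SPD matrices, plus spectra for later extension."""
    n = len(R_list)
    D2 = np.array([[dist(R_list[i], R_list[j]) ** 2 for j in range(n)]
                   for i in range(n)])
    W = np.exp(-D2 / sigma ** 2)          # similarity matrix (2)
    q = W.sum(axis=1)                     # diagonal of Q
    A = W / np.sqrt(np.outer(q, q))       # symmetric conjugate of K = Q^{-1} W
    lam, V = np.linalg.eigh(A)
    idx = np.argsort(lam)[::-1]           # descending eigenvalues
    lam, V = lam[idx], V[:, idx]
    psi = V / np.sqrt(q)[:, None]         # eigenvectors of K itself
    Y = psi[:, 1:d + 1] * lam[1:d + 1]    # drop the trivial pair (lambda_0, psi_0)
    return Y, (lam, psi)
```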
3). ADM based fusion of two FCN datasets:
The ADM [23]–[25] is a recently developed data fusion technique built on the DM framework. The purpose of the ADM is to fuse two datasets into a more coherent and accurate representation, in the sense that the information from the two datasets is diffused to yield the underlying common information (which is assumed to drive the phenomenon of interest), while nuisance specific to either single dataset is reduced. Let $\{R_i^{(1)}\}_{i=1}^{n}$ and $\{R_i^{(2)}\}_{i=1}^{n}$ be the FCNs extracted from two different fMRI datasets for the same n subjects, respectively. By using the ADM described below, we can obtain low-dimensional embeddings $\{z_i\}_{i=1}^{n} \subset \mathbb{R}^{\hat{d}}$.
In the same way as in (2), we separately construct similarity matrices of the two datasets: for all 1 ≤ i, j ≤ n and l = 1,2,
$$W^{(l)}_{i,j} = \exp\!\left(-\,\frac{d\big(R_i^{(l)}, R_j^{(l)}\big)^2}{\sigma_l^2}\right), \tag{10}$$
where $\sigma_l$ is the tuneable kernel bandwidth and d(·, ·) is a chosen metric on the data points. From the similarity matrices, we get the normalized kernel matrices $K^{(1)}$ and $K^{(2)}$ as in (6), respectively. According to the ADM in [25], a unified kernel matrix $\hat{K}$ is given by
$$\hat{K} = \frac{1}{2}\left(K^{(1)}\big(K^{(2)}\big)^{T} + K^{(2)}\big(K^{(1)}\big)^{T}\right). \tag{11}$$
Since $\hat{K}$ is real and symmetric, it has real eigenvalues, and its eigenvectors are real and orthogonal to each other. As such, let $\hat{\lambda}_1, \hat{\lambda}_2, \cdots, \hat{\lambda}_n$ be the eigenvalues of $\hat{K}$ sorted in decreasing magnitude, and $\hat{\psi}_1, \hat{\psi}_2, \cdots, \hat{\psi}_n$ be the corresponding normalized eigenvectors. Hence, a low-dimensional representation (referred to as ADM) of the common structures in the datasets is obtained by taking the eigenvectors corresponding to the $\hat{d}$ largest eigenvalues in magnitude, i.e.,
$$z_i = \big(\hat{\lambda}_1\,\hat{\psi}_1(i),\; \hat{\lambda}_2\,\hat{\psi}_2(i),\; \cdots,\; \hat{\lambda}_{\hat{d}}\,\hat{\psi}_{\hat{d}}(i)\big)^T \in \mathbb{R}^{\hat{d}}, \quad i = 1, \ldots, n, \tag{12}$$
where $\hat{d}$ is an estimated dimension of the embedding space.
In the ADM, a Markov chain on a graph is first built for each dataset, where the subjects represent the nodes of the graph, and the normalized kernel matrix is viewed as the transition matrix of the Markov chain on the graph. In other words, we obtain two graphs with the same set of nodes and two different transition matrices (i.e., $K^{(1)}$ and $K^{(2)}$). Then, we incorporate the information from the two datasets through the product of the transition matrices, which takes into account all the various connectivities of two nodes hopping within and across the two graphs. It is shown in [25] that the low-dimensional embeddings (12) based on the matrix $\hat{K}$ characterize the common structures (common latent variables) between the manifolds underlying the different datasets, and in the meantime attenuate the differences (sensor-specific variables) between the manifolds. The interested reader can find a theoretical foundation of the ADM in [23], [25].
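Analogously to the DM sketch above, the ADM fusion of (10)–(12) can be written as follows; this is a schematic Python implementation under the symmetrized unified kernel (11) assumed above.

```python
import numpy as np

def adm_fuse(R1_list, R2_list, dist, sigma1, sigma2, d_hat):
    """ADM embeddings (12) fusing two FCN datasets for the same n subjects."""
    def markov(R_list, sigma):
        n = len(R_list)
        D2 = np.array([[dist(R_list[i], R_list[j]) ** 2 for j in range(n)]
                       for i in range(n)])
        W = np.exp(-D2 / sigma ** 2)             # similarity matrix (10)
        return W / W.sum(axis=1, keepdims=True)  # row-stochastic kernel, as in (6)

    K1, K2 = markov(R1_list, sigma1), markov(R2_list, sigma2)
    K_hat = 0.5 * (K1 @ K2.T + K2 @ K1.T)        # unified kernel (11)
    lam, V = np.linalg.eigh(K_hat)               # real symmetric matrix
    idx = np.argsort(np.abs(lam))[::-1]          # sort by decreasing magnitude
    lam, V = lam[idx], V[:, idx]
    return V[:, :d_hat] * lam[:d_hat]            # embeddings z_i (12)
```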
4). Out-of-sample extension:
In the above, we presented how the DM (or ADM) provides a mapping for a training set with n FCNs $\{R_i\}_{i=1}^{n}$ (or 2n FCNs $\{R_i^{(1)}\}_{i=1}^{n}$ and $\{R_i^{(2)}\}_{i=1}^{n}$) to a d-dimensional (or $\hat{d}$-dimensional) space. In order to extend the mapping to new data points (unlabeled FCNs) without reapplying a large-scale eigendecomposition on the entire data, we introduce the Nyström extension [22], [32], [45], [46], an efficient non-parametric solution widely used for methods involving spectral decomposition. Accordingly, for the DM and the ADM, respectively, we derive an explicit mapping between new FCNs and the low-dimensional embedding space obtained from the training set as follows.
Given a new FCN $R_{n+1}$, we want to extend the DM mapping to get $y_{n+1}$. We first calculate the similarities $W_{n+1,j}$ between $R_{n+1}$ and $R_j$, j = 1, 2, ⋯ , n, and then normalize them to get $K_{n+1,j}$ for 1 ≤ j ≤ n, i.e.,
$$W_{n+1,j} = \exp\!\left(-\,\frac{d(R_{n+1}, R_j)^2}{\sigma^2}\right), \qquad K_{n+1,j} = \frac{W_{n+1,j}}{\sum_{j'=1}^{n} W_{n+1,j'}}. \tag{13}$$
The extended eigenvectors for the new data point are approximated as weighted sums of the original eigenvectors, i.e.,
$$\psi_l(n+1) = \frac{1}{\lambda_l}\sum_{j=1}^{n} K_{n+1,j}\,\psi_l(j), \quad l = 1, \ldots, d, \tag{14}$$
and the embedding $y_{n+1}$ is given by
$$y_{n+1} = \big(\lambda_1\,\psi_1(n+1),\; \cdots,\; \lambda_d\,\psi_d(n+1)\big)^T. \tag{15}$$
Given new FCNs $R_{n+1}^{(1)}$ and $R_{n+1}^{(2)}$ for the two-dataset scenario, we want to extend the ADM mapping to get $z_{n+1}$. Similar to (13), we calculate, for 1 ≤ j ≤ n and l = 1, 2,
$$W^{(l)}_{n+1,j} = \exp\!\left(-\,\frac{d\big(R^{(l)}_{n+1}, R^{(l)}_j\big)^2}{\sigma_l^2}\right), \qquad K^{(l)}_{n+1,j} = \frac{W^{(l)}_{n+1,j}}{\sum_{j'=1}^{n} W^{(l)}_{n+1,j'}}. \tag{16}$$
Let $\mathbf{k}^{(l)} = \big[K^{(l)}_{n+1,1}, \cdots, K^{(l)}_{n+1,n}\big]^T$ for l = 1, 2, and define the extended row of the unified kernel matrix by
$$\big[\hat{K}_{n+1,1}, \cdots, \hat{K}_{n+1,n}\big]^T = \frac{1}{2}\left(K^{(2)}\,\mathbf{k}^{(1)} + K^{(1)}\,\mathbf{k}^{(2)}\right). \tag{17}$$
Then, the extension is given by
$$\hat{\psi}_l(n+1) = \frac{1}{\hat{\lambda}_l}\sum_{j=1}^{n} \hat{K}_{n+1,j}\,\hat{\psi}_l(j), \quad l = 1, \ldots, \hat{d}, \tag{18}$$
and the embedding $z_{n+1}$ is
$$z_{n+1} = \big(\hat{\lambda}_1\,\hat{\psi}_1(n+1),\; \cdots,\; \hat{\lambda}_{\hat{d}}\,\hat{\psi}_{\hat{d}}(n+1)\big)^T. \tag{19}$$
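For the DM, the extension (13)–(15) amounts to a normalized weighted sum of the training eigenvectors. Below is a minimal Python sketch reusing the (lam, psi) spectra returned by the diffusion_map sketch above; the ADM extension (16)–(19) is entirely analogous.

```python
import numpy as np

def dm_extend(R_new, R_list, dist, sigma, lam, psi, d):
    """Nystrom extension of the DM embedding to a new FCN R_new, per (13)-(15)."""
    w = np.array([np.exp(-dist(R_new, R) ** 2 / sigma ** 2) for R in R_list])
    k = w / w.sum()                                 # normalized similarities (13)
    psi_new = (k @ psi[:, 1:d + 1]) / lam[1:d + 1]  # extended eigenvectors (14)
    return psi_new * lam[1:d + 1]                   # embedding y_{n+1} (15)
```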
C. Classification using SVM
In this paper, classification is explored as a potential application to validate our proposed framework: if the intrinsic manifold structures of the data are faithfully preserved by the proposed framework, the obtained embeddings of the original high-dimensional data points belonging to different classes will be well separated in the low-dimensional embedding space. The classification performance is assessed by using a linear kernel SVM with default hyper-parameters on the embeddings. We remark that we choose a simple linear kernel SVM classifier for three reasons: 1) since the DM and the ADM mentioned above provide embedded features globally in linear coordinates, we limit the tests to linear classifiers; 2) the SVM is known as one of the state-of-the-art classifiers and has been extensively used in biomedical data analysis because of its accurate classification performance [54], [55]; and 3) although there are many other advanced classifiers, the emphasis of this paper is the superior performance of the proposed framework, not the optimal classification scheme.
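As a usage illustration, this classification step reduces to a few lines with scikit-learn. Here Z and y are synthetic placeholders standing in for the ADM embeddings and the binary IQ labels, so the sketch runs stand-alone.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
Z = rng.standard_normal((224, 30))    # placeholder for n x d_hat embeddings
y = rng.integers(0, 2, size=224)      # placeholder for low/high IQ labels

clf = SVC(kernel='linear')            # linear kernel, default hyper-parameters
acc = cross_val_score(clf, Z, y, cv=5).mean()
print(f"5-fold CV accuracy: {acc:.4f}")
```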
III. Experimental results and discussion
A. Simulation result
Let x, y, θ be three statistically independent uniform random variables on (0, 1). We generate n = 2000 samples $(x_i, y_i, \theta_i)$ of (x, y, θ), and define two sets of simulated samples in $\mathbb{R}^3$ by

$$s_i^{(1)} = \big(u_i \cos(u_i),\; x_i,\; u_i \sin(u_i)\big)^T, \qquad u_i = \tfrac{3\pi}{2}\,(1 + 2\theta_i),$$

and

$$s_i^{(2)} = \Gamma\,\big(u_i \cos(u_i),\; y_i,\; u_i \sin(u_i)\big)^T$$

for 1 ≤ i ≤ n, where Γ is an orthonormal transformation matrix. Assume that these two datasets are observations acquired by two sensors, respectively, where θ is a common variable, and x and y are sensor-specific variables. As can be seen in the first and third columns of Fig. 3, each set of simulated samples lies on a 2-dimensional Swiss roll embedded in $\mathbb{R}^3$.
Fig. 3:
Scatter plots of two 3-dimensional Swiss roll datasets and their 2-dimensional embeddings. Top row: subfigures obtained from $\{s_i^{(1)}\}_{i=1}^{n}$. Bottom row: subfigures obtained from $\{s_i^{(2)}\}_{i=1}^{n}$. For example, (a), (c) are scatter plots of $\{s_i^{(1)}\}_{i=1}^{n}$, and (b), (d) are their 2-dimensional embeddings. Points in the first two columns are colored according to the common variable θ, and those in the last two columns are colored according to their own sensor-specific variables.
We first apply the DM separately to each dataset, and the 2-dimensional embeddings are presented in the 2nd and 4th columns of Fig. 3. The subfigures in each row are obtained from the same dataset, i.e., (a)–(d) are scatter plots of $\{s_i^{(1)}\}_{i=1}^{n}$ and their embeddings, and (e)–(h) are scatter plots of $\{s_i^{(2)}\}_{i=1}^{n}$ and their embeddings. In the first two columns of Fig. 3, data points are colored according to $\theta_i$. In subfigures (c), (d), data points are colored according to $x_i$, and in subfigures (g), (h), according to $y_i$. One can see that all the scatter plots of the 2-dimensional embeddings exhibit a smooth color gradient, which implies accurate parametrization of both the common and the sensor-specific variables for each dataset.
We next apply the ADM to fuse the two datasets. The 2-dimensional embeddings are shown with different color coding schemes in Fig. 4. The data points in the leftmost subfigure are colored according to the common variable θ, while those in the middle and the rightmost subfigures are colored according to the sensor-specific variables x and y, respectively. We observe that the color gradient is smooth only for the common variable. Equivalently, this means that the embeddings obtained by the ADM successfully extract a parametrization of the common variable θ, while filtering out the nuisance variables x and y that are specific to each dataset.
Fig. 4:
Scatter plots of the 2-dimensional embeddings obtained by the ADM on the two datasets. Points in the subfigures (from left to right) are respectively colored according to the common variable θ and sensor-specific variables x, y.
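The simulated data are easy to reproduce. The sketch below generates the two Swiss rolls under the parameterization written above; the roll angle u and the random construction of Γ are assumptions of this sketch. The DM and ADM sketches from Section II can then be applied to the rows of S1 and S2 with the plain Euclidean distance between points.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
x, y, theta = rng.uniform(size=(3, n))                # independent latent variables

u = 1.5 * np.pi * (1 + 2 * theta)                     # Swiss-roll angle from theta
Gamma, _ = np.linalg.qr(rng.standard_normal((3, 3)))  # random orthonormal matrix

# Two sensors: theta is common; x and y are sensor-specific.
S1 = np.stack([u * np.cos(u), x, u * np.sin(u)], axis=1)
S2 = np.stack([u * np.cos(u), y, u * np.sin(u)], axis=1) @ Gamma.T
```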
B. Application to IQ classification
1). Data preprocessing and experimental setting:
The PNC [47], [48] is a large-scale collaborative study of child development between the Children's Hospital of Philadelphia and the Brain Behavior Laboratory at the University of Pennsylvania. The publicly available PNC data were downloaded from dbGaP (www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs000607.v1.p1). In the PNC, genetics, neuroimaging, and cognitive assessment measures were all acquired in nearly 900 adolescents aged 8 to 21 years. In this paper, we study two functional imaging datasets (i.e., functional imaging of a working memory task and of the resting state) and their classification performance on IQ. The scores of the WRAT administered in the PNC reflect subjects' IQ levels, since the WRAT is a standardized achievement test that measures an individual's ability in, e.g., reading recognition, spelling, and math computation [49], and can provide a reliable estimate of IQ. To mitigate the influence of age on the results, we first selected the subset of subjects whose age was above 16 years. Next, we converted their WRAT scores to z-scores, and kept only subjects whose absolute z-scores were above 0.5. As a consequence, we were left with n = 224 subjects, separated into two groups according to IQ level: the low and high IQ groups (Table I). The low IQ group consisted of the subjects with z-scores smaller than −0.5, and the high IQ group consisted of the subjects with z-scores larger than 0.5.
TABLE I:
Characteristics of the subjects in this study. SD: standard deviation.
| Group | Age (Mean ± SD) | Male/Female | WRAT score (Mean ± SD) |
|---|---|---|---|
| Low IQ | 17.96 ± 1.36 | 31/59 | 46.96 ± 4.91 |
| High IQ | 18.54 ± 1.50 | 61/73 | 64.25 ± 2.57 |
MRI examinations were conducted on a single 3T Siemens TIM Trio whole-body scanner. Both task-based and resting-state images were collected using a single-shot, interleaved multi-slice, gradient-echo, echo planar imaging sequence. All the images were preprocessed in SPM12 (www.fil.ion.ucl.ac.uk/spm/), including motion correction, spatial normalization to standard MNI space, and spatial smoothing with a 3mm FWHM Gaussian kernel. A regression procedure was applied to address motion-related influences, and a 0.01Hz–0.1Hz bandpass filter was applied to the functional time series. In the resting-state scan, subjects were instructed to stay awake with their eyes open, fixate on the displayed crosshair, and keep still. In the fractal n-back task, used to probe working memory, subjects were required to respond to a presented fractal only when it was the same as the one presented on a previous trial. Based on a recently validated 264-region functional parcellation scheme [56], 264 ROIs were defined to describe the whole brain as 10mm diameter spheres centered on the ROI coordinates. Thus, for each subject, each type of fMRI data can be represented by a matrix whose rows correspond to the ROIs and whose columns correspond to the time points. All the fMRI data were centered and normalized by subtracting from each row its mean and dividing by its standard deviation. We finally obtained two fMRI datasets, i.e., resting-state and fractal n-back task fMRI.
2). Visualization of brain FCNs:
Recall that for each subject the FCN is defined by a p × p SPD matrix obtained in Subsection II-A, where p = 264 is the number of ROIs. Within this network, there are 34716 unique edges (or connections) and 14 functional modules, i.e., somatomotor/hand, somatomotor/mouth, cingulo-opercular control, auditory, default mode, memory retrieval, visual, fronto-parietal control, salience, subcortical, ventral attention, dorsal attention, cerebellar, and uncertain. We sought to interrogate significantly different connections between the low and high IQ groups. Two-sample t-tests were performed for each of the 34716 Fisher z-transformed connection strength values in the network. In the first column of Fig. 5, we display the number of connections at different p-value thresholds (i.e., 0.05, 0.01, 0.005, 0.001) for 7 typical modules. For ease of visualization, we ranked all connections according to their t-values, and selected the top 1% of the connections (i.e., uncorrected, p < 5.21 × 10−3 for resting state, and p < 3.25 × 10−4 for the n-back task). The number of these selected connections differing between groups was assessed for each of the 13 modules (excluding the uncertain one), both for within- and between-module connections, as shown in the second column of Fig. 5, and the corresponding three-dimensional axial views in anatomical space are visualized using the BrainNet Viewer [58]. One can see that a majority of the significantly different connections associated with IQ involved the default mode, fronto-parietal control, and visual modules, which is in agreement with the reports of previous studies [59]–[61]. The default mode module has been linked to self-referential thought and autobiographical memory. The fronto-parietal module, including portions of the lateral prefrontal cortex and posterior parietal cortex, is thought to serve cognitive control abilities and working memory, among others. The visual module is related to the ability to process visual stimuli and to understand spatial relations between objects.
Fig. 5:
The brain FCN organizations associated with the connections differing significantly between the low and high IQ groups during (a) resting state and (b) the fractal n-back task, respectively. The first column shows the number of connections at different p-value thresholds. The last two columns display the top 1% of connections: the second column shows the number of within- and between-module connections, and the third column shows three-dimensional axial brain views of the functional graph in anatomical space, where node colors indicate module membership.
3). Classification results:
We first assessed the classification performance for high vs low IQ when only one single dataset (resting-state FCNs or n-back task FCNs) was used with and without applying the DM. Second, we evaluated the classification performance when both resting-state and n-back task FCNs were used with applying the ADM. Third, we compared the performance of the proposed ADM based framework with that of several other common data fusion methods.
In nonlinear dimensionality reduction of the FCNs by the DM and the ADM, two important parameters have to be set, i.e., the kernel bandwidth σ in the Gaussian kernel matrix and the target dimension of the reduced space, both of which influence the embedding and thus the subsequent classification results. Too small a σ will result in a sparse (or even disconnected) graph that is unable to capture the local structures around the data points, whereas too large a σ will cause a dense graph that may generate a redundant description of the data. Analogously, if the target dimension (d in the DM or $\hat{d}$ in the ADM) is too large or too small, the mapping will tend to be noisy and unstable or may not capture sufficient information about the manifold geometry. Choosing parameters from a reasonable range is thus important. Notably, a max-min scheme has been suggested in [57] for choosing σ:
$$\sigma^2 = C\,\max_{i}\Big[\min_{j \neq i} d(R_i, R_j)^2\Big], \tag{20}$$
where C is typically set in the range [2, 3]. In this paper, we fixed C = 2 for the kernel bandwidth in the DM. However, in the ADM, the unified kernel matrix (11) involves the product of two single kernel matrices. This insight indicates that the max-min measure for kernel bandwidth in the DM could be relaxed in the ADM. That is, smaller values for C could be used to set σ1 and σ2 in (10). Although an automated method for determining σ1 and σ2 has been proposed [28], we choose to tune them by cross-validation in this study. Different values of the kernel bandwidth employed in our experiments were tested by setting C ∈ {0.2, 0.4, ⋯, 2} for each dataset in the ADM. In both the DM and the ADM, the target dimension varied in the range of {10, 20, ⋯ , 100}.
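A helper for this scheme might look as follows; the squared-distance convention matches the form of (20) assumed above, and the function name is illustrative.

```python
import numpy as np

def max_min_sigma(D2, C=2.0):
    """Max-min bandwidth (20) from a matrix D2 of squared pairwise distances."""
    off = D2 + np.diag(np.full(len(D2), np.inf))  # exclude the j = i terms
    return np.sqrt(C * off.min(axis=1).max())     # sigma with sigma^2 = C * max-min
```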
A 5-fold cross-validation (CV) procedure was implemented to evaluate the classification performance in all experiments. The whole data were randomly partitioned into 5 equal-sized disjoint subsets with similar class distributions. Each subset in turn was used as the test set and the remaining 4 subsets were used to train the SVM classifier. Specifically, for every pair of training and test sets, the low-dimensional embeddings of the training set were first computed, and an SVM classifier was trained by the labeled samples in the embedded training set. Then, the low-dimensional embeddings of the test set were obtained by using the out-of-sample extension, and the trained SVM was applied to predict class labels of the samples in the embedded test set. The classifier accuracy was estimated by comparing against the ground-truth labels on the test set. The test result in the CV was the average of the 5 individual accuracy measures. The whole process was repeated 20 times to reduce the effect of sampling bias, and the average classification accuracy (ACC) was computed over all 20 realizations. All free parameters, i.e., the kernel bandwidth and the target dimension, were tuned from their respective ranges by 5-fold inner CV on the training set, and the parameters with the best performance in the inner CV were used in testing.
3.1). Results of the DM and the ADM:
The performance of the DM incorporating the different distances (i.e., LEU (3), CK (4), and EU (5)) on SPD matrices was tested for each single dataset of FCNs. To see whether significant information was lost in the embedding, we also vectorized the original high-dimensional FC data without applying the DM and directly used them to train an SVM classifier. The results are reported in Table II. We found that the classification performance using n-back task FCNs was usually better than that using resting-state FCNs. This highlights the importance of n-back task FCs in IQ classification. Compared with the results of the vectorized method, the DM with the CK and the DM with the EU achieved similar or even worse results, while the DM with the LEU made a significant improvement. This means that incorporating the LEU in the DM extracted the most informative low-dimensional embeddings, whereas incorporating the other distances (i.e., the CK and the EU) did not.
TABLE II:
Comparison of classification results (ACC ± SD%) based on single fMRI dataset with/without applying the DM.
| | Vectorized | DM+LEU | DM+CK | DM+EU |
|---|---|---|---|---|
| resting-state | 65.63 ± 2.53 | 70.06 ± 1.95 | 66.96 ± 2.47 | 63.84 ± 2.22 |
| n-back task | 69.29 ± 1.91 | 73.22 ± 2.10 | 70.76 ± 1.97 | 68.44 ± 2.46 |
We next compared the performance of the ADM for the fusion of the two datasets of FCNs (i.e., resting-state FCNs and n-back task FCNs), as shown in the last row of Table III. The performance using the ADM based data fusion was better than that using the DM on any single dataset. In particular, the ADM with the LEU achieved 75.15% classification accuracy, which was better than the results of the DM on any single dataset in Table II and an improvement of about 5% in comparison to the vectorized method for each single dataset. This demonstrates the power of ADM based data fusion and also justifies the assumption that a proper fusion of different datasets can produce more coherent information useful for understanding the observed phenomenon. Across the results of both the DM and the ADM with respect to the different distances on SPD matrices, the LEU always achieved the best result, the CK followed, and the EU was the worst. This again indicates that it is important to consider the manifold property of SPD matrices when obtaining low-dimensional embeddings, enabling discrimination of individuals with different IQ levels.
TABLE III:
Comparison of classification results (ACC ± SD%) based on two fMRI datasets with applying different fusion methods.
| Method | LEU | CK | EU |
|---|---|---|---|
| Concatenated DM II | 73.89 ± 2.56 | 71.13 ± 2.42 | 67.74 ± 2.43 |
| Kernel-sum DM | 74.55 ± 2.24 | 72.12 ± 1.80 | 69.55 ± 2.50 |
| Kernel-dot-product DM | 74.12 ± 1.97 | 71.07 ± 1.46 | 69.84 ± 1.81 |
| ADM | 75.15 ± 1.72 | 72.38 ± 1.96 | 69.58 ± 2.09 |
We also investigated the effect of the free parameters on classification performance: the parameters of interest were successively set to each combination across their ranges, and for every setting the testing accuracy was computed in CV with the remaining parameters optimally tuned. In the left of Fig. 6, the classification accuracies of the DM+LEU for each single dataset and the ADM+LEU for the fusion of the two datasets are shown for varying settings of the target dimension. As can be seen from the figure, the target dimension has a large effect on the classification. If the selected target dimension is too small, the mapping will lose some important information; if it is too large, the embeddings will remain noisy and redundant and cannot effectively reflect the intrinsic structures of the original high-dimensional data. Both cases lead to poor classification accuracy. Similarly, selecting optimum kernel bandwidths in the ADM has a great effect on the classification performance. The right of Fig. 6 presents the parameter sensitivity obtained by changing the values of $C_{\text{resting-state}}$ and $C_{\text{n-back}}$ in the ADM+LEU. We observed that a good parameter combination was always found in our experiments: in the ADM+LEU, the selected target dimension was usually in the range [20, 50], and the selected $C_{\text{resting-state}}$ and $C_{\text{n-back}}$ were usually in the range [1.2, 1.8].
Fig. 6:
Effect of parameters on the classification accuracy.
3.2). Comparison with other data fusion methods:
To further demonstrate the strength of the ADM, we compare it with the following other data fusion methods.
Concatenated DM I: concatenate all the features from two datasets into a single feature vector, and then apply the DM.
Concatenated DM II [57]: apply the DM to obtain low-dimensional embeddings of each dataset separately, and then concatenate the embeddings into a unified vector.
Kernel-sum DM [62]: add up the similarity matrices constructed from each dataset to get a unified similarity matrix W = W(1) + W(2), and then perform the remaining steps of the DM based on W.
Kernel-dot-product DM [26]: multiply the similarity matrices constructed from each dataset element by element to get a unified similarity matrix W = W(1) ∘ W(2), and then perform the remaining steps of the DM based on W.
For fair comparison, all experiments for the above methods were implemented with the same evaluation framework as the ADM. It turns out that the ADM with the LEU still achieved the highest accuracies among all the methods with all the different distances on SPD matrices. This demonstrates the effectiveness of applying the LEU to measure the similarities of FCNs. When using the CK and the LEU on SPD matrices, the ADM performed best compared with the other methods. For the EU, there were no substantial differences between the accuracies of the kernel-sum DM, the kernel-dot-product DM, and the ADM. Similar to the ADM, the kernel-sum and kernel-dot-product DM methods define a unified similarity matrix that sums or multiplies the pairwise similarities between subjects from each dataset, resulting in a better combination of complementary information from each dataset. Table III shows that both of them achieved better classification results than the concatenated methods (i.e., the concatenated DM I and the concatenated DM II). With the concatenated DM I, the classification accuracy was only 69.64%. The classification performance of the concatenated DM II was slightly better than that of the concatenated DM I. The poor classification performance of the concatenated methods may be largely ascribed to ignoring the mutual relations that exist between the datasets. This suggests that it is better to fuse heterogeneous datasets using kernel/similarity matrices rather than directly in the original feature space.
IV. Discussion
A. Most discriminative brain FCs
It can be seen that the ADM with the LEU achieved the best classification performance. Equivalently, the low-dimensional embeddings obtained by this method best characterized the underlying data structures associated with IQ variability. Therefore, the alternating diffusion distance, defined as the Euclidean distance in the low-dimensional embedding space, i.e., $\|z_i - z_j\|$ for each pair of subjects i and j, can provide a measure between subjects in terms of the common latent variables of interest extracted from the two sets of FCNs. Based on the alternating diffusion distance, we evaluated the discriminative power of the features (i.e., FCs) according to their Laplacian scores [63] as follows.
In each CV fold of the ADM with the LEU, we first learned the embeddings corresponding to the highest classification accuracy on the training set. We then constructed a k-nearest-neighbor graph with n nodes, where the i-th node corresponds to $z_i$. If $z_i$ is among the k nearest neighbors of $z_j$, or $z_j$ is among the k nearest neighbors of $z_i$, we put the edge weight
$$S_{i,j} = \exp\!\left(-\,\frac{\|z_i - z_j\|^2}{\gamma}\right), \tag{21}$$
with γ > 0 being a suitably chosen bandwidth constant; otherwise, we set $S_{i,j} = 0$. This graph structure nicely reflects the common manifold geometry of the data. Thus, the importance of a feature can be regarded as the degree to which it respects the graph structure.
Let $f_k(i)$ denote the k-th resting-state FC of the i-th subject, and let $\mathbf{f}_k = \big[f_k(1), f_k(2), \cdots, f_k(n)\big]^T$ collect it over the n subjects. The Laplacian score of the k-th resting-state FC is defined by
$$L_k = \frac{\sum_{i,j}\big(f_k(i) - f_k(j)\big)^2\, S_{i,j}}{\mathrm{Var}(\mathbf{f}_k)}, \tag{22}$$
where $\mathrm{Var}(\mathbf{f}_k)$ is the estimated variance of the k-th feature on the graph. By spectral graph theory, we compute $\mathrm{Var}(\mathbf{f}_k)$ as
$$\mathrm{Var}(\mathbf{f}_k) = \sum_{i=1}^{n}\big(f_k(i) - \mu_k\big)^2\, V_i, \tag{23}$$
where $\mu_k = \sum_{i} f_k(i)\,V_i \big/ \sum_{i} V_i$ and $V_i = \sum_{j} S_{i,j}$. The Laplacian score of the k-th n-back task FC is defined in the same way. Clearly, the smaller the Laplacian score, the better the feature. Since the Laplacian score of a feature differs across CV folds, we averaged the Laplacian scores of each feature over all CV folds and ranked the features by their averaged Laplacian scores in increasing order. We visualized the 100 resting-state and the 100 n-back task FCs with the smallest averaged Laplacian scores in Fig. 7, respectively. The majority of the selected FCs are located in the frontal, parietal, temporal, and occipital lobes.
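A schematic Python implementation of (21)–(23) on the learned embeddings is given below, with illustrative choices of the neighborhood size k and the bandwidth γ (the values used in the paper are not specified here).

```python
import numpy as np

def laplacian_scores(X, Z, k=5, gamma=1.0):
    """Laplacian scores (22) of the features X (n x #FCs) on the k-NN graph
    of the embeddings Z (n x d_hat); smaller scores mean better features."""
    n = len(Z)
    D2 = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    nn = np.argsort(D2, axis=1)[:, 1:k + 1]       # k nearest neighbors per node
    S = np.zeros((n, n))
    rows = np.repeat(np.arange(n), k)
    S[rows, nn.ravel()] = np.exp(-D2[rows, nn.ravel()] / gamma)  # edges (21)
    S = np.maximum(S, S.T)                        # symmetrize the k-NN graph
    V = S.sum(axis=1)                             # V_i = sum_j S_ij
    mu = X.T @ V / V.sum()                        # graph-weighted feature means
    Xc = X - mu                                   # centered features
    L = np.diag(V) - S                            # graph Laplacian
    num = 2.0 * np.einsum('ik,ij,jk->k', Xc, L, Xc)  # numerator of (22)
    var = (Xc ** 2 * V[:, None]).sum(axis=0)      # estimated variance (23)
    return num / var
```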
Fig. 7:
Visualization of the most discriminative 100 FCs for resting-state and n-back task FCNs, respectively. The upper are brain plots of functional graphs in anatomical space, where the selected FCs are represented as edges. The lower are matrix plots, where the rows and columns represent the cortical lobes: frontal (FRO), parietal (PAR), temporal (TEM), occipital (OCC), limbic (LIM), cerebellum (CER), and sub-lobar (SUB).
B. Future work and limitations
The free parameter tuning in the manifold learning methods, e.g., the kernel bandwidth and target dimension in the DM and the ADM in this paper, is crucial for classification. How to choose the optimal values for the free parameters remains an open and actively researched question. Although algorithms for automatic tuning of the optimal kernel bandwidth and target dimension in the DM and the ADM have been proposed in [28], they have been experimentally shown to be unsuitable for the datasets in this study. Therefore, in this paper we implemented grid search CV for parameter tuning. Note that Dudoit and van der Laan [64] have provided the asymptotic proof for choosing the tuning parameter with minimal CV error, which gives a theoretical basis for this approach.
In fMRI data analysis, not all ROIs are related to IQ differences; ideally, the ROIs would be filtered so as to retain only those that help to discriminate between high and low IQ. Therefore, feature selection could be performed to extract the most informative ROIs prior to constructing the FCNs in our proposed framework. We will investigate the effect of using different feature selection approaches on the classification performance in future work.
In line with recent studies [65], task fMRI data yield better predictions of IQ than resting-state fMRI data do. Furthermore, it has been shown in [66] that combining multiple different task fMRI datasets can significantly improve the predictive power for IQ, compared with using any single task fMRI dataset. Apart from the resting-state and n-back task fMRI datasets, there exist emotion task fMRI and single nucleotide polymorphism (SNP) datasets in the PNC. Therefore, another interesting and important direction of future work is to fuse the three neuroimaging datasets and the genomic dataset together by means of the ADM, which could capture more discriminative information and further improve the IQ classification performance.
V. Conclusion
In this paper, we considered a manifold based data fusion method (i.e., the ADM), by which the information from two datasets acquired by different sensors is diffused to extract the common information driving the phenomenon of interest, and simultaneously to reduce the sensor-specific nuisance. We tested the potential of the ADM for predicting IQ with the PNC data, resulting from a comprehensive study of brain development. Specifically, for each of the resting-state and n-back task fMRI datasets, we first represented each subject's FCN by an SPD matrix using the graphical LASSO. This results in two FCNs (or two SPD matrices), i.e., the resting-state and n-back task FCNs, for each subject. We next utilized the ADM to fuse the resting-state and n-back task FCNs and extract a meaningful low-dimensional representation. The obtained low-dimensional embeddings were used to train a linear kernel SVM classifier. The experimental results show that the prediction accuracy of the fused data by means of the ADM is higher than that of any single set of FCNs, and that the ADM also achieves superior classification performance in comparison with several other data fusion methods. Moreover, in the construction of the similarity matrices, we employed the log-Euclidean manifold based metric to measure the distance between SPD matrices. The effectiveness of incorporating it in the DM and the ADM was verified by comparative experiments with the Cholesky metric and the traditional Euclidean metric on SPD matrices.
Acknowledgment
The authors wish to thank the NIH (R01GM109068, R01MH104680, R01MH107354, R01AR059781, R01EB006841, R01EB005846, P20GM103472), and NSF (#1539067) for their partial support.
Contributor Information
Li Xiao, Department of Biomedical Engineering, Tulane University, New Orleans, LA 70118.
Julia M. Stephen, Mind Research Network, Albuquerque, NM 87106.
Tony W. Wilson, Department of Neurological Sciences, University of Nebraska Medical Center, Omaha, NE 68198.
Vince D. Calhoun, Mind Research Network, Albuquerque, NM 87106. Department of Electrical and Computer Engineering, University of New Mexico, Albuquerque, NM 87131.
Yu-Ping Wang, Department of Biomedical Engineering, Tulane University, New Orleans, LA 70118, (wyp@tulane.edu).
References
- [1].Calhoun VD and Adali T, “Feature-based fusion of medical imaging data,” IEEE Trans. Inf. Technol. Biomed, vol. 13, pp. 711–720, 2009. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [2].Lahat D, Adali T, and Jutten C, “Multimodal data fusion: An overview of methods, challenges, and prospects,” Proc. IEEE, vol. 103, no. 9, pp. 1449–1477, 2015. [Google Scholar]
- [3].Adali T, Levin-Schwartz Y, and Calhoun VD, “Multimodal data fusion using source separation: Application to medical imaging,” Proc. IEEE, vol. 103, no. 9, pp. 1494–1506, 2015. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [4].Calhoun VD and Sui J, “Multimodal fusion of brain imaging data: A key to finding the missing link(s) in complex mental illness,” Biol. Psych.: Cogn. Neurosci. Neuroimag, vol. 1, no. 3, pp. 230–244, 2016. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [5].Yokoya N, Grohnfeldt C, and Chanussot J, “Hyperspectral and multi-spectral data fusion: A comparative review of the recent literature,” IEEE Geosci. Remote Sens. Mag, vol. 5, no. 2, pp. 29–56, 2017. [Google Scholar]
- [6].Hotelling H, “Relations between two sets of variates,” Biometrika, vol. 28, no. 3/4, pp. 321–377, 1936. [Google Scholar]
- [7].Comon P, “Independent component analysis, a new concept,” Signal Process, vol. 36, no. 3, pp. 287–314, 1994. [Google Scholar]
- [8].Wold H, “Partial least squares,” in Encyclopedia of Statistical Sciences, New York:Wiley, vol. 6, pp. 581–591, 1985. [Google Scholar]
- [9].Parkhomenko E, Tritchler D, and Beyene J, “Sparse canonical correlation analysis with application to genomic data integration,” Stat. Appl. Genet. Mol. Biol, vol. 8, no. 1, pp. 1–34, 2009. [DOI] [PubMed] [Google Scholar]
- [10].Fang J, Lin D, Schulz SC, et al. , “Joint sparse canonical correlation analysis for detecting differential imaging genetics modules,” Bioinformatics, vol. 32, no. 22, pp. 3480–3488, 2016. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [11].Chun H and Keleş S, “Sparse partial least squares regression for simultaneous dimension reduction and variable selection,” J. of the Royal Stat. Soc.: Series B, vol. 72, no. 1, pp. 3–25, 2010. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [12].Kettenring JR, “Canonical analysis of several sets of variables,” Biometrika, vol. 58, no. 3, pp. 433–451, 1971. [Google Scholar]
- [13].Vergara VM, Ulloa A, Calhoun VD, et al. , “A three-way parallel ICA approach to analyze links among genetics, brain structure and brain function,” NeuroImage, vol. 98, pp. 386–394, 2014. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [14].Calhoun VD, Adali T, Kiehl KA, et al. , “A method for multitask fMRI data fusion applied to schizophrenia,” Hum. Brain Mapp, vol. 27, pp. 598–610, 2006. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [15].Sui J, Pearlson G, Caprihan A, et al. , “Discriminating schizophrenia and bipolar disorder by fusing fMRI and DTI in a multimodal CCA+joint ICA model,” NeuroImage, vol. 57, no. 3, pp. 839–855, 2011. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [16].Plis SM, Sui J, Lane T, et al. , “High-order interactions observed in multi-task intrinsic networks are dominant indicators of aberrant brain function in schizophrenia,” NeuroImage, vol. 102, pp. 35–48, 2014. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [17].Çetin MS, Christensen F, et al. , “Thalamus and posterior temporal lobe show greater inter-network connectivity at rest and across sensory paradigms in schizophrenia,” NeuroImage, vol. 97, pp. 117–126, 2014. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [18].Castro E, Gómez-Verdejo V, Martínez-Ramón M, et al. , “A multiple kernel learning approach to perform classification of groups from complex-valued fMRI data analysis: Application to schizophrenia,” NeuroImage, vol. 87, pp. 1–17, 2014. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [19].Lin YY, Liu TL, and Fuh CS, “Multiple kernel learning for dimensionality reduction,” IEEE Trans. Pattern Anal. Mach. intell, vol. 33, no. 6, pp. 1147–1160, 2011. [DOI] [PubMed] [Google Scholar]
- [20].Wang B, Mezlini AM, Demir F, et al. , “Similarity network fusion for aggregation data types on a genomic scale,” Nat. Methods, vol. 11, pp. 333–337, 2014. [DOI] [PubMed] [Google Scholar]
- [21] Deng S-P, Cao S, Huang D-S, and Wang Y-P, "Identifying stages of kidney renal cell carcinoma by combining gene expression and DNA methylation data," IEEE/ACM Trans. Comput. Biol. Bioinform., vol. 14, pp. 1147–1153, 2017.
- [22] Lindenbaum O, Yeredor A, Salhov M, and Averbuch A, "Multiview diffusion maps," arXiv preprint arXiv:1508.05550, 2015.
- [23] Lederman RR and Talmon R, "Learning the geometry of common latent variables using alternating-diffusion," Appl. Comput. Harmon. Anal., vol. 44, no. 3, pp. 509–536, 2018.
- [24] Katz O, Talmon R, Lo Y-L, and Wu H-T, "Alternating diffusion maps for multimodal data fusion," Inf. Fusion, vol. 45, pp. 346–360, 2019.
- [25] Shnitzer T, Ben-Chen M, Guibas L, Talmon R, and Wu H-T, "Recovering hidden components in multimodal data with composite diffusion operators," arXiv preprint arXiv:1808.07312, 2018.
- [26] Coifman RR and Lafon S, "Diffusion maps," Appl. Comput. Harmon. Anal., vol. 21, no. 1, pp. 5–30, 2006.
- [27] Ma Y, Niyogi P, Sapiro G, and Vidal R, "Dimensionality reduction via subspace and submanifold learning," IEEE Signal Process. Mag., vol. 28, no. 2, 2011.
- [28] Dov D, Talmon R, and Cohen I, "Kernel-based sensor fusion with application to audio-visual voice activity detection," IEEE Trans. Signal Process., vol. 64, no. 24, pp. 6406–6416, 2016.
- [29] Dov D, Talmon R, and Cohen I, "Sequential audio-visual correspondence with alternating diffusion kernels," IEEE Trans. Signal Process., vol. 66, no. 12, pp. 3100–3111, 2018.
- [30] Shnitzer T, Rapaport M, et al., "Alternating diffusion maps for dementia severity assessment," in Proc. IEEE ICASSP, pp. 831–835, 2017.
- [31] Liu G-R, Lo Y-L, Sheu Y-C, and Wu H-T, "Diffuse to fuse EEG spectra – intrinsic geometry of sleep dynamics for classification," arXiv preprint arXiv:1803.01710, 2018.
- [32] Qiu A, Lee A, Tan M, et al., "Manifold learning on brain functional networks in aging," Med. Image Anal., vol. 20, no. 1, pp. 52–60, 2015.
- [33] Ktena SI, Arslan S, and Rueckert D, "Gender classification and manifold learning on functional brain networks," in BIH Symposium on Big Data Initiatives for Connectomics Research, 2015.
- [34] Young J, Lei D, and Mechelli A, "Discriminative Log-Euclidean kernels for learning on brain networks," in International Workshop on Connectomics in Neuroimaging, 2017.
- [35] Friedman J, Hastie T, and Tibshirani R, "Sparse inverse covariance estimation with the graphical lasso," Biostatistics, vol. 9, no. 3, pp. 432–441, 2008.
- [36] Amico E and Goñi J, "The quest for identifiability in human functional connectomes," Sci. Rep., vol. 8, art. no. 8254, 2018.
- [37] Mišić B, Betzel RF, de Reus MA, et al., "Network-level structure-function relationships in human neocortex," Cerebral Cortex, vol. 26, pp. 3285–3296, 2016.
- [38] Liang H and Wang H, "Structure-function network mapping and its assessment via persistent homology," PLoS Comput. Biol., vol. 13, no. 1, e1005325, 2017.
- [39] Amico E and Goñi J, "Mapping hybrid functional-structural connectivity traits in the human connectome," Network Neuroscience, vol. 2, pp. 306–322, 2018.
- [40] Förstner W and Moonen B, "A metric for covariance matrices," in Geodesy – The Challenge of the 3rd Millennium, Berlin, Germany: Springer, pp. 299–309, 2003.
- [41] Sra S, "A new metric on the manifold of kernel matrices with application to matrix geometric mean," in Advances in Neural Information Processing Systems 25, pp. 144–152, 2012.
- [42] Zhang J, Zhou L, Wang L, and Li W, "Functional brain network classification with compact representation of SICE matrices," IEEE Trans. Biomed. Eng., vol. 62, pp. 1623–1634, 2015.
- [43] Arsigny V, Fillard P, Pennec X, and Ayache N, "Log-Euclidean metrics for fast and simple calculus on diffusion tensors," Magn. Reson. Med., vol. 56, pp. 411–421, 2006.
- [44] Jayasumana S, Hartley R, Salzmann M, et al., "Kernel methods on Riemannian manifolds with Gaussian RBF kernels," IEEE Trans. Pattern Anal. Mach. Intell., vol. 37, pp. 2464–2477, 2015.
- [45] Fowlkes C, Belongie S, Chung F, and Malik J, "Spectral grouping using the Nyström method," IEEE Trans. Pattern Anal. Mach. Intell., vol. 26, pp. 214–225, 2004.
- [46] Lindenbaum O, Yeredor A, and Cohen I, "Musical key extraction using diffusion maps," Signal Process., vol. 117, pp. 198–207, 2015.
- [47] Satterthwaite TD, Elliott MA, Ruparel K, et al., "Neuroimaging of the Philadelphia neurodevelopmental cohort," NeuroImage, vol. 86, pp. 544–553, 2014.
- [48] Satterthwaite TD, Connolly JJ, Ruparel K, et al., "The Philadelphia neurodevelopmental cohort: A publicly available resource for the study of normal and abnormal brain development in youth," NeuroImage, vol. 124, pp. 1115–1119, 2016.
- [49] Wilkinson GS and Robertson GJ, Wide Range Achievement Test 4 (WRAT4), Lutz, FL: Psychological Assessment Resources, 2006.
- [50] Noble KG, Tottenham N, and Casey BJ, "Neuroscience perspectives on disparities in school readiness and cognitive achievement," The Future of Children, vol. 15, pp. 71–89, 2005.
- [51] Deary IJ, Penke L, and Johnson W, "The neuroscience of human intelligence differences," Nat. Rev. Neurosci., vol. 11, pp. 201–211, 2010.
- [52] Schwarz G, "Estimating the dimension of a model," Ann. Statist., vol. 6, no. 2, pp. 461–464, 1978.
- [53] Zille P, Calhoun VD, Stephen JM, et al., "Fused estimation of sparse connectivity patterns from rest fMRI. Application to comparison of children and adult brains," IEEE Trans. Med. Imag., DOI: 10.1109/TMI.2017.2721640.
- [54] Ramezani M, Abolmaesumi P, Marble K, et al., "Fusion analysis of functional MRI data for classification of individuals based on patterns of activation," Brain Imaging Behav., vol. 9, no. 2, pp. 149–161, 2015.
- [55] Yang H, Liu J, Sui J, et al., "A hybrid machine learning method for fusing fMRI and genetic data: combining both improves classification of schizophrenia," Front. Hum. Neurosci., 2010.
- [56] Power JD, Cohen AL, et al., "Functional network organization of the human brain," Neuron, vol. 72, no. 4, pp. 665–678, 2011.
- [57] Keller Y, Coifman RR, Lafon S, and Zucker SW, "Audio-visual group recognition using diffusion maps," IEEE Trans. Signal Process., vol. 58, no. 1, pp. 403–413, 2010.
- [58] Xia M, Wang J, and He Y, "BrainNet Viewer: A network visualization tool for human brain connectomics," PLoS ONE, vol. 8, no. 7, e68910, 2013.
- [59] Chaddock-Heyman L, Weng TB, Kienzler C, et al., "Scholastic performance and functional connectivity of brain networks in children," PLoS ONE, vol. 13, no. 1, e0190073, 2018.
- [60] Song M, Zhou Y, Li J, et al., "Brain spontaneous functional connectivity and intelligence," NeuroImage, vol. 41, pp. 1168–1176, 2008.
- [61] Cole MW, Ito T, and Braver TS, "Lateral prefrontal cortex contributes to fluid intelligence through multinetwork connectivity," Brain Connectivity, vol. 5, pp. 497–504, 2015.
- [62] Zhou D and Burges CJC, "Spectral clustering and transductive learning with multiple views," in Proc. 24th Int. Conf. Mach. Learn., pp. 1159–1166, 2007.
- [63] He X, Cai D, and Niyogi P, "Laplacian score for feature selection," in Advances in Neural Information Processing Systems, pp. 507–514, 2006.
- [64] Dudoit S and van der Laan MJ, "Asymptotics of cross-validated risk estimation in estimator selection and performance assessment," Stat. Methodol., vol. 2, pp. 131–154, 2005.
- [65] Greene AS, Gao S, Scheinost D, and Constable RT, "Task-induced brain state manipulation improves prediction of individual traits," Nat. Commun., vol. 9, art. no. 2807, 2018.
- [66] Gao S, Greene AS, Constable RT, and Scheinost D, "Task integration for connectome-based prediction via canonical correlation analysis," in IEEE 15th International Symposium on Biomedical Imaging (ISBI), 2018.