Abstract
Brain imaging genetics studies the genetic basis of brain structures and functions by integrating genotypic data such as single nucleotide polymorphisms (SNPs) with imaging quantitative traits (QTs). In this area, both multi-task learning (MTL) and sparse canonical correlation analysis (SCCA) methods are widely used since they outperform independent and pairwise univariate analyses. However, MTL methods generally incorporate only a few QTs and cannot select features from multiple QTs, while SCCA methods typically employ a single modality of QTs to study its association with SNPs. Both MTL and SCCA become computationally expensive as the number of SNPs increases. In this paper, we propose a novel multi-task SCCA (MTSCCA) method to identify bi-multivariate associations between SNPs and multi-modal imaging QTs. MTSCCA makes use of the complementary information carried by different imaging modalities. It enforces sparsity at the group level via the G2,1-norm, and jointly selects features across multiple tasks for SNPs and QTs via the ℓ2,1-norm. A fast optimization algorithm is proposed using the grouping information of SNPs. Compared with conventional SCCA methods, MTSCCA obtains better correlation coefficients and canonical weight patterns. In addition, MTSCCA runs fast and is easy to implement, indicating its potential power in genome-wide, brain-wide imaging genetics.
Keywords: Brain Imaging Genetics, Sparse Canonical Correlation Analysis, Multi-Task Sparse Canonical Correlation Analysis
1. Introduction
Imaging genetics is an emerging and important topic which integrates genetic factors and neuroimaging phenotypic measurements in brain science. This integrative research combining diverse genetic and genomic data is expected to uncover the genetic basis of brain structures and functions, and further offers new opportunities to interpret the causal relationships between genetic variations and brain disorders such as Alzheimer's disease (AD) [1], [2]. Modern neuroimaging techniques, such as magnetic resonance imaging (MRI) and positron emission tomography (PET), image the morphometry and metabolic processes of the brain based on different mechanisms, and thus generate imaging data describing the brain from different perspectives. These multi-modal imaging data provide complementary information and have been demonstrated to offer a comprehensive understanding of brain structures, functions, and disorders [3]. Moreover, in biomedical studies, we usually face a huge number of genotyping biomarkers such as single nucleotide polymorphisms (SNPs), a type of high-resolution marker used in genome-wide association studies (GWAS). Therefore, developing fast and efficient GWAS-oriented imaging genetics methods which integrate multi-modal imaging data simultaneously is of great importance.
Multivariate learning methods are very popular in brain imaging genetics since both imaging data and genetic data are multivariate. Multi-task learning (MTL) techniques are of this kind and are widely used in brain imaging genetics [4], [5]. Generally, these methods choose a few important imaging QTs relevant to their aim as dependent variables and SNPs as independent variables. The joint effect of multi-locus genotypes on a few phenotypes is then studied. This paradigm can select SNPs that are simultaneously relevant to the candidate brain phenotypes. However, the brain comprises multiple regions, and using only a small proportion of them could lack power, since important information carried by the excluded cerebral components may be lost.
Although a brain-wide MTL model can be used, it is still insufficient since it cannot select relevant brain phenotypes from multiple cerebral components. Therefore, bi-multivariate methods have recently become increasingly popular in brain imaging genetics. Sparse canonical correlation analysis (SCCA) is such a technique, which usually identifies the relationship between two views of data with sparse output induced by different regularization techniques [6], [7], [8], [9], [10], [11], [12], [13], [14], [15]. These two-view SCCA methods have limited power since they only utilize one modality of imaging QTs. Given multi-modal imaging data, incorporating them together could exploit the information carried by different modalities and would help uncover interesting findings that a single modality cannot reveal. Therefore, jointly analyzing the relationship between genetic factors and all the imaging phenotypes from different modalities via one single integrative SCCA model is desirable and of great interest. Such an integrative model would help elucidate the shared mechanism of genetic factors on the brain.
One possible solution is multi-view SCCA modeling, which considers the pairwise relationships among all omics data involved. This multi-view SCCA is a naive extension of existing two-view SCCA models, and a three-view one has been introduced in [13]. It learns only a single canonical weight for the genetic loci, which is overly strict and thus cannot make full use of the complementary information embedded in the different modalities of imaging phenotypes.
Using brain-wide imaging QTs from multiple modalities, in this paper, we propose a Multi-Task learning based SCCA (MTSCCA) framework [16], [17] which can study bi-multivariate associations between these phenotypes and genotypes simultaneously. MTSCCA treats each SNP and QT as a feature, and then models the association between each imaging modality and SNPs as a learning task. Different from conventional SCCA methods, including both two-view and three-view methods, MTSCCA learns one canonical weight matrix for SNPs, in which each column vector corresponds to the canonical weight of one SCCA task. In contrast, only one canonical weight vector is associated with each imaging modality. To make the model practical, we take into consideration the group structure, such as linkage disequilibrium (LD) [18] in the human genome, via the group ℓ2,1-norm (G2,1-norm) [5] regularization. Joint individual feature selection for genetic and phenotypic markers is also taken into consideration via an ℓ2,1-norm constraint. In addition, we propose a fast and efficient optimization algorithm which is guaranteed to converge to a local optimum. We apply MTSCCA to a large real neuroimaging genetics data set from the Alzheimer's Disease Neuroimaging Initiative (ADNI) [19] cohort, with all SNPs on chromosome 19 and three different modalities of imaging QTs included. We intend to reveal the associations between these genetic markers and imaging phenotypes. Experimental results show that, compared with both two-view and multi-view SCCA methods, MTSCCA yields better canonical correlation coefficients and canonical weights. It also reports a compact set of SNPs and imaging QTs known to be associated with AD. Moreover, MTSCCA runs very fast and could be a powerful tool for genome-wide, brain-wide bi-multivariate association analysis.
2. Methodology
We denote scalars as italic letters, column vectors as boldface lowercase letters, and matrices as boldface capitals. For X = (xij), its i-th row is denoted as x^i and its j-th column as x_j, and Xi denotes the i-th matrix. ∥x∥2 denotes the Euclidean norm of a vector, and ∥X∥F denotes the Frobenius norm of a matrix.
2.1. Background
Let Xi and wi, i = 1,…, I, represent the data matrices and the corresponding canonical weights, respectively. Further, we use X1 to load the SNP data, and the remaining Xk's (k ≠ 1) to load the imaging QT data of each imaging modality separately. Then the conventional SCCA is defined as
(1)  min_{w1,…,wI} −∑_{i<j} wi⊤Xi⊤Xjwj + ∑_{i=1}^I Ω(wi),  s.t. ∥Xiwi∥2^2 = 1, ∀i
where Ω(wi) is the penalty function used to induce sparsity and thus select the features of interest. Many penalty functions have been studied in the literature, such as the Lasso (ℓ1-norm) [10], [13], [20], the group Lasso [11] and the graphical Lasso [9], [12]. Conventionally, we call it two-view SCCA (SCCA for short) when I = 2, and most existing studies fall into this category. Methods using three or more sets of data (I ≥ 3) are called multi-view or multi-set SCCA (mSCCA) [13]. Two-view SCCA only uses one modality of imaging QTs to study the genetic influence on brain functions or structures, while mSCCA learns only one canonical weight for the genetic data, which must be correlated with all imaging QTs simultaneously.
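As a quick numerical illustration (a numpy sketch with random placeholder data, not taken from the paper): when the columns of each Xi are centered and the constraint ∥Xiwi∥2 = 1 holds, the bilinear coupling term w1⊤X1⊤X2w2 in Eq. (1) is exactly the sample correlation between the two canonical variates X1w1 and X2w2.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, q = 50, 10, 8
X1 = rng.standard_normal((n, p))     # stand-in for SNP data
X2 = rng.standard_normal((n, q))     # stand-in for one modality of imaging QTs
X1 -= X1.mean(axis=0)                # center columns, as SCCA assumes
X2 -= X2.mean(axis=0)

w1 = rng.standard_normal(p)
w2 = rng.standard_normal(q)
# enforce the constraints ||X1 w1||_2 = ||X2 w2||_2 = 1
w1 /= np.linalg.norm(X1 @ w1)
w2 /= np.linalg.norm(X2 @ w2)

# the coupling term w1' X1' X2 w2 then equals the sample correlation
# between the canonical variates X1 w1 and X2 w2
obj = w1 @ X1.T @ X2 @ w2
corr = np.corrcoef(X1 @ w1, X2 @ w2)[0, 1]
```

With centered data and unit-norm variates, the normalizing denominators of the correlation coefficient are 1, so the objective term and the correlation coincide.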
2.2. MTSCCA
2.2.1. The MTSCCA Model
To distinguish from the notation in mSCCA, in this section, we use X ∈ ℝ^{n×p} to represent the genetic data with n participants and p SNPs, and Yj ∈ ℝ^{n×q} (j = 1, …, c) to represent the phenotype data with q imaging measurements, where c is the number of imaging modalities (tasks). Let U = [u1, …, uc] ∈ ℝ^{p×c} be the canonical weight matrix associated with X, and V = [v1, …, vc] ∈ ℝ^{q×c} be that associated with the imaging QTs, with each vj corresponding to Yj. We propose the novel multi-task based SCCA (MTSCCA) model as follows
(2)  min_{U,V} ∑_{j=1}^c −uj⊤X⊤Yjvj + Ω(U) + Ω(V),  s.t. ∥Xuj∥2^2 = 1, ∥Yjvj∥2^2 = 1, ∀j
Obviously, our model is distinct from mSCCA. First, MTSCCA employs the multi-task framework, which learns a series of related SCCA tasks together. This simultaneous learning has been empirically [21], [22] and theoretically [21], [23] shown to improve performance dramatically compared with learning each task independently [24]. Second, our model learns a canonical weight matrix U for SNPs, in which each column uj corresponds to an individual SCCA task. This is helpful since it does not require a unique canonical weight for SNPs to be associated with all modalities of imaging QTs at the same time. Third, MTSCCA learns one canonical weight corresponding to each imaging modality separately, so we do not need to calculate multiple canonical weights for a specific imaging modality. This helps the model focus on the identification of markers from the genetic data, making it quite suitable for imaging genetics analysis. Finally, our model scales well in terms of both modeling and computation. According to Eqs. (1)-(2), the number of tasks in MTSCCA equals the number of imaging modalities, implying a linear relationship; in contrast, the number of tasks in mSCCA increases quadratically with the number of imaging modalities, since it runs a CCA task between every pair of data sets, including the pairwise SCCA among imaging modalities.
2.2.2. Group-sparsity and Joint Individual Feature selection for SNPs
Since numerous SNPs inherently exhibit group structure in the genome, a realistic modeling method should take this information into consideration. In Eq. (2), a canonical weight matrix is associated with SNPs, and thus the conventional group Lasso which is used to penalize a vector cannot be employed directly. To tackle this issue, we use the G2,1-norm function [5] which is formulated as
(3)  ∥U∥G2,1 = ∑_{k=1}^K √(∑_{i∈gk} ∑_{j=1}^c uij^2) = ∑_{k=1}^K ∥U^{gk}∥F
where the SNPs are partitioned into K groups g1, …, gK, and U^{gk} is the submatrix of U whose rows correspond to group gk. This regularization penalizes the SNPs in the same group as a whole and is expected to estimate equal or similar coefficients for them. According to [5], this penalty has two major merits. First, it incorporates the group structural knowledge into the model by packaging all SNPs in the same group together. This makes the model practical because it is in accordance with the genetic mechanism. Second, it penalizes the canonical weight coefficients of a group of variables across all SCCA tasks jointly. This setup enables the individual tasks to mutually promote each other.
Although the G2,1-norm regularization is meaningful, it lacks feature selection at the individual level. Disease-related SNPs can hardly all be located in the same group. Generally, within a specific group, an individual variable could be relevant to the QTs while the remaining ones are irrelevant. Therefore, we also model this via the ℓ2,1-norm regularization, which is the Lasso regularization adapted for multi-task feature selection,
(4)  ∥U∥2,1 = ∑_{i=1}^p √(∑_{j=1}^c uij^2) = ∑_{i=1}^p ∥u^i∥2
Using both the G2,1-norm and ℓ2,1-norm regularizations, MTSCCA can not only select features at the group level in accordance with the biological knowledge, but also jointly select features at the individual level across all SCCA tasks.
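The two penalties can be computed directly from the weight matrix U. Here is a small numpy sketch (the example matrix and group partition are hypothetical):

```python
import numpy as np

def g21_norm(U, groups):
    """G2,1-norm: sum over groups g_k of the Frobenius norm of the
    submatrix U[g_k, :] (rows for the SNPs in group k, all tasks)."""
    return sum(np.linalg.norm(U[g, :], 'fro') for g in groups)

def l21_norm(U):
    """l2,1-norm: sum of the Euclidean norms of the rows of U,
    coupling each SNP's weights across all SCCA tasks."""
    return np.linalg.norm(U, axis=1).sum()

U = np.array([[3.0, 4.0],
              [0.0, 0.0],
              [1.0, 0.0]])
groups = [np.array([0, 1]), np.array([2])]   # two toy LD-like groups

g21 = g21_norm(U, groups)   # ||rows 0-1||_F + ||row 2||_F = 5 + 1 = 6
l21 = l21_norm(U)           # row norms 5 + 0 + 1 = 6
```

The G2,1-norm zeroes out whole groups across all tasks, while the ℓ2,1-norm additionally zeroes out individual rows (SNPs) within the surviving groups.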
2.2.3. Joint Individual Feature Selection Across Different Imaging Modalities
Apart from identifying risk genetic factors, identifying AD-risk imaging biomarkers is also of great concern. In this study, in addition to the canonical weight matrix for SNPs, MTSCCA learns one canonical weight for each imaging modality. With a large number of imaging features, a non-sparse result without feature selection makes the model complex and hard to interpret. Therefore, sparsity-inducing regularization is necessary for the imaging biomarkers too.
In the MTSCCA model, we use the ℓ2,1-norm function on the imaging QTs, i.e.
(5)  ∥V∥2,1 = ∑_{i=1}^q ∥v^i∥2
At first glance this is similar to the penalty used to jointly select individual features for SNPs, but it is employed here with a different motivation. Although collected by different imaging technologies, all modalities of imaging QTs are measured from the same brain and have been mapped onto the same brain atlas via segmentation and registration. Thus it is reasonable to assume equal or similar weights for those imaging QTs associated with the same brain area but attributed to different modalities. Therefore, the ℓ2-norm imposed on each row v^i penalizes the QTs from the same brain area but different modalities together, and the ℓ1-norm across rows is then utilized to select them jointly.
2.3. The Efficient Optimization Algorithm
Now we can write MTSCCA with the penalties explicitly exhibited, i.e.
(6)  min_{U,V} ∑_{j=1}^c −uj⊤X⊤Yjvj + β∥U∥G2,1 + λ1∥U∥2,1 + λ2∥V∥2,1,  s.t. ∥Xuj∥2^2 = 1, ∥Yjvj∥2^2 = 1, ∀j
In order to solve Eq. (6), we modify the loss function to
(7)  min_{U,V} ∑_{j=1}^c ∥Xuj − Yjvj∥2^2 + β∥U∥G2,1 + λ1∥U∥2,1 + λ2∥V∥2,1,  s.t. ∥Xuj∥2^2 = 1, ∥Yjvj∥2^2 = 1, ∀j
which is equivalent to the original one since ∥Xuj∥2^2 = 1 and ∥Yjvj∥2^2 = 1, ∀j. Then we write its Lagrangian
(8)  L(U, V) = ∑_{j=1}^c ∥Xuj − Yjvj∥2^2 + β∥U∥G2,1 + λ1∥U∥2,1 + λ2∥V∥2,1 + γ1 ∑_{j=1}^c (∥Xuj∥2^2 − 1) + γ2 ∑_{j=1}^c (∥Yjvj∥2^2 − 1)
where β, λ1, λ2, γ1 and γ2 are tuning parameters, and β, λ1 and λ2 are positive values which control the model sparsity. By dropping the constants, we further have
(9)  min_{U,V} ∑_{j=1}^c −2uj⊤X⊤Yjvj + β∥U∥G2,1 + λ1∥U∥2,1 + λ2∥V∥2,1 + γ1 ∑_{j=1}^c ∥Xuj∥2^2 + γ2 ∑_{j=1}^c ∥Yjvj∥2^2
from the point of view of optimization.
This problem is difficult to solve since the loss function is non-convex and the penalty functions are non-smooth. Fortunately, it is convex in U with V fixed, and it is convex in vj with the remaining vk (k ≠ j) and U fixed. On this account, we can solve the problem via the alternating update rule, which is widely used in the optimization community.
2.3.1. Updating U
We first show how to solve U with V fixed. Since all uj's are associated with X, they can be jointly calculated via the multi-task framework. Taking the derivative of Eq. (9) with respect to U and setting it to zero, we arrive at
(10)  −2X⊤Ŷ + 2βD̃U + 2λ1D1U + 2γ1X⊤XU = 0
where Ŷ = [Y1v1, …, Ycvc]; 2D̃U is the subgradient of ∥U∥G2,1 and 2D1U is that of ∥U∥2,1; D̃ is a block diagonal matrix whose k-th block is (1/(2∥U^{gk}∥F)) Ik, with Ik being an identity matrix of size equal to the k-th group; and D1 is a diagonal matrix with the i-th diagonal entry being 1/(2∥u^i∥2).
Then we can easily have
(11)  (βD̃ + λ1D1 + γ1X⊤X) U = X⊤Ŷ
and further
(12)  U = (βD̃ + λ1D1 + γ1X⊤X)^{−1} X⊤Ŷ
According to [5], this linear system in terms of U can be efficiently solved via an iterative algorithm that alternately updates D̃ and D1 first and then U. However, as the number of SNPs grows, this iterative algorithm remains computationally expensive.
A fast implementation:
The primary difficulty of the U-update is the calculation of the covariance matrix X⊤X when X has a large number of features. In this paper, we use an approximation method to assure a fast computation of X⊤X by making use of prior knowledge, i.e., the inherent structure of the SNPs within the genome. Fig. 1 illustrates the pairwise correlation coefficients and LD values in r^2 among a segment of SNPs at different loci from chromosome 19. The SNPs naturally form block structures along the diagonal, indicating a clear pattern of high intra-block correlation and low inter-block correlation. Since X is centered and normalized, X⊤X is the same as the pairwise correlation matrix shown in Fig. 1. This indicates that X⊤X holds a block diagonal structure too, and its off-block-diagonal elements are nearly zero, i.e., (X^{gk})⊤X^{gt} ≈ 0 (k ≠ t). In a word, the information of the covariance matrix is mainly carried by a series of block matrices along the diagonal. Most importantly, the sizes of these blocks are quite small compared with the original covariance matrix, owing to the fact that an LD block is usually much smaller than the total number of SNPs (pk ≪ p) in the human genome [25].
Fig. 1.
Illustration of the pairwise correlation coefficients and LD values (r^2 ≥ 0.2) of SNPs from chromosome 19 of the ADNI database. (1) The three subfigures above show the correlation coefficients r among 1,000, 5,000, and 13,000 SNPs. (2) The three subfigures below show the corresponding LD values. All subfigures show that SNPs clearly form groups and that the block diagonal structure persists as the number of SNPs increases.
This structure has been widely used to guide the recovery of group relationships among SNPs via the group Lasso [26] or the G2,1-norm [5]. However, these methods suffer from heavy computational burdens caused by the enormous number of SNPs, which could only be alleviated by artificially assuming that X⊤X is an identity matrix [6], [7], [13]. From Fig. 1, we know that the identity assumption inevitably loses the information carried by the blocks along the diagonal [11]. In this study, we not only make use of this grouping information to identify relationships among SNPs, but also explore a fast and easy-to-implement method to handle the computational issues.
Based on the analysis above, we propose that X⊤X can be computationally simplified by a series of blocks (X⊤X)^{gk} (abbreviated from (X^{gk})⊤X^{gk}) along the diagonal. We only omit the off-block-diagonal elements, which have little influence on the performance. Fig. 2 illustrates the approximation, where the off-block-diagonal elements are replaced by zeros. Clearly, the primary information of X⊤X is well preserved since we take the LD structure into consideration. Therefore, compared with methods using the identity assumption, our method preserves more information of the data and could be useful in identifying important genetic markers [11]. Most importantly, unlike methods computing X⊤X by brute force [11], we have a very fast implementation, which is supported by the following theorem.
Fig. 2.
Illustration of the simplified covariance matrix X⊤X, where X^{gk} and X^{gk+1} are two LD blocks, and (X^{gk})⊤X^{gk} is abbreviated as (X⊤X)^{gk}. Since the correlation between the two blocks is very low ((X^{gk})⊤X^{gk+1} ≈ 0 and (X^{gk+1})⊤X^{gk} ≈ 0), their covariance can be ignored.
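As a concrete illustration (a numpy sketch with toy data and hypothetical block boundaries, not taken from the paper), the following compares the full Gram matrix with the blockwise version that stores only the K small within-block matrices:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 200, 12
X = rng.standard_normal((n, p))
X -= X.mean(axis=0)
X /= np.linalg.norm(X, axis=0)       # center and normalize: X'X = correlations
groups = [np.arange(0, 4), np.arange(4, 8), np.arange(8, 12)]  # toy LD blocks

# brute force: the full p x p Gram matrix, O(n p^2) time, O(p^2) memory
full = X.T @ X

# simplified version: only the K small within-block Gram matrices,
# O(n * sum_k p_k^2) time and memory
blocks = [X[:, g].T @ X[:, g] for g in groups]

# each stored block equals the corresponding diagonal block of the full matrix
for g, B in zip(groups, blocks):
    assert np.allclose(full[np.ix_(g, g)], B)
```

With real genotype data, the off-block entries are only approximately zero, so dropping them is an approximation justified by the LD structure rather than an exact identity.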
Theorem 1. If X⊤X is a block diagonal matrix, Eq. (11) can be solved by
(13)  U = ⊕_{k=1}^K (βD̃^{gk} + λ1D1^{gk} + γ1(X⊤X)^{gk})^{−1} (X^{gk})⊤Ŷ
where D̃^{gk} is the k-th block of the block diagonal matrix D̃; D1^{gk} is the k-th block of D1; and ⊕ denotes the operation that concatenates matrices vertically.
Proof 1. Since SNPs exhibit group structures, we denote X = (X^{g1}, …, X^{gK}) with gk being the index set of the k-th group.
Then the covariance matrix X⊤X can be represented as X⊤X = diag((X⊤X)^{g1}, …, (X⊤X)^{gK}). We already know that D̃ and D1 are diagonal matrices, so they are also separable with respect to the groups. Then, according to Eq. (12), we have U = ⊕_{k=1}^K (βD̃^{gk} + λ1D1^{gk} + γ1(X⊤X)^{gk})^{−1} (X^{gk})⊤Ŷ, which completes the proof.
The advantages of this theorem are threefold. (1) The time complexity of Eq. (13) is O(n∑_k pk^2), compared with O(np^2) for Eq. (12), where pk is the size of the k-th group and ∑_k pk = p. This is a significant improvement because the LD block size is usually quite small, i.e., pk ≪ p. (2) Benefiting from the reduced computation, the memory requirement also drops considerably, because storing X⊤X is much more memory-expensive than storing several small blocks (X⊤X)^{gk}. (3) According to the proof, Eq. (13) is quite easy to implement, demonstrating that it is very promising for big imaging genetics analyses. This is one of the contributions of this study and might provide a powerful tool for genome-wide and brain-wide bi-multivariate analysis.
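To sanity-check the theorem numerically, here is a small numpy sketch (the synthetic SPD blocks stand in for βD̃^{gk} + λ1D1^{gk} + γ1(X⊤X)^{gk}, and the random right-hand side stands in for X⊤Ŷ; none of these values come from the paper): when the system matrix is exactly block diagonal, one full solve of Eq. (12) and K small per-block solves as in Eq. (13) give the same U.

```python
import numpy as np

rng = np.random.default_rng(2)
sizes = [3, 2, 4]          # hypothetical LD-block sizes p_k
c = 2                      # number of SCCA tasks (imaging modalities)
p = sum(sizes)

# build symmetric positive definite blocks and a block-partitioned RHS
blocks, rhs_blocks = [], []
for pk in sizes:
    A = rng.standard_normal((pk, pk))
    blocks.append(A @ A.T + pk * np.eye(pk))     # SPD system block
    rhs_blocks.append(rng.standard_normal((pk, c)))

# assemble the full block diagonal system matrix
M = np.zeros((p, p))
start = 0
for B in blocks:
    pk = B.shape[0]
    M[start:start + pk, start:start + pk] = B
    start += pk
R = np.vstack(rhs_blocks)                        # plays the role of X' Yhat

U_full = np.linalg.solve(M, R)                   # Eq. (12): one p x p solve
U_block = np.vstack([np.linalg.solve(B, r)       # Eq. (13): K small solves
                     for B, r in zip(blocks, rhs_blocks)])
```

The per-block route never materializes the p × p matrix, which is where the O(n∑_k pk^2) time and memory savings come from.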
2.3.2. Updating vj
Note that each vj is associated with its own Yj. This means that the vj's are not as closely coupled as the uj's and should be handled separately. Next we show how to solve vj with vk (k ≠ j) and U fixed. Based on Eq. (9), we take the derivative with respect to vj and set it to zero:
(14)  −2Yj⊤Xuj + 2λ2D2vj + 2γ2Yj⊤Yjvj = 0
which can be rewritten as
(15)  (λ2D2 + γ2Yj⊤Yj) vj = Yj⊤Xuj
i.e.
(16)  vj = (λ2D2 + γ2Yj⊤Yj)^{−1} Yj⊤Xuj
where D2 is a diagonal matrix with its i-th diagonal entry being 1/(2∥v^i∥2), and 2D2vj is the subgradient of ∥V∥2,1 with respect to vj. Therefore, each vj can also be solved through an alternating iterative algorithm.
Now that the building blocks for updating U and each individual vj are in place, we present the pseudocode in Algorithm 1.
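Since the pseudocode table did not survive extraction, the following is an illustrative numpy sketch of the alternating scheme (not the authors' reference implementation): it interleaves the closed-form U-update of Eq. (12) and the vj-update of Eq. (16), recomputing the diagonal matrices D̃, D1 and D2 from the current iterates, with γ1 = γ2 = 1 and a small eps guarding against division by zero; the blockwise speedup of Theorem 1 is omitted for brevity.

```python
import numpy as np

def mtscca(X, Ys, groups, beta, lam1, lam2, gamma1=1.0, gamma2=1.0,
           n_iter=50, eps=1e-8):
    """Alternating-update sketch of Algorithm 1.
    X: n x p SNP matrix; Ys: list of c imaging matrices (n x q each);
    groups: list of index arrays partitioning the p SNPs."""
    n, p = X.shape
    c, q = len(Ys), Ys[0].shape[1]
    U = np.ones((p, c)) / p
    V = np.ones((q, c)) / q
    XtX = X.T @ X
    for _ in range(n_iter):
        # --- update U (Eq. 12), with D-tilde and D1 from the current U ---
        Yhat = np.column_stack([Y @ V[:, j] for j, Y in enumerate(Ys)])
        d_tilde = np.empty(p)
        for g in groups:
            d_tilde[g] = 1.0 / (2 * np.linalg.norm(U[g, :], 'fro') + eps)
        d1 = 1.0 / (2 * np.linalg.norm(U, axis=1) + eps)
        M = gamma1 * XtX + np.diag(beta * d_tilde + lam1 * d1)
        U = np.linalg.solve(M, X.T @ Yhat)
        U /= np.linalg.norm(X @ U, axis=0)        # rescale: ||X u_j||_2 = 1
        # --- update each v_j (Eq. 16), D2 from the current V ---
        d2 = 1.0 / (2 * np.linalg.norm(V, axis=1) + eps)
        for j, Y in enumerate(Ys):
            Mv = gamma2 * (Y.T @ Y) + lam2 * np.diag(d2)
            vj = np.linalg.solve(Mv, Y.T @ (X @ U[:, j]))
            V[:, j] = vj / np.linalg.norm(Y @ vj)  # rescale: ||Y_j v_j||_2 = 1
    return U, V

# toy demo on random centered data (for illustration only)
rng = np.random.default_rng(0)
X = rng.standard_normal((30, 8)); X -= X.mean(axis=0)
Ys = [rng.standard_normal((30, 5)) for _ in range(2)]
Ys = [Y - Y.mean(axis=0) for Y in Ys]
U, V = mtscca(X, Ys, [np.arange(0, 4), np.arange(4, 8)], 0.1, 0.1, 0.1)
```

In a production version one would also monitor the objective of Eq. (9) and stop once successive iterates change by less than a tolerance, as discussed in the convergence analysis.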
2.4. Convergence Analysis
We have the following theorem for Algorithm 1.
Theorem 2. Algorithm 1 decreases the objective value of Eq. (9) in each iteration.
Proof 2. In order to prove this theorem, we need two essential conclusions: (1) Eq. (12) decreases the objective Eq. (9) in each iteration; and (2) Eq. (16) decreases the objective Eq. (9) in each iteration.
Algorithm 1. Algorithm to solve Eq. (9)
We first prove the conclusion (1). According to Eq. (12), we have
(17)  ∑_{j=1}^c −2(uj^{t+1})⊤X⊤Yjvj^t + γ1 ∑_{j=1}^c ∥Xuj^{t+1}∥2^2 + β tr((U^{t+1})⊤D̃^tU^{t+1}) + λ1 tr((U^{t+1})⊤D1^tU^{t+1}) ≤ ∑_{j=1}^c −2(uj^t)⊤X⊤Yjvj^t + γ1 ∑_{j=1}^c ∥Xuj^t∥2^2 + β tr((U^t)⊤D̃^tU^t) + λ1 tr((U^t)⊤D1^tU^t)
According to Lemma 1 in [5], for any nonzero vectors a and b it holds that ∥a∥2 − ∥a∥2^2/(2∥b∥2) ≤ ∥b∥2 − ∥b∥2^2/(2∥b∥2). Applying this inequality to Eq. (17) with respect to each group and each individual feature, we have
(18)  ∑_{j=1}^c −2(uj^{t+1})⊤X⊤Yjvj^t + γ1 ∑_{j=1}^c ∥Xuj^{t+1}∥2^2 + β∥U^{t+1}∥G2,1 + λ1∥U^{t+1}∥2,1 ≤ ∑_{j=1}^c −2(uj^t)⊤X⊤Yjvj^t + γ1 ∑_{j=1}^c ∥Xuj^t∥2^2 + β∥U^t∥G2,1 + λ1∥U^t∥2,1
which can be rewritten in matrix form as
(19)  −2 tr((U^{t+1})⊤X⊤Ŷ^t) + γ1∥XU^{t+1}∥F^2 + β∥U^{t+1}∥G2,1 + λ1∥U^{t+1}∥2,1 ≤ −2 tr((U^t)⊤X⊤Ŷ^t) + γ1∥XU^t∥F^2 + β∥U^t∥G2,1 + λ1∥U^t∥2,1
Here γ1∥XU^{t+1}∥F^2 and γ1∥XU^t∥F^2 both reduce to the constant cγ1, since each ∥Xuj∥2 has been normalized to 1, and hence they cancel. Thus the objective value is decreased in each iteration with regard to updating U. Similarly, we have the following inequality.
(20)  −2 tr((U^{t+1})⊤X⊤Ŷ^{t+1}) + γ2 ∑_{j=1}^c ∥Yjvj^{t+1}∥2^2 + λ2∥V^{t+1}∥2,1 ≤ −2 tr((U^{t+1})⊤X⊤Ŷ^t) + γ2 ∑_{j=1}^c ∥Yjvj^t∥2^2 + λ2∥V^t∥2,1
Now based on Eqs. (19)-(20), we have L(U^{t+1}, V^{t+1}) ≤ L(U^t, V^t), which completes the proof.
According to Eq. (9), the objective is lower bounded by 0, and thus iteratively decreasing the objective value will converge to a local optimum. The proposed algorithm runs very fast owing to (1) its closed-form solution for each update; and (2) the divide-and-conquer strategy supported by Theorem 1.
3. Results
3.1. Benchmarks and Experimental Setup
In order to evaluate the performance of the proposed multi-task SCCA method, we choose the closely related mSCCA [13] and the conventional two-view SCCA as benchmark methods. A common problem of two-view SCCA and mSCCA is that they suffer from heavy computational and memory requirements because they cannot handle the large covariance matrix calculation. To make the comparison possible, based on Theorem 1, we implement a fast two-view SCCA and a fast mSCCA. This yields the benchmark methods in this study and constitutes another contribution of this work.
All the methods contain parameters that should be fine-tuned before running the experiments. We apply nested 5-fold cross-validation in this work. Specifically, the tuning parameters are determined in the inner loop, where the group of parameters generating the highest mean testing correlation coefficient, i.e., corr(X−juj, Y−jvj) averaged over the inner folds, is chosen as optimal; here X−j and Y−j denote the j-th subset of the inner testing set, and uj and vj are the canonical weights estimated from the inner training set. Once determined, these parameters are used in the outer loop to generate the final results. Before tuning, we use a heuristic strategy to reduce the computational burden, since blindly tuning all parameters by grid search is computationally intensive. For all methods, γ1 and γ2 address the scaling issue when calculating the correlation coefficient. On this account, fixing the denominator to 1 or another constant only affects the magnitudes of U and V, while the relative relationship among the elements remains the same. For example, suppose u1,1 = 5 and u1,2 = 1 under one scaling; rescaling by a factor of 1/20 leads to u1,1 = 0.25 and u1,2 = 0.05, while rescaling by 1/2 leads to u1,1 = 2.5 and u1,2 = 0.5. This does not affect feature selection, as u1,1 will always be selected with higher priority than u1,2. Therefore, we set γ1 = γ2 = 1 in this paper. Generally, too-large parameters yield over-penalized results while too-small ones yield under-penalized results. To avoid this issue, we tune the remaining parameters β, λ1 and λ2 within the moderate range 10^i (i = −5, −4, ⋯, 0, ⋯, 4, 5) via grid search. Finally, to make the results stable, we repeat each experiment 100 times and report the average results. In the experiments, all methods are stopped when both ∥U^{t+1} − U^t∥ ≤ ϵ and ∥V^{t+1} − V^t∥ ≤ ϵ are satisfied, where ϵ is the tolerable error. We empirically set ϵ = 10^−5 in this paper.
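For concreteness, here is a minimal numpy sketch of the inner-loop selection criterion, under one plausible reading of the text: the mean absolute correlation between Xuj and Yjvj across the c tasks (the helper name and toy data below are illustrative, not the authors' code).

```python
import numpy as np

def mean_ccc(X, Ys, U, V):
    """Mean absolute canonical correlation coefficient over the c SCCA
    tasks -- a sketch of the inner-CV model-selection criterion."""
    ccs = [abs(np.corrcoef(X @ U[:, j], Y @ V[:, j])[0, 1])
           for j, Y in enumerate(Ys)]
    return float(np.mean(ccs))

# toy check: if Y v is proportional to X u, the CCC is exactly 1
rng = np.random.default_rng(4)
X = rng.standard_normal((20, 4))
u = rng.standard_normal(4)
v = rng.standard_normal(3)
Y = np.outer(X @ u, v)               # then Y @ v = (v . v) * (X @ u)
score = mean_ccc(X, [Y], u.reshape(-1, 1), v.reshape(-1, 1))
```

In the nested procedure, this score would be computed on the held-out inner folds for every (β, λ1, λ2) triple in the 10^i grid, and the best-scoring triple would be passed to the outer loop.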
3.2. Simulation Study
This section presents the comparison results on synthetic data. We generate four data sets with different numbers of samples and features, sparsity levels, and noise levels to ensure a thorough comparison. The first three data sets are generated using the same ground truth but with different noise strengths; for each of them, X and Yj (j = 1, 2) have n = 80, p = 120, q1 = 100 and q2 = 100. These data sets help show the performance under different noise levels. The fourth data set is created to assess the performance in a high-dimensional situation, with n = 500, p = 2,000, q1 = 1,000 and q2 = 1,000. The details of each data set are described as follows.
Data 1: We first specify the sparse ground-truth canonical weights u, v1 and v2. Then we generate a random latent vector μ of length n and normalize it to unit length. The data matrix X is created by xℓ,i ~ N(μℓui, σx), where σx = 5 denotes the noise strength. Similarly, Yj is created by (yℓ,i)j ~ N(μℓvi,j, σyj) with σy1 = 5 and σy2 = 5.
Data 2 - Data 3: These two data sets are created with the same ground truth as the first one but with different noise levels, i.e., σx = σy1 = σy2 = 1 for Data 2 and σx = σy1 = σy2 = 0.1 for Data 3. Therefore, the true correlation coefficients of the three data sets differ: the first data set has the lowest and the third the highest.
Data 4: In this data set, the ground-truth weights u, v1 and v2 are specified analogously, and σx = σy1 = σy2 = 0.1. The data matrix X is created by xℓ,i ~ N(μℓui, σx), and Yj is generated by (yℓ,i)j ~ N(μℓvi,j, σyj), with the random latent vector μ of length n.
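The generation scheme for Data 1 can be sketched as follows (a numpy sketch; the sparse ground-truth supports chosen below are hypothetical placeholders, since the paper's exact u and v values are not reproduced here):

```python
import numpy as np

rng = np.random.default_rng(3)
n, p, q = 80, 120, 100
sigma_x = sigma_y = 5.0                      # noise strength, as in Data 1

# hypothetical sparse ground-truth weights (illustrative supports only)
u = np.zeros(p); u[10:20] = 1.0
v = np.zeros(q); v[30:45] = 1.0

mu = rng.standard_normal(n)
mu /= np.linalg.norm(mu)                     # latent vector, unit length

# x_{l,i} ~ N(mu_l * u_i, sigma_x), and likewise for each Y_j
X = mu[:, None] * u[None, :] + sigma_x * rng.standard_normal((n, p))
Y = mu[:, None] * v[None, :] + sigma_y * rng.standard_normal((n, q))
```

Data 2-4 follow the same recipe with different noise levels and, for Data 4, larger dimensions.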
We first show the training and testing canonical correlation coefficients (CCCs) in Table 1. On the first three data sets, all methods obtain a good score when the true CC is high, while they perform poorly (overfit) when the true CC is excessively low due to the high proportion of noise. MTSCCA identifies the highest training CCCs among the three methods, i.e., two-view SCCA, mSCCA and MTSCCA. This demonstrates that MTSCCA performs better than the two single-task based SCCA methods. On Data 4, we observe that MTSCCA obtains higher training and testing CCCs than two-view SCCA and mSCCA in this high-dimensional setting. This indicates that, owing to the multi-task modeling strategy, the ability to identify bi-multivariate associations can be improved.
TABLE 1.
Performance comparison on synthetic data. Training and testing canonical correlation coefficients (mean±std) of 5-fold cross-validation are shown for SCCA, mSCCA and MTSCCA. The best values are shown in boldface.
| | SCCA (training) | mSCCA (training) | MTSCCA (training) | SCCA (testing) | mSCCA (testing) | MTSCCA (testing) |
|---|---|---|---|---|---|---|
| Data 1 | 0.28±0.06 | 0.40±0.10 | **0.99±0.00** | **0.25±0.14** | 0.16±0.10 | 0.23±0.16 |
| Data 2 | 0.59±0.06 | 0.49±0.06 | **0.63±0.06** | 0.31±0.15 | 0.25±0.19 | **0.41±0.18** |
| Data 3 | 0.95±0.01 | 0.95±0.01 | **0.96±0.01** | 0.91±0.04 | 0.91±0.05 | **0.95±0.03** |
| Data 4 | 0.89±0.02 | 0.89±0.02 | **0.99±0.00** | 0.85±0.05 | 0.85±0.06 | **0.97±0.01** |
In addition, the feature selection ability is also of great interest and importance. In Fig. 3, we show the scatter plots of the estimated u and vj's. For two-view SCCA, each uj is calculated independently from each single-task SCCA, and u is obtained by averaging the uj's. The u of MTSCCA is also obtained by averaging the uj's associated with the three SCCA tasks. There are two estimated vj's for all methods, and we show them separately in Fig. 4. In order to present the performance clearly, the ground truths are also shown in the figures (first row). Within each subfigure, the horizontal axis represents the indices and the vertical axis represents the weight values. A feature with a larger canonical weight (in absolute value) contributes more to the bi-multivariate correlation. We observe that no method can find the correct signal locations on the first data set, owing to its low signal-to-noise ratio. Considering the first three data sets together, the performance of all methods improves from the first data set to the third. MTSCCA yields the canonical profiles most consistent with the ground truths, showing its better feature selection performance than the two-view and multi-view SCCA. On Data 4, where the feature dimensionality is high, MTSCCA always identifies the correct signal locations. To make the comparison more formal, Tables 2 and 3 show the sensitivity and specificity of the canonical weights u and vj's. Both metrics are calculated as follows. Features are selected based on their absolute weight values: the larger the ∣ui∣ (or ∣vi∣), the more relevant the feature is to the canonical correlation. Generally, features with values above a predefined threshold are selected; however, it is hard to predefine an appropriate threshold. To overcome this issue, in this paper, we treat the K features with the largest absolute weights as selected, where K is the number of non-zero features of the ground truth; the sensitivity is then calculated as TP/(TP + FN), and the specificity as TN/(TN + FP).
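One plausible reading of this threshold-free procedure, as a numpy sketch (the toy weight vectors are illustrative):

```python
import numpy as np

def sens_spec(w_est, w_true):
    """Threshold-free selection: the K features with the largest |w_est|
    are taken as selected, K = number of nonzeros in the ground truth."""
    K = int(np.count_nonzero(w_true))
    selected = np.zeros(w_true.shape, dtype=bool)
    selected[np.argsort(-np.abs(w_est))[:K]] = True
    truth = w_true != 0
    tp = np.sum(selected & truth)            # true positives
    tn = np.sum(~selected & ~truth)          # true negatives
    sens = tp / truth.sum()                  # TP / (TP + FN)
    spec = tn / (~truth).sum()               # TN / (TN + FP)
    return sens, spec

w_true = np.array([0, 0, 1, 1, 0, 0.5, 0, 0])
w_est  = np.array([0.1, 0, 0.9, 0.8, 0, 0.05, 0.3, 0])
sens, spec = sens_spec(w_est, w_true)
```

Here K = 3, the top-3 estimated features are indices {2, 3, 6}, and the true support is {2, 3, 5}, giving sensitivity 2/3 and specificity 4/5.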
The results show that all methods obtain good sensitivity and specificity across these simulated data sets. MTSCCA performs slightly better than the single-task based SCCA methods owing to the multi-task modeling strategy. It is worth noting that, in their original implementations, both two-view SCCA and multi-view SCCA fail since they cannot handle the large matrix calculation on the same platform as MTSCCA. By incorporating Theorem 1, the two methods become feasible for high-dimensional data sets. The runtime of each method is shown in Table 4, and there is no significant difference among the methods based on Theorem 1. This again demonstrates the effectiveness and practicality of our fast implementation strategy.
Fig. 3.
Canonical weights u (mean value) estimated on synthetic data. The first row is the ground truth, and each remaining row corresponds to an SCCA method: (1) Two-view SCCA, (2) mSCCA (Multi-view SCCA), (3) MTSCCA (Multi-task SCCA). In each subfigure, the horizontal axis represents the indices of each u, and the vertical axis represents the estimated weight value.
Fig. 4.
Canonical weights V (mean value) estimated on synthetic data. The first row is the ground truth, and each remaining row corresponds to an SCCA method: (1) Two-view SCCA, (2) mSCCA (Multi-view SCCA), (3) MTSCCA (Multi-task SCCA). In each subfigure, the horizontal axis represents the indices of vj (j = 1, 2), and the vertical axis represents the estimated weight value.
TABLE 2.
Comparison of the sensitivity of canonical weights on synthetic data.
| | u (Data 1) | u (Data 2) | u (Data 3) | u (Data 4) | v1 (Data 1) | v1 (Data 2) | v1 (Data 3) | v1 (Data 4) | v2 (Data 1) | v2 (Data 2) | v2 (Data 3) | v2 (Data 4) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| SCCA | 0.25 | 0.45 | 1.00 | 1.00 | 0.88 | 1.00 | 0.85 | 0.99 | 0.44 | 0.56 | 0.84 | 1.00 |
| mSCCA | 0.20 | 0.45 | 1.00 | 1.00 | 0.60 | 0.84 | 0.96 | 1.00 | 0.76 | 0.92 | 0.96 | 1.00 |
| MTSCCA | 0.05 | 0.55 | 1.00 | 1.00 | 0.32 | 0.56 | 1.00 | 1.00 | 0.32 | 0.64 | 1.00 | 1.00 |
TABLE 3.
Comparison of the specificity of canonical weights on synthetic data.
| | u (Data 1) | u (Data 2) | u (Data 3) | u (Data 4) | v1 (Data 1) | v1 (Data 2) | v1 (Data 3) | v1 (Data 4) | v2 (Data 1) | v2 (Data 2) | v2 (Data 3) | v2 (Data 4) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| SCCA | 0.85 | 0.89 | 1.00 | 1.00 | 0.77 | 0.83 | 0.96 | 1.00 | 0.81 | 0.85 | 0.95 | 1.00 |
| mSCCA | 0.84 | 0.89 | 1.00 | 1.00 | 0.87 | 0.95 | 0.99 | 1.00 | 0.92 | 0.97 | 0.99 | 1.00 |
| MTSCCA | 0.81 | 0.91 | 1.00 | 1.00 | 0.77 | 0.85 | 1.00 | 1.00 | 0.77 | 0.88 | 1.00 | 1.00 |
TABLE 4.
Runtime comparison of synthetic data.
| | SCCA | mSCCA | MTSCCA |
|---|---|---|---|
| Data 1 | 0.19±0.24 | 0.19±0.24 | 0.19±0.23 |
| Data 2 | 0.15±0.16 | 0.16±0.18 | 0.18±0.22 |
| Data 3 | 0.11±0.18 | 0.17±0.18 | 0.13±0.15 |
| Data 4 | 1.49±5.58 | 2.59±5.52 | 2.59±5.86 |
In summary, this simulation study using data sets with diverse characteristics demonstrates that MTSCCA is effective in bi-multivariate association identification with multiple data modalities. Moreover, MTSCCA identifies the canonical loading profiles most consistent with the ground truth compared to the single-task SCCA methods. In addition, the study also reveals that the group structure can not only help promote the identification performance, but also reduce the runtime in high-dimensional, multi-modal bi-multivariate association analysis.
3.3. Real Neuroimaging Genetics Study
The genotyping and brain imaging data used in this article were obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu). One primary goal of ADNI has been to test whether serial magnetic resonance imaging (MRI), positron emission tomography (PET), other biological markers, and clinical and neuropsychological assessment can be combined to measure the progression of mild cognitive impairment (MCI) and early Alzheimer’s disease (AD). For up-to-date information, see www.adni-info.org.
The neuroimaging data were from 755 non-Hispanic Caucasian participants, including 281 AD, 292 MCI and 182 healthy control (HC) subjects. They were 18F-florbetapir PET (AV45) scans, fluorodeoxyglucose PET (FDG) scans, and structural MRI scans downloaded from the ADNI database (adni.loni.usc.edu). Details of this data set are exhibited in Table 5. The multi-modal imaging data were acquired at the same visit for each participant. The structural MRI scans were processed with voxel-based morphometry (VBM) via SPM [27]. Specifically, all scans were aligned to a T1-weighted template image, segmented into gray matter (GM), white matter (WM) and cerebrospinal fluid (CSF) maps, normalized to the standard Montreal Neurological Institute (MNI) space as 2×2×2 mm3 voxels, and smoothed with an 8 mm FWHM kernel. The FDG-PET and AV45-PET scans were also registered to the same MNI space by SPM. We then parcellated the whole brain into 116 regions of interest (ROIs) based on the MarsBaR automated anatomical labeling (AAL) atlas and generated ROI-level measurements: the mean gray matter density for structural MRI, the mean amyloid deposition for AV45 scans, and the mean glucose utilization for FDG scans. Using the regression weights derived from the healthy control participants, these imaging measures were pre-adjusted to remove the effects of baseline age, gender, education, and handedness.
TABLE 5.
Participant characteristics.
HC | MCI | AD | |
---|---|---|---|
Num | 182 | 292 | 281 |
Gender(M/F, %) | 47.16/52.84 | 54.52/45.48 | 47.37/52.63 |
Handedness(R/L, %) | 90.91/9.09 | 87.35/12.65 | 91.50/8.50 |
Age (mean±std) | 72.97±6.00 | 71.81±7.62 | 72.38±7.31 |
Education (mean±std) | 16.52±2.58 | 15.97±2.78 | 16.14±2.78 |
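The covariate pre-adjustment described above (estimating regression weights on healthy controls only, then removing the covariate-explained part from everyone) can be sketched as follows. This is a minimal illustration with synthetic data; the function and variable names are ours, not from the paper.

```python
import numpy as np

def adjust_covariates(qt, cov, hc_mask):
    """Remove covariate effects from imaging QTs using regression
    weights estimated on the healthy-control subjects only."""
    # design matrix: intercept + covariates
    X = np.column_stack([np.ones(len(cov)), cov])
    # least-squares fit on the HC subjects only
    beta, *_ = np.linalg.lstsq(X[hc_mask], qt[hc_mask], rcond=None)
    # subtract the covariate-explained part (the intercept/mean is kept)
    return qt - cov @ beta[1:]

# toy usage: 10 subjects, 3 ROIs, 2 covariates (e.g. age and education)
rng = np.random.default_rng(0)
qt = rng.normal(size=(10, 3))
cov = rng.normal(size=(10, 2))
hc_mask = np.array([True] * 5 + [False] * 5)
qt_adj = adjust_covariates(qt, cov, hc_mask)
```

After adjustment, the residual QTs of the HC subjects are uncorrelated with the covariates, while patient data are corrected using the same HC-derived weights.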
The genotyping data of the same population were downloaded from the LONI website. Participants were genotyped using the Human 610-Quad or OmniExpress Array (Illumina, Inc., San Diego, CA, USA), and the data were preprocessed using standard quality control (QC) and imputation steps. The QC criteria for the SNP data included (1) call rate check per subject and per SNP marker, (2) gender check, (3) sibling pair identification, (4) the Hardy-Weinberg equilibrium test, (5) marker removal by minor allele frequency, and (6) population stratification. In a second pre-processing step, the missing genotypes of the quality-controlled SNPs were imputed using the MaCH software [28]. Among all human chromosomes, chromosome 19 contains the largest number of genes, with a gene density more than double the genome-wide average [29], [30]. In addition, this chromosome includes well-known AD risk genes such as APOE, TOMM40 and ABCA7. Therefore, a bi-multivariate association study between this chromosome and whole-brain imaging markers could be of great interest, and has the potential to yield interesting AD risk factors. As a result, all 152,787 SNPs from chromosome 19 were included in this study. Among this enormous number of SNPs, most might be irrelevant to AD, while only a few could be relevant by influencing the intermediate brain imaging measurements. The aim is to identify this small subset of SNPs on chromosome 19 that correlates with imaging markers and AD.
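Two of the listed QC steps, the per-SNP call-rate check and minor-allele-frequency filtering, can be sketched on an additively coded genotype matrix. The coding, thresholds, and toy data below are illustrative assumptions, not taken from the paper's pipeline.

```python
import numpy as np

def qc_filter(geno, maf_thr=0.05, call_thr=0.95):
    """Keep SNPs passing call-rate and minor-allele-frequency checks.
    geno: subjects x SNPs matrix, additively coded 0/1/2, NaN = missing."""
    # fraction of non-missing genotypes per SNP
    call_rate = 1.0 - np.isnan(geno).mean(axis=0)
    # allele frequency from non-missing genotypes; MAF = min(p, 1 - p)
    p = np.nanmean(geno, axis=0) / 2.0
    maf = np.minimum(p, 1.0 - p)
    keep = (call_rate >= call_thr) & (maf >= maf_thr)
    return geno[:, keep], keep

# toy usage: SNP 0 is common, SNP 1 is monomorphic, SNP 2 has low call rate
geno = np.array([
    [0, 0, 1], [1, 0, 2], [2, 0, np.nan], [1, 0, 0], [0, 0, 1],
    [1, 0, np.nan], [2, 0, 2], [1, 0, 1], [0, 0, 0], [1, 0, 1],
], dtype=float)
filtered, keep = qc_filter(geno)
```

Only the first SNP survives: the monomorphic SNP fails the MAF check and the third SNP fails the call-rate check.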
3.4. Improved Bi-multivariate Association
In this subsection we evaluate the proposed method in identifying the bi-multivariate associations between one genetic data set and three sets of imaging phenotypes. Thus there are three pairs of associations, which we denote as SNPs-AV45, SNPs-FDG and SNPs-VBM for ease of description. The proposed MTSCCA learns the three SCCA tasks jointly and generates one canonical weight matrix U for SNPs and one canonical weight vector vj for each of AV45, FDG and VBM. We then calculate three canonical correlation coefficients (CCCs) for SNPs-AV45, SNPs-FDG and SNPs-VBM separately. The two-view SCCA naturally yields three CCCs for these three tasks. Although mSCCA learns only one canonical weight vector for SNPs, we use it three times to generate three CCCs with respect to the three tasks.
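For a given pair of learned weights, each CCC is simply the Pearson correlation between the genetic composite Xu and the imaging composite Yv. A minimal sketch with synthetic data (the data-generating setup here is ours, purely for illustration):

```python
import numpy as np

def ccc(X, Y, u, v):
    """Canonical correlation coefficient between the SNP composite X @ u
    and the imaging-QT composite Y @ v (e.g. for the SNPs-AV45 pair)."""
    return np.corrcoef(X @ u, Y @ v)[0, 1]

# toy usage: a shared latent factor induces a strong bi-multivariate association
rng = np.random.default_rng(1)
z = rng.normal(size=100)                               # shared latent signal
X = np.outer(z, [1.0, 0.0]) + 0.1 * rng.normal(size=(100, 2))
Y = np.outer(z, [0.0, 1.0]) + 0.1 * rng.normal(size=(100, 2))
u = np.array([1.0, 0.0])                               # weight on the signal column
v = np.array([0.0, 1.0])
r = ccc(X, Y, u, v)                                    # close to 1 by construction
```

In cross-validation, the weights are estimated on the training fold and this correlation is then evaluated on the held-out fold to obtain the testing CCC.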
Fig. 5 shows the CCCs of the SNP data with each imaging QT data set, where the CCCs estimated from SNPs-AV45, SNPs-FDG and SNPs-VBM are shown separately. In this figure, both the training and testing CCCs, as well as their standard deviations (SD), are presented. By varying the number of selected features (10, 20, ⋯ , 100 in this work) for both SNPs and imaging QTs, the CCCs are generated and the curves plotted. It is clear that the proposed MTSCCA obtains higher CCCs on both training and testing sets across all imaging modalities, except for the training results of SNPs-VBM. On investigation, this could be because the two-view SCCA overfits there, since it yields high training CCCs and quite low testing CCCs simultaneously. We also observe that mSCCA always obtains the lowest CCCs on both training and testing sets across the three tasks on this data. This is interesting as it seems counterintuitive: more data (three different sets of imaging QTs here) ought to provide more information. The reason might be attributed to its modeling strategy. Demanding that one set of features (SNPs) be associated with three sets of features (imaging QTs) simultaneously could be overly strict and thus harm the performance. This is also why two-view SCCA generally yields better CCCs than mSCCA.
Fig. 5.
Performance comparison: The mean and standard deviation (SD) of the canonical correlation coefficients (CCCs) obtained from 5-fold cross-validation trials are plotted, where each error bar indicates ±0.5SD. The subtitle SNPs-AV45 means the CCCs are calculated between the SNPs data and the AV45-PET data.
In addition, we calculate the p-values between our method and the two competing methods and show them in Table 6, where the ‘−’ in parentheses indicates that MTSCCA loses. The p-values all reach the significance level, which means that our method is significantly better than both competing methods. These results in terms of CCCs indicate that the proposed joint bi-multivariate learning method indeed has better association identification capability than both the two-view and multi-view SCCA methods. Table 7 shows the runtime in seconds of each method, where the runtime of the two-view SCCA is the sum over its three two-view tasks. The runtime results indicate that all three methods run fast on this large data set. This is attributed to the grouping strategy used in the implementation according to Theorem 1. In contrast, in their original implementations, both competing methods are incapacitated since they cannot manipulate a big matrix with hundreds of thousands of features. This again confirms our contribution of accelerating both our method and the conventional methods by making use of the grouping structure.
TABLE 6.
The p-values of t-tests for CCC comparison between MTSCCA and the two competing methods (Two-view SCCA and mSCCA). The ‘−’ in parentheses means that MTSCCA loses on this trial.
SNPs-AV45 | SNPs-FDG | SNPs-VBM | |
---|---|---|---|
Training | |||
Two-view SCCA | 5.46E-24 | 3.39E-25 | 6.00E-15 (−) |
mSCCA | 7.98E-27 | 1.51E-27 | 4.77E-18 |
Testing | |||
Two-view SCCA | 1.46E-23 | 8.60E-43 | 4.99E-24 |
mSCCA | 3.71E-27 | 3.91E-31 | 4.80E-22 |
TABLE 7.
Runtime comparison (in seconds), with the mean±SD presented.
Runtime (seconds) | ||
---|---|---|
Two-view SCCA | mSCCA | MTSCCA |
342±0.37 | 114±0.30 | 361±0.93 |
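The significance comparison in Table 6 amounts to a paired t-test on the per-fold CCCs of two methods. A sketch is below; the fold values are fabricated for illustration, and SciPy's `ttest_rel` is one standard choice (the paper does not state its exact test implementation).

```python
import numpy as np
from scipy import stats  # SciPy assumed available

# per-fold testing CCCs of two methods (made-up numbers for illustration)
ccc_mtscca = np.array([0.30, 0.32, 0.31, 0.29, 0.33])
ccc_mscca = np.array([0.20, 0.21, 0.23, 0.18, 0.22])

# paired (dependent-sample) t-test across the cross-validation folds
t_stat, p_value = stats.ttest_rel(ccc_mtscca, ccc_mscca)
```

A paired test is appropriate here because both methods are evaluated on the same cross-validation folds, so the fold-to-fold variation cancels in the differences.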
3.5. Genetic Marker Selection
Apart from the CCCs, the selected features in terms of SNPs are a major concern, since they can help reveal the SNPs that are highly related to imaging QTs and AD status at the same time. We show the top ten selected SNPs according to the canonical weight values of each method. To make the selection results stable for MTSCCA, we average its canonical weight matrix into a vector and then choose the top ten SNPs based on their absolute values. The top ten markers of the two-view SCCA method are calculated by averaging its three separate canonical weight vectors. Those of mSCCA are obtained directly from its canonical weight vector. The selected SNPs are shown in Table 8. Owing to the joint learning paradigm, the proposed MTSCCA yields a remarkably meaningful result with respect to the selected SNPs. As expected, the notable AD risk marker rs429358 gains the highest weight value, and all of the remaining nine SNPs of MTSCCA, i.e. rs56131196 (APOC1), rs12721051 (APOC1), rs4420638 (APOC1), rs111789331 (4.5 kb from APOC1), rs66626994 (5.6 kb from APOC1), rs146275714 (PVRL2), rs41289512, rs147711004 (71 kb from APOE) and rs10119 (TOMM40), have been reported to increase the risk of AD in previous studies [31], [32], [33]. This indicates the ability of MTSCCA to identify meaningful SNPs from massive genetic markers. The two-view SCCA also identifies rs429358 as its most important SNP, and five of its other SNPs (rs10414043, rs147711004, rs7256200, rs73052335 and rs66626994) have previously been reported as AD-related. However, it also identifies four SNPs that have not been reported so far and thus require further investigation. The mSCCA performs unacceptably in this comparison since it fails to identify rs429358. Moreover, except for the marker rs623264, none of the SNPs identified by mSCCA has been reported to date. In summary, the results in terms of selected SNPs show that MTSCCA performs better than both competing methods. This reveals that MTSCCA could be a suitable and very helpful tool for discovering meaningful genetic markers in very large-scale scenarios.
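The averaging-and-ranking step used to build the MTSCCA column of Table 8 can be sketched as follows. The SNP identifiers below are placeholders, not real markers.

```python
import numpy as np

def top_snps(U, snp_ids, k=10):
    """Average the canonical weight matrix U (n_snps x n_tasks) across
    the SCCA tasks and return the k SNPs with the largest absolute
    averaged weight."""
    w = np.abs(U.mean(axis=1))          # task-averaged weight magnitude
    order = np.argsort(w)[::-1][:k]     # indices of the k largest values
    return [snp_ids[i] for i in order]

# toy usage: 3 SNPs x 3 tasks, with the first SNP clearly dominant
U = np.array([[0.90, 0.80, 0.95],
              [0.05, 0.00, 0.10],
              [0.20, 0.10, 0.15]])
top2 = top_snps(U, ["snpA", "snpB", "snpC"], k=2)
```

The same ranking applied to a single averaged vector covers the two-view SCCA case, and applied to mSCCA's single weight vector directly.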
TABLE 8.
Top ten SNPs selected by integrated canonical weights.
Two-view SCCA | mSCCA | MTSCCA |
---|---|---|
rs429358 | rs138339429 | rs429358 |
rs10414043 | rs141300647 | rs56131196 |
rs147711004 | rs58501143 | rs12721051 |
rs146291812 | rs17363184 | rs4420638 |
rs623264 | rs623264 | rs111789331 |
rs7256200 | rs11881833 | rs66626994 |
rs186235601 | rs7253576 | rs146275714 |
rs73052335 | rs1749316 | rs41289512 |
rs66626994 | rs139402102 | rs147711004 |
rs415966 | rs4605289 | rs10119 |
3.6. Brain Imaging Marker Selection
Besides the genetic markers, as a bi-multivariate method, MTSCCA also selects features from the multiple imaging QTs. Fig. 6 presents the canonical weights of every method on each imaging modality (AV45, FDG and VBM) across the five trials. We observe that all the imaging markers with nonzero coefficients have been shown to be associated with the progression of AD. To make this clear, we show the top ten selected QTs of each imaging modality for MTSCCA in Table 9. Five markers (the right angular gyrus, the left posterior cingulum cortex, the left hippocampus, the left olfactory cortex and the vermis 8) are reported in all three modalities, owing to the joint feature selection via the ℓ2,1-norm regularization. Most importantly, these markers have all been independently documented to be related to AD in the literature. For example, a significant reduction of glucose metabolism in the right angular gyrus has been observed in aging-associated cognitive decline (AACD) patients [34]. Declined metabolism in the left posterior cingulum cortex is an early sign of Alzheimer’s disease [35]. This brain tract is also connected to the hippocampus, whose change is a notable sign of AD and MCI [36], [37]. The remaining markers, the left olfactory cortex [38] and the vermis 8 [39], have been separately validated as reflections of AD or MCI. These results indicate that MTSCCA can identify meaningful imaging QT markers that are associated with the status of dementia. The mSCCA also identifies a few AD-related markers such as the hippocampus. The results of the two-view method are scattered and thus lack biological meaning. To summarize, the proposed MTSCCA not only obtains higher CCCs than conventional SCCA methods, but also yields better canonical weights for both SNPs and imaging QTs. The top ten selected SNPs and imaging QTs are highly correlated with each other, as well as with AD status, which demonstrates that MTSCCA could be very promising in brain imaging genetics.
Fig. 6.
Comparison of canonical weights in terms of each imaging modality across five trials. Each row corresponds to an SCCA method: (1) Two-view SCCA; (2) mSCCA; (3) MTSCCA. Within each panel, there are three rows corresponding to the three types of imaging QTs, i.e. AV45, FDG and VBM.
TABLE 9.
Top ten imaging QTs selected by canonical weights of each imaging modality of MTSCCA.
AV45 | FDG | VBM |
---|---|---|
Frontal_Med_Orb_Left | Cingulum_Post_Left | Postcentral_Left |
Angular_Right | Angular_Right | Precentral_Left |
Cingulum_Post_Left | Hippocampus_Left | Angular_Right |
Hippocampus_Left | Vermis_8 | Cingulum_Post_Left |
Olfactory_Left | Angular_Left | Vermis_8 |
Frontal_Mid_Right | Amygdala_Left | Thalamus_Right |
Cingulum_Ant_Left | Olfactory_Left | Rolandic_Oper_Right |
Rolandic_Oper_Right | Temporal_Mid_Right | Frontal_Med_Orb_Left |
Temporal_Mid_Right | Precentral_Left | Hippocampus_Left |
Vermis_8 | Temporal_Mid_Left | Olfactory_Left |
4. Conclusion
High-throughput genotyping and neuroimaging techniques provide us with large amounts of biomedical data, and identifying their bi-multivariate associations is important. In this paper, we have proposed a novel multi-task sparse canonical correlation analysis (MTSCCA) framework and applied it to imaging genetics with multi-modal brain imaging QTs. Different from existing SCCA methods, MTSCCA can incorporate multiple modalities of imaging data into a single integrative model. Furthermore, MTSCCA is a multiple bi-multivariate method and thus has better modeling capability than both SCCA and MTL regression. A fast optimization algorithm is proposed which avoids calculating the large covariance matrix and its inverse. The algorithm is guaranteed to converge to a local optimum, and runs very fast even with hundreds of thousands of features involved.
We compared MTSCCA with the conventional two-view and multi-view SCCA on an ADNI cohort. Our method obtained better performance than the benchmarks, with higher correlation coefficients and clearer canonical weight patterns. MTSCCA succeeds in identifying a small set of SNPs from the enormous number of genetic markers on chromosome 19. It is worth noting that all top ten SNPs selected by MTSCCA are AD risk factors. In addition, the canonical weight patterns of the imaging QTs were also meaningful: the identified imaging QTs are highly correlated with AD or MCI. These promising results demonstrate that the proposed multi-task SCCA framework could be a powerful tool in big brain imaging genetics. Since GWAS-based bi-multivariate analysis is of great interest, in future work we will continue investigating the merits of MTSCCA and apply it to genome-wide brain-wide imaging analysis.
Acknowledgments
Data collection and sharing for this project was funded by the Alzheimer‘s Disease Neuroimaging Initiative (ADNI) (National Institutes of Health Grant U01 AG024904) and DOD ADNI (Department of Defense award number W81XWH-12-2-0012). ADNI is funded by the National Institute on Aging, the National Institute of Biomedical Imaging and Bioengineering, and through generous contributions from the following: AbbVie, Alzheimer’s Association; Alzheimer’s Drug Discovery Foundation; Araclon Biotech; BioClinica, Inc.; Biogen; Bristol-Myers Squibb Company; CereSpir, Inc.; Cogstate; Eisai Inc.; Elan Pharmaceuticals, Inc.; Eli Lilly and Company; EuroImmun; F. HoffmannLa Roche Ltd and its affiliated company Genentech, Inc.; Fujirebio; GE Healthcare; IXICO Ltd.; Janssen Alzheimer Immunotherapy Research & Development, LLC.; Johnson & Johnson Pharmaceutical Research & Development LLC.; Lumosity; Lundbeck; Merck & Co., Inc.; Meso Scale Diagnostics, LLC.; NeuroRx Research; Neurotrack Technologies; Novartis Pharmaceuticals Corporation; Pfizer Inc.; Piramal Imaging; Servier; Takeda Pharmaceutical Company; and Transition Therapeutics. The Canadian Institutes of Health Research is providing funds to support ADNI clinical sites in Canada. Private sector contributions are facilitated by the Foundation for the National Institutes of Health (www.fnih.org). The grantee organization is the Northern California Institute for Research and Education, and the study is coordinated by the Alzheimer’s Therapeutic Research Institute at the University of Southern California. ADNI data are disseminated by the Laboratory for Neuro Imaging at the University of Southern California.
This work was supported by the National Natural Science Foundation of China [61973255, 61602384]; NSF in Shaanxi Province of China [2017JQ6001]; CPSF [2017M613202]; PSF of Shaanxi [2017BSHEDZZ81]; and FR-FCU [3102018zy029] at Northwestern Polytechnical University. This work was also supported by the National Institutes of Health [R01 EB022574, R01 LM011360, R01 AG063481, U01 AG024904, P30 AG10133, R01 AG19771]; and the National Science Foundation [IIS 1837964] at University of Pennsylvania and Indiana University.
Biography
Lei Du received the Ph.D. degree in computer science from School of the Electronic and Information Engineering, Xi’an Jiaotong University, Xi’an, China, in 2013. Currently, he is an assistant professor in School of Automation, Northwestern Polytechnical University, Xi’an, China. His research interests include brain imaging genetics, bioinformatics, machine learning and big data mining.
Kefei Liu is a postdoctoral research associate of the Department of Biostatistics, Epidemiology and Informatics at the University of Pennsylvania. He received B.Sc. in mathematics from Wuhan University and Ph.D. in electronic engineering from the City University of Hong Kong. He is interested in developing machine learning and statistical methods for the analysis of large-scale heterogeneous biological data.
Xiaohui Yao received a B.S. degree in Computer Science and Technology from Qing Dao University, an M.S. degree in Computer Software and Theory from University of Science and Technology of China, and a Ph.D. degree in Bioinformatics from Indiana University. She is a Postdoctoral Fellow in the Department of Biostatistics, Epidemiology and Informatics at University of Pennsylvania. Her research interests include imaging genetics, multidimensional data mining, systems biology and information visualization.
Shannon L. Risacher received a B.S. degree in Psychology from Indiana University-Purdue University Indianapolis, and a Ph.D. degree in Medical Neuroscience from Indiana University School of Medicine. She is an Assistant Professor of Radiology and Imaging Sciences at Indiana University School of Medicine. Her main research interest is in identifying biomarkers for early detection of Alzheimer’s disease pathology before clinical symptoms.
Junwei Han received his Ph.D. degree in pattern recognition and intelligent systems from the School of Automation, Northwestern Polytechnical University, Xi’an, China, in 2003. He is currently a professor in the School of Automation, Northwestern Polytechnical University. His research interests include computer vision and multimedia processing.
Andrew J. Saykin received a B.A. degree in Psychology from University of Massachusetts Amherst, and an M.S. degree in Clinical Psychology and a Psy.D. degree in Clinical Neuropsychology from Hahnemann Medical College. He is the Raymond C. Beeler Professor of Radiology and Professor of Medical and Molecular Genetics at Indiana University School of Medicine. His expertise is in the areas of multimodal neuroimaging research, human genetics, and neuropsychology/cognitive neuroscience. He has a longstanding interest in the structural, functional, and molecular substrates of cognitive deficits in Alzheimer’s disease, cancer, brain injury, schizophrenia, and other neurological and neuropsychiatric disorders.
Lei Guo received B.S., M.S., and Ph.D. degrees in 1982, 1986, and 1993, respectively. He is currently a professor of pattern recognition at Northwestern Polytechnical University, Xi’an, China. His research interests include computer vision, image processing, image segmentation, object detection and tracking.
Li Shen received a B.S. degree from Xi’an Jiao Tong University, an M.S. degree from Shanghai Jiao Tong University, and a Ph.D. degree from Dartmouth College, all in Computer Science. He is a Professor of Informatics at University of Pennsylvania Perelman School of Medicine. He is an elected fellow of the American Institute for Medical and Biological Engineering (AIMBE). His research interests include medical image computing, bioinformatics, machine learning, brain imaging genomics, and big data science in biomedicine.
Footnotes
Data used in preparation of this article were obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu). As such, the investigators within the ADNI contributed to the design and implementation of ADNI and/or provided data but did not participate in analysis or writing of this report. A complete listing of ADNI investigators can be found at: http://adni.loni.usc.edu/wp-content/uploads/how_to_apply/ADNI_Acknowledgement_List.pdf.
Contributor Information
Lei Du, School of Automation, Northwestern Polytechnical University, Xi’an 710072, China.
Kefei Liu, Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA 19104, USA.
Xiaohui Yao, Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA 19104, USA.
Shannon L. Risacher, Department of Radiology and Imaging Sciences, Indiana University School of Medicine, Indianapolis, IN 46202, USA.
Junwei Han, School of Automation, Northwestern Polytechnical University, Xi’an 710072, China.
Andrew J. Saykin, Department of Radiology and Imaging Sciences, Indiana University School of Medicine, Indianapolis, IN 46202, USA.
Lei Guo, School of Automation, Northwestern Polytechnical University, Xi’an 710072, China.
Li Shen, Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA 19104, USA.
References
- [1].Saykin AJ, Shen L, Yao X, Kim S, Nho K, and et al. , “Genetic studies of quantitative MCI and AD phenotypes in ADNI: Progress, opportunities, and plans,” Alzheimer’s & Dementia, vol. 11, no. 7, pp. 792–814, 2015. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [2].Shen L, Thompson PM, Potkin SG, Bertram L, Farrer LA, and et al. , “Genetic analysis of quantitative phenotypes in ad and mci: imaging, cognition and biomarkers,” Brain Imaging and Behavior, vol. 8, no. 2, pp. 183–207, 2014. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [3].Mueller SG, Weiner MW, Thal LJ, Petersen RC, Jack C, Jagust W, Trojanowski JQ, Toga AW, and Beckett L, “The alzheimer’s disease neuroimaging initiative,” Neuroimaging Clinics of North America, vol. 15, no. 4, pp. 869–877, 2005. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [4].Lee S, Zhu J, and Xing EP, “Adaptive multi-task lasso: with application to eqtl detection,” in NIPS, 2010, pp. 1306–1314. [Google Scholar]
- [5].Wang H, Nie F, Huang H, Kim S, Nho K, Risacher SL, Saykin AJ, and Shen L, “Identifying quantitative trait loci via group-sparse multitask regression and feature selection: an imaging genetics study of the ADNI cohort,” Bioinformatics, vol. 28, no. 2, pp. 229–237, 2012. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [6].Chen J, Bushman FD, Lewis JD, Wu GD, and Li H, “Structure-constrained sparse canonical correlation analysis with an application to microbiome data analysis,” Biostatistics, vol. 14, no. 2, pp. 244–258, 2013. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [7].Chen X and Liu H, “An efficient optimization algorithm for structured sparse cca, with applications to eqtl mapping,” Statistics in Biosciences, vol. 4, no. 1, pp. 3–26, 2012. [Google Scholar]
- [8].Lin D, Calhoun VD, and Wang YP, “Correspondence between fMRI and SNP data by group sparse canonical correlation analysis,” Medical Image Analysis, 2013. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [9].Du L, Huang H, Yan J, Kim S, Risacher SL, Inlow M, Moore JH, Saykin AJ, and Shen L, “Structured sparse canonical correlation analysis for brain imaging genetics: An improved graphnet method,” Bioinformatics, vol. 32, no. 10, pp. 1544–1551, 2016. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [10].Du L, Liu K, Zhang T, Yao X, Yan J, Risacher SL, Han J, Guo L, Saykin AJ, and Shen L, “A novel SCCA approach via truncated ℓ1-norm and truncated group lasso for brain imaging genetics,” Bioinformatics, vol. 34, no. 2, pp. 278–285, 2018. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [11].Du L, Yan J, Kim S, Risacher SL, Huang H, Inlow M, Moore JH, Saykin AJ, and Shen L, “A novel structure-aware sparse learning algorithm for brain imaging genetics,” in MICCAI, 2014, pp. 329–336. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [12].Du L, Zhang T, Liu K, Yan J, Yao X, Risacher SL, Saykin AJ, Han J, Guo L, and Shen L, “Identifying associations between brain imaging phenotypes and genetic factors via a novel structured scca approach,” in IPMI. Springer, 2017, pp. 543–555. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [13].Witten DM and Tibshirani RJ, “Extensions of sparse canonical correlation analysis with applications to genomic data,” Statistical Applications in Genetics and Molecular Biology, vol. 8, no. 1, pp. 1–27, 2009. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [14].Wilms I and Croux C, “Sparse canonical correlation analysis from a predictive point of view,” Biometrical Journal, vol. 57, no. 5, pp. 834–851, 2015. [DOI] [PubMed] [Google Scholar]
- [15].Mai Q and Zhang X, “An iterative penalized least squares approach to sparse canonical correlation analysis,” Biometrical Journal, vol. 57, no. 5, pp. 834–851, 2019. [DOI] [PubMed] [Google Scholar]
- [16].Du L, Liu K, Yao X, Risacher SL, Han J, Guo L, Saykin AJ, and Shen L, “Fast multi-task SCCA learning with feature selection for multi-modal brain imaging genetics,” in BIBM. IEEE, 2018, pp. 356–361. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [17].Du L, Liu K, Zhu L, Yao X, Risacher SL, Guo L, Saykin AJ, and Shen L, “Identifying progressive imaging genetic patterns via multi-task sparse canonical correlation analysis: a longitudinal study of the adni cohort,” Bioinformatics, vol. 35, no. 14, pp. i474–i483, 2019. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [18].Pritchard JK and Przeworski M, “Linkage disequilibrium in humans: Models and data,” American Journal of Human Genetics, vol. 69, no. 1, pp. 1–14, 2001. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [19].Weiner MW, Aisen PS, Jack CR, Jagust WJ, Trojanowski JQ, Shaw L, Saykin AJ, Morris JC, Cairns N, Beckett LA, Toga AW, Green RC, Walter S, Soares H, Snyder PJ, Siemers E, Potter WZ, Cole PE, and Schmidt ME, “The Alzheimer’s Disease Neuroimaging Initiative: Progress report and future plans,” Alzheimer’s & Dementia, vol. 6, no. 3, pp. 202–211, 2010. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [20].Parkhomenko E, Tritchler D, and Beyene J, “Sparse canonical correlation analysis with application to genomic data integration,” Statistical Applications in Genetics and Molecular Biology, vol. 8, no. 1, pp. 1–34, 2009. [DOI] [PubMed] [Google Scholar]
- [21].Ando RK and Zhang T, “A framework for learning predictive structures from multiple tasks and unlabeled data,” Journal of Machine Learning Research, vol. 6, pp. 1817–1853, 2005. [Google Scholar]
- [22].Bakker B and Heskes T, “Task clustering and gating for bayesian multitask learning,” Journal of Machine Learning Research, vol. 4, pp. 83–99, 2003. [Google Scholar]
- [23].Bendavid S and Schuller R, “Exploiting task relatedness for multiple task learning,” COLT, pp. 567–580, 2003. [Google Scholar]
- [24].Argyriou A, Evgeniou T, and Pontil M, “Multi-task feature learning.” NIPS, vol. 73, no. 3, pp. 41–48, 2006. [Google Scholar]
- [25].Rosenfeld JA, Mason CE, and Smith TM, “Limitations of the human reference genome for personalized genomics.” PLOS One, vol. 7, no. 7, 2012. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [26].Yuan M and Lin Y, “Model selection and estimation in regression with grouped variables,” Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol. 68, no. 1, pp. 49–67, 2006. [Google Scholar]
- [27].Ashburner J and Friston KJ, “Voxel-based morphometry–the methods,” NeuroImage, vol. 11, no. 6, pp. 805–21, 2000. [DOI] [PubMed] [Google Scholar]
- [28].Li Y, Willer CJ, Ding J, Scheet P, and Abecasis GR, “MaCH: using sequence and genotype data to estimate haplotypes and unobserved genotypes,” Genetic Epidemiology, vol. 34, no. 8, pp. 816–34, 2010. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [29].Grimwood J, Gordon LA, Olsen A, Terry A, Schmutz J, Lamerdin J, Hellsten U, Goodstein D, Couronne O, Tran-Gyamfi M et al. , “The dna sequence and biology of human chromosome 19,” Nature, vol. 428, no. 6982, p. 529, 2004. [DOI] [PubMed] [Google Scholar]
- [30].Venter JC, “The sequence of the human genome,” Science, vol. 292, no. 5523, pp. 1838–1838, 2001. [Google Scholar]
- [31].Gao L, Cui Z, Shen L, and Ji H-F, “Shared genetic etiology between type 2 diabetes and alzheimer’s disease identified by bioinformatics analysis,” Journal of Alzheimer’s Disease, vol. 50, no. 1, pp. 13–17, 2016. [DOI] [PubMed] [Google Scholar]
- [32].Zhou X, Chen Y, Mok KY, Kwok TC, Mok VC, Guo Q, Ip FC, Chen Y, Mullapudi N, Giusti-Rodríguez P et al. , “Non-coding variability at the apoe locus contributes to the alzheimer’s risk,” Nature communications, vol. 10, no. 1, p. 3310, 2019. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [33].Davies G, Armstrong N, Bis JC, Bressler J, Chouraki V, Giddaluru S, Hofer E, Ibrahim-Verbaas CA, Kirin M, Lahti J et al. , “Genetic contributions to variation in general cognitive function: a meta-analysis of genome-wide association studies in the charge consortium (n= 53 949),” Molecular Psychiatry, vol. 20, no. 2, p. 183, 2015. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [34].Hunt A, Schnknecht P, Henze M, Seidl U, Haberkorn U, and Schrder J, “Reduced cerebral glucose metabolism in patients at risk for alzheimer’s disease,” Psychiatry Research Neuroimaging, vol. 155, no. 2, pp. 147–154, 2007. [DOI] [PubMed] [Google Scholar]
- [35].Nakao T, Radua J, Rubia K, and Mataix-Cols D, “Gray matter volume abnormalities in adhd: voxel-based meta-analysis exploring the effects of age and stimulant medication.” American Journal of Psychiatry, vol. 168, no. 11, pp. 1154–1163, 2011. [DOI] [PubMed] [Google Scholar]
- [36].Delano-Wood L, Stricker NH, Sorg SF, Nation DA, Jak AJ, Woods SP, Libon DJ, Delis DC, Frank LR, and Bondi MW, “Posterior cingulum white matter disruption and its associations with verbal memory and stroke risk in mild cognitive impairment,” Journal of Alzheimer’s Disease, vol. 29, no. 3, pp. 589–603, 2012. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [37].Frisoni GB, Ganzola R, Canu E, Rub U, Pizzini FB, Alessandrini F, Zoccatelli G, Beltramello A, Caltagirone C, and Thompson PM, “Mapping local hippocampal changes in alzheimer’s disease and normal ageing with mri at 3 tesla,” Brain, vol. 131, no. 12, pp. 3266–3276, 2008. [DOI] [PubMed] [Google Scholar]
- [38].Vasavada MM, Wang J, Eslinger PJ, Gill DJ, Sun X, Karunanayaka P, and Yang QX, “Olfactory cortex degeneration in alzheimer’s disease and mild cognitive impairment,” Journal of Alzheimer’s Disease, vol. 45, no. 3, pp. 947–58, 2015. [DOI] [PubMed] [Google Scholar]
- [39].Sjobeck M and Englund E, “Alzheimer’s disease and the cerebellum: a morphologic study on neuronal and glial changes,” Dementia and Geriatric Cognitive Disorders, vol. 12, no. 3, pp. 211–218, 2001. [DOI] [PubMed] [Google Scholar]