Author manuscript; available in PMC: 2024 Apr 3.
Published in final edited form as: IEEE/ACM Trans Comput Biol Bioinform. 2023 Apr 3;20(2):1137–1146. doi: 10.1109/TCBB.2022.3172289

Joint Sparse Collaborative Regression on Imaging Genetics Study of Schizophrenia

Xueli Song a,*, Rongpeng Li a, Kaiming Wang a,*, Yuntong Bai b, Yuzhu Xiao a, Yu-ping Wang b
PMCID: PMC10321021  NIHMSID: NIHMS1888753  PMID: 35503837

Abstract

Imaging genetics generates large amounts of high-dimensional, multi-modal data that provide complementary information for the comprehensive study of schizophrenia, a complex mental disease. At the same time, the variety of these data in structure, resolution, and format makes their integrative study a forbidding task. In this paper, we propose a novel model called Joint Sparse Collaborative Regression (JSCoReg), which can extract class-specific features from different health conditions/disease classes. We first evaluate feature selection performance in a simulation experiment using the receiver operating characteristic (ROC) curve and the area under the ROC curve (AUC). We demonstrate that JSCoReg achieves higher accuracy than similar models, including Joint Sparse Canonical Correlation Analysis (JSCCA) and Sparse Collaborative Regression (SCoReg). We then apply JSCoReg to a schizophrenia dataset collected by the Mind Clinical Imaging Consortium. JSCoReg enables us to better identify biomarkers associated with schizophrenia, which are verified to be both biologically and statistically significant.

Keywords: Collaborative Regression, Canonical Correlation Analysis, Feature selection, fMRI, SNP

1. Introduction

Schizophrenia (SZ) is a complex mental disorder potentially caused by the interaction between genetic factors and environmental influences [1]. Genetic variants are considered possible risk factors for SZ, and a large number of studies have focused on identifying genes that confer susceptibility to SZ [2, 3]. Besides genetic studies, brain imaging has been widely used in SZ research to identify brain function disorders [4, 5]. Although unimodal (genetic or brain imaging) studies can discern SZ biomarkers from a single point of view, they have difficulty capturing potential links between genetic variants and brain imaging [6, 7]. Multi-view study has many advantages over unimodal study for identifying meaningful biomarkers [8]. Therefore, combining genetic and imaging data with multivariate approaches to explore biomarkers has become a significant and challenging problem [9].

Although imaging genetics is an emerging field, many advanced multivariate methods have been proposed and successfully used by the research community. Early studies based on the Canonical Correlation Analysis (CCA) model performed pairwise analysis between genetic and brain imaging data. For example, Sparse Canonical Correlation Analysis (SCCA), which adds a sparse regularization term to conventional CCA, has been widely used to extract significant features through sparse canonical variables [10–12]. To incorporate biological prior knowledge, Group SCCA was developed by Lin et al. [13] to exploit structural information, which can further enhance feature selection. Considering that some prior knowledge is uncertain or unavailable, a structured SCCA model was proposed by Du et al. [14] with an absolute graph net regularizer, and the group effect was evaluated by an upper bound. The Joint SCCA (JSCCA) model was built by Fang et al. [15] to analyze the differential dependency across multiple classes (e.g., different disease statuses). In recent studies, the Collaborative Regression (CoReg) model, an intuitive combination of regression and CCA, has attracted extensive attention. CoReg is a useful model for extracting features that are common to both genetic imaging data and phenotype, and it leads to promising results in the context of imaging genomics [16]. For example, Pascal et al. [17] combine sparse regression and CCA to extract features that are associated with phenotypes while displaying a significant level of co-expression. Based on the CoReg model, Bai et al. [18] identify biomarkers of SZ by integrating fMRI and epigenetic data.

Genetic imaging data are usually collected from different classes (e.g., SZ patients and healthy controls), and conventional analysis methods usually select the same biomarkers for all classes. However, because some diseases are caused by genetic variants, their biomarkers often show class-specific structure. For example, certain single nucleotide polymorphisms (SNPs) differ in their contribution to the genetic-modality latent variable across classes, whereas other SNPs show similar contributions across classes. Nevertheless, class-specific biomarkers have been mostly overlooked in genetic imaging studies, although they are useful for understanding SZ. In this paper, based on sparse regression and the CCA model, we propose a Joint Sparse Collaborative Regression (JSCoReg) model to extract both shared and class-specific features from different health conditions/disease classes. Specifically, the fused lasso penalty is applied to fuse different classes by encouraging canonical loadings from different classes to share a similar structure. For imaging genetics data, our method can identify common brain regions and class-related genes. Fig. 1 is a schematic illustration of the JSCoReg model.

Fig. 1.

An illustration of the JSCoReg model for feature selection from multimodal datasets.

The contribution of this paper is twofold. First, we propose the JSCoReg model and derive the corresponding optimization algorithm. JSCoReg extracts features that are associated with a given phenotype while exhibiting class-specific patterns. A modified block coordinate descent algorithm solves the model in three steps: a fusion step, a sparsity step, and a normalization step. Second, we validate the model. A simulation experiment is first conducted to evaluate the performance of JSCoReg. JSCoReg is then applied to real Mind (Mental Illness and Neuroscience Discovery Institute) Clinical Imaging Consortium (MCIC) data [19], pinpointing potential genes and brain regions related to schizophrenia, which can in turn inform clinical decision making.

The rest of this paper is organized as follows. In Section 2, we review some relevant models and then propose our model, followed by the optimization algorithm. The performance of our model is evaluated on simulated data in Section 3 and on real MCIC data in Section 4. Conclusions and discussion follow in Section 5.

2. The novel model

2.1. Sparse regression model

Assume that we have a set of measurements $X=[x_1;x_2;\dots;x_n]\in\mathbb{R}^{n\times p}$ and the corresponding observations $Y=[y_1,y_2,\dots,y_n]^T$. We assume that each $y_i$ is conditionally independent given its measurement $x_i$ ($i=1,\dots,n$). To extract a linear link between $Y$ and $X$, we consider the following linear regression model

$$\min_{\omega}\ \|Y-X\omega\|_2^2 \tag{1}$$

where $\omega\in\mathbb{R}^{p}$ is the vector of regression coefficients. The phenotype $Y$ represents class labels (that is, $Y=[y_1,y_2,\dots,y_n]^T$, $y_i\in\{-1,1\}$, $i=1,\dots,n$). Although we only discuss the case where $Y$ is a class label, other options exist as well. For instance, $Y$ could be an influence factor on SZ (e.g., DNA methylation) or a quantitative outcome (e.g., a clinical assessment score).

In practice, the number of variables (or features) is much larger than the number of observations, i.e., $n\ll p$, which makes Eq. (1) underdetermined. A useful tool for handling this underdetermined problem is the least absolute shrinkage and selection operator (LASSO) regression model proposed in [21]

$$\min_{\omega}\ \|Y-X\omega\|_2^2+\lambda\|\omega\|_1$$

where $\lambda$ is a hyper-parameter, and the $\ell_1$-norm induces sparsity in the regression coefficient $\omega$.
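As an illustration of how the $\ell_1$ penalty produces sparse coefficients when $n\ll p$, the following is a minimal sketch that minimizes the LASSO objective with proximal gradient descent (ISTA). The solver choice, the function names, and the toy data are assumptions made for this example; the paper does not prescribe a particular LASSO solver.

```python
import numpy as np

def soft_threshold(x, t):
    """Elementwise soft-thresholding: sign(x) * max(|x| - t, 0)."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def lasso_ista(X, y, lam, n_iter=500):
    """Minimize ||y - X w||_2^2 + lam * ||w||_1 by proximal gradient (ISTA)."""
    w = np.zeros(X.shape[1])
    # The gradient 2 X^T (X w - y) is Lipschitz with constant 2 * sigma_max(X)^2.
    L = 2.0 * np.linalg.norm(X, 2) ** 2
    for _ in range(n_iter):
        grad = 2.0 * X.T @ (X @ w - y)
        w = soft_threshold(w - grad / L, lam / L)
    return w

# Toy underdetermined problem (hypothetical data): n = 30 samples, p = 100 features.
rng = np.random.default_rng(0)
X = rng.standard_normal((30, 100))
w_true = np.zeros(100)
w_true[:3] = [2.0, -1.5, 1.0]
y = X @ w_true + 0.01 * rng.standard_normal(30)

w_hat = lasso_ista(X, y, lam=5.0)
print("nonzero coefficients:", np.count_nonzero(w_hat))
```

Larger values of `lam` drive more coefficients exactly to zero, which is the variable selection behaviour the section relies on.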

LASSO regression relates the phenotype to only a few features, which makes it an effective model for variable selection. Next, we review the SCCA model for extracting the links between different modalities.

2.2. SCCA model and SCoReg model

We consider two datasets $X=[X_1;X_2;\dots;X_n]$ and $Z=[Z_1;Z_2;\dots;Z_n]$ on the same $n$ subjects, where $X$ has $p$ features and $Z$ has $q$ features. In previous studies on multimodal datasets, both CCA and penalized CCA have been used for feature extraction and selection. CCA finds the most correlated canonical variables (linear combinations of features weighted by canonical loadings). One form of CCA is expressed as follows:

$$\arg\min_{w,v}\ F(w,v)=\|Xw-Zv\|_2^2,\quad \text{s.t. } w^TX^TXw=1,\ v^TZ^TZv=1,$$

where $w$ and $v$ are the canonical loadings to be determined. However, in many biomedical applications, the dimension of the data is much larger than the sample size, which leads to overfitting or the curse of dimensionality. To circumvent this issue, SCCA selects the key or significant features by imposing an $\ell_1$-penalty on the canonical loadings. Specifically, the SCCA model can be described as

$$\arg\min_{w,v}\ F(w,v)=\|Xw-Zv\|_2^2+\lambda_1\|w\|_1+\lambda_2\|v\|_1,\quad \text{s.t. } w^TX^TXw=1,\ v^TZ^TZv=1,$$

where $\lambda_1$ and $\lambda_2$ are hyper-parameters that tune the weight of each regularization term. While SCCA is a useful tool for finding the information shared by $X$ and $Z$, it overlooks the relationship between the extracted features and phenotypes. To address this issue, SCoReg combines regression and SCCA into a unified framework with the $\ell_1$-penalty to choose a subset of co-expressed discriminative features. Given phenotype $Y$, SCoReg [16] can be expressed as follows:

$$\arg\min_{w,v}\ F(w,v)=\|Xw-Zv\|_2^2+\|Y-Xw\|_2^2+\|Y-Zv\|_2^2+\lambda_1\|w\|_1+\lambda_2\|v\|_1,\quad \text{s.t. } w^TX^TXw=1,\ v^TZ^TZv=1.$$

In this SCoReg model, the extracted features are assumed to come from the same class or group (e.g., healthy controls and schizophrenia patients). However, in real imaging genetics studies, these different classes may contain common features and/or ones specific to each class. To this end, we propose a novel model, namely JSCoReg, to discover both the class-specific and class-shared information, which is detailed in the following.

2.3. Joint Sparse Collaborative Regression Model (JSCoReg)

As discussed in the previous sections, numerous methods have been proposed to select features that are common to different measurements and associated with the phenotype. In this section, we consider the scenario where class-specific features exist in a certain measurement. We aim to propose a new model to select class-specific features associated with their phenotypes.

Dividing the data into $K$ classes, the two sets of data can be denoted as $X=[X_1;X_2;\dots;X_K]$ and $Z=[Z_1;Z_2;\dots;Z_K]$, where $X_k\in\mathbb{R}^{n_k\times p}$, $Z_k\in\mathbb{R}^{n_k\times q}$ ($k=1,2,\dots,K$), and $n_k$ is the number of subjects belonging to the $k$-th class. JSCoReg estimates the feature coefficients $w$ and $v_k$ by minimizing the following objective function:

$$F(w,v_k)=\|Y-Xw\|_2^2+\sum_{k=1}^{K}\|Y_k-Z_kv_k\|_2^2+\sum_{k=1}^{K}\|X_kw-Z_kv_k\|_2^2+\lambda_1\|w\|_1+\sum_{k=1}^{K}\lambda_2\|v_k\|_1+\sum_{k<k'}a\|v_k-v_{k'}\|_1,\quad \text{s.t. } w^TX_k^TX_kw=1,\ v_k^TZ_k^TZ_kv_k=1. \tag{2}$$

Here the columns of $Z_k$ and $X$ are standardized separately to have zero mean and unit variance; $w\in\mathbb{R}^{p\times 1}$ and $v_k\in\mathbb{R}^{q\times 1}$ ($k=1,2,\dots,K$) are the canonical loadings of $X$ and $Z_k$, respectively; and $a$, $\lambda_1$, and $\lambda_2$ are hyper-parameters that tune the weight of each regularization term.

The $\ell_1$-penalties on $w$ and each $v_k$ control their sparsity, and the fused lasso penalty on the canonical loadings $v_k$ ($k=1,2,\dots,K$) encourages them to share a similar structure while retaining class-specific patterns. The hyper-parameter $a$ adjusts the degree of fusion. If $a=0$, there is no fusion across the canonical loadings and our model reduces to SCoReg applied to each single class; we call this case SCoReg(Single). If $a\to\infty$, the canonical loadings $v_k$ ($k=1,2,\dots,K$) are required to be identical and our model reduces to SCoReg where all extracted features belong to the same class; we call this case SCoReg(Combined).
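To make the role of each term in Eq. (2) concrete, the sketch below evaluates the unconstrained part of the objective for given loadings. The function name and the list-of-arrays layout are assumptions made for this example, and the normalization constraints are not enforced here.

```python
import numpy as np
from itertools import combinations

def jscoreg_objective(Xs, Zs, Ys, w, vs, lam1, lam2, a):
    """Evaluate the objective of Eq. (2) without its constraints.

    Xs, Zs, Ys: per-class lists of arrays with shapes (n_k, p), (n_k, q), (n_k,).
    w: shared loading of length p; vs: list of K loadings of length q.
    """
    X = np.vstack(Xs)
    Y = np.concatenate(Ys)
    fit = np.sum((Y - X @ w) ** 2)                 # ||Y - X w||^2
    for Xk, Zk, Yk, vk in zip(Xs, Zs, Ys, vs):
        fit += np.sum((Yk - Zk @ vk) ** 2)         # ||Y_k - Z_k v_k||^2
        fit += np.sum((Xk @ w - Zk @ vk) ** 2)     # ||X_k w - Z_k v_k||^2
    sparsity = lam1 * np.abs(w).sum() + lam2 * sum(np.abs(vk).sum() for vk in vs)
    fusion = a * sum(np.abs(vs[k] - vs[j]).sum()
                     for k, j in combinations(range(len(vs)), 2))
    return fit + sparsity + fusion
```

When all `vs` are identical the fusion term vanishes for any `a`, which matches the SCoReg(Combined) limit described above; with `a = 0` the classes are decoupled in `vs`, matching SCoReg(Single).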

2.4. Optimization Algorithm

By expanding the squared $\ell_2$-norm terms in (2) and removing the constant terms in the expansion, the optimization problem is transformed into the following form:

$$\min_{w,v_k}\ -\sum_{k=1}^{K}w^TX_k^TZ_kv_k-\sum_{k=1}^{K}Y_k^TZ_kv_k-Y^TXw+\lambda_1\|w\|_1+\sum_{k=1}^{K}\lambda_2\|v_k\|_1+\sum_{k<k'}a\|v_k-v_{k'}\|_1,\quad \text{s.t. } w^TX_k^TX_kw=1,\ v_k^TZ_k^TZ_kv_k=1,\ w^TX^TXw=1. \tag{3}$$

It has been shown that in high-dimensional cases, treating covariance matrices (e.g., $X^TX$, $X_k^TX_k$, and $Z_k^TZ_k$) as diagonal can lead to good results [22, 23]. Inspired by the work of [11], we substitute the covariance matrices (e.g., $X^TX$, $X_k^TX_k$, and $Z_k^TZ_k$) with identity matrices to ensure the validity of the model and simplify the optimization problem (3), and obtain the following “diagonal penalized CCA” form:

$$\min_{w,v_k}\ -\sum_{k=1}^{K}w^TX_k^TZ_kv_k-\sum_{k=1}^{K}Y_k^TZ_kv_k-Y^TXw+\lambda_1\|w\|_1+\sum_{k=1}^{K}\lambda_2\|v_k\|_1+\sum_{k<k'}a\|v_k-v_{k'}\|_1,\quad \text{s.t. } \|w\|_2^2=1,\ \|v_k\|_F^2=1. \tag{4}$$

Note that $Y_k^TZ_k=0$, because the phenotype is constant within class $k$ and the columns of $Z_k$ are standardized to zero mean and unit variance. Thus, we rewrite the optimization problem in the following form:

$$\min_{w,v_k}\ -\sum_{k=1}^{K}w^TX_k^TZ_kv_k-Y^TXw+\lambda_1\|w\|_1+\sum_{k=1}^{K}\lambda_2\|v_k\|_1+\sum_{k<k'}a\|v_k-v_{k'}\|_1,\quad \text{s.t. } \|w\|_2^2=1,\ \|v_k\|_F^2=1. \tag{5}$$

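In particular, for the scenario where $K=2$, we code the phenotype of the two classes (with $n_1$ and $n_2$ observations, respectively) as $\frac{c}{n_1}$ and $-\frac{c}{n_2}$, where $c$ is a non-negative parameter [20]. Notice that $X$, $Y$, $w$, $n_1$, and $n_2$ then satisfy

$$Y^TXw=c\left(\overline{X_{\text{class}_1}}-\overline{X_{\text{class}_2}}\right)w=c\tilde{X}w \tag{6}$$

where $\overline{X_{\text{class}_i}}=\left[\overline{x_{\text{class}_i 1}},\overline{x_{\text{class}_i 2}},\dots,\overline{x_{\text{class}_i p}}\right]\in\mathbb{R}^{p}$ ($i=1,2$) is the mean vector of the observations in $X$ that belong to class $i$, and $\tilde{X}=\overline{X_{\text{class}_1}}-\overline{X_{\text{class}_2}}=[\tilde{x}_1,\tilde{x}_2,\dots,\tilde{x}_p]\in\mathbb{R}^{p}$.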
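The identity in Eq. (6) can be checked numerically. The sketch below uses small random data (hypothetical sizes and values) and the $c/n_1$, $-c/n_2$ phenotype coding described above.

```python
import numpy as np

rng = np.random.default_rng(1)
n1, n2, p, c = 6, 4, 5, 3.0            # hypothetical sizes and weight

X = rng.standard_normal((n1 + n2, p))  # rows 0..n1-1 are class 1, the rest class 2
Y = np.concatenate([np.full(n1, c / n1), np.full(n2, -c / n2)])
w = rng.standard_normal(p)

# Difference of the class mean vectors, i.e. X-tilde in Eq. (6).
x_tilde = X[:n1].mean(axis=0) - X[n1:].mean(axis=0)

lhs = Y @ X @ w          # Y^T X w
rhs = c * (x_tilde @ w)  # c * X-tilde * w
print(np.isclose(lhs, rhs))
```

The check works because summing the class-1 rows weighted by $c/n_1$ gives $c$ times the class-1 mean, and similarly for class 2 with the opposite sign.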

Therefore we can rewrite (5) as

$$\min_{w,v_k}\ -\sum_{k=1}^{K}w^TX_k^TZ_kv_k-c\tilde{X}w+\lambda_1\|w\|_1+\sum_{k=1}^{K}\lambda_2\|v_k\|_1+\sum_{k<k'}a\|v_k-v_{k'}\|_1,\quad \text{s.t. } \|w\|_2^2=1,\ \|v_k\|_F^2=1. \tag{7}$$

In (7), the parameter $c$ has a great impact on the entries of $\tilde{X}$ with large $|\tilde{x}_i|$ (where $|\cdot|$ is the absolute value, $i=1,2,\dots,p$). That is, when $c$ increases, the elements of $w$ corresponding to the larger $|\tilde{x}_i|$ are selected with priority. Thus, finding an appropriate $c$ is critical for feature selection. As a special case, when $c=0$, this model reduces to regular JSCCA. When there are only two classes in the data, namely $K=2$, we work on model (7), which is a more flexible and feasible form of (5).

To summarize, our JSCoReg model can be realized by solving the optimization problem (5), including its special case (7). The objective function in (5) is biconvex, that is, it is convex with respect to the decision variable $w$ for given $v_k$ and vice versa. Thus, block coordinate descent is applied here, and the iteration procedure mainly contains the following two steps:

$$\min_{\|w\|_2=1}\ -\left(\sum_{k=1}^{K}v_k^TZ_k^TX_k+Y^TX\right)w+\lambda_1\|w\|_1\quad \left(Y^TX=c\tilde{X}\ \text{when } K=2\right), \tag{8}$$

$$\min_{\|V\|_F=1}\ -\sum_{k=1}^{K}w^TX_k^TZ_kv_k+\sum_{k=1}^{K}\lambda_2\|v_k\|_1+\sum_{k<k'}a\|v_k-v_{k'}\|_1, \tag{9}$$

where $V=[v_1,v_2,\dots,v_K]^T$.

In detail, when $V=[v_1,v_2,\dots,v_K]^T$ is fixed, the soft-thresholding operator shown in Lemma 1 is applied to solve (8) and obtain $w$.

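Lemma 1 [11]

Consider the following optimization problem

$$\min_{u}\ -u^Ta\quad \text{s.t. } \|u\|_2^2\le 1,\ \|u\|_1\le\lambda.$$

The solution satisfies $u=\dfrac{s(a,\lambda)}{\|s(a,\lambda)\|_2}$, where $\lambda$ is a positive constant, $a\in\mathbb{R}^{n\times 1}$, $s(a,\lambda)=[s_1,s_2,\dots,s_n]^T$, and $s_i=\operatorname{sgn}(a_i)\max(|a_i|-\lambda,0)$ ($i=1,2,\dots,n$).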
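The update in Lemma 1 amounts to soft-thresholding followed by $\ell_2$ normalization. A minimal sketch follows; the function names are chosen for this example, and returning the zero vector when every entry is thresholded away is an assumption, since the lemma does not cover that case.

```python
import numpy as np

def soft_threshold(a, lam):
    """s_i = sgn(a_i) * max(|a_i| - lam, 0), applied elementwise."""
    return np.sign(a) * np.maximum(np.abs(a) - lam, 0.0)

def lemma1_update(a, lam):
    """Return s(a, lam) / ||s(a, lam)||_2 as in Lemma 1."""
    s = soft_threshold(a, lam)
    norm = np.linalg.norm(s)
    if norm == 0.0:
        return s  # assumption: keep the zero vector if nothing survives the threshold
    return s / norm

a = np.array([3.0, -0.5, 1.0, -2.0])
u = lemma1_update(a, lam=1.0)
print(u)  # entries with |a_i| <= 1 are zero; the rest are rescaled to unit length
```

Here `s(a, 1.0)` is `[2, 0, 0, -1]`, so `u` is that vector divided by $\sqrt{5}$.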

When w is fixed, as shown in Lemma 2, V can be derived by referring to the method from Section 2.3 in [15].

Lemma 2 [11, 15]

The solution of

$$\min_{\|V\|_F=1}\ -\sum_{k=1}^{K}R_kv_k+\sum_{k=1}^{K}\lambda_2\|v_k\|_1+\sum_{k<k'}a\|v_k-v_{k'}\|_1, \tag{10}$$

is given by $\hat{V}/\|\hat{V}\|_F$, where $\hat{V}$ is the optimum of

$$\min_{V}\ \sum_{k=1}^{K}\|v_k-R_k^T\|_2^2+\sum_{k=1}^{K}\lambda_2\|v_k\|_1+\sum_{k<k'}a\|v_k-v_{k'}\|_1, \tag{11}$$

and $R_k=w^TX_k^TZ_k$.

Problem (11) is a special case of the fused lasso model [24] and can be solved in three steps [25, 26]. First, $V$ is obtained by a fusion step, which fuses the features without significant difference (depending on $a$). The $i$-th element of $v$ (corresponding to the $i$-th feature of $Z$) is fused between the $k$-th and the $k'$-th classes if $|v_{ki}-v_{k'i}|\le a$. Second, $\hat{V}=[\hat{v}_1,\hat{v}_2,\dots,\hat{v}_K]$ is derived through the soft-thresholding method; that is, $\hat{v}_i=S(v_i,\lambda_2)$ ($i=1,2,\dots,K$). Finally, $V$ is obtained by a normalization step, $\hat{V}/\|\hat{V}\|_F$. The detailed algorithm to solve JSCoReg is given in Algorithm 1.
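For two classes, the fusion step has a simple coordinate-wise closed form, which the sketch below combines with the sparsity and normalization steps. The closed form is derived here from the quadratic-plus-fusion part of (11) as written and is not stated in the text; the threshold $\lambda_2$ is applied as in the text, with constant factors treated as absorbed into the hyper-parameters. Function names are hypothetical.

```python
import numpy as np

def fuse_two_classes(r1, r2, a):
    """Coordinate-wise minimizer of (v1-r1)^2 + (v2-r2)^2 + a*|v1-v2| (K = 2).

    Entries whose targets differ by at most a are fused to their average;
    the others are pulled toward each other by a/2.
    """
    diff = r1 - r2
    fused = np.abs(diff) <= a
    mean = 0.5 * (r1 + r2)
    shift = 0.5 * a * np.sign(diff)
    v1 = np.where(fused, mean, r1 - shift)
    v2 = np.where(fused, mean, r2 + shift)
    return v1, v2

def soft_threshold(x, t):
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def fused_sparse_update(r1, r2, a, lam2):
    """Fusion step, sparsity step, then Frobenius normalization."""
    v1, v2 = fuse_two_classes(r1, r2, a)
    V_hat = soft_threshold(np.vstack([v1, v2]), lam2)
    norm = np.linalg.norm(V_hat)           # Frobenius norm for a 2-D array
    return V_hat / norm if norm > 0 else V_hat

r1 = np.array([1.0, 0.2, -3.0])   # stands in for R_1^T
r2 = np.array([0.8, 0.1, 1.0])    # stands in for R_2^T
v1, v2 = fuse_two_classes(r1, r2, a=0.5)
V = fused_sparse_update(r1, r2, a=0.5, lam2=0.2)
```

In this toy input the first two features are fused (shared across classes), the second is then removed by the sparsity step, and the third remains class-specific.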

Algorithm 1.

Iterative Algorithm for JSCoReg

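Input: Standardized data $X\in\mathbb{R}^{n\times p}$, $X_k\in\mathbb{R}^{n_k\times p}$, $Z_k\in\mathbb{R}^{n_k\times q}$, $Y\in\mathbb{R}^{n\times 1}$, $k=1,2,\dots,K$.
Output: Canonical loadings $w$ and $v_k$ ($k=1,2,\dots,K$)
  1: Initialize $w\in\mathbb{R}^{p\times 1}$, $v_k\in\mathbb{R}^{q\times 1}$, $k=1,2,\dots,K$
  2: repeat
  3:   $\bar{w}=\sum_{k=1}^{K}X_k^TZ_kv_k+X^TY$
       ($X^TY=c\tilde{X}^T$ when $K=2$)
  4:   $\hat{w}=S(\bar{w},\lambda_1)$
  5:   $w=\hat{w}/\|\hat{w}\|_2$
  6:   $[\bar{v}_1,\bar{v}_2,\dots,\bar{v}_K]=\arg\min_{V}\sum_{k=1}^{K}\|v_k-Z_k^TX_kw\|_2^2+\sum_{k<k'}a\|v_k-v_{k'}\|_1$
  7:   for $k=1,\dots,K$
  8:     $\hat{v}_k=S(\bar{v}_k,\lambda_2)$
  9:   end for
 10:   $V=[\hat{v}_1,\hat{v}_2,\dots,\hat{v}_K]/\|[\hat{v}_1,\hat{v}_2,\dots,\hat{v}_K]\|_F$
 11: until convergence

Steps 3 and 6 are the core of Algorithm 1. The methods for solving steps 3 and 6 are presented in Lemma 1 and Lemma 2, respectively.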
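The loop of Algorithm 1 can be sketched for the two-class case as follows. This is an illustrative sketch under several assumptions, not the authors' implementation: it fixes $K=2$, uses a coordinate-wise closed form for the fusion in step 6 (derived from the objective as written), keeps $\lambda_1$ and $\lambda_2$ fixed across iterations, and stops when $w$ changes by less than a tolerance. The function name and default values are hypothetical.

```python
import numpy as np

def soft(x, t):
    """Soft-thresholding operator S(x, t)."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def jscoreg_two_class(X1, Z1, X2, Z2, c, lam1, lam2, a,
                      n_iter=50, tol=1e-6, seed=0):
    """Sketch of Algorithm 1 for K = 2 with identity covariance matrices."""
    rng = np.random.default_rng(seed)
    p, q = X1.shape[1], Z1.shape[1]
    x_tilde = X1.mean(axis=0) - X2.mean(axis=0)   # X^T Y = c * x_tilde when K = 2
    V = rng.standard_normal((2, q))
    V /= np.linalg.norm(V)                        # random start with ||V||_F = 1
    w = np.zeros(p)
    for _ in range(n_iter):
        w_old = w
        # Steps 3-5: correlation target, sparsity, l2 normalization.
        w_bar = X1.T @ (Z1 @ V[0]) + X2.T @ (Z2 @ V[1]) + c * x_tilde
        w_hat = soft(w_bar, lam1)
        nrm = np.linalg.norm(w_hat)
        w = w_hat / nrm if nrm > 0 else w_hat
        # Step 6: fusion of the two class loadings (closed form for K = 2).
        r1, r2 = Z1.T @ (X1 @ w), Z2.T @ (X2 @ w)
        d = r1 - r2
        fused = np.abs(d) <= a
        v1 = np.where(fused, 0.5 * (r1 + r2), r1 - 0.5 * a * np.sign(d))
        v2 = np.where(fused, 0.5 * (r1 + r2), r2 + 0.5 * a * np.sign(d))
        # Steps 7-10: sparsity, then Frobenius normalization.
        V_hat = soft(np.vstack([v1, v2]), lam2)
        nrm = np.linalg.norm(V_hat)
        V = V_hat / nrm if nrm > 0 else V_hat
        if np.linalg.norm(w - w_old) < tol:
            break
    return w, V
```

A call such as `w, V = jscoreg_two_class(X1, Z1, X2, Z2, c=1.0, lam1=0.1, lam2=0.1, a=0.5)` returns the shared loading `w` and a `2 x q` array whose rows are the class-specific loadings.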

2.5. Parameters selection

There are four hyper-parameters in the proposed model: $\lambda_1$, $\lambda_2$, $a$, and $c$ (when $K=2$). $\lambda_1$ and $\lambda_2$ tune the sparsity of the canonical loadings, $a$ controls the similarity between different genomic canonical loadings, and $c$ is the weight of the phenotype (when $K=2$).

Although cross-validation is widely used for selecting hyper-parameters, it is not suitable here because our practical data may have a small sample size [27, 28]. As an alternative, we adopt a two-step stability selection method. In the first step, we use the sparsity level of the solution to guide the selection of $\lambda_1$ and $\lambda_2$ [29]. For large $\lambda_1$ and $\lambda_2$, the term $\lambda_1\|w\|_1+\sum_{k=1}^{K}\lambda_2\|v_k\|_1$ dominates the objective function and is penalized severely during the optimization process. For small $\lambda_1$ and $\lambda_2$, the term $\lambda_1\|w\|_1+\sum_{k=1}^{K}\lambda_2\|v_k\|_1$ plays a subordinate role, and the solutions may not be sparse. In [15], Fang et al. tuned $\lambda_1$ and $\lambda_2$ based on the sample size $n$, yielding a much less sensitive search procedure. In particular, we set $\lambda_1$ based on $k_w$, the number of nonzero elements in $w$. In each iteration, we adjust $\lambda_1$ through the following formulation:

$$\lambda_1\in\left(|w|_{k_w+1},\ |w|_{k_w}\right),$$

where $|w|_{k_w}$ is the $k_w$-th largest element of $w$ in absolute value. Similarly, $k_v$ is used to determine $\lambda_2$ by the same scheme.
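The sparsity-driven choice of $\lambda_1$ can be sketched as picking a threshold between the $(k_w+1)$-th and $k_w$-th largest magnitudes, so that soft-thresholding leaves $k_w$ nonzero entries. Using the midpoint of that interval is an assumption made for this example; the text only specifies the interval. The function name is hypothetical.

```python
import numpy as np

def lambda_for_sparsity(w_bar, k):
    """Threshold between the (k+1)-th and k-th largest |w_bar| entries.

    Assumption: the midpoint of the interval is used. With ties at the
    boundary, fewer than k entries may survive soft-thresholding.
    """
    if k <= 0:
        raise ValueError("k must be a positive integer")
    mags = np.sort(np.abs(w_bar))[::-1]      # magnitudes in descending order
    if k >= mags.size:
        return 0.0                           # keep every entry
    return 0.5 * (mags[k - 1] + mags[k])

w_bar = np.array([0.3, -2.0, 1.1, 0.05, -0.7])
lam1 = lambda_for_sparsity(w_bar, k=2)
w_hat = np.sign(w_bar) * np.maximum(np.abs(w_bar) - lam1, 0.0)
print(lam1, np.count_nonzero(w_hat))
```

In this toy vector the two largest magnitudes are 2.0 and 1.1 and the third is 0.7, so the threshold falls between 0.7 and 1.1 and exactly two entries remain nonzero.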

In the second step, with $\lambda_1$ and $\lambda_2$ fixed, we use bootstrapping to determine $a$ and $c$. Parameter $a$ adjusts the degree of fusion between multiple pairs of canonical loadings to encourage them to present significant class-specific structure. The fusion is realized by the $\ell_1$-norm penalty on the difference between $v_k$ and $v_{k'}$ (i.e., $a\|v_k-v_{k'}\|_1$). Through the $\ell_1$-norm penalty, the common and highly similar features of different classes are fused into an identical pattern, and the remaining features form the significant class-specific pattern. When $a=0$, there is no fusion across the canonical loadings, and our model reduces to SCoReg for a single class. As $a$ increases from $0$ to $\infty$, more features with less similarity are gradually fused into the same pattern. When $a=\infty$, $v_k$ and $v_{k'}$ are completely fused into an identical pattern, and our model reduces to SCoReg in which all extracted features belong to the same class.

Given a data set $S$ with $n$ observations, a new data set $S^{*}$ is generated by bootstrap sampling; $S^{*}$ is treated as the training set and $S^{c}$ as the testing set ($S^{c}$ contains the observations that belong to $S$ but do not appear in $S^{*}$). Repeating the bootstrap sampling $T$ times, we obtain $T$ pairs of training and testing sets. Afterwards, with $\lambda_1$ and $\lambda_2$ fixed, $a$ and $c$ are selected from the candidate set $[10^{-2},10^{-1},10^{0},10^{1},10^{2}]$ based on the following optimization problem

$$\arg\min_{a,c}\ \Delta corr=\frac{1}{T}\sum_{t=1}^{T}\left|corr_{train}^{t}-corr_{test}^{t}\right|, \tag{12}$$

where $corr_{train}^{t}$ is the Pearson correlation between the two modalities on the $t$-th training set and $corr_{test}^{t}$ is that on the $t$-th testing set. At the same time, the canonical loadings $w^{t}$ and $v_k^{t}$ obtained with the optimal parameters are collected. We calculate the selection probability of voxels and SNPs by the following equation:

$$p_{w_i}=\frac{1}{T}\sum_{t}I\left(|w_i^{t}|>0\right), \tag{13}$$

where $I$ is the indicator function. Then we select the important voxels (those with a high probability of occurrence) by

$$S_{w}=\left\{i:\ p_{w_i}>c_1\right\}, \tag{14}$$

where $c_1$ is a given threshold. Similarly, we identify the important SNPs with a given threshold $c_2$.
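The bootstrap quantities in Eqs. (12) to (14) can be sketched as follows. The sketch simplifies to a single pair of loadings $(w, v)$ per fit and takes the solver as a user-supplied callable `fit`, which is hypothetical; the absolute value in the correlation gap and the use of out-of-bag samples as the testing set follow the reading of the text given above.

```python
import numpy as np

def bootstrap_split(n, rng):
    """Training indices sampled with replacement; test indices are out-of-bag."""
    train = rng.integers(0, n, size=n)
    test = np.setdiff1d(np.arange(n), train)
    return train, test

def delta_corr(X, Z, fit, T=20, seed=0):
    """Mean absolute train/test correlation gap, as in Eq. (12).

    fit(X_train, Z_train) -> (w, v) is a hypothetical solver supplied by the caller.
    """
    rng = np.random.default_rng(seed)
    gaps = []
    for _ in range(T):
        tr, te = bootstrap_split(X.shape[0], rng)
        w, v = fit(X[tr], Z[tr])
        c_tr = np.corrcoef(X[tr] @ w, Z[tr] @ v)[0, 1]
        c_te = np.corrcoef(X[te] @ w, Z[te] @ v)[0, 1]
        gaps.append(abs(c_tr - c_te))
    return float(np.mean(gaps))

def selection_probability(W):
    """Eq. (13): W has shape (T, p), one loading vector per bootstrap run."""
    return np.mean(np.abs(W) > 0, axis=0)

def stable_features(W, threshold):
    """Eq. (14): indices whose selection probability exceeds the threshold."""
    return np.flatnonzero(selection_probability(W) > threshold)

# Toy loadings from T = 4 hypothetical bootstrap runs over p = 4 features.
W = np.array([[0.5, 0.0, -0.1, 0.0],
              [0.4, 0.0,  0.0, 0.0],
              [0.6, 0.2,  0.0, 0.0],
              [0.5, 0.0,  0.0, 0.0]])
probs = selection_probability(W)
selected = stable_features(W, threshold=0.2)
```

A candidate pair $(a, c)$ would then be scored by calling `delta_corr` with a `fit` that runs the solver at those values, and the pair with the smallest gap would be kept.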

3. Simulation

3.1. Simulation Setup

To evaluate the performance of JSCoReg model, we generate a dataset with K classes. Each class consists of nk(k=1,2,,K) samples containing fMRI data X with p voxels and SNP data Z with q SNPs. A latent method similar to the one used in [11,13] is applied here to simulate the correlation between X and Z.

The dataset generation process consists of the following three steps. The first step constructs the latent variables. A normally distributed latent variable $h_k\sim N(0,\delta)$ ($h_k\in\mathbb{R}^{n_k\times 1}$, $k=1,2,\dots,K$) has a similar effect on the associated $X$ and $Z$ (here $n_k$ is the number of observations in the $k$-th class). Then $\alpha\in\mathbb{R}^{1\times p}$ and $\beta_k\in\mathbb{R}^{1\times q}$ are set as the canonical loadings of $X$ and $Z$, respectively. There are $p_1$ non-zero entries in $\alpha$ and $q_k$ non-zero entries in $\beta_k$. Datasets $X$ and $Z$ are generated using $h_k\alpha$ and $h_k\beta_k$ ($k=1,2,\dots,K$). Finally, Gaussian noise is added to each dataset, and we set $\frac{c_k}{n_k}$ ($\pm\frac{c}{n_k}$ when $K=2$) as the phenotype for the $k$-th class.

In this simulation, we consider data belonging to two classes. For each class, we generate 100 observations ($n_k=100$, $k=1,2$), and each observation has 500 features.
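The generation procedure can be sketched as below. Several specifics are assumptions made for illustration because the text does not state them: $\delta$ is treated as a standard deviation, both modalities are given 500 features, the numbers and positions of non-zero loadings are hypothetical, and the noise standard deviation is arbitrary.

```python
import numpy as np

def simulate_class(n, alpha, beta, delta=1.0, noise_sd=0.1, rng=None):
    """Generate (X, Z) for one class from a shared latent variable h."""
    rng = np.random.default_rng() if rng is None else rng
    h = rng.normal(0.0, delta, size=(n, 1))            # latent variable h_k
    X = h * alpha + noise_sd * rng.standard_normal((n, alpha.size))
    Z = h * beta + noise_sd * rng.standard_normal((n, beta.size))
    return X, Z

rng = np.random.default_rng(0)
n_k, p, q, c = 100, 500, 500, 1.0

alpha = np.zeros(p)
alpha[:50] = 1.0             # shared voxel loading (hypothetical support)
beta_1 = np.zeros(q)
beta_1[:50] = 1.0            # class-1 SNP loading (hypothetical support)
beta_2 = np.zeros(q)
beta_2[25:75] = 1.0          # class-2 SNP loading, partly overlapping class 1

X1, Z1 = simulate_class(n_k, alpha, beta_1, rng=rng)
X2, Z2 = simulate_class(n_k, alpha, beta_2, rng=rng)

# Phenotype coded as +c/n_k for class 1 and -c/n_k for class 2.
Y = np.concatenate([np.full(n_k, c / n_k), np.full(n_k, -c / n_k)])
```

With this layout, SNPs 25 to 49 are shared between the classes, while SNPs 0 to 24 and 50 to 74 are class-specific, which is the kind of structure the fusion penalty is meant to recover.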

3.2. Simulation Results

To assess the performance of the proposed JSCoReg model, we calculate the true positive rate (TPR) and false positive rate (FPR), and compare the performance of our model (JSCoReg) with JSCCA, SCoReg(Combined), and SCoReg(Single). Fig. 2 displays the TPR against the FPR on the selected features (the x axis corresponds to the FPR values and the y axis to the TPR values), and the area under the ROC curve (AUC) is presented in Table 1. Fig. 2(a) displays the ROC of the selected voxels. The ROC curve and AUC indicate that the JSCoReg model outperforms the JSCCA, SCoReg(Combined), and SCoReg(Single) models on the detection of canonical voxels. The reason is that the phenotypic variables effectively improve the accuracy of feature selection. Fig. 2(b) and (c) display the ROC of the selected case and healthy canonical SNPs, respectively. The ROC curve of our model is slightly higher than that of the other models, and the AUC values indicate that our model selects SNPs with higher accuracy.
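For reference, TPR, FPR, and AUC for a feature ranking can be computed as in the sketch below. Ranking features by a per-feature score (for example the absolute loading) is an assumption made for this example; the text does not specify how the ROC curves were traced, and ties between scores are not treated specially here.

```python
import numpy as np

def roc_auc(scores, truth):
    """ROC points and AUC when features are ranked by decreasing score.

    scores: per-feature scores; truth: True for features that are truly non-zero.
    """
    scores = np.asarray(scores, dtype=float)
    truth = np.asarray(truth, dtype=bool)
    order = np.argsort(-scores, kind="stable")
    hits = truth[order]
    tpr = np.concatenate([[0.0], np.cumsum(hits) / hits.sum()])
    fpr = np.concatenate([[0.0], np.cumsum(~hits) / (~hits).sum()])
    # Trapezoidal area under the (fpr, tpr) curve.
    auc = float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))
    return fpr, tpr, auc

scores = [0.9, 0.8, 0.1, 0.7, 0.05]
truth = [True, True, False, False, True]
fpr, tpr, auc = roc_auc(scores, truth)
print(round(auc, 4))  # 0.6667: four of the six (true, false) pairs are ordered correctly
```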

Fig. 2.

A comparison of JSCoReg, JSCCA, SCoReg in selecting the features. The x axis corresponds to FPR value, and the y axis is the TPR value. The closer the curve to the top left, the better. (a) The ROC curve on the detection of canonical voxels. (b) The ROC curves on the detection of canonical SNPs from case class. (c) The ROC curves on the detection of canonical SNPs from healthy class.

Table 1.

A comparison of AUC of JSCoReg, JSCCA, SCoReg in selecting the features.

Method              AUC (voxels)   AUC (SNPs, case class)   AUC (SNPs, healthy class)

JSCoReg             0.9915         0.9541                   0.9543
JSCCA               0.9445         0.9133                   0.9422
SCoReg(Combined)    0.9880         0.8753                   0.9079
SCoReg(Single)      0.9907         0.9091                   0.9122

The AUC comparison shows that JSCoReg attains the highest AUC among the compared methods for both voxels and SNPs, indicating that our method successfully identifies the significant features.

Then, we test the robustness of the proposed JSCoReg model under different noise levels. Normally distributed random noise is added to the standardized dataset as

$$\hat{D}=(1+\sigma\mu)D, \tag{15}$$

where $\hat{D}$ is the noisy dataset, $D$ is the noise-free dataset, $\sigma$ is the noise level, and $\mu$ is a random vector following a standard normal distribution with zero mean and unit standard deviation. In the simulation studies, two noise levels ($\sigma=10\%$ and $\sigma=20\%$) are considered, and the experimental results are shown in Fig. 3 and Table 2. The results illustrate that our method has good stability under the two noise levels.
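One possible reading of Eq. (15) is elementwise multiplicative noise, sketched below. Drawing an independent standard normal value for every entry of $D$ is an assumption; the text describes $\mu$ only as a standard normal vector.

```python
import numpy as np

def add_relative_noise(D, sigma, rng):
    """Return (1 + sigma * mu) * D with mu drawn i.i.d. from N(0, 1) per entry."""
    mu = rng.standard_normal(D.shape)
    return (1.0 + sigma * mu) * D

rng = np.random.default_rng(0)
D = rng.standard_normal((100, 500))          # stands in for a standardized dataset
D_10 = add_relative_noise(D, 0.10, rng)      # sigma = 10%
D_20 = add_relative_noise(D, 0.20, rng)      # sigma = 20%
```

Under this reading the perturbation of each entry scales with its own magnitude, so entries near zero are changed least.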

Fig. 3.

A comparison of the precision under different noise levels. (a) ROC curves on the detection of canonical voxels. (b) ROC curves on the detection of canonical SNPs from case class. (c) ROC curves on the detection of canonical SNPs from healthy class.

Table 2.

A comparison of AUC under different noise levels.

Noise level   AUC (voxels)   AUC (SNPs, case class)   AUC (SNPs, healthy class)

noise-free    0.9915         0.9541                   0.9543
noise1        0.9268         0.8983                   0.9283
noise2        0.8775         0.8862                   0.9035

Finally, we evaluate the parameter sensitivity of JSCoReg, and the results are presented in Fig. 4. In Fig. 4(a), parameters λ1, λ2 and a are preset at their optimal values, and the influence of parameter c is evaluated for the feature selection on synthetic dataset X. It is obvious that c=30 and c=40 outperform c=0 and c=150. As the value of c increases from 0 to 40, the TPR increases synchronously. This indicates the importance of linking phenotype on improving the accuracy of feature selection. However, when c=150, the precision drops, due to the inappropriate incorporation of phenotype data. In short, Fig. 4 (a) indicates that the right use of phenotype is critical for feature selection.

Fig. 4.

A comparison of the precision under different parameter values. (a) ROC curves on the detection of canonical voxels with different values of parameter c. (b) ROC curves on the detection of SNPs from the case class. (c) ROC curves on the detection of SNPs from the healthy class.

Given the optimal values of parameters λ1, λ2 and c, we evaluate the influence of a on the synthetic dataset X. The results are presented in Fig. 4(b) and Fig. 4(c). It is obvious that the results of using a=10 and a=20 outperform a=0 and a=1000. This is due to the fact that JSCoReg has no fusion effect on the canonical loadings of different classes when a=0, and when a=1000 JSCoReg model degenerates into a SCoReg model for one class. This indicates that the combination of fused lasso and l1-norm has the ability to select class-related features.

4. Application to Schizophrenia Data Set

4.1. Data preparation and preprocessing

After validation via simulation, we apply JSCoReg to real data to find biomarkers associated with SZ. We test on the data from the Mind (Mental Illness and Neuroscience Discovery Institute) Clinical Imaging Consortium (MCIC) [19], which include fMRI (41,236 voxels) and SNP (777,635 SNPs) data of 184 subjects. Among these participants, 80 are SZ patients (34 ± 11 years old, 20 females) diagnosed based on the Diagnostic and Statistical Manual of Mental Disorders (DSM-IV) and the other 104 are healthy controls (32 ± 11 years old, 38 females). In this study, the phenotype $Y$ indicates whether a sample is a case or a control, that is, $Y=[y_1,y_2,\dots,y_n]^T$, $y_i\in\{-1,1\}$, $i=1,\dots,n$.

The fMRI data were collected while subjects were performing a sensorimotor task. The images were acquired using a Siemens 3T Trio scanner and a 1.5T Sonata with echo-planar imaging (EPI) with the following parameters: TR = 2000 ms, TE = 30 ms (3.0T)/40 ms (1.5T), field of view = 22 cm, slice thickness = 4 mm, 1 mm skip, 27 slices, acquisition matrix = 64 × 64, flip angle = 90°. The data were pre-processed using SPM5 software: they were realigned, spatially normalized, and smoothed. Next, the data were analyzed by multiple regression with the stimulus and its temporal derivatives plus an intercept term as regressors. After these steps, the dimension of the fMRI data is 53 × 63 × 46. We then select 41,236 voxels from 116 brain regions based on the automated anatomical labeling (AAL) [30] brain atlas.

The SNP data were obtained from each subject’s blood sample. Genotyping for all participants was performed at the Mind Research Network using the Illumina Infinium HumanOmni1-Quad assay covering 1,140,419 SNP loci. Then, the PLINK software package (http://pngu.mgh.harvard.edu/purcell/plink) was used to perform a series of standard quality control procedures, resulting in a final dataset spanning 777,635 SNP loci. Each SNP was categorized into three clusters, and genotypes BB (no minor allele), AB (one minor allele), and AA (two minor alleles) were coded as 0, 1, and 2, respectively [43].

4.2. Results

We apply the JSCoReg model to the fMRI and SNP datasets. The parameter selection method described in Section 2.5 is used to determine the optimal parameters. Parameters $k_w$ and $k_v$ are set to 100, and $T$ is set to 200. Formulations (13) and (14) are used to select voxels and SNPs, and the thresholds $c_1$ and $c_2$ are preset to 0.2 and 0.15, respectively. We use the AAL template to divide the brain into 116 Regions of Interest (ROIs), and present the identified ROIs and genes in Table 3 and Table 4.

Table 3.

Brain regions identified by JSCoReg

Model     ROI ID   Number of voxels   Brain region

JSCoReg   77       3                  Thalamus-L
          78       35                 Thalamus-R
          91       12                 Cerebellum-Crus1-L
          93       6                  Cerebellum-Crus2-L
          94       4                  Cerebellum-Crus1-R
          99       3                  Cerebellum-6-L
          112      4                  Vermis-6
          113      13                 Vermis-7
          114      16                 Vermis-8
          115      15                 Vermis-9

Table 4.

Genes Identified from the SNP data

Number of SNPs   Gene ID

34               CNOT4, TPD52, TNFSF18, ADAMTS5, DNM3, AKAP6,
                 DCLK2, MGST1, LOC644578, DOCK5, BAIAP2L1, HNF1B,
                 C8orf84, HLA-DPB2, KSRI, LHX2, LPL, FAM19A5,
                 MARCH1, FER, FAM136A, CDH2, NEURL1B, BSX,
                 LOC642340, LOC100289521, MYADML, LOC643563, DCC, MYO18B,
                 LOC100286951, PRPS1L1

Next, we use a permutation-based significance test to calculate the P-value of the identified ROIs and genes. Assume $F=[f_1,f_2,\dots,f_{m_1}]$ and $S=[s_1,s_2,\dots,s_{m_2}]$ represent the voxels and SNPs selected by JSCoReg. The mean squared correlation between these two types of data can be expressed as follows:

$$\rho^{*}=\frac{1}{m_1m_2}\sum_{g=1}^{m_1}\sum_{h=1}^{m_2}\rho\left(f_g,s_h\right)^2 \tag{16}$$

For a given data set $S$, we randomly permute the order of the row vectors of $S$ and repeat the process $\Delta$ times. The observed value $\rho^{*}$ is compared against the null distribution of the squared correlation, where $\bar{\rho}_t$ is the squared correlation coefficient recomputed by Eq. (16) on the $t$-th permutation. The P-value can be estimated by

$$P\text{-value}=\frac{1}{\Delta}\sum_{t}I\left(\bar{\rho}_t\ge\rho^{*}\right) \tag{17}$$

Finally, the P-values of the case and control groups were calculated as 0.0083 and 0.0062, respectively. Therefore, we consider the selected voxels and SNPs to be significant.
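The permutation test of Eqs. (16) and (17) can be sketched as follows. The function names, the default number of permutations, and the comparison $\bar{\rho}_t\ge\rho^{*}$ are assumptions made for this example; columns are assumed to be non-constant so that the Pearson correlation is defined.

```python
import numpy as np

def mean_squared_corr(F, S):
    """Eq. (16): mean of squared Pearson correlations between columns of F and S."""
    Fc = (F - F.mean(axis=0)) / F.std(axis=0)
    Sc = (S - S.mean(axis=0)) / S.std(axis=0)
    R = Fc.T @ Sc / F.shape[0]        # (m1 x m2) matrix of Pearson correlations
    return float(np.mean(R ** 2))

def permutation_pvalue(F, S, n_perm=1000, seed=0):
    """Eq. (17): share of row permutations of S reaching the observed statistic."""
    rng = np.random.default_rng(seed)
    rho_star = mean_squared_corr(F, S)
    count = 0
    for _ in range(n_perm):
        S_perm = S[rng.permutation(S.shape[0])]   # shuffle subjects in S only
        count += int(mean_squared_corr(F, S_perm) >= rho_star)
    return count / n_perm

# Toy example (hypothetical data): S depends on the first two columns of F.
rng = np.random.default_rng(0)
F = rng.standard_normal((50, 3))
S = F[:, :2] + 0.1 * rng.standard_normal((50, 2))
p_value = permutation_pvalue(F, S, n_perm=200)
```

Shuffling the rows of one modality breaks the subject-wise pairing while preserving each modality's marginal distribution, which is what makes the permuted statistics a reference for the unpermuted one.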

As shown in Table 3, 111 voxels in 10 ROIs are identified by JSCoReg. ROIs which have been reported to be associated with SZ are presented in bold fonts. For example, the right Thalamus has been reported to play a crucial role in coordinating the passage of information among brain regions [31, 32]. Many studies have shown the relationship between SZ and dysfunction of the Thalamus [33, 34]. ROIs 91, 93, 94, and 99 are located at the declive of the Cerebellum, and the Culmen is related to many genes that have been reported to be related to SZ [35, 36]. The Cerebellum has been found to be associated with SZ by other independent studies [37].

As shown in Table 4, JSCoReg selects 34 SNPs located within 32 genes. Genes that have been reported in other studies to be relevant to SZ are shown in bold fonts. The gene DCC (Deleted in Colorectal Carcinoma, Ensembl ID: ENSG00000187323) is connected to behavioral abnormalities and is required for mesocorticolimbic dopamine development, which is relevant to disorders such as schizophrenia [38, 39]. BSX and LPL are found to be related to the risk of developing SZ. The LPL gene is expressed in brain regions related to cognitive functions and has been reported to be related to SZ [40]. BSX is expressed in the embryonic period and is a pivotal gene for brain structure development in the early phase [41].

To validate the correlation between the identified voxels and case SNPs, we compute the Pearson correlation between the voxels and case SNPs selected by the JSCoReg model. The results are visualized in Fig. 5 for both SZ patients (left) and healthy controls (right); the same y-axis is shown in both heat maps of Fig. 5. The heatmap calculated from SZ patients has much higher absolute values than the one calculated from healthy controls, demonstrating that these class-related SNPs have a strong correlation with the obtained brain regions. It is worth noting that both heatmaps are generated using the SNPs selected from the case class, which indicates that the JSCoReg model is an effective tool for class-specific feature selection.

Fig. 5.

The correlation heatmaps between selected voxels and SNPs. Left: the SZ patients; Right: the healthy controls. The Y-axis displays the corresponding brain regions defined by the AAL template, and the X-axis displays the corresponding gene names. The heatmap calculated from SZ patients has much higher absolute values than the one calculated from healthy controls.

Moreover, we use ConsensusPathDB [42] to test the biological significance of the selected risk SNPs. The Gene Ontology terms related to neural and brain activity that are enriched with a P-value less than 0.01 are summarized in Table 5. In Table 5, ‘brain development’ and ‘cerebral cortex development’ are related to brain activity, and ‘brain development’ has been pointed out in [15] to be related to brain disorders. ‘Neuron development’, ‘neuron to neuron synapse’, and ‘central nervous system development’ are related to neural activity. This analysis supports the biological significance of the risk genes selected by the JSCoReg model from the SNP data.

Table 5.

Enriched Gene Ontology terms related to neural and brain activity

Term name                             P-value

brain development                     0.00983
cerebral cortex development           0.0081
neuron development                    0.00815
neuron to neuron synapse              0.00756
central nervous system development    0.00495
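To make the notion of an enrichment P-value concrete, the sketch below computes an over-representation P-value with a hypergeometric test, which is a common choice for this kind of analysis. It is an illustration under stated assumptions: the function name `hypergeom_enrichment_p` and all counts in the usage line are hypothetical and are not taken from the study.

```python
from math import comb

def hypergeom_enrichment_p(n_background, n_in_term, n_selected, n_overlap):
    """P(X >= n_overlap) when n_selected genes are drawn without replacement
    from n_background genes, of which n_in_term are annotated with the term."""
    upper = min(n_in_term, n_selected)
    total = comb(n_background, n_selected)
    tail = sum(
        comb(n_in_term, k) * comb(n_background - n_in_term, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total

# Hypothetical counts: 20000 background genes, 700 annotated with a term,
# 32 selected genes, 5 of which carry the annotation.
p_value = hypergeom_enrichment_p(20000, 700, 32, 5)
print(f"enrichment P-value: {p_value:.4g}")
```

A term would be reported as enriched when this tail probability falls below the chosen threshold (0.01 in the analysis above); in practice a multiple-testing correction is usually applied as well.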

4.3. Comparison with JSCCA and Group SCCA

JSCoReg, JSCCA, and Group SCCA are effective multivariate modeling tools in imaging genetics, and they approach data modeling from different perspectives. Of the three, JSCoReg is the most general model. For example, JSCCA can be regarded as a special case of JSCoReg in which the phenotype is the zero vector, and JSCoReg can be transformed into the JSCCA or SCoReg model by adjusting its parameters. In practice, the model can therefore be adapted to the available phenotype and class-difference information to suit the application scenario and improve the accuracy of feature selection.
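The special-case relation can be illustrated on the generic collaborative regression loss of [16], without the sparsity and class-specific penalties that JSCoReg adds. The sketch below is therefore not the JSCoReg objective itself; the names `coreg_loss`, `b_xy`, `b_zy` and `b_xz` and the random data are assumptions made for the example. With a zero phenotype, the loss reduces to two norm terms minus a multiple of the cross-covariance term u'X'Zv, which is the quantity that CCA-type models maximize.

```python
import numpy as np

def coreg_loss(X, Z, y, u, v, b_xy=1.0, b_zy=1.0, b_xz=1.0):
    """Generic collaborative regression loss (no penalties):
    two phenotype-fit terms plus one cross-view agreement term."""
    fit_x = 0.5 * b_xy * np.sum((y - X @ u) ** 2)
    fit_z = 0.5 * b_zy * np.sum((y - Z @ v) ** 2)
    agree = 0.5 * b_xz * np.sum((X @ u - Z @ v) ** 2)
    return fit_x + fit_z + agree

# Hypothetical data: 50 subjects, 8 imaging features, 5 genetic features.
rng = np.random.default_rng(1)
X, Z = rng.normal(size=(50, 8)), rng.normal(size=(50, 5))
u, v = rng.normal(size=8), rng.normal(size=5)
y_zero = np.zeros(50)

# With y = 0 the loss equals
#   0.5*(b_xy + b_xz)*||Xu||^2 + 0.5*(b_zy + b_xz)*||Zv||^2 - b_xz * u'X'Zv,
# so for fixed norms, minimizing it maximizes the cross-covariance u'X'Zv.
b_xy, b_zy, b_xz = 0.3, 0.7, 2.0
lhs = coreg_loss(X, Z, y_zero, u, v, b_xy, b_zy, b_xz)
rhs = (0.5 * (b_xy + b_xz) * np.sum((X @ u) ** 2)
       + 0.5 * (b_zy + b_xz) * np.sum((Z @ v) ** 2)
       - b_xz * (X @ u) @ (Z @ v))
print(np.isclose(lhs, rhs))
```

Setting `b_xz` to zero instead removes the agreement term and leaves two separate regressions on the phenotype, which is one way to see how adjusting the weights moves the model between correlation-driven and regression-driven behavior.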

In the simulation experiment, we verified the advantages of our model over JSCCA and SCoReg in feature selection. Real applications pose more complex problems. As shown in Fig. 6, we use Venn diagrams to compare the results of the three methods. For brain region selection, 18 brain regions were selected by the Group SCCA model, 4 by the JSCCA model, and 10 by our model. The bilateral Thalamus and Cerebellum-related brain regions were selected by both our model and the Group SCCA model. For gene selection, 25 genes were selected by the Group SCCA model, 14 by the JSCCA model, and 32 by our model. The DCC gene was selected by both our model and the JSCCA model. Some of the biomarkers selected by our model have been reported in other similar studies. Together, these results support the reliability and the advantage of our method.

Fig. 6.

The Venn diagram to compare the identified biomarkers. (a) The brain regions identified by JSCoReg and Group SCCA. (b) The genes identified by JSCoReg and JSCCA.

5. Conclusion and discussion

In this paper, we have proposed a novel JSCoReg model together with a modified block coordinate descent algorithm for its solution. It improves on both the JSCCA and SCoReg models by considering both shared and class-specific features among multiple classes (e.g., healthy and case groups). The effectiveness and reliability of our model have been illustrated in simulation in terms of the false positive rate and the receiver operating characteristic curve. The results show that the JSCoReg model can achieve higher accuracy than JSCCA and SCoReg. On the real schizophrenia dataset, the JSCoReg model identifies biomarkers that are more biologically and statistically significant.
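For readers who wish to reproduce this kind of evaluation, the following sketch shows one way to score feature selection with an ROC curve and its AUC when the true support is known, as it is in simulation. The function name `selection_roc` and the simulated weights are hypothetical, and the sketch assumes that no two features have tied scores.

```python
import numpy as np

def selection_roc(true_support, scores):
    """ROC points and AUC for feature selection.

    true_support: boolean array, True for features that are truly relevant.
    scores: non-negative importance per feature, e.g. |estimated weight|.
    Assumes both classes are present and scores are not tied.
    """
    true_support = np.asarray(true_support, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(-scores, kind="stable")   # most important first
    hits = true_support[order]
    tpr = np.concatenate([[0.0], np.cumsum(hits) / hits.sum()])
    fpr = np.concatenate([[0.0], np.cumsum(~hits) / (~hits).sum()])
    # Trapezoidal area under the (fpr, tpr) curve.
    auc = np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2)
    return fpr, tpr, auc

# Simulated example: 10 relevant features out of 100, noisy weight estimates.
rng = np.random.default_rng(2)
support = np.zeros(100, dtype=bool)
support[:10] = True
weights = np.where(support, 1.0, 0.0) + rng.normal(scale=0.5, size=100)
fpr, tpr, auc = selection_roc(support, np.abs(weights))
print(f"AUC of the simulated selection: {auc:.3f}")
```

Each point on the curve corresponds to keeping the top-k features for some k, so sweeping a sparsity parameter traces out the same trade-off between true and false positives.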

With regard to biomarker identification, multi-view learning has many advantages over single-view learning. Inspired by the CoReg and JSCCA models, this work proposed the JSCoReg model to extract and select features (in this case, brain regions and SNPs) associated with a given phenotype while exposing class-specific information to facilitate the understanding of SZ. At the same time, several problems still need further study in application. On the one hand, definite and robust parameter selection for the JSCoReg model remains an open problem; a robust parameter selection method is needed to provide an optimal parameter set, and we will continue to work on this. On the other hand, although the model is able to extract features associated with a given phenotype, the risk of overfitting grows as model complexity increases. In future work, we will pursue a more concise penalty term or other sparse optimization models to extract class-specific information.

Acknowledgment

This work is partly supported by the Special Fund for Basic Scientific Research of Central Colleges in Chang’an University (310812163504, 300102129202 and 310812171010), NIH (R01GM109068, R01MH104680, R01MH107354, R01AR059781, R01EB006841, R01EB005846, R01MH103220, R01MH116782, P20GM103472) and NSF (1539067).

References

  • [1] Harrison PJ and Weinberger DR, “Schizophrenia genes, gene expression, and neuropathology: on the matter of their convergence,” Molecular Psychiatry, vol. 10, no. 1, pp. 40–68, 2005.
  • [2] Abecasis GR, et al., “Genomewide scan in families with schizophrenia from the founder population of Afrikaners reveals evidence for linkage and uniparental disomy on chromosome,” American Journal of Human Genetics, vol. 74, no. 3, pp. 403–417, 2004.
  • [3] Sutrala SR, et al., “Gene copy number variation in schizophrenia,” Schizophrenia Research, vol. 96, no. 1–3, pp. 93–99, 2007.
  • [4] Li X, et al., “FMRI study of language activation in schizophrenia schizoaffective disorder and in individuals genetically at high risk,” Schizophrenia Research, vol. 96, no. 1–3, pp. 4–24, 2007.
  • [5] Szycik GR, et al., “Audiovisual integration of speech is disturbed in schizophrenia: an fMRI study,” Schizophrenia Research, vol. 110, no. 1–3, pp. 111–118, 2009.
  • [6] Gur RE, et al., “An fMRI study of facial emotion processing in patients with schizophrenia,” American Journal of Psychiatry, vol. 159, no. 12, pp. 1992–1999, 2002.
  • [7] Liu J, et al., “Methylation patterns in whole blood correlate with symptoms in schizophrenia patients,” Schizophrenia Bulletin, vol. 40, no. 4, pp. 769–776, 2014.
  • [8] Thompson PM, et al., “Imaging genomics,” Current Opinion in Neurology, vol. 23, no. 4, pp. 368–373, 2010.
  • [9] Liu JY and Calhoun VD, “A review of multivariate analyses in imaging genetics,” Frontiers in Neuroinformatics, vol. 8, no. 29, pp. 1–11, 2014.
  • [10] Parkhomenko E, et al., “Sparse canonical correlation analysis with application to genomic data integration,” Statistical Applications in Genetics and Molecular Biology, vol. 8, no. 1, pp. 1–34, 2009.
  • [11] Witten DM, et al., “A penalized matrix decomposition with applications to sparse principal components and canonical correlation analysis,” Biostatistics, vol. 10, no. 3, pp. 515–534, 2009.
  • [12] Liu JY and Calhoun VD, “A review of soft multivariate analyse in imaging genetics,” Frontiers in Neuroinformatics, vol. 8, no. 29, pp. 1–11, 2014.
  • [13] Lin DD, et al., “Correspondence between fMRI and SNP data by group sparse canonical correlation analysis,” Medical Image Analysis, vol. 18, no. 6, pp. 891–902, 2014.
  • [14] Du L, et al., “Structured sparse canonical correlation analysis for brain imaging genetics: an improved GraphNet method,” Bioinformatics, vol. 32, no. 10, pp. 1544–1551, 2016.
  • [15] Fang J, et al., “Joint sparse canonical correlation analysis for detecting differential imaging genetics modules,” Bioinformatics, vol. 32, no. 22, pp. 3480–3488, 2016.
  • [16] Gross SM and Tibshirani R, “Collaborative regression,” Biostatistics, vol. 16, no. 2, pp. 326–338, 2014.
  • [17] Zille P, et al., “Enforcing co-expression within a brain-imaging genomics regression framework,” IEEE Transactions on Medical Imaging, vol. 37, no. 12, pp. 2561–2571, 2018.
  • [18] Bai YT, et al., “Biomarker Identification Through Integrating fMRI and Epigenetics,” IEEE Transactions on Biomedical Engineering, vol. 67, no. 4, pp. 1186–1196, 2019.
  • [19] Gollub RL, et al., “The MCIC Collection: A Shared Repository of Multi-Modal, Multi-Site Brain Image Data from a Clinical Investigation of Schizophrenia,” Neuroinformatics, vol. 11, no. 3, pp. 367–388, 2013.
  • [20] Witten DM and Tibshirani RJ, “Extensions of sparse canonical correlation analysis with application to genomic data,” Statistical Applications in Genetics and Molecular Biology, vol. 8, no. 1, pp. 1–27, 2009.
  • [21] Tibshirani R, “Regression shrinkage and selection via the Lasso,” Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol. 58, pp. 267–288, 1996.
  • [22] Tibshirani R, et al., “Class prediction by nearest shrunken centroids with applications to DNA microarrays,” Statistical Science, vol. 18, no. 1, pp. 104–117, 2003.
  • [23] Dudoit S, et al., “Comparison of discrimination methods for the classification of tumors using gene expression data,” Journal of the American Statistical Association, vol. 97, no. 457, pp. 77–87, 2001.
  • [24] Hoefling H, “A path algorithm for the fused lasso signal approximator,” Journal of Computational and Graphical Statistics, vol. 19, no. 4, pp. 984–1006, 2010.
  • [25] Hocking T, et al., “Clusterpath: an algorithm for clustering using convex fusion penalties,” in Proceedings of the 28th International Conference on Machine Learning (ICML-11). Omnipress, 2011.
  • [26] Danaher P, et al., “The joint graphical lasso for inverse covariance estimation across multiple classes,” Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol. 76, no. 2, pp. 373–397, 2014.
  • [27] Wang Z, et al., “Network-guided regression for detecting associations between DNA methylation and gene expression,” Bioinformatics, vol. 30, no. 19, pp. 2693–2701, 2014.
  • [28] Grellmann C, et al., “Comparison of variants of canonical correlation analysis and partial least squares for combined analysis of MRI and genetic data,” NeuroImage, vol. 107, no. 1, pp. 289–310, 2015.
  • [29] Xu Z, et al., “L-1/2 Regularization: A thresholding representation theory and a fast solver,” IEEE Transactions on Neural Networks and Learning Systems, vol. 23, no. 7, pp. 1013–1027, 2012.
  • [30] Yan CG and Zang YF, “DPARSF: a matlab toolbox for “Pipeline” data analysis of resting-state fMRI,” Front. Syst. Neurosci, vol. 3, no. 13, pp. 1–13, 2010.
  • [31] Kiehl KA and Liddle PF, “An event-related functional magnetic resonance imaging study of an auditory oddball task in schizophrenia,” Schizophrenia Research, vol. 48, no. 2, pp. 159–171, 2001.
  • [32] Clinton SM and Meador-Woodruff JH, “Thalamic dysfunction in schizophrenia: neurochemical, neuropathological, and in vivo imaging abnormalities,” Schizophrenia Research, vol. 69, no. 2, pp. 237–253, 2004.
  • [33] Sui J, et al., “Discriminating schizophrenia and bipolar disorder by fusing fMRI and DTI in a multimodal CCA+ joint ICA model,” NeuroImage, vol. 57, no. 3, pp. 839–855, 2011.
  • [34] Honey GD, et al., “Functional dysconnectivity in schizophrenia associated with attentional modulation of motor function,” Brain, vol. 128, no. 11, pp. 2597–2611, 2005.
  • [35] Cao H, et al., “Integrating fMRI and SNP data for biomarker identification for schizophrenia with a sparse representation based variable selection method,” BMC Medical Genomics, vol. 6, no. (3, supplement), pp. S2, 2013.
  • [36] Kim DI, et al., “Auditory oddball deficits in schizophrenia: an independent component analysis of the fMRI multisite function BIRN study,” Schizophrenia Bulletin, vol. 35, no. 1, pp. 67–81, 2009.
  • [37] Hu L, et al., “Role of the cerebellum in schizophrenia: Current status and progress of structural magnetic resonance imaging,” Chinese Journal of Medical Imaging Technology, vol. 26, no. 11, pp. 2202–2204, 2010.
  • [38] Grant A, et al., “Association between schizophrenia and genetic variation in DCC: a case control study,” Schizophrenia Research, vol. 137, no. 1–3, pp. 26–31, 2012.
  • [39] Hu WX, et al., “Adaptive sparse multiple canonical correlation analysis with application to imaging (epi)genomics study of schizophrenia,” IEEE Transactions on Biomedical Engineering, vol. 65, no. 2, pp. 390–399, 2018.
  • [40] Xie C, et al., “Association between schizophrenia and single nucleotide polymorphisms in lipoprotein lipase gene in a Han Chinese population,” Psychiatric Genetics, vol. 21, no. 6, pp. 307–314, 2011.
  • [41] Niculescu HL, et al., “Towards understanding the schizophrenia code: an expanded convergent functional genomics approach,” American Journal of Medical Genetics Part B: Neuropsychiatric Genetics, vol. 144, no. 2, pp. 129–158, 2007.
  • [42] Kamburov A, et al., “The ConsensusPathDB interaction database: 2013 update,” Nucleic Acids Research, vol. 41, no. 1, pp. 793–800, 2013.
  • [43] Alam MA, et al., “Identifying outliers using multiple kernel canonical correlation analysis with application to imaging genetics,” Computational Statistics & Data Analysis, vol. 125, pp. 70–85, 2018.
