Author manuscript; available in PMC: 2021 May 1.
Published in final edited form as: IEEE/ACM Trans Comput Biol Bioinform. 2021 Aug 6;18(4):1350–1360. doi: 10.1109/TCBB.2019.2950904

A Latent Gaussian Copula Model for Mixed Data Analysis in Brain Imaging Genetics

Aiying Zhang a, Jian Fang a, Wenxing Hu a, Vince D Calhoun b, Yu-Ping Wang a
PMCID: PMC7756188  NIHMSID: NIHMS1654689  PMID: 31689199

Abstract

Recent advances in imaging genetics make it possible to combine different types of data, including medical images such as functional magnetic resonance imaging (fMRI) and genetic data such as single nucleotide polymorphisms (SNPs), for comprehensive diagnosis of mental disorders. Understanding the complex interactions among these heterogeneous data may give rise to a new perspective, but it also demands statistical models for their integration. Various graphical models have been proposed for the study of interaction or association networks with continuous, binary, and count data, as well as mixtures of them. However, limited effort has been devoted to the multinomial case, for instance, SNP data. Our goal is therefore to fill this void by developing a graphical model for the integration of fMRI and SNP data, which can provide a deeper understanding of the unknown neurogenetic mechanism. In this paper, we propose a latent Gaussian copula model for mixed data containing multinomial components. We assume that each discrete variable is obtained by discretizing a latent (unobserved) continuous variable, and we then construct a semi-rank based estimator of the graph structure. The simulation results demonstrate that the proposed latent correlation is more stable and accurate than several existing methods in detecting graph structure. When applied to a real schizophrenia dataset consisting of SNP array and fMRI data collected by the Mind Clinical Imaging Consortium (MCIC), the proposed method reveals a set of distinct SNP-brain associations, which are verified to be biologically significant. The proposed model is statistically promising for handling mixed types of data including multinomial components, and can find widespread applications. To promote reproducible research, the R code is available at https://github.com/Aiying0512/LGCM.

Index Terms: Schizophrenia (SZ), SNP, fMRI, Imaging genetics, Latent Gaussian copula model, Mixed data

I. Introduction

With the rapid development of biomedical techniques, it is now possible to collect multiple sources of data on brain disorders, and the integration or fusion of neuroimaging and genetic biomarkers has become promising for a comprehensive understanding of mental illnesses ([1], [2], [3]). For instance, a rapidly growing number of studies use imaging genetics to understand the neurogenetic mechanism of schizophrenia ([4], [5]), where brain connectivity is treated as an endophenotype for a disease phenotype and is associated with genetic variants. This imaging genetics approach has great potential to identify new molecular targets affecting specific brain systems and to provide an enormous impetus for drug discovery ([6], [7]).

Many statistical approaches have been proposed for imaging genetics in recent years ([8], [3]). Based on the strategies employed, existing methods can be roughly divided into three categories: regression models, component-based analyses, and graphical models. Among regression models, Stein et al. ([9]) implemented a mass univariate linear model (MULM) for a voxelwise genome-wide association study (vGWAS). One drawback is that the relationship between different neuroimaging phenotypes (e.g., at different regions of the brain) is not explicitly modelled ([8]). Multivariate regression models have therefore been applied. Penalized regression, specifically LASSO regression ([10], [11]), and reduced rank regression (RRR) ([12]) are often used to model the relationships between and within genetic and imaging data. Nonetheless, due to computational limitations, multivariate regression models cannot deal with a large number of variables (in the hundreds at most) ([8]), so they are mostly applied to reveal local relationships. Component-based analyses, on the other hand, are more computationally efficient for high-dimensional data. The methods applied to imaging genetics mainly include independent component analysis (ICA) ([13], [14]), canonical correlation analysis (CCA) ([15], [16]), and partial least squares (PLS) ([17]). ICA is designed to extract statistically independent components; CCA detects linear combinations of the variables from different datasets that have maximum correlation; and PLS aims to find fundamental relations between two datasets through a latent structure. These component-based methods can detect significant subgroups (components); however, they cannot reveal the inner relationships within each subgroup.

To this end, graphical models have been proposed to study association networks for systems with a large number of variables. Among them, Gaussian Graphical Models (GGMs) have been widely used. Current GGM methods can be generally categorized into three groups: graphical Lasso ([18], [19]), CLIME ([20]), and nodewise regression ([21]). All of these methods estimate the precision matrix, i.e., the inverse of the covariance matrix. The graphical Lasso (gLasso) estimates it through a penalized likelihood function; CLIME uses a constrained l1 minimization; and nodewise regression is based on the relation between the partial correlation coefficients and regression coefficients. However, in imaging genetics the collected data often do not follow a Gaussian distribution and may not even be continuous. The fMRI images after preprocessing are continuous (Gaussian), DNA methylation data are continuous in [0, 1], RNA sequencing data are usually counts, and SNPs are multinomial with values in {0, 1, 2}. Studying the interactions among these heterogeneous data is equivalent to estimating a graphical network whose variables are of various data types, which gives rise to a mixed data problem. Therefore, the original GGMs with the Gaussian assumption cannot be directly applied. To relax the Gaussian assumption, a semiparametric Gaussian copula model has been developed for modeling continuous data ([22], [23]), where a monotonically increasing function (copula) is assumed between the observed continuous variables and Gaussian distributed variables. As a result, the Gaussian covariance can be estimated through the copula function from the observations, which turns the estimation of the association network into a GGM problem. For discrete data such as count data, exponential family graphical models such as Poisson ([24], [25]) and negative-binomial graphical models ([26], [27]) have been built.

There has also been a series of works on mixed graphical models. For instance, Lee and Hastie proposed a pseudo-likelihood method for pairwise graphical models with mixed Gaussian and multinomial data [28]. Yang et al. [29] and Chen et al. [30] proposed exponential family graphical models, which allow the conditional distribution of nodes to belong to the exponential family. A semiparametric exponential family graphical model was recently studied by Yang et al. ([31]). All these approaches essentially model the nodewise conditional distribution by generalized linear models. In contrast, Fan et al. [32] proposed a latent Gaussian copula model which can combine continuous and binary data through a deeper layer of unobserved variables. In addition, the model is semiparametric and offers more flexibility for modelling the interactions between the mixed variables than the linear models [32].

In this paper, we propose a graphical model for multinomial data or mixed data with multinomial components. The multinomial graphical model can be applied to discrete data with exact boundaries, while the mixed graphical model is designed for applications in brain imaging genetics data analysis (see Fig. 1). Following the same idea as Fan et al. ([32]), we assume that each discrete variable is obtained by discretizing a latent (unobserved) continuous variable. Consequently, the correlations among the latent variables (the latent correlations) are diminished when they are estimated directly from the observed discrete variables. The contributions of this paper are twofold. Mathematically, we create a semi-rank based estimator. With the new estimator defined, we extend the latent graphical model of Fan et al. ([32]) for binary data to account for multinomial data without losing mathematical simplicity and rigor. For medical applications, we apply the proposed model to integrate fMRI and SNP data from MCIC to identify potential neurogenetic associations in schizophrenia (SZ). Unlike previous studies that target only brain connectivity ([33], [34]), genetic interactions ([35], [36]), or neurogenetic associations ([16], [37]), we can obtain brain connectivity, genetic interactions, and neurogenetic associations at the same time.

Fig. 1:


An illustration on the latent Gaussian copula model. X is the observation with the discrete component X1 (SNPs) and continuous component X2 (fMRI). Z1 is the latent variable vector for the discrete component. G is Gaussian distributed with covariance (correlation) matrix Σ.

The rest of the paper is organized as follows. In Section II, we introduce the latent Gaussian copula model and its theoretical properties. In Section III, the performance of the proposed method is evaluated through both simulations and a real data analysis for a schizophrenia study, followed by some discussion and concluding remarks in the last section.

II. Method

A. Background

Gaussian graphical models (GGMs) have been widely used for the study of association networks due to their mathematical simplicity. The key idea of GGMs is to use partial correlation coefficients to measure the dependency structure of a multivariate system. Since all the variables in a system are directly or indirectly correlated, an advantage of using partial correlation is that it can distinguish direct dependency in the system. Furthermore, it has been proven that if a p-dimensional random vector X follows a multivariate Gaussian distribution, i.e., X = (X1, X2, …, Xp) ~ N(μ, Σ), then the partial correlation between Xi and Xj can be expressed as $\rho_{ij\cdot V\setminus\{i,j\}} = -\Omega_{i,j}/\sqrt{\Omega_{i,i}\Omega_{j,j}}$, i, j = 1, 2, …, p, where μ and Σ denote the unknown mean and covariance matrix, Ω is the precision matrix (i.e., the inverse of the covariance matrix, $\Sigma^{-1}$), $\Omega_{i,j}$ denotes the (i, j)th entry of Ω, and V = {1, 2, …, p} denotes the set of indices of variables ([38]). Thus, the construction of GGMs is equivalent to the estimation of the precision matrix. In practice, most problems we face have high dimensionality p but low sample size n. Under the high dimensional setting, the sample covariance matrix may not be invertible and thus the precision matrix cannot be directly computed. Various methods have been proposed in the literature to overcome this difficulty, such as graphical Lasso (gLasso) ([18], [19]), CLIME ([20]), and nodewise regression ([21]).
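As an illustration of the relation between the precision matrix and partial correlations, the following minimal sketch converts a precision matrix into partial correlations using the standard identity, in which the partial correlation equals minus the off-diagonal precision entry divided by the square root of the product of the two diagonal entries. The function name `partial_correlations` and the 3 × 3 matrix are hypothetical and chosen only for illustration.

```python
import math

def partial_correlations(omega):
    """Partial correlations implied by a precision matrix (list of lists)."""
    p = len(omega)
    rho = [[1.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(p):
            if i != j:
                # rho_ij = -Omega_ij / sqrt(Omega_ii * Omega_jj)
                rho[i][j] = -omega[i][j] / math.sqrt(omega[i][i] * omega[j][j])
    return rho

# Hypothetical precision matrix of a chain graph 1 - 2 - 3 (illustrative values).
omega = [[2.0, -1.0, 0.0],
         [-1.0, 2.0, -1.0],
         [0.0, -1.0, 2.0]]
rho = partial_correlations(omega)
print(rho[0][1])       # partial correlation between variables 1 and 2
print(abs(rho[0][2]))  # variables 1 and 3 have no direct edge
```

A zero entry of the precision matrix gives a zero partial correlation, which is why estimating the sparsity pattern of Ω recovers the graph.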

In this paper, our goal is to develop a connection between discrete and Gaussian distributed data such that we can construct graphical models for systems that contain discrete data via the well-studied high dimensional GGMs. Intuitively, we can assume that a discrete variable is obtained by discretizing a latent (unobserved) Gaussian distributed variable. However, the Gaussian assumption for the latent variable is too strong. It is more reasonable to relax the restriction to a continuous latent variable ([39]). Liu et al. [40] proposed a semiparametric Gaussian copula model, also known as the nonparanormal (NPN) model, which transforms continuous variables into Gaussian variables and preserves the graph structure after transformation. To be more specific, we give Definition II.1.

Definition II.1. (Gaussian copula model) A continuous random vector X = (X1, X2, …, Xp)′ is sampled from a Gaussian copula model if and only if there exists a set of monotonically increasing transformations $f = (f_j)_{j=1}^{p}$ satisfying f(X) = {f1(X1), f2(X2), …, fp(Xp)}′ ~ Np(0, Σ) with diag(Σ) = Ip. We denote X ~ NPN(0, Σ, f), where NPN stands for the nonparanormal family.

The nonparanormal (NPN) family is a nonparametric extension of the Normal family. It is nonparametric in the sense that it only depends on the transformation functions $f = (f_j)_{j=1}^{p}$ and the covariance matrix Σ, which can all be estimated from the data. Following this idea, Fan et al. ([32]) proposed a latent Gaussian copula model to handle binary variables, under the assumption that the discrete values are obtained by discretizing a latent continuous variable at some unknown cutoffs. Our goal here is to generalize the model to discrete data that follow a multinomial distribution. This fills a gap in graphical models for handling multinomial data, which can in turn find widespread applications, as in our imaging genetics study for the integration of fMRI and SNPs.

B. Latent Gaussian Copula Model

First, we build the latent Gaussian copula model for the case that only has multinomial data. Let $X = (X_1, X_2, \ldots, X_p)' \in \{0, 1, \ldots, L\}^p$ be a p-dimensional multinomial random vector. Then the latent Gaussian copula model for multinomial data is defined as follows.

Definition II.2. (Latent Gaussian copula model for multinomial data) We say that the multinomial random vector X satisfies the latent Gaussian copula model if there exists a p-dimensional random vector Z = (Z1, Z2, …, Zp)′ ~ NPN(0, Σ, f) such that $X_j = \sum_{l=1}^{L} I(Z_j > C_{jl})$, ∀j = 1, 2, …, p, where I(·) is the indicator function and $C_{p\times L} = (C_1, C_2, \ldots, C_p)'$ is a matrix of constants with $C_j = (C_{j1}, C_{j2}, \ldots, C_{jL})'$ representing the L cutoffs for $X_j$. Then we denote X ~ LNPN(0, Σ, f, C), where LNPN stands for the latent nonparanormal family. Σ is called the latent correlation matrix. If Z ~ N(0, Σ), then we say X satisfies the latent Gaussian (LN) model LN(0, Σ, C).
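The discretization step in Definition II.2 can be sketched as follows. This is a minimal illustration for a single variable with L = 2; the function name `discretize`, the cutoff values, and the random seed are assumptions made for the example and are not taken from the paper.

```python
import random

def discretize(z, cutoffs):
    """X_j = sum_l I(z > C_jl): number of cutoffs exceeded by the latent value."""
    return sum(1 for c in cutoffs if z > c)

rng = random.Random(0)               # fixed seed, for reproducibility only
cutoffs = [-0.5, 0.8]                # hypothetical cutoffs C_j1 < C_j2 (L = 2)
latent = [rng.gauss(0.0, 1.0) for _ in range(10)]      # unobserved Z_j
observed = [discretize(z, cutoffs) for z in latent]    # observed X_j in {0, 1, 2}
print(observed)
```

Only `observed` would be available in practice; the latent draws and the cutoffs are hidden, which is why the cutoffs have to be estimated on the Gaussian scale later on.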

For the applications to imaging genetics, we are particularly interested in the associations among heterogeneous data which contain different data types. For example, the study of SNP-brain associations involves both multinomial and continuous data. Therefore, it is necessary to extend our model to mixed data. The definition is given below:

Definition II.3. (Latent Gaussian copula model for mixed data) Let X = (X1, X2) be a p-dimensional random vector, where X1 is a p1-dimensional multinomial vector and X2 is a p2-dimensional continuous vector, p1 + p2 = p. We say X ~ LNPN(0, Σ, f, C) if there exists a p1-dimensional random vector $Z_1 = (Z_1, Z_2, \ldots, Z_{p_1})'$ such that Z := (Z1, X2) ~ NPN(0, Σ, f) and $X_j = \sum_{l=1}^{L} I(Z_j > C_{jl})$, ∀j = 1, 2, …, p1, where $C_{p_1\times L} = (C_1, C_2, \ldots, C_{p_1})'$ is a matrix of constants, f = (f1, f2)′ with $f_1 = (f_j)_{j=1}^{p_1}$ denoting the set of transformation functions for Z1 and $f_2 = (f_j)_{j=p_1+1}^{p}$ denoting the set of transformation functions for X2.

Since the model for the multinomial data is a special case of the mixed model, we discuss only the latent Gaussian copula model for mixed data in the remainder of the paper.

C. Parameter Estimation

In the latent Gaussian copula model, we assume that the multinomial component X1 is generated by a latent continuous random vector Z1 with multiple truncations at the cutoffs $C_{jl}$, l = 1, 2, …, L. Combined with the continuous component X2, Z := (Z1, X2) satisfies the Gaussian copula model. Note that the parameters (Σ, f, C) in the model cannot all be identified from the data. For the continuous component, the marginal transformations $f_2 = (f_j)_{j=p_1+1}^{p}$ are identifiable ([22]). However, the parameters for the multinomial component (f1, C) are not identifiable, since a lot of information has been lost during the discretization. Similar to the arguments in [32], we find that for the multinomial component only $\Delta_{jl} := f_j(C_{jl})$, l = 1, 2, …, L, j = 1, 2, …, p1, is identifiable, i.e., the l-th cutoff point of the standard Gaussian variable corresponding to $X_j$. Thus, we denote our model as LNPN(Σ, Δ, f2), where $\Delta = (\Delta_1, \Delta_2, \ldots, \Delta_{p_1})'$ and $\Delta_j = (\Delta_{j1}, \Delta_{j2}, \ldots, \Delta_{jL})'$, j = 1, 2, …, p1, stores the cutoff values for the multinomial variables.

Assume we have n independent samples X1, X2, …, Xn ~ LNPN(Σ, Δ, f2). As shown in Fig. 1, in order to learn the graph structure of X = (X1, X2), we need to estimate the latent correlation matrix Σ first. Because the latent variable Z1 is not observed, we cannot use the Gaussian copula model directly. In [32], Fan et al. proposed a rank based estimator of Σ for binary data, which does not retain its theoretical properties in the multinomial case. Therefore, we need a new way to estimate Σ from our observations Xi.

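To start with, we define a semi-rank based statistic $\hat{r}_{jk}$ for the following three different cases,

1. $1 \le j \ne k \le p_1$, i.e., $X_{ij}$, $X_{ik}$ are both multinomial,

$$\hat{r}_{jk} = \frac{2}{n(n-1)} \sum_{1 \le i < i' \le n} (X_{ij} - X_{i'j})(X_{ik} - X_{i'k}), \tag{1}$$

2. $1 \le j \le p_1$ and $p_1 + 1 \le k \le p$, i.e., $X_{ij}$ is multinomial and $X_{ik}$ is continuous,

$$\hat{r}_{jk} = \frac{2}{n(n-1)} \sum_{1 \le i < i' \le n} (X_{ij} - X_{i'j})\,\mathrm{sgn}(X_{ik} - X_{i'k}), \tag{2}$$

3. $p_1 + 1 \le j \ne k \le p$, i.e., $X_{ij}$, $X_{ik}$ are both continuous,

$$\hat{r}_{jk} = \frac{2}{n(n-1)} \sum_{1 \le i < i' \le n} \mathrm{sgn}(X_{ij} - X_{i'j})\,\mathrm{sgn}(X_{ik} - X_{i'k}), \tag{3}$$

where sgn(·) is the function that extracts the sign of a number and sgn(0) = 0.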
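The three pairwise statistics can be computed with a single routine that uses the raw difference for a multinomial variable and the sign of the difference for a continuous one. The sketch below is a direct O(n²) transcription of Eqs. (1) to (3); the function name `semi_rank` and its boolean flags are hypothetical, and the toy data are for illustration only.

```python
from itertools import combinations

def sgn(x):
    """Sign function with sgn(0) = 0."""
    return (x > 0) - (x < 0)

def semi_rank(xj, xk, j_discrete, k_discrete):
    """Semi-rank statistic for one pair of variables.

    Both discrete   -> Eq. (1)
    One discrete    -> Eq. (2)
    Both continuous -> Eq. (3), i.e. Kendall's tau
    """
    n = len(xj)
    total = 0.0
    for i, i2 in combinations(range(n), 2):   # all pairs i < i'
        dj = xj[i] - xj[i2]
        dk = xk[i] - xk[i2]
        a = dj if j_discrete else sgn(dj)
        b = dk if k_discrete else sgn(dk)
        total += a * b
    return 2.0 * total / (n * (n - 1))

snp = [0, 1, 2, 1, 0]                # toy multinomial variable
roi = [0.1, 0.4, 0.9, 0.3, -0.2]     # toy continuous variable
print(semi_rank(snp, roi, True, False))
```

Note that only the continuous case is bounded by 1 in absolute value; the multinomial cases scale with the size of the category differences, which is why a link function is needed to map them back to a correlation.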

Our next step is to identify the link function between the intermediate statistic $\hat{r}_{jk}$ and the latent correlation $\Sigma_{jk}$. Let Φ(·) denote the cumulative distribution function (cdf) of the standard normal distribution and

$$\Phi_2(u, v, t) \equiv \int_{-\infty}^{u} \int_{-\infty}^{v} \phi_2(x_1, x_2, t)\, dx_2\, dx_1$$

denote the cdf of a standard bivariate normal distribution, where ϕ2(x1, x2, t) is the probability density function (pdf) of a standard bivariate normal distribution with correlation t.

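For Case 1, the expectation of $\hat{r}_{jk}$ is

$$E(\hat{r}_{jk}) = 2\left\{\sum_{l=1}^{L}\sum_{m=1}^{L} \Phi_2(\Delta_{jl}, \Delta_{km}, \Sigma_{jk}) - \sum_{l=1}^{L}\Phi(\Delta_{jl}) \sum_{m=1}^{L}\Phi(\Delta_{km})\right\} \tag{4}$$

and we denote it as $F(\Sigma_{jk}; \Delta_j, \Delta_k)$.

Similarly, for Case 2, we denote

$$G(\Sigma_{jk}; \Delta_j) \equiv E(\hat{r}_{jk}) = 4\sum_{l=1}^{L} \Phi_2(\Delta_{jl}, 0, \Sigma_{jk}/\sqrt{2}) - 2\sum_{l=1}^{L}\Phi(\Delta_{jl}). \tag{5}$$

For Case 3, Kendall ([41]) has already given the link function

$$H(\Sigma_{jk}) = E(\hat{r}_{jk}) = \frac{2}{\pi}\sin^{-1}(\Sigma_{jk}). \tag{6}$$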
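The link functions in Eqs. (4) to (6) can be evaluated numerically. The sketch below is one possible implementation that uses only the Python standard library; it takes Φ2 to be the bivariate normal cdf and evaluates it by one-dimensional Simpson integration of φ(x)Φ((v − tx)/√(1 − t²)). The helper names `Phi`, `Phi2`, `F`, `G`, `H`, the truncation point −8, and the number of integration steps are assumptions of this example, not part of the paper's R code.

```python
import math
from statistics import NormalDist

_STD = NormalDist()  # standard normal distribution

def Phi(x):
    return _STD.cdf(x)

def Phi2(u, v, t, steps=2000):
    """Bivariate standard normal cdf P(Z1 <= u, Z2 <= v), correlation |t| < 1.

    Composite Simpson's rule on [-8, u]; the tail below -8 is ignored.
    """
    lo = -8.0
    if u <= lo:
        return 0.0
    s = math.sqrt(1.0 - t * t)
    h = (u - lo) / steps            # steps must be even for Simpson's rule
    def g(x):
        return _STD.pdf(x) * Phi((v - t * x) / s)
    total = g(lo) + g(u)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * g(lo + i * h)
    return total * h / 3.0

def F(t, delta_j, delta_k):
    """Eq. (4): both variables multinomial."""
    joint = sum(Phi2(a, b, t) for a in delta_j for b in delta_k)
    marg = sum(Phi(a) for a in delta_j) * sum(Phi(b) for b in delta_k)
    return 2.0 * (joint - marg)

def G(t, delta_j):
    """Eq. (5): one multinomial and one continuous variable."""
    return (4.0 * sum(Phi2(a, 0.0, t / math.sqrt(2.0)) for a in delta_j)
            - 2.0 * sum(Phi(a) for a in delta_j))

def H(t):
    """Eq. (6): both variables continuous (Kendall's tau)."""
    return 2.0 / math.pi * math.asin(t)

print(F(0.5, [0.5, 1.7], [0.3, 1.6]), G(0.5, [0.5, 1.7]), H(0.5))
```

All three functions vanish at zero latent correlation and increase with t, which is what makes them invertible.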

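Therefore, we identify the monotonic link functions between the intermediate statistics $\hat{r}_{jk}$ and the latent correlation $\Sigma_{jk}$ for all cases. The derivations of Eq. (4) and (5) can be found in Appendix A.1, A.2. The unknown parameters $\Delta_j = (\Delta_{j1}, \Delta_{j2}, \ldots, \Delta_{jL})'$ in Eq. (4), (5) can be estimated through the following set of equations,

$$P(X_{ij} \ge l) = \Phi(-\Delta_{jl}) = 1 - \Phi(\Delta_{jl}), \quad l = 1, 2, \ldots, L. \tag{7}$$

The maximum likelihood estimate (MLE) of the probability $P(X_{ij} \ge l)$ is the empirical proportion

$$\hat{P}(X_{ij} \ge l) = \frac{1}{n}\sum_{i=1}^{n} I\{X_{ij} \ge l\},$$

where I{A} is the indicator function of the event A. Thus, $\hat{\Delta}_j = (\hat{\Delta}_{j1}, \hat{\Delta}_{j2}, \ldots, \hat{\Delta}_{jL})'$ can be obtained by

$$\hat{\Delta}_{jl} = \Phi^{-1}\left(1 - \frac{1}{n}\sum_{i=1}^{n} I\{X_{ij} \ge l\}\right), \quad l = 1, 2, \ldots, L. \tag{8}$$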

Finally, we can get the estimated latent correlation matrix $\hat{R} = (\hat{R}_{jk})_{p\times p}$ through the link functions, and we summarize the procedure in Algorithm 1. To promote the application of the proposed method, we make the R code available at https://github.com/Aiying0512/LGCM.
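The cutoff estimator in Eq. (8) amounts to applying the standard normal quantile function to one minus the empirical proportion of samples at or above each level. A minimal sketch follows; the function name `estimate_cutoffs` is hypothetical, and the code assumes that every level 0, …, L is observed at least once so that the quantile function is evaluated strictly inside (0, 1).

```python
from statistics import NormalDist

def estimate_cutoffs(x, L):
    """Estimate (Delta_j1, ..., Delta_jL) for one multinomial variable via Eq. (8).

    Assumes each level 0..L appears in x; otherwise inv_cdf raises an error.
    """
    n = len(x)
    inv_cdf = NormalDist().inv_cdf
    deltas = []
    for l in range(1, L + 1):
        frac = sum(1 for v in x if v >= l) / n   # empirical P(X_ij >= l)
        deltas.append(inv_cdf(1.0 - frac))
    return deltas

# Toy SNP-like variable: 50% zeros, 30% ones, 20% twos (illustrative only).
x = [0] * 50 + [1] * 30 + [2] * 20
print(estimate_cutoffs(x, L=2))
```

The estimated cutoffs are increasing in l by construction, because the proportion of samples at or above level l is non-increasing in l.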

Algorithm 1.

Algorithm for the latent correlation matrix with mixed data

Input: Observed mixed sample matrix $x = (x_1, x_2)$ of size n × p, where $x_1$ (n × p1) is discrete and $x_2$ (n × p2) is continuous.
Output: Estimated latent correlation matrix $\hat{R}$
 1. Calculate the intermediate statistics $\hat{r}$.
Start
  for j = 1 to p − 1 do
   for k = j + 1 to p do
    if $1 \le j \ne k \le p_1$ then Calculate $\hat{r}_{jk}$ by Eq.(1)
    else if $j \le p_1$, $k > p_1$ then Calculate $\hat{r}_{jk}$ by Eq.(2)
    else Calculate $\hat{r}_{jk}$ by Eq.(3)
End
 2. Estimate $\Delta = (\Delta_1, \Delta_2, \ldots, \Delta_{p_1})'$.
Start
  for j = 1 to p1 do
   for l = 1 to L do
    Calculate $\hat{\Delta}_{jl}$ by Eq.(8)
End
 3. Obtain the estimated latent correlation matrix $\hat{R}$
Start
  if j = k = 1, 2, …, p then $\hat{R}_{jj} = 1$
  else if $1 \le j \ne k \le p_1$ then
   $\hat{R}_{jk} = \hat{R}_{kj} = F^{-1}(\hat{r}_{jk}; \hat{\Delta}_j, \hat{\Delta}_k)$
  else if $j \le p_1$, $k > p_1$ then
   $\hat{R}_{jk} = \hat{R}_{kj} = G^{-1}(\hat{r}_{jk}; \hat{\Delta}_j)$
  else $\hat{R}_{jk} = \hat{R}_{kj} = H^{-1}(\hat{r}_{jk})$
End
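Step 3 of Algorithm 1 requires inverting the link functions. Since they are strictly increasing on (−1, 1), one simple option is bisection; the sketch below illustrates this with the Kendall link H, whose closed-form inverse sin(πr/2) serves as a check. The function name `invert_link`, the search interval ±0.999, and the tolerance are assumptions of this example; the authors' R code may use a different root finder.

```python
import math

def invert_link(link, r, lo=-0.999, hi=0.999, tol=1e-10):
    """Solve link(t) = r for t by bisection, assuming link is strictly increasing.

    Values of r outside [link(lo), link(hi)] are clipped to the interval ends.
    """
    if r <= link(lo):
        return lo
    if r >= link(hi):
        return hi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if link(mid) < r:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def H(t):
    """Kendall link for two continuous variables, Eq. (6)."""
    return 2.0 / math.pi * math.asin(t)

r_hat = H(0.3)                              # statistic implied by a correlation of 0.3
print(invert_link(H, r_hat))                # numerical inverse
print(math.sin(math.pi * r_hat / 2.0))      # closed-form inverse of H
```

The same routine can be applied to F and G by fixing the estimated cutoffs, for example `invert_link(lambda t: F(t, dj, dk), r_jk)` with user-supplied `F`, `dj`, `dk`.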

D. Theoretical Properties

In this section, we discuss the theoretical properties of our proposed method. To study the convergence of $\hat{R}_{jk}$, we assume the following conditions:

(I) There exists a constant δ > 0 such that $|\Sigma_{jk}| < 1 - \delta$, ∀ j ≠ k = 1, 2, …, p.

(II) There exists a constant M < ∞ such that $|\Delta_{jl}| < M$, ∀ j = 1, 2, …, p and l = 1, 2, …, L.

First, let’s consider the multinomial component.

Lemma II.1. (Monotonicity) For any fixed $\Delta_j$, $\Delta_k$, the link function $F(t; \Delta_j, \Delta_k)$ is strictly increasing on t ∈ (−1, 1).

Lemma II.2. (Lipschitz continuous) Under conditions (I), (II), $F^{-1}(\tau; \Delta_j, \Delta_k)$ is Lipschitz continuous in τ uniformly over $\Delta_j$, $\Delta_k$, i.e., there exists a Lipschitz constant $L_2$ independent of $(\Delta_j, \Delta_k)$, s.t.

$$|F^{-1}(\tau_1; \Delta_j, \Delta_k) - F^{-1}(\tau_2; \Delta_j, \Delta_k)| \le L_2 |\tau_1 - \tau_2|.$$

The following theorem establishes the $O_p(\sqrt{\log(p)/n})$ convergence rate of $\hat{R}_{jk} - \Sigma_{jk}$ uniformly over j, k.

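Theorem II.1. Under conditions (I) and (II), for any ϵ > 0, we have

$$P\left(\left|F^{-1}(\hat{\tau}_{jk}; \hat{\Delta}_j, \hat{\Delta}_k) - \Sigma_{jk}\right| > \epsilon\right) \le 2\exp\left\{-\frac{n\epsilon^2}{2L_2^2}\right\} + 4\exp\left\{-\frac{n\pi\epsilon^2}{64L^4L_1^2L_2^2}\right\} + 4\exp\left\{-\frac{2nM^2}{L_1^2}\right\},$$

where $L_1$, $L_2$ are positive constants given in Appendix A.5.

That is, for some constant c, $\sup_{1\le j,k\le p_1}|\hat{R}_{jk} - \Sigma_{jk}| \le c\sqrt{\log(p_1)/n}$ with probability greater than $1 - p_1^{-1}$. For the estimation of the correlation between a multinomial variable and a continuous variable, we can derive similar results, given as Lemma II.3 and Theorem II.2. Specifically, for some constant c, $\sup_{1\le j\le p_1,\, p_1+1\le k\le p}|\hat{R}_{jk} - \Sigma_{jk}| \le c\sqrt{\log(p)/n}$ with probability greater than $1 - p^{-1}$.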

Lemma II.3. (Lipschitz continuous) Under conditions (I), (II), $G^{-1}(\tau; \Delta_j)$ is Lipschitz continuous in τ uniformly over $\Delta_j$, i.e., there exists a Lipschitz constant $L_3$ independent of $\Delta_j$, s.t.

$$|G^{-1}(\tau_1; \Delta_j) - G^{-1}(\tau_2; \Delta_j)| \le L_3 |\tau_1 - \tau_2|.$$

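Theorem II.2. Under conditions (I) and (II), for any ϵ > 0, we have

$$P\left(\left|G^{-1}(\hat{\tau}_{jk}; \hat{\Delta}_j) - \Sigma_{jk}\right| > \epsilon\right) \le 2\exp\left\{-\frac{n\epsilon^2}{2L_3^2}\right\} + 2\exp\left\{-\frac{n\pi\epsilon^2}{36L^2L_1^2L_3^2}\right\} + 2\exp\left\{-\frac{2nM^2}{L_1^2}\right\},$$

where $L_3$ is a positive constant given in Appendix A.6.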

For the continuous component, Liu et al. ([40]) have already proven the following result.

Theorem II.3. With probability greater than $1 - p^{-1}$,

$$\sup_{p_1+1\le j,k\le p}|\hat{R}_{jk} - \Sigma_{jk}| \le 2.45\pi\sqrt{\log(p_2)/n}.$$

Combining Theorems II.1 to II.3, we obtain the error bound for $\hat{R}_{jk} - \Sigma_{jk}$ uniformly over 1 ≤ j, k ≤ p.

Corollary II.1. Under conditions (I), (II), with probability greater than $1 - p^{-1}$,

$$\sup_{1\le j,k\le p}|\hat{R}_{jk} - \Sigma_{jk}| \le c\sqrt{\log(p)/n},$$

where c is a constant independent of (n, p).

The detailed proofs of the above properties can be found in Appendix A.3–A.6.

E. Learning the Latent Graph Structure

Since our goal is to identify direct dependencies in a system, we want to obtain the latent partial correlations after estimating the latent correlation matrix. Under the latent Gaussian copula model, the inverse matrix $\Omega = \Sigma^{-1}$ identifies the conditional independence among X, i.e., Xi and Xj are independent given all other variables ($\rho_{ij|-ij} = 0$) if and only if $\Omega_{ij} = 0$. Ω is called the latent precision matrix. Therefore, the construction of the graphical model is equivalent to the estimation of Ω. With the estimated latent correlation matrix $\hat{R}$, the problem has been turned into a Gaussian graphical model problem, and therefore gLASSO ([18]), CLIME ([20]) and the nodewise regression method ([21]) can be applied.

However, these GGM methods require the estimated latent correlation matrix to be positive definite, which cannot be guaranteed. Therefore, we project $\hat{R}$ onto the cone of positive semidefinite matrices, i.e.,

$$\hat{R}_p = \arg\min_{R \succeq 0} \max_{i,j} |\hat{R}_{ij} - R_{ij}|,$$

using the smoothed approximation method proposed by [42]. Zhao et al. ([43]) proved that under assumptions (I) and (II), the projection $\hat{R}_p$ satisfies

$$\sup_{i,j}|\hat{R}_{p,ij} - \Sigma_{ij}| \le c'\sqrt{\log(p)/n},$$

where c′ is a constant independent of (n, p). This guarantees the convergence of the GGM methods when they are applied to $\hat{R}_p$.

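Then the latent precision matrix Ω can be estimated by one of the methods mentioned before with the following equations: for gLasso,

$$\hat{\Omega} = \arg\min_{\Omega \succ 0}\left\{\mathrm{tr}(\hat{R}_p\Omega) - \log\det(\Omega) + \lambda\sum_{j\ne k}|\Omega_{jk}|\right\}; \tag{9}$$

for CLIME,

$$\hat{\Omega} = \arg\min \|\Omega\|_1 \equiv \sum_{i,j}|\Omega_{ij}|, \quad \text{subject to } \|\hat{R}_p\Omega - I\|_{\max} \equiv \max_{i,j}|(\hat{R}_p\Omega - I)_{ij}| \le \lambda. \tag{10}$$

For the nodewise regression method, we use the same procedure discussed in [19] except for using $\hat{R}_p$ as the empirical correlation matrix.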

To select the tuning parameter λ in the GGM methods including gLASSO, CLIME and nodewise regression, we use the high-dimensional Bayesian information criterion (HBIC) ([44], [45]) defined as

$$\mathrm{HBIC}(\lambda) = \mathrm{tr}(\hat{S}\hat{\Omega}_\lambda) - \log|\hat{\Omega}_\lambda| + C_n\frac{\log(p)}{n}s_\lambda, \tag{11}$$

where $\hat{S}$ denotes the correlation estimator (in our case $\hat{S} = \hat{R}_p$), $C_n = \log(\log(n))$, and $s_\lambda$ is the number of edges corresponding to $\hat{\Omega}_\lambda$.
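For a given candidate precision matrix, the criterion in Eq. (11) can be evaluated directly. The sketch below uses plain Python lists and a small Gaussian-elimination log-determinant so that it stays self-contained; the function names `log_det` and `hbic` are hypothetical, and counting an edge as any nonzero upper-triangular off-diagonal entry is an assumption of this example.

```python
import math

def log_det(a):
    """Log-determinant of a positive definite matrix via Gaussian elimination."""
    m = [row[:] for row in a]
    p = len(m)
    total = 0.0
    for c in range(p):
        pivot = m[c][c]
        if pivot <= 0:
            raise ValueError("matrix is not positive definite")
        total += math.log(pivot)
        for r in range(c + 1, p):
            f = m[r][c] / pivot
            for k in range(c, p):
                m[r][k] -= f * m[c][k]
    return total

def hbic(s_hat, omega_hat, n):
    """HBIC of Eq. (11) for one estimated precision matrix."""
    p = len(omega_hat)
    trace = sum(s_hat[i][j] * omega_hat[j][i] for i in range(p) for j in range(p))
    edges = sum(1 for j in range(p) for k in range(j + 1, p) if omega_hat[j][k] != 0)
    c_n = math.log(math.log(n))
    return trace - log_det(omega_hat) + c_n * math.log(p) / n * edges

s_hat = [[1.0, 0.5], [0.5, 1.0]]         # toy correlation estimate
omega_hat = [[2.0, -1.0], [-1.0, 2.0]]   # toy precision estimate with one edge
print(hbic(s_hat, omega_hat, n=100))
```

In practice one would compute this value for each λ on a grid and keep the λ with the smallest HBIC.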

III. Results

A. Numerical Studies

In this section, we evaluate our proposed model through a series of simulation studies. We set L = 2 and assume the cutoff matrix $C_{p\times 2}$ has $C_{j1}$ ~ Unif(0.25, 0.85), $C_{j2}$ ~ Unif(1.5, 2), ∀j = 1, 2, …, p.

First, we examine the estimated latent correlation matrix under the null hypothesis, i.e., the latent correlation r = 0 between any two variables. Under the null hypothesis, there is no edge between any two nodes in a graph. It is sufficient for us to study the latent correlation between two variables under the null hypothesis. For a Pearson correlation coefficient ρ, under the null hypothesis H0 : ρ = 0, we can expect that its z-score follows the standard normal distribution N(0, 1). We use the normal quantile-quantile (QQ) plot to examine the distribution of the latent correlations under the null hypothesis. Figure 2 gives the QQ plots for Case 1 : two variables are both discrete, and Case 2: one variable is discrete and the other is continuous. As we can see, after applying the proposed model, the latent correlation still keeps the same property as a Pearson correlation.

Fig. 2:


The QQ plots of the latent correlations under the null distribution for Case 1 (left) and Case 2 (right). Each latent correlation is calculated based on n = 500 samples and the experiment is replicated t = 100,000 times.

Then, we simulate small-world graphs through the R package igraph, in which all variables are highly correlated in the system. The total number of edges in a graph is 2p. After obtaining the adjacency matrix $E_{p\times p}$ of the graph, we generate the latent precision matrix Ω with the diagonal elements $\Omega_{jj} = 1$. For the off-diagonal elements, $\Omega_{jk} = 0$, j ≠ k, if $E_{jk} = 0$; otherwise, $\Omega_{jk}$ are drawn i.i.d. from Unif(0.2, 0.7). Finally, we re-scale the diagonal elements to $1 + |\Lambda_{\min}(\Omega)| + 0.01$ to ensure the positive definiteness of Ω, where $\Lambda_{\min}$ denotes the smallest eigenvalue of a matrix. Then we obtain the latent correlation matrix Σ and consider the following two data generating scenarios.

  1. (Discrete case) Simulate data X = (X1, X2, …, Xp), where $X_j = I(Z_j > C_{j1}) + I(Z_j > C_{j2})$, ∀j = 1, 2, …, p and Z ~ N(0, Σ).

  2. (Mixed case) Simulate data X = (X1, X2, …, Xp), where $X_j = I(Z_j > C_{j1}) + I(Z_j > C_{j2})$ for j = 1, 2, …, p/2, Z ~ N(0, Σ), and $X_j = Z_j$ for j = p/2 + 1, …, p.

We compare 6 estimation methods for the precision matrix Ω: 1. invert the Pearson correlation matrix estimator (Pearson estimator); 2. invert the latent correlation matrix estimator (Latent estimator); 3. use gLASSO with the Pearson correlation (P-gLASSO); 4. use CLIME with the latent correlation (L-CLIME) in Eq. (10); 5. use gLASSO with the latent correlation (L-gLASSO) in Eq. (9); 6. use the nodewise regression method with the latent correlation (L-NR). The first two methods are only applied when n > p. We also applied CLIME and nodewise regression with the Pearson correlation (P-CLIME and P-NR, respectively) and found that the three Pearson-based estimators performed similarly. Hence, we only present the result of P-gLASSO for the comparison with the latent models. CLIME is implemented through the R package flare, and gLASSO and the nodewise regression are implemented through the R package huge.

Figure 3 compares the ROC curves using each estimator under various variable sizes for each scenario with n = 200. We simulate 50 datasets independently for each setting, respectively. We find that the diagnostic ability of an estimator using the latent correlation is significantly higher than the one using the Pearson correlation. For the low dimension case (n > p), the Latent estimator gives the best diagnostic performance, while the L-gLASSO performs poorly for the mixed data case. For the high-dimensional cases, the performances of the three latent methods: L-CLIME, L-gLASSO and L-NR are similar. In Appendix B, we have discussed the effects of sample size n and variable size p on the ROC curves. We also evaluate the performances of different methods with AUC values and relative errors by Frobenius norm in Appendix B.
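A point on an ROC curve for graph recovery compares the support of an estimated precision matrix with the true adjacency matrix. The sketch below computes the true and false positive rates for a single estimate; the function name `roc_point` and the thresholding of absolute entries are assumptions of this example. The text does not state how the curve is traced, so sweeping the tuning parameter λ (or the threshold) to obtain the full curve is also an assumption.

```python
def roc_point(true_adj, est_omega, threshold=0.0):
    """True and false positive rates of edge recovery for one estimate.

    An edge (j, k), j < k, is declared when |est_omega[j][k]| > threshold.
    """
    p = len(true_adj)
    tp = fp = pos = neg = 0
    for j in range(p):
        for k in range(j + 1, p):
            detected = abs(est_omega[j][k]) > threshold
            if true_adj[j][k]:
                pos += 1
                tp += detected
            else:
                neg += 1
                fp += detected
    tpr = tp / pos if pos else 0.0
    fpr = fp / neg if neg else 0.0
    return tpr, fpr

# Toy example: true chain graph 1 - 2 - 3, estimate finds (1,2) and a false (1,3).
true_adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
est_omega = [[1.0, 0.3, 0.2], [0.3, 1.0, 0.0], [0.2, 0.0, 1.0]]
print(roc_point(true_adj, est_omega))
```

Collecting such points over a grid of tuning parameters and averaging over simulated datasets would give curves of the kind compared in this section.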

Fig. 3:


The ROC curves under various variable settings for the discrete (left column) and mixed (right column) scenarios with n = 200. The top row shows the performance for p = 50, while the bottom row shows for p = 500.

B. Analysis for Schizophrenia Study

We apply the method to real schizophrenia (SZ) data from the Mind Clinical Imaging Consortium (MCIC) ([46]), where both brain fMRI images and SNP data from 91 SZ patients and 106 healthy controls were collected. We preprocessed the data as follows:

1) SNP data. A blood sample was obtained from each subject and DNA was extracted. Genotyping was performed at the Mind Research Network, covering 1,140,419 SNP loci, out of which 777,365 SNP loci were retained after quality control ([16]). We selected the SNPs that have been reported to be associated with SZ at significance level $\alpha = 5\times 10^{-8}$ in the GWAS catalog (https://www.ebi.ac.uk/gwas/), which left 426 SNPs in the analysis.

2) fMRI image. The image data were collected during a sensory motor task, a block design motor response to auditory stimulation. Standard preprocessing steps were applied using SPM12, including motion correction, spatial normalization to standard MNI space, and spatial smoothing with a 3 mm FWHM Gaussian kernel. Then multiple regression considering the influence of motion was performed and the stimulus on-off contrast maps for each subject were collected. Finally, 264 regions of interest (ROIs) were extracted based on the Power parcellation method ([47]) with a sphere radius parameter of 5 mm.

Based on the simulation studies, we apply the L-gLASSO method to estimate the latent precision matrix with p1 = 426 and p2 = 264. The tuning parameter λ is chosen by the HBIC method in Eq. (11). We have identified 200 pairs of SNP-ROI associations. The full list is shown in Appendix C. We have discovered 9 hub ROIs and 11 hub SNPs from the SNP-ROI associations (see Table I and Table II). Here we define hubs as the nodes whose degree is at least two standard deviations above the mean degree ([5]).
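The hub rule above can be sketched as follows. The function name `find_hubs` and the toy degree values are hypothetical; the use of the sample standard deviation (rather than the population version) and of the degrees of all nodes passed in are assumptions of this example, since the text does not specify them.

```python
from statistics import mean, stdev

def find_hubs(degrees, n_sd=2.0):
    """Nodes whose degree is at least n_sd standard deviations above the mean.

    degrees: dict mapping node name -> degree in the SNP-ROI association graph.
    Uses the sample standard deviation (an assumption of this sketch).
    """
    values = list(degrees.values())
    cutoff = mean(values) + n_sd * stdev(values)
    return sorted(node for node, d in degrees.items() if d >= cutoff)

# Toy degrees (illustrative, not the MCIC results): one node clearly stands out.
degrees = {"a": 1, "b": 1, "c": 1, "d": 1, "e": 1,
           "f": 1, "g": 1, "h": 1, "i": 1, "j": 10}
print(find_hubs(degrees))
```

Here the mean degree is 1.9 and the cutoff is roughly 7.6, so only node "j" qualifies as a hub.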

TABLE I:

The hub ROIs in the SNP-ROI associations.

Index | MNI | Anatomical region | Network | Degree
21 | (29, −17, 71) | PRED | SSN | 4
23 | (−23, −30, 72) | POSTG | SSN | 4
81 | (−44, 12, −34) | T2PG | DMN | 4
82 | (46, 16, −30) | T2PD | DMN | 4
91 | (−3, −49, 13) | PQG | DMN | 5
130 | (47, −50, 29) | AGD | DMN | 6
202 | (−3, 26, 44) | F1MG | FPN | 4
203 | (11, −39, 50) | MCIND | SN | 5
222 | (6, −24, 0) | THAD | SCN | 5

TABLE II:

The hub SNPs and their mapped genes in the SNP-ROI associations.

SNP Index | Mapped gene | Degree
rs264480 | TMEM132D | 32
rs1797052 | PDZK1 | 14
rs10924245 | KIF26B | 12
rs13118894 | CC2D2A | 12
rs2774292 | SYT6, LOC107985443 | 10
rs17699030 | DOCK6 | 10
rs6435387 | KIF5C | 7
rs17255281 | COL18A1, SLC19A1 | 9
rs16873221 | GBA3, LOC105374524 | 6
rs11827962 | ANO5, SLC17A6 | 6
rs4073405 | SYT3, LOC101928812 | 6

From the perspective of brain anatomy, we find that the frontal lobe, parietal lobe and temporal lobe are closely affected by the gene mutations. Specifically, we mapped the 264 ROIs to the AAL 116 atlas and identified ROIs located in the left superior frontal gyrus, medial (F1MG), the right middle frontal gyrus (F2D) and the right precentral gyrus (PRED) in the frontal lobe; the left precuneus (PQG), the right angular gyrus (AGD) and the left postcentral gyrus (POSTG) in the parietal lobe; the left middle temporal gyrus (T2G) and the inferior temporal gyrus (T3) in the temporal lobe; and the right cingulate gyrus, middle part (MCIND) and right thalamus (THAD). Based on the functions of the brain regions, the 264 ROIs are divided into 12 functional network systems (see Fig. 4), including the sensory/somatomotor network (SSN), cingulo-opercular task control network (CON), auditory network (AN), default mode network (DMN), memory retrieval network (MRN), visual network (VN), fronto-parietal task control network (FPN), salience network (SN), subcortical network (SCN), ventral attention network (VAN), dorsal attention network (DAN) and cerebellum network. Three functional networks, DMN, SSN and SN, are particularly affected by genetic variations.

Fig. 4:

12 functional networks considered in this paper. These networks are expressed in the 264 nodes of the template defined by Power et al. ([47]) and visualized through the BrainNet Viewer ([48]).

For the hub ROIs listed in Table I, we find that PRED and POSTG are affected by the same SNPs: rs10924245, rs13118894, rs802568 and rs264480. T2PG interacts with rs264480, rs17699030, rs7267005 and rs17093238, while T2PD interacts with rs2609653, rs7303433, rs17292804 and rs17093238. PQG is affected by rs2774292, rs1797052, rs2540277, rs4611189 and rs4073405. AGD is influenced by rs13118894, rs17005123, rs8321, rs886424, rs17053965 and rs264480. F1MG and MCIND share two SNPs, rs2774292 and rs264480, and each also has its own influential SNPs: rs13118894 and rs17108911 for F1MG, and rs16873221, rs7899719 and rs4073405 for MCIND. For THAD, the interacting SNPs are rs7559992, rs13118894, rs16873221, rs264480 and rs13332492. To better understand the underlying biological functions of the SNP-ROI associations, for each hub ROI we conduct a gene enrichment analysis on the genes whose corresponding SNPs are associated with the ROI. We have found two significant pathways and one ontology term that are related to the brain (see Table III).

TABLE III:

Details about the gene enrichment results.

Pathway Database q-val ROI*
Sympathetic Nerve Pathway PharmGKB 0.0214 202
Glycosphingolipid metabolism Reactome 0.00893 203
GO term Category (level) q-val ROI*
GO:0007269 (neurotransmitter secretion) BP (2) 0.002 91
GO:0007269 (neurotransmitter secretion) BP (2) 0.003 203

* Here we only give the index of the ROIs. The detailed information can be found in Table I.

On the other hand, Fig. 5 visualizes the associations between the hub SNPs and the ROIs. The SNP rs264480 has the most brain associations, influencing the frontal lobe (especially around F1M), AG and the insula. The second most active SNP, rs1797052, mainly interacts with the frontal-parietal lobe and the cingulate gyrus. The SNPs rs10924245 and rs11827962 mostly influence the frontal and temporal lobes. Other hubs, including rs13118894, rs2774292, rs6435387, rs16873221 and rs4073405, are shown to have an impact on the frontal lobe. Further, similar to the gene enrichment analysis, for each SNP hub we apply the hypergeometric test to detect the functional networks that it may significantly affect. The results are shown in Table IV, where q-val represents the p-value corrected using the Benjamini-Hochberg (BH) method ([49]).
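The network-level test described above can be illustrated with a minimal sketch. It assumes a standard over-representation setup that the text does not spell out: N ROIs in total, K of them belonging to a given functional network, n ROIs associated with the SNP hub, and k of those falling inside the network. The function names and the counts in the usage lines are hypothetical and are not taken from the study.

```python
from math import comb


def hypergeom_upper_tail(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(N items, K successes, n draws)."""
    upper = min(K, n)
    total = comb(N, n)
    # math.comb returns 0 when the lower index exceeds the upper one,
    # so impossible configurations contribute nothing to the sum.
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / total


def benjamini_hochberg(pvals):
    """Return BH-adjusted p-values (q-values) in the original order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    qvals = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the q-values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvals[idx] * m / rank)
        qvals[idx] = running_min
    return qvals


# Hypothetical counts for one SNP hub tested against three networks:
# (ROIs in network K, associated ROIs inside the network k).
N_ROIS = 264          # size of the ROI template used in the paper
N_ASSOCIATED = 14     # illustrative degree of the SNP hub
networks = {"A": (18, 5), "B": (58, 4), "C": (35, 1)}

pvals = [hypergeom_upper_tail(k, N_ROIS, K, N_ASSOCIATED) for K, k in networks.values()]
for name, q in zip(networks, benjamini_hochberg(pvals)):
    print(name, q)
```

The upper tail is used because the question is whether a network contains more of the SNP's associated ROIs than chance would predict; the BH step then adjusts for testing several networks per SNP.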

Fig. 5:

The associations between the hub SNPs and the ROIs. The figure is visualized through the BrainNet Viewer ([48]).

TABLE IV:

The detected functional networks through the hypergeometric test.

SNP index (gene) Network q-val
rs1797052 (PDZK1) Salience 0.016
rs17699030 (DOCK6) Cingulo-opercular 1.78 × 10^−5

C. Discussion

1). Comparison with P-gLASSO:

In the simulation studies, we showed the advantages of using the latent correlation over the Pearson correlation in identifying the graph structure of mixed data with multinomial components. The medical problem in this work is to integrate image and genetic information to understand the neurogenetic mechanism of complex diseases like SZ. Thus, we compared the proposed latent correlation with the Pearson correlation using gLASSO ([18]) on the MCIC study with fMRI image and SNP data. As shown in Table V, L-gLASSO identifies more SNP-ROI associations, because the latent estimation can enhance the correlations among discrete variables and thus help find hidden relationships. Following the same definition of hubs, we found 5 SNP hubs by P-gLASSO, and one of them, rs4073405, overlapped with the findings by L-gLASSO. However, P-gLASSO did not detect any ROI hubs. Among the SNP-ROI associations, 28 pairs are common to P-gLASSO and L-gLASSO. We further applied Fisher’s exact test to the ROI-SNP associations identified by P-gLASSO and L-gLASSO, and obtained a p-value < 2.2 × 10^−16. This indicates that the results of the two methods are significantly related.
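The overlap test mentioned above can be sketched as a one-sided Fisher's exact test on a 2×2 table of candidate SNP-ROI pairs (selected or not by each method). This is only an illustration: the counts 92, 200 and 28 come from Table V, but the number of SNPs, and hence the total number of candidate pairs, is not restated in this section, so `N_SNPS` below is a placeholder. The function name is hypothetical, and the original analysis may have used a two-sided test.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns P(X >= a) under independence with both margins fixed,
    i.e. the upper tail of the hypergeometric distribution.
    """
    row1 = a + b
    col1 = a + c
    total = a + b + c + d
    upper = min(row1, col1)
    denom = comb(total, col1)
    numer = sum(comb(row1, i) * comb(total - row1, col1 - i) for i in range(a, upper + 1))
    return numer / denom


N_ROIS = 264
N_SNPS = 1000                     # placeholder, NOT the value used in the study
n_pairs = N_ROIS * N_SNPS         # all candidate SNP-ROI pairs

both = 28                         # found by both methods (Table V)
l_only = 200 - both               # found by L-gLASSO only
p_only = 92 - both                # found by P-gLASSO only
neither = n_pairs - both - l_only - p_only

p_value = fisher_exact_greater(both, l_only, p_only, neither)
print(p_value)
```

Exact integer arithmetic via `math.comb` avoids floating-point underflow in the individual terms; only the final ratio is converted to a float.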

TABLE V:

A comparison of the number of discoveries based on the MCIC study.

P-gLASSO L-gLASSO Overlap
# of ROI hubs 0 9 0
# of SNP hubs 5 11 1
# of SNP-ROI associations 92 200 28

2). Biological implication:

In Section III.B, we detected 11 hub SNPs. By checking GeneCards (www.genecards.org), we find that 7 genes, TMEM132D, KIF26B, SYT6, SLC19A1, KIF5C, SLC17A6 and SYT3, are overexpressed in the brain and 2 genes, CC2D2A and ANO5, are highly expressed in the brain. More specifically, TMEM132D is a single-pass transmembrane protein that is implicated in the connectivity of brain regions important for anxiety-related behavior ([50]). Studies have shown that the risk allele is associated with higher TMEM132D expression in the frontal cortex ([51]). The mechanism linking TMEM132D to the angular gyrus and insula in SZ remains to be discovered. KIF26B is located within the 1q44 intervals, which have been considered critical regions containing genes leading to structural abnormalities of the corpus callosum ([52]). Significant damage to the corpus callosum of SZ patients has been reported in the frontal and temporal lobes ([53]). The product of the gene SLC19A1 is a membrane transporter that plays an important role in folate metabolism ([54]). It is situated on the cerebrospinal fluid (CSF) side of the choroid plexus, where it enables transport of concentrated folate into the CSF ([55]). The SLC19A1 gene has also been associated with neural tube defects (NTD) ([56]). Taken together, variations in SLC19A1 involve both neuronal structures and metabolism in the brain ([57]), which gives a potential explanation of its effect on the frontal and temporal lobes. As a member of the kinesin family of proteins, KIF5C plays an important role in brain neurodevelopment. Mutations of KIF5C may cause cortical malformations, including in the frontal cortex ([58]), which could cause neurobehavioral issues. As for SLC17A6, a study shows that its mutation is sufficient to cause strong modifications in both the motor system and the mesostriatal dopamine system ([59]). Synaptotagmins (SYTs) are a protein family known to operate by binding to calcium ions, anionic lipids and syntaxin. Among them, SYT3 and SYT6 are localized to plasma membranes and probably function in Ca2+-triggered exocytosis in PC12 cells ([60]). Recent studies indicate that reduced expression of SYT6 attenuates calcium-triggered neurotransmitter release, and such reduced expression has been observed in SZ patients ([61]).

Two genes, PDZK1 and DOCK6, are neither overexpressed nor highly expressed in the brain, yet are important to the brain functional network. PDZK1 encodes the protein PDZK1, which interacts with N-methyl-D-aspartate (NMDA) receptors and neuroligins ([62]). The hypofunction of NMDA receptors impairs the salience network ([63]). The function of DOCK6 is to promote neurite outgrowth and regulate axonal growth and regeneration of sensory neurons ([64]). Our results show that DOCK6 may have significant effects on the cingulo-opercular network (CON). Although this association has not been well discussed in the literature, the CON, known as the “task-set maintenance” network, integrates various data including sensory data to assess the homeostatic relevance of internal and external stimuli ([65]). With a mutation of DOCK6, the activity of sensory neurons may be inhibited, which could in turn affect the performance of the CON.

From the aspect of the ROIs, the cingulate gyrus and thalamus are well known to be associated with SZ. Generally, it has been demonstrated in the literature that there is disrupted functional integration of widespread brain areas under auditory tasks, including decreased connectivity in the frontal and temporal lobes ([66]). Moreover, a previous study ([33]) using MCIC data found aberrant connectivity in F2, CIA, T3 and PQ, with which our results are consistent. The brain hubs we identified in the SNP-ROI associations are asymmetric, which suggests that the asymmetrical performance ([67]) in SZ may be caused by genetic variations. Further, we implemented gene enrichment analyses for the hub ROIs. The results show that the cingulate gyrus is regulated by the glycosphingolipid metabolism pathway. Glycosphingolipid (GSL) is a lipid highly expressed in the nervous system, which regulates the formation of the brain ([68]). GSL disorders have been found to have an impact on the cingulate gyrus ([69]). The precuneus (PQ) and the cingulate gyrus (MCIN) are found to be sites of a neurotransmitter secretion process. To be more specific, this biological process includes calcium ion-regulated exocytosis of neurotransmitter and regulation of neurotransmitter levels. PQ is a region involved in various complex functions, such as memory retrieval, self-consciousness and information integration. MCIN is related to motor movements and decision making. Neuroimaging studies have demonstrated that activities in the cingulate cortex and precuneus area during a wide range of tasks require external orientation ([70]). Neurotransmitters like GABA and glutamate are associated with the deactivation of PQ ([71]), and the dopamine system has influence on the activity of MCIN ([72]).

IV. Conclusion

In this paper, we build a latent Gaussian copula model for mixed types of data including multinomial components. The model can find widespread applications in multi-modal data analysis, such as the detection of associations between genomic variations and brain regions in this study. The key to our method is to find the link function between the r statistics we have defined and the latent correlation. The main contributions of our work can be summarized as follows. First, instead of using Kendall’s τ statistics or Spearman’s ρ statistics ([22], [32]), the r statistics result in a simpler link function for deriving the latent correlation matrix estimator. We call the r statistics the semi-rank based estimator, in contrast to the rank based latent correlation matrix estimator τ proposed in [32] for mixed data with binary components, which is defined as

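$$\hat{\tau}_{jk} = \frac{2}{n(n-1)} \sum_{1 \le i < i' \le n} \operatorname{sgn}(X_{ij} - X_{i'j})\, \operatorname{sgn}(X_{ik} - X_{i'k}).$$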

However, this estimator cannot be directly generalized to multinomial cases. In our model, we only keep the sign function for the continuous data and thus name the r statistics the semi-rank based estimator. Second, the proposed latent model has both mathematical simplicity and rigor. Third, the simulation studies show that the proposed method is stable over different settings and has improved performance in graph structure detection compared to six existing methods, which suggests that our method is a more reliable statistical model for integrating fMRI image and SNP data. Finally, as a follow-up to our previous study ([73]), the proposed method is applied to a schizophrenia study with real fMRI image and SNP data. Compared to the results using P-gLASSO, our model has identified more ROI-SNP associations, as well as more hub ROIs and SNPs. Some of our findings have been verified in the literature. We have also discovered a set of novel ROI-SNP associations and discussed their possible neurochemical pathways in Section III.C.2, which sheds new light on the neurogenetic mechanisms of brain disorders.
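For reference, the rank-based statistic of [32] recalled in this section (Kendall's τ between variables j and k) can be computed as in the following minimal sketch. It is a direct O(n²) evaluation of the pairwise-sign definition, not the authors' implementation, and it does not cover the semi-rank r statistics, whose definition is given earlier in the paper. The function names are hypothetical.

```python
def sgn(x):
    """Sign function: 1 for positive, -1 for negative, 0 for zero."""
    return (x > 0) - (x < 0)


def kendall_tau(xj, xk):
    """Kendall's tau between two equally long samples.

    Implements tau_jk = 2 / (n (n - 1)) * sum over pairs i < i' of
    sgn(X_ij - X_i'j) * sgn(X_ik - X_i'k).
    """
    n = len(xj)
    if n != len(xk) or n < 2:
        raise ValueError("need two samples of the same length, n >= 2")
    total = 0
    for i in range(n):
        for i2 in range(i + 1, n):
            total += sgn(xj[i] - xj[i2]) * sgn(xk[i] - xk[i2])
    return 2.0 * total / (n * (n - 1))


# Concordant and discordant toy samples.
print(kendall_tau([1, 2, 3, 4], [10, 20, 30, 40]))
print(kendall_tau([1, 2, 3, 4], [40, 30, 20, 10]))
```

Tied pairs contribute zero to the sum, which is one reason the statistic loses information when a variable takes only a few discrete values.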

V. Acknowledgment

The work has been funded by NIH (R01GM109068, R01MH104680, R01MH107354, P20GM103472, 2R01EB005846, 1R01EB006841), and NSF (#1539067).

Biography

Aiying Zhang received the B.S. degree in statistics from the University of Science and Technology of China and is currently a Ph.D. student in biomedical engineering at Tulane University. Her research interests include medical imaging, bioinformatics and imaging genetics. She mainly focuses on the development of statistical models, specifically graphical models, for biomedical applications.

References

  • [1] Hariri AR, Drabant EM, and Weinberger DR, “Imaging genetics: perspectives from studies of genetically driven variation in serotonin function and corticolimbic affective processing,” Biological psychiatry, vol. 59, pp. 888–897, 2006.
  • [2] Meyer-Lindenberg A, “The future of fMRI and genetics research,” NeuroImage, vol. 62, pp. 1286–1292, 2012.
  • [3] Liu J and Calhoun VD, “A review of multivariate analyses in imaging genetics,” Front Neuroinform, vol. 8, p. 29, 2014.
  • [4] Meyer-Lindenberg A, “Imaging genetics of schizophrenia,” Dialogues in Clinical Neuroscience, vol. 12(4), pp. 449–456, 2010.
  • [5] Fang J et al., “Fast and accurate detection of complex imaging genetics associations based on greedy projected distance correlation,” IEEE Transactions on Medical Imaging, vol. 37, pp. 860–870, 2018.
  • [6] Potkin S, Macciardi F, and Guffanti G, “Identifying gene regulatory networks in schizophrenia,” Neuroimage, vol. 53, pp. 839–847, 2010.
  • [7] Batmanghelich NK, Dalca A, Sabuncu M, Golland P, and ADNI., “Joint modeling of imaging and genetics,” Information processing in medical imaging, vol. 23, pp. 766–777, 2013.
  • [8] Nathoo F, Kong L, and Zhu H, “A review of statistical methods in imaging genetics,” arXiv preprint, p. arXiv:1707.07332, 2017.
  • [9] Stein JL, Hua X, Lee S, Ho AJ, Leow AD, Toga AW, Saykin AJ, Shen L, Foroud T, Pankratz N et al., “Voxelwise genome-wide association study (vgwas),” neuroimage, vol. 53, no. 3, pp. 1160–1174, 2010.
  • [10] Wang H, Nie F, Huang H, Kim S, Nho K, Risacher SL, Saykin AJ, Shen L, and Initiative ADN, “Identifying quantitative trait loci via group-sparse multitask regression and feature selection: an imaging genetics study of the adni cohort,” Bioinformatics, vol. 28, no. 2, pp. 229–237, 2011.
  • [11] Greenlaw K, Szefer E, Graham J, Lesperance M, Nathoo FS, and Initiative ADN, “A bayesian group sparse multi-task regression model for imaging genetics,” Bioinformatics, vol. 33, no. 16, pp. 2513–2522, 2017.
  • [12] Vounou M, Janousova E, Wolz R, Stein JL, Thompson PM, Rueckert D, Montana G, Initiative ADN et al., “Sparse reduced-rank regression detects genetic associations with voxel-wise longitudinal phenotypes in alzheimer’s disease,” Neuroimage, vol. 60, no. 1, pp. 700–716, 2012.
  • [13] Calhoun VD and Adali T, “Multisubject independent component analysis of fmri: a decade of intrinsic networks, default mode, and neurodiagnostic discovery,” IEEE reviews in biomedical engineering, vol. 5, pp. 60–73, 2012.
  • [14] Chen J, Calhoun VD, Pearlson GD, Perrone-Bizzozero N, Sui J, Turner JA, Bustillo JR, Ehrlich S, Sponheim SR, and Cañive J. o., “Guided exploration of genomic risk for gray matter abnormalities in schizophrenia using parallel independent component analysis with reference,” Neuroimage, vol. 83, pp. 384–396, 2013.
  • [15] Fang J, Lin D, Schulz S, Xu Z, Calhoun V, and Wang Y, “Joint sparse canonical correlation analysis for detecting differential imaging genetics modules,” Bioinformatics, vol. 32(22), pp. 3480–3488, 2016.
  • [16] Hu W et al., “Adaptive sparse multiple canonical correlation analysis with application to imaging (epi)genomics study of schizophrenia,” IEEE TBME, vol. 65, pp. 390–399, 2018.
  • [17] Le Floch É, Guillemot V, Frouin V, Pinel P, Lalanne C, Trinchera L, Tenenhaus A, Moreno A, Zilbovicius M, Bourgeron T et al., “Significant correlation between a set of genetic polymorphisms and a functional brain network revealed by feature selection and sparse partial least squares,” Neuroimage, vol. 63, no. 1, pp. 11–24, 2012.
  • [18] Yuan M and Lin Y, “Model selection and estimation in the gaussian graphical model,” Biometrika, vol. 94, pp. 19–35, 2007.
  • [19] Friedman J, Hastie T, and Tibshirani R, “Sparse inverse covariance estimation with the graphical lasso,” Biostatistics, vol. 9, pp. 432–441, 2008.
  • [20] Cai T, Liu W, and Luo X, “A constrained l1 minimization approach to sparse precision matrix estimation,” JASA, vol. 106, pp. 594–607, 2011.
  • [21] Meinshausen N and Bühlmann P, “High-dimensional graphs and variable selection with the lasso,” Annals of Statistics, vol. 34, pp. 1436–1462, 2006.
  • [22] Liu H, Lafferty JD, and Wasserman LA, “The nonparanormal: semiparametric estimation of high dimensional undirected graphs,” J. Mach. Learn. Res, vol. 10, pp. 2295–2328, 2009.
  • [23] Xue L and Zou H, “Regularized rank-based estimation of high-dimensional nonparanormal graphical models,” Ann. Statist, vol. 40, pp. 2541–2571, 2012.
  • [24] Sultan M, Schulz M, and Richard H, “A global view of gene activity and alternative splicing by deep sequencing of the human transcriptome,” Science, vol. 321, pp. 956–960, 2008.
  • [25] Jia B, Xu S, Xiao G, Lamba V, and Liang F, “Learning gene regulatory networks from next generation sequencing data,” Biometrics, vol. 11, 2017.
  • [26] Robinson M and Oshlack A, “A scaling normalization method for differential expression analysis of rna-seq data,” Genome Biology, vol. 11, p. R25, 2010.
  • [27] Anders S and Huber W, “Differential expression analysis for sequence count data,” Nature Proceedings, vol. 11, p. R106, 2010.
  • [28] Lee JD and Hastie TJ, “Learning the structure of mixed graphical models,” Journal of Computational and Graphical Statistics, vol. 24, no. 1, pp. 230–253, 2015.
  • [29] Yang E, Allen G, Liu Z, and Ravikumar PK, “Graphical models via generalized linear models,” in Advances in Neural Information Processing Systems, 2012, pp. 1358–1366.
  • [30] Chen S, Witten DM, and Shojaie A, “Selection and estimation for mixed graphical models,” Biometrika, vol. 102, pp. 47–64, 2015.
  • [31] Yang Z, Ning Y, and Liu H, “On semiparametric exponential family graphical models,” The Journal of Machine Learning Research, vol. 19, no. 1, pp. 2314–2372, 2018.
  • [32] Fan J, Liu H, Ning Y, and Zou H, “High dimensional semiparametric latent graphical model for mixed data,” Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol. 79, pp. 1467–9868, 2017.
  • [33] Zhang A, Fang J, Liang F, Calhoun VD, and Wang Y, “Aberrant brain connectivity in schizophrenia detected via a fast gaussian graphical model,” IEEE Journal of Biomedical and Health Informatics, 2018.
  • [34] Swanson N, Eichele T, Pearlson GD, Kiehl KA, and Calhoun VD, “Lateral differences in the default mode network in healthy controls and schizophrenia patients,” Hum Brain Mapp, vol. 32, pp. 654–664, 2011.
  • [35] Chumakov I et al., “Genetic and physiological data implicating the new human gene g72 and the gene for d-amino acid oxidase in schizophrenia,” National Academy of Sciences, vol. 99, pp. 13675–13680, 2002.
  • [36] Millar J et al., “Disc1 and pde4b are interacting genetic factors in schizophrenia that regulate camp signaling,” National Academy of Sciences, vol. 310, pp. 1187–1191, 2005.
  • [37] Lin D, Calhoun VD, and Wang Y-P, “Correspondence between fMRI and SNP data by group sparse canonical correlation analysis,” Medical image analysis, vol. 18(6), pp. 891–902, 2014.
  • [38] Lauritzen S, Graphical Models. Oxford: Oxford University Press, 1996.
  • [39] Skrondal A and Rabe-Hesketh S, “Latent variable modelling: A survey,” Scandinavian Journal of Statistics, vol. 34, pp. 712–745, 2007.
  • [40] Liu H, Han F, Yuan M, Lafferty JD, and Wasserman LA, “High dimensional semiparametric gaussian copula graphical models,” Ann. Statist, vol. 40, pp. 2293–2326, 2012.
  • [41] Kendall MG, Rank Correlation Methods. London: Griffin, 1948.
  • [42] Nesterov Y, “Smooth minimization of non-smooth functions,” Math. Programming, vol. 103, pp. 127–152, 2005.
  • [43] Zhao T, Roeder K, and Liu H, “Positive semidefinite rank-based correlation matrix estimation with application to semiparametric graph estimation,” J. Computnl Graph. Statist, vol. 23, pp. 895–922, 2014.
  • [44] Wang L, Kim Y, and Li R, “Calibrating non-convex penalized regression in ultra-high dimension,” Ann. Statist, vol. 41, pp. 2505–2536, 2013.
  • [45] Fan Y and Tang CY, “Tuning parameter selection in high dimensional penalized likelihood,” J. R. Statist. Soc. B, vol. 75, pp. 531–552, 2013.
  • [46] Gollub RL et al., “The MCIC collection: A shared repository of multi-modal, multi-site brain image data from a clinical investigation of schizophrenia,” Neuroinformatics, vol. 11, pp. 367–388, 2013.
  • [47] Power JD, Fair DA, Schlaggar BL, and Petersen SE, “The development of human functional brain networks,” Neuron, vol. 67, pp. 735–748, 2010.
  • [48] Xia M, Wang J, and He Y, “Brainnet viewer: A network visualization tool for human brain connectomics,” PLoS ONE, vol. 8(7), pp. 1–15, 2013.
  • [49] Benjamini Y and Hochberg Y, “Controlling the false discovery rate: a practical and powerful approach to multiple testing,” Journal of the Royal Statistical Society Series B, vol. 57, pp. 289–300, 1995.
  • [50] Naik RR et al., “Polymorphism in tmem132d regulates expression and anxiety-related behavior through binding of RNA polymerase ii complex,” Translational psychiatry, vol. 8(1), p. 1, 2018.
  • [51] Sanchez-Pulido L and Ponting CP, “TMEM132: an ancient architecture of cohesin and immunoglobulin domains define a new family of neural adhesion molecules,” Bioinformatics, vol. 34(5), pp. 721–724, 2017.
  • [52] Hass J et al., “A genome-wide association study suggests novel loci associated with a schizophrenia-related brain-based phenotype,” PLoS One, vol. 8(6), p. e64872, 2013.
  • [53] Sanfilipo M et al., “Volumetric measure of the frontal and temporal lobe regions in schizophrenia: relationship to negative symptoms,” Archives of general psychiatry, vol. 57(5), pp. 471–480, 2000.
  • [54] Liu J et al., “Single nucleotide polymorphisms in SLC19A1 and SLC25A9 are associated with childhood autism spectrum disorder in the chinese han population,” Journal of Molecular Neuroscience, vol. 62(2), pp. 262–267, 2017.
  • [55] Main PA, Angley MT, Thomas P, ODoherty CE, and Fenech M, “Folate and methionine metabolism in autism: a systematic review,” The American journal of clinical nutrition, vol. 91(6), pp. 1598–1620, 2010.
  • [56] O’Leary VB et al., “Reduced folate carrier polymorphisms and neural tube defect risk,” Molecular genetics and metabolism, vol. 87(4), pp. 364–369, 2006.
  • [57] Mahmuda N et al., “A study of single nucleotide polymorphisms of the SLC19A1/RFC1 gene in subjects with autism spectrum disorder,” International journal of molecular sciences, vol. 17(5), p. 772, 2016.
  • [58] Michels S, Foss K, Park K, Golden-Grant K, Saneto R, Lopez J, and Mirzaa GM, “Mutations of KIF5C cause a neurodevelopmental disorder of infantile-onset epilepsy, absent language, and distinctive malformations of cortical development,” American Journal of Medical Genetics Part A, vol. 173(12), pp. 3127–3131, 2017.
  • [59] Schweizer N et al., “Reduced vglut2/slc17a6 gene expression levels throughout the mouse subthalamic nucleus cause cell loss and structural disorganization followed by increased motor activity and decreased sugar consumption,” eNeuro, pp. ENEURO-0264, 2016.
  • [60] Südhof TC, “Synaptotagmins: why so many?” Journal of Biological Chemistry, vol. 277(10), pp. 7629–7632, 2002.
  • [61] Schmitt A et al., “Structural synaptic elements are differentially regulated in superior temporal cortex of schizophrenia patients,” European archives of psychiatry and clinical neuroscience, vol. 262(7), pp. 565–577, 2012.
  • [62] Goodbourn P, Bosten JM, Bargary G, Hogg RE, Lawrance-Owen AJ, and Mollon JD, “Variants in the 1q21 risk region are associated with a visual endophenotype of autism and schizophrenia,” Genes, Brain and Behavior, vol. 13(2), pp. 144–151, 2014.
  • [63] Cannon TD, “How schizophrenia develops: cognitive and brain mechanisms underlying onset of psychosis,” Trends in cognitive sciences, vol. 19(12), pp. 744–756, 2015.
  • [64] Shi L, “Dock protein family in brain development and neurological disease,” Communicative and integrative biology, vol. 6(6), p. E26839, 2013.
  • [65] Sadaghiani S and D’Esposito M, “Functional characterization of the cingulo-opercular network in the maintenance of tonic alertness,” Cerebral Cortex, vol. 25(9), pp. 2763–2773, 2014.
  • [66] Gallinat J, Mulert C, Bajbouj M, Herrmann WM, Schunter J, Senkowski D, Moukhtieva R, Kronfeldt D, and Winterer G, “Frontal and temporal dysfunction of auditory stimulus processing in schizophrenia,” NeuroImage, vol. 17, pp. 110–127, 2002.
  • [67] Takahashi T, Suzuki M, Zhou S-Y, Tanino R, Hagino H, Niu L, Kawasaki Y, Seto H, and Kurachi M, “Temporal lobe gray matter in schizophrenia spectrum: A volumetric MRI study of the fusiform gyrus, parahippocampal gyrus, and middle and inferior temporal gyri,” Schizophrenia Research, vol. 87, pp. 116–126, 2006.
  • [68] Robert KY, Nakatani Y, and Yanagisawa M, “The role of glycosphingolipid metabolism in the developing brain,” Journal of lipid research, vol. 50(Supplement), pp. S440–S445, 2009.
  • [69] Robert KY, Nakatani Y, and Yanagisawa M, “Metabolic imaging in humans,” Topics in magnetic resonance imaging: TMRI, vol. 25(5), p. 223, 2016.
  • [70] Shulman GL, Fiez JA, Corbetta M, Buckner RL, Miezin FM, Raichle ME, and Petersen SE, “Common blood flow changes across visual tasks: Ii. decreases in cerebral cortex,” Journal of cognitive neuroscience, vol. 9(5), pp. 648–663, 1997.
  • [71] Hu Y, Chen X, Gu H, and Yang Y, “Resting-state glutamate and gaba concentrations predict task-induced deactivation in the default mode network,” Journal of Neuroscience, vol. 33(47), pp. 18566–18573, 2013.
  • [72] Vogt BA, “Midcingulate cortex: structure, connections, homologies, functions and diseases,” Journal of chemical neuroanatomy, vol. 74, pp. 28–46, 2016.
  • [73] Zhang A, Fang J, Calhoun V, and Wang Y-P, “High dimensional latent gaussian copula model for mixed data in imaging genetics,” ISBI 2018, pp. 105–109, 2018.
