Published in final edited form as: Neuroimage. 2021 Nov 22;245:118750. doi: 10.1016/j.neuroimage.2021.118750

Graph auto-encoding brain networks with applications to analyzing large-scale brain imaging datasets

Meimei Liu a, Zhengwu Zhang b, David B Dunson c,*

Abstract

There has been huge interest in studying human brain connectomes inferred from different imaging modalities and exploring their relationships with human traits, such as cognition. Brain connectomes are usually represented as networks, with nodes corresponding to different regions of interest (ROIs) and edges to connection strengths between ROIs. Due to the high dimensionality and non-Euclidean nature of networks, it is challenging to depict their population distribution and relate them to human traits. Current approaches focus on summarizing the network using either pre-specified topological features or principal component analysis (PCA). In this paper, building on recent advances in deep learning, we develop a nonlinear latent factor model to characterize the population distribution of brain graphs and infer their relationships to human traits. We refer to our method as Graph AuTo-Encoding (GATE). We applied GATE to two large-scale brain imaging datasets, the Adolescent Brain Cognitive Development (ABCD) study and the Human Connectome Project (HCP) for adults, to study the structural brain connectome and its relationship with cognition. Numerical results demonstrate substantial advantages of GATE over competitors in terms of prediction accuracy, statistical inference, and computing efficiency. We found that the structural connectome has a stronger association with a wide range of human cognitive traits than was apparent using previous approaches.

Keywords: Brain networks, Non-linear factor analysis, Graph CNN, Replicated networks, Variational auto-encoder

1. Introduction

Understanding the brain connectome and how it relates to human traits and various clinical variables has drawn huge attention (Craddock et al., 2013; Fornito et al., 2013; Jones et al., 2013; Park and Friston, 2013). This has motivated large neuroimaging studies with thousands of subjects, such as the UK Biobank (UKB) (Miller et al., 2016), the Adolescent Brain Cognitive Development (ABCD) study (Casey et al., 2018), and the Human Connectome Project (HCP) (Van Essen et al., 2013). Through these studies, there have been dramatic improvements in the ability to reconstruct brain connectomes thanks to advanced hardware (Glasser et al., 2016), novel image acquisition protocols (Glasser et al., 2016; Tuch, 2004), and new reconstruction algorithms (Smith et al., 2012; Zhang et al., 2018b). In this paper, we are particularly interested in diffusion magnetic resonance imaging (dMRI), which is a commonly used technique that measures the movement of water molecules along major fiber bundles in white matter (WM) fiber tracts, enabling reconstruction of individual-level microstructural brain networks delineating anatomical connections between brain regions. This paper aims at developing advanced analysis methods for the brain structural connectomes recovered from diffusion MRI data.

Let Ai represent the structural connectivity recovered from subject i, with element Ai[uv] measuring white matter connections between brain regions u and v. Using n individual brain networks, we are interested in (a) appropriately summarizing each individual brain network in a parsimonious manner, isolating unique features of the network without discarding valuable information, (b) inferring relationships between brain networks and human traits, and (c) characterizing variation across individuals in their network structure.

There are existing methods relevant to these goals. Based on a latent space characterization, Durante et al. (2017) proposed a random effects model to represent the population distribution of brain networks. Their approach clusters individuals based on brain structure and allows inferences on group differences (Durante and Dunson, 2018). Disadvantages include the highly computationally intensive implementation and the coarse characterization of individual differences based on clustering. There is also literature on PCA-style approaches. One possibility is to simply stack the adjacency matrices $A_i$ for individuals $i = 1, \ldots, n$ into a tensor, and then apply tensor PCA and its variants to get summary scores of the networks (Zhang and Dunson, 2019; Zhang et al., 2019). These scores are treated as brain network surrogates in subsequent analyses, e.g., relating brain networks to human traits (Zhang et al., 2019). Tensor PCA is relatively efficient computationally while providing a simple low-dimensional summary of an individual's brain structure, but it is linear, limiting the ability to represent brain networks parsimoniously. Other matrix-based approaches, including spatial independent component analysis (ICA), non-negative matrix factorization (NMF), and spatial sparse coding algorithms, are also widely used in the analysis of functional brain connectomes; see Beckmann et al. (2005), Xie et al. (2017), and references therein. An alternative is graph representations based on geometric deep learning. Kipf and Welling (2017) and Hamilton et al. (2017) proposed graph convolutional networks (GCNs) that use structure information to learn a low-dimensional feature representation for each node in a graph. Kawahara et al. (2017) and Ktena et al. (2018) applied GCNs to functional brain connectomes for classification and similarity ranking. Zhao et al. (2019) proposed a variational autoencoder-based Gaussian mixture model for functional brain connectome classification. Advanced graph embeddings for structural brain connectomes are still lacking.

A major motivation of this article is to develop a non-linear latent factor modeling approach that (1) provides a characterization of the population distribution of brain graphs and (2) outputs low-dimensional features that can be used to summarize an individual's graph. Compared with the original high-dimensional adjacency matrices, the low-dimensional features of brain networks can further facilitate visualization, prediction, and inference on relationships between connectomes and human traits. With this motivation, we are particularly intrigued by deep neural networks for non-linear dimension reduction. Generative algorithms, such as Variational Auto-Encoders (VAEs) (Kingma and Welling, 2014; Rezende et al., 2014), have proven successful in representing images via low-dimensional latent variables. VAEs model the population distribution of image data through a simple distribution for the latent variables combined with a complex non-linear mapping function. A key to the success of such methods is the use of convolutional operators to encode symmetries often present in images. However, structural brain networks have a fundamentally different geometric structure, and such methods cannot be employed directly.

We develop a model-based variational Graph Auto-Encoder (GATE) for brain connectome analysis. GATE consists of two components. The first component is a generative model that specifies how the latent variables zi give rise to the observations Ai through a non-linear mapping, parametrized by neural networks. The second component is an inference model that learns the inverse mapping from Ai to zi. Our main contributions can be summarized as follows.

First, GATE learns the embedding and the population distribution of brain connectomes simultaneously. This is achieved by: (1) a nonlinear latent factor model that yields a low-dimensional representation $z_i$ of brain network $A_i$; and (2) a hierarchical generative model designed to learn the conditional distribution $p(A_i \mid z_i)$, so that one can accurately reconstruct the brain network from the latent embedding. We model each cell $A_i[uv]$ of $A_i$ using a latent space model (Hoff et al., 2002), with the latent coordinates of regions $u$ and $v$ varying as a nonlinear function of the individual-specific features $z_i$. This step involves a novel graph convolutional network that relies on the intrinsic locality of brain networks to propagate node-specific k-nearest-neighbor information.

Second, we extend GATE to relate human phenotypes to brain structural connectivity, which we refer to as regression with GATE (reGATE). reGATE is a supervised embedding method that simultaneously learns the population distribution of brain networks, network embeddings, and a predictive model for human traits. Although there has been some work integrating regression models and VAEs, the focus has been on multi-stage approaches; e.g., see Yoo et al. (2017) as an example. reGATE can generate from the population distribution of brain networks conditionally on the value of a human trait. This provides invaluable information about how traits and brain networks are associated while characterizing variation across individuals. We further draw inference on selected network summary measures of interest, such as network density and average path length, to understand how these properties are distributed depending on human traits.

We apply GATE and reGATE to brain connectomes from ABCD and HCP and find strong relationships between structural connectomes and cognition traits in both datasets. reGATE shows superior performance in predicting the relationship between cognition and brain connectomes, particularly when trained with data from large numbers of individuals. For example, using more than five thousand brain scans in the ABCD study, reGATE improves prediction of cognitive traits by 30% – 40% compared with existing competitors. Through detailed inference based on reGATE, we show that individuals with high cognitive traits tend to have denser connections between hemispheres, higher overall network density, and lower average path length. Such network summary measures have higher variability across the children evaluated in ABCD compared with adults in HCP.

2. Methods

2.1. Brain imaging datasets and structural connectome extraction

We focus on two large datasets in this paper: the Adolescent Brain Cognitive Development (ABCD) dataset and the Human Connectome Project (HCP) dataset.

ABCD dataset:

The ABCD study in the United States focuses on tracking brain development from childhood through adolescence to understand biological and environmental factors that can affect the brain's developmental trajectory. The research consortium consists of 21 research sites across the country and invited 11,878 children aged 9-10 to participate; researchers track their biological and behavioral development through adolescence into young adulthood. The dataset can be downloaded from the NIH Data Archive (NDA, https://nda.nih.gov). The imaging protocol is harmonized across three types of 3T scanners: Siemens Prisma, General Electric (GE) 750, and Philips. We downloaded the structural T1 MRI and diffusion MRI (dMRI) data for 5252 subjects from the ABCD 2.0 release in NDA. The structural T1 images were acquired at 1 mm isotropic resolution. The diffusion MRI images were acquired at 1.7 mm isotropic resolution with four b-values (b = 500, 1000, 2000, 3000) and 96 diffusion directions: 6 directions at b = 500, 15 at b = 1000, 15 at b = 2000, and 60 at b = 3000. A multiband factor of 3 was used for dMRI acceleration. See Casey et al. (2018) for more details about data acquisition and preprocessing of the ABCD data.

HCP dataset:

The HCP aims at characterizing human brain connectivity in about 1,200 healthy adults to enable detailed comparisons between brain circuits, behavior, and genetics at the level of individual subjects (Van Essen et al., 2012). Customized scanners were used to produce high-quality, consistent data for measuring brain connectivity. The data, including various traits and MRI scans, can be accessed through https://db.humanconnectome.org/.

To obtain structural connectomes, we used a state-of-the-art dMRI data preprocessing framework, population-based structural connectome (PSC) mapping (Zhang et al., 2018a). PSC uses a reproducible probabilistic tractography algorithm (Girard et al., 2014; Maier-Hein et al., 2017) to generate whole-brain tractography, and borrows anatomical information from high-resolution T1 images to reduce bias in the reconstruction. We used the Desikan-Killiany atlas (Desikan et al., 2006) to define the brain regions of interest (ROIs) corresponding to the nodes in the structural connectivity network; this parcellation has 68 cortical surface regions, with 34 nodes in each hemisphere. For each pair of ROIs, we extracted the streamlines connecting them. In this process, several procedures were used to increase reproducibility: (1) each gray matter ROI is dilated to include a small portion of white matter, (2) streamlines connecting multiple ROIs are cut into pieces so that the correct and complete pathways can be extracted, and (3) outlier streamlines are removed. We use the number of fibers connecting each pair of ROIs to summarize connectivity in our analyses.

For the ABCD dataset, we processed 5252 subjects using PSC. We focus our analyses on four cognitive traits: (a) picture vocabulary score, (b) oral reading recognition test score, (c) crystallized composite age-corrected standard score, and (d) cognition total composite score. The first row in Fig. 1 demonstrates the distribution of the four traits. Similarly, for the HCP dataset, we preprocessed 1065 subjects using PSC. We focused on the cognitive traits: (a) picture vocabulary test score, (b) oral reading recognition test score, (c) line orientation - total number correct, and (d) line orientation - total positions off for all trials.

Fig. 1. Histograms of cognitive traits. First row: ABCD study with 5252 subjects; from left to right, the traits are picture vocabulary score, oral reading recognition test score, crystallized composite age-corrected standard score, and cognition total composite score. Second row: HCP study with 1065 subjects; from left to right: picture vocabulary test score, oral reading recognition test score, line orientation (LO) total number correct, and line orientation (LO) total positions off for all trials.

2.2. The graph auto-encoder model

The brain connectome for individual $i$ is represented as a $V \times V$ symmetric adjacency matrix $A_i$, where $A_i[uv]$ is the number of fibers connecting regions $u$ and $v$ in the $i$th individual's brain. Using the Desikan atlas, we choose $V = 68$ ROIs. We let

$$\mathcal{L}(A_i) = \big(A_{i1}, \ldots, A_{iV(V-1)/2}\big)' \equiv \big(A_i[21], A_i[31], \ldots, A_i[V1], A_i[32], \ldots, A_i[V2], \ldots, A_i[V(V-1)]\big)'$$

denote the lower triangular elements of matrix Ai. We let yi denote the value of a cognitive trait for individual i. Our goal is to model the population distribution of the Ai’s and learn the relationship between cognitive traits yi and brain structural connectomes Ai.
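For concreteness, here is a minimal sketch of this vectorization, assuming NumPy; the synthetic matrix is a stand-in for an observed connectome. By symmetry, traversing the upper triangle row by row reproduces the column-wise lower-triangular ordering $(A_i[21], A_i[31], \ldots)$ described above.

```python
# A minimal sketch of the vectorization L(A_i), assuming NumPy.
import numpy as np

V = 68
rng = np.random.default_rng(0)
A = rng.poisson(5.0, size=(V, V))
A = np.triu(A, 1) + np.triu(A, 1).T      # symmetric, zero diagonal

u, v = np.triu_indices(V, k=1)           # pairs (u, v) with u < v
L_A = A[u, v]                            # vector of length V*(V-1)/2
assert L_A.size == V * (V - 1) // 2      # 2278 for V = 68
```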

2.2.1. Latent space model for brain connectomes

Latent space models (Hoff et al., 2002) provide a probabilistic framework that assumes the edges in a network are conditionally independent given their corresponding edge probabilities, with these probabilities defined as a function of pairwise distances between the nodes in a latent space. Borrowing this conditional independence idea, we first introduce a general latent space model for brain connectomes. We assume that the numbers of fibers connecting pairs of brain regions are conditionally independent Poisson variables, given individual- and edge-specific rates

$$\lambda_i = \big\{\lambda_{i1}, \ldots, \lambda_{iV(V-1)/2}\big\}, \qquad A_{i\ell} \mid \lambda_{i\ell} \sim \mathrm{Poisson}(\lambda_{i\ell}), \tag{1}$$

independently for each pair $\ell = 1, \ldots, V(V-1)/2$ and each subject $i = 1, \ldots, n$. We assume $\log(\lambda_{i\ell})$ has the following factorization form:

$$\log(\lambda_{i\ell}) = \gamma_\ell + \psi_\ell(i), \tag{2}$$
$$\psi_\ell(i) = \sum_{r=1}^{R} \alpha_r X_{ur}(i) X_{vr}(i), \quad \text{for } \ell = [uv], \tag{3}$$
$$\text{and} \quad X_r(i) = \big(X_{1r}(i), \ldots, X_{Vr}(i)\big)'. \tag{4}$$

As shown in (2), $\log(\lambda_{i\ell})$ is decomposed into two parts: a baseline parameter $\gamma_\ell$ controlling the connection strength between the $\ell$th pair of brain regions, representing structure shared across individuals, and an individual deviation $\psi_\ell(i)$. Taking into account symmetry constraints and excluding the diagonal elements, there are $V(V-1)/2$ unknown $\{\lambda_{i\ell}\}$ for each subject, leading to a daunting dimensionality problem. To reduce dimensionality, Durante et al. (2017) proposed an SVD-type latent factorization, shown in (3), where $r = 1, \ldots, R$ indexes the latent dimensions, $\alpha_r > 0$ weights the importance of dimension $r$, and $X_{ur}(i)$ is the $r$th latent factor specific to brain region $u$ and subject $i$. According to (3)-(4), if $X_{ur}(i)$ and $X_{vr}(i)$ have the same sign and neither is close to zero, then $X_{ur}(i) X_{vr}(i) > 0$ and there is a positive increment to $\psi_\ell(i)$, and hence to the expected number of fibers connecting regions $u$ and $v$ for subject $i$.
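A toy simulation from model (1)-(4) may help fix ideas. The following sketch, assuming NumPy, generates one subject's lower-triangular edge counts; `gamma`, `alpha`, and `X` are random stand-ins for learned parameters, not values from the paper.

```python
# A toy simulation from the latent space model (1)-(4), assuming NumPy.
import numpy as np

V, R = 68, 5
rng = np.random.default_rng(1)
gamma = rng.normal(scale=0.5, size=V * (V - 1) // 2)  # baseline gamma_l
alpha = np.abs(rng.normal(size=R))                    # weights alpha_r > 0
X = rng.normal(size=(V, R))                           # latent coordinates X_ur(i)

u, v = np.triu_indices(V, k=1)                        # pairs l = [uv], ordered as L(A_i)
psi = (alpha * X[u] * X[v]).sum(axis=1)               # psi_l(i) = sum_r alpha_r X_ur X_vr
lam = np.exp(gamma + psi)                             # Poisson rates lambda_il
A_lower = rng.poisson(lam)                            # simulated edge counts
```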

Model (1)-(4) can flexibly characterize variability across individuals in brain connectivity, while accommodating the complexity of network structures within each individual. However, learning the latent representations with existing latent space models poses several challenges: (1) Non-linearity: graph data are generally non-Euclidean with complicated structure, and designing a model that efficiently captures non-linear structure is difficult. (2) Sparsity: brain regions are not fully connected, particularly in structural brain networks. (3) Speed: existing latent space approaches often rely on Markov chain Monte Carlo sampling, which is computationally intensive for high-dimensional graphs; a fast non-linear factorization model is desirable. To address these challenges, we propose an autoencoder-based approach, GATE, in which we model the latent coordinates $X_u(i)$ of the brain regions as a non-linear function of a lower-dimensional vector $z_i$, which serves as a low-dimensional representation of the $i$th brain network $A_i$.

2.2.2. The graph autoencoder (GATE) model

GATE relies on the variational autoencoder (VAE; Kingma and Welling, 2014), a popular technique for non-linear dimension reduction. Denote by $z_i \in \mathbb{R}^K$ a low-dimensional latent representation of the individual brain connectome $A_i$. GATE consists of two components. The first is a generative model that specifies how the latent variables $z_i$ give rise to the observations $A_i$ through a non-linear mapping, parametrized by neural networks. The second is an inference model that learns the inverse mapping from $A_i$ to $z_i$. We describe each component in turn.

2.2.3. Generative model

For each subject $i$, we assume the $A_{i\ell}$, $\ell = 1, \ldots, V(V-1)/2$, are conditionally independent given the latent representation $z_i \in \mathbb{R}^K$. Therefore, the likelihood of $\mathcal{L}(A_i)$ is

$$p_\theta\big(\mathcal{L}(A_i) = a_i \mid z_i\big) = \prod_{\ell=1}^{V(V-1)/2} p_\theta\big(A_{i\ell} = a_{i\ell} \mid z_i\big). \tag{5}$$

Here, $p_\theta(A_{i\ell} \mid z_i)$ is a generative model for the weighted adjacency matrix $A_i$ given the latent $z_i$, with $z_i \sim p(z)$. We define $p(z) = N(0, I_K)$, representing all connectomes in the same Gaussian latent space.

We learn the mapping from the Gaussian latent space to the complex observation distribution in (5) by a hierarchical model equipped with parameters θ. Specifically, we assume the observations Ai arise from the following generative process:

$$z_i \sim N(0, I_K), \qquad A_{i\ell} \mid z_i \sim \mathrm{Poisson}\big(\lambda_{i\ell}(z_i)\big), \tag{6}$$

where the Poisson rate parameter $\lambda_{i\ell}(z_i)$ is modeled as a nonlinear function of $z_i$ according to:

$$\lambda_{i\ell}(z_i) = \exp\big(\gamma_\ell + \psi_\ell(z_i)\big), \tag{7}$$
$$\psi_\ell(z_i) = \sum_{r=1}^{R} \alpha_r X_{ur}(z_i) X_{vr}(z_i), \quad \text{for } \ell = [uv], \tag{8}$$
$$X_r(z_i) = \big(X_{1r}(z_i), \ldots, X_{Vr}(z_i)\big)' = g_r(z_i), \tag{9}$$

where $g_r(\cdot): \mathbb{R}^K \to \mathbb{R}^V$ is a nonlinear mapping from $z_i$ to the $r$th latent factor $X_r$ of the brain regions, parameterized by deep neural networks with parameters $\theta$, for $r = 1, \ldots, R$.

Denote $X(z_i) = (X_1(z_i), \ldots, X_R(z_i)) \in \mathbb{R}^{V \times R}$. The $u$th row $(X_{u1}(z_i), \ldots, X_{uR}(z_i))$ represents the latent features of brain region $u \in \mathcal{V}$ for individual $i$. A relatively large positive cross product between the $u$th and $v$th rows implies a relatively high connection strength between these brain regions. The nonlinear mappings $\{g_r(\cdot)\}_{r=1,\ldots,R}$ characterize the latent embedding of brain regions, determined by the local collaborative patterns among them.

To take into account the intrinsic locality of structural brain networks, we propose a novel graph convolutional network (GCN) that learns each region's representation by propagating node-specific k-nearest-neighbor information; information in nodes that are closer to each other is pooled together. The intrinsic locality refers to the relative distance between brain regions, measured through the length of the white matter fiber tracts connecting them. We extract this information from brain imaging tractography and store it in a matrix $B \in \mathbb{R}^{V \times V}$, where $B_{uv}$ is the average length of fiber tracts between regions $u$ and $v$, $B_{uv} = B_{vu}$, $B_{uu} = 0$, and $B_{uv} = \infty$ if there are no fibers between them. For each region $u$, we define its k-nearest neighbors (k-NN(u)) as the $k$ ROIs closest to $u$ under this notion of distance, and the region itself as its 0-NN. If a region $u$ has fewer than $k$ direct neighbors, we include all regions $v$ satisfying $B_{uv} \in (0, \infty)$ as its neighbors. In practice, we choose the average degree of the nodes as the number of neighbors for simplicity, since it measures the average number of collaborations among nodes.
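A minimal sketch of this neighborhood construction, assuming NumPy and a synthetic fiber-length matrix B; the helper name `knn_mask` is ours, not the paper's.

```python
# A sketch of the k-NN construction from the fiber-length matrix B, assuming NumPy;
# np.inf marks pairs with no fibers.
import numpy as np

def knn_mask(B, k):
    """Boolean V x V mask: mask[u, v] is True iff v == u or v is one of the k
    regions closest to u with finite fiber length (fewer than k: take all)."""
    V = B.shape[0]
    mask = np.eye(V, dtype=bool)                    # each region is its own 0-NN
    for u in range(V):
        finite = np.where(np.isfinite(B[u]) & (np.arange(V) != u))[0]
        order = finite[np.argsort(B[u, finite])]
        mask[u, order[:k]] = True
    return mask

rng = np.random.default_rng(1)
B = rng.uniform(10, 120, size=(68, 68)); B = (B + B.T) / 2
np.fill_diagonal(B, 0)
B[B > 110] = np.inf                                 # pretend some pairs have no fibers
mask = knn_mask(B, k=16)
```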

To learn the $r$th latent coordinate $g_r(z_i)$ for subject $i$, the key idea is to consider each column $X_r(z_i)$ as an “image” with each region as an irregular pixel; we have $R$ such “images” for each individual. Convolutional neural networks (CNNs) are highly effective architectures in image and audio recognition tasks (Hinton et al., 2012; Krizhevsky et al., 2012; Sermanet et al., 2012), thanks to their ability to exploit local translational invariance over their domain. Considering the unique features of brain connectome networks, we generalize the CNN and define appropriate graph convolutions to learn the nonlinear mappings $\{g_r(\cdot)\}$ by exploiting the local collaborative pattern among brain regions. In particular, we define an $M$-layer GCN as follows:

$$X_r(i, 1) = h_1\big(W^{(r,1)} z_i + b_1\big), \tag{10}$$
$$X_r(i, m) = h_m\big(W^{(r,m)} X_r(i, m-1) + b_m\big) \quad \text{for } 2 \leq m \leq M, \tag{11}$$

where $X_r(i, m)$ denotes the output of the $m$th layer of the convolutional neural network, $h_m(\cdot)$ is an activation function for the $m$th layer, and $W^{(r,m)}$ is a weight matrix characterizing the convolutional operator at this layer. The activation functions $\{h_m(\cdot)\}$ can be chosen from the following candidates based on performance: (1) the rectified linear unit (ReLU), widely used in deep neural networks (Goodfellow et al., 2016), defined as $\mathrm{ReLU}(x) = \max(0, x)$ with the max applied element-wise; (2) the sigmoid function $h_m(x) = \frac{1}{1 + e^{-x}} \in (0, 1)$; and (3) a linear or identity function $h_m(x) = ax$ with $a \neq 0$.

For $m = 1$, $W^{(r,1)} \in \mathbb{R}^{V \times K}$ maps the latent representation $z_i \in \mathbb{R}^K$ to $X_r(i, 1) \in \mathbb{R}^{V}$; for $m \geq 2$, $W^{(r,m)}$ is a $V \times V$ weight matrix whose $u$th row $w_u^{(r,m)}$ satisfies $w_{uv}^{(r,m)} > 0$ if $v = u$ or $v \in k_r\text{-NN}(u)$, and $w_{uv}^{(r,m)} = 0$ otherwise. Eq. (11) implies that the embedding of each region at the $m$th layer is a weighted sum of itself and its nearest-neighbor regions at the $(m-1)$th layer, with the weights characterizing region-specific local connectivity. For $r = 1, \ldots, R$, we can choose different values of $k_r$ in defining the $k_r$-NN to fully explore possible collaboration patterns among brain regions.
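The following sketch, assuming NumPy, illustrates the masked forward pass in (10)-(11); the weights are random stand-ins for the learned $W^{(r,m)}$, and `mask` plays the role of the $k_r$-NN sparsity pattern (e.g., from the `knn_mask` sketch above).

```python
# A minimal forward pass for the M-layer GCN in (10)-(11), assuming NumPy.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gcn_forward(z, W1, b1, Ws, bs, mask):
    """z: (K,) subject embedding; W1: (V, K); each W in Ws: (V, V).
    Returns X_r(z) of shape (V,), one latent coordinate per brain region."""
    x = sigmoid(W1 @ z + b1)                    # layer 1, Eq. (10)
    for W, b in zip(Ws, bs):                    # layers m = 2, ..., M, Eq. (11)
        x = sigmoid((W * mask) @ x + b)         # zero weights outside u's k-NN
    return x

K, V = 45, 68
rng = np.random.default_rng(2)
mask = np.eye(V, dtype=bool) | (rng.random((V, V)) < 0.05)  # stand-in k-NN mask
z_i = rng.normal(size=K)
X_r = gcn_forward(z_i, 0.1 * rng.normal(size=(V, K)), np.zeros(V),
                  [0.1 * rng.normal(size=(V, V))], [np.zeros(V)], mask)
```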

Figure 2 shows how a three-layer GCN learns $X_r(z_i)$ via a 2-NN GCN. First, we initialize the latent feature of each region as $x_{ur}(i, 1)$ based on (10). Then, we construct a graph based on the fiber lengths in $B$: each region is connected to at most its 2 nearest neighbors according to fiber length. This information is reflected in $W^{(r,m)}$, whose rows contain at most three non-zero elements (one on the diagonal and two off the diagonal). Next, we update the latent feature of each region at the next layer as a sum of reweighted features from its 2 nearest neighbors and itself.

Fig. 2. Illustrative example of a three-layer GCN architecture with 2-NN filters. For example, to learn the $r$th latent coordinate for node 1, whose 2-NN are nodes 2 and 3: after input $z_i$, the first-layer embedding is $x_{1r}(i, 1) = h_1\big(w_1^{(r,1)} z_i\big)$, the second-layer embedding is $x_{1r}(i, 2) = h_2\big(w_{11}^{(r,2)} x_{1r}(i, 1) + w_{12}^{(r,2)} x_{2r}(i, 1) + w_{13}^{(r,2)} x_{3r}(i, 1)\big)$, and the third-layer embedding is $x_{1r}(i, 3) = h_3\big(w_{11}^{(r,3)} x_{1r}(i, 2) + w_{12}^{(r,3)} x_{2r}(i, 2) + w_{13}^{(r,3)} x_{3r}(i, 2)\big)$, with the output $x_{1r}(i, 3)$ as the $r$th latent coordinate for node 1.

We collect all the parameters $(\gamma_\ell, \alpha_r, b_m, W^{(r,m)})$, for $\ell = 1, \ldots, V(V-1)/2$, $r = 1, \ldots, R$, and $m = 1, \ldots, M$, as $\theta$. In Section 2.2.4, we show how to use variational inference to learn $\theta$.

2.2.4. Variational inference and GATE learning

To train and evaluate the deep generative model in (6), we need to estimate $\theta$, the parameters characterizing the mapping from $z_i$ to $A_{i\ell}$, and $p_\theta(z_i \mid A_i)$, the posterior distribution of the latent variable. By Bayes' rule, the posterior is

$$p_\theta(z_i \mid A_i) = \frac{p_\theta(A_i \mid z_i)\, p(z_i)}{p_\theta(A_i)}.$$

Since the likelihood $p_\theta(A_i \mid z_i)$ is parameterized via a neural network with non-linear transformations, both the marginal distribution $p_\theta(A_i)$ and the posterior $p_\theta(z_i \mid A_i)$ are intractable. Hence, we resort to variational inference (VI) (Jordan et al., 1999; Hoffman et al., 2013), a widely used tool for approximating intractable posteriors. VI seeks a simple distribution $q_\phi(z_i \mid A_i)$, parameterized by $\phi$ within a variational family (e.g., Gaussian), that best approximates $p_\theta(z_i \mid A_i)$. We call $q_\phi(z_i \mid A_i)$ the probabilistic encoder: it maps the input $A_i$ to a low-dimensional latent representation $z_i$. To quantify the separation between the approximate and true posteriors, we use the Kullback-Leibler (KL) divergence, $D_{KL}(Q \,\|\, P) = E_{z \sim Q}\big[\log \frac{Q(z)}{P(z)}\big]$, which measures how much information is lost when $Q$ is used to represent $P$. We choose

$$q_\phi(z_i \mid A_i) \sim N\big(\mu_\phi(A_i), \operatorname{diag}\{\sigma_\phi^2(A_i)\}\big), \tag{12}$$

i.e., a fully factorized (diagonal covariance) Gaussian distribution, to facilitate computation. We design deep neural networks to learn $\mu_\phi$ and $\sigma_\phi^2$, and denote the parameters of these networks by $\phi$. The details are in Supplementary S.1.

Our objective is to maximize the observed-data log-likelihood $\log p_\theta(A_i)$ while also minimizing the difference between the true posterior $p_\theta(z_i \mid A_i)$ and the approximate posterior $q_\phi(z_i \mid A_i)$. We express this objective as

$$\log p_\theta(A_i) - D_{KL}\big(q_\phi(z_i \mid A_i) \,\|\, p_\theta(z_i \mid A_i)\big) = E_{q_\phi(z_i \mid A_i)}\big[\log p_\theta(A_i \mid z_i)\big] - D_{KL}\big(q_\phi(z_i \mid A_i) \,\|\, p_\theta(z_i)\big) := \mathcal{L}(A_i; \theta, \phi), \tag{13}$$

where the detailed calculation of (13) can be found in Supplementary S.2. Since $D_{KL}\big(q_\phi(z_i \mid A_i) \,\|\, p_\theta(z_i \mid A_i)\big)$ is nonnegative, $\mathcal{L}(A_i; \theta, \phi)$ can be viewed as a lower bound on the marginal log-likelihood, referred to as the evidence lower bound (ELBO), which is a function of both $\theta$ and $\phi$. The training objective is therefore to minimize the negative ELBO, i.e., to minimize

$$-\mathcal{L}(A_i; \theta, \phi) = -E_{q_\phi(z_i \mid A_i)}\big[\log p_\theta(A_i \mid z_i)\big] + D_{KL}\big(q_\phi(z_i \mid A_i) \,\|\, p(z_i)\big). \tag{14}$$

$-\mathcal{L}(A_i; \theta, \phi)$ consists of two parts: the first term is the reconstruction error, measuring how well the model can reconstruct $A_i$; the second term, the KL divergence of the approximate posterior from the prior, is a regularizer that pushes $q_\phi(z_i \mid A_i)$ as close as possible to its prior $N(0, I_K)$.

In practice, the expectation in the ELBO (14) is intractable. To address this, we employ Monte Carlo variational inference (Kingma and Welling, 2014), approximating the troublesome expectation with samples $z_i^{(\ell)} \sim q_\phi(z_i \mid A_i)$ of the latent variables from the variational distribution. In particular, we form the Monte Carlo estimate of the expectation as

$$E_{q_\phi(z_i \mid A_i)}\big[\log p_\theta(A_i \mid z_i)\big] \approx \frac{1}{L} \sum_{\ell=1}^{L} \log p_\theta\big(A_i \mid z_i^{(\ell)}\big),$$

where $z_i^{(\ell)}$ is sampled with the reparametrization trick: draw $\varepsilon_i^{(\ell)} \sim N(0, I_K)$ and set $z_i^{(\ell)} = \mu_\phi(A_i) + \varepsilon_i^{(\ell)} \odot \sigma_\phi(A_i)$, where $\sigma_\phi(A_i)$ is the vector of posterior standard deviations in (12) and $\odot$ denotes the element-wise product. A simple calculation shows that $D_{KL}\big(q_\phi(z_i \mid A_i) \,\|\, p(z_i)\big) = \frac{1}{2} \sum_{k=1}^{K} \big(\mu_k^2 + \sigma_k^2 - 1 - \log(\sigma_k^2)\big)$, where $\mu_k$ and $\sigma_k^2$ are the $k$th elements of $\mu_\phi(A_i)$ and $\sigma_\phi^2(A_i)$, respectively. Therefore, the negative ELBO in (14) can be approximated as

$$-\mathcal{L}(A_i; \theta, \phi) \approx -\tilde{\mathcal{L}}(A_i; \theta, \phi) = -\frac{1}{L} \sum_{\ell=1}^{L} \log p_\theta\big(A_i \mid z_i^{(\ell)}\big) + \frac{1}{2} \sum_{k=1}^{K} \big(\mu_k^2 + \sigma_k^2 - 1 - \log(\sigma_k^2)\big),$$

which is differentiable with respect to $\theta$ and $\phi$. Then, given $n$ observed networks, we construct an estimator of the ELBO for the full dataset based on minibatches: $\frac{n}{m} \sum_{i=1}^{m} \tilde{\mathcal{L}}(A^{(i)}; \theta, \phi)$, where $\{A^{(i)}\}_{i=1}^{m}$ is a random sample of size $m$ from the $n$ observed networks. Treating this minibatch estimator as the objective, we implement a stochastic variational Bayesian algorithm to jointly optimize $\theta$ and $\phi$. Figure 3 shows a graphical diagram of the GATE approach; Algorithm 1 summarizes the GATE training procedure. Once the GATE model is learned, we can (1) obtain a low-dimensional representation of each individual network, and (2) generate brain networks to learn the population distribution of brain connectomes and features of these connectomes.
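To make the training step concrete, here is a minimal sketch of one minibatch update, assuming PyTorch; `encoder` and `decoder_rate` are hypothetical modules standing in for the inference network (12) and the GCN-based Poisson decoder (6)-(9).

```python
# A sketch of one stochastic update of the negative ELBO (14), assuming PyTorch.
import torch

def gate_step(A_batch, encoder, decoder_rate, optimizer):
    """A_batch: (m, V*(V-1)/2) float tensor of lower-triangular edge counts."""
    mu, log_var = encoder(A_batch)                  # mu_phi(A), log sigma_phi^2(A)
    eps = torch.randn_like(mu)
    z = mu + eps * torch.exp(0.5 * log_var)         # reparametrization trick
    lam = decoder_rate(z)                           # positive rates lambda_il(z)
    recon = torch.distributions.Poisson(lam).log_prob(A_batch).sum(dim=1)
    kl = 0.5 * (mu.pow(2) + log_var.exp() - 1.0 - log_var).sum(dim=1)
    loss = (kl - recon).mean()                      # Monte Carlo -ELBO with L = 1
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```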

Fig. 3. A schematic diagram of GATE. The encoder step is based on Eq. (12) to map $A_i$ to $z_i$. The decoder step is based on Eqs. (6)-(9) to map $z_i$ to $A_i$ through a latent space model with node embedding $X(z_i)$ learned from a graph CNN.

Algorithm 1: Training GATE model using gradients.

Input: networks $\{A_i\}_{i=1}^{n}$, geometric matrix $B$, latent dimensions $K$ and $R$.

Randomly initialize $\theta$, $\phi$.
while not converged do
 Sample a minibatch of $\{A_i\}$ with mini-batch size $m$, denoted $\mathcal{A}_m$.
  for all $A_i \in \mathcal{A}_m$ do
   Sample $\varepsilon_i \sim N(0, I_K)$ and compute $z_i = \mu_\phi(A_i) + \varepsilon_i \odot \sigma_\phi(A_i)$.
   Compute the gradients $\nabla_\theta \tilde{\mathcal{L}}(A_i; \theta, \phi)$ and $\nabla_\phi \tilde{\mathcal{L}}(A_i; \theta, \phi)$ at $z_i$.
  Average the gradients across the batch.
 Update $\theta$, $\phi$ using the averaged gradients.
Return $\theta$, $\phi$.

2.3. Regression with GATE and inference

Relating brain connectomes with traits.

In addition to finding low-dimensional representations of brain structural networks, we are interested in inferring the relationship between brain networks and human traits, such as cognition. With this goal in mind, we develop a supervised version of GATE, referred to as regression GATE (reGATE). Let $y_i$ be a trait of the $i$th subject. We first express the joint log-likelihood of $(A_i, y_i)$ as

$$\log p_\theta(A_i, y_i) = \mathcal{L}(A_i, y_i; \theta, \phi) + D_{KL}\big(q_\phi(z_i \mid A_i) \,\|\, p_\theta(z_i \mid y_i, A_i)\big), \tag{15}$$

where

$$\mathcal{L}(A_i, y_i; \theta, \phi) = E_{q_\phi(z_i \mid A_i)}\big[\log p_\theta(y_i \mid z_i)\big] + E_{q_\phi(z_i \mid A_i)}\big[\log p_\theta(A_i \mid z_i)\big] - D_{KL}\big(q_\phi(z_i \mid A_i) \,\|\, p_\theta(z_i)\big)$$

is called the ELBO of $\log p_\theta(A_i, y_i)$. We assume the human trait $y_i$ and the brain connectivity $A_i$ are conditionally independent given the latent representation $z_i$ for the $i$th subject, and show the derivation of (15) in Supplementary S.3. In (15), the log-likelihood of $(A_i, y_i)$ is split into two parts: the ELBO $\mathcal{L}(A_i, y_i; \theta, \phi)$ and the non-negative KL divergence between $q_\phi(z_i \mid A_i)$ and $p_\theta(z_i \mid y_i, A_i)$. Different from the unsupervised ELBO in (13), (15) can be considered a supervised ELBO with an extra term $p_\theta(y_i \mid z_i)$ that essentially formulates a regression of $y_i$ on $z_i$. Here we consider $y_i$ as a continuous random variable and set $p_\theta(y_i \mid z_i)$ to be univariate Gaussian, i.e., $p_\theta(y_i \mid z_i) \sim N(z_i'\beta + b, \sigma^2)$, where $\beta$, $b$, $\sigma^2 \in \theta$ are parameters to be learned. Figure 4 shows a flowchart of the reGATE architecture.
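The extra regression term can be added to the GATE objective in a few lines; a hedged sketch assuming PyTorch, with `beta`, `b`, and `log_sigma2` as hypothetical names for the learnable regression parameters.

```python
# A sketch of the regression term log p_theta(y | z) in the supervised ELBO (15),
# assuming PyTorch and the variables from the GATE training step above.
import math
import torch

def regression_log_lik(y, z, beta, b, log_sigma2):
    """Gaussian log-likelihood of y given z under N(z' beta + b, sigma^2)."""
    mean = z @ beta + b
    return -0.5 * (log_sigma2 + (y - mean) ** 2 / log_sigma2.exp()
                   + math.log(2 * math.pi))

# reGATE loss per subject subtracts this term from the GATE loss:
# loss = kl - recon - regression_log_lik(y, z, beta, b, log_sigma2)
```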

Fig. 4. reGATE to predict human traits. The encoder step is based on Eq. (12) to map $A_i$ to $z_i$. The decoder step maps $z_i$ to $A_i$ based on Eqs. (6)-(9), together with a regression of $y_i$ on $z_i$.

Similarly to Section 2.2.4, we form the Monte Carlo estimate of $\mathcal{L}(A_i, y_i; \theta, \phi)$ and estimate $\theta$, $\phi$, $\beta$, $b$ following the stochastic variational Bayesian Algorithm 1, replacing $\tilde{\mathcal{L}}(A_i; \theta, \phi)$ with $\tilde{\mathcal{L}}(A_i, y_i; \theta, \phi)$. We show the detailed sampling steps in Supplementary S.4. From the trained reGATE model, we obtain: (1) a low-dimensional representation of each individual network; (2) a human trait prediction for each individual network; and (3) the ability to generate brain networks for inference on how features of the networks vary across individuals and with traits.

Conditional generative model:

We are interested in inferring how brain networks vary across levels of a trait. For example, if $y_i$ measures a person's memory ability, we would like to study differences in the distribution of brain networks between people with good and bad memory skills. To address questions of this type, we generate samples from the posterior distribution of $A_i$ given particular $y_i$ values using Gibbs sampling: sample $z_i$ from $p_\theta(z_i \mid y_i)$, then sample $A_i$ from $p_\theta(A_i \mid z_i)$. The conditional $p_\theta(A_i \mid z_i)$ is learned while fitting reGATE. The posterior distribution of $z_i$ given $y_i$ can be expressed as $p_\theta(z_i \mid y_i) \propto p_\theta(y_i \mid z_i)\, p_\theta(z_i)$, where $p_\theta(z_i) \sim N(0, I_K)$ and $p_\theta(y_i \mid z_i) \sim N(z_i'\beta + b, \sigma^2)$. Therefore, we have $p_\theta(z_i \mid y_i) \sim N\big(\mu_z(y_i), \Sigma_z(y_i)\big)$, where

$$\mu_z(y_i) = \big(I_K + \beta\beta'/\sigma^2\big)^{-1} \beta (y_i - b)/\sigma^2, \quad \text{and} \quad \Sigma_z(y_i) = \big(I_K + \beta\beta'/\sigma^2\big)^{-1}; \tag{16}$$

the derivation of (16) is in Supplementary S.5.
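A short sketch of this conditional generator, assuming NumPy; `beta`, `b`, and `sigma2` stand in for the regression parameters learned by reGATE.

```python
# A sketch of sampling z | y from Eq. (16), assuming NumPy.
import numpy as np

def sample_z_given_y(y, beta, b, sigma2, n_samples=500, rng=None):
    rng = rng or np.random.default_rng()
    K = beta.shape[0]
    Sigma_z = np.linalg.inv(np.eye(K) + np.outer(beta, beta) / sigma2)
    mu_z = Sigma_z @ beta * (y - b) / sigma2
    return rng.multivariate_normal(mu_z, Sigma_z, size=n_samples)

# Each sampled z is then pushed through the decoder p_theta(A | z) to generate
# brain networks conditionally on the trait value y.
```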

The latent representation $z_i$ is unidentifiable in the VAE, since the log-likelihood and ELBO are rotationally invariant in $z_i$. For example, letting $\tilde{z}_i = U z_i$ for an orthogonal matrix $U$, we have $P_{U,\theta}(A_i) = P_\theta(A_i)$ and

$$D_{KL}\big(q_{U,\phi}(z_i \mid A_i) \,\|\, p_{U,\theta}(z_i \mid A_i)\big) = D_{KL}\big(q_\phi(z_i \mid A_i) \,\|\, p_\theta(z_i \mid A_i)\big),$$

where $q_{U,\phi}(\cdot)$ and $p_{U,\theta}(\cdot)$ are defined by replacing $z_i$ with $\tilde{z}_i$ in $q_\phi(\cdot)$ and $p_\theta(\cdot)$. Rotational non-identifiability can be resolved by post-processing to rotationally align the $z_i$'s. However, this is only necessary when comparing $z_i$'s across different datasets or across analyses of a given dataset. Within an analysis, the main focus is on the relative values of the $z_i$'s, and these relative values are well defined. In addition, when the focus is on relating brain structure to human traits, or on predicting traits from brain structure or vice versa, the non-identifiability does not present a problem.

3. Simulation study

We conduct a simulation study to evaluate the performance of GATE and reGATE on a broad range of graph-valued data. We simulate different types of random graphs using the Python package NetworkX. In particular, we consider four network structures: sparse networks according to the model in Johnson (1977), community structure under the model of Nowicki and Snijders (2001), small-worldness from the model in Watts and Strogatz (1998), and the scale-free property from the model in Barabási and Albert (1999). For each type, we simulate 100 networks with V = 68 nodes by sampling their edges from conditionally independent Bernoulli random variables given the corresponding structure-specific edge probabilities. Each structure-specific edge probability vector is carefully constructed to assign high probability to a subset of network configurations characterized by the specific property. Figure 5 displays example networks generated with the four network structures.

Fig. 5. The edge probability vectors (rearranged in matrix form) for the four network structures: (a) sparse graph, (b) community structure, (c) small world, (d) scale free.

We first generate $y_i$ according to $y_i = \alpha' A_i \alpha + \epsilon_i$, where $\alpha = (1, \ldots, 1, 0, \ldots, 0)' \in \mathbb{R}^{68}$ with the first 17 entries equal to one, and $\epsilon_i \sim N(0, 1)$. We then standardize $y_i$ so that it ranges from −1.5 to 2.0. These settings aim to generate $y_i$'s that are separable according to the topological structure of $A_i$; the histograms in Fig. 7 clearly show how $y_i$ varies across network structures. Our goals in this simulation study are to (1) learn the latent representation under both GATE and reGATE; (2) infer how the network connectivity structure varies with $y_i$; and (3) assess the predictive performance of the reGATE model.
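As an illustration of this simulation design, the following sketch uses NetworkX (as in the paper) and NumPy; the generator parameters are illustrative assumptions rather than the exact settings used in the study.

```python
# A sketch of the simulation design with NetworkX; parameters are illustrative.
import networkx as nx
import numpy as np

rng = np.random.default_rng(3)
V = 68
graphs = (
    [nx.gnp_random_graph(V, 0.05, seed=s) for s in range(100)]            # sparse
    + [nx.planted_partition_graph(4, V // 4, 0.25, 0.02, seed=s)          # community
       for s in range(100)]
    + [nx.watts_strogatz_graph(V, 6, 0.1, seed=s) for s in range(100)]    # small world
    + [nx.barabasi_albert_graph(V, 3, seed=s) for s in range(100)]        # scale free
)
alpha = np.concatenate([np.ones(17), np.zeros(V - 17)])
y = np.array([alpha @ nx.to_numpy_array(g) @ alpha for g in graphs])
y = y + rng.normal(size=y.size)            # add epsilon_i
y = (y - y.mean()) / y.std()               # standardize the trait scores
```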

Fig. 7. First row: networks generated conditional on specific $y_i$ using reGATE, corresponding to $y_i$ = −1.5, −0.1, and 2 from left to right. Second row: histograms of $y_i$ by network structure; the x-axis is the value of $y_i$ and the y-axis is the frequency within a specific structure in the training data.

We train GATE and reGATE to obtain the low-dimensional representations $z_i$. Specifically, for the $n$ simulated networks, we compute the posterior means $\bar{z}_i = E\big(p_\theta(z_i \mid A_i)\big)$, $i = 1, \ldots, n$, after training the GATE model. We then conduct principal component analysis (PCA) on the posterior means $\{\bar{z}_1, \ldots, \bar{z}_n\}$ and plot each $\bar{z}_i$ by its first two PC scores in Fig. 6(a)-(b), colored according to the corresponding $y$ value. We clearly observe separation between the four types of networks in the low-dimensional representation space inferred by both GATE and reGATE, with reGATE yielding greater separation across the groups.

To infer how brain networks $A_i$ vary according to $y_i$, we simulate networks from the posterior distribution of $A_i \mid y_i$ according to the conditional generative model in Section 2.3. Specifically, we first sample $z_i \mid y_i$ based on Eq. (16), with the parameters obtained from the previously trained reGATE and $y_i$ ranging from −1.5 to 2, then generate networks via $p_\theta(A_i \mid z_i)$. Figure 7 shows the generated networks, with structure varying across the $y_i$'s. We clearly observe a sparse structure when $y_i$ = −1.5, a mixture of community and small-world structure when $y_i$ = −0.1, and a clear scale-free structure when $y_i$ = 2. These generated structures are consistent with the ground truth in our simulation settings.

To address our third goal, we evaluate the predictive accuracy of reGATE. We consider two cases: (1) $y_i = \alpha' A_i \alpha + \epsilon_i$; (2) $y_i = (\alpha' A_i \alpha)^2 + (\alpha' A_i \alpha)^3 + \epsilon_i$. We also compare reGATE with several popular methods for predicting human traits from network data. The first is linear regression based on tensor network principal component analysis (LR-TNPCA) (Zhang et al., 2019). The second is linear regression based on a regular PCA applied to the vectorized networks (LR-PCA). The third is the tensor regression proposed in Zhou et al. (2013), denoted CPR here. The mean squared error (MSE) from five-fold cross-validation is used to compare the approaches. Figure 6(c) shows the association between the predicted and true values under reGATE in case 1. The first two rows of Table 2 report the MSE under the different methods; reGATE outperforms the other methods in predictive accuracy. The third row of Table 2 reports the computing time over 100 replicated simulations. reGATE is both fast and accurate based on these results.

Fig. 6. First two PC scores of the posterior mean of $z_i \mid A_i$, colored by the corresponding $y$ value: (a) GATE, (b) reGATE. (c) Predicted $y$ vs. true $y$ under reGATE, where the x-axis is the true value and the y-axis is the predicted value.

Table 2.

The first two rows report the MSE under different methods; the third row reports computing time. All numbers are means over 100 replicated simulations.

| | reGATE | LR-TNPCA | LR-PCA | CPR |
|---|---|---|---|---|
| MSE: case 1 | 0.0252 | 0.0271 | 0.0340 | 0.0388 |
| MSE: case 2 | 0.0505 | 0.0593 | 0.0700 | 0.1078 |
| Time (mins) | 17.5 | 197.6 | 5.3 | 253.7 |

We summarize the computing details of the simulation study here. We run the Adam stochastic gradient algorithm (Kingma and Ba, 2015) on GATE and reGATE with learning rate 0.001 on one NVIDIA Titan V GPU. In GATE, we use a batch size of 128, sampled uniformly at random at each epoch, for 1000 epochs. In reGATE, we use 5-fold CV to calculate the MSE and run 200 epochs with a batch size of 128 for each training dataset. The experimental details and network architectures for the inference model and generative model are summarized in Table 3. The latent dimension $K$ is chosen as the smallest value achieving the minimal training loss, with the network architectures fixed as in Table 3. Figure 8(a) shows the log-training loss as the latent dimension $K$ varies from 5 to 100; the training loss is best when $K$ is around 45. We also explore the sensitivity of the training loss to random initialization: as shown in Fig. 8(b), under different initializations, the training loss paths converge to the same level as the number of epochs increases. All code used to produce the results and figures is available on GitHub (https://github.com/meimeiliu/GATE).

Table 3.

Experimental details and network architectures. K is the dimension of zi, N is the number of layers in the inference network, M is the number of layers in GCN, and R is the dimension of X(i).

| | Inference model: μϕ (N = 2) | Inference model: σϕ (N = 2) | Generative model: setting | Generative model: activation |
|---|---|---|---|---|
| GATE/reGATE (K = 45) | W1,μ: 45 × 400 | W1,σ: 45 × 400 | k-NN: 16 | h1: Sigmoid |
| | W2,μ: 400 × 45 | W2,σ: 400 × 45 | M = 2 | h2: Sigmoid |
| | b1,μ: 400 × 1 | b1,σ: 400 × 1 | R = 5 | |
| | b2,μ: 45 × 1 | b2,σ: 45 × 1 | | |
| | φ1,μ = ReLU | φ1,σ = ReLU | | |
| | φ2,μ = Linear | φ2,σ = Linear | | |

Fig. 8. (a) Log-training loss as the latent dimension K varies; (b) paths of the log-training loss under different random initializations.

4. Applications to the ABCD and HCP data

We apply our method to both the ABCD and HCP datasets described in Section 2.1 to examine the relationship between structural brain networks and cognition for adolescents and young adults. The ABCD study uses a reliable, well-validated battery of measures that assess a wide range of human functions, including cognition. The core of this battery comprises tools and methods developed by the NIH Toolbox for the assessment of neurological and behavioral function (Gershon et al., 2013), which includes measures of cognitive, emotional, motor, and sensory processes. Since we are particularly interested in cognition, we extract four cognition-related measures as $y$ from ABCD:

  1. Picture vocabulary test: the picture vocabulary test uses an audio recording of words, presented with four photographic images on the computer screen. Participants are asked to select the picture that best matches the meaning of the word.

  2. Oral reading recognition test: participants on this test are asked to read and pronounce letters and words as accurately as possible.

  3. Crystallized composite score: crystallized cognition composite can be interpreted as a global assessment of verbal reasoning. We use the age-corrected standard score.

  4. The cognition total composite score: this composite score measures overall cognition and is obtained from a factor analysis (Heaton et al., 2014).

We extracted two matched cognitive measures from HCP: the picture vocabulary test score and the oral reading recognition test score. In addition, we add two more cognitive measures to our data analysis: the total number of correct answers and the total positions off in a line orientation test. A more detailed description of these traits can be found in Gershon et al. (2013).

4.1. Visualization: show network data in low-dimensional space

Both GATE and reGATE output low-dimensional representations of the brain networks, so we can visualize the latent features of each individual's connectome and examine the relationship between structural connectivity and the four traits via the latent features. We train GATE on 5252 brain networks extracted from the ABCD dataset to obtain low-dimensional representations $z_i$, $i = 1, \ldots, 5252$. We then plot the posterior mean of $z_i \mid A_i$ using t-SNE (Van der Maaten and Hinton, 2008) in Fig. 9, colored by the corresponding trait score. We show 200 subjects' data for each cognition trait: the 100 subjects with the lowest trait scores and the 100 subjects with the highest scores. As shown in Fig. 9(a)-(d), under both GATE and reGATE we obtain a large separation between the two groups of subjects, indicating that brain connection patterns differ between the groups. reGATE performs better since it incorporates the trait information in learning the $z_i$'s. A similar analysis of the HCP data is shown in Figure 16 in the supplement.

Fig. 9. Visualization of the low-dimensional embeddings $z_i$ learned from GATE and reGATE applied to the ABCD study. We display the 100 subjects with the lowest trait scores and the 100 subjects with the highest scores for each trait. Colors represent the trait scores.

4.2. Prediction: predict traits with reGATE

In this section, we demonstrate the predictive ability of reGATE and compare it with several competitors: LR-TNPCA (Zhang et al., 2019), LR-PCA, CPR (Zhou et al., 2013), and BLR (Wang et al., 2019), a supervised bi-linear regression with emphasis on signal sub-network selection. To verify the role played by the latent space model, we further compare our approach with a simplified reGATE that omits the node-level latent space structure. Specifically, we simplify the generative model to $z_i \sim N(0, I_K)$ and $A_{i\ell} \mid z_i \sim \mathrm{Poisson}(\lambda_{i\ell}(z_i))$, where $\lambda_{i\ell}(z_i)$ is learned directly via a fully connected neural network without the latent space structure in Eqs. (7)-(9). We denote this approach S-reGATE.

MSE from five-fold cross-validation is used to assess performance. Table 4 shows results for the ABCD and HCP datasets. reGATE significantly outperforms the other methods in both datasets for all traits. S-reGATE predicts well in the ABCD data but cannot compete with BLR in the HCP dataset. Hence, the proposed latent space model does indeed improve prediction; it also improves interpretation relative to S-reGATE, which fails to capture some aspects of variation in the brain networks.

Table 4.

Comparison of prediction results of different methods on the ABCD and HCP datasets. Each cell shows the MSE and the correlation (in parentheses) between the observed and predicted measures via five-fold CV.

| | reGATE | S-reGATE | LR-TNPCA | LR-PCA | CPR | BLR |
|---|---|---|---|---|---|---|
| ABCD (n = 5252) | | | | | | |
| Pic Voc | 186.8 (0.40) | 212.5 (0.27) | 280.1 (0.22) | 285.2 (0.20) | 285.1 (0.23) | 200.4 (0.29) |
| Oral Reading | 206.5 (0.41) | 224.8 (0.32) | 342.3 (0.19) | 346.5 (0.14) | 357.0 (0.13) | 337.1 (0.19) |
| Cryst Comp | 279.4 (0.39) | 304.5 (0.31) | 315.8 (0.23) | 322.9 (0.22) | 318.4 (0.19) | 321.3 (0.30) |
| CogTot Comp | 248.0 (0.38) | 255.6 (0.35) | 297.6 (0.17) | 308.6 (0.26) | 306.7 (0.22) | 290.2 (0.32) |
| HCP (n = 1065) | | | | | | |
| Pic Voc | 209.1 (0.28) | 214.4 (0.25) | 214.5 (0.25) | 219.0 (0.21) | 257.2 (0.17) | 216.1 (0.23) |
| Oral Reading | 198.9 (0.26) | 205.7 (0.22) | 202.1 (0.22) | 204.2 (0.21) | 252.1 (0.19) | 200.4 (0.26) |
| LO: correct number | 17.8 (0.27) | 20.4 (0.20) | 18.5 (0.23) | 18.2 (0.22) | 21.9 (0.19) | 18.8 (0.18) |
| LO: positions off | 195.8 (0.27) | 203.5 (0.25) | 201.8 (0.22) | 202.6 (0.24) | 259.0 (0.19) | 200.7 (0.27) |

With the larger sample size (n = 5252) in the ABCD data, the improvements from reGATE are more pronounced than for the HCP data. Deep neural networks are well known to perform exceptionally well with large training sample sizes.

We also calculate the percentage of MSE improvement over a baseline model, the sample mean $\bar{y}$, for the predictions on the ABCD and HCP data. As shown in Fig. 10(a.1)-(a.4), most methods in the ABCD study improve on the MSE of the sample mean $\bar{y}$, indicating a detectable relationship between the structural connectome and cognitive traits. However, reGATE achieves stable and significant prediction improvements of 30% to 40%, while the competitors' improvements fluctuate between −5% and 5%. From Fig. 10(b.1)-(b.4), we see that reGATE again performs best among the competitors.

Fig. 10. Percentage of MSE improvement over a baseline method that uses $\bar{y}$ as the predicted trait. (a.1)-(a.4): ABCD with picture vocabulary score, oral reading recognition test score, crystallized composite age-corrected standard score, and cognition total composite score. (b.1)-(b.4): HCP with picture vocabulary score, oral reading recognition test score, LO correct number, and LO positions off. The x-axis is the improvement proportion; the y-axis marks the different methods.

Besides MSE, we evaluate the correlation between the predicted value $\hat{y}_i$ and the observed value $y_i$ via five-fold cross-validation for each approach; correlations are reported in parentheses in Table 4. reGATE significantly improves prediction, increasing the correlation from around 0.2 to around 0.4.

4.3. Inference: generate brain connectomes conditioned on traits

An appealing characteristic of reGATE is the ability to generate brain networks for new individuals conditionally on their trait value. This facilitates inference on how brain networks, and their topological properties, change across levels of a trait, and on how this dependence varies across individuals. In our first experiment, we assess the performance of GATE in characterizing the observed brain network data via posterior predictive checks (Gelman et al., 2013) for relevant network topological properties: network density, mean eigencentrality, average path length, and average degree. Denote by $\eta_k = g_k\{\mathcal{L}(\mathcal{A})\}$ the random variable associated with the $k$th network summary measure, $k = 1, 2, 3, 4$, where $\mathcal{A}$ represents the brain connectome. We then calculate the posterior predictive distributions of these summary measures under the generative model $p_\theta(\mathcal{A} \mid z)$ of Section 2.2.3 as $p_{\eta_k}(\eta \mid z) = \sum_{a \in \mathbb{A}_V : g_k(a) = \eta} p_\theta(\mathcal{A} = a \mid z)$, for $k = 1, 2, 3, 4$, where $\mathbb{A}_V$ denotes the space of $V$-node networks. Figure 11 compares the network summary measures computed using all 1065 subjects from the HCP data (white) and the network data generated from GATE (gray). GATE achieves good performance in characterizing the observed network summary measures.
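In practice these summary measures can be computed with NetworkX; a sketch, where `generate_networks` is a hypothetical wrapper around the trained GATE decoder.

```python
# A sketch of the posterior predictive check on network summaries with NetworkX.
import networkx as nx
import numpy as np

def summary_measures(A_bin):
    """Density, mean eigencentrality, average path length, and average degree
    for a binary adjacency matrix A_bin."""
    G = nx.from_numpy_array(A_bin)
    density = nx.density(G)
    degree = np.mean([d for _, d in G.degree()])
    eig = np.mean(list(nx.eigenvector_centrality_numpy(G).values()))
    # average path length is defined on the largest connected component
    comp = G.subgraph(max(nx.connected_components(G), key=len))
    apl = nx.average_shortest_path_length(comp)
    return density, eig, apl, degree

# summaries = [summary_measures((A > 0).astype(int)) for A in generate_networks(500)]
# These posterior predictive distributions are then compared with the observed
# distributions, as in Fig. 11.
```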

Fig. 11. HCP: goodness-of-fit assessments for selected network summary measures. The violin plots summarize the distribution in the observed data (white) and the posterior predictive distribution arising from GATE (gray).

We next consider the conditional predictive distribution of network topological summaries given trait scores, i.e., $\eta_k \mid y$. We consider network density and average path length for both the ABCD and HCP data, focusing on different levels of the picture vocabulary test score as $y$. The distribution of $\eta_k \mid y$ can be expressed as $p_{\eta_k}(\eta \mid y) = \sum_{a \in \mathbb{A}_V : g_k(a) = \eta} p_\theta(\mathcal{A} = a \mid y)$, for $k = 1, 2$, where $p_\theta(\mathcal{A} = a \mid y)$ is obtained via the conditional generative model in Section 2.3. More specifically, conditional on each picture vocabulary test score level, we first generate 500 networks from the trained reGATE model, then calculate the network summaries for each generated network, and finally compute the mean and confidence bands from the 500 summaries. The first row in Fig. 12 shows the resulting confidence bands for network density and average path length versus $y$: network density increases and average path length decreases as the picture vocabulary score increases. There is more variability in the network topological summaries for the adolescent subjects in ABCD than for the adults in HCP.

Fig. 12. Comparison between generated and observed brain networks across trait levels in network density and average path length for the ABCD and HCP datasets. First row (ABCD and HCP): posterior predictive network summaries with confidence bands for network density and average path length; the x-axis is the picture vocabulary test score, and the y-axis corresponds to network density and average path length, respectively. The confidence bands are calculated from 500 generated network summaries conditional on different picture vocabulary test scores. Second row: scatter plots and fitted curves across trait levels for network density and average path length in the original ABCD and HCP datasets.

It is challenging to conduct a similar inference on the real ABCD and HCP data because few brain networks are observed at any specific picture vocabulary test score. Instead, we show scatter plots of network density and average path length versus $y$ for the real datasets in the second row of Fig. 12, with blue curves showing the fitted mean trajectories. The overall trends are similar: network density increases with $y$, while average path length decreases.

We next explore how the brain network varies across levels of the trait $y$. We consider three levels ranging from the minimum to the maximum of the oral reading recognition score in HCP and generate multiple networks for each trait value. In particular, we consider oral reading recognition scores $y$ = 60, 91, and 138, and generate brain networks according to the conditional generative procedure in Section 2.3. Setting $y$ = 60, for example, the posterior $p_\theta(A \mid y = 60)$ represents the distribution of brain networks for people with an oral reading recognition score of 60. A mean network is used to summarize the generated network data for a given $y$, and for better visualization we dichotomize the connectome to {0, 1} according to whether a connection exists. The results are shown in Fig. 13: Fig. 13(a.1) shows the histogram of observed trait scores, and Fig. 13(a.2)-(a.4) shows the generated mean networks (the first 34 nodes from the left hemisphere and the next 34 from the right hemisphere) for the different $y$ values. We further compare the generated mean networks with the observed networks in HCP. In the observed HCP data, there is one subject with $y$ = 60, ten subjects with $y$ = 91, and two subjects with $y$ = 138. Two issues arise in empirically estimating the posterior mean: (1) different $y$ values have different numbers of observations, and (2) very little data are available to estimate a high-dimensional object. These issues make direct estimation from the observed brain connectomes less credible. Fig. 13(b.1)-(b.3) shows the mean connectomes estimated from the observed HCP data. The first row indicates that more connections between the brain's two hemispheres are associated with better reading ability, consistent with findings in the literature (Bullmore and Sporns, 2009; Durante and Dunson, 2018). The real data show the same trend, although less clearly than the reGATE-generated networks in the first row.

Fig. 13. Examples of generating conditional brain networks $A \mid y$ for the oral reading recognition test in the HCP dataset, and comparison between generated and observed brain networks across $y$. (a.1) Histogram of observed trait scores for the 1065 individuals in HCP (x-axis: trait value; y-axis: frequency). (a.2)-(a.4) Means of generated brain networks (after dichotomization) conditional on different $y$ under the reGATE model (x- and y-axes index brain regions; the Supplement contains a table describing each ROI). (b.1)-(b.3) Observed brain networks for different $y$ in HCP (x- and y-axes index brain regions).

To sharpen the above analysis, we reduce the number of levels and select the 10% and 90% quantiles of the observed $y$ as two representative levels. For each level, we generate 100 conditional networks $A_i \mid y_i$ and calculate the mean difference between the networks in the high and low trait groups. Figure 14 plots the top 50 connections (by absolute value) in the mean difference network for the picture vocabulary test in both HCP and ABCD; this procedure is done separately for the two datasets. The 50 connections are further split into positive and negative connections, plotted in separate panels in Fig. 14. After ranking the connections by absolute value, they are dominated by positive ones, indicating that better vocabulary ability is associated with more connections.
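A sketch of this mean-difference computation, assuming NumPy; `A_high` and `A_low` are hypothetical stacks of networks generated at the 90% and 10% trait quantiles.

```python
# A sketch of the mean-difference analysis behind Fig. 14, assuming NumPy.
import numpy as np

def top_k_connections(A_high, A_low, k=50):
    """A_high, A_low: (n, V, V) stacks of generated adjacency matrices.
    Returns the k node pairs with the largest |mean difference| and the
    corresponding signed differences."""
    diff = A_high.mean(axis=0) - A_low.mean(axis=0)
    u, v = np.triu_indices(diff.shape[0], k=1)
    order = np.argsort(-np.abs(diff[u, v]))[:k]
    return list(zip(u[order], v[order])), diff[u[order], v[order]]

# pairs, deltas = top_k_connections(A_high, A_low); positive deltas correspond
# to connections that strengthen with higher picture vocabulary scores.
```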

Fig. 14. (a)-(b): Top 50 pairs of brain regions in terms of the mean change in generated brain connectivity between two levels of picture vocabulary reading scores (the 90% and 10% quantiles). The left panel shows the positive connections among the 50 pairs, and the right panel shows the negative connections. More details about each node can be found in the Excel spreadsheet in Supplementary Material II.

The first row in Fig. 14 shows results from the HCP data, and the second row shows results from the ABCD data. Considering the different populations (young adults vs. adolescents) in our data analysis, it is interesting to observe many similar results. For example, we observe denser connections both within the left and right frontal lobes and between them. In particular, brain regions such as l26, r26 (rostral middle frontal), l27, r27 (superior frontal), and l3, r3 (caudal middle frontal) are densely involved (nodes with high degree) in the top 50 connections. These brain regions are thought to contribute to higher cognition, particularly working memory (Boisgueheneuc et al., 2006), and are important for language-related activities (Binder et al., 1997; Friederici, 2002). We also observe that the four nodes in the occipital lobe (regions 4, 10, 12, and 20) all appear in the top 50 connections. This visual preprocessing center has a few connections (fiber bundles) to the parietal lobe (e.g., regions 7, 24, 28) and then to the frontal lobe. Hence, our results show that richer connections between the visual, sensory, and working memory systems are strongly associated with higher picture vocabulary in adolescents and adults. Some negative connections within each hemisphere are also observed, although they are fewer and weaker than the positive ones; these may arise from errors introduced during the connection recovery stage or simply from statistical noise.

We further compare the mean difference of the generated data with that of the original brain connectomes in the high and low trait groups. Because observations at any particular trait level are limited and unbalanced, we instead form one group of observed networks with picture vocabulary test scores below the 10% quantile and one group with scores above the 90% quantile, and calculate the mean difference between the two groups (see the sketch below). Figure 15 plots the top 50 connections (ranked by absolute value) in the resulting mean difference network for the HCP and ABCD datasets. The 50 connections are again separated into positive and negative connections, plotted in separate panels of Fig. 15. Compared with Fig. 14, we see a similar trend: the top-ranked connections are dominated by positive ones, indicating that better vocabulary ability is associated with more connections. Beyond the dominant connections shared by both analyses, the plots computed from the original weighted matrices contain more connections, which may reflect individual-specific random effects.
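The empirical counterpart of the contrast can be sketched as below, with `A_obs` an (n_subjects, 68, 68) array of observed connectomes and `y_obs` the matching trait scores; both names are placeholders. The `top_k_edge_changes` helper from the earlier sketch can then rank the resulting edges.

```python
import numpy as np

def empirical_mean_difference(A_obs, y_obs, lo_q=0.10, hi_q=0.90):
    """Mean difference between high- and low-trait observed groups."""
    lo, hi = np.quantile(y_obs, [lo_q, hi_q])
    low_group = A_obs[y_obs <= lo]                 # scores below the 10% quantile
    high_group = A_obs[y_obs >= hi]                # scores above the 90% quantile
    return high_group.mean(axis=0) - low_group.mean(axis=0)
```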

Fig. 15.

(a)-(b): Top 50 pairs of brain regions in terms of the mean changes in the observed brain connectivity (HCP and ABCD datasets) between two levels of picture vocabulary reading scores (above the 90% quantile and below the 10% quantile). The left panel shows the positive connections among the 50 pairs, and the right panel shows the negative connections. More details about each node can be found in the Excel spreadsheet in Supplementary Material II.

4.4. Computing details

We summarize the computing details for our analyses of the ABCD and HCP data. We ran Adam for GATE and reGATE with a learning rate of 0.001 on one NVIDIA Titan V GPU. For GATE, we used a batch size of 128, sampled uniformly at random at each epoch, and ran 200 epochs. For reGATE, we used 5-fold CV to calculate the MSE and ran 100 epochs with a batch size of 128 for each training fold. The latent dimension K is chosen as 68 under the same criteria as in Section 3. Fitting the ABCD dataset took 14.34 min for GATE and 15.2 min for reGATE; fitting the HCP dataset took 4.74 min for GATE and 5.18 min for reGATE. A minimal training sketch follows.
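The sketch below illustrates a training loop matching the reported settings (Adam, learning rate 0.001, batch size 128, 200 epochs). The model and loss are placeholders, not the authors' released interface; random tensors stand in for the connectomes purely so the sketch runs end to end.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

n_subjects, n_nodes = 1065, 68
networks = torch.rand(n_subjects, n_nodes, n_nodes)   # stand-in connectomes
loader = DataLoader(TensorDataset(networks), batch_size=128, shuffle=True)

model = torch.nn.Sequential(torch.nn.Linear(n_nodes, n_nodes))  # placeholder for GATE
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

def negative_elbo(batch):
    # Placeholder loss; in GATE this would be the negative evidence
    # lower bound of the variational graph auto-encoder.
    (A,) = batch
    return ((model(A) - A) ** 2).mean()

for epoch in range(200):          # 200 epochs for GATE (100 for reGATE)
    for batch in loader:
        optimizer.zero_grad()
        loss = negative_elbo(batch)
        loss.backward()
        optimizer.step()
```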

Table 5 shows the detailed network architectures for the inference model and generative model training.

Table 5.

Experimental details and network architectures. K is the dimension of zi, N is the number of layers in the inference network, M is the number of layers in the GCN, and R is the dimension of X(i).

                       Inference model (μϕ, σϕ)             Generative model
                       μϕ (N = 2)       σϕ (N = 2)          setting      activation
GATE/reGATE (K = 68)   W1 : 68 * 256    W1 : 68 * 256       k-NN: 32     h1 : Sigmoid
                       W2 : 256 * 68    W2 : 256 * 68       M = 2        h2 : Sigmoid
                       b1 : 256 * 1     b1 : 256 * 1        R = 5
                       b2 : 68 * 1      b2 : 68 * 1
                       φ1 = ReLU        φ1 = ReLU
                       φ2 = Linear      φ2 = Linear
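A direct reading of Table 5 as PyTorch modules is sketched below: the inference networks μϕ and σϕ are two-layer MLPs (68 → 256 → 68 with ReLU then linear), and the generative side uses an M = 2 layer graph convolution with sigmoid activations on a k-NN (k = 32) graph. This is a sketch under the table's stated settings, not the authors' implementation; the propagation-matrix construction in particular is an assumption.

```python
import torch
import torch.nn as nn

def inference_mlp(k=68, hidden=256):
    # Used for both mu_phi and sigma_phi
    return nn.Sequential(
        nn.Linear(k, hidden),   # W1: 68 * 256, b1: 256 * 1, phi1 = ReLU
        nn.ReLU(),
        nn.Linear(hidden, k),   # W2: 256 * 68, b2: 68 * 1, phi2 = Linear
    )

mu_phi, sigma_phi = inference_mlp(), inference_mlp()

class GCNLayer(nn.Module):
    # One graph-convolution layer H' = sigmoid(S H W), where S is a fixed
    # propagation matrix built from the k-NN graph (an assumption here).
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim, bias=False)

    def forward(self, S, H):
        return torch.sigmoid(S @ self.lin(H))

# Generative side: M = 2 layers mapping latent node features to the
# R = 5 dimensional coordinates X(i), with h1 and h2 both Sigmoid (Table 5).
gcn = nn.ModuleList([GCNLayer(68, 68), GCNLayer(68, 5)])
```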

5. Discussion

We develop a novel nonlinear latent factor model to characterize the population distribution of brain connectomes across individuals and its dependence on human traits. GATE outputs two layers of low-dimensional nonlinear latent representations: one at the individual level, which can be used as a summary score for visualization and for prediction of human traits of interest; and one at the node level, which characterizes the network structure of each individual through a latent space model. A supervised variant, reGATE, is proposed to analyze the relationship between human traits and brain connectomes. GATE and reGATE are built on a deep neural network framework and implemented via a stochastic variational Bayesian algorithm. The algorithm is computationally efficient and can be applied to massive collections of networks with a large number of nodes (brain ROIs).

In applications to the ABCD and HCP data, we used GATE and reGATE to study the relationship between brain structural connectomes and various cognition measures. Using the generative model of reGATE, we can simulate brain networks for a given y (e.g., a cognition measure) and compare these connectomes and related network topological summaries across cognition levels while allowing variability across individuals. For these cognition traits, we clearly observe that cross-hemisphere connections are essential. In both the ABCD and HCP datasets, we found that network density increases while average path length decreases as the cognition level increases (a sketch of these summaries appears below); these network measures show higher variation for the adolescents in ABCD than for the adults in HCP. reGATE had superior performance in predicting trait scores from brain networks, with the gain particularly notable in the larger ABCD study.
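The two topological summaries mentioned above can be computed on a dichotomized connectome with NetworkX, as in the sketch below; the thresholding rule is an illustrative assumption.

```python
import numpy as np
import networkx as nx

def network_summaries(A, threshold=0.0):
    """Density and average path length of a dichotomized connectome."""
    B = (A > threshold).astype(int)
    np.fill_diagonal(B, 0)                         # drop self-loops
    G = nx.from_numpy_array(B)
    density = nx.density(G)
    # Average shortest path length is defined only for connected graphs,
    # so restrict to the largest connected component if necessary.
    if not nx.is_connected(G):
        G = G.subgraph(max(nx.connected_components(G), key=len))
    return density, nx.average_shortest_path_length(G)

# Example on a random symmetric matrix standing in for a connectome:
A = np.random.rand(68, 68); A = (A + A.T) / 2
print(network_summaries(A, threshold=0.5))
```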

The generative aspect of GATE/reGATE has a wide range of applications, including data augmentation, outlier network detection, and potentially sensitive data release. As an example of data augmentation, most existing neuroimaging studies contain only a few subjects, and models like GATE that require a large sample size do not work well for such data. With the help of large datasets, such as ABCD and UK Biobank, we can pre-train a model on the large dataset and then refine it on the smaller one. With our generative model, we can then generate additional data that mimic the distribution of the smaller dataset and release synthetic data drawn from this distribution in place of confidential records.

In the future, we would like to extend GATE/reGATE in the following directions. First, a more refined brain parcellation yields a larger number of nodes, providing a more detailed description of the brain. GATE and reGATE provide a new set of tools for handling such high-resolution brain networks, and it is of interest to extend the methodology to handle multiresolution data. Moreover, current large studies collect both functional MRI and diffusion MRI data. It is straightforward to extend GATE/reGATE to jointly embed both functional and structural connectomes, even allowing the strength and nature of the link between them to vary across individuals and with traits.


Acknowledgements

Data used in the preparation of this article were obtained from the Human Connectome Project (https://www.humanconnectome.org/) and the Adolescent Brain Cognitive Development (ABCD) Study (https://abcdstudy.org). The HCP WU-Minn Consortium (Principal Investigators: David Van Essen and Kamil Ugurbil; 1U54MH091657) was funded by the 16 NIH Institutes and Centers that support the NIH Blueprint for Neuroscience Research, and by the McDonnell Center for Systems Neuroscience at Washington University. The ABCD study is supported by the National Institutes of Health and additional federal partners under award numbers U01DA041022, U01DA041028, U01DA041048, U01DA041089, U01DA041106, U01DA041117, U01DA041120, U01DA041134, U01DA041148, U01DA041156, U01DA041174, U24DA041123, U24DA041147, U01DA041093, and U01DA041025. A full list of supporters is available at https://abcdstudy.org/federal-partners.html. A listing of participating sites and a complete listing of the study investigators can be found at https://abcdstudy.org/scientists/workgroups/. ABCD consortium investigators designed and implemented the study and/or provided data but did not necessarily participate in the analysis or writing of this report. This paper reflects the views of the authors and may not reflect the opinions or views of the NIH or ABCD consortium investigators.

Dunson and Zhang acknowledge support from the National Institutes of Health (NIH) of the United States under award number MH118927. We also acknowledge the NVIDIA Corporation for its donation of the Titan V GPU.

Credit authorship contribution statement

Meimei Liu: Conceptualization, Methodology, Software, Validation, Formal analysis, Writing – original draft, Writing – review & editing. Zhengwu Zhang: Data curation, Formal analysis, Writing – review & editing, Visualization, Funding acquisition. David B. Dunson: Conceptualization, Methodology, Writing – review & editing, Supervision, Project administration, Funding acquisition.

Supplementary material

Supplementary material associated with this article can be found, in the online version, at doi: 10.1016/j.neuroimage.2021.118750.

References

1. Barabási A-L, Albert R, 1999. Emergence of scaling in random networks. Science 286 (5439), 509–512.
2. Beckmann CF, DeLuca M, Devlin JT, Smith SM, 2005. Investigations into resting-state connectivity using independent component analysis. Philos. Trans. R. Soc. B 360 (1457), 1001–1013.
3. Binder JR, Frost JA, Hammeke TA, Cox RW, Rao SM, Prieto T, 1997. Human brain language areas identified by functional magnetic resonance imaging. J. Neurosci 17 (1), 353–362.
4. Boisgueheneuc F.d., Levy R, Volle E, Seassau M, Duffau H, Kinkingnehun S, Samson Y, Zhang S, Dubois B, 2006. Functions of the left superior frontal gyrus in humans: a lesion study. Brain 129 (12), 3315–3328.
5. Bullmore E, Sporns O, 2009. Complex brain networks: graph theoretical analysis of structural and functional systems. Nat. Rev. Neurosci 10 (3), 186.
6. Casey B, Cannonier T, Conley MI, Cohen AO, Barch DM, Heitzeg MM, Soules ME, Teslovich T, Dellarco DV, Garavan H, Orr C, 2018. The adolescent brain cognitive development (ABCD) study: imaging acquisition across 21 sites. Dev. Cogn. Neurosci.
7. Craddock RC, Jbabdi S, Yan C-G, Vogelstein JT, Castellanos FX, Di Martino A, Kelly C, Heberlein K, Colcombe S, Milham MP, 2013. Imaging human connectomes at the macroscale. Nat. Methods 10 (6), 524–539.
8. Desikan RS, Ségonne F, Fischl B, Quinn BT, Dickerson BC, Blacker D, Buckner RL, Dale AM, Maguire RP, Hyman BT, Albert MS, Killiany RJ, 2006. An automated labeling system for subdividing the human cerebral cortex on MRI scans into gyral based regions of interest. Neuroimage 31 (3), 968–980.
9. Durante D, Dunson DB, 2018. Bayesian inference and testing of group differences in brain networks. Bayesian Anal. 13 (1), 29–58.
10. Durante D, Dunson DB, Vogelstein JT, 2017. Nonparametric Bayes modeling of populations of networks. J. Am. Stat. Assoc 112 (520), 1516–1530.
11. Fornito A, Zalesky A, Breakspear M, 2013. Graph analysis of the human connectome: promise, progress, and pitfalls. Neuroimage 80, 426–444.
12. Friederici AD, 2002. Towards a neural basis of auditory sentence processing. Trends Cogn. Sci 6 (2), 78–84.
13. Gelman A, Carlin JB, Stern HS, Dunson DB, Vehtari A, Rubin DB, 2013. Bayesian Data Analysis. Chapman and Hall/CRC.
14. Gershon RC, Wagster MV, Hendrie HC, Fox NA, Cook KF, Nowinski CJ, 2013. NIH toolbox for assessment of neurological and behavioral function. Neurology 80 (11), S2–S6.
15. Girard G, Whittingstall K, Deriche R, Descoteaux M, 2014. Towards quantitative connectivity analysis: reducing tractography biases. Neuroimage 98, 266–278.
16. Glasser MF, Smith SM, Marcus DS, Andersson JL, Auerbach EJ, Behrens TE, Coalson TS, Harms MP, Jenkinson M, Moeller S, Robinson EC, Sotiropoulos SN, Xu J, Yacoub E, Ugurbil K, Van Essen DC, 2016. The human connectome project's neuroimaging approach. Nat. Neurosci 19 (9), 1175–1187.
17. Goodfellow I, Bengio Y, Courville A, 2016. Deep Learning. MIT Press.
18. Hamilton W, Ying Z, Leskovec J, 2017. Inductive representation learning on large graphs. Adv. Neural Inf. Process. Syst.
19. Heaton RK, Akshoomoff N, Tulsky D, Mungas D, Weintraub S, Dikmen S, Beaumont J, Casaletto KB, Conway K, Slotkin J, Gershon R, 2014. Reliability and validity of composite scores from the NIH toolbox cognition battery in adults. J. Int. Neuropsychol. Soc 20 (6), 588.
20. Hinton G, Deng L, Yu D, Dahl G, Mohamed A-r, Jaitly N, Senior A, Vanhoucke V, Nguyen P, Kingsbury B, 2012. Deep neural networks for acoustic modeling in speech recognition. IEEE Signal Process. Mag 29.
21. Hoff PD, Raftery AE, Handcock MS, 2002. Latent space approaches to social network analysis. J. Am. Stat. Assoc 97 (460), 1090–1098.
22. Hoffman MD, Blei DM, Wang C, Paisley J, 2013. Stochastic variational inference. J. Mach. Learn. Res 14 (1), 1303–1347.
23. Johnson DB, 1977. Efficient algorithms for shortest paths in sparse networks. J. ACM 24 (1), 1–13.
24. Jones DK, Knosche TR, Turner R, 2013. White matter integrity, fiber count, and other fallacies: the do's and don'ts of diffusion MRI. Neuroimage 73, 239–254.
25. Jordan MI, Ghahramani Z, Jaakkola TS, Saul LK, 1999. An introduction to variational methods for graphical models. Mach. Learn 37 (2), 183–233.
26. Kawahara J, Brown CJ, Miller SP, Booth BG, Chau V, Grunau RE, Zwicker JG, Hamarneh G, 2017. BrainNetCNN: convolutional neural networks for brain networks; towards predicting neurodevelopment. Neuroimage 146, 1038–1049.
27. Kingma DP, Ba J, 2015. Adam: a method for stochastic optimization. In: International Conference on Learning Representations (ICLR).
28. Kingma DP, Welling M, 2014. Auto-encoding variational Bayes. In: International Conference on Learning Representations (ICLR).
29. Kipf TN, Welling M, 2017. Semi-supervised classification with graph convolutional networks. In: International Conference on Learning Representations.
30. Krizhevsky A, Sutskever I, Hinton GE, 2012. ImageNet classification with deep convolutional neural networks. Adv. Neural Inf. Process. Syst.
31. Ktena SI, Parisot S, Ferrante E, Rajchl M, Lee M, Glocker B, Rueckert D, 2018. Metric learning with spectral graph convolutions on brain connectivity networks. Neuroimage 169, 431–442.
32. Van der Maaten L, Hinton G, 2008. Visualizing data using t-SNE. J. Mach. Learn. Res 9 (11).
33. Maier-Hein KH, Neher PF, Houde J-C, Côté M-A, Garyfallidis E, Zhong J, Chamberland M, Yeh F-C, Lin Y-C, Ji Q, Reddick W, 2017. The challenge of mapping the human connectome based on diffusion tractography. Nat. Commun 8 (1), 1349.
34. Miller KL, Alfaro-Almagro F, Bangerter NK, Thomas DL, Yacoub E, Xu J, Bartsch AJ, Jbabdi S, Sotiropoulos SN, Andersson JL, Griffanti L, 2016. Multi-modal population brain imaging in the UK Biobank prospective epidemiological study. Nat. Neurosci 19 (11), 1523.
35. Nowicki K, Snijders TAB, 2001. Estimation and prediction for stochastic blockstructures. J. Am. Stat. Assoc 96 (455), 1077–1087.
36. Park H-J, Friston K, 2013. Structural and functional brain networks: from connections to cognition. Science 342 (6158), 1238411.
37. Rezende DJ, Mohamed S, Wierstra D, 2014. Stochastic backpropagation and approximate inference in deep generative models. In: International Conference on Machine Learning, pp. 1278–1286.
38. Sermanet P, Chintala S, LeCun Y, 2012. Convolutional neural networks applied to house numbers digit classification. In: International Conference on Pattern Recognition (ICPR 2012).
39. Smith RE, Tournier JD, Calamante F, Connelly A, 2012. Anatomically-constrained tractography: improved diffusion MRI streamlines tractography through effective use of anatomical information. Neuroimage 62 (3), 1924–1938.
40. Tuch DS, 2004. Q-ball imaging. Magn. Reson. Med 52 (6), 1358–1372.
41. Van Essen DC, Smith SM, Barch DM, Behrens TE, Yacoub E, Ugurbil K, WU-Minn HCP Consortium, 2013. The WU-Minn human connectome project: an overview. Neuroimage 80, 62–79.
42. Van Essen DC, Ugurbil K, Auerbach E, Barch D, Behrens T, Bucholz R, Chang A, Chen L, Corbetta M, Curtiss SW, Della Penna S, 2012. The human connectome project: a data acquisition perspective. Neuroimage 62 (4), 2222–2231.
43. Wang L, Zhang Z, Dunson DB, 2019. Symmetric bilinear regression for signal subgraph estimation. IEEE Trans. Signal Process 67 (7), 1929–1940.
44. Watts DJ, Strogatz SH, 1998. Collective dynamics of 'small-world' networks. Nature 393 (6684), 440.
45. Xie J, Douglas PK, Wu YN, Brody AL, Anderson AE, 2017. Decoding the encoding of functional brain networks: an fMRI classification comparison of non-negative matrix factorization (NMF), independent component analysis (ICA), and sparse coding algorithms. J. Neurosci. Methods 282, 81–94.
46. Yoo Y, Yun S, Jin Chang H, Demiris Y, Young Choi J, 2017. Variational autoencoded regression: high dimensional regression of visual data on complex manifold. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3674–3683.
47. Zhang J, Sun WW, Li L, 2018a. Network response regression for modeling population of networks with covariates. arXiv preprint arXiv:1810.03192.
48. Zhang LWZ, Dunson D, 2019. Common and individual structure of multiple networks. Ann. Appl. Stat 13 (1), 85–112.
49. Zhang Z, Allen GI, Zhu H, Dunson D, 2019. Tensor network factorizations: relationships between brain structural connectomes and traits. Neuroimage 197, 330–343.
50. Zhang Z, Descoteaux M, Zhang J, Girard G, Chamberland M, Dunson D, Srivastava A, Zhu H, 2018b. Mapping population-based structural connectomes. Neuroimage 172, 130–145.
51. Zhao Q, Honnorat N, Adeli E, Pfefferbaum A, Sullivan EV, Pohl KM, 2019. Variational autoencoder with truncated mixture of Gaussians for functional connectivity analysis. In: International Conference on Information Processing in Medical Imaging. Springer, pp. 867–879.
52. Zhou H, Li L, Zhu H, 2013. Tensor regression with applications in neuroimaging data analysis. J. Am. Stat. Assoc 108 (502), 540–552.
