Author manuscript; available in PMC: 2022 Sep 1.
Published in final edited form as: Biometrics. 2020 Aug 20;77(3):879–889. doi: 10.1111/biom.13351

Multimodal neuroimaging data integration and pathway analysis

Yi Zhao 1, Lexin Li 2, Brian S Caffo 3
PMCID: PMC7881049  NIHMSID: NIHMS1636820  PMID: 32789850

Abstract

With advancements in technology, the collection of multiple types of measurements on a common set of subjects is becoming routine in science. Some notable examples include multimodal neuroimaging studies for the simultaneous investigation of brain structure and function and multi-omics studies for combining genetic and genomic information. Integrative analysis of multimodal data allows scientists to interrogate new mechanistic questions. However, the data collection and generation of integrative hypotheses is outpacing available methodology for joint analysis of multimodal measurements. In this article, we study high-dimensional multimodal data integration in the context of mediation analysis. We aim to understand the roles that different data modalities play as possible mediators in the pathway between an exposure variable and an outcome. We propose a mediation model framework with two data types serving as separate sets of mediators and develop a penalized optimization approach for parameter estimation. We study both the theoretical properties of the estimator through an asymptotic analysis and its finite-sample performance through simulations. We illustrate our method with a multimodal brain pathway analysis having both structural and functional connectivity as mediators in the association between sex and language processing.

Keywords: brain connectivity analysis, linear structural equation model, mediation analysis, multimodal data integration, regularization

1. INTRODUCTION

Modern magnetic resonance imaging (MRI) studies are usually multimodal, in the sense that multiple types of MRI measurements, for example, anatomical MRI, functional MRI (fMRI), and diffusion tensor imaging (DTI), are collected on the same subjects in a single scanner. Integration of these diverse, but scientifically complementary, imaging measurements would strengthen our understanding of brain structure and function, as well as their associations with neurological disorders (see Uludağ and Roebroeck, 2014, for a review).

In this article, we study multimodal neuroimaging data integration in the context of mediation analysis. Mediation analysis seeks to identify and explain the mechanism, or pathway, that underlies an observed relationship between exposure and outcome variables, through the inclusion of an intermediary variable, known as a mediator. Such an analysis often represents a starting point for mechanistic studies. It was originally developed in psychometrics (Baron and Kenny, 1986), and has been extensively studied in statistics (see, e.g., Pearl, 2003; Wang et al., 2013; Huang and Pan, 2016, among many others). Recently, mediation analysis has received attention in neuroimaging analysis to understand the roles of brain structure and function as possible mediators between the exposure and some cognitive or behavioral outcome (Caffo et al., 2007; Atlas et al., 2010; Lindquist, 2012; Zhao and Luo, 2016; Chén et al., 2017; Zhao et al., 2020). However, all existing work has focused on a single imaging modality as mediators. By contrast, we aim to study the problem of mediation analysis based on multiple high-dimensional imaging modalities.

Our motivation is an investigation of how brain structure and function mediate the relationship between sex and language processing behavior. Numerous studies have observed sex differences in language processing, as well as different structure and function of brain regions related to language processing (Shaywitz et al., 1995; Pinker, 2007). We study a dataset from the Human Connectome Project, where each participant took a picture vocabulary test, a measure of language processing behavior, and both a DTI and a resting-state fMRI imaging scan. DTI is an MRI technique that measures the diffusion of water molecules along white matter fiber bundles, which in turn provides an assessment of brain structural connectivity. fMRI is an MRI technique that measures blood oxygen level as a surrogate of brain neural activity, which in turn assesses brain functional connectivity. Our goal is to develop a new statistical model to integrate brain structural and functional connectivity to identify brain pathways that associate sex with language processing. It should be emphasized that, although motivated by a multimodal neuroimaging application, our method is equally applicable to a variety of multimodal data, such as multi-omics data (Richardson et al., 2016).

The rest of the article is organized as follows. Section 2 presents our multimodal mediation model. Section 3 develops the penalized estimation algorithm, and Section 4 studies the asymptotic properties. Section 5 presents the simulations, and Section 6 revisits our motivating multimodal brain imaging example. Section 7 concludes the paper with a discussion. The supporting information collects the technical proofs and additional results.

2. MODEL

We begin with a description of our multimodal high-dimensional mediation model. Let $X$ denote the exposure variable, and $Y$ the outcome. In our example, $X$ is the participant’s sex, and $Y$ is the picture vocabulary test score. Let $M_1 = (M_{11}, \ldots, M_{1p_1})$ denote the set of potential mediators from the first modality, and $M_2 = (M_{21}, \ldots, M_{2p_2})$ the set of potential mediators from the second modality. In our case, the structural connectivity measures from DTI are $M_1$, and the functional connectivity measures from fMRI are $M_2$, with $p_1 = 531$ and $p_2 = 917$, respectively. The order of the two sets of mediators is determined by the knowledge that structural connectivity shapes and constrains functional connectivity in adults. More specifically, Hebb’s law holds that frequent communication between brain regions is more likely a consequence of direct structural connections (Hebb, 2005). In addition, brain structural connectivity regulates the dynamics of cortical circuits and systems captured by functional connectivity (Sporns, 2007). For $n$ independent and identically distributed observations, we consider the model,

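$$
\begin{aligned}
M_{1j} &= X\beta_j + \varepsilon_j, \qquad j = 1, \ldots, p_1, \\
M_{2k} &= X\zeta_k + \sum_{j=1}^{p_1} M_{1j}\lambda_{jk} + \vartheta_k, \qquad k = 1, \ldots, p_2, \\
Y &= X\delta + \sum_{j=1}^{p_1} M_{1j}\theta_j + \sum_{k=1}^{p_2} M_{2k}\pi_k + \xi,
\end{aligned} \tag{1}
$$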

where $X \in \mathbb{R}^n$ is the exposure vector stacking all $n$ samples, $Y \in \mathbb{R}^n$ is the response vector, and $M_{1j} \in \mathbb{R}^n$, $M_{2k} \in \mathbb{R}^n$ are two mediator vectors. Moreover, $\beta_j$, $\zeta_k$, $\lambda_{jk}$, $\delta$, $\theta_j$, and $\pi_k$ are scalar coefficients, and $\varepsilon_j \in \mathbb{R}^n$, $\vartheta_k \in \mathbb{R}^n$, $\xi \in \mathbb{R}^n$ are normal random errors with zero means, $j = 1, \ldots, p_1$, $k = 1, \ldots, p_2$. For simplicity, we assume the data are centered, and thus drop the intercept terms. We further assume the error term, $\varepsilon_j$, is independent of $X$, $M_{2k}$, $\vartheta_k$ is independent of $X$, $M_{1j}$, and $\xi$ is independent of $X$, $M_{1j}$, $M_{2k}$. Model (1) is essentially a linear structural equation model (Pearl, 2003); a graphic illustration is given in Figure 1.
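To make the structure of Model (1) concrete, the following is a minimal NumPy sketch that draws data from it. The function name `simulate_model1`, the binary exposure, the independent unit-variance errors, and all coefficient values are illustrative assumptions, not the data-generating scheme used in the paper (which allows correlated errors within a modality and centers the data).

```python
import numpy as np

def simulate_model1(n, beta, zeta, Lam, delta, theta, pi, rng):
    """Draw n samples from Model (1), assuming independent N(0, 1) errors."""
    p1, p2 = beta.size, zeta.size
    X = rng.integers(0, 2, size=n).astype(float)   # hypothetical binary exposure
    eps = rng.standard_normal((n, p1))             # errors of the first modality
    vartheta = rng.standard_normal((n, p2))        # errors of the second modality
    xi = rng.standard_normal(n)                    # outcome error
    M1 = np.outer(X, beta) + eps                   # M1j = X beta_j + eps_j
    M2 = np.outer(X, zeta) + M1 @ Lam + vartheta   # M2k = X zeta_k + sum_j M1j lambda_jk + vartheta_k
    Y = X * delta + M1 @ theta + M2 @ pi + xi      # outcome equation
    return X, M1, M2, Y

# Hypothetical sparse coefficients with p1 = 4 and p2 = 3.
rng = np.random.default_rng(0)
beta = np.array([1.0, 0.0, -0.5, 0.0])
zeta = np.array([0.5, 0.0, 0.0])
Lam = np.zeros((4, 3))
Lam[0, 1] = 0.8
theta = np.array([0.7, 0.0, 0.0, 0.0])
pi = np.array([0.0, 0.6, 0.0])
X, M1, M2, Y = simulate_model1(200, beta, zeta, Lam, 0.2, theta, pi, rng)
```

Here `M1` and `M2` hold one mediator per column, matching the stacked matrix notation introduced later in this section.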

FIGURE 1.


The diagram of the proposed two-modality mediation model: $X$ is the exposure variable, $Y$ is the outcome variable, $M_1 = (M_{11}, \ldots, M_{1p_1})$ collects the mediators from the first modality, and $M_2 = (M_{21}, \ldots, M_{2p_2})$ collects the mediators from the second modality.

The key feature of Model (1) is that it does not delineate the underlying relationships among the mediators within the same modality. Instead, it focuses on the roles of the mediators between the two sets of modalities on the treatment-outcome pathway. As is shown later, Model (1) does not assume the mediators within the same modality to be conditionally independent, but instead uses correlated errors to account for dependence structures among the mediators. On the other hand, we employ a least-squares type optimization to estimate the mediation effects, and do not estimate the underlying error structures. By contrast, an alternative to Model (1) is to fully account for all pathways among the mediators within each modality. More details of this model are given in Section A.1 of the Supporting Information. The main issue with this alternative, however, is that it requires knowledge of the mechanistic ordering of the mediators within each modality, which is rarely known in practice.

Our model is motivated by Zhao and Luo (2016), but is also distinct in several ways. First, while Zhao and Luo (2016) tackled single-modality pathway analysis, we consider the multimodality scenario, for which there is no existing solution. Given the strong demand for this type of multimodal mediation analysis in brain imaging and many other applications, we offer a timely solution to this important family of scientific problems. Second, even though a straightforward extension conceptually, our multimodal pathway analysis is technically much more involved than the single-modality analysis of Zhao and Luo (2016). In addition, we numerically compare the two methods, and show the superior performance of our method, both through simulations and an application to a multimodal neuroimaging dataset.

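Stacking the terms in (1) together, we have $\beta = (\beta_1, \ldots, \beta_{p_1}) \in \mathbb{R}^{1 \times p_1}$, $\zeta = (\zeta_1, \ldots, \zeta_{p_2}) \in \mathbb{R}^{1 \times p_2}$, $\Lambda = (\lambda_{jk}) \in \mathbb{R}^{p_1 \times p_2}$, $\theta = (\theta_1, \ldots, \theta_{p_1})^\top \in \mathbb{R}^{p_1 \times 1}$, $\pi = (\pi_1, \ldots, \pi_{p_2})^\top \in \mathbb{R}^{p_2 \times 1}$, $M_1 = (M_{11}, \ldots, M_{1p_1}) \in \mathbb{R}^{n \times p_1}$, $M_2 = (M_{21}, \ldots, M_{2p_2}) \in \mathbb{R}^{n \times p_2}$, $\varepsilon = (\varepsilon_1, \ldots, \varepsilon_{p_1}) \in \mathbb{R}^{n \times p_1}$, and $\vartheta = (\vartheta_1, \ldots, \vartheta_{p_2}) \in \mathbb{R}^{n \times p_2}$. We can then rewrite Model (1) in a matrix form,

$$
\begin{pmatrix} M_1 & M_2 & Y \end{pmatrix}
= \begin{pmatrix} X & M_1 & M_2 \end{pmatrix}
\begin{pmatrix} \beta & \zeta & \delta \\ 0 & \Lambda & \theta \\ 0 & 0 & \pi \end{pmatrix}
+ \begin{pmatrix} \varepsilon & \vartheta & \xi \end{pmatrix}, \tag{2}
$$

where $\mathrm{vec}(\varepsilon) \sim N(0, \Sigma_1 \otimes I_n)$, $\mathrm{vec}(\vartheta) \sim N(0, \Sigma_2 \otimes I_n)$, and $\xi \sim N(0, \sigma^2 I_n)$, $\Sigma_1 \in \mathbb{R}^{p_1 \times p_1}$ and $\Sigma_2 \in \mathbb{R}^{p_2 \times p_2}$ are the covariance matrices, $I_n$ is the identity matrix, and $\otimes$ is the Kronecker product operator. The terms $\beta$ and $\zeta$ summarize the overall treatment effect on the mediators $M_1$ and $M_2$, respectively. The $(j, k)$th element in $\Lambda$ reveals the overall effect of $M_{1j}$ on $M_{2k}$ regardless of the underlying relationships among the mediators in $M_2$. The error terms in $\varepsilon$ and $\vartheta$ are dependent, due to the influences among the mediators. Therefore, even though Model (1) does not explicitly model the relationships among the mediators within the same modality, it encapsulates the dependence among the mediators through the correlations between the error terms. Next, we formally define various pathway effects between $X$ and $Y$ under Model (1). Additional assumptions, as discussed in Section A.2 of the Supporting Information, must be made to guarantee a causal interpretation of the pathway effects.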

Definition 1. Under Model (1), considering two exposure conditions X = x and X = x*, we define the following pathway effects of X on the outcome Y.

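  1. The indirect pathway effect of $X$ through path $X \to M_{1j} \to Y$ is $\mathrm{IE}_j^{1}(x, x^*) = \beta_j\theta_j(x - x^*)$, $j = 1, \ldots, p_1$. The total indirect pathway effect of $X$ through $M_1$, but not through $M_2$, is $\mathrm{IE}_1(x, x^*) = \sum_{j=1}^{p_1} \beta_j\theta_j(x - x^*)$.

  2. The indirect pathway effect of $X$ through path $X \to M_{2k} \to Y$ is $\mathrm{IE}_k^{2}(x, x^*) = \zeta_k\pi_k(x - x^*)$, $k = 1, \ldots, p_2$. The total indirect pathway effect of $X$ through $M_2$, but not through $M_1$, is $\mathrm{IE}_2(x, x^*) = \sum_{k=1}^{p_2} \zeta_k\pi_k(x - x^*)$.

  3. The indirect pathway effect of $X$ through path $X \to M_{1j} \to M_{2k} \to Y$ is $\mathrm{IE}_{jk}^{1,2}(x, x^*) = \beta_j\lambda_{jk}\pi_k(x - x^*)$, $j = 1, \ldots, p_1$, $k = 1, \ldots, p_2$. The total indirect effect of $X$ through both $M_1$ and $M_2$ is $\mathrm{IE}_{1,2}(x, x^*) = \sum_{j=1}^{p_1}\sum_{k=1}^{p_2} \beta_j\lambda_{jk}\pi_k(x - x^*)$.

  4. The direct effect of $X$ is $\mathrm{DE}(x, x^*) = \delta(x - x^*)$.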

By Definition 1, the total effect (TE) of X on Y is decomposed as the sum of the direct effect (DE) and the total indirect effect (IE):

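$$
\mathrm{TE}(x, x^*) = \mathrm{DE}(x, x^*) + \mathrm{IE}(x, x^*) = \mathrm{DE}(x, x^*) + \mathrm{IE}_1(x, x^*) + \mathrm{IE}_2(x, x^*) + \mathrm{IE}_{1,2}(x, x^*). \tag{3}
$$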
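The decomposition in (3) can be checked numerically. The sketch below computes the four components of Definition 1 from given coefficients; the function name `pathway_effects` and the coefficient values are hypothetical and chosen only for illustration.

```python
import numpy as np

def pathway_effects(beta, zeta, Lam, delta, theta, pi, x=1.0, x_star=0.0):
    """Direct, indirect, and total effects of Definition 1 for exposures x and x_star."""
    d = x - x_star
    ie1 = float(np.sum(beta * theta)) * d    # sum_j beta_j theta_j (x - x*)
    ie2 = float(np.sum(zeta * pi)) * d       # sum_k zeta_k pi_k (x - x*)
    ie12 = float(beta @ Lam @ pi) * d        # sum_j sum_k beta_j lambda_jk pi_k (x - x*)
    de = float(delta) * d                    # delta (x - x*)
    return {"DE": de, "IE1": ie1, "IE2": ie2, "IE12": ie12,
            "TE": de + ie1 + ie2 + ie12}

# Hypothetical coefficients with p1 = 4 and p2 = 3.
beta = np.array([1.0, 0.0, -0.5, 0.0])
zeta = np.array([0.5, 0.0, 0.0])
Lam = np.zeros((4, 3))
Lam[0, 1] = 0.8
theta = np.array([0.7, 0.0, 0.0, 0.0])
pi = np.array([0.0, 0.6, 0.0])
effects = pathway_effects(beta, zeta, Lam, 0.2, theta, pi)

# Cross-check: substituting the mediator equations into the outcome equation
# gives the reduced-form coefficient of X, which should equal the total effect.
reduced_form = 0.2 + beta @ theta + (zeta + beta @ Lam) @ pi
```

With these values the only active paths are $X \to M_{11} \to Y$ and $X \to M_{11} \to M_{22} \to Y$, so `IE2` is zero while `IE1` and `IE12` are not.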

3. ESTIMATION

Our goal is to estimate the pathway effects defined in Definition 1 under Model (1). To achieve this, we define the objective function,

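$$
\begin{aligned}
l(\beta, \theta, \zeta, \pi, \Lambda, \delta) ={}& \mathrm{trace}\{(M_1 - X\beta)^\top(M_1 - X\beta)\} + \mathrm{trace}\{(M_2 - X\zeta - M_1\Lambda)^\top(M_2 - X\zeta - M_1\Lambda)\} \\
&+ (Y - X\delta - M_1\theta - M_2\pi)^\top(Y - X\delta - M_1\theta - M_2\pi).
\end{aligned}
$$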

This objective function in effect sets Σ1 and Σ2 to identity matrices. However, this simplification does not affect the consistency of our least-squares type estimators, as long as all the variables are standardized to the unit scale (White, 1980).
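As an illustration, the least-squares objective can be evaluated in a few lines of NumPy. This is a sketch under the assumption that `X` and `Y` are length-$n$ vectors, `M1` and `M2` are $n \times p_1$ and $n \times p_2$ matrices, and the coefficients are stored as 1-D arrays; the function name `loss` and the toy data are hypothetical.

```python
import numpy as np

def loss(X, M1, M2, Y, beta, zeta, Lam, delta, theta, pi):
    """Sum of squared residuals of the three equations in Model (1)."""
    R1 = M1 - np.outer(X, beta)                 # residuals of the M1 equations
    R2 = M2 - np.outer(X, zeta) - M1 @ Lam      # residuals of the M2 equations
    r3 = Y - X * delta - M1 @ theta - M2 @ pi   # residuals of the outcome equation
    # trace(R^T R) equals the sum of squared entries of R.
    return float(np.sum(R1 ** 2) + np.sum(R2 ** 2) + r3 @ r3)

# Noise-free toy data: the loss vanishes at the generating coefficients.
rng = np.random.default_rng(0)
X = rng.standard_normal(20)
beta, zeta = np.array([1.0, -0.5]), np.array([0.3, 0.0, 0.2])
Lam = np.array([[0.5, 0.0, 0.0], [0.0, 0.0, -0.4]])
theta, pi, delta = np.array([0.7, 0.0]), np.array([0.0, 0.6, 0.1]), 0.2
M1 = np.outer(X, beta)
M2 = np.outer(X, zeta) + M1 @ Lam
Y = X * delta + M1 @ theta + M2 @ pi
at_truth = loss(X, M1, M2, Y, beta, zeta, Lam, delta, theta, pi)
perturbed = loss(X, M1, M2, Y, beta + 0.1, zeta, Lam, delta, theta, pi)
```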

Next, we introduce a series of penalty functions. One challenge is that the number of the mediators far exceeds the sample size. In our example, p1 = 531, p2 = 917, while n = 136. We introduce sparsity in the pathway effects through l1 regularization. Sparsity assumptions are commonly employed in many applications, including neuroscience. This can be justified as human brain anatomy is economically organized (Bullmore and Sporns, 2012), and information processing is often highly clustered and concentrated (Bassett and Bullmore, 2017). We thus consider the following regularized optimization problem

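$$
\begin{aligned}
\underset{\beta, \theta, \zeta, \pi, \Lambda, \delta}{\text{minimize}} \quad & l(\beta, \theta, \zeta, \pi, \Lambda, \delta), \\
\text{subject to} \quad & \sum_{j=1}^{p_1} |\beta_j\theta_j| \le t_1, \quad \sum_{k=1}^{p_2} |\zeta_k\pi_k| \le t_2, \\
& \sum_{j=1}^{p_1}\sum_{k=1}^{p_2} |\beta_j\lambda_{jk}\pi_k| \le t_3, \quad |\delta| \le t_4,
\end{aligned}
$$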

where t1, t2, t3, t4 ≥ 0 are the regularization parameters. The first three l1-penalty functions are not convex. The next lemma presents a convex relaxation. We observe that the mediation effect through both M1 and M2 is defined as a three-way product, βjλjkπk. Since βj and πk are already regularized in the pathway effect through M1, that is, βjθj, and that through M2, that is, ζkπk, respectively, it suffices to regularize λjk alone.

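Lemma 1. For any $\nu_1, \nu_2 \ge 1/2$,

$$
\sum_{j=1}^{p_1} \{|\beta_j\theta_j| + \nu_1(\beta_j^2 + \theta_j^2)\}, \qquad \sum_{k=1}^{p_2} \{|\zeta_k\pi_k| + \nu_2(\zeta_k^2 + \pi_k^2)\}, \qquad \text{and} \qquad \sum_{j=1}^{p_1}\sum_{k=1}^{p_2} |\lambda_{jk}|
$$

are convex functions of $\{\beta, \theta\}$, $\{\zeta, \pi\}$, and $\Lambda$, respectively. For any $t_1, t_2, t_3 \ge 0$, there exist $r_1, r_2, r_3 \ge 0$, such that

$$
\left\{
\begin{array}{l}
\sum_{j=1}^{p_1} \{|\beta_j\theta_j| + \nu_1(\beta_j^2 + \theta_j^2)\} \le r_1 \\
\sum_{k=1}^{p_2} \{|\zeta_k\pi_k| + \nu_2(\zeta_k^2 + \pi_k^2)\} \le r_2 \\
\sum_{j=1}^{p_1}\sum_{k=1}^{p_2} |\lambda_{jk}| \le r_3
\end{array}
\right\}
\subset
\left\{
\begin{array}{l}
\sum_{j=1}^{p_1} |\beta_j\theta_j| \le t_1 \\
\sum_{k=1}^{p_2} |\zeta_k\pi_k| \le t_2 \\
\sum_{j=1}^{p_1}\sum_{k=1}^{p_2} |\beta_j\lambda_{jk}\pi_k| \le t_3
\end{array}
\right\}.
$$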
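As a brief sketch of why the threshold $1/2$ appears (offered for intuition only; it is not the proof in the Supporting Information), note that for scalars $a$ and $b$,

$$
|ab| + \nu(a^2 + b^2) = \tfrac{1}{2}(|a| + |b|)^2 + \left(\nu - \tfrac{1}{2}\right)(a^2 + b^2).
$$

The first term is the square of the $\ell_1$ norm of $(a, b)$ and hence convex, and the second term is convex whenever $\nu \ge 1/2$. Conversely, the function takes the value $\nu$ at both $(1, 0)$ and $(0, 1)$ and the value $1/4 + \nu/2$ at their midpoint $(1/2, 1/2)$, so midpoint convexity fails when $\nu < 1/2$.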

The proof of Lemma 1 is given in Section C.1 of the Supporting Information. Based on this convex relaxation, we turn to the following optimization problem,

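$$
\underset{\beta, \theta, \zeta, \pi, \Lambda, \delta}{\text{minimize}} \ \left\{ \frac{1}{2} l(\beta, \theta, \zeta, \pi, \Lambda, \delta) + P_1(\beta, \theta, \zeta, \pi) + P_2(\beta, \theta, \zeta, \pi) + P_3(\Lambda, \delta) \right\}, \tag{4}
$$

where the three penalty functions are of the form

$$
\begin{aligned}
P_1(\beta, \theta, \zeta, \pi) &= \kappa_1 \left[ \sum_{j=1}^{p_1} \{|\beta_j\theta_j| + \nu_1(\beta_j^2 + \theta_j^2)\} \right] + \kappa_2 \left[ \sum_{k=1}^{p_2} \{|\zeta_k\pi_k| + \nu_2(\zeta_k^2 + \pi_k^2)\} \right], \\
P_2(\beta, \theta, \zeta, \pi) &= \mu_1 \sum_{j=1}^{p_1} (|\beta_j| + |\theta_j|) + \mu_2 \sum_{k=1}^{p_2} (|\zeta_k| + |\pi_k|), \\
P_3(\Lambda, \delta) &= \kappa_3 \sum_{j=1}^{p_1}\sum_{k=1}^{p_2} |\lambda_{jk}| + \kappa_4 |\delta|,
\end{aligned}
$$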

with ν1, ν2 ≥ 1∕2, κ1, κ2, κ3, κ4 ≥ 0, and μ1, μ2 ≥ 0 as the tuning parameters. Here ν1 and ν2 control the level of convex relaxation, κ1, κ2, κ3, and κ4 control the level of penalty on various pathway effects, and μ1 and μ2 control the level of sparsity of individual parameters. We note that the tuning parameters κ1, κ2, κ3, μ1, and μ2 can also vary with j and k, as in the adaptive Lasso (Zou, 2006). For simplicity, they are kept the same across j = 1, … , p1 and k = 1, … , p2. In (4), the penalties P1 and P3 work together as a convex relaxation of the $\ell_1$-regularization on path effects to reduce the estimation bias. The penalty P2 adds a further $\ell_1$-regularization on individual parameters to improve the selection accuracy.
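For concreteness, the combined penalty $P_1 + P_2 + P_3$ can be written as a short function. The sketch below assumes a single penalty level `kappa` shared by all four pathway penalties, a single `mu`, and a single `nu`; the function name `penalty` and the input values are illustrative.

```python
import numpy as np

def penalty(beta, theta, zeta, pi, Lam, delta, kappa, mu, nu=2.0):
    """P1 + P2 + P3 with kappa1 = ... = kappa4 = kappa, mu1 = mu2 = mu, nu1 = nu2 = nu."""
    # P1: convex surrogate of the l1 penalty on the products beta_j*theta_j and zeta_k*pi_k.
    p1 = kappa * np.sum(np.abs(beta * theta) + nu * (beta ** 2 + theta ** 2))
    p1 += kappa * np.sum(np.abs(zeta * pi) + nu * (zeta ** 2 + pi ** 2))
    # P2: l1 penalty on the individual coefficients.
    p2 = mu * (np.sum(np.abs(beta) + np.abs(theta)) + np.sum(np.abs(zeta) + np.abs(pi)))
    # P3: l1 penalty on Lambda and on the direct effect.
    p3 = kappa * np.sum(np.abs(Lam)) + kappa * abs(delta)
    return float(p1 + p2 + p3)

value = penalty(beta=np.array([1.0]), theta=np.array([2.0]),
                zeta=np.array([0.0]), pi=np.array([0.0]),
                Lam=np.array([[3.0]]), delta=-1.0, kappa=1.0, mu=0.5)
```

In this toy call, $P_1 = 2 + 2(1 + 4) = 12$, $P_2 = 0.5 \times 3 = 1.5$, and $P_3 = 3 + 1 = 4$.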

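The objective function in (4) consists of a differentiable loss function, $l/2$, and a nondifferentiable regularization function, $(P_1 + P_2 + P_3)$. We next develop an alternating direction method of multipliers (ADMM; Boyd et al., 2011) to solve (4). The ADMM form of the optimization problem (4) is

$$
\begin{aligned}
\underset{\beta, \theta, \zeta, \pi, \Lambda, \delta, \tilde\beta, \tilde\theta, \tilde\zeta, \tilde\pi}{\text{minimize}} \quad & \frac{1}{2} l(\beta, \theta, \zeta, \pi, \Lambda, \delta) + P_1(\tilde\beta, \tilde\theta, \tilde\zeta, \tilde\pi) + P_2(\tilde\beta, \tilde\theta, \tilde\zeta, \tilde\pi) + P_3(\Lambda, \delta), \\
\text{subject to} \quad & \beta = \tilde\beta, \quad \theta = \tilde\theta, \quad \zeta = \tilde\zeta, \quad \pi = \tilde\pi,
\end{aligned}
$$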

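where $\tilde\beta \in \mathbb{R}^{1 \times p_1}$, $\tilde\theta \in \mathbb{R}^{p_1}$, $\tilde\zeta \in \mathbb{R}^{1 \times p_2}$, and $\tilde\pi \in \mathbb{R}^{p_2}$ are the newly introduced parameters. This way, the objective function is decomposed into a convex loss function plus a convex, but nondifferentiable regularization function, which we handle separately. Let $\Upsilon = (\beta, \theta, \zeta, \pi)$ and $\tilde\Upsilon = (\tilde\beta, \tilde\theta, \tilde\zeta, \tilde\pi)$; the augmented Lagrangian function to enforce the constraints is

$$
\frac{1}{2} l(\Upsilon, \Lambda, \delta) + P_1(\tilde\Upsilon) + P_2(\tilde\Upsilon) + P_3(\Lambda, \delta) + \sum_{r=1}^{4} \left( \langle h_r(\Upsilon, \tilde\Upsilon), \tau_r \rangle + \frac{\rho}{2} \|h_r(\Upsilon, \tilde\Upsilon)\|_2^2 \right), \tag{5}
$$

where $h_1(\Upsilon, \tilde\Upsilon) = \beta - \tilde\beta$, $h_2(\Upsilon, \tilde\Upsilon) = \theta - \tilde\theta$, $h_3(\Upsilon, \tilde\Upsilon) = \zeta - \tilde\zeta$, $h_4(\Upsilon, \tilde\Upsilon) = \pi - \tilde\pi$, $\tau_1, \tau_2 \in \mathbb{R}^{p_1}$, $\tau_3, \tau_4 \in \mathbb{R}^{p_2}$, and $\rho > 0$ is the augmented Lagrangian parameter. We propose to update the parameters in (5) iteratively, and summarize our estimation procedure in Algorithm 1.

Algorithm 1. The optimization algorithm for (5)

Input: $(X, M_1, M_2, Y)$.
 1: initialization: $\{\beta^{(0)}, \theta^{(0)}, \zeta^{(0)}, \pi^{(0)}, \Lambda^{(0)}, \delta^{(0)}, \tilde\beta^{(0)}, \tilde\theta^{(0)}, \tilde\zeta^{(0)}, \tilde\pi^{(0)}, \tau_1^{(0)}, \tau_2^{(0)}, \tau_3^{(0)}, \tau_4^{(0)}\}$.
 2: repeat for iteration $s = 0, 1, 2, \ldots$
 3:  update $\beta^{(s+1)} = (X^\top X + \rho)^{-1}\{X^\top M_1 - \tau_1^{(s)} + \rho\tilde\beta^{(s)}\}$.
 4:  update $\theta^{(s+1)} = (M_1^\top M_1 + \rho I)^{-1}[M_1^\top\{Y - X\delta^{(s)} - M_2\pi^{(s)}\} - \tau_2^{(s)} + \rho\tilde\theta^{(s)}]$.
 5:  update $\zeta^{(s+1)} = (X^\top X + \rho)^{-1}[X^\top\{M_2 - M_1\Lambda^{(s)}\} - \tau_3^{(s)} + \rho\tilde\zeta^{(s)}]$.
 6:  update $\pi^{(s+1)} = (M_2^\top M_2 + \rho I)^{-1}[M_2^\top\{Y - X\delta^{(s)} - M_1\theta^{(s+1)}\} - \tau_4^{(s)} + \rho\tilde\pi^{(s)}]$.
 7:  update $\delta^{(s+1)} = (X^\top X)^{-1}\,\mathrm{Soft}[X^\top\{Y - M_1\theta^{(s+1)} - M_2\pi^{(s+1)}\}, \kappa_4]$.
 8:  for $k = 1$ to $p_2$ do update $\Lambda_k^{(s+1)}$ by solving (6). end for
 9:  for $j = 1$ to $p_1$ do update $\{\tilde\beta_j^{(s+1)}, \tilde\theta_j^{(s+1)}\}$ by solving (7). end for
 10:  for $k = 1$ to $p_2$ do update $\{\tilde\zeta_k^{(s+1)}, \tilde\pi_k^{(s+1)}\}$ by solving (8). end for
 11:  update $\tau_r^{(s+1)} = \tau_r^{(s)} + \rho h_r\{\Upsilon^{(s+1)}, \tilde\Upsilon^{(s+1)}\}$, $r = 1, \ldots, 4$.
 12: until the objective function converges.
Output: $(\hat\beta, \hat\theta, \hat\zeta, \hat\pi, \hat\Lambda, \hat\delta)$.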
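The ridge-type updates in Steps 3 to 6 and the dual update in Step 11 of Algorithm 1 can be sketched in NumPy as follows. The expressions are derived here from the first-order conditions of (5), so the sign conventions are an assumption of this sketch rather than a transcription of Section B of the Supporting Information; all function names are hypothetical, and coefficients are stored as 1-D arrays.

```python
import numpy as np

def update_beta(X, M1, tau1, beta_t, rho):
    # Step 3: X is a vector, so X'X + rho is a scalar.
    return (X @ M1 - tau1 + rho * beta_t) / (X @ X + rho)

def update_theta(X, M1, M2, Y, delta, pi, tau2, theta_t, rho):
    # Step 4: ridge-type linear system in theta.
    A = M1.T @ M1 + rho * np.eye(M1.shape[1])
    b = M1.T @ (Y - X * delta - M2 @ pi) - tau2 + rho * theta_t
    return np.linalg.solve(A, b)

def update_zeta(X, M1, M2, Lam, tau3, zeta_t, rho):
    # Step 5: same form as Step 3, with M2 - M1 Lambda as the response.
    return (X @ (M2 - M1 @ Lam) - tau3 + rho * zeta_t) / (X @ X + rho)

def update_pi(X, M1, M2, Y, delta, theta, tau4, pi_t, rho):
    # Step 6: ridge-type linear system in pi.
    A = M2.T @ M2 + rho * np.eye(M2.shape[1])
    b = M2.T @ (Y - X * delta - M1 @ theta) - tau4 + rho * pi_t
    return np.linalg.solve(A, b)

def update_dual(tau, primal, copy, rho):
    # Step 11: dual ascent on the constraint primal = copy.
    return tau + rho * (primal - copy)

# Toy check on random inputs (hypothetical dimensions).
rng = np.random.default_rng(1)
n, p1, p2, rho = 30, 4, 3, 1.0
X = rng.standard_normal(n)
M1 = rng.standard_normal((n, p1))
M2 = rng.standard_normal((n, p2))
Y = rng.standard_normal(n)
tau1, tau2 = rng.standard_normal(p1), rng.standard_normal(p1)
beta_t, theta_t = rng.standard_normal(p1), rng.standard_normal(p1)
delta, pi = 0.3, rng.standard_normal(p2)

beta_new = update_beta(X, M1, tau1, beta_t, rho)
theta_new = update_theta(X, M1, M2, Y, delta, pi, tau2, theta_t, rho)

# Gradients of (5) with respect to beta and theta at the updates; both should vanish.
grad_beta = -(X @ (M1 - np.outer(X, beta_new))) + tau1 + rho * (beta_new - beta_t)
grad_theta = (-M1.T @ (Y - X * delta - M2 @ pi - M1 @ theta_new)
              + tau2 + rho * (theta_new - theta_t))
```

Solving the $p_1 \times p_1$ and $p_2 \times p_2$ systems with `np.linalg.solve` avoids forming explicit inverses; since the system matrices do not change across iterations, a factorization could be cached in a full implementation.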

A few remarks are in order. The explicit forms of Steps 3 to 6 of Algorithm 1 are derived in Section B of the Supporting Information. In Step 7, Soft(a, b) = sgn(a) max{|a| − b, 0} is the soft-thresholding function. Step 8 is to update Λ, one column at a time. Its kth column, k = 1, … , p2, can be obtained by

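$$
\underset{\Lambda_k \in \mathbb{R}^{p_1}}{\text{minimize}} \ \frac{1}{2} \|M_{2k} - X\zeta_k^{(s+1)} - M_1\Lambda_k\|_2^2 + \kappa_3 \|\Lambda_k\|_1. \tag{6}
$$

This is a standard Lasso problem with $\{M_{2k} - X\zeta_k^{(s+1)}\}$ as the response and $M_1$ as the predictor. Step 9 is to update $(\tilde\beta, \tilde\theta)$, also one column pair at a time, and the $j$th column pair $(\tilde\beta_j, \tilde\theta_j)$, $j = 1, \ldots, p_1$, can be obtained by

$$
\underset{(\tilde\beta_j, \tilde\theta_j)}{\text{minimize}} \ v\{\tilde\beta_j, \tilde\theta_j; \kappa_1, \mu_1, 2\kappa_1\nu_1 + \rho, 2\kappa_1\nu_1 + \rho, \tau_{1j}^{(s)} + \rho\beta_j^{(s+1)}, \tau_{2j}^{(s)} + \rho\theta_j^{(s+1)}\}. \tag{7}
$$

Similarly, Step 10 is to update $(\tilde\zeta, \tilde\pi)$, one column pair at a time, and the $k$th column pair $(\tilde\zeta_k, \tilde\pi_k)$, $k = 1, \ldots, p_2$, can be obtained by

$$
\underset{(\tilde\zeta_k, \tilde\pi_k)}{\text{minimize}} \ v\{\tilde\zeta_k, \tilde\pi_k; \kappa_2, \mu_2, 2\kappa_2\nu_2 + \rho, 2\kappa_2\nu_2 + \rho, \tau_{3k}^{(s)} + \rho\zeta_k^{(s+1)}, \tau_{4k}^{(s)} + \rho\pi_k^{(s+1)}\}. \tag{8}
$$

In both (7) and (8), the function $v$ is of the form

$$
v(a_1, a_2; b_1, b_2, b_3, b_4, b_5, b_6) = b_1|a_1a_2| + b_2|a_1| + b_2|a_2| + \frac{1}{2}b_3a_1^2 + \frac{1}{2}b_4a_2^2 - b_5a_1 - b_6a_2.
$$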

Its optimization has a closed-form solution (see Zhao and Luo, 2016, Lemma 3.2).
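Steps 7 and 8 can be illustrated with a short sketch. The soft-thresholding function follows the definition given above; for the Lasso problem (6), the text only states that it is a standard Lasso, so the cyclic coordinate descent solver below is a swapped-in choice made for brevity, not necessarily the solver used by the authors. The function names and the toy inputs are hypothetical.

```python
import numpy as np

def soft(a, b):
    """Soft-thresholding: sgn(a) * max(|a| - b, 0)."""
    return np.sign(a) * np.maximum(np.abs(a) - b, 0.0)

def update_delta(X, M1, M2, Y, theta, pi, kappa4):
    # Step 7: scalar Lasso update for the direct effect.
    return soft(X @ (Y - M1 @ theta - M2 @ pi), kappa4) / (X @ X)

def lasso_cd(M1, resp, kappa3, n_iter=200):
    """Cyclic coordinate descent for 0.5 * ||resp - M1 L||_2^2 + kappa3 * ||L||_1."""
    p = M1.shape[1]
    L = np.zeros(p)
    col_sq = np.sum(M1 ** 2, axis=0)
    r = resp.astype(float).copy()          # running residual resp - M1 @ L
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue                   # an all-zero column keeps a zero coefficient
            r += M1[:, j] * L[j]           # remove coordinate j from the fit
            L[j] = soft(M1[:, j] @ r, kappa3) / col_sq[j]
            r -= M1[:, j] * L[j]           # add the updated coordinate back
    return L

# With an orthonormal design the Lasso solution is soft-thresholding of the response.
L_hat = lasso_cd(np.eye(3), np.array([3.0, -0.5, 1.5]), kappa3=1.0)
```

In Step 8 the response passed to `lasso_cd` would be $M_{2k} - X\zeta_k^{(s+1)}$ for each column $k$. A fixed iteration count is used here for simplicity; a convergence check on the change in `L` would be a natural refinement.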

Our method involves a number of tuning parameters. For $\nu_1$ and $\nu_2$, our simulations have found that the final estimators are not overly sensitive to their values (see Section D.2 of the Supporting Information). The same phenomenon was observed in Zhao and Luo (2016). For $\rho$, the augmented Lagrangian parameter, it can be fixed at a constant level (Boyd et al., 2011). We thus fix $\nu_1 = \nu_2 = 2$ and $\rho = 1$. For $(\kappa_1, \kappa_2, \kappa_3, \kappa_4, \mu_1, \mu_2)$, for simplicity, we set $\kappa_1 = \kappa_2 = \kappa_3 = \kappa_4 = \tilde\kappa$, and $\mu_1 = \mu_2 = \tilde\mu$. A grid search is then run to minimize a modified Bayesian information criterion (BIC) to select $\tilde\kappa$ and $\tilde\mu$,

$$
\mathrm{BIC} = -2 \log L(\hat\beta, \hat\theta, \hat\zeta, \hat\pi, \hat\Lambda, \hat\delta) + \log(n)\,(|\hat{A}_1| + |\hat{A}_2| + |\hat{A}_3|), \tag{9}
$$

where the estimators are obtained under a given set of tuning parameters, $\hat{A}_1 = \{j : \hat\beta_j\hat\theta_j \neq 0\}$, $\hat{A}_2 = \{k : \hat\zeta_k\hat\pi_k \neq 0\}$, $\hat{A}_3 = \{(j, k) : \hat\beta_j\hat\lambda_{jk}\hat\pi_k \neq 0\}$, and $|\hat{A}_j|$ is the cardinality of set $\hat{A}_j$.
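The complexity term in (9) counts active pathways rather than nonzero coefficients. A minimal sketch of that count is given below; the function names are hypothetical, and the value of $-2 \log L$ is passed in as an argument because the exact likelihood evaluation is not spelled out in this section.

```python
import numpy as np

def active_set_sizes(beta, theta, zeta, pi, Lam, tol=0.0):
    """Sizes of A1, A2, A3: pathways whose estimated product effect is nonzero."""
    a1 = int(np.sum(np.abs(beta * theta) > tol))                         # X -> M1j -> Y
    a2 = int(np.sum(np.abs(zeta * pi) > tol))                            # X -> M2k -> Y
    a3 = int(np.sum(np.abs(beta[:, None] * Lam * pi[None, :]) > tol))    # X -> M1j -> M2k -> Y
    return a1, a2, a3

def modified_bic(neg2_loglik, sizes, n):
    """Modified BIC of (9), given -2 log L and the three active-set sizes."""
    return neg2_loglik + np.log(n) * sum(sizes)

# Hypothetical estimates with p1 = 4 and p2 = 3.
beta = np.array([1.0, 0.0, -0.5, 0.0])
theta = np.array([0.7, 0.0, 0.0, 0.0])
zeta = np.array([0.5, 0.0, 0.0])
pi = np.array([0.0, 0.6, 0.0])
Lam = np.zeros((4, 3))
Lam[0, 1] = 0.8
sizes = active_set_sizes(beta, theta, zeta, pi, Lam)
bic = modified_bic(10.0, sizes, n=100)
```

In a grid search, this value would be computed for each candidate pair $(\tilde\kappa, \tilde\mu)$ and the pair with the smallest BIC retained.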

4. THEORY

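We next study the asymptotic properties of our proposed estimator. To simplify the presentation and focus on the mediation pathways, assume that the direct effect, $\delta$, is known, and let $V = Y$. Let $\Theta^* = (\beta^*, \theta^*, \zeta^*, \pi^*, \Lambda^*)$ denote the true parameters, and $\hat\Theta = (\hat\beta, \hat\theta, \hat\zeta, \hat\pi, \hat\Lambda)$ the global minimizer of the optimization in (4). Let $\varsigma^* = \Lambda^*\pi^* \in \mathbb{R}^{p_1}$. Further, let $S_1 = \{j : \beta_j^* \neq 0\}$, $S_2 = \{j : \theta_j^* \neq 0\}$, $S_3 = \{k : \zeta_k^* \neq 0\}$, $S_4 = \{k : \pi_k^* \neq 0\}$, and $S_5 = \{j : \varsigma_j^* \neq 0\}$ denote the support of $\beta^*$, $\theta^*$, $\zeta^*$, $\pi^*$, and $\varsigma^*$, respectively, and $s_l = |S_l|$ the cardinality of set $S_l$, $l = 1, \ldots, 5$. Consider a set of regularity conditions.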

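  • (C1) The distribution of $X$ has a finite variance, and $|X_i| \le c_0$ almost surely, $i = 1, \ldots, n$.

  • (C2) The penalty functions, evaluated at the true parameters, are bounded. That is, for $P_1$, $\sum_{j=1}^{p_1} \{|\beta_j^*\theta_j^*| + \nu_1(\beta_j^{*2} + \theta_j^{*2})\} \le c_{11}$, $\sum_{k=1}^{p_2} \{|\zeta_k^*\pi_k^*| + \nu_2(\zeta_k^{*2} + \pi_k^{*2})\} \le c_{12}$; for $P_2$, $\sum_{j=1}^{p_1} (|\beta_j^*| + |\theta_j^*|) \le c_{21}$, $\sum_{k=1}^{p_2} (|\zeta_k^*| + |\pi_k^*|) \le c_{22}$; and for $P_3$, $\sum_{j=1}^{p_1}\sum_{k=1}^{p_2} |\lambda_{jk}^*| \le c_3$.

  • (C3) All the entries of the error variance terms are bounded by $c_4$.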

Condition (C1) is a standard regularity condition on the design matrix in high-dimensional regression settings; when X is binary or categorical, (C1) is satisfied. (C2) regulates the sparsity levels of β*, θ*, ζ*, and π*, and (C3) is the finite variance condition on the model errors, both of which are again common in the literature.

We evaluate the accuracy of our estimator by the mean squared prediction error

$$
\mathrm{MSPE} = \frac{1}{n} \sum_{i=1}^{n} (\hat{V}_i - V_i^*)^2,
$$

where V = Y, and V^i and Vi* are the corresponding values under the estimated parameter Θ^ and the true parameter Θ*, respectively, for subject i, i = 1, … , n. The predicted pathway effects under our multimodal mediation model are defined as follows.

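Definition 2. For a treatment condition $X = x$, define

  1. the predicted outcome through $M_1$, but not through $M_2$, as $\hat{V}_1 = x\hat\beta\hat\theta = x\left(\sum_{j=1}^{p_1} \hat\beta_j\hat\theta_j\right)$;

  2. the predicted outcome through $M_2$, but not through $M_1$, as $\hat{V}_2 = x\hat\zeta\hat\pi = x\left(\sum_{k=1}^{p_2} \hat\zeta_k\hat\pi_k\right)$;

  3. the prediction through both $M_1$ and $M_2$ as $\hat{V}_3 = x\hat\beta\hat\Lambda\hat\pi = x\left(\sum_{j=1}^{p_1}\sum_{k=1}^{p_2} \hat\beta_j\hat\lambda_{jk}\hat\pi_k\right)$;

  4. the total prediction as $\hat{V} = \hat{V}_1 + \hat{V}_2 + \hat{V}_3$.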

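Theorem 1. Suppose regularity conditions (C1)-(C3) hold. Assume $E(X^2) = c_5 > 0$. Then,

$$
\begin{aligned}
E(\mathrm{MSPE}) &\le 2c_0c_4 \left\{ c_{11}c_{21}\sqrt{\frac{2\log(2p_1)}{n}} + c_{12}c_{22}\sqrt{\frac{2\log(2p_2)}{n}} + c_3\sqrt{\frac{c_{11}c_{12}}{\nu_1\nu_2}}\,(1 + c_3c_{22})\sqrt{\frac{2\log(2p_1)}{n}} \right\}, \\
\|\hat\beta\hat\theta - \beta^*\theta^*\|_2^2 &\le \frac{1}{c_5} \left\{ 8c_{11}^2c_0^2\sqrt{\frac{2\log(2)}{n}} + 2c_0c_4c_{11}c_{21}\sqrt{\frac{2\log(2p_1)}{n}} \right\}, \\
\|\hat\zeta\hat\pi - \zeta^*\pi^*\|_2^2 &\le \frac{1}{c_5} \left\{ 8c_{12}^2c_0^2\sqrt{\frac{2\log(2)}{n}} + 2c_0c_4c_{12}c_{22}\sqrt{\frac{2\log(2p_2)}{n}} \right\}, \\
\|\hat\beta\hat\Lambda\hat\pi - \beta^*\Lambda^*\pi^*\|_2^2 &\le \frac{1}{c_5} \left\{ 8c_3^2\left(\frac{c_{11}c_{12}}{\nu_1\nu_2}\right)c_0^2\sqrt{\frac{2\log(2)}{n}} + 2c_0c_4c_3\sqrt{\frac{c_{11}c_{12}}{\nu_1\nu_2}}\,(1 + c_3c_{22})\sqrt{\frac{2\log(2p_1)}{n}} \right\}.
\end{aligned}
$$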

The proof is given in Section C.2 of the Supporting Information. Theorem 1 shows that the mean squared prediction error converges. In addition, all three types of total pathway effect estimators are consistent in $\ell_2$-norm. In particular, the convergence rates of the total indirect effects $\mathrm{IE}_1$ and $\mathrm{IE}_2$ are $\sqrt{\log(p_1)/n}$ and $\sqrt{\log(p_2)/n}$, respectively, which are consistent with the rate under only one set of mediators as studied in Zhao and Luo (2016). When considering the indirect effect through both $M_1$ and $M_2$, $\varsigma = \Lambda\pi$ summarizes the post-$M_1$ pathway effect, and the problem reduces to the case with a single set of $p_1$ mediators. As such, the convergence rate is $\sqrt{\log(p_1)/n}$, and depends on the number of nonzero elements in $\varsigma$.

5. SIMULATIONS

We next investigate the finite-sample performance. We first study the method under different mediator dimensions and sample sizes. We then compare with the method of Zhao and Luo (2016) under different mediation scenarios. We also study the performance under different sparsity levels, and report those results in Section D.3 of the Supporting Information.

First, the data are simulated following Model (1), where the detailed data generation scheme is reported in Section D.1 of the Supporting Information. Consider two mediator dimension sizes, $p_1 = 20$, $p_2 = 30$, with the sparsity level set at 0.9, and $p_1 = p_2 = 100$, with the sparsity level at 0.95. Also consider two sample sizes, $n = 50$ and $n = 500$. We employ a tuning strategy similar to that of Zou and Hastie (2005) to tune $\tilde\kappa$ and $\tilde\mu/\tilde\kappa$, and select the tuning parameters that minimize the modified BIC criterion in (9). Table 1 reports the average results based on 200 data replications. The evaluation criteria include the estimation bias and mean squared error of the estimated total indirect effect, plus the sensitivity and specificity of the identified pathways with nonzero effects. From this table, we see that the proposed method yields a competitive performance in the finite-sample setting, and the performance improves as the sample size increases, in agreement with our theory.
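The selection criteria can be sketched as follows. This is an illustrative reading in which a pathway counts as selected when its estimated effect is nonzero; the function name `sens_spec` and the example vectors are hypothetical, and the exact evaluation code used for Table 1 is not given in the text.

```python
import numpy as np

def sens_spec(true_effect, est_effect, tol=0.0):
    """Sensitivity and specificity of pathway selection.

    Assumes at least one truly nonzero and one truly zero pathway,
    so that neither denominator is zero.
    """
    truth = np.abs(np.asarray(true_effect)) > 0
    found = np.abs(np.asarray(est_effect)) > tol
    tp = np.sum(truth & found)       # true pathways that were selected
    fn = np.sum(truth & ~found)      # true pathways that were missed
    tn = np.sum(~truth & ~found)     # null pathways correctly left out
    fp = np.sum(~truth & found)      # null pathways wrongly selected
    return tp / (tp + fn), tn / (tn + fp)

# Five candidate pathways: two truly active, one found, one false positive.
sens, spec = sens_spec([0.7, 0.0, 0.0, 0.3, 0.0], [0.5, 0.0, 0.1, 0.0, 0.0])
```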

TABLE 1.

Performance under varying mediator dimensions and sample sizes

| Mediator dimension | Sample size | Bias | MSE | Sensitivity (SE) | Specificity (SE) |
|---|---|---|---|---|---|
| p1 = 20, p2 = 30 | n = 50 | 0.361 | 28.06 | 0.912 (0.112) | 0.935 (0.022) |
| | n = 500 | 0.077 | 3.108 | 1.000 (0.000) | 0.989 (0.003) |
| p1 = 100, p2 = 100 | n = 50 | 0.210 | 10.50 | 0.668 (0.153) | 0.998 (0.001) |
| | n = 500 | 0.027 | 1.223 | 1.000 (0.000) | 0.999 (0.000) |

Note. Reported are the estimation bias and mean squared error (MSE) of the total indirect effect, and the sensitivity and specificity, as well as the standard error (SE) of the measures, of the pathway selection.

Next, we compare our multimodality pathway method with the single-modality method of Zhao and Luo (2016). Different pathway scenarios are considered in Figure 2, while fixing the mediator dimension at $p_1 = p_2 = 20$ and the sample size at $n = 50$. More data generation details are given in Section D.4. For scenario (a), only $M_1$ has a direct path to $Y$, and for (b), only $M_2$ has a direct path to $Y$. For (c), both $M_1$ and $M_2$ have paths directly to $Y$, and we further consider two scenarios, (c-1) with the effect size of the first modality dominating the second, and (c-2) vice versa. For the single-modality method, we simply concatenate the two modalities of mediators as $M = (M_1, M_2) \in \mathbb{R}^{n \times (p_1 + p_2)}$. Table 2 reports the average results based on 200 data replications. From this table, it is seen that, for scenarios (a) and (b) where only one modality of mediators has a direct path to the response, the multimodality approach works almost as well as, and sometimes even slightly better than, the single-modality approach. For scenario (b), the multimodality approach can distinguish both types of path effect, that is, $X \to M_1 \to M_2 \to Y$ and $X \to M_2 \to Y$, whereas the single-modality approach cannot. Generally, if these two types of path effect are of the same magnitude, but opposite signs, then the single-modality approach cannot identify this path as the total path effect is zero. However, the multimodality approach is still able to identify both paths. We consider such a case in the simulation. The single-modality approach identifies a zero-effect pathway, while the multimodality approach identifies two nonzero effect pathways. For scenario (c) where both modalities of mediators have paths directly to the response, our multimodality approach clearly outperforms the single-modality approach in both estimation and selection accuracy, especially when the effect of the first modality dominates the second.
Overall, we find the multimodality approach achieves a superior numerical performance, and can identify and distinguish pathways that could have been missed by the single-modality approach.
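The cancellation argument can be made concrete with a small worked example. The numbers are hypothetical, with one mediator per modality, and the sketch assumes the single-modality method regresses each concatenated mediator on the exposure alone, so that it only sees the marginal effect of $X$ on $M_2$.

```python
# Hypothetical coefficients with p1 = p2 = 1.
beta, lam, zeta, pi = 1.0, 0.5, -0.5, 2.0

# Multimodality view: two separate pathways ending in M2 -> Y.
ie_2 = zeta * pi            # X -> M2 -> Y
ie_12 = beta * lam * pi     # X -> M1 -> M2 -> Y

# Single-modality view: the marginal effect of X on M2 pools both routes.
alpha_m2 = zeta + beta * lam
ie_single = alpha_m2 * pi   # product-of-coefficients effect through M2

print(ie_2, ie_12, ie_single)
```

The two pathway effects are $-1$ and $1$, while the pooled effect is exactly zero, so a method that penalizes the pooled product has no signal to select on.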

FIGURE 2.


Comparison of single- and multimodality mediation methods under different scenarios: A, Only M1 has a direct path to Y; B, Only M2 has a direct path to Y; C, Both M1 and M2 have direct paths to Y.

TABLE 2.

Performance comparison between the single- and multimodality methods

| Scenario | Method | Bias | MSE | Sensitivity (SE) | Specificity (SE) |
|---|---|---|---|---|---|
| (a) | Multimodality | 0.293 | 3.538 | 0.997 (0.035) | 0.909 (0.030) |
| | Single-modality | 0.613 | 2.208 | 0.995 (0.050) | 0.898 (0.044) |
| (b) | Multimodality | 0.069 | 23.95 | 0.966 (0.073) | 0.935 (0.017) |
| | Single-modality | 0.306 | 22.91 | 0.998 (0.023) | 0.929 (0.043) |
| (c-1) | Multimodality | 0.420 | 3.552 | 0.814 (0.151) | 0.952 (0.011) |
| | Single-modality | 2.524 | 7.874 | 0.341 (0.302) | 0.923 (0.045) |
| (c-2) | Multimodality | −0.015 | 14.55 | 0.814 (0.110) | 0.958 (0.012) |
| | Single-modality | 1.219 | 13.23 | 0.638 (0.121) | 0.982 (0.020) |

Note. Reported are the estimation bias and mean squared error (MSE) of the total indirect effect, and the sensitivity and specificity, as well as the standard error (SE) of the measures, of the pathway selection.

6. A MULTIMODAL NEUROIMAGING STUDY

We revisit the motivating example in Section 1. A publicly available dataset from the recent S1200 release of the Human Connectome Project is analyzed, with n = 136 participants. The outcome of interest is a language behavior measure, which is a composite score of a picture vocabulary test evaluating vocabulary comprehension (ranging between 98.23 and 145.27 in the data). In the test, an audio recording of a word and four photographic images were presented to the participants on a screen, who responded by choosing the image that most closely matched the meaning of the word. More details are presented in Section E.1 of the Supporting Information. The two sets of mediators are DTI and resting-state fMRI measures. The DTI images are preprocessed following the pipeline of Zhang et al. (2019). From each DTI scan, a symmetric structural connectivity matrix is constructed, with nodes corresponding to the brain regions-of-interest based on the Desikan Atlas (Desikan et al., 2006), and the edges recording the number of white fiber pathways, which measures the structural connectivity between pairs of brain regions. The regions with zero connectivity in over 25% of subjects are removed, and the upper triangular connectivity matrix is vectorized to obtain a p1 = 531-dimensional vector of DTI measures. The fMRI images are preprocessed following the pipeline of Glasser et al. (2013). From each fMRI scan, a symmetric functional connectivity matrix is constructed, with nodes corresponding to the brain regions based on the Harvard-Oxford Atlas of FSL (Smith et al., 2004), and the edges recording the z-transformed Pearson correlation. The analysis focuses on the brain regions corresponding to those of DTI; the upper triangular connectivity matrix is vectorized to obtain a p2 = 917-dimensional vector of fMRI measures. Moreover, following Rosenbaum (2002), the mediator measures from both DTI and fMRI, as well as the picture vocabulary outcome, are all age adjusted.
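The vectorization steps described above can be sketched in NumPy. This is only an illustration of the general recipe, not the preprocessing pipelines cited in the text: the function names are hypothetical, the time series are random placeholders, and the filtering is applied to edges (columns) as one possible reading of the 25% rule.

```python
import numpy as np

def functional_connectivity_vector(ts):
    """ts: (timepoints, regions) array of regional time series.

    Returns the Fisher z-transformed Pearson correlations of the
    strict upper triangle, one entry per pair of regions.
    """
    corr = np.corrcoef(ts, rowvar=False)          # regions x regions correlation matrix
    iu = np.triu_indices(corr.shape[0], k=1)      # strict upper triangle, no diagonal
    return np.arctanh(corr[iu])                   # Fisher z-transform

def drop_sparse_edges(S, max_zero_frac=0.25):
    """S: (subjects, edges) structural counts.

    Keeps edges that are zero in at most max_zero_frac of the subjects.
    """
    keep = np.mean(S == 0, axis=0) <= max_zero_frac
    return S[:, keep], keep

rng = np.random.default_rng(0)
fc = functional_connectivity_vector(rng.standard_normal((50, 5)))   # 5 regions -> 10 edges

S = np.array([[0, 1], [0, 2], [1, 3], [2, 0]])    # 4 subjects, 2 candidate edges
S_kept, keep = drop_sparse_edges(S)
```

With $R$ regions the vectorized upper triangle has $R(R-1)/2$ entries before any filtering.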

A significant sex total effect (TE 3.831, standard error 1.567, P-value 0.016) is observed under a linear model. We then apply the proposed method to these data. This total effect can be decomposed following (3). Specifically, the penalized estimate of the direct effect is zero, suggesting that the sex difference is fully explained by the variations in brain connectivity. The estimated total indirect effects due to the structural connectivity alone, the functional connectivity alone, and both connectivities are 3.322, 0.297, and −0.109, respectively. Figure 3 presents the identified brain pathways through both the structural and functional connectivities by the penalized estimation. The 95% confidence intervals of the path effects, obtained using a residual bootstrap method, are reported in Section E.3 of the Supporting Information. The single-modality method is also applied, with the results presented in Section E.2 of the Supporting Information. The multimodality method finds about twice as many paths, while also identifying nearly all the paths found by the single-modality method.

FIGURE 3.


The estimated pathways with picture vocabulary test performance as the outcome (Y) when comparing male (X = 1) versus female (X = 0). The nodes in purple are diffusion-based brain structural connectivity, and nodes in orange are brain functional connectivity measures. The edges in red indicate positive effects, and the ones in blue indicate negative effects. This figure appears in color in the electronic version of this article, and any mention of color refers to that version.

Among the pathways found by the multimodality method, one group involves the structural connectivity between the left postcentral gyrus (postCentral_L) and left superior parietal lobule (SupP_L), then through the functional connectivity between numerous brain regions. The top path of Figure 4 shows a subset of the pathways through the structural connectivity and functional connectivity between the left superior parietal lobule and left cuneus (Cuneus_L), and between the left precuneus (Precuneus_L) and left lingual gyrus (Lingual_L). These pathways are related to working memory. In particular, the postcentral gyrus and cuneus have been identified in visual processing. The cuneus is a mid-level visual processing area and has been found to be modulated by working memory (Salmon et al., 1996). The precuneus, as part of the default mode network, is involved in working memory, especially for tasks related to verbal processing (Wallentin et al., 2006). The lingual gyrus, located in the occipital lobe, plays an important role in visual processing. The left lingual gyrus is found to be activated during memorization (Kozlovskiy et al., 2014), and is involved in tasks related to naming and word recognition (Mechelli et al., 2000).

FIGURE 4.


The identified brain pathways related to working memory and language. The working memory pathway includes DTI postcentral gyrus (Left) – superior parietal lobule (Left) → fMRI superior parietal lobule (Left) – cuneus (Left) and fMRI precuneus (Left) – lingual gyrus (Left). The language pathway includes DTI inferior temporal gyrus (Left) – precentral gyrus (Left) → fMRI middle temporal gyrus (Left) – postcentral gyrus (Left). This figure appears in color in the electronic version of this article, and any mention of color refers to that version.

The other pathway passes through the structural connectivity between the left inferior temporal gyrus (IT_L) and left precentral gyrus (preCentral_L), and then through the functional connectivity between the left middle temporal gyrus (MT_L) and left postcentral gyrus. The bottom path of Figure 4 shows this pathway, which is suggestive of a language mechanism. The inferior temporal gyrus, as part of the inferior longitudinal and inferior occipitofrontal fasciculi, is crucial for semantic processing and for word naming (Race et al., 2013). The middle temporal gyrus is typically viewed as part of the language network (Ficek et al., 2018).

7. DISCUSSION

In this article, we propose a method of multimodal mediation analysis. It requires the ordering of the multiple modalities on the mediation pathways, but it does not require the ordering of the potential individual mediators within each modality. We define three types of indirect pathway effects, and develop a regularized estimation algorithm. Both the asymptotic and empirical behaviors of the method are studied.
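To make the three types of indirect pathway effects concrete, the following is a minimal sketch under stated assumptions rather than the estimation procedure of this article. It assumes a linear structural equation model with a scalar exposure, and reads the three types as exposure → first modality → outcome, exposure → second modality → outcome, and exposure → first modality → second modality → outcome. The coefficient names (`beta`, `zeta`, `Lam`, `theta`, `pi`, `delta`) and the function `pathway_effects` are hypothetical labels chosen for illustration, and the coefficients are taken as given rather than estimated with a penalty.

```python
import numpy as np


def pathway_effects(beta, zeta, Lam, theta, pi):
    """Product-of-coefficients path effects in a two-modality linear SEM.

    Assumed (illustrative) model with scalar exposure X:
        M1 = X * beta + e1               (p1 first-modality mediators)
        M2 = X * zeta + M1 @ Lam + e2    (p2 second-modality mediators)
        Y  = X * delta + M1 @ theta + M2 @ pi + e3
    """
    via_m1 = beta * theta                          # X -> M1_j -> Y, shape (p1,)
    via_m2 = zeta * pi                             # X -> M2_k -> Y, shape (p2,)
    via_m1_m2 = beta[:, None] * Lam * pi[None, :]  # X -> M1_j -> M2_k -> Y, shape (p1, p2)
    total_indirect = via_m1.sum() + via_m2.sum() + via_m1_m2.sum()
    return via_m1, via_m2, via_m1_m2, total_indirect


# Hypothetical coefficients, drawn at random purely for illustration.
rng = np.random.default_rng(0)
p1, p2 = 3, 2
beta = rng.normal(size=p1)
zeta = rng.normal(size=p2)
Lam = rng.normal(size=(p1, p2))
theta = rng.normal(size=p1)
pi = rng.normal(size=p2)
delta = 0.5  # direct effect of X on Y

via_m1, via_m2, via_m1_m2, total_indirect = pathway_effects(beta, zeta, Lam, theta, pi)

# Sanity check on noise-free data: the total effect of X on Y should equal
# the direct effect plus the sum of all indirect pathway effects.
x = rng.normal(size=50)
M1 = np.outer(x, beta)
M2 = np.outer(x, zeta) + M1 @ Lam
y = delta * x + M1 @ theta + M2 @ pi
total_effect = (x @ y) / (x @ x)  # least-squares slope through the origin
```

In this noise-free setting, `total_effect` agrees with `delta + total_indirect` up to floating-point error, which is the usual decomposition of a total effect into direct and indirect parts in a linear model.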

The proposed method neatly fits within the context of multimodal brain imaging analysis for combining diffusion weighted MRI with fMRI data. Abstracting the setting, the framework interrogates the idea that, across a sample of subjects, an exposure or treatment impacts neural wiring, and the consequent changes in structural wiring impact brain functional activity, which in turn impacts behavior. As our model is agnostic to domain, one might also consider applying the same approach where an exposure is postulated to impact epigenetic measurements, then subsequently genomic measurements, such as RNA expression, and subsequently changes in behavior or clinical outcome.

In our data analysis, by integrating structural and functional imaging, we have postulated mechanistic pathways, including ones related to working memory and language, that mediate sex-related differences in language behavior. However, care must be taken, as any mechanistic interpretation would be highly dependent on a variety of modeling assumptions, such as the path analysis ordering, the linearity, and the confounders.

Our work points to a number of potential extensions. The model vectorizes the connectivity measures and does not exploit the symmetric and positive definite matrix structure. Recent developments in covariance regression (Sun and Li, 2017; Zhao et al., 2019) are potentially useful. Our current work focuses on penalized point estimation, whereas both statistical inference and postselection inference are crucial problems that require further investigation. Moreover, we have not yet considered the cases where the exposure and/or the outcome are themselves high dimensional. We plan to pursue these lines of work as future research.
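The vectorization of connectivity measures mentioned above can be pictured with a short sketch. It assumes that each subject's connectivity is stored as a symmetric region-by-region matrix and that the strict upper triangle is stacked into a vector; the function name `vectorize_connectivity` and the toy matrix are hypothetical, and the exact convention used in the analysis may differ.

```python
import numpy as np


def vectorize_connectivity(conn):
    """Stack the strict upper triangle of a symmetric matrix into a vector.

    Returns the vector together with the (row, column) region pair behind
    each entry, so that a selected mediator can be mapped back to an edge.
    """
    conn = np.asarray(conn, dtype=float)
    if conn.ndim != 2 or conn.shape[0] != conn.shape[1]:
        raise ValueError("expected a square matrix")
    if not np.allclose(conn, conn.T):
        raise ValueError("expected a symmetric matrix")
    rows, cols = np.triu_indices(conn.shape[0], k=1)  # k=1 skips the diagonal
    return conn[rows, cols], list(zip(rows.tolist(), cols.tolist()))


# Toy 3-region connectivity matrix (made-up values).
conn = np.array([
    [1.0, 0.2, -0.1],
    [0.2, 1.0, 0.4],
    [-0.1, 0.4, 1.0],
])
vec, pairs = vectorize_connectivity(conn)
```

Once the matrix is flattened in this way, each entry of `vec` is treated as a separate variable, which is why the symmetric and positive definite structure of the original matrix is no longer used.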

Supplementary Material

Supplement

ACKNOWLEDGMENTS

Zhao was partially supported by NIH grant U54AG065181; Li by NSF grant DMS-1613137 and NIH grants R01AG061303, R01AG062542, and R01AG034570; Caffo by NIH grants NS060910-09A1 and P41 110156-0818.

Funding information

National Institute of Biomedical Imaging and Bioengineering, Grant/Award Number: P41 110156-0818; National Institute on Aging, Grant/Award Numbers: R01AG061303, R01AG062542, U54AG065181; National Institute of Neurological Disorders and Stroke, Grant/Award Number: NS060910-09A1; National Science Foundation, Grant/Award Number: DMS-1613137

Footnotes

DATA AVAILABILITY STATEMENT

The data that support the findings in this paper are available on GitHub at https://github.com/zhaoyi1026/multimodal_integration. These data were derived from the following resources available in the public domain: the Human Connectome Project (http://www.humanconnectomeproject.org/).

SUPPORTING INFORMATION

Web Appendices, Tables, and Figures referenced in Sections 2–6, along with R code and data, are available with this paper at the Biometrics website on Wiley Online Library.

REFERENCES

  1. Atlas LY, Bolger N, Lindquist MA and Wager TD (2010) Brain mediators of predictive cue effects on perceived pain. The Journal of Neuroscience, 30, 12964–12977.
  2. Baron RM and Kenny DA (1986) The moderator–mediator variable distinction in social psychological research: conceptual, strategic, and statistical considerations. Journal of Personality and Social Psychology, 51, 1173.
  3. Bassett DS and Bullmore ET (2017) Small-world brain networks revisited. The Neuroscientist, 23, 499–516.
  4. Boyd S, Parikh N, Chu E, Peleato B and Eckstein J (2011) Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends® in Machine Learning, 3, 1–122.
  5. Bullmore E and Sporns O (2012) The economy of brain network organization. Nature Reviews Neuroscience, 13, 336.
  6. Caffo B, Chen S, Stewart W, Bolla K, Yousem D, Davatzikos C and Schwartz BS (2007) Are brain volumes based on magnetic resonance imaging mediators of the associations of cumulative lead dose with cognitive function? American Journal of Epidemiology, 167, 429–437.
  7. Chén OY, Crainiceanu C, Ogburn EL, Caffo BS, Wager TD and Lindquist MA (2017) High-dimensional multivariate mediation with application to neuroimaging data. Biostatistics, 19, 121–136.
  8. Desikan RS, Ségonne F, Fischl B, Quinn BT, Dickerson BC, Blacker D, Buckner RL, Dale AM, Maguire RP and Hyman BT (2006) An automated labeling system for subdividing the human cerebral cortex on MRI scans into gyral based regions of interest. NeuroImage, 31, 968–980.
  9. Ficek BN, Wang Z, Zhao Y, Webster KT, Desmond JE, Hillis AE, Frangakis C, Faria AV, Caffo B and Tsapkini K (2018) The effect of tDCS on functional connectivity in primary progressive aphasia. NeuroImage: Clinical, 19, 703–715.
  10. Glasser MF, Sotiropoulos SN, Wilson JA, Coalson TS, Fischl B, Andersson JL, Xu J, Jbabdi S, Webster M and Polimeni JR (2013) The minimal preprocessing pipelines for the Human Connectome Project. NeuroImage, 80, 105–124.
  11. Hebb DO (2005). The Organization of Behavior: A Neuropsychological Theory. London: Psychology Press.
  12. Huang Y-T and Pan W-C (2016) Hypothesis test of mediation effect in causal mediation model with high-dimensional continuous mediators. Biometrics, 72, 402–413.
  13. Kozlovskiy SA, Pyasik MM, Korotkova AV, Vartanov AV, Glozman JM and Kiselnikov AA (2014) Activation of left lingual gyrus related to working memory for schematic faces. International Journal of Psychophysiology, 2, 241.
  14. Lindquist MA (2012) Functional causal mediation analysis with an application to brain connectivity. Journal of the American Statistical Association, 107, 1297–1309.
  15. Mechelli A, Humphreys GW, Mayall K, Olson A and Price CJ (2000) Differential effects of word length and visual contrast in the fusiform and lingual gyri during reading. Proceedings of the Royal Society of London. Series B: Biological Sciences, 267, 1909–1913.
  16. Pearl J (2003) Causality: models, reasoning, and inference. Econometric Theory, 19, 675–685.
  17. Pinker S (2007). The Stuff of Thought: Language as a Window into Human Nature. Park Imperial, NY: Penguin.
  18. Race D, Tsapkini K, Crinion J, Newhart M, Davis C, Gomez Y, Hillis A and Faria AV (2013) An area essential for linking word meanings to word forms: evidence from primary progressive aphasia. Brain and Language, 127, 167–176.
  19. Richardson S, Tseng GC and Sun W (2016) Statistical methods in integrative genomics. Annual Review of Statistics and its Application, 3, 181–209.
  20. Rosenbaum PR (2002) Covariance adjustment in randomized experiments and observational studies. Statistical Science, 17, 286–327.
  21. Salmon E, Van der Linden M, Collette F, Delfiore G, Maquet P, Degueldre C, Luxen A and Franck G (1996) Regional brain activity during working memory tasks. Brain, 119, 1617–1625.
  22. Shaywitz BA, Shaywitz SE, Pugh KR, Constable RT, Skudlarski P, Fulbright RK, Bronen RA, Fletcher JM, Shankweiler DP and Katz L (1995) Sex differences in the functional organization of the brain for language. Nature, 373, 607.
  23. Smith SM, Jenkinson M, Woolrich MW, Beckmann CF, Behrens TE, Johansen-Berg H, Bannister PR, De Luca M, Drobnjak I, Flitney DE, et al. (2004) Advances in functional and structural MR image analysis and implementation as FSL. NeuroImage, 23, S208–S219.
  24. Sporns O (2007) Brain connectivity. Scholarpedia, 2, 4695.
  25. Sun WW and Li L (2017) STORE: sparse tensor response regression and neuroimaging analysis. The Journal of Machine Learning Research, 18, 4908–4944.
  26. Uludağ K and Roebroeck A (2014) General overview on the merits of multimodal neuroimaging data fusion. NeuroImage, 102, 3–10.
  27. Wallentin M, Roepstorff A, Glover R and Burgess N (2006) Parallel memory systems for talking about location and age in precuneus, caudate and Broca’s region. NeuroImage, 32, 1850–1864.
  28. Wang W, Nelson S and Albert JM (2013) Estimation of causal mediation effects for a dichotomous outcome in multiple-mediator models using the mediation formula. Statistics in Medicine, 32, 4211–4228.
  29. White H (1980) A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity. Econometrica, 48, 817–838.
  30. Zhang Z, Allen GI, Zhu H and Dunson D (2019) Tensor network factorizations: Relationships between brain structural connectomes and traits. NeuroImage, 197, 330–343.
  31. Zhao Y, Lindquist MA and Caffo BS (2020) Sparse principal component based high-dimensional mediation analysis. Computational Statistics & Data Analysis, 142, 106835.
  32. Zhao Y and Luo X (2016) Pathway lasso: estimate and select sparse mediation pathways with high dimensional mediators. Preprint. Available at: http://arXiv.org/abs/1603.07749.
  33. Zhao Y, Wang B, Mostofsky S, Caffo B and Luo X (2019) Covariate assisted principal regression for covariance matrix outcomes. Biostatistics.
  34. Zou H (2006) The adaptive lasso and its oracle properties. Journal of the American Statistical Association, 101, 1418–1429.
  35. Zou H and Hastie T (2005) Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67, 301–320.
