Abstract
Mediation analysis attempts to determine whether the relationship between an independent variable (e.g., exposure) and an outcome variable can be explained, at least partially, by an intermediate variable, called a mediator. Most methods for mediation analysis focus on one mediator at a time, although multiple mediators can be jointly analyzed by structural equation models that account for correlations among the mediators. We extend the use of structural equation models for analysis of multiple mediators by creating a sparse group lasso penalized model such that the penalty considers the natural groupings of parameters that determine mediation, as well as encourages sparseness of the model parameters. This provides a way to simultaneously evaluate many mediators and select those that have the most impact, a feature of modern penalized models. Simulations are used to illustrate the benefits and limitations of our approach, and application to a study of DNA methylation and reactive cortisol stress following childhood trauma discovered two novel methylation loci that mediate the association of childhood trauma scores with reactive cortisol stress levels. Our new methods are incorporated into R software called regmed.
Keywords: elastic net, graphical lasso, seemingly unrelated regression, sparse group lasso, structural equation models
Introduction
Understanding how biological variables are interconnected and influence human traits is a goal of many studies aimed at improving human health, such as genetic epidemiology studies that aim to determine the interplay of environmental factors, genetic variants, and biomarkers on the risk of disease. Many computational approaches have been developed to learn how variables are connected, under a broad framework called systems biology, such as Bayesian networks or probabilistic graphical models (Koller & Friedman, 2009). Although flexible, these approaches require very large sample sizes to provide robust estimates of the very large number of estimated parameters. In contrast, when specific hypotheses can be formulated about how variables are interconnected, a more focused approach can provide greater power and deeper biological insights. Mediation analysis provides this avenue of focused hypotheses. Our motivation to develop new statistical methods for mediation analysis is to determine if an exposure variable influences any of a large number of potential mediators, and whether any of the mediators in turn influence an outcome variable.
Mediation analysis attempts to determine whether the relationship between an independent variable (e.g., exposure) and an outcome variable can be explained, at least partially, by an intermediate variable, called a mediator. The concept of intermediate traits has a long history in genetic epidemiology, such as endophenotypes that link genetic variation with clinical symptoms in the study of psychiatric genetics (Gottesman & Gould, 2003), or more recent studies that evaluate the role of smoking behaviors as mediators between genetic variants and the risk of lung cancer (T. VanderWeele et al., 2012). An advantage of mediation analysis is that it can provide deeper biological insights about the potential causal mechanisms that lead to the observed association between exposure and outcome variables. Limitations of mediation analysis include the assumptions required for valid inference: the assumed causal order of variables in a model, no unmeasured confounding variables, independence of model residuals, and no interaction of exposure and mediators on the outcome variable (D. MacKinnon, 2008).
Statistical mediation models can be used to test whether an exposure variable causes variation in a mediator variable, which in turn causes variation in an outcome variable (D. MacKinnon, 2008; T. VanderWeele, 2015). For quantitative variables, linear regression models are often used to partition the total effect of an exposure variable on the outcome variable into a direct effect of exposure on the outcome, and an indirect effect that passes through the mediator (D. P. MacKinnon, Lockwood, Hoffman, West, & Sheets, 2002; T. J. VanderWeele, 2016). Traditional analysis of a single mediator is based on two regression models that involve the outcome ($y$), the mediator ($m$), and the exposure ($x$):
$$m = \alpha_0 + \alpha x + \varepsilon_m, \qquad y = \beta_0 + \beta m + \delta x + \varepsilon_y.$$
The parameter $\alpha$ links $x$ with $m$, and the parameter $\beta$ links $m$ with $y$, emphasizing that both $\alpha$ and $\beta$ are needed to complete the linkage from $x$ to $m$ to $y$. The parameter $\delta$ represents the direct effect of $x$ on $y$ (i.e., adjusted for the effect of the mediator). The null hypothesis of no mediation can be stated as $H_0: \alpha\beta = 0$, which implies that both $\alpha$ and $\beta$ must differ from zero in order to infer that mediation exists. A common statistical test for this hypothesis is based on the product of estimated coefficients, $\hat{\alpha}\hat{\beta}$ (D. P. MacKinnon et al., 2002). These regression models can be extended for multiple mediators by regressing $y$ simultaneously on all mediators, and regressing each mediator on exposure. The total indirect effect of all mediators is then estimated by the sum of products, $\sum_j \hat{\alpha}_j \hat{\beta}_j$. Because $y$ is simultaneously regressed on all mediators, the effects of the mediators are adjusted for each other, with the result that the total indirect effect does not require knowing the ordering of the causal effects of the mediators on each other (T. J. VanderWeele & Vansteelandt, 2014b). These methods, like many others developed for mediation analysis, are based on parametric models, often linear models. More general models, and nonparametric approaches, have been recently developed (Imai, Keele, & Tingley, 2010; Imai, Keele, & Yamamoto, 2010), yet are limited to a small number of mediators.
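As a minimal sketch of the single-mediator decomposition described above, the following pure-Python example fits the two regressions by ordinary least squares and checks that the total effect equals the direct effect plus the product of coefficients. The function name and the toy data are hypothetical and made up for illustration only; this is not the regmed implementation.

```python
def center(v):
    """Subtract the mean so that intercepts drop out of the regressions."""
    mu = sum(v) / len(v)
    return [vi - mu for vi in v]


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def single_mediator_effects(x, m, y):
    """Return (alpha, beta, delta, total) from the two mediation regressions."""
    x, m, y = center(x), center(m), center(y)
    # Regression 1: m on x gives alpha.
    alpha = dot(x, m) / dot(x, x)
    # Regression 2: y on (m, x); solve the 2x2 normal equations by Cramer's rule.
    smm, smx, sxx = dot(m, m), dot(m, x), dot(x, x)
    smy, sxy = dot(m, y), dot(x, y)
    det = smm * sxx - smx * smx
    beta = (smy * sxx - smx * sxy) / det
    delta = (smm * sxy - smx * smy) / det
    # Total effect: regression of y on x alone.
    total = sxy / sxx
    return alpha, beta, delta, total


# Hypothetical toy data (not from the paper).
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
m = [0.1, 0.9, 2.3, 2.8, 4.2, 4.9]
y = [0.3, 1.1, 2.0, 3.6, 4.1, 5.8]

alpha, beta, delta, total = single_mediator_effects(x, m, y)
indirect = alpha * beta  # product-of-coefficients estimate of the indirect effect
print(f"indirect={indirect:.4f} direct={delta:.4f} total={total:.4f}")
```

For least squares fits on the same sample, the identity total = direct + indirect holds exactly, which is why the product of coefficients is a natural measure of the indirect effect.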
Recent efforts to model high-dimensional biomarkers (e.g., metabolite levels or gene expression) that could potentially act as mediators have focused on latent factors, whereby the unobserved latent factors are treated as mediators, and the latent factors influence the high-dimensional biomarkers (Albert, Geng, & Nelson, 2016; Derkach, Pfeiffer, Chen, & Sampson, 2019; Huang & Pan, 2016). This approach allows the latent factors to be a weighted combination of the high-dimensional variables, hence reducing the dimension of the mediation models. Although computationally advantageous, this approach does not identify the individual biomarkers that directly mediate the effect of exposure on the outcome.
Alternatively, structural equation models (SEMs) are preferred for multiple mediators because they can be used to simultaneously estimate the parameters representing direct and indirect effects, they provide explicit ways to model the relationships among the variables, and they account for the correlations among the mediators (Hoyle, 2012; Preacher & Hayes, 2008; T. J. VanderWeele & Vansteelandt, 2014a). This approach offers several advantages. First, the regression of the outcome on all mediators, and the regression of each mediator on exposure, can all be achieved with a single model, allowing one to assess the fit of a model and to contrast it with competing models. Second, the correlations among the residuals after regressing each mediator on exposure can be incorporated into the model to improve statistical efficiency. Although SEMs allow both measured variables and hypothetical latent variables, when only measured variables are included in the model, which is the case for standard mediation analysis, these types of models reduce to the approach of path analysis (Li, 1975), originated by Sewall Wright (Wright, 1921, 1923). Our approach is based on the framework of SEMs without latent variables, but penalized so that a large number of mediators can be simultaneously fit to data. Prior penalized methods have been developed for general SEMs (Jacobucci, Grimm, & McArdle, 2016; Serang, Jacobucci, Brimhall, & Grimm, 2017), but the generality of the models can lead to difficulties in finding an optimal solution and to long computation times, and the corresponding R package regsem failed to converge for the data on DNA methylation in our applications. To overcome these limitations, we propose new types of penalties and more efficient computational algorithms tailored for mediation analyses.
Methods
Notation and Statistical Models
Let $x_i$ denote a quantitative exposure variable for the $i$th subject ($i = 1, \dots, n$), let $y_i$ denote a quantitative outcome variable, and let $m_{ij}$ denote the $j$th quantitative mediator ($j = 1, \dots, k$). The assumed relationships among these random variables are depicted in the directed acyclic graph (DAG) of Figure 1. A complete graphical representation of an SEM would include double-headed arrows that denote variances and covariances. Although these are not presented in Figure 1 to simplify its presentation, we include in our models the variance of $x$ ($\sigma_x^2$), the variance of $y$ ($\sigma_y^2$), and the covariance matrix for the mediators, $\Sigma_m$. We assume that the residuals of the mediators are independent from the residuals of $y$ and independent from $x$. Based on these assumptions, our SEM approach will account for the covariance structure of all variables. Let the random vector of all variables be arranged as $v = (x, m^T, y)^T$, where $m = (m_1, \dots, m_k)^T$ is the vector of all mediators for a subject. We assume that $v$ is centered about its mean and has a multivariate normal distribution with covariance matrix $\Sigma$. The structure of $\Sigma$ is determined by the assumed SEM.
Figure 1.
Directed acyclic graph depicting assumed relationships among exposure ($x$), mediators ($m_1, \dots, m_k$), and outcome ($y$). The parameters $\alpha_j$ and $\beta_j$ represent the indirect effect of $x$ on $y$ via the mediator $m_j$ ($\alpha_j$ the effect of $x$ on $m_j$; $\beta_j$ the effect of $m_j$ on $y$), and $\delta$ represents the direct effect of $x$ on $y$.
With this setup, we can use the reticular action model (RAM), which provides a one-to-one mapping of the parameters in a graphical representation of an SEM to matrices that determine the implied covariance matrix (Jacobucci et al., 2016; McArdle, 2005). The general RAM allows for latent variables, which do not exist in our proposed mediation model, so these terms are ignored in our presentation. Two RAM matrices are needed. The asymmetric matrix $A$ represents the parameters for the directional effects of one-headed arrows (conceptually, column labels of $A$ point down to row labels of $A$). The symmetric matrix $S$ represents all two-headed arrows (i.e., all variances and covariances). Based on how variables are arranged in the vector $v$ and the parameters in the SEM of Figure 1, the matrix $A$ is represented below, with row and column labels for the variables and with 0's denoted as empty cells:
$$
A = \begin{array}{c|ccccc}
 & x & m_1 & \cdots & m_k & y \\ \hline
x & & & & & \\
m_1 & \alpha_1 & & & & \\
\vdots & \vdots & & & & \\
m_k & \alpha_k & & & & \\
y & \delta & \beta_1 & \cdots & \beta_k &
\end{array}
$$
Because $x$ points to the mediators, the first column contains the $\alpha_j$ parameters; the direct effect of $x$ pointing to $y$ implies the first column also contains $\delta$. With each of the mediators pointing to $y$, the last row contains the $\beta_j$ parameters. All other cells of matrix $A$ contain 0.
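The symmetric matrix $S$, for variances and covariances, is block diagonal with scalar $\sigma_x^2$, symmetric matrix $\Sigma_m$, and scalar $\sigma_y^2$ along the diagonal:
$$
S = \begin{pmatrix}
\sigma_x^2 & 0 & 0 \\
0 & \Sigma_m & 0 \\
0 & 0 & \sigma_y^2
\end{pmatrix}
$$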
Based on the above matrices $A$ and $S$, the implied covariance matrix can be expressed as $\Sigma = (I - A)^{-1} S (I - A)^{-T}$, where $I$ is the identity matrix. In the Appendix we illustrate how the special structure of these matrices allows analytic formulas to rapidly compute $\Sigma$.
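To make the RAM construction concrete, the sketch below builds $A$ and $S$ for a small model and computes the implied covariance matrix $(I - A)^{-1} S (I - A)^{-T}$. It relies on the observation that, with variables ordered as exposure, mediators, outcome, the longest directed path has two edges, so $A^3 = 0$ and $(I - A)^{-1} = I + A + A^2$. This shortcut is an illustration chosen here and is not necessarily the analytic formula given in the Appendix; the function names and parameter values are hypothetical.

```python
def matmul(a, b):
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def build_ram(alpha, beta, delta, var_x, sigma_m, var_y):
    """Return RAM matrices (A, S) for variables ordered as (x, m_1..m_k, y)."""
    k = len(alpha)
    p = k + 2
    A = [[0.0] * p for _ in range(p)]
    S = [[0.0] * p for _ in range(p)]
    for j in range(k):
        A[1 + j][0] = alpha[j]       # x -> m_j (first column)
        A[p - 1][1 + j] = beta[j]    # m_j -> y (last row)
        for l in range(k):
            S[1 + j][1 + l] = sigma_m[j][l]
    A[p - 1][0] = delta              # x -> y (direct effect)
    S[0][0] = var_x
    S[p - 1][p - 1] = var_y
    return A, S


def implied_cov(A, S):
    """Implied covariance (I - A)^{-1} S (I - A)^{-T}, using A^3 = 0."""
    p = len(A)
    A2 = matmul(A, A)
    B = [[(1.0 if i == j else 0.0) + A[i][j] + A2[i][j] for j in range(p)]
         for i in range(p)]          # B = I + A + A^2 = (I - A)^{-1}
    return matmul(matmul(B, S), transpose(B))


# Hypothetical parameter values for two mediators.
alpha, beta, delta = [0.5, 0.2], [0.3, 0.4], 0.1
A, S = build_ram(alpha, beta, delta, var_x=2.0,
                 sigma_m=[[1.0, 0.0], [0.0, 1.0]], var_y=1.0)
sigma = implied_cov(A, S)
```

In this example the implied covariance of exposure and outcome is the exposure variance times the total effect, that is, the direct effect plus the sum of products of the mediator coefficients.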
The minus twice log likelihood divided by $n$, omitting an additive constant, is
$$\text{lnlike} = \ln|\Sigma| + \operatorname{tr}\left(C\,\Sigma^{-1}\right), \tag{1}$$
where $\ln|\Sigma|$ is the log of the determinant of $\Sigma$, $\operatorname{tr}(\cdot)$ is the trace of a matrix, and $C$ is the sample covariance matrix of $v$. Note that the lnlike in expression (1) is the same as that used for sparse estimation of variances and covariances with penalties imposed on the sum of the absolute values of the elements of the inverse covariance matrix (Friedman, Hastie, & Tibshirani, 2008), as implemented in the popular glasso R package. The difference in our approach is the use of the SEM to structure the implied covariance matrix, and the types of penalties we impose on the parameters.
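The following sketch evaluates the Gaussian fit function of expression (1), the log determinant of the implied covariance plus the trace of the sample covariance times the inverse implied covariance. It uses a small Gauss-Jordan routine so that it needs only the standard library; the function names are hypothetical, and a production implementation would use an optimized linear algebra library instead.

```python
import math


def logdet_and_inverse(mat):
    """Gauss-Jordan elimination with partial pivoting.

    Returns (log|det|, inverse); intended for small positive definite matrices.
    """
    n = len(mat)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(mat)]
    logdet = 0.0
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(aug[r][c]))
        if abs(aug[piv][c]) < 1e-12:
            raise ValueError("matrix is numerically singular")
        aug[c], aug[piv] = aug[piv], aug[c]
        d = aug[c][c]
        logdet += math.log(abs(d))
        aug[c] = [v / d for v in aug[c]]
        for r in range(n):
            if r != c and aug[r][c] != 0.0:
                f = aug[r][c]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[c])]
    return logdet, [row[n:] for row in aug]


def lnlike(sigma, sample_cov):
    """ln|Sigma| + tr(C Sigma^{-1}) for implied Sigma and sample covariance C."""
    logdet, inv = logdet_and_inverse(sigma)
    p = len(sigma)
    trace = sum(sample_cov[i][j] * inv[j][i] for i in range(p) for j in range(p))
    return logdet + trace


# Hypothetical 2x2 example: the fit function is smallest when Sigma equals C.
C = [[2.0, 0.0], [0.0, 0.5]]
at_truth = lnlike(C, C)
elsewhere = lnlike([[1.0, 0.0], [0.0, 1.0]], C)
```

When the implied covariance equals the sample covariance, the value reduces to the log determinant plus the number of variables, which is the minimum over all positive definite matrices.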
Recently, Zhao and Luo (2016) proposed a penalty for high-dimensional mediators that depended on the absolute value of the product of coefficients for a mediator, $|\alpha_j \beta_j|$. This parallels the single-mediator Sobel statistic that depends on the product of coefficients. Although the complete penalty of Zhao and Luo (2016) had more terms, and their loss function was not the same as the log likelihood in expression (1), simulations by Duncan Thomas (personal communication), based on a Bayesian version of the Zhao and Luo product of coefficients, showed the limitation of basing penalties on the product. If $\alpha_j$ is large yet $\beta_j$ is small, the product would also be small, resulting in a small penalty on $\alpha_j$ (and vice versa). This can result in the non-mediators being shrunk less than the mediators. This imbalance of shrinkage between mediators and non-mediators was caused by using $|\alpha_j \beta_j|$ in the penalty function. As an alternative penalized SEM, Jacobucci et al. (2016) used either an L1 penalty (e.g., $\lambda \sum_j |\theta_j|$) or an L2 penalty (e.g., $\lambda \sum_j \theta_j^2$) on all parameters $\theta_j$, for general penalized SEMs. In contrast, we propose a different type of penalty, based on the sparse group lasso, that is tailored for mediation analyses.
Because each mediator has two coefficients, $\alpha_j$ and $\beta_j$, we treat the pair as a group, and impose a sparse group penalty function, while carefully balancing the penalty placed on the many mediators versus the single coefficient $\delta$ that represents the direct effect of $x$ on $y$. The proposed penalty function is
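$$P_\lambda(\alpha, \beta, \delta) = \lambda \left[ w\,|\delta| + \sum_{j=1}^{k} \left\{ (1 - \kappa) \sqrt{\alpha_j^2 + \beta_j^2} + \kappa \left( |\alpha_j| + |\beta_j| \right) \right\} \right] \tag{2}$$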
The penalty parameter $\lambda$ governs the overall amount of shrinkage, $w$ is a weight to balance the shrinkage of the direct effect versus all the mediators, and $\kappa$ is the fraction of the penalty assigned to a lasso L1 penalty. This fraction provides sparseness in selection of either $\alpha_j$ or $\beta_j$. The grouping is encouraged by the L2 terms (square root of sums of squares) (T. Wu & Lange, 2008). Adding expressions (1) and (2) results in the penalized lnlike that we need to minimize,
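As a hedged illustration of a sparse group penalty of this kind, the sketch below combines a group L2 term and a lasso L1 term for each mediator pair with a separately weighted L1 term for the direct effect. The exact arrangement of the weights is an assumption made for this example and may differ from the published expression (2) and from the regmed code; the function and argument names are hypothetical.

```python
import math


def sparse_group_penalty(alpha, beta, delta, lam, frac_lasso, wt_delta):
    """Assumed form: lam * [wt_delta*|delta| + (1 - frac)*group + frac*lasso]."""
    group = sum(math.sqrt(a * a + b * b) for a, b in zip(alpha, beta))
    lasso = sum(abs(a) + abs(b) for a, b in zip(alpha, beta))
    return lam * (wt_delta * abs(delta)
                  + (1.0 - frac_lasso) * group
                  + frac_lasso * lasso)


def product_penalty(alpha, beta, lam):
    """Product-of-coefficients penalty, shown only for contrast."""
    return lam * sum(abs(a * b) for a, b in zip(alpha, beta))


# A non-mediator with a large alpha and a zero beta: the product penalty
# vanishes, whereas the sparse group penalty still shrinks the large alpha.
print(product_penalty([1.0], [0.0], lam=1.0))
print(sparse_group_penalty([1.0], [0.0], 0.0, lam=1.0, frac_lasso=0.8, wt_delta=0.5))
```

The contrast in the last two lines mirrors the imbalance described for product-based penalties: a coefficient paired with a null partner escapes shrinkage under the product, but not under the group and lasso terms.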
$$\text{lnlike}_{\text{pen}} = \text{lnlike} + P_\lambda(\alpha, \beta, \delta). \tag{3}$$
Optimization Methods
The parameters to update in order to minimize the penalized lnlike are the penalized parameters $\alpha_j$, $\beta_j$, and $\delta$, and the un-penalized parameters $\sigma_x^2$ and $\sigma_y^2$. The covariance matrix for the mediators, $\Sigma_m$, is also unknown. However, optimizing the penalized lnlike for all the terms in the matrix $\Sigma_m$ can lead to long computation times and numerical instability. For this reason, we first use concepts from seemingly unrelated regression models (Zellner, 1962) to provide a robust estimate of $\Sigma_m$, and then perform a single regularization of this matrix in order to assure that it is of full rank. This matrix is then fixed during our optimization algorithm.
To estimate $\Sigma_m$, we first perform ordinary least squares regression of each $m_j$ on $x$ to create a vector of residuals for each subject (a vector of length $k$ for the $k$ mediators). These residuals are used to estimate the sample variance matrix $\hat{\Sigma}_m$, a method used by seemingly unrelated regression (Zellner, 1962) when computing generalized least squares. Because this matrix is not full rank when some mediators are highly correlated, or when $k > n$, we regularize this matrix by using the glasso R package (Friedman et al., 2008). This places an L1 penalty on terms in the inverse of $\hat{\Sigma}_m$, shrinking small values to zero, based on a penalty parameter. We use a small penalty (0.02) in order to achieve a full rank matrix, denoted $\tilde{\Sigma}_m$.
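The first stage of this procedure, regressing each mediator on exposure and forming the covariance matrix of the residuals, can be sketched as follows. The function name is hypothetical, the divisor $n$ is an assumption (a degrees-of-freedom correction could be used instead), and the subsequent graphical lasso regularization is deliberately omitted because it requires a dedicated solver.

```python
def residual_covariance(x, mediators):
    """Covariance of residuals after regressing each mediator on x.

    x: list of n exposure values.
    mediators: list of n rows, each a list of k mediator values.
    Returns a k x k matrix (divisor n; an assumption of this sketch).
    """
    n, k = len(x), len(mediators[0])
    xbar = sum(x) / n
    xc = [xi - xbar for xi in x]
    sxx = sum(v * v for v in xc)
    resid = []
    for j in range(k):
        col = [row[j] for row in mediators]
        mbar = sum(col) / n
        mc = [v - mbar for v in col]
        slope = sum(a * b for a, b in zip(xc, mc)) / sxx  # OLS slope of m_j on x
        resid.append([mi - slope * xi for mi, xi in zip(mc, xc)])
    return [[sum(a * b for a, b in zip(resid[j], resid[l])) / n
             for l in range(k)] for j in range(k)]


# Hypothetical toy data: the first mediator is exactly linear in x, so its
# residual variance is zero and the matrix is not full rank.
x = [0.0, 1.0, 2.0, 3.0]
mediators = [[1.0, 1.0], [3.0, -1.0], [5.0, 1.0], [7.0, -1.0]]
cov = residual_covariance(x, mediators)
```

The toy data show why a regularization step is needed: a mediator that is perfectly explained by exposure, or more mediators than subjects, leaves the residual covariance matrix singular.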
To estimate the SEM parameters, we use an algorithm that iteratively updates $\delta$, $\alpha_j$ and $\beta_j$ (for $j = 1, \dots, k$), $\sigma_x^2$, and $\sigma_y^2$. Our algorithm closely follows the approach developed for sparse group lasso (Simon, Friedman, Hastie, & Tibshirani, 2013), but adapted for our proposed SEM. Some key aspects of this gradient descent algorithm are: 1) the use of sub-gradient equations to account for penalty functions that are not differentiable for some values of the parameters, such as the sparse group penalty; 2) use of majorization-minimization, a way to expand the objective function such that the new function dominates the original objective function, yet minimizing the new function is easier and leads to the same solution (Lange, 2004).
The algorithm is a sequence of nested loops. The outer loop cycles over several inner loops: an inner loop to update $\delta$, inner loops to update each pair of $\alpha_j$ and $\beta_j$, an inner loop to update $\sigma_x^2$, and an inner loop to update $\sigma_y^2$. When updating parameters within a loop, all other parameters remain fixed. Each of the inner loops follows the same algorithm, with differences determined by how the derivatives of the penalized lnlike differ across the different parameters, and how the parameters are updated. Figure 2 illustrates the steps of the inner loop for updating a pair $\alpha_j$ and $\beta_j$ to illustrate the general algorithm, and details for derivatives and parameter updates for all parameters are provided in the Appendix.
Figure 2.
Steps of inner loop of iterative algorithm, to update a pair of parameters, $\alpha_j$ and $\beta_j$.
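A single update of one coefficient pair can be sketched as a generalized gradient step in the style of the sparse group lasso algorithm of Simon et al. (2013): take a gradient step on the smooth part, soft-threshold each coordinate for the L1 term, then shrink the pair jointly for the group term. This is an assumed simplification, not the regmed source: the gradients are passed in rather than derived from the SEM, the step size is fixed rather than chosen by the majorization check, and all names are hypothetical.

```python
import math


def soft_threshold(z, t):
    """Shrink z toward zero by t, returning exactly 0 when |z| <= t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def update_pair(alpha, beta, grad_alpha, grad_beta, step, lam, frac_lasso):
    """One proximal gradient update of the pair (alpha_j, beta_j)."""
    # Gradient step on the smooth (unpenalized) part of the objective.
    za = alpha - step * grad_alpha
    zb = beta - step * grad_beta
    # L1 part: coordinate-wise soft-thresholding gives within-pair sparsity.
    sa = soft_threshold(za, step * lam * frac_lasso)
    sb = soft_threshold(zb, step * lam * frac_lasso)
    norm = math.hypot(sa, sb)
    if norm == 0.0:
        return 0.0, 0.0
    # Group part: joint shrinkage that can set the whole pair to zero.
    shrink = max(0.0, 1.0 - step * lam * (1.0 - frac_lasso) / norm)
    return shrink * sa, shrink * sb


# Hypothetical values: a weak pair is zeroed, a strong pair is only shrunk.
print(update_pair(0.0, 0.0, 0.1, -0.1, step=1.0, lam=1.0, frac_lasso=0.8))
print(update_pair(3.0, 4.0, 0.0, 0.0, step=1.0, lam=1.0, frac_lasso=0.0))
```

The two stages correspond to the two roles of the penalty: the soft-threshold can remove one member of a pair, while the joint shrinkage removes the pair as a unit.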
We created an R package (regmed) that implements the optimization algorithm with efficient C++ code, based on the linear algebra library in the package RcppArmadillo. To choose a value of the penalty parameter $\lambda$, we search a grid of values, starting with a large value that shrinks all penalized parameters to zero. The estimated parameters for a specified value of $\lambda$ are used as initial values for the next smaller grid value of $\lambda$. The grid value that results in the smallest Bayesian Information Criterion (BIC) is chosen for the best fitting model. Although our models are conceptually similar to regularization of general SEMs (Jacobucci et al., 2016), our sparse group lasso penalty for mediation groupings, the details of our iterative algorithm, and our computational efficiencies make regmed unique. As a confirmation of our code, we compared results from our regmed software without any penalties to the sem function within the lavaan R package that fits unpenalized SEMs, and found they give numerically equivalent results.
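The grid search with warm starts can be sketched generically as below. The `fit` and `bic` arguments are hypothetical callables standing in for the model fitting and BIC computation; neither is part of the regmed interface, and the stand-in functions in the usage lines exist only to exercise the loop.

```python
def fit_path(grid, fit, bic, init):
    """Fit along a decreasing penalty grid with warm starts.

    grid: penalty values sorted from largest to smallest.
    fit(lam, start): returns fitted parameters, started from `start`.
    bic(params): returns the criterion to minimize.
    Returns (best_lambda, best_params, best_bic).
    """
    best = None
    params = init
    for lam in grid:
        # Warm start: the previous solution initializes the next, smaller penalty.
        params = fit(lam, params)
        score = bic(params)
        if best is None or score < best[2]:
            best = (lam, params, score)
    return best


# Stand-in functions for illustration only: "fitting" returns the penalty
# itself and the criterion is smallest at a penalty of 1.
starts = []


def toy_fit(lam, start):
    starts.append(start)
    return lam


def toy_bic(params):
    return (params - 1.0) ** 2


best_lam, best_params, best_score = fit_path([4.0, 2.0, 1.0, 0.5], toy_fit, toy_bic, init=0.0)
```

Starting from the largest penalty keeps the early fits cheap, because most coefficients are zero there, and each subsequent fit begins close to its solution.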
Simulations
To evaluate the statistical properties of our methods, we simulated multivariate normal data with an implied covariance matrix determined by a set of assumed models. We assumed an exposure variable $x$ with variance $\sigma_x^2$, an outcome variable $y$ with variance $\sigma_y^2$, and a direct effect $\delta$ of $x$ on $y$. We assumed 20 potential mediators with a compound symmetric covariance structure, with diagonal variances of 1 and off-diagonal covariances all equal to $\rho$ ($\rho = 0$, 0.2, or 0.5). Our simulations allowed for the presence or absence of effects due to mediators or non-mediators. The general model is depicted in Figure 3. The variables $m_1$ and $m_2$ were allowed to either have no effect ($\alpha_1, \alpha_2, \beta_1, \beta_2$ all 0) or be true mediators ($\alpha_1, \alpha_2, \beta_1, \beta_2$ all non-zero). The variables $m_3, \dots, m_6$ either had no effect or had effects that were not mediation (i.e., either $x$ influenced $m_j$ or $m_j$ influenced $y$, but not both). In Figure 3, the effects of the mediators (red arrows) are denoted $\alpha_1, \alpha_2$ and $\beta_1, \beta_2$. In contrast, the effects of the non-mediators (blue arrows) are denoted $\alpha_3, \alpha_4$ and $\beta_5, \beta_6$. The magnitude of the effect sizes varied from no effect (0) to small effect (0.1) to large effect (0.5). All other terms ($m_7, \dots, m_{20}$) were not influenced by $x$ and did not influence $y$. We simulated sample sizes of either 100 or 500, and analyzed the data with the fraction ($\kappa$) of the penalty assigned to lasso set to values of 0.5, 0.8, 0.9, and 0.99. For all simulations, the weight $w$ that balances the shrinkage of the direct effect versus all the mediators was held at a single fixed value. For each scenario, we repeated 100 simulations in order to estimate the frequency for which the mediators were selected (either as false or true positives), and the frequency for which the non-mediators were falsely selected to be mediators, or correctly selected as non-mediators.
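One way to generate data under such a design is sketched below: mediator residuals are drawn with a compound symmetric covariance via a Cholesky factor, and the outcome is built from the exposure and the mediators. The standard deviations of exposure and outcome, the direct effect, and the effect sizes in the usage lines are hypothetical placeholders, not the values used in the simulations, and all function names are illustrative.

```python
import math
import random


def compound_symmetric(k, rho):
    """k x k matrix with unit diagonal and all off-diagonal entries rho."""
    return [[1.0 if i == j else rho for j in range(k)] for i in range(k)]


def cholesky(a):
    """Lower-triangular L with L L^T = a, for a positive definite matrix a."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][t] * L[j][t] for t in range(j))
            if i == j:
                L[i][j] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L


def simulate_subject(alpha, beta, delta, chol_m, rng, sd_x=1.0, sd_y=1.0):
    """Draw (x, m, y) for one subject under the linear mediation model."""
    k = len(alpha)
    x = rng.gauss(0.0, sd_x)
    z = [rng.gauss(0.0, 1.0) for _ in range(k)]
    e_m = [sum(chol_m[j][t] * z[t] for t in range(j + 1)) for j in range(k)]
    m = [alpha[j] * x + e_m[j] for j in range(k)]
    y = delta * x + sum(beta[j] * m[j] for j in range(k)) + rng.gauss(0.0, sd_y)
    return x, m, y


# Hypothetical scenario: m1 and m2 are mediators, m3 and m4 are influenced by
# x only, m5 and m6 influence y only, and m7..m20 are null.
k = 20
alpha = [0.3, 0.3, 0.5, 0.5] + [0.0] * (k - 4)
beta = [0.3, 0.3, 0.0, 0.0, 0.5, 0.5] + [0.0] * (k - 6)
chol_m = cholesky(compound_symmetric(k, 0.2))
rng = random.Random(2020)
data = [simulate_subject(alpha, beta, 0.5, chol_m, rng) for _ in range(100)]
```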
Figure 3.
Directed acyclic graph illustrating model used in simulations. Exposure $x$ influenced mediators $m_1$ and $m_2$ with effect sizes $\alpha_1$ and $\alpha_2$, and these mediators influenced $y$ with effect sizes $\beta_1$ and $\beta_2$ (red lines indicate mediation paths). Exposure influenced non-mediators $m_3$ and $m_4$ with effect sizes $\alpha_3$ and $\alpha_4$. Also, the non-mediators $m_5$ and $m_6$ influenced $y$ with effect sizes $\beta_5$ and $\beta_6$ (blue lines indicate effects that are not mediation). The direct effect of $x$ on $y$ is denoted $\delta$.
Application: DNA Methylation and Reactive Cortisol Stress Following Childhood Trauma
Houtepen et al. (2016) examined the association of genome-wide DNA methylation levels with cortisol stress reactivity, and reported that a methylation locus (cg27512205) in the gene KITLG was strongly associated with cortisol levels, and that this locus mediated 32% of the influence of childhood trauma on cortisol stress reactivity. However, their analytic strategy was focused on first finding a methylation locus most strongly associated with cortisol levels, which can limit finding mediation effects, because mediators require both the association of the exposure variable (childhood trauma) with a mediator and the association of the mediator with the outcome (cortisol level). Furthermore, their approach was based on analyzing one methylation locus at a time. Marginal associations ignore the correlations among mediators and can miss mediators that are relevant when conditioning on other mediators (Guyon & Elisseeff, 2003). To evaluate the utility of our proposed penalized mediation model, we applied it to the publicly available data of Houtepen et al. (2016).
The data are briefly summarized, with full details provided elsewhere (Houtepen et al., 2016). The discovery data that we analyzed included 85 healthy subjects whose cortisol levels were measured at eight 15-minute intervals before, during, and after stress was induced by the Trier Social Stress Test (TSST). The TSST required subjects to prepare and deliver a speech, and verbally respond to arithmetic problems in front of an audience. The increase in cortisol level induced by stress was measured as the increase in the cortisol area under the curve (AUCi), and this was used as the outcome variable. The exposure variable was based on a Childhood Trauma Questionnaire. The potential mediators were the processed 385,882 DNA methylation loci based on the Illumina Infinium Human Methylation 450K BeadChip (normalized, and based on M values, the log2 ratio of methylated to unmethylated probe intensities).
Results
Simulation Results
Simulation results for when there were no mediation effects are summarized in Table 1. The frequency of falsely selecting either $m_1$ or $m_2$ as mediators was often 0, and no greater than 0.05 for sample size of 100, and always 0 for sample size of 500. When the non-mediators (e.g., $m_3$) had weak effects, they were occasionally (frequency no greater than 0.03) misclassified as being mediators for sample size of 100, but never falsely selected as mediators for sample size of 500. When the non-mediators had strong effects, for example, when the true parameters were $\alpha_3 = 0.5$ and $\beta_3 = 0$, the estimated parameters $\hat{\alpha}_3$ and $\hat{\beta}_3$ were sometimes both non-zero, falsely inferring $m_3$ to be a mediator. The frequency of falsely selecting any variables as mediators was largest for sample size of 100, when correlation among mediators was largest (up to 0.5), and when the fraction of lasso penalty was 0.5. In contrast, for sample size of 500, the frequency of falsely inferring any variables as mediators dramatically decreased when the fraction of lasso penalty was large (e.g., at least 0.8) and the correlation among mediators was less than 0.5. For example, when the correlation among mediators was 0.2, the frequency of falsely selecting any mediators was no greater than 0.07 when using a lasso fraction of at least 0.8. For the larger correlation of 0.5, a lasso fraction of 0.99 was needed to control the frequency of false selection of any mediators to be 0.10. In summary, factors that decreased the frequency of falsely selecting any mediators were a larger sample size and, when the correlation among mediators was large, a larger fraction of lasso penalty.
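Selection frequencies of this kind can be tallied as in the sketch below, which assumes that a variable counts as a selected mediator when both of its estimated coefficients are non-zero, and that a frequency is the fraction of replicates in which at least one variable in a set is selected. Both the selection rule and the function names are assumptions made for illustration; the replicate estimates shown are made up.

```python
def is_selected_mediator(alpha_hat, beta_hat, j, tol=0.0):
    """Assumed rule: mediator j is selected when both estimates are non-zero."""
    return abs(alpha_hat[j]) > tol and abs(beta_hat[j]) > tol


def frequency_any_selected(fits, indices):
    """Fraction of replicates selecting at least one mediator among `indices`.

    fits: list of (alpha_hat, beta_hat) pairs, one per simulated data set.
    """
    hits = sum(
        1 for alpha_hat, beta_hat in fits
        if any(is_selected_mediator(alpha_hat, beta_hat, j) for j in indices)
    )
    return hits / len(fits)


# Hypothetical estimates from four replicates with two candidate mediators.
fits = [
    ([0.3, 0.0], [0.2, 0.1]),
    ([0.0, 0.0], [0.0, 0.4]),
    ([0.1, 0.2], [0.0, 0.3]),
    ([0.0, 0.0], [0.0, 0.0]),
]
freq_first = frequency_any_selected(fits, [0])
freq_either = frequency_any_selected(fits, [0, 1])
```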
Table 1.
Simulation results for when there are no true mediators, but possible effects of non-mediators.
| Non-Mediator Effect Size | Mediator α | Mediator β | Non-Mediator α | Non-Mediator β | Med. Cor. | Frac. Lasso | N=100: Frq. Select m1 or m2 | N=100: Frq. Select m3,…,m20 | N=100: Frq. Select α3 or α4 | N=100: Frq. Select β5 or β6 | N=100: Frq. Select α7,…,α20 | N=100: Frq. Select β7,…,β20 | N=500: Frq. Select m1 or m2 | N=500: Frq. Select m3,…,m20 | N=500: Frq. Select α3 or α4 | N=500: Frq. Select β5 or β6 | N=500: Frq. Select α7,…,α20 | N=500: Frq. Select β7,…,β20 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Absent | 0 | 0 | 0 | 0 | 0 | 0.5 | 0 | 0 | 0 | 0 | 0.01 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Absent | 0 | 0 | 0 | 0 | 0 | 0.8 | 0 | 0 | 0.01 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Absent | 0 | 0 | 0 | 0 | 0 | 0.9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Absent | 0 | 0 | 0 | 0 | 0 | 0.99 | 0 | 0 | 0.01 | 0 | 0.04 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Absent | 0 | 0 | 0 | 0 | 0.2 | 0.5 | 0 | 0 | 0.02 | 0 | 0.02 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Absent | 0 | 0 | 0 | 0 | 0.2 | 0.8 | 0 | 0 | 0 | 0 | 0.01 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Absent | 0 | 0 | 0 | 0 | 0.2 | 0.9 | 0 | 0 | 0 | 0 | 0.01 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Absent | 0 | 0 | 0 | 0 | 0.2 | 0.99 | 0 | 0 | 0 | 0 | 0.02 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Absent | 0 | 0 | 0 | 0 | 0.5 | 0.5 | 0 | 0 | 0 | 0 | 0.01 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Absent | 0 | 0 | 0 | 0 | 0.5 | 0.8 | 0 | 0.01 | 0 | 0 | 0.04 | 0.01 | 0 | 0 | 0 | 0 | 0 | 0 |
| Absent | 0 | 0 | 0 | 0 | 0.5 | 0.9 | 0 | 0 | 0.01 | 0 | 0.02 | 0.01 | 0 | 0 | 0 | 0 | 0 | 0 |
| Absent | 0 | 0 | 0 | 0 | 0.5 | 0.99 | 0 | 0 | 0 | 0 | 0.02 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Weak | 0 | 0 | 0.1 | 0.1 | 0 | 0.5 | 0 | 0 | 0.01 | 0 | 0.01 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Weak | 0 | 0 | 0.1 | 0.1 | 0 | 0.8 | 0 | 0.01 | 0.04 | 0 | 0.04 | 0.01 | 0 | 0 | 0 | 0 | 0 | 0 |
| Weak | 0 | 0 | 0.1 | 0.1 | 0 | 0.9 | 0 | 0 | 0.03 | 0 | 0.03 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Weak | 0 | 0 | 0.1 | 0.1 | 0 | 0.99 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Weak | 0 | 0 | 0.1 | 0.1 | 0.2 | 0.5 | 0 | 0.01 | 0.01 | 0.01 | 0 | 0.01 | 0 | 0 | 0 | 0 | 0 | 0 |
| Weak | 0 | 0 | 0.1 | 0.1 | 0.2 | 0.8 | 0 | 0.02 | 0.03 | 0 | 0.01 | 0.01 | 0 | 0 | 0 | 0 | 0 | 0 |
| Weak | 0 | 0 | 0.1 | 0.1 | 0.2 | 0.9 | 0 | 0 | 0.05 | 0 | 0.02 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Weak | 0 | 0 | 0.1 | 0.1 | 0.2 | 0.99 | 0 | 0 | 0.03 | 0 | 0.03 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Weak | 0 | 0 | 0.1 | 0.1 | 0.5 | 0.5 | 0 | 0.02 | 0.05 | 0 | 0.03 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Weak | 0 | 0 | 0.1 | 0.1 | 0.5 | 0.8 | 0.01 | 0.03 | 0.09 | 0.03 | 0.09 | 0.04 | 0 | 0 | 0 | 0 | 0 | 0 |
| Weak | 0 | 0 | 0.1 | 0.1 | 0.5 | 0.9 | 0 | 0 | 0.06 | 0 | 0.03 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Weak | 0 | 0 | 0.1 | 0.1 | 0.5 | 0.99 | 0 | 0.01 | 0.07 | 0.01 | 0.09 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Strong | 0 | 0 | 0.5 | 0.5 | 0 | 0.5 | 0.02 | 0.69 | 0.99 | 0.98 | 0.54 | 0.31 | 0 | 0.15 | 1 | 1 | 0.01 | 0 |
| Strong | 0 | 0 | 0.5 | 0.5 | 0 | 0.8 | 0 | 0.33 | 1 | 1 | 0.46 | 0.29 | 0 | 0.01 | 1 | 1 | 0.01 | 0 |
| Strong | 0 | 0 | 0.5 | 0.5 | 0 | 0.9 | 0 | 0.36 | 1 | 0.99 | 0.58 | 0.35 | 0 | 0.01 | 1 | 1 | 0 | 0 |
| Strong | 0 | 0 | 0.5 | 0.5 | 0 | 0.99 | 0.02 | 0.27 | 1 | 0.99 | 0.58 | 0.34 | 0 | 0 | 1 | 1 | 0 | 0 |
| Strong | 0 | 0 | 0.5 | 0.5 | 0.2 | 0.5 | 0.04 | 0.89 | 1 | 0.96 | 0.53 | 0.36 | 0 | 0.82 | 1 | 1 | 0 | 0 |
| Strong | 0 | 0 | 0.5 | 0.5 | 0.2 | 0.8 | 0 | 0.48 | 1 | 0.97 | 0.47 | 0.32 | 0 | 0.07 | 1 | 1 | 0 | 0 |
| Strong | 0 | 0 | 0.5 | 0.5 | 0.2 | 0.9 | 0.01 | 0.36 | 1 | 1 | 0.49 | 0.31 | 0 | 0.02 | 1 | 1 | 0 | 0 |
| Strong | 0 | 0 | 0.5 | 0.5 | 0.2 | 0.99 | 0 | 0.24 | 1 | 1 | 0.53 | 0.27 | 0 | 0.02 | 1 | 1 | 0 | 0 |
| Strong | 0 | 0 | 0.5 | 0.5 | 0.5 | 0.5 | 0.05 | 1 | 1 | 0.82 | 0.49 | 0.4 | 0 | 1 | 1 | 1 | 0 | 0 |
| Strong | 0 | 0 | 0.5 | 0.5 | 0.5 | 0.8 | 0.01 | 0.85 | 1 | 0.99 | 0.54 | 0.44 | 0 | 0.79 | 1 | 1 | 0 | 0.01 |
| Strong | 0 | 0 | 0.5 | 0.5 | 0.5 | 0.9 | 0.01 | 0.58 | 1 | 1 | 0.54 | 0.43 | 0 | 0.33 | 1 | 1 | 0 | 0.02 |
| Strong | 0 | 0 | 0.5 | 0.5 | 0.5 | 0.99 | 0.02 | 0.42 | 1 | 1 | 0.42 | 0.44 | 0 | 0.1 | 1 | 1 | 0 | 0.05 |
Table 1 also illustrates the frequency of selecting at least one non-mediator parameter among $\alpha_3$ and $\alpha_4$, and the frequency of selecting at least one non-mediator parameter among $\beta_5$ and $\beta_6$. When the true parameters were 0, the frequency of falsely selecting an $\alpha_j$ or $\beta_j$ was no greater than 0.02 for sample size of 100, and the frequency was 0 for sample size of 500. When the true parameters were non-zero, Table 1 illustrates that the power to correctly select these parameters as non-mediators increased with larger effect size and sample size. Finally, for sample size of 100, the frequency of falsely selecting a null parameter (i.e., the true value of $\alpha_j$ or $\beta_j$ was zero, $j = 7, \dots, 20$) was no greater than 0.09, and often much lower, when the non-mediators had weak effects, but increased to much higher frequencies (0.27 to 0.58) when the non-mediators had strong effects. For sample size of 500, the frequency of falsely selecting a null parameter was no greater than 0.05, and often close to 0.
The power to detect at least one true mediator is presented in Table 2. For these simulations, all non-mediators had no effects. For small effects (e.g., $\alpha_j = \beta_j = 0.1$), there was very little power, even with sample size of 500. As the mediator effect sizes increased, or as the correlation among mediators increased, power increased. There is a trade-off between increased power and increased false inference of mediation, governed by the fraction of the penalty assigned to the lasso penalty. Table 1 illustrates that larger values of the lasso fraction decrease the chance of false inference of mediation, while Table 2 illustrates that larger values of the lasso fraction can also decrease power to detect mediation. Based on results of Tables 1 and 2, a value of provides a reasonable balance. Table 2 also illustrates that when the mediation effect sizes are large and sample size is large (N=500), the estimated model tends to converge to the true model, with high power to detect the mediators, little chance of falsely inferring other variables to be mediators, and little chance of declaring a null $\alpha_j$ or $\beta_j$ to be non-zero.
Table 2.
Simulation results for power to detect mediation, the frequency of falsely inferring mediation among non-mediators, and the frequency of falsely inferring a null parameter.
| Mediator α | Mediator β | Med. Cor. | Frac. Lasso | N=100: Frq. Select m1 or m2 | N=100: Frq. Select m3,…,m20 | N=100: Frq. Select α3 or α4 | N=100: Frq. Select β5 or β6 | N=100: Frq. Select α7,…,α20 | N=100: Frq. Select β7,…,β20 | N=500: Frq. Select m1 or m2 | N=500: Frq. Select m3,…,m20 | N=500: Frq. Select α3 or α4 | N=500: Frq. Select β5 or β6 | N=500: Frq. Select α7,…,α20 | N=500: Frq. Select β7,…,β20 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.1 | 0.1 | 0 | 0.5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 0.1 | 0.1 | 0 | 0.8 | 0 | 0 | 0 | 0 | 0.01 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 0.1 | 0.1 | 0 | 0.9 | 0 | 0 | 0 | 0 | 0.01 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 0.1 | 0.1 | 0 | 0.99 | 0 | 0 | 0.01 | 0 | 0.02 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 0.1 | 0.1 | 0.2 | 0.5 | 0 | 0 | 0.01 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 0.1 | 0.1 | 0.2 | 0.8 | 0.01 | 0 | 0 | 0 | 0.01 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 0.1 | 0.1 | 0.2 | 0.9 | 0 | 0 | 0 | 0 | 0.04 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 0.1 | 0.1 | 0.2 | 0.99 | 0 | 0.01 | 0.01 | 0.02 | 0.04 | 0.02 | 0 | 0 | 0 | 0 | 0 | 0 |
| 0.1 | 0.1 | 0.5 | 0.5 | 0.03 | 0.03 | 0.02 | 0.01 | 0.04 | 0.03 | 0 | 0 | 0 | 0 | 0 | 0 |
| 0.1 | 0.1 | 0.5 | 0.8 | 0 | 0.02 | 0.03 | 0.02 | 0.05 | 0.02 | 0 | 0 | 0 | 0 | 0 | 0 |
| 0.1 | 0.1 | 0.5 | 0.9 | 0.01 | 0 | 0.03 | 0.01 | 0.05 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 0.1 | 0.1 | 0.5 | 0.99 | 0.01 | 0 | 0.02 | 0 | 0.07 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 0.2 | 0.2 | 0 | 0.5 | 0.13 | 0.03 | 0.03 | 0 | 0.09 | 0.03 | 0 | 0 | 0 | 0 | 0 | 0 |
| 0.2 | 0.2 | 0 | 0.8 | 0.07 | 0.02 | 0.01 | 0 | 0.04 | 0.04 | 0 | 0 | 0 | 0 | 0 | 0 |
| 0.2 | 0.2 | 0 | 0.9 | 0.07 | 0 | 0.05 | 0 | 0.07 | 0.01 | 0 | 0 | 0 | 0 | 0 | 0 |
| 0.2 | 0.2 | 0 | 0.99 | 0.06 | 0.02 | 0.03 | 0.01 | 0.1 | 0.03 | 0 | 0 | 0 | 0 | 0 | 0 |
| 0.2 | 0.2 | 0.2 | 0.5 | 0.23 | 0.04 | 0 | 0.02 | 0.14 | 0.04 | 0.02 | 0 | 0 | 0 | 0 | 0 |
| 0.2 | 0.2 | 0.2 | 0.8 | 0.12 | 0 | 0.02 | 0 | 0.1 | 0.02 | 0 | 0 | 0 | 0 | 0 | 0 |
| 0.2 | 0.2 | 0.2 | 0.9 | 0.08 | 0 | 0.05 | 0.01 | 0.06 | 0 | 0.01 | 0 | 0 | 0 | 0.01 | 0 |
| 0.2 | 0.2 | 0.2 | 0.99 | 0.12 | 0 | 0.03 | 0.01 | 0.11 | 0.04 | 0.02 | 0 | 0 | 0 | 0.01 | 0 |
| 0.2 | 0.2 | 0.5 | 0.5 | 0.38 | 0.06 | 0.05 | 0.02 | 0.13 | 0.07 | 0.48 | 0.01 | 0.01 | 0 | 0.01 | 0.01 |
| 0.2 | 0.2 | 0.5 | 0.8 | 0.31 | 0.03 | 0.06 | 0.01 | 0.2 | 0.1 | 0.34 | 0 | 0 | 0 | 0.02 | 0.03 |
| 0.2 | 0.2 | 0.5 | 0.9 | 0.22 | 0.02 | 0.04 | 0.02 | 0.18 | 0.07 | 0.3 | 0 | 0 | 0 | 0.03 | 0.01 |
| 0.2 | 0.2 | 0.5 | 0.99 | 0.25 | 0.03 | 0.06 | 0.02 | 0.24 | 0.17 | 0.3 | 0 | 0 | 0 | 0.03 | 0.06 |
| 0.3 | 0.3 | 0 | 0.5 | 0.58 | 0.11 | 0.04 | 0.01 | 0.29 | 0.14 | 0.82 | 0 | 0 | 0 | 0.01 | 0 |
| 0.3 | 0.3 | 0 | 0.8 | 0.45 | 0.09 | 0.08 | 0.04 | 0.27 | 0.16 | 0.85 | 0 | 0 | 0.02 | 0.04 | 0 |
| 0.3 | 0.3 | 0 | 0.9 | 0.48 | 0.04 | 0.04 | 0.01 | 0.32 | 0.17 | 0.77 | 0 | 0 | 0 | 0.04 | 0.03 |
| 0.3 | 0.3 | 0 | 0.99 | 0.46 | 0.05 | 0.11 | 0.02 | 0.36 | 0.13 | 0.7 | 0 | 0 | 0 | 0.05 | 0.02 |
| 0.3 | 0.3 | 0.2 | 0.5 | 0.66 | 0.09 | 0.09 | 0.03 | 0.26 | 0.1 | 0.98 | 0 | 0 | 0 | 0.01 | 0.01 |
| 0.3 | 0.3 | 0.2 | 0.8 | 0.67 | 0.09 | 0.11 | 0.01 | 0.39 | 0.22 | 0.92 | 0 | 0.01 | 0 | 0.06 | 0.02 |
| 0.3 | 0.3 | 0.2 | 0.9 | 0.65 | 0.03 | 0.07 | 0.02 | 0.43 | 0.18 | 0.88 | 0 | 0.01 | 0 | 0.03 | 0.05 |
| 0.3 | 0.3 | 0.2 | 0.99 | 0.61 | 0.06 | 0.14 | 0.05 | 0.41 | 0.21 | 0.9 | 0 | 0 | 0 | 0.07 | 0.07 |
| 0.3 | 0.3 | 0.5 | 0.5 | 0.96 | 0.22 | 0.11 | 0.04 | 0.46 | 0.24 | 1 | 0 | 0 | 0.01 | 0 | 0 |
| 0.3 | 0.3 | 0.5 | 0.8 | 0.92 | 0.07 | 0.11 | 0.04 | 0.52 | 0.26 | 1 | 0 | 0 | 0 | 0.01 | 0 |
| 0.3 | 0.3 | 0.5 | 0.9 | 0.73 | 0.09 | 0.14 | 0 | 0.52 | 0.22 | 1 | 0 | 0 | 0.01 | 0.04 | 0.02 |
| 0.3 | 0.3 | 0.5 | 0.99 | 0.77 | 0.07 | 0.13 | 0.1 | 0.58 | 0.23 | 1 | 0 | 0 | 0 | 0.08 | 0.06 |
| 0.4 | 0.4 | 0 | 0.5 | 0.95 | 0.14 | 0.14 | 0.04 | 0.4 | 0.18 | 1 | 0 | 0 | 0 | 0 | 0 |
| 0.4 | 0.4 | 0 | 0.8 | 0.94 | 0.09 | 0.13 | 0.05 | 0.53 | 0.25 | 1 | 0 | 0 | 0 | 0.02 | 0.01 |
| 0.4 | 0.4 | 0 | 0.9 | 0.89 | 0.07 | 0.15 | 0.05 | 0.52 | 0.25 | 1 | 0 | 0 | 0 | 0.01 | 0.01 |
| 0.4 | 0.4 | 0 | 0.99 | 0.94 | 0.09 | 0.17 | 0.06 | 0.63 | 0.36 | 1 | 0 | 0 | 0 | 0.02 | 0.02 |
| 0.4 | 0.4 | 0.2 | 0.5 | 0.99 | 0.12 | 0.09 | 0.02 | 0.42 | 0.14 | 1 | 0 | 0 | 0 | 0 | 0 |
| 0.4 | 0.4 | 0.2 | 0.8 | 0.98 | 0.11 | 0.2 | 0.07 | 0.5 | 0.26 | 1 | 0 | 0 | 0 | 0 | 0.01 |
| 0.4 | 0.4 | 0.2 | 0.9 | 0.97 | 0.07 | 0.08 | 0.06 | 0.58 | 0.24 | 1 | 0 | 0 | 0 | 0.01 | 0 |
| 0.4 | 0.4 | 0.2 | 0.99 | 0.93 | 0.1 | 0.11 | 0.11 | 0.47 | 0.41 | 1 | 0 | 0 | 0 | 0 | 0.01 |
| 0.4 | 0.4 | 0.5 | 0.5 | 1 | 0.16 | 0.02 | 0.03 | 0.35 | 0.18 | 1 | 0 | 0 | 0 | 0 | 0 |
| 0.4 | 0.4 | 0.5 | 0.8 | 1 | 0.12 | 0.12 | 0.04 | 0.59 | 0.23 | 1 | 0 | 0 | 0 | 0 | 0 |
| 0.4 | 0.4 | 0.5 | 0.9 | 0.98 | 0.07 | 0.12 | 0.05 | 0.45 | 0.26 | 1 | 0 | 0 | 0 | 0 | 0.02 |
| 0.4 | 0.4 | 0.5 | 0.99 | 0.98 | 0.03 | 0.09 | 0.05 | 0.51 | 0.32 | 1 | 0 | 0 | 0 | 0.02 | 0 |
| 0.5 | 0.5 | 0 | 0.5 | 1 | 0.06 | 0.06 | 0.02 | 0.33 | 0.09 | 1 | 0 | 0 | 0 | 0 | 0 |
| 0.5 | 0.5 | 0 | 0.8 | 1 | 0.02 | 0.09 | 0.03 | 0.49 | 0.15 | 1 | 0 | 0 | 0 | 0 | 0 |
| 0.5 | 0.5 | 0 | 0.9 | 0.99 | 0.07 | 0.11 | 0.02 | 0.49 | 0.23 | 1 | 0 | 0 | 0 | 0 | 0 |
| 0.99 | 1 | 0.02 | 0.17 | 0.05 | 0.53 | 0.25 | 1 | 0 | 0 | 0 | 0 | 0 | ||||
| 0.2 | 0.5 | 1 | 0.08 | 0.06 | 0.01 | 0.25 | 0.08 | 1 | 0 | 0 | 0 | 0 | 0 | |||
| 0.8 | 1 | 0.05 | 0.05 | 0.03 | 0.41 | 0.18 | 1 | 0 | 0 | 0 | 0 | 0 | ||||
| 0.9 | 1 | 0.03 | 0.11 | 0.04 | 0.34 | 0.21 | 1 | 0 | 0 | 0 | 0 | 0 | ||||
| 0.99 | 0.99 | 0.05 | 0.13 | 0.04 | 0.48 | 0.24 | 1 | 0 | 0 | 0 | 0 | 0 | ||||
| 0.5 | 0.5 | 1 | 0.1 | 0.06 | 0.01 | 0.23 | 0.07 | 1 | 0 | 0 | 0 | 0 | 0 | |||
| 0.8 | 1 | 0.08 | 0.05 | 0.03 | 0.39 | 0.18 | 1 | 0 | 0 | 0 | 0 | 0 | ||||
| 0.9 | 1 | 0.09 | 0.1 | 0.03 | 0.51 | 0.21 | 1 | 0 | 0 | 0 | 0 | 0 | ||||
| 0.99 | 1 | 0.01 | 0.11 | 0.04 | 0.42 | 0.25 | 1 | 0 | 0 | 0 | 0 | 0 | ||||
Results: DNA Methylation and Reactive Cortisol Stress Following Childhood Trauma
For the mediation analyses presented here, we treated childhood trauma (a score ranging from 24 to 63) as the exposure variable $x$, DNA methylation loci as potential mediators, and cortisol increase (AUCi) as the outcome variable. We regressed AUCi on both sex and age to adjust for these covariates, and used the residuals as the adjusted outcome, $y$. Because it is not possible to fit all DNA methylation loci as potential mediators with our methods for a sample size of 85 subjects, we reduced the number of potential mediators by using sure independence screening (Fan & Lv, 2008). This approach is based on ranking marginal correlations and then selecting the highest ranked values such that the number of parameters is less than the sample size. Because mediation depends on the two correlations, $\mathrm{cor}(x, m)$ and $\mathrm{cor}(m, y)$, where $m$ is a potential mediator (e.g., DNA methylation locus), we ranked the absolute values of their products, $|\mathrm{cor}(x, m)\,\mathrm{cor}(m, y)|$. From these, we chose the highest 40 ranked values to determine which potential mediators to include in our penalized mediation models. This results in potentially 80 parameters for mediation, plus a parameter for the direct effect, and variance parameters for $x$ and $y$. In addition, we centered and scaled all mediators, as well as the $x$ and $y$ variables. This effectively allowed us to model the correlation structure among $x$, the mediators, and $y$. Note that because $x$ only has the role of an independent variable, not a dependent variable (i.e., all arrows point away from $x$), the variance parameter for $x$ does not change from its initial value, which is 1.0 when $x$ is standardized. In contrast, when the mediator parameters or the direct effect differ from 0, the variance parameter for $y$ decreases from its initial value of 1 (when $y$ is standardized), because some of the variance of $y$ is explained by the model.
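The screening step described above can be sketched in a few lines. This is a minimal illustration on simulated toy data, not the code used for the study: the function names `pearson` and `screen_mediators`, the mediator names, and the simulated effect sizes are all hypothetical.

```python
import math
import random

def pearson(u, v):
    """Pearson correlation of two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)

def screen_mediators(x, mediators, y, keep):
    """Rank mediators by |cor(x, m) * cor(m, y)| and return the top `keep` names."""
    scores = {name: abs(pearson(x, m) * pearson(m, y))
              for name, m in mediators.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:keep], scores

# Toy data (hypothetical): m1 lies on the x -> m -> y path, the rest are noise.
rng = random.Random(1)
n = 85
x = [rng.gauss(0, 1) for _ in range(n)]
m1 = [0.8 * xi + rng.gauss(0, 0.5) for xi in x]
mediators = {"m1": m1}
for j in range(2, 7):
    mediators[f"m{j}"] = [rng.gauss(0, 1) for _ in range(n)]
y = [0.8 * mi + rng.gauss(0, 0.5) for mi in m1]

top, scores = screen_mediators(x, mediators, y, keep=2)
print(top)
```

In the study itself, `keep` would correspond to the 40 retained loci; the ranking uses the product of the two correlations rather than either one alone because a mediator needs both paths to be non-zero.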
By fitting the penalized mediation model with sparseness parameter , we determined the best fitting model by the minimum BIC over a grid of values. This resulted in selection of four methylation loci as possible mediators between childhood trauma and cortisol reactivity. Results are summarized in Table 3 and graphically depicted in Figure 4. The probe cg01644731 is closest to the gene CTDSP2 on chromosome 12, the probe cg06890779 is near the gene ATF7IP2 on chromosome 16, the probe cg25626453 is closest to the gene PRRC2A on chromosome 6, and the probe cg26801646 is closest to the gene VDAC2 on chromosome 10.
Table 3.
Summary of the mediation model fit to DNA methylation and reactive cortisol stress following childhood trauma, along with univariate Sobel tests for mediation, which ignore correlation among mediators.
| Mediator (gene) | $\hat{\alpha}$ | SE($\hat{\alpha}$) | $\hat{\beta}$ | SE($\hat{\beta}$) | $\hat{\alpha} \times \hat{\beta}$ | Univariate Sobel test p-value |
|---|---|---|---|---|---|---|
| cg01644731 (CTDSP2) | 0.32 | 0.10 | −0.30 | 0.09 | −0.09 | 0.037 |
| cg06890779 (ATF7IP2) | −0.37 | 0.10 | 0.14 | 0.09 | −0.05 | 0.100 |
| cg25626453 (PRRC2A) | −0.29 | 0.10 | 0.37 | 0.09 | −0.11 | 0.038 |
| cg26801646 (VDAC2) | −0.32 | 0.10 | 0.28 | 0.09 | −0.09 | 0.039 |
| Total (sum of products) | | | | | −0.34 | |
| Direct effect | | | | | 0.07 | |
| Grand Total | | | | | −0.27 | |
| Correlation between $x$ and $y$ | | | | | −0.27 | |
Figure 4.
Illustration of the best fitting mediation model for the study of DNA methylation and reactive cortisol stress following childhood trauma. The blue lines represent negative parameter estimates, and the red lines positive parameter estimates. The gene mediators are ordered from top to bottom according to the size of the estimated mediation effect, $\hat{\alpha} \times \hat{\beta}$. Corresponding parameter estimates are provided in Table 3.
Because penalized models tend to excessively shrink selected parameters, we followed the approach of the relaxed lasso (Meinshausen, 2007), whereby the model selected by the penalized approach was refit as an unconstrained model to obtain parameter estimates that were not shrunk (presented in Table 3). We refit with our R software regmed (without a penalty), as well as with the R function sem in the lavaan package. The parameter estimates were identical. A benefit of the sem function within the lavaan package is that standard errors of the parameter estimates are readily available. The $\hat{\alpha}$ and $\hat{\beta}$ estimates are of opposite signs for each mediator, so that their product is always negative. The sum of these products is −0.34, the estimated direct effect is 0.07, and the grand total of the sum of products and the direct effect is −0.27, exactly the same as the correlation between $x$ and $y$. This is no accident, but rather illustrates that the mediation model of standardized variables partitions the correlation between $x$ and $y$ into its components of direct effect and all indirect effects. In Table 3 we also present the p-values for the marginal Sobel test for mediation.
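The partition of the correlation into direct and indirect effects can be checked by simple arithmetic on the rounded values in Table 3. The sketch below assumes that the first and third numeric columns of Table 3 are the exposure-to-mediator and mediator-to-outcome estimates, and that 0.07 is the direct effect; the variable names are illustrative only.

```python
# Rounded estimates read from Table 3: (exposure -> mediator, mediator -> outcome).
estimates = {
    "cg01644731": (0.32, -0.30),
    "cg06890779": (-0.37, 0.14),
    "cg25626453": (-0.29, 0.37),
    "cg26801646": (-0.32, 0.28),
}
direct_effect = 0.07  # estimated direct effect of x on y (Table 3)

# Indirect effect through each mediator is the product of its two path estimates.
indirect = {name: a * b for name, (a, b) in estimates.items()}
total_indirect = sum(indirect.values())
grand_total = total_indirect + direct_effect

for name, value in indirect.items():
    print(f"{name}: {value:+.3f}")
print(f"total indirect = {total_indirect:.2f}")
print(f"grand total    = {grand_total:.2f}")
```

Because the inputs are rounded to two decimals, individual products may differ in the last digit from the tabled products, but the totals agree with the Total and Grand Total rows of Table 3.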
Each of the four genes detected by our approach has evidence of association with neurologic function. The gene CTDSP2 is involved in neuron differentiation and in the regulation of cytoplasmic and nuclear SMAD2/3 signaling, which superficially seems to be unrelated to childhood trauma and cortisol levels. However, this gene is a target of the microRNA miR-26b, which is encoded by an intron of the CTDSP2 primary transcript (Dill, Linder, Fehr, & Fischer, 2012). This is fascinating because changes in salivary miR-26b have been found to be associated with stress (Wiegand et al., 2018), using the same TSST stress test used by Houtepen et al. (2016). Furthermore, miR-26b has been reported to be associated with psychological stress responses or psychiatric disorders (Hunsberger, Austin, Chen, & Manji, 2009). This strengthens the potential biological support for our finding of cg01644731 as a mediator.
The gene ATF7IP2 encodes a protein that modulates transcription regulation. This protein interacts with the protein produced by the gene ATRX, and mutations in ATRX cause alpha thalassemia/mental retardation X-linked syndrome (ATR-X). The gene ATF7IP2 has also been associated with expressive language disorder (Malacards, 2020), a disorder involving difficulties with verbal and written expression, characterized by a limited ability to use expressive spoken language.
The gene PRRC2A resides on chromosome 6 within the human major histocompatibility complex class III region, and this gene has been associated with insulin-dependent diabetes mellitus, as well as rheumatoid arthritis (Singal, Li, & Zhu, 2000). More relevant to the development of the nervous system and its potential link with stress, this gene appears to have an important role in oligodendrocyte specification and myelination, with potential for treating neurologic diseases (R. Wu et al., 2019).
The gene VDAC2 contributes to oxidative metabolism by playing a role in solute transport across the outer mitochondrial membrane by allowing diffusion of small hydrophilic molecules (Naghdi & Hajnoczky, 2016). Cortisol, a glucocorticoid, is produced and metabolized by mitochondria (Picard, McEwen, Epel, & Sandi, 2018). Furthermore, acute and chronic stress influence various aspects of mitochondrial biology (Picard & McEwen, 2018). This provides biological support for our finding of cg26801646 as a mediator of childhood trauma and cortisol reactive response.
Discussion
We proposed a penalized SEM to conduct mediation analysis when there are multiple mediators, and demonstrated the benefits and potential limitations of our approach by simulations and by application to a study of childhood trauma and reactive cortisol stress. A significant benefit is the ability to simultaneously analyze multiple mediators while accounting for their correlations. This provides an advancement over examining one mediator at a time, and extends the use of SEM to a large number of mediators, relative to the sample size. Although others have proposed penalized SEMs (Jacobucci et al., 2016), their penalty function is not tailored to mediation analysis, as we have done by use of the sparse group lasso. Furthermore, by capitalizing on the special structure of the implied covariance matrix imposed by our mediation model, we were able to tailor our software to be much more computationally efficient than the generic regsem R software for penalized SEMs (Jacobucci et al., 2016). Finally, our approach provides a way to directly evaluate whether a biomarker directly mediates the effect of exposure on an outcome, in contrast to penalized models that assume latent factors as mediators (Derkach et al., 2019).
By simulations, we found that our approach can have a higher chance of false inference of mediators when non-mediators have very strong effects, the mediators are highly correlated, the sample size is small (relative to the number of mediators), and more emphasis is placed on grouping mediators than on encouraging sparse models. One way to overcome some of these limitations is to encourage sparse models by using a larger value of the sparseness penalty (we suggest ). A critical factor is the number of parameters to estimate relative to the sample size. In the study of childhood trauma and reactive cortisol stress with 85 subjects, we started with 40 potential mediators in the penalized model, which means 80 parameters for the $\alpha$ and $\beta$ coefficients, a large number of parameters. By applying our methods to this study and a wide variety of simulated data, we found that the number of iterations required for models to converge increased as the number of model parameters increased, which would be expected for a relatively flat likelihood surface. When using our method, if the model fails to converge after a large number of iterations, it might be wise to reduce the number of mediators. We also found that centering and standardizing the variables improved model convergence.
Although we derived ways to optimize our code, the iterative algorithm still requires matrix multiplication, which can dominate the computation time. For example, we show in the Appendix how to efficiently compute , and how to efficiently compute , but taking the product of these matrices to compute requires on the order of $k^3$ operations, where $k$ is the number of mediators. This means that our iterative algorithm is dominated by approximately the cube of $k$. For this reason, we proposed reducing the number of mediators to fit in a model by using sure independence screening (Fan & Lv, 2008). This entails computing the correlation of $x$ with each of the mediators, and the correlation of $y$ with each of the mediators. Then, the absolute values of the products of these correlations are ranked from largest to smallest, and the top mediators from this ranked list are chosen for modeling with our regularized mediation model. As recommended by Fan and Lv (2008), the number of retained mediators is chosen to be less than the sample size, such as , or , depending on sample size.
By applying our regularized mediation model to the study of DNA methylation and reactive cortisol stress following childhood trauma, we found four methylation loci as potential mediators between childhood trauma and reactive cortisol stress, mediators not discovered in the original study by Houtepen et al. (2016). An alternative approach for exploratory mediation analysis has been recently proposed based on two iterative loops (van Kesteren & Oberski, 2019). In the outer loop, a subset of mediators is randomly selected from all mediators. In the inner loop, a lasso penalized model is fit to select a subset of the randomly chosen mediators. Then, in the outer loop, another random subset of potential mediators is selected. Within the inner loop, to determine if any of the newly sampled potential mediators should be added to the model, the outcome ($y$) and exposure ($x$) are regressed on the previously selected mediators, to adjust both $y$ and $x$ for them. This two-loop procedure is repeated until the frequency of selecting the different mediators stabilizes. Although this approach is appealing because it handles a large number of mediators by analyzing only a subset at a time, some limitations are that the number of selected mediators must be less than the sample size (to have a full rank matrix for the regression that creates residuals) and the computational intensity of the algorithm. The authors of this approach also applied their method to the same data as our application data, and selected 5 methylation loci, of which only one (cg25626453 in the gene PRRC2A) overlapped with our results. Although further studies are needed to functionally validate our results, the prior biological support for our findings is strong.
Software
Software implementing the proposed tests for mediation for quantitative traits is available as an R package called “regmed” in the Comprehensive R Archive Network.
Acknowledgements
This research was supported by the U.S. Public Health Service, National Institutes of Health, contract grant number GM065450.
Appendix.
The algorithm for gradient descent depends on gradients of the , and a step-size multiplier () that moderates the updates so that the optimization function decreases. The details for sub-gradients of the penalized with respect to each of the parameters, as well as methods to update parameter estimates, are provided below. The depends on the implied variance matrix , which in turn depends on the parameters of interest. As shown elsewhere (von Oertzen & Brick, 2014), the derivative of with respect to a parameter that resides in is
where
| (A1) |
, , , and . Note that is a sparse matrix, with entries 0 except for a single value of 1 where a parameter resides (for parameters , , and ). Because the matrix does not depend on , , or , drops out of the derivatives in expression (A1) for these parameters. Likewise, is a sparse matrix, with entries 0 except for a single value of 1 where either or resides, and drops out of the derivatives in expression (A1) for and .
Methods for
Based on the current value for and expression (A1), the derivative requires the matrix , which is a square matrix of dimension with a value of 1 in the position , and 0’s elsewhere. To update , we need to consider the direction guided by the derivative , the step size , the penalty and weight , and the soft-thresholding required for the lasso L1 penalty,
The soft-thresholding function is $S(z, t) = \mathrm{sign}(z)\,(|z| - t)_{+}$, the same kind of update used for the usual lasso estimation.
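A single-parameter update of this kind can be sketched as a gradient step followed by soft-thresholding. This is a minimal illustration under assumed notation, not the regmed implementation: the names `step`, `lam`, and `weight`, and the toy quadratic objective, are hypothetical.

```python
def soft_threshold(z, t):
    """Return sign(z) * max(|z| - t, 0)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def lasso_gradient_step(param, grad, step, lam, weight=1.0):
    """One update for a single L1-penalized parameter:
    move against the gradient, then soft-threshold by step * lam * weight."""
    return soft_threshold(param - step * grad, step * lam * weight)

# Toy smooth objective (hypothetical): 0.5 * (d - 0.3)**2, plus penalty lam * |d|.
def grad(d):
    return d - 0.3

d = 0.0
for _ in range(200):
    d = lasso_gradient_step(d, grad(d), step=0.5, lam=0.1)
print(round(d, 4))
```

For this toy objective the penalized minimizer is the soft-thresholded value of 0.3 at threshold 0.1, so the iterations should settle near 0.2; with a penalty larger than 0.3 the parameter stays at exactly zero, which is how the L1 penalty produces sparse solutions.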
Methods for
For the mediator, we drop the subscript , and assume that and are the current values, and and are the updated values. The derivative requires the square matrix with a value of 1 in the position , and 0’s elsewhere. Similarly, the derivative requires the square matrix with a value of 1 in the position , and 0’s elsewhere. The updates for and follow the updates for grouped variables derived by Simon et al. (2013). First, apply the soft-thresholding function to each parameter, , and . Then, compute the group-wise shrinkage multiplier, , where . The updated estimates can be expressed as and . Note that when , will be zero, implying that both and . This results from the sub-gradient conditions imposed by the square-root used to group parameters.
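The two-stage update for a pair of mediator parameters can be sketched as follows, in the spirit of the sparse group lasso update of Simon et al. (2013). The penalty form written in the docstring, the parameter names `lam` and `frac`, and the omission of any group weight are assumptions made for illustration; they are not taken from the regmed source.

```python
import math

def soft_threshold(z, t):
    """Return sign(z) * max(|z| - t, 0)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def sparse_group_step(alpha, beta, grad_alpha, grad_beta, step, lam, frac):
    """One update for the pair (alpha, beta) of a single mediator.

    Assumed penalty: lam * (frac * (|alpha| + |beta|)
                            + (1 - frac) * sqrt(alpha**2 + beta**2)).
    """
    # Stage 1: element-wise soft-thresholding (lasso part of the penalty).
    a = soft_threshold(alpha - step * grad_alpha, step * lam * frac)
    b = soft_threshold(beta - step * grad_beta, step * lam * frac)
    # Stage 2: group-wise shrinkage multiplier (group part of the penalty).
    norm = math.sqrt(a * a + b * b)
    if norm == 0.0:
        return 0.0, 0.0
    multiplier = max(1.0 - step * lam * (1.0 - frac) / norm, 0.0)
    return multiplier * a, multiplier * b

# A pair with a sizeable signal is shrunk but kept.
print(sparse_group_step(0.5, -0.4, 0.0, 0.0, step=1.0, lam=0.2, frac=0.5))
# A pair with a weak signal is set to zero as a group.
print(sparse_group_step(0.12, 0.05, 0.0, 0.0, step=1.0, lam=0.2, frac=0.5))
```

The second call shows the grouping behavior described above: once the norm of the soft-thresholded pair falls below the group threshold, the multiplier is zero and both parameters are removed together.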
Methods for
The derivative requires the square matrix with a value of 1 in the position , and 0’s elsewhere. Similarly, the derivative requires the square matrix with a value of 1 in the position , and 0’s elsewhere. Because and are not penalized, their updates follow the usual gradient updates: and .
Computational Efficiency
Potential computational bottlenecks in the iterative algorithms are computations of the inverse matrices and , and of . Naive inverse operations would be on the order of . However, because of the structure of these matrices, computations can be rapidly completed by closed formulas.
Computation of
The matrix is a lower triangular matrix arranged as
By noting that and using forward substitution, the solution for can be shown to be
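The closed-form inverse can be illustrated numerically. The sketch below assumes that the variables are ordered as (x, mediators, y), that the lower triangular matrix is B = I − A with the exposure-to-mediator coefficients in the first column, the mediator-to-outcome coefficients in the last row, and the direct effect in the bottom-left corner. The layout, function names, and toy values are assumptions for illustration.

```python
def identity(p):
    return [[1.0 if i == j else 0.0 for j in range(p)] for i in range(p)]

def build_B(alpha, beta, delta):
    """B = I - A for variables ordered (x, m_1..m_k, y); an assumed layout."""
    k = len(alpha)
    B = identity(k + 2)
    for j in range(k):
        B[j + 1][0] = -alpha[j]       # path x -> m_j
        B[k + 1][j + 1] = -beta[j]    # path m_j -> y
    B[k + 1][0] = -delta              # direct path x -> y
    return B

def closed_form_inverse(alpha, beta, delta):
    """Inverse of B written down directly, with no numerical inversion."""
    k = len(alpha)
    B_inv = identity(k + 2)
    for j in range(k):
        B_inv[j + 1][0] = alpha[j]
        B_inv[k + 1][j + 1] = beta[j]
    # Bottom-left entry: direct effect plus the sum of the indirect effects.
    B_inv[k + 1][0] = delta + sum(a * b for a, b in zip(alpha, beta))
    return B_inv

def matmul(A, B):
    return [[sum(A[i][t] * B[t][j] for t in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

alpha, beta, delta = [0.5, -0.2], [0.3, 0.4], 0.1   # hypothetical values
B = build_B(alpha, beta, delta)
B_inv = closed_form_inverse(alpha, beta, delta)
product = matmul(B, B_inv)
max_error = max(abs(product[i][j] - (1.0 if i == j else 0.0))
                for i in range(4) for j in range(4))
print(max_error < 1e-12)
```

Under this layout the inverse only flips the signs of the path coefficients and fills one extra entry, so its cost is linear in the number of mediators rather than cubic.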
Computation of
Because , its inverse is . The matrix is block diagonal; so too is its inverse, with diagonal blocks rapid to compute: , , and . Because does not change over iterations, it only needs to be computed once.
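The block-wise inverse can be checked on a small example. The sketch assumes the RAM form of the implied covariance, Sigma = B^{-1} S B^{-T}, with S block diagonal (variance of x, residual covariance of the mediators, residual variance of y), so that the inverse is B^T S^{-1} B. The ordering of variables, the names, and the toy values are assumptions for illustration.

```python
def matmul(A, B):
    return [[sum(A[i][t] * B[t][j] for t in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(row) for row in zip(*A)]

# Hypothetical toy values, variables ordered as (x, m1, m2, y).
a1, a2, b1, b2, delta = 0.5, -0.2, 0.3, 0.4, 0.1
var_x, var_y = 1.0, 0.6
s11, s12, s22 = 0.8, 0.2, 0.9   # residual covariance block of the mediators

B = [[1.0, 0.0, 0.0, 0.0],
     [-a1, 1.0, 0.0, 0.0],
     [-a2, 0.0, 1.0, 0.0],
     [-delta, -b1, -b2, 1.0]]
B_inv = [[1.0, 0.0, 0.0, 0.0],
         [a1, 1.0, 0.0, 0.0],
         [a2, 0.0, 1.0, 0.0],
         [delta + a1 * b1 + a2 * b2, b1, b2, 1.0]]
S = [[var_x, 0.0, 0.0, 0.0],
     [0.0, s11, s12, 0.0],
     [0.0, s12, s22, 0.0],
     [0.0, 0.0, 0.0, var_y]]

# Block-wise inverse of S: scalars invert directly, the 2x2 block in closed form.
det_m = s11 * s22 - s12 * s12
S_inv = [[1.0 / var_x, 0.0, 0.0, 0.0],
         [0.0, s22 / det_m, -s12 / det_m, 0.0],
         [0.0, -s12 / det_m, s11 / det_m, 0.0],
         [0.0, 0.0, 0.0, 1.0 / var_y]]

Sigma = matmul(matmul(B_inv, S), transpose(B_inv))
Sigma_inv = matmul(matmul(transpose(B), S_inv), B)   # no inversion of Sigma itself

check = matmul(Sigma, Sigma_inv)
max_error = max(abs(check[i][j] - (1.0 if i == j else 0.0))
                for i in range(4) for j in range(4))
print(max_error < 1e-9)
```

Only the mediator block requires a genuine matrix inverse, and if that block is fixed across iterations its inverse can be cached, which is the saving the text describes.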
Computation of
Naive calculation of for the lnlike would be on the order of operations. However we show how it can be rapidly updated with just two additions. Note that . By Cholesky decomposition, , and , where are the diagonal terms of the lower triangular matrix . Now, if we take the Cholesky decomposition of as , and because is a lower triangular matrix, we can see that . By noting that is block diagonal, is also block diagonal with diagonal blocks , , and , where comes from the Cholesky decomposition . Because of the structure of B (illustrated above), the diagonal terms of are , , and . This means that
Since is constant over iterations, needs to be computed only once, and reused over iterations.
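The log-determinant shortcut can be verified on a small example. The sketch assumes Sigma = B^{-1} S B^{-T} with B unit lower triangular (so its determinant is 1) and S block diagonal, which gives ln|Sigma| = ln(var_x) + ln|S_m| + ln(var_y), where S_m denotes the mediator block. The symbol names, the variable ordering, and the toy values are assumptions for illustration.

```python
import math

def matmul(A, B):
    return [[sum(A[i][t] * B[t][j] for t in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(row) for row in zip(*A)]

def cholesky(M):
    """Lower-triangular L with M = L L^T (M symmetric positive definite)."""
    n = len(M)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = M[i][j] - sum(L[i][t] * L[j][t] for t in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def logdet(M):
    """ln|M| as twice the sum of the logs of the Cholesky diagonal."""
    return 2.0 * sum(math.log(row[i]) for i, row in enumerate(cholesky(M)))

# Hypothetical toy values, variables ordered as (x, m1, m2, y).
alpha, beta, delta = [0.5, -0.2], [0.3, 0.4], 0.1
var_x, var_y = 1.0, 0.6
S_m = [[0.8, 0.2], [0.2, 0.9]]   # mediator block, assumed fixed over iterations

B_inv = [[1.0, 0.0, 0.0, 0.0],
         [alpha[0], 1.0, 0.0, 0.0],
         [alpha[1], 0.0, 1.0, 0.0],
         [delta + alpha[0] * beta[0] + alpha[1] * beta[1], beta[0], beta[1], 1.0]]
S = [[var_x, 0.0, 0.0, 0.0],
     [0.0, S_m[0][0], S_m[0][1], 0.0],
     [0.0, S_m[1][0], S_m[1][1], 0.0],
     [0.0, 0.0, 0.0, var_y]]
Sigma = matmul(matmul(B_inv, S), transpose(B_inv))

logdet_S_m = logdet(S_m)                                   # computed once, then reused
shortcut = math.log(var_x) + logdet_S_m + math.log(var_y)  # two additions per iteration
direct = logdet(Sigma)                                     # full Cholesky of Sigma
print(abs(shortcut - direct) < 1e-9)
```

The path coefficients do not enter the shortcut at all, so each iteration only needs the logs of the two scalar variances added to the cached mediator term.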
Footnotes
Data Availability Statement
The data by Houtepen et al. (2016) that support the findings of this study are openly available from ArrayExpress at https://www.ebi.ac.uk/arrayexpress/experiments/E-GEOD-77445, reference identifier E-GEOD-77445.
References
- Albert JM, Geng C, & Nelson S. (2016). Causal mediation analysis with a latent mediator. Biom J, 58(3), 535–548. doi: 10.1002/bimj.201400124
- Derkach A, Pfeiffer RM, Chen TH, & Sampson JN (2019). High dimensional mediation analysis with latent variables. Biometrics, 75(3), 745–756. doi: 10.1111/biom.13053
- Dill H, Linder B, Fehr A, & Fischer U. (2012). Intronic miR-26b controls neuronal differentiation by repressing its host transcript, ctdsp2. Genes & development, 26(1), 25–30. doi: 10.1101/gad.177774.111
- Fan J, & Lv J. (2008). Sure independence screening for ultrahigh dimensional feature space. J. R. Statist. Soc. B, 70, 849–911.
- Friedman J, Hastie T, & Tibshirani R. (2008). Sparse inverse covariance estimation with the lasso. Biostatistics, 9, 432–441.
- Gottesman II, & Gould TD (2003). The endophenotype concept in psychiatry: etymology and strategic intentions. The American journal of psychiatry, 160(4), 636–645. doi: 10.1176/appi.ajp.160.4.636
- Guyon I, & Elisseeff A. (2003). An introduction to variable and feature selection. Journal of Machine Learning Research, 3, 1157–1182.
- Houtepen LC, Vinkers CH, Carrillo-Roa T, Hiemstra M, van Lier PA, Meeus W, . . . Boks MP (2016). Genome-wide DNA methylation levels and altered cortisol stress reactivity following childhood trauma in humans. Nature communications, 7, 10967. doi: 10.1038/ncomms10967
- Hoyle R. (Ed.) (2012). Handbook of structural equation modeling. New York: The Guilford Press.
- Huang YT, & Pan WC (2016). Hypothesis test of mediation effect in causal mediation model with high-dimensional continuous mediators. Biometrics, 72(2), 402–413. doi: 10.1111/biom.12421
- Hunsberger JG, Austin DR, Chen G, & Manji HK (2009). MicroRNAs in mental health: from biological underpinnings to potential therapies. Neuromolecular Med, 11(3), 173–182. doi: 10.1007/s12017-009-8070-5
- Imai K, Keele L, & Tingley D. (2010). A general approach to causal mediation analysis. Psychological methods, 15(4), 309–334. doi: 10.1037/a0020761
- Imai K, Keele L, & Yamamoto T. (2010). Identification, Inference and Sensitivity Analysis for Causal Mediation Effects. Statistical Science, 25, 51–71.
- Jacobucci R, Grimm KJ, & McArdle JJ (2016). Regularized Structural Equation Modeling. Structural equation modeling, 23(4), 555–566. doi: 10.1080/10705511.2016.1154793
- Koller D, & Friedman N. (2009). Probabilistic graphical models, Principles and Techniques. Cambridge: The MIT Press.
- Lange K. (2004). Optimization. New York: Springer.
- Li C. (1975). Path Analysis - a primer. Pacific Grove, CA: Boxwood Press.
- MacKinnon D. (2008). Introduction to Statistical Mediation Analysis. New York: Taylor and Francis Group.
- MacKinnon DP, Lockwood CM, Hoffman JM, West SG, & Sheets V. (2002). A comparison of methods to test mediation and other intervening variable effects. Psychological methods, 7(1), 83–104.
- Malacards (2020). Expressive Language Disorder (https://www.malacards.org/card/expressive_language_disorder).
- McArdle J. (2005). The development of the RAM rules for latent variable structural equation modeling. In Maydeu-Olivares A. & McArdle J. (Eds.), Contemporary psychometrics: A festschrift for Roderick P. McDonald (pp. 225–273). Mahwah, NJ: Lawrence Erlbaum.
- Meinshausen N. (2007). Relaxed lasso. Computational Statistics & Data Analysis, 52(1), 374–393.
- Naghdi S, & Hajnoczky G. (2016). VDAC2-specific cellular functions and the underlying structure. Biochim Biophys Acta, 1863(10), 2503–2514. doi: 10.1016/j.bbamcr.2016.04.020
- Picard M, & McEwen BS (2018). Psychological Stress and Mitochondria: A Systematic Review. Psychosom Med, 80(2), 141–153. doi: 10.1097/PSY.0000000000000545
- Picard M, McEwen BS, Epel ES, & Sandi C. (2018). An energetic view of stress: Focus on mitochondria. Front Neuroendocrinol, 49, 72–85. doi: 10.1016/j.yfrne.2018.01.001
- Preacher KJ, & Hayes AF (2008). Asymptotic and resampling strategies for assessing and comparing indirect effects in multiple mediator models. Behav Res Methods, 40(3), 879–891. doi: 10.3758/brm.40.3.879
- Serang S, Jacobucci R, Brimhall KC, & Grimm KJ (2017). Exploratory Mediation Analysis via Regularization. Structural equation modeling: a multidisciplinary journal, 24(5), 733–744. doi: 10.1080/10705511.2017.1311775
- Simon N, Friedman J, Hastie T, & Tibshirani R. (2013). A sparse-group lasso. Journal of Computational and Graphical Statistics, 22(2), 231–245.
- Singal DP, Li J, & Zhu Y. (2000). HLA class III region and susceptibility to rheumatoid arthritis. Clin Exp Rheumatol, 18(4), 485–491.
- van Kesteren E-J, & Oberski DL (2019). Exploratory Mediation Analysis with Many Potential Mediators. Structural Equation Modeling: A Multidisciplinary Journal, 26(5), 710–723.
- VanderWeele T. (2015). Explanation in Causal Inference. New York: Oxford University Press.
- VanderWeele T, Asomaning K, Tchetgen Tchetgen E, Han Y, Spitz MR, Shete S, . . . Lin X. (2012). Genetic variants on 15q25.1, smoking, and lung cancer: an assessment of mediation and interaction. American journal of epidemiology, 175(10), 1013–1020. doi: 10.1093/aje/kwr467
- VanderWeele TJ (2016). Mediation Analysis: A Practitioner’s Guide. Annual review of public health, 37, 17–32. doi: 10.1146/annurev-publhealth-032315-021402
- VanderWeele TJ, & Vansteelandt S. (2014a). Mediation Analysis with Multiple Mediators. Epidemiol Methods, 2(1), 95–115. doi: 10.1515/em-2012-0010
- VanderWeele TJ, & Vansteelandt S. (2014b). Mediation Analysis with Multiple Mediators. Epidemiologic methods, 2(1), 95–115. doi: 10.1515/em-2012-0010
- von Oertzen T, & Brick T. (2014). Efficient Hessian computation using sparse matrix derivatives in RAM notation. Behav Res Methods, 46(2), 385–395.
- Wiegand C, Heusser P, Klinger C, Cysarz D, Bussing A, Ostermann T, & Savelsbergh A. (2018). Stress-associated changes in salivary microRNAs can be detected in response to the Trier Social Stress Test: An exploratory study. Sci Rep, 8(1), 7112. doi: 10.1038/s41598-018-25554-x
- Wright S. (1921). Correlation and causation. Journal of Agricultural Research, 20, 557–585.
- Wright S. (1923). The theory of path coefficients: A reply to Niles’s criticism. Genetics, 8, 239–255.
- Wu R, Li A, Sun B, Sun J, Zhang J, Zhang T, . . . Yuan Z. (2019). A novel m(6)A reader Prrc2a controls oligodendroglial specification and myelination. Cell Res, 29(1), 23–41. doi: 10.1038/s41422-018-0113-8
- Wu T, & Lange K. (2008). Coordinate descent algorithms for lasso penalized regression. The annals of applied statistics, 2, 224–244.
- Zellner A. (1962). An efficient method of estimating seemingly unrelated regression equations and tests for aggregation bias. Journal of the American Statistical Association, 57(298), 348–368.
- Zhao Y, & Luo X. (2016). Pathway lasso: Estimate and select sparse mediation pathways with high dimensional mediators. arXiv preprint: arXiv:1603.07749.




