Abstract
Mediation analysis aims to assess if, and how, a certain exposure influences an outcome of interest through intermediate variables. This problem has recently gained a surge of attention due to the tremendous need for such analyses in scientific fields. Testing for the mediation effect (ME) is greatly challenged by the fact that the underlying null hypothesis (i.e. the absence of MEs) is composite. Most existing mediation tests are overly conservative and thus underpowered. To overcome this significant methodological hurdle, we develop an adaptive bootstrap testing framework that can accommodate different types of composite null hypotheses in the mediation pathway analysis. Applied to the product of coefficients test and the joint significance test, our adaptive testing procedures provide type I error control under the composite null, resulting in much improved statistical power compared to existing tests. Both theoretical properties and numerical examples of the proposed methodology are discussed.
Keywords: bootstrap, composite hypothesis, mediation analysis, structural equation model
1. Introduction
Mediation analysis plays a crucial role in investigating the underlying mechanism or pathway between an exposure and an outcome through an intermediate variable called a mediator (MacKinnon, 2008; VanderWeele, 2015). It decomposes the ‘total effect’ of an exposure on an outcome into an indirect effect that is through a given mediator and a direct effect, not through the mediator. The former holds the key to uncovering the exposure-outcome mechanism and is often known as the mediation effect (ME). The ME was initially studied under structural equation models (SEMs) in social sciences (Baron & Kenny, 1986; Sobel, 1982) and has been given formal causal definitions (Imai et al., 2010; Pearl, 2001; Robins & Greenland, 1992) within the counterfactual framework (Imbens & Rubin, 2015). Examining the presence or absence of the mediation effect can facilitate a deeper understanding of the underlying causal pathway from the exposure to the outcome and can give essential insights into intervention consequences, e.g. manipulating the mediator to change the exposure-outcome mechanism. As a result, it is of interest to apply mediation analysis in many scientific fields, such as psychology (MacKinnon & Fairchild, 2009; Valeri & VanderWeele, 2013), genomics (Guo et al., 2022; Huang, 2018; Huang & Pan, 2016; Zhao et al., 2014), and epidemiology (Barfield et al., 2017; Fulcher et al., 2019), among others.
To analyse the ME, one classical setting models the relationship between the exposure, the potential mediator, and the outcome as a directed acyclic graph; see Figure 1. Specifically, let $\alpha_S$ parametrise the causal effect of the exposure on the mediator, and $\beta_M$ parametrise the causal effect of the mediator on the outcome conditioning on the exposure. Then in the classical linear SEM (Baron & Kenny, 1986; Sobel, 1982), the causal ME is proportional to $\alpha_S\beta_M$ under suitable identification assumptions (Imai et al., 2010). More generally, this product expression may also appear in the causal ME under many other models, such as generalised linear models and survival analysis models (Huang & Cai, 2016; VanderWeele, 2011; VanderWeele & Vansteelandt, 2010). Therefore, the important scientific question of whether or not the ME is absent can be formulated as the hypothesis testing problem $H_0: \alpha_S\beta_M = 0$ against $H_A: \alpha_S\beta_M \neq 0$ (MacKinnon, 2008). Note that $H_0$ holds if and only if $\alpha_S = 0$ or $\beta_M = 0$, corresponding to the two parameter sets $\{(\alpha_S, \beta_M): \alpha_S = 0\}$ and $\{(\alpha_S, \beta_M): \beta_M = 0\}$, respectively. It follows that the parameter set of $H_0$ is the union of these two sets. We visualise the two sets and their union in Figure 2a–c, respectively.
Figure 1.
Directed acyclic graph for mediation analysis. The exposure is S; the mediator is M; the outcome is Y; the potential confounders are X.
Figure 2.
Visualisation of parameter spaces of $(\alpha_S, \beta_M)$ under different constraints. (a) $\alpha_S = 0$. (b) $\beta_M = 0$. (c) $\alpha_S\beta_M = 0$, and (d) $\alpha_S = \beta_M = 0$.
To test $H_0$, a broad class of methods is based on the product of coefficients (PoC) $\hat\alpha_S\hat\beta_M$, where $\hat\alpha_S$ and $\hat\beta_M$ denote the sample estimates of parameters $\alpha_S$ and $\beta_M$, respectively. One popular PoC method is Sobel’s (1982) test, which is a Wald-type test and approximates the variance of $\hat\alpha_S\hat\beta_M$ by the first-order Delta method. In addition, the joint significance (JS) test (Fritz & MacKinnon, 2007), also known as the MaxP test, is another widely used test that rejects the null hypothesis $H_0$ of no ME if both $\hat\alpha_S$ and $\hat\beta_M$ pass a certain cut-off of statistical significance. Liu et al. (2021) pointed out that the MaxP test is a kind of likelihood ratio test under normality assumptions.
Although there are various procedures available for testing MEs, properly controlling the type I error remains a challenge due to the intrinsic structure of the null parameter space. In particular, $H_0$ is composed of three different parameter cases: (i) $\alpha_S = 0$ and $\beta_M \neq 0$; (ii) $\alpha_S \neq 0$ and $\beta_M = 0$; and (iii) $\alpha_S = 0$ and $\beta_M = 0$. Case (iii), illustrated in Figure 2d, is a singleton given by the intersection set $\{(\alpha_S, \beta_M): \alpha_S = 0\} \cap \{(\alpha_S, \beta_M): \beta_M = 0\} = \{(0, 0)\}$. Under case (iii), both parameters $\alpha_S$ and $\beta_M$ are fixed at 0, whereas cases (i) and (ii) have one fixed parameter and the other parameter to be estimated. This intrinsic difference leads to distinct asymptotic behaviours of test statistics. Since the underlying truth is typically unknown in practice, it is difficult to obtain proper p-values under the composite null hypothesis.
In particular, for the popular Sobel’s test and the MaxP test, the asymptotic distributions of the test statistics under cases (i) and (ii) are known to be different from those under case (iii). These tests have been shown to be overly conservative in case (iii), because statistical inference is carried out according to the asymptotic distributions determined in cases (i) and (ii) (Fritz & MacKinnon, 2007; MacKinnon et al., 2002). This issue has gained a surge of attention in recent genome-wide epidemiological studies, where for the majority of omics markers it holds that $\alpha_S = \beta_M = 0$, and the classical tests are generally underpowered (Barfield et al., 2017). Some recent work (Dai et al., 2020; Du et al., 2022; Liu et al., 2021) utilised the relative proportions of cases (i)–(iii) in the population, but relies on accurate estimation of the true proportions. Huang (2019a, 2019b) adjusted for the composite nature of $H_0$ through the variances of test statistics but required the non-zero coefficients to be weak and sparse, which can be violated when the sample size is large. Another line of related research (Derkach et al., 2020; Djordjilović et al., 2020, 2019; Sampson et al., 2018) used a screening step to control the family-wise error rate or the false discovery rate (FDR) for a large group of hypotheses, but did not directly provide proper p-values for each of the composite null hypotheses. Van Garderen and Van Giersbergen (2022) proposed to construct a critical region for testing $H_0$ that can nearly control the type I error at one prespecified significance level. Miles and Chambaz (2021) constructed a rejection region that can achieve type I error control at significance level $\omega = 1/k$ with $k$ a positive integer. Despite these developments, the fundamental issue of correctly characterising the distributions of test statistics to obtain well-calibrated p-values under a composite null hypothesis remains an important and challenging problem in the current mediation analysis literature.
In this paper, we develop a new hypothesis testing methodology to address the challenge of obtaining uniformly distributed p-values under the composite null hypothesis of no ME. In particular, we propose an adaptive bootstrap procedure that can flexibly accommodate different types of null hypotheses. In the current literature, the non-parametric bootstrap is directly applied to the PoC test statistic $\hat\alpha_S\hat\beta_M$, which unfortunately has been found numerically to be overly conservative when $\alpha_S = \beta_M = 0$ (Barfield et al., 2017; Fritz & MacKinnon, 2007). This paper analytically unveils the reason for the failure of the conventional non-parametric bootstrap method, which stems from non-regular limiting behaviours of the PoC test statistic in the neighbourhood of $(\alpha_S, \beta_M) = (0, 0)$. To overcome the non-regularity near $(0, 0)$, we derive an explicit representation of the asymptotic distribution of the PoC test statistic through a local model and perform a consistent bootstrap estimation by incorporating suitable thresholds. In addition, for the JS test, we show that the conventional non-parametric bootstrap also fails to control the type I error properly, which can be fixed by an adaptive bootstrap test similar to the procedure for the PoC test. For both the PoC test and the JS test, the proposed methods can circumvent the non-standard limiting behaviours of the test statistics and therefore uniformly adapt to different types of null cases of no ME.
The structure of this paper is as follows. In Section 2, we briefly review the basic problem setting and several popular testing methods in the literature. In Section 3, we introduce the adaptive bootstrap method that can be applied to the representative PoC and JS tests under classical linear SEMs. In Section 4, we conduct extensive simulation studies to compare the finite-sample performances of the proposed tests with popular counterparts. In Section 5, we develop extensions of the adaptive bootstrap, including joint testing of multivariate mediators and testing MEs under non-linear models. In Section 6, we apply our adaptive bootstrap tests to investigate the mediation pathways of metabolites on the association of environmental exposures with a health outcome. We conclude the paper and discuss interesting extensions in Section 7.
1.1. Notation
For two sequences of real numbers $a_n$ and $b_n$, we write $a_n = o(b_n)$ if $\lim_{n\to\infty} a_n/b_n = 0$. We let $\xrightarrow{d}$ denote convergence in distribution. We let $\overset{d^*}{\rightsquigarrow}$ denote bootstrap consistency relative to the Kolmogorov–Smirnov distance; see an introduction of this consistency notion in Section 23 of van der Vaart (2000). For clarity, we also provide the definitions of all the convergence modes in Section A of the online supplementary material.
2. Hypothesis tests of no ME
To examine the ME of the exposure S on the outcome Y through the intermediate variable M, the causal inference literature utilises the counterfactual framework (VanderWeele, 2015). In particular, let $M(s)$ denote the potential value of the mediator under the exposure $S = s$, and let $Y(s, m)$ denote the potential outcome that would have been observed if S and M had been set to s and m, respectively. Throughout the paper, we adopt the stable unit treatment value assumption (Rubin, 1980), so that $M = M(S)$ and $Y = Y(S, M)$. Then the ME, or the natural indirect effect of $S = s$ vs. $S = s^*$ (Imai et al., 2010), is defined as
$$\mathrm{ME} = E\{Y(s, M(s)) - Y(s, M(s^*))\}. \tag{1}$$
For ease of illustration, we consider the popular linear SEM (MacKinnon, 2008; VanderWeele, 2015):
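$$M = \alpha_S S + X^\top\alpha_X + \epsilon_M, \qquad Y = \beta_M M + \tau_S S + X^\top\beta_X + \epsilon_Y, \tag{2}$$
where $X$ denotes a vector of confounders with the first element being 1 for the intercept, and $\epsilon_M$ and $\epsilon_Y$ are independent error terms with mean zero and finite variances $\sigma_M^2$ and $\sigma_Y^2$, respectively. We assume that there are n independent and identically distributed (i.i.d.) observations, $(S_i, M_i, Y_i, X_i)$, $i = 1, \ldots, n$, sampled from Model (2). The independence of $\epsilon_M$ and $\epsilon_Y$ holds under the following no unmeasured confounding assumptions. In particular, let $A \perp\!\!\!\perp B \mid C$ denote the independence of A and B conditional on C, and we assume that for all levels of $s$, $s^*$, and m, (i) $Y(s, m) \perp\!\!\!\perp S \mid X$, no confounder for the relation of Y and S; (ii) $Y(s, m) \perp\!\!\!\perp M \mid \{S, X\}$, no confounder for the relation of Y and M conditioning on $S$; (iii) $M(s) \perp\!\!\!\perp S \mid X$, no confounder for the relation of M and S; (iv) $Y(s, m) \perp\!\!\!\perp M(s^*) \mid X$, no confounder for the M–Y relation that is affected by S (VanderWeele & Vansteelandt, 2009). Under these assumptions, the model can be visualised by the directed acyclic graph in Figure 1, and the ME (1) equals $\alpha_S\beta_M(s - s^*)$.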
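The identity between the ME (1) and the product of coefficients can be checked in two lines. The derivation below is a sketch that assumes the structural equations of Model (2) also hold for the potential values, i.e. $M(s) = \alpha_S s + X^\top\alpha_X + \epsilon_M$ and $Y(s, m) = \beta_M m + \tau_S s + X^\top\beta_X + \epsilon_Y$, writing $\tau_S$ for the direct effect of S in the outcome equation; the no unmeasured confounding assumptions are what allow these structural coefficients to be identified from observed data.

```latex
\begin{aligned}
Y(s, M(s)) - Y(s, M(s^*))
  &= \beta_M \{M(s) - M(s^*)\}
  && \text{($\tau_S s$, $X^\top\beta_X$ and $\epsilon_Y$ cancel)} \\
  &= \beta_M \alpha_S (s - s^*)
  && \text{($X^\top\alpha_X$ and $\epsilon_M$ cancel)}, \\
E\{Y(s, M(s)) - Y(s, M(s^*))\}
  &= \alpha_S \beta_M (s - s^*).
\end{aligned}
```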
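Therefore, the scientific goal of detecting the presence of an ME can be formulated as the following hypothesis testing problem:
$$H_0: \alpha_S\beta_M = 0 \quad \text{versus} \quad H_A: \alpha_S\beta_M \neq 0.$$
This null hypothesis is composite and can be decomposed into three disjoint cases:
$$H_0:\ \{\alpha_S = 0,\ \beta_M \neq 0\} \,\cup\, \{\alpha_S \neq 0,\ \beta_M = 0\} \,\cup\, \{\alpha_S = \beta_M = 0\}, \tag{3}$$
and the alternative hypothesis is
$$H_A:\ \alpha_S \neq 0 \ \text{and} \ \beta_M \neq 0.$$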
Remark 1
Composite null problems similar to equation (3) can occur in settings other than Model (2); the latter is considered to demonstrate the essential analytic details useful for possible generalisations. Similar issues have also been observed in many other scenarios, including partially linear models (Hines et al., 2021), survival analysis (VanderWeele, 2011), and high-dimensional models (Zhou et al., 2020). The analytic details of the methodology development in this paper can pave the path for useful generalisations to other important statistical models and applications.
To test the composite null (3), various methods have been proposed, and a comprehensive survey can be found in MacKinnon et al. (2002). There are two representative classes of tests: (I) the PoC test, which corresponds to a Wald-type test, and (II) the JS test, which is the likelihood ratio test under normality of the error terms (Liu et al., 2021). (I) The first class of methods examines the PoC $\hat\alpha_S\hat\beta_M$, where $\hat\alpha_S$ and $\hat\beta_M$ denote consistent estimates of $\alpha_S$ and $\beta_M$, respectively. One common practice is to apply a normal approximation to $\hat\alpha_S\hat\beta_M$ divided by its standard error, where Sobel (1982) derives the standard error formula by the first-order Delta method. In addition to the large-sample approximation, the bootstrap has also been used to evaluate the distribution of $\hat\alpha_S\hat\beta_M$ (Fritz & MacKinnon, 2007; MacKinnon et al., 2004). (II) The JS test, also known as the MaxP test, rejects $H_0$ if $\max(p_\alpha, p_\beta) < \omega$, where ω is a prespecified significance level, and $p_\alpha$ and $p_\beta$ denote the p-values for $\alpha_S = 0$ (the link $S \to M$) and $\beta_M = 0$ (the link $M \to Y$), respectively. Despite their popularity, these methods have been found numerically to be overly conservative under $\alpha_S = \beta_M = 0$ in equation (3) (Barfield et al., 2017; MacKinnon et al., 2002). See Section 3 for a further discussion of the non-regular asymptotic behaviours of the statistics underlying this conservatism.
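For concreteness, both classical statistics can be computed from the two coefficient estimates and their standard errors. The sketch below assumes normal reference distributions for the individual Wald statistics (a t reference could be used instead in small samples); the function names are illustrative and do not come from any package.

```python
import math


def normal_two_sided_p(z: float) -> float:
    """Two-sided p-value of a standard normal statistic: 2 * (1 - Phi(|z|))."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def sobel_test(alpha_hat, se_alpha, beta_hat, se_beta):
    """Sobel's Wald-type PoC test with the first-order Delta-method standard error."""
    poc = alpha_hat * beta_hat
    se_poc = math.sqrt(alpha_hat**2 * se_beta**2 + beta_hat**2 * se_alpha**2)
    z = poc / se_poc
    return z, normal_two_sided_p(z)


def maxp_test(alpha_hat, se_alpha, beta_hat, se_beta):
    """Joint significance (MaxP) test: the p-value is the larger of the two Wald p-values."""
    p_alpha = normal_two_sided_p(alpha_hat / se_alpha)  # link S -> M
    p_beta = normal_two_sided_p(beta_hat / se_beta)     # link M -> Y
    return max(p_alpha, p_beta)


z, p_sobel = sobel_test(0.3, 0.1, 0.4, 0.1)
p_maxp = maxp_test(0.3, 0.1, 0.4, 0.1)
print(z, p_sobel, p_maxp)
```

With these inputs the Sobel statistic is $0.12/0.05 = 2.4$, while the MaxP p-value is driven by the weaker link ($T_\alpha = 3$).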
Similar issues have also been broadly recognised for Wald tests in various statistical problems, including three-way contingency table analysis and factor analysis (Drton & Xiao, 2016; Dufour et al., 2013; Glonek, 1993). However, characterising non-regular asymptotic behaviours under the singular null hypothesis $\alpha_S = \beta_M = 0$ alone is insufficient to address the intrinsic technical challenges in testing (3). In particular, the composite null (3) includes not only the singular case but also the two non-singular cases $\{\alpha_S = 0, \beta_M \neq 0\}$ and $\{\alpha_S \neq 0, \beta_M = 0\}$. Since a test statistic follows different distributions under different null cases, and the underlying true null case is unknown, it is difficult to obtain uniformly distributed p-values through one simple asymptotic distribution under equation (3). To address this technical difficulty, we adopt, justify, and evaluate an adaptive bootstrap procedure. For both the Wald-type PoC test and the non-Wald JS test, we will show that the proposed procedure can naturally adapt to the three types of null hypotheses in equation (3) and yield uniformly distributed p-values across all null cases.
3. Adaptive bootstrap tests
In this section, we propose a general Adaptive Bootstrap (AB) procedure for testing the composite null hypothesis (3). For illustration, we apply the adaptive bootstrap to the representative PoC test and show that it can address the non-regularity issue. We emphasise that this general strategy can be applied in a wide range of scenarios. We also derive adaptive bootstrap procedures in other examples, including the JS test (Section B in the online supplementary material), joint testing of multivariate mediators (Section 5.1), and testing the ME under generalised linear models (Sections 5.2 and 5.3).
To conduct hypothesis testing or estimate confidence intervals for statistics whose limiting distributions deviate from the normal, a simple and powerful approach is to apply the bootstrap resampling technique. However, the classical bootstrap is not a panacea, and on some occasions it can fail to work properly, including, unfortunately, the non-regular scenarios considered in this paper. In particular, simulation studies have shown that the classical bootstrap is overly conservative under $\alpha_S = \beta_M = 0$ (Barfield et al., 2017; MacKinnon et al., 2002). We next unveil the key insights underlying the failure of the classical bootstrap, which motivate our use of the adaptive bootstrap.
3.1. Non-regularity of the PoC test
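When $(\alpha_S, \beta_M) \neq (0, 0)$, at least one of the first-order gradients $\partial(\alpha_S\beta_M)/\partial\alpha_S = \beta_M$ and $\partial(\alpha_S\beta_M)/\partial\beta_M = \alpha_S$ is non-zero. Thus, the Delta method can be applied to support the use of Sobel’s test (based on asymptotic normality) and the classical bootstrap (Barfield et al., 2017). However, under $\alpha_S = \beta_M = 0$, both gradients vanish, $(\beta_M, \alpha_S) = (0, 0)$, and the validity of Sobel’s test and the classical bootstrap cannot be established as above.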
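The zero-gradient issue is visible in an exact algebraic identity for the PoC estimator. The two linear terms carry the gradients $\beta_M$ and $\alpha_S$ and are of order $n^{-1/2}$, while the cross term is of order $n^{-1}$; when $\alpha_S = \beta_M = 0$ only the cross term survives, which is why the scaling changes from $\sqrt{n}$ to $n$.

```latex
\hat\alpha_S\hat\beta_M - \alpha_S\beta_M
  = \underbrace{\beta_M(\hat\alpha_S - \alpha_S) + \alpha_S(\hat\beta_M - \beta_M)}_{O_p(n^{-1/2})\ \text{unless}\ \alpha_S = \beta_M = 0}
  + \underbrace{(\hat\alpha_S - \alpha_S)(\hat\beta_M - \beta_M)}_{O_p(n^{-1})}.
```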
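We next illustrate the non-regular limiting behaviour of the PoC under $\alpha_S = \beta_M = 0$. For ease of exposition, consider a special case of equation (2): $M = \alpha_S S + \epsilon_M$ and $Y = \beta_M M + \epsilon_Y$. Let $(\hat\alpha_S, \hat\beta_M)$ denote the ordinary least squares estimators of $(\alpha_S, \beta_M)$, and let $(\hat\alpha^*_S, \hat\beta^*_M)$ denote the corresponding non-parametric bootstrap estimators. Here and throughout this paper, we use the superscript ‘*’ to indicate estimators obtained from the non-parametric bootstrap, namely ‘bootstrap in pairs’ in the regression setting. By classical asymptotic theory (van der Vaart, 2000), under mild conditions,
$$\sqrt{n}\,(\hat\alpha_S - \alpha_S,\ \hat\beta_M - \beta_M)^\top \xrightarrow{d} (Z_S, Z_M)^\top, \tag{4}$$
where $(Z_S, Z_M)^\top$ denotes a mean-zero normal random vector with a covariance matrix given by that of the random vector $(V_S^{-1} S\epsilon_M,\ V_M^{-1} M\epsilon_Y)^\top$, $V_S = E(S^2)$, $V_M = E(M^2)$.
Moreover,
$$\sqrt{n}\,(\hat\alpha^*_S - \hat\alpha_S,\ \hat\beta^*_M - \hat\beta_M)^\top \overset{d^*}{\rightsquigarrow} (Z^*_S, Z^*_M)^\top, \tag{5}$$
where $(Z^*_S, Z^*_M)^\top$ is an independent copy of $(Z_S, Z_M)^\top$ in equation (4) with the same distribution. By equation (4), under $\alpha_S = \beta_M = 0$, $n\hat\alpha_S\hat\beta_M \xrightarrow{d} Z_S Z_M$, with the convergence rate n different from the standard parametric rate $\sqrt{n}$. By equations (4)–(5), we have $n(\hat\alpha^*_S\hat\beta^*_M - \hat\alpha_S\hat\beta_M) \xrightarrow{d} Z_S Z^*_M + Z^*_S Z_M + Z^*_S Z^*_M$. We can see that the limit of $n(\hat\alpha^*_S\hat\beta^*_M - \hat\alpha_S\hat\beta_M)$ is different from that of $n(\hat\alpha_S\hat\beta_M - \alpha_S\beta_M)$, implying the inconsistency of the classical non-parametric bootstrap.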
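A small Monte Carlo sketch makes the mismatch concrete. It assumes, as in the simplified model above, that $S$, $\epsilon_M$ and $\epsilon_Y$ are independent standard normal, so that under $\alpha_S = \beta_M = 0$ the limits $Z_S$, $Z_M$ and their bootstrap copies are independent standard normal variables. The target limit $Z_S Z_M$ then has variance 1, whereas the classical bootstrap limit has variance 3 (conditionally on the data its variance is $Z_S^2 + Z_M^2 + 1$, which averages to 3). The inflated spread is one way to see why the classical bootstrap is conservative at the origin.

```python
import numpy as np

rng = np.random.default_rng(2024)
draws = 400_000

# Independent standard normal limits (Z_S, Z_M) and an independent bootstrap copy (Z*_S, Z*_M).
z_s, z_m, z_s_star, z_m_star = rng.standard_normal((4, draws))

# Limit of n * alpha_hat_S * beta_hat_M when alpha_S = beta_M = 0.
target = z_s * z_m
# Limit of n * (alpha*_S * beta*_M - alpha_hat_S * beta_hat_M) under the classical bootstrap.
classical = z_s * z_m_star + z_s_star * z_m + z_s_star * z_m_star

var_ratio = classical.var() / target.var()
print(f"variance ratio (classical / target): {var_ratio:.2f}")
```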
3.2. Adaptive bootstrap of the PoC test
To address the challenge of correctly evaluating the distribution of the PoC statistic, we utilise the local asymptotic analysis framework. Intuitively, the goal is to evaluate whether a small change in the target parameters leads to little change in the limit of the statistics. To this end, given the targeted parameters $(\alpha_S, \beta_M)$, we define their locally perturbed counterparts as $\alpha_{S,n} = \alpha_S + n^{-1/2}b_\alpha$ and $\beta_{M,n} = \beta_M + n^{-1/2}b_\beta$, respectively, where $(b_\alpha, b_\beta)$ denote the local parameters of perturbations from our targeted coefficients $(\alpha_S, \beta_M)$. We then frame the problem under the local linear SEM as follows:
$$M = \alpha_{S,n} S + X^\top\alpha_X + \epsilon_M, \qquad Y = \beta_{M,n} M + \tau_S S + X^\top\beta_X + \epsilon_Y, \tag{6}$$
where $\epsilon_M$ and $\epsilon_Y$ are independent error terms with mean zero and finite variances. Fixing the target parameters $(\alpha_S, \beta_M)$, the formulation given in equation (6) may also be viewed, following van der Vaart (2000), as a local statistical experiment with local parameters $(b_\alpha, b_\beta)$, under which we are interested in examining the limit of test statistics. Note that with the local parameters $b_\alpha = b_\beta = 0$, equation (6) reduces to the original model (2) with the parameters $(\alpha_S, \beta_M)$. Our inference goal remains the same: that is, to test the underlying true coefficients $(\alpha_S, \beta_M)$. Technically, we consider an $n^{-1/2}$-vicinity of local neighbouring values only for the theoretical investigation of local asymptotic behaviours. Such an idea has also been used for studying other non-regularity issues (McKeague & Qian, 2015; Wang et al., 2018). To examine the limit of $\hat\alpha_S\hat\beta_M$ under equation (6), we assume the following general regularity Condition 1.
Condition 1
(C1.1) and . (C1.2) is a positive definite matrix with bounded eigenvalues, where . (C1.3) The second moments of are finite, where with , and with and .
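Similarly to our above discussions under the simplified model, Theorem 1 establishes the limits of $\sqrt{n}(\hat\alpha_S\hat\beta_M - \alpha_{S,n}\beta_{M,n})$ and $n(\hat\alpha_S\hat\beta_M - \alpha_{S,n}\beta_{M,n})$ when $(\alpha_S, \beta_M) \neq (0, 0)$ and $(\alpha_S, \beta_M) = (0, 0)$, respectively.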
Theorem 1 Asymptotic Property —
Assume Condition 1. Under the local model (6),
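(i) when $(\alpha_S, \beta_M) \neq (0, 0)$, $\sqrt{n}(\hat\alpha_S\hat\beta_M - \alpha_{S,n}\beta_{M,n}) \xrightarrow{d} \alpha_S Z_M + \beta_M Z_S$;
(ii) when $(\alpha_S, \beta_M) = (0, 0)$, $n(\hat\alpha_S\hat\beta_M - \alpha_{S,n}\beta_{M,n}) \xrightarrow{d} b_\alpha Z_M + b_\beta Z_S + Z_S Z_M$,
where $(Z_S, Z_M)^\top$ is a mean-zero normal random vector with a covariance matrix given by that of the random vector with , and .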
Theorem 1 suggests that the limit of $\hat\alpha_S\hat\beta_M$ is not uniform with respect to $(\alpha_S, \beta_M)$, and the non-uniformity occurs around $(0, 0)$. On the other hand, in the neighbourhood of $(0, 0)$, the limit of $n(\hat\alpha_S\hat\beta_M - \alpha_{S,n}\beta_{M,n})$ is continuous as a function of $(b_\alpha, b_\beta)$ into the space of distribution functions. Therefore, by using this local limit in the bootstrap, we expect better finite-sample accuracy than the classical non-parametric bootstrap, which does not take into account the local asymptotic behaviour.
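Theorem 1(ii) can be checked numerically in the simplified model of Section 3.1 (no covariates, no intercept), taken here as an assumption with $S$, $\epsilon_M$ and $\epsilon_Y$ independent standard normal. In that case $Z_S$ and $Z_M$ are approximately independent standard normal, so the limit $b_\alpha Z_M + b_\beta Z_S + Z_S Z_M$ has mean 0 and variance close to $b_\alpha^2 + b_\beta^2 + 1$. The sample size, number of replicates, and local parameters below are arbitrary illustrative choices.

```python
import numpy as np


def local_poc_draws(b_alpha, b_beta, n=200, reps=8000, seed=7):
    """Monte Carlo draws of n * (alpha_hat * beta_hat - alpha_n * beta_n) at (alpha_S, beta_M) = (0, 0)."""
    rng = np.random.default_rng(seed)
    alpha_n, beta_n = b_alpha / np.sqrt(n), b_beta / np.sqrt(n)  # n^{-1/2}-local parameters
    s = rng.standard_normal((reps, n))
    m = alpha_n * s + rng.standard_normal((reps, n))
    y = beta_n * m + rng.standard_normal((reps, n))
    # No-intercept OLS, one fit per Monte Carlo replicate (one row per replicate).
    alpha_hat = (s * m).sum(axis=1) / (s * s).sum(axis=1)
    beta_hat = (m * y).sum(axis=1) / (m * m).sum(axis=1)
    return n * (alpha_hat * beta_hat - alpha_n * beta_n)


draws = local_poc_draws(b_alpha=1.0, b_beta=2.0)
# Theorem 1(ii) suggests mean near 0 and variance near 1 + 4 + 1 = 6 here.
print(round(draws.mean(), 3), round(draws.var(), 3))
```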
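Moreover, to discern the null cases, we consider a decomposition of the statistic. The idea is to isolate the possibility that $(\alpha_S, \beta_M) = (0, 0)$ by comparing the absolute values of the standardised statistics $T_\alpha = \hat\alpha_S/\hat\sigma_\alpha$ and $T_\beta = \hat\beta_M/\hat\sigma_\beta$ to some thresholds, where $\hat\sigma_\alpha$ and $\hat\sigma_\beta$ denote the sample standard deviations of $\hat\alpha_S$ and $\hat\beta_M$, respectively. Specifically, we decompose
$$\hat\alpha_S\hat\beta_M - \alpha_S\beta_M = (\hat\alpha_S\hat\beta_M - \alpha_S\beta_M)(1 - I_{\alpha,\lambda}I_{\beta,\lambda}) + (\hat\alpha_S\hat\beta_M - \alpha_S\beta_M)\,I_{\alpha,\lambda}I_{\beta,\lambda}, \tag{7}$$
with the indicators $I_{\alpha,\lambda} = I(|T_\alpha| \le \lambda_n, \alpha_S = 0)$ and $I_{\beta,\lambda} = I(|T_\beta| \le \lambda_n, \beta_M = 0)$, where $I(E)$ represents the indicator function of an event E, and $\lambda_n$ is a certain threshold to be specified. When $(\alpha_S, \beta_M) \neq (0, 0)$, the classical bootstrap is consistent for the first term in equation (7). For the second term in equation (7), we next develop a bootstrap statistic motivated by Theorem 1(ii).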
To construct the bootstrap statistic, we introduce some notation following the convention in the empirical process literature (van der Vaart, 2000). Throughout the paper, $P$ denotes the population probability measure of $(S, M, Y, X)$, $\mathbb{P}_n$ denotes the empirical measure with respect to the i.i.d. observations $(S_i, M_i, Y_i, X_i)$, $i = 1, \ldots, n$, and $\mathbb{P}^*_n$ denotes the non-parametric bootstrap version of $\mathbb{P}_n$. For any measurable function $f$, we define the empirical process $\mathbb{G}_n f = \sqrt{n}(\mathbb{P}_n - P)f$, and its non-parametric bootstrap version is $\mathbb{G}^*_n f = \sqrt{n}(\mathbb{P}^*_n - \mathbb{P}_n)f$. With the above notation, we define the sample versions of and in Condition 1 as , , , and respectively, where we use ‘^’ to denote the sample counterparts in this paper. Similarly, we define their non-parametric bootstrap counterparts by replacing $\mathbb{P}_n$ with $\mathbb{P}^*_n$ in the above definitions.
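When $(\alpha_S, \beta_M) = (0, 0)$, motivated by Theorem 1(ii), we construct a bootstrap statistic $\mathbb{R}^*_n(b_\alpha, b_\beta)$ as a bootstrap counterpart of $n(\hat\alpha_S\hat\beta_M - \alpha_{S,n}\beta_{M,n})$. In particular, $\mathbb{R}^*_n(b_\alpha, b_\beta) = b_\alpha\mathbb{Z}^*_M + b_\beta\mathbb{Z}^*_S + \mathbb{Z}^*_S\mathbb{Z}^*_M$, where $\mathbb{Z}^*_S$ and $\mathbb{Z}^*_M$ denote bootstrap counterparts of $Z_S$ and $Z_M$, and $\hat\epsilon_M$ and $\hat\epsilon_Y$ denote the sample residuals obtained from the ordinary least squares regressions of the two models in equation (6). When $(\alpha_S, \beta_M) \neq (0, 0)$, we still consider the classical non-parametric bootstrap estimator $\hat\alpha^*_S\hat\beta^*_M - \hat\alpha_S\hat\beta_M$. To develop an adaptive bootstrap test, we utilise the decomposition (7) and propose to replace the indicators $I_{\alpha,\lambda}$ and $I_{\beta,\lambda}$ in equation (7) by
$$I^*_{\alpha,\lambda} = I(|T^*_\alpha| \le \lambda_n) \quad \text{and} \quad I^*_{\beta,\lambda} = I(|T^*_\beta| \le \lambda_n), \tag{8}$$
where $T^*_\alpha$ and $T^*_\beta$ denote the classical non-parametric bootstrap versions of $T_\alpha$ and $T_\beta$, respectively. Following the decomposition in equation (7), we define the statistic
$$\mathbb{U}^* = (\hat\alpha^*_S\hat\beta^*_M - \hat\alpha_S\hat\beta_M)(1 - I^*_{\alpha,\lambda}I^*_{\beta,\lambda}) + n^{-1}\mathbb{R}^*_n(b_\alpha, b_\beta)\,I^*_{\alpha,\lambda}I^*_{\beta,\lambda},$$
which we term the Adaptive Bootstrap (AB) test statistic in this paper. Theorem 2 below establishes the bootstrap consistency of $\mathbb{U}^*$.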
Theorem 2 Adaptive Bootstrap Consistency —
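Assume the conditions of Theorem 1 are satisfied. When $\lambda_n = o(\sqrt{n})$ and $\lambda_n \to \infty$ as $n \to \infty$,
$$c_n\,\mathbb{U}^* \overset{d^*}{\rightsquigarrow} c_n(\hat\alpha_S\hat\beta_M - \alpha_{S,n}\beta_{M,n}),$$
where $c_n$ is a non-random scaling factor satisfying
$$c_n = \begin{cases} \sqrt{n}, & (\alpha_S, \beta_M) \neq (0, 0), \\ n, & (\alpha_S, \beta_M) = (0, 0). \end{cases} \tag{9}$$
Theorem 2 suggests that under the original model (2), i.e. $b_\alpha = b_\beta = 0$, the AB statistic $\mathbb{U}^*$ is a consistent bootstrap estimator for $\hat\alpha_S\hat\beta_M - \alpha_S\beta_M$ with a proper scaling. Moreover, for any fixed targeted parameters $(\alpha_S, \beta_M)$, in their local neighbourhoods, i.e. $(\alpha_{S,n}, \beta_{M,n})$, the bootstrap consistency still holds as a smooth function of $(b_\alpha, b_\beta)$. Intuitively, this suggests that a small change in the target parameters does not affect the consistency property, and $\mathbb{U}^*$ is ‘regular’ under the local model. In practice, without knowing which case is the true null, we rely on $\mathbb{U}^*$ as the bootstrap statistic for $\hat\alpha_S\hat\beta_M - \alpha_S\beta_M$ in general. This strategy is viable because, for a given finite sample size n, using $c_n\mathbb{U}^*$ for bootstrapping $c_n(\hat\alpha_S\hat\beta_M - \alpha_S\beta_M)$ is equivalent to using $\mathbb{U}^*$ for bootstrapping $\hat\alpha_S\hat\beta_M - \alpha_S\beta_M$. Therefore, as desired, $\mathbb{U}^*$ will approximate well the distribution of $\hat\alpha_S\hat\beta_M - \alpha_S\beta_M$ regardless of the underlying null case.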
Remark 2
As a comparison, we also discuss the naive non-parametric bootstrap when $(\alpha_S, \beta_M) = (0, 0)$. Specifically, we obtain the following expression (in Remark C.4 of the online supplementary material),
(10) where , and . In addition to the term , equation (10) has two extra random terms , which suggests that using equation (10) in the bootstrap would not be consistent. The issue of the classical bootstrap being inconsistent at is circumvented by the proposed local bootstrap statistic .
3.2.1. Adaptive bootstrap test procedure
We introduce a consistent bootstrap test procedure for $H_0$ based on Theorem 2. Given a nominal level ω, let $q_{\omega/2}$ and $q_{1-\omega/2}$ denote the lower and upper $\omega/2$ quantiles, respectively, of the bootstrap estimates $\mathbb{U}^*$. If $\hat\alpha_S\hat\beta_M$ falls outside the interval $[q_{\omega/2}, q_{1-\omega/2}]$, we reject the composite null (3) and conclude that the ME is statistically significant at the level ω. We clarify that the goal is to test the underlying true coefficients $(\alpha_S, \beta_M)$. The reason to consider their $n^{-1/2}$-local coefficients is merely for the theoretical investigation of local asymptotic behaviours. Therefore, to test (3) under the original model (2), it suffices to calculate $\mathbb{U}^*$ with $b_\alpha = b_\beta = 0$. We point out that the rejection region in the adaptive procedure may also be constructed through the asymptotic distribution as an alternative to the bootstrap; nevertheless, the proposed bootstrap procedure is more flexible and does not rely on a particular form of the limiting distributions, and therefore it can be easily extended to various mediation models; see more examples in Section 5.
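Given bootstrap draws of the AB statistic (computed with $b_\alpha = b_\beta = 0$), the decision rule above is an equal-tailed interval check. The sketch below also reports a two-sided bootstrap p-value, defined as twice the smaller tail proportion; that p-value definition is an illustrative choice, not one specified in the text, and `ab_decision` is a hypothetical name.

```python
import numpy as np


def ab_decision(poc_hat, u_star, omega=0.05):
    """Equal-tailed bootstrap test of H_0: alpha_S * beta_M = 0 from AB draws u_star."""
    u_star = np.asarray(u_star, dtype=float)
    lower, upper = np.quantile(u_star, [omega / 2, 1 - omega / 2])
    reject = bool(poc_hat < lower or poc_hat > upper)
    # Two-sided p-value consistent with the equal-tailed interval (illustrative choice).
    tail = min(np.mean(u_star <= poc_hat), np.mean(u_star >= poc_hat))
    p_value = min(1.0, 2.0 * tail)
    return reject, float(p_value)


grid = np.linspace(-1.0, 1.0, 1001)  # stand-in for bootstrap draws
print(ab_decision(0.0, grid))         # statistic in the middle: no rejection
print(ab_decision(2.0, grid))         # statistic beyond every draw: rejection
```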
3.2.2. Choice of the tuning parameters
Under the conditions of Theorem 2, which specify $\lambda_n = o(\sqrt{n})$ and $\lambda_n \to \infty$ as $n \to \infty$, the AB statistic is bootstrap-consistent, suggesting that the adaptive bootstrap can provide a consistent test for $H_0$. If remains bounded as , asymptotically reduces to , i.e. the classical non-parametric bootstrap procedure, which may be conservative. In the simulation experiments, we set and find that a fixed constant λ, e.g. can give a good performance. In general settings, we can choose the tuning parameter through the double bootstrap (Chen, 2016); see Section F.1 of the online supplementary material for more implementation details.
Remark 3
Our proposed adaptive procedures examine the non-regular asymptotic behaviours of test statistics through local models. In effect, the idea of local models may be traced back to econometrics (Andrews, 2001) and was utilised in other statistical problems, such as classification and post-selection inference (Laber & Murphy, 2011; McKeague & Qian, 2015, 2019; Wang et al., 2018). Nevertheless, we emphasise that there are unique statistical challenges of testing mediation effects. First, in terms of the parameter space, the null hypothesis of no ME is essentially a union of individual hypotheses. This results in a non-standard shape of the null parameter space, on which both regular and non-regular asymptotic behaviours can occur, as illustrated in Figure 2c. Second, in terms of the behaviour of the estimator, we unveil a fundamental zero-gradient phenomenon. This is caused by the special form of the product statistic and cannot be directly addressed by the existing adaptive procedures mentioned above. Third, in terms of the models, the mediation analysis involves a system of structural equations. Ignoring the model structure in the implementation could lead to slow computation; see Section F.2 of the online supplementary material for more details on computation. Due to these unique challenges, new developments in methodology, theory, and computation are necessary.
3.3. Adaptive bootstrap for the JS test
In addition to the Wald-type PoC test, we also address the non-regularity issue of the non-Wald JS (MaxP) test through the proposed adaptive bootstrap. Notably, the non-regular behaviours of the JS and PoC tests under the singleton $\alpha_S = \beta_M = 0$ are distinct, as the two statistics take different forms. In particular, the PoC statistic has the zero-gradient issue discussed above, whereas the JS statistic has a certain inconsistent convergence issue. Despite this difference, we can similarly develop an adaptive bootstrap for the JS test and obtain uniformly distributed p-values; details are given in Section B of the online supplementary material. This suggests that the proposed adaptive bootstrap is not restricted to Wald-type tests and may be further generalised to other tests in similar circumstances.
3.4. On multivariate mediators
It is worth noting that the proposed strategy can be generalised to deal with multiple mediators under suitable identifiability conditions. In the following, we delve into three scenarios of practical importance.
(i) We consider the group-level joint ME via a set of mediators, shown by the red path in Figure 3a. This type of joint ME has been considered in the literature by Huang and Pan (2016) and Hao and Song (2023), among others. We generalise the AB method to test the joint ME in Section 5.1.
(ii) We consider multiple mediators that are causally uncorrelated (Jérolon et al., 2020) or governed by the parallel path model (Hayes, 2017). In this case, the indirect effect of one single mediator can be identified under the identifiability assumptions outlined in Imai and Yamamoto (2013). In particular, under the multivariate linear SEM (13) with no causal interplay between mediators, the null hypothesis of no individual indirect effect via one mediator, say, , can be formulated as , illustrated in Figure 3b. To apply the AB test to , we note that equation (13) can be equivalently rewritten as and , where and . This form resembles equation (2), and the AB method in Section 3 can be employed to test by adjusting in the outcome model. We provide details, including the identification assumptions, in Section D.1.1 of the online supplementary material.
(iii) When the mediators are causally correlated, evaluating individual indirect effects along different posited paths requires correct specification of the mediators’ causal structure (VanderWeele et al., 2014). To relax such stringent assumptions, researchers have proposed alternative methods, one of which is to examine the interventional indirect effects specific to each distinct mediator (Loh et al., 2021). Intuitively, the interventional indirect effect via a target mediator is meant to capture all of the exposure–outcome effects that are mediated by the target mediator as well as by any other mediators causally preceding it; see the diagram in Figure 3c. Under a typical class of linear and additive mean models, estimators of interventional indirect effects take the same product-of-coefficients form as in case (ii) above. Thus, the proposed AB method in Section 3 can be applied similarly with little effort. We provide relevant details, including the definition and identification assumptions of the interventional indirect effects, in Section D.1.2 of the online supplementary material.
Figure 3.
Path diagram of the mediation model with multiple mediators: dashed lines represent possible non-causal correlations or independence, and solid arrowed lines represent possible causal relationships. (a) Joint ME via a group of mediators. (b) Individual ME via one mediator. (c) Interventional ME via one target mediator; Panel (c) also shows the mediators that causally precede the target mediator.
4. Numerical experiments
In this section, we conduct simulation experiments to evaluate the finite-sample performance of the proposed adaptive bootstrap PoC and JS tests. Particularly, we generate data through the following model:
(11) |
In the model (11), the exposure variable S is simulated from the Bernoulli distribution with the success probability 0.5; the covariate is continuous and simulated from a standard normal distribution ; the covariate is discrete and simulated from the Bernoulli distribution with the success probability 0.5; two error terms and are simulated independently from and , respectively. We set the parameters , , , and . Moreover, we consider sample sizes , and set the bootstrap sample size at 500.
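The data-generating design described above can be sketched as follows. Model (11) itself is not reproduced here, so the sketch assumes it has the same form as Model (2) with $X = (1, X_1, X_2)^\top$; every coefficient value and the unit error variances are hypothetical placeholders rather than the settings used in the experiments. The helpers `simulate_setting` and `poc_estimates` are likewise hypothetical; the latter fits both regressions by least squares and returns $(\hat\alpha_S, \hat\beta_M)$.

```python
import numpy as np


def simulate_setting(n, alpha_s, beta_m, tau_s=0.5, seed=None):
    """Draw (S, X, M, Y) following the design above; coefficient values are hypothetical."""
    rng = np.random.default_rng(seed)
    s = rng.binomial(1, 0.5, size=n).astype(float)   # exposure ~ Bernoulli(0.5)
    x1 = rng.standard_normal(n)                       # continuous covariate ~ N(0, 1)
    x2 = rng.binomial(1, 0.5, size=n).astype(float)   # binary covariate ~ Bernoulli(0.5)
    x = np.column_stack([np.ones(n), x1, x2])         # first column: intercept
    alpha_x = np.array([1.0, 1.0, 1.0])               # hypothetical confounder effects
    beta_x = np.array([1.0, 1.0, 1.0])
    m = alpha_s * s + x @ alpha_x + rng.standard_normal(n)       # hypothetical unit variance
    y = beta_m * m + tau_s * s + x @ beta_x + rng.standard_normal(n)
    return s, x, m, y


def poc_estimates(s, x, m, y):
    """Least-squares estimates of alpha_S (M on S, X) and beta_M (Y on M, S, X)."""
    alpha_hat = np.linalg.lstsq(np.column_stack([s, x]), m, rcond=None)[0][0]
    beta_hat = np.linalg.lstsq(np.column_stack([m, s, x]), y, rcond=None)[0][0]
    return alpha_hat, beta_hat


s, x, m, y = simulate_setting(20_000, alpha_s=0.5, beta_m=0.3, seed=0)
alpha_hat, beta_hat = poc_estimates(s, x, m, y)
print(alpha_hat, beta_hat)
```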
In simulation studies, we compare eight testing methods: the adaptive bootstrap for the PoC test (PoC-AB), the classical non-parametric bootstrap for the PoC test (PoC-B), Sobel’s test (PoC-Sobel), the adaptive bootstrap for the JS test (JS-AB), the classical non-parametric bootstrap for the JS test (JS-B), the MaxP test (JS-MaxP), the non-parametric bootstrap method in the R package mediation for causal mediation analysis (Tingley et al., 2014) (CMA), and the method in Huang (2019a) (MT-Comp). It is noteworthy that Huang’s (2019a) MT-Comp makes specific model assumptions that are not fully compatible with our simulation settings; we include this method only for comparison. Some other methods (e.g. Dai et al., 2020; Liu et al., 2021) rely on estimating the relative proportions of the three cases, which is not directly applicable here, and they are thus not included.
4.1. Null hypotheses: type I error rates
4.1.1. Setting 1: Under a fixed type of null
In the first setting, we simulate data under a fixed null hypothesis over 2,000 Monte Carlo replications to estimate the distribution of p-values. Particularly, we consider three types of null hypotheses below:
(12)
Figure 4 presents the Q–Q plots with ; the Q–Q plots under are similar and are presented in Figure G.1 of the online supplementary material. In Figure 4, the three panels in the first row show the results of the PoC tests under the three fixed nulls , and , respectively. The three panels in the second row show the corresponding results of the JS tests.
Figure 4.
Q–Q plots of p-values under the fixed null with .
Figure 4 shows the results for the PoC-type tests. Under or , the PoC-AB, the PoC-B, and the PoC-Sobel all approximate the null distribution of the PoC test statistic correctly. Under , however, the PoC-B and the PoC-Sobel become conservative, whereas the proposed PoC-AB still approximates the distribution of the PoC statistic well.

The JS-type tests behave similarly. Under or , the JS-AB, the JS-B, and the JS-MaxP all work well. Under , in contrast, the JS-B has an inflated type I error and the JS-MaxP becomes conservative, whereas the JS-AB still performs well.

For comparison, Figure 4 and Figure G.1 in the online supplementary material also display the results of Huang’s (2019a) MT-Comp and of the R package CMA (Tingley et al., 2014). The MT-Comp controls the type I error under , but its type I error is inflated under and . This may be because the models considered in Huang (2019a) are not compatible with our simulation settings. The CMA package, on the other hand, produces uniformly distributed p-values under and , but is conservative under . This suggests that the CMA test may be underpowered.
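The conservativeness of Sobel’s test and the MaxP test when both coefficients are zero can be reproduced in an idealised setting. The sketch below rests on the following assumptions; it is not the simulation design of this section.
- The two T-statistics are independent normal variables.
- A zero coefficient has a T-statistic distributed as N(0, 1).
- A nonzero coefficient shifts its T-statistic by a hypothetical mean of 10.

Written in terms of the T-statistics, the Sobel statistic is Z_a Z_b / sqrt(Z_a^2 + Z_b^2). When both means are zero, this statistic is known to follow N(0, 1/4), so a nominal 5% test rejects with probability P(|N(0, 1)| > 3.92), roughly 1e-4. The MaxP test rejects only when both p-values fall below 0.05, which happens with probability 0.05^2 = 0.0025. The function name `null_rejection_rates` is hypothetical.

```python
import random


def null_rejection_rates(mean_a, mean_b, reps=20000, crit=1.959963984540054, seed=0):
    """Monte Carlo rejection rates of the Sobel and MaxP tests.

    mean_a, mean_b: assumed means of the two (independent, unit-variance)
    T-statistics; 0.0 corresponds to a zero coefficient.
    """
    rng = random.Random(seed)
    sobel_rej = maxp_rej = 0
    for _ in range(reps):
        z_a = rng.gauss(mean_a, 1.0)
        z_b = rng.gauss(mean_b, 1.0)
        # Sobel statistic expressed through the two T-statistics
        t_sobel = z_a * z_b / (z_a * z_a + z_b * z_b) ** 0.5
        sobel_rej += abs(t_sobel) > crit
        # MaxP rejects iff both individual Wald tests reject
        maxp_rej += min(abs(z_a), abs(z_b)) > crit
    return sobel_rej / reps, maxp_rej / reps


print(null_rejection_rates(0.0, 10.0))  # exactly one coefficient is zero
print(null_rejection_rates(0.0, 0.0))   # both coefficients are zero
```

In the first call, both rates should come out close to the nominal 0.05. In the second call, both rates should fall far below it.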
4.1.2. Setting 2: Under a random type of null
In the second setting, we simulate data over 2,000 Monte Carlo replications. In each replication, the null hypothesis is not fixed but is randomly selected from – in (12). Specifically, for , we consider three selection probabilities: (I) , (II) , and (III) . Figure 5 presents the Q–Q plots of p-values with ; the Q–Q plots under are similar and are provided in Figure G.2 of the online supplementary material. In Figure 5, the three panels in the first row show the results of the PoC tests under the null selection probabilities (I)–(III), respectively. The three panels in the second row show the corresponding results of the JS tests.
Figure 5.
Q–Q plots of p-values under the mixture of nulls: .
Figure 5 shows that the adaptive bootstrap procedures for the PoC and JS tests perform well in all the settings considered. The PoC-B, PoC-Sobel, and JS-MaxP tests and the R package CMA (Tingley et al., 2014) are conservative, and they become more conservative as the probability of choosing increases. We note that in many biological studies, such as genomics, predominates among the null cases, so tests that are conservative under may not be preferred. Moreover, the JS-B test and the MT-Comp method can have inflated type I errors. The performance of JS-B worsens as the proportion of rises, while the MT-Comp method deteriorates as the proportions of and increase.
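A back-of-the-envelope calculation shows why the conservativeness grows with the probability of selecting the null in which both coefficients are zero. Assume that the two coefficient estimates are asymptotically independent and that a nonzero coefficient, when present, is detected with probability close to one. Then a level-α MaxP test rejects with probability about α when exactly one coefficient is zero, and α² when both are zero. Under a mixture with selection probabilities π1, π2 and π3, its type I error is therefore approximately (π1 + π2)α + π3α². The probabilities used below are illustrative only, not the settings (I)–(III) of this section, and the function name is hypothetical.

```python
def maxp_mixture_size(pi, alpha=0.05):
    """Approximate type I error of a level-alpha MaxP test under a mixture of nulls.

    pi = (pi1, pi2, pi3): probabilities that only the first coefficient is zero,
    only the second is zero, or both are zero. Assumes independent estimates and
    that a nonzero coefficient is always detected.
    """
    pi1, pi2, pi3 = pi
    if abs(pi1 + pi2 + pi3 - 1.0) > 1e-12:
        raise ValueError("selection probabilities must sum to one")
    return (pi1 + pi2) * alpha + pi3 * alpha ** 2


for pi in [(0.45, 0.45, 0.10), (1 / 3, 1 / 3, 1 / 3), (0.10, 0.10, 0.80)]:
    print(pi, round(maxp_mixture_size(pi), 5))
```

When π3 = 0.8, the approximate size is 0.012, less than a quarter of the nominal 0.05.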
4.2. Alternative hypotheses: statistical power
In this subsection, we evaluate the statistical power of the proposed AB tests under alternative hypotheses. In particular, we simulate data under two settings:
(I) fix for convenience of graphical presentation, which takes various values starting from zero;
(II) fix the size of the ME and vary the ratio .
In setting (I), we consider and plot the empirical rejection rates, based on 500 Monte Carlo replications, against the signal size of , which is equal to in setting (I). In setting (II), we fix when , and when . We then plot the empirical rejection rates against the ratio . The results for settings (I) and (II) are shown in Figures 6 and 7, respectively.
Figure 6.
Empirical rejection rates (power) vs. the signal strength of .
Figure 7.
Empirical rejection rates (power) vs. the ratio .
Figures 6 and 7 show that, among the three PoC tests, the PoC-AB has higher power than the classical non-parametric bootstrap (PoC-B), and both are more powerful than Sobel’s test. Similarly, among the JS tests, the JS-AB has higher power than the classical bootstrap (JS-B), and both have higher power than the MaxP test. In addition, the JS-B test has slightly inflated type I errors when , which is consistent with the results in Figure 4. Among the three classical methods (Sobel’s test, the MaxP test, and the PoC-B), the MaxP test appears to achieve the best balance between type I error and statistical power, while Sobel’s test has the lowest power. These findings are consistent with those reported in the literature (Barfield et al., 2017; MacKinnon et al., 2002). Huang’s (2019a) MT-Comp test shows severely inflated type I errors in Figure 4. Despite its high power, it is therefore not a fair competitor in our settings. Overall, the proposed PoC-AB and JS-AB tests outperform these existing methods in our settings. They provide the most robust type I error control and the highest power among the methods that control the type I error.
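The ordering between Sobel’s test and the MaxP test observed above also has a simple algebraic explanation. In terms of the two T-statistics, the squared Sobel statistic equals Z_a²Z_b²/(Z_a² + Z_b²) = 1/(1/Z_a² + 1/Z_b²), which never exceeds min(Z_a², Z_b²). Hence, whenever Sobel’s test rejects at a given level, the MaxP test at the same level rejects as well.

The sketch below checks this on simulated T-statistics. Both statistics are given a common, hypothetical mean (the "signal"), so it mimics power curves such as those in Figure 6 only qualitatively. The function name is ours.

```python
import random


def power_curve(signals, reps=5000, crit=1.959963984540054, seed=0):
    """Empirical power of the Sobel and MaxP tests when both T-statistics have mean `signal`."""
    rng = random.Random(seed)
    results = []
    for signal in signals:
        sobel_rej = maxp_rej = 0
        for _ in range(reps):
            z_a = rng.gauss(signal, 1.0)
            z_b = rng.gauss(signal, 1.0)
            sobel = abs(z_a * z_b) / (z_a * z_a + z_b * z_b) ** 0.5 > crit
            maxp = min(abs(z_a), abs(z_b)) > crit
            # Sobel rejection implies MaxP rejection on every draw
            assert maxp or not sobel
            sobel_rej += sobel
            maxp_rej += maxp
        results.append((signal, sobel_rej / reps, maxp_rej / reps))
    return results


for signal, p_sobel, p_maxp in power_curve([0.0, 1.0, 2.0, 3.0, 4.0]):
    print(f"signal={signal:.1f}  Sobel={p_sobel:.3f}  MaxP={p_maxp:.3f}")
```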
5. Extensions
The adaptive bootstrap in Section 3 offers a general strategy that can be extended to a wide range of scenarios beyond model (2). We next examine three examples:
- testing the joint ME of multivariate mediators (Section 5.1);
- testing the ME on the odds-ratio scale for a binary outcome (Section 5.2);
- testing the ME on the risk-difference scale when the outcome is continuous and the mediator follows a generalised linear model (Section 5.3).

For each scenario, we present (1) the model, (2) the non-regularity issue, (3) the asymptotic theory and adaptive bootstrap, and (4) numerical results.
5.1. Testing joint ME of multivariate mediators
When the number of mediators is large, it can also be of interest to conduct group-based mediation analyses for a set of mediators (Daniel et al., 2015; Hao & Song, 2023; Huang & Pan, 2016; Sohn & Li, 2019; VanderWeele & Vansteelandt, 2014); see also the review by Blum et al. (2020). In this section, we show that the proposed AB method can be generalised to test joint MEs.
(1) Model. As an extension of equation (2), we consider the multivariate linear SEM (Hao & Song, 2023; Huang & Pan, 2016; VanderWeele & Vansteelandt, 2014),
(13)
where denotes a vector of confounders with the first element being 1 for the intercept, and are independent error terms with mean zero, , and . We assume identification conditions similar to those in Section 2 (see Condition D.2 in the online supplementary material). The joint ME through the group of mediators is (Huang & Pan, 2016), where and .
(2) Non-regularity issue. We are interested in the joint ME , which is equivalent to . Similarly to Section 3, when , i.e. there exists at least one coefficient or , we have or . However, when , i.e. for all , for all . We therefore expect a non-regularity issue similar to that in Section 3 to occur when . Numerical experiments in Section D.2.4 of the online supplementary material also illustrate this issue.
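The source of the non-regularity can be made concrete. Suppose, as an assumption of this sketch in the spirit of Huang and Pan (2016), that the joint ME takes the inner-product form aᵀb of the two coefficient vectors.
- Its gradient with respect to (a, b) is (b, a), which vanishes only when both vectors are zero.
- At that point, a first-order delta-method approximation degenerates.
- Orthogonal nonzero vectors also give a zero ME, but with a nonzero gradient, so that null is regular.

The helper names below are hypothetical.

```python
def joint_me(a, b):
    """Joint mediation effect under the assumed inner-product form a^T b."""
    if len(a) != len(b):
        raise ValueError("coefficient vectors must have equal length")
    return sum(x * y for x, y in zip(a, b))


def joint_me_gradient(a, b):
    """Gradient of a^T b with respect to (a, b), returned as the pair (b, a)."""
    return list(b), list(a)


def gradient_is_zero(a, b):
    """True when every component of the gradient (b, a) is zero."""
    grad_a, grad_b = joint_me_gradient(a, b)
    return all(g == 0 for g in grad_a + grad_b)


# Both vectors zero: the ME is zero and the gradient vanishes (non-regular point).
print(joint_me([0, 0], [0, 0]), gradient_is_zero([0, 0], [0, 0]))  # prints: 0 True
# Orthogonal nonzero vectors: the ME is zero but the gradient does not vanish.
print(joint_me([1, 0], [0, 1]), gradient_is_zero([1, 0], [0, 1]))  # prints: 0 False
```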
(3) Asymptotic theory and adaptive bootstrap. To better understand the non-regularity issue, we similarly consider a local linear SEM and , where and .
Theorem 3 Asymptotic Property —
Under Conditions D.2 and D.3 in the online supplementary material (the latter is a regularity condition on the design matrix similar to Condition 1), and the local model,
when ,
when ,
where are defined to be multivariate counterparts of in Theorem 1, and the detailed definitions are given in Section D.2.2 of the online supplementary material.
To present the theory of bootstrap consistency, we define the multivariate counterparts of in Section 3 as . The detailed forms are given in Section D.2.3 of the online supplementary material. Similarly to in Section 3, we define the AB statistic under the multivariate setting as
where where and denote the sample T-statistics of the two coefficients and , respectively, and and denote the bootstrap counterparts of the two sample T-statistics. We establish bootstrap consistency for the joint AB statistic below.
Theorem 4 Adaptive Bootstrap Consistency —
Under the conditions of Theorem 3, when the tuning parameter satisfies and as
where is specified as in equation (9).
Based on Theorem 4, we can develop an AB test similar to that in Section 3.
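The motivation for thresholding follows from an exact identity for a bootstrap draw of an inner-product statistic. Take the joint ME to be aᵀb, as an assumption of this sketch. Then

a*ᵀb* − âᵀb̂ = (a* − â)ᵀ(b* − b̂) + âᵀ(b* − b̂) + b̂ᵀ(a* − â).

Near the point where both coefficient vectors are zero, the last two terms are of the same order as the first and do not vanish. A thresholded draw keeps only the first term.

The sketch below takes the pre-test indicator as an input rather than computing it, because the exact construction of the indicator from the T-statistics is not reproduced here. The function names are hypothetical, and this is a simplified illustration, not the authors' implementation.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(u, v))


def ab_joint_draw(a_hat, b_hat, a_star, b_star, both_small):
    """One thresholded bootstrap draw for the assumed joint ME a^T b.

    both_small: pre-test indicator that both coefficient vectors look null,
    supplied by the caller (its exact form is not reproduced here).
    """
    da = [x - y for x, y in zip(a_star, a_hat)]
    db = [x - y for x, y in zip(b_star, b_hat)]
    if both_small:
        return dot(da, db)                           # product of bootstrap perturbations only
    return dot(a_star, b_star) - dot(a_hat, b_hat)   # classical centred bootstrap draw
```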
(4) Numerical results. To evaluate the performance of the joint AB test, we conduct numerical experiments, detailed in Section D.2.4 of the online supplementary material. We compare the AB test with the classical bootstrap and two tests in Huang and Pan (2016): the product test based on the normal product distribution (PT-NP) and the product test based on normality (PT-N). We observe results similar to those in Section 4. Specifically, under , when , both the proposed AB test and the compared methods yield uniformly distributed p-values. When , however, the compared methods become overly conservative, whereas the AB test still produces uniformly distributed p-values. Under , the AB test can achieve higher empirical power than the compared methods. Beyond simulations, we also provide an illustrative data analysis in Section G.3.2 of the online supplementary material.
5.2. Non-linear Scenario I: Binary outcome and general mediator
(1) Model. Suppose the outcome is binary, and consider the model
(14)
where is the inverse of a canonical link function in generalised linear models. Under Model (14), since the outcome is binary, it is conventional to define the ME as the odds ratio (VanderWeele & Vansteelandt, 2010). Specifically, under the identification assumption given in Section 2, the conditional natural indirect effect (ME) can be identified as
where denotes the potential value of the mediator under the exposure , and denotes the potential outcome that would have been observed if S and M had been set to s and m, respectively. Under of no ME,
(15) |
where the second equivalence holds because the function is strictly increasing when .
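In the special case where both the mediator and the outcome are binary, the conditional NIE on the odds-ratio scale can be evaluated directly from two logistic models. Under the identification assumptions, the mediation formula gives

P{Y(s, M(s′)) = 1 | X} = Σ_m P(Y = 1 | S = s, M = m, X) P(M = m | S = s′, X).

The sketch below makes the following assumptions:
- the linear predictor is a0 + a_s·s for the mediator and b0 + b_m·m + t_s·s for the outcome, with covariates absorbed into the intercepts;
- the exposure levels compared are 1 and 0.

The parameter names are hypothetical. The sketch illustrates that the NIE odds ratio equals one whenever either path coefficient is zero.

```python
import math


def expit(x):
    """Inverse logit link."""
    return 1.0 / (1.0 + math.exp(-x))


def nie_odds_ratio(a0, a_s, b0, b_m, t_s):
    """Conditional NIE odds ratio for binary M and Y under assumed logistic models.

    P(M = 1 | S = s) = expit(a0 + a_s * s)
    P(Y = 1 | S = s, M = m) = expit(b0 + b_m * m + t_s * s)
    """
    def p_y(s, s_prime):
        # Mediation formula: average the outcome model over the mediator law under s_prime.
        p_m1 = expit(a0 + a_s * s_prime)
        return expit(b0 + b_m + t_s * s) * p_m1 + expit(b0 + t_s * s) * (1.0 - p_m1)

    def odds(p):
        return p / (1.0 - p)

    return odds(p_y(1, 1)) / odds(p_y(1, 0))


print(nie_odds_ratio(a0=-0.5, a_s=0.0, b0=0.2, b_m=1.0, t_s=0.3))  # no S -> M path
print(nie_odds_ratio(a0=-0.5, a_s=1.0, b0=0.2, b_m=0.0, t_s=0.3))  # no M -> Y path
print(nie_odds_ratio(a0=-0.5, a_s=1.0, b0=0.2, b_m=1.0, t_s=0.3))  # both paths present
```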
Remark 4
We consider natural indirect effects (NIEs), i.e. MEs, conditional on the covariates, following VanderWeele and Vansteelandt (2010). Alternatively, Imai et al. (2010) proposed to examine the average NIE, which marginalises over the distribution of . We examine the conditional NIE mainly for technical convenience. The conditional NIE for all gives a sufficient condition for the average NIE . Conclusions for the conditional NIE may be extended to the average NIE in a similar way; see Remark E.1 in the online supplementary material for more details.
(2) Non-regularity issue. The null hypothesis of no ME (15) looks different from under the linear SEMs in Section 2. Nevertheless, we can show that a non-regularity issue similar to that in Section 2 still arises, as formally stated in Proposition 5 below.
Proposition 5
Under the model (14), Condition E.1 in the online supplementary material (a general regularity condition on the link function and the distribution of M), and the identification conditions in Section 2,
(15) holds for if and only if or .
For simplicity of notation, let be a shorthand for . We have
(i) , (ii) , (iii) .
Interestingly, even though the conditional ME does not take a product form, a non-regularity issue caused by a zero gradient can still arise under in equation (15), as with the PoC statistic in Section 3. Specifically, Proposition 5 implies that when , the first-order delta method cannot be directly applied to the inference of , unlike the scenarios where or . We therefore expect the ordinary estimator of NIE to behave differently under different types of null hypotheses, so that a non-regularity issue can occur. Numerical experiments in Section E.2 of the online supplementary material confirm this phenomenon.
(3) Asymptotic theory and adaptive bootstrap. For ease of presentation, we next derive asymptotic theory under a special case of equation (14), where the mediator is binary and follows a logistic regression model. We point out that the analysis in this section can be readily extended to cases where the mediator M follows a linear model or other canonical generalised linear models. Specifically, let M and Y be Bernoulli random variables with mean values in equation (14), and . In this case, where , and . Similarly to Section 3, we are interested in understanding how the local limiting behaviours of and coefficients change. To this end, we consider a general local logistic model:
(16)
where , , and . Under the local model (16), we have for ,
(17)
where and . (See the proof of Theorem 6 for the derivations.) For simplicity of notation, let NIE be a shorthand for , and by equation (15), NIE . Let denote an estimator of NIE, where and are defined similarly to equation (17) with replaced by their corresponding sample regression coefficient estimators .
Theorem 6 Asymptotic Property —
Assume and and Condition E.2 in the online supplementary material (a regularity condition on the design matrix similar to Condition 1). Under the local model (16) and ,
when ,
when ,
where , , , , represent bivariate mean-zero normal distributions specified in online supplementary Lemma E.2, and is a non-zero constant with given in equation (17).
We next study the consistency of bootstrap estimators. Let denote the classical non-parametric bootstrap estimator of . Specifically, , where , and are defined similarly to equation (17) with replaced by their classical non-parametric bootstrap estimators. Motivated by Theorem 6, we define the AB statistic
where and are defined similarly to (8). The following theorem proves consistency of the AB statistic , based on which we can develop an AB test similar to that in Section 3.
Theorem 7 Adaptive Bootstrap Consistency —
Under the conditions of Theorem 6, when the tuning parameter satisfies and as where is specified as in equation (9).
(4) Numerical results. We conduct simulation studies to compare the AB and the classical non-parametric bootstrap under the model (14). The detailed results are provided in Section E.2 of the online supplementary material. Our findings align closely with those presented in Section 4. Specifically, under , when , both the proposed AB test and the classical non-parametric bootstrap yield uniformly distributed p-values. However, when , the classical bootstrap becomes overly conservative, whereas the AB test still yields uniformly distributed p-values. Under , the AB test can achieve higher empirical power than the classical bootstrap.
5.3. Non-linear Scenario II: Linear outcome and general mediator
(1) Model. Suppose the outcome follows a linear model, and consider
(18)
where can be the inverse of a canonical link function. Similarly to the non-linear Scenario I, we examine the conditional natural indirect effect/ME defined as the risk difference:
(19)
(2) Non-regularity issue. We are interested in testing , which looks different from in Section 2. Nevertheless, we can show that a non-regularity issue similar to that in Section 2 arises, as formally stated in Proposition 8 below.
Proposition 8
Under the model (18), assume is strictly monotone, and the identification conditions in Section 2 hold. Let be a shorthand for in equation (19). Then
(19) = 0 holds if and only if or .
(i) . (ii) . (iii) .
Similarly to Proposition 5, Proposition 8 implies that a non-regularity issue caused by zero gradient would arise under . Specifically, the ordinary estimator of NIE can behave differently when , and when one of and . This is similar to the PoC statistic in Section 3 and the odds ratio in Section 5.2.
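Under a linear outcome model with a binary mediator, the risk-difference NIE reduces to the coefficient of M times the change in the mediator mean, b_m{g(a0 + a_s) − g(a0)}. This sketch rests on the following assumptions:
- the mediator's linear predictor is a0 + a_s·s, with covariates absorbed into a0;
- exposure levels 1 and 0 are compared;
- g is the logistic function.

The code checks numerically that this quantity is zero when either coefficient is zero, and that its gradient in (a_s, b_m) vanishes at the origin, which is the zero-gradient phenomenon behind Proposition 8. Function names are hypothetical.

```python
import math


def expit(x):
    """Inverse logit link, used here as the assumed g."""
    return 1.0 / (1.0 + math.exp(-x))


def nie_risk_difference(a_s, b_m, a0=0.0, g=expit):
    """Risk-difference NIE b_m * {g(a0 + a_s) - g(a0)} under the assumed models."""
    return b_m * (g(a0 + a_s) - g(a0))


def numerical_gradient(f, a_s, b_m, h=1e-6):
    """Central finite-difference gradient of f(a_s, b_m)."""
    d_a = (f(a_s + h, b_m) - f(a_s - h, b_m)) / (2 * h)
    d_b = (f(a_s, b_m + h) - f(a_s, b_m - h)) / (2 * h)
    return d_a, d_b


def rd(a_s, b_m):
    """NIE as a function of (a_s, b_m) with a hypothetical intercept a0 = -0.4."""
    return nie_risk_difference(a_s, b_m, a0=-0.4)


print(numerical_gradient(rd, 0.0, 0.0))  # both zero: gradient is (0, 0)
print(numerical_gradient(rd, 0.0, 0.8))  # only a_s zero: the a_s-component is nonzero
```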
(3) Asymptotic theory and adaptive bootstrap. For ease of presentation, we next derive asymptotic theory under a specific instance of equation (18). Specifically, the mediator M is a Bernoulli random variable with its conditional mean given in equation (14) and , and the outcome Y follows the linear model in equation (2). The analysis in this section can be readily extended to the case where the mediator M follows other canonical generalised linear models. As we are interested in how the local limiting behaviour of and coefficients changes, we consider the following general local model:
(20)
where , and .
Theorem 9 Asymptotic Property —
Assume Condition E.3 in the online supplementary material (a regularity condition on the design matrix similar to Condition 1). Under model (20),
when , ;
when , ,
where , , represents a normal distribution specified in Lemma E.2 of the online supplementary material, and is redefined to be a mean-zero normal distribution with the same covariance as the random vector , where and are defined in Theorem 1.
We next establish bootstrap consistency theory. Similarly to Section 5.2, let denote the non-parametric bootstrap estimator of . In particular, we redefine , where denotes the classical non-parametric bootstrap estimators of . Motivated by Theorem 9, we define the AB statistic
where and are defined similarly to equation (8). The following theorem establishes consistency of the AB statistic .
Theorem 10 Adaptive Bootstrap Consistency —
Under the conditions of Theorem 9, when the tuning parameter satisfies and as where is specified as in equation (9).
(4) Numerical results. We conduct simulation studies to compare the AB and the classical non-parametric bootstrap under the model (18). The detailed results are provided in Section E.2 of the online supplementary material. The results are very similar to those in Section 4 and in Part (4) of Section 5.2, so we do not repeat the details here.
6. Data analysis
We illustrate the proposed method with data from the cohort study ‘Early Life Exposures in Mexico to ENvironmental Toxicants’ (ELEMENT) (Perng et al., 2019). A central interest of this study is the ME of metabolites, in particular the family of lipids, on the association between environmental exposure and children’s growth and development. In the environmental health sciences literature, exposure to endocrine-disrupting chemicals (EDCs) such as phthalates has been found to be detrimental to children’s health outcomes. Such findings of direct associations need to be further assessed for possible MEs through metabolites, because environmental toxicants such as phthalates can alter metabolic profiles at the molecular level.
Our illustration focuses on the outcome of body mass index (BMI) and exposure to one phthalate, MEOHP (mono-(2-ethyl-5-oxohexyl) phthalate), which is a chemical in food production and storage. BMI is widely used in paediatric research to measure childhood obesity. The dataset contains 382 adolescents aged 10–18 years living in Mexico City. Our mediation analysis involves a set of 149 lipids that are hypothesised to have potential MEs on children’s growth and development. Our goal is to identify the mediation pathways MEOHP exposure → lipids → BMI. Two key potential confounders, gender and age, are included throughout the analyses.

Adjusting for gender and age alone may not suffice for proper confounding adjustment. A more plausible causal analysis and interpretation would require further investigation to rigorously assess the underlying causal assumptions, for example a sensitivity analysis for the sequential ignorability assumption.

We compare the results of six tests: JS-AB, JS-MaxP, PoC-AB, PoC-B, PoC-Sobel, and CMA, all of which were compared in the simulation studies of Section 4. All the bootstrap methods (JS-AB, PoC-AB, PoC-B, and CMA) are based on bootstrap resamples. We do not include the JS-B test and the MT-Comp method here, because our simulation studies in Section 4 show that they can have inflated type I errors.
As the sample size is limited relative to the large number of mediators, we first apply a screening analysis to identify a subset of lipids as potential candidates. We then model the chosen lipids jointly in the second step. To mitigate potential issues arising from double dipping into the data, we randomly split the dataset into two distinct parts, each dedicated to one of the two analytic tasks.

In the first (screening) step, we examine the path MEOHP → lipid → BMI for one lipid at a time and obtain the corresponding p-values with each of the six tests. For each test, we select the proportion q of lipids with the smallest p-values.

The second step examines the path MEOHP → selected lipids → BMI, with the selected lipids modelled jointly. To test the ME through a target lipid M within the selected set, we adjust for the non-target mediators in the outcome model, following the discussion in Section 3.4; see Section G.3.3 of the online supplementary material for details. Finally, we select lipids based on their second-step p-values. To account for multiple comparisons, we control the false discovery rate (FDR) with the Benjamini–Hochberg procedure (Benjamini & Hochberg, 1995).

We explore a range of q values and observe very similar results, indicating that our approach is robust to the choice of the screening threshold in the first step. We next present the results for the value of q that retains 15 lipids, those with the smallest screening p-values. Results for other q values are detailed in Section G.3.1 of the online supplementary material.
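The selection step relies on the Benjamini–Hochberg (BH) step-up procedure, and the screening and testing steps use disjoint parts of a random split. Below is a minimal standard-library sketch of both pieces. Three caveats apply:
- the function names are hypothetical;
- splitting into equal halves is an assumption;
- the example p-values are invented for illustration and are not taken from the ELEMENT analysis.

The mediation p-values themselves would come from whichever test is being applied.

```python
import random


def benjamini_hochberg(pvalues, fdr=0.05):
    """Sorted indices of hypotheses rejected by the BH step-up procedure at level `fdr`."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * fdr / m:
            k_max = rank  # largest rank meeting the step-up bound
    return sorted(order[:k_max])


def random_split(n, seed=None):
    """Split indices 0..n-1 into two disjoint halves (screening part, testing part)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    half = n // 2
    return sorted(idx[:half]), sorted(idx[half:])


screen_idx, test_idx = random_split(382, seed=2023)
print(len(screen_idx), len(test_idx))                      # 191 191
print(benjamini_hochberg([0.001, 0.02, 0.5], fdr=0.05))    # [0, 1]
```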
As an illustrative example, Table 1 presents the results from a single random split, reporting the p-values for the lipids selected by at least one test in the second step of the analysis. In this instance, the non-AB tests fail to detect any lipids. In contrast, the PoC-AB test identifies lauric acid (L.A) and FA.7.0-OH_1 (FA.7) when the FDR is controlled at 0.10. The JS-AB test selects both L.A and FA.7 at both FDR levels, 0.05 and 0.10.

To gauge the variability of the results across random splits, we repeat the data-splitting analysis 400 times. As shown in Figure 8, L.A and FA.7 are the two most frequently selected mediators. Furthermore, the AB tests have a notably higher chance of selecting L.A than the non-AB tests. This aligns with our simulation findings in Section 4, suggesting that the AB tests can attain higher power than their non-AB counterparts.

Lauric acid is a saturated fatty acid found in many vegetable fats and in coconut and palm kernel oils (Dayrit, 2015). The results suggest that exposure to MEOHP may influence the breakdown of fat tissue in the human body, potentially leading to obesity and other adverse health outcomes.
Table 1.
Lipids selected in the second step
Lipids | JS-AB | JS-MaxP | PoC-AB | PoC-B | PoC-Sobel | CMA |
---|---|---|---|---|---|---|
L.A | 0.0017 () | 0.0399 | 0.0043 | 0.0406 | 0.1254 | 0.0426 |
FA.7 | 0.0008 () | 0.0146 | 0.0090 | 0.0236 | 0.0937 | 0.0208 |
Note. L.A = LAURIC.ACID; FA.7 = FA.7.0-OH_1; CMA = R package "causal mediation analysis". p-values marked with and indicate that the lipid in the row is selected by the method in the column at the 0.05 and 0.10 FDR levels, respectively.
Figure 8.
Number of times each mediator is selected in Step 2 by the six tests with FDR over 400 random splits of the data. FDR = false discovery rate.
Since the first screening step considers one mediator at a time, we also conduct sensitivity analyses to evaluate the effects of the unadjusted mediators, following Liu et al. (2021). We use the procedure proposed by Imai et al. (2010). It relies on the idea that the error term in the M-S model and that in the Y-M model are likely to be correlated if the sequential ignorability assumption is violated, and vice versa. The detailed results are provided in Section G.3.4 of the online supplementary material. In brief, the sensitivity analysis suggests that our first-step screening may be robust to unadjusted mediators.
7. Discussion
This paper proposes a new adaptive framework for testing composite null hypotheses in mediation pathway analysis. The method incorporates a consistent pre-test threshold into the bootstrap procedure, which circumvents the non-regularity issue arising from the composite null hypotheses. If at least one of the two coefficients is significant, the procedure reduces to the classical non-parametric bootstrap; otherwise, it approximates the local asymptotic behaviour of the statistics. The proposed strategy accommodates different types of null hypotheses under various models. In particular, we have established similar results for both individual and joint MEs under classical linear SEMs, and we have extended the conclusions to generalised linear models. Through comprehensive simulation studies, we have demonstrated that the adaptive tests properly and robustly control the type I error under different types of null hypotheses and improve statistical power.
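The mechanism summarised above can be sketched for the single-mediator model without covariates. The Python code below is a simplified illustration under stated assumptions, not the authors' implementation. It proceeds as follows:
- it fits the mediator and outcome regressions by ordinary least squares and resamples rows with replacement;
- by default, each bootstrap draw is the classical centred draw α̂*β̂* − α̂β̂;
- when the sample and bootstrap T-statistics of both coefficients fall below a threshold λ, the draw keeps only the product of the perturbations (α̂* − α̂)(β̂* − β̂).

The default λ = √n / log n (a rate that diverges but is o(√n)), the two-sided p-value rule, the toy data generator, and all function names are assumptions made for this sketch. The precise statistic and tuning are specified in Section 3.

```python
import math
import random
import statistics


def _fit_mediator(s, m):
    """OLS of m on (1, s): slope of s and its t-statistic."""
    n = len(s)
    s_bar, m_bar = statistics.fmean(s), statistics.fmean(m)
    sxx = sum((u - s_bar) ** 2 for u in s)
    a = sum((u - s_bar) * (v - m_bar) for u, v in zip(s, m)) / sxx
    rss = sum((v - m_bar - a * (u - s_bar)) ** 2 for u, v in zip(s, m))
    return a, a / math.sqrt(rss / (n - 2) / sxx)


def _fit_outcome(s, m, y):
    """OLS of y on (1, m, s) via centred normal equations: coefficient of m and its t-statistic."""
    n = len(s)
    s_bar, m_bar, y_bar = statistics.fmean(s), statistics.fmean(m), statistics.fmean(y)
    cs, cm, cy = [v - s_bar for v in s], [v - m_bar for v in m], [v - y_bar for v in y]
    smm, sss = sum(v * v for v in cm), sum(v * v for v in cs)
    sms = sum(u * v for u, v in zip(cm, cs))
    smy = sum(u * v for u, v in zip(cm, cy))
    ssy = sum(u * v for u, v in zip(cs, cy))
    det = smm * sss - sms * sms
    b = (sss * smy - sms * ssy) / det
    tau = (smm * ssy - sms * smy) / det
    rss = sum((w - b * u - tau * v) ** 2 for u, v, w in zip(cm, cs, cy))
    return b, b / math.sqrt(rss / (n - 3) * sss / det)


def poc_ab_pvalue(s, m, y, n_boot=500, lam=None, seed=0):
    """Simplified adaptive-bootstrap p-value for H0: alpha * beta = 0."""
    n = len(s)
    if lam is None:
        lam = math.sqrt(n) / math.log(n)  # assumed threshold: diverges but is o(sqrt(n))
    a_hat, t_a = _fit_mediator(s, m)
    b_hat, t_b = _fit_outcome(s, m, y)
    rng = random.Random(seed)
    draws = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample rows with replacement
        s_b, m_b, y_b = [s[i] for i in idx], [m[i] for i in idx], [y[i] for i in idx]
        a_star, t_a_star = _fit_mediator(s_b, m_b)
        b_star, t_b_star = _fit_outcome(s_b, m_b, y_b)
        if max(abs(t_a), abs(t_a_star), abs(t_b), abs(t_b_star)) <= lam:
            # Both coefficients look null: keep only the product of perturbations,
            # dropping the cross terms a_hat*(b_star - b_hat) + b_hat*(a_star - a_hat).
            draws.append((a_star - a_hat) * (b_star - b_hat))
        else:
            # At least one coefficient looks non-null: classical centred draw.
            draws.append(a_star * b_star - a_hat * b_hat)
    stat = a_hat * b_hat
    # Two-sided p-value: share of draws at least as extreme as the observed product.
    return sum(abs(u) >= abs(stat) for u in draws) / n_boot


def simulate(n, alpha, beta, seed):
    """Hypothetical toy data: binary S, M = alpha*S + e, Y = beta*M + 0.5*S + e."""
    rng = random.Random(seed)
    s = [float(rng.random() < 0.5) for _ in range(n)]
    m = [alpha * v + rng.gauss(0.0, 1.0) for v in s]
    y = [beta * u + 0.5 * v + rng.gauss(0.0, 1.0) for u, v in zip(m, s)]
    return s, m, y


print(poc_ab_pvalue(*simulate(200, 0.0, 0.0, seed=1)))  # both paths absent
print(poc_ab_pvalue(*simulate(200, 1.0, 1.0, seed=1)))  # strong mediation
```

The thresholded branch follows the logic described in the paragraph above: the centred draw α̂*β̂* − α̂β̂ equals the product of perturbations plus the two cross terms. Near the doubly null point, those cross terms are of the same order as the product and do not vanish.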
The proposed methodology offers an analytic toolbox that can be broadly extended to other problems involving composite null hypotheses. Several future research directions are worth exploring.

First, the non-regularity issue can arise similarly in other scenarios, such as survival analysis (Huang & Pan, 2016; VanderWeele, 2011), different data types (Sohn & Li, 2019), partially linear models (Hines et al., 2021), and models with exposure–mediator interactions; see Section H of the online supplementary material for more discussion. These more complicated models require special care in the causal interpretation of MEs and in the implementation of the bootstrap procedure, which warrants further investigation.

Second, when the dimension of mediators and covariates is high, it is of interest to extend the adaptive bootstrap to high-dimensional mediation models for both individual and joint MEs (Zhou et al., 2020). As in our discussion of adjusting for multivariate mediators at the end of Section 3, we might apply the adaptive bootstrap after properly adjusting for high-dimensional covariates. In the data analysis, we applied marginal screening to reduce the dimension of mediators, which might overlook complicated causal dependence among mediators. For mediators with potential causal dependence, Shi and Li (2022) proposed first estimating a directed acyclic graph of the mediators. They then apply a testing procedure that controls the type I error at or below the nominal level. It would be of interest to extend the proposed AB to such settings to mitigate potential conservatism.

Third, the proposed AB strategy can also be used to examine replicability across independent studies (Bogomolov & Heller, 2018), which is fundamental to scientific discovery. Specifically, let , denote the true signals from K independent studies, respectively. Testing whether the signals in these K studies are all significant corresponds to vs. . Moreover, for two studies with true signals and , one can investigate whether both effects are significant in the same direction by formulating the hypothesis testing problem as vs. . For these testing problems, the null hypotheses are composite. To properly control the type I error, the adaptive strategy proposed in this paper may serve as a valuable building block, although additional effort is needed to analyse the different cases carefully.

Finally, in our data analysis, all measurements were obtained cross-sectionally at a single clinical visit within a time window of approximately three months. To further study potential long-term influences of toxicant exposures, it may be of interest to investigate how the MEs vary over time. Such time-varying MEs may be naturally analysed in longitudinal studies that collect time-varying measurements. This challenging research area has received only minimal attention in the current literature (Bind et al., 2016). Extending the proposed AB method to time-varying MEs would be a compelling future direction.
Contributor Information
Yinqiu He, Department of Statistics, University of Wisconsin, Madison, WI, USA.
Peter X K Song, Department of Biostatistics, University of Michigan, Ann Arbor, MI, USA.
Gongjun Xu, Department of Statistics, University of Michigan, Ann Arbor, MI, USA.
Acknowledgments
We are grateful to the joint editors, Dr. Daniela Witten and Dr. Aurore Delaigle, an associate editor, and three anonymous referees for their helpful comments and suggestions. This work is partially supported by NSF DMS-1811734, DMS-2113564, SES-1846747, SES-2150601, NIH R01ES024732, R01ES033656, and Wisconsin Alumni Research Foundation.
Data availability
Due to privacy restrictions, we are unable to share the raw data publicly. They may, however, be obtained offline through a formal data request procedure outlined in the University of Michigan Data Use Agreement protocol. For reproducibility, we instead provide a pseudo-dataset with added noise in a GitHub repository (He et al., 2023).
Supplementary material
Supplementary material is available online at Journal of the Royal Statistical Society: Series B.
References
- Andrews D. W. (2001). Testing when a parameter is on the boundary of the maintained hypothesis. Econometrica, 69(3), 683–734. 10.1111/1468-0262.00210 [DOI] [Google Scholar]
- Barfield R., Shen J., Just A. C., Vokonas P. S., Schwartz J., Baccarelli A. A., VanderWeele T. J., & Lin X. (2017). Testing for the indirect effect under the null for genome-wide mediation analyses. Genetic Epidemiology, 41(8), 824–833. 10.1002/gepi.22084 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Baron R. M., & Kenny D. A. (1986). The moderator–mediator variable distinction in social psychological research: Conceptual, strategic, and statistical considerations. Journal of Personality and Social Psychology, 51(6), 1173–1182. 10.1037/0022-3514.51.6.1173 [DOI] [PubMed] [Google Scholar]
- Basu, D. (1980). Randomization analysis of experimental data: The Fisher randomization test. Journal of the American statistical association, 75(371), 575–582. [Google Scholar]
- Benjamini Y., & Hochberg Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society: Series B (Methodological), 57(1), 289–300. 10.1111/j.2517-6161.1995.tb02031.x [DOI] [Google Scholar]
- Bind M.-A., Vanderweele T., Coull B., & Schwartz J. (2016). Causal mediation analysis for longitudinal data with exogenous exposure. Biostatistics, 17(1), 122–134. 10.1093/biostatistics/kxv029 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Blum M. G., Valeri L., François O., Cadiou S., Siroux V., Lepeule J., & Slama R. (2020). Challenges raised by mediation analysis in a high-dimension setting. Environmental Health Perspectives, 128(5), 055001. 10.1289/EHP6240 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Bogomolov M., & Heller R. (2018). Assessing replicability of findings across two studies of multiple features. Biometrika, 105(3), 505–516. 10.1093/biomet/asy029 [DOI] [Google Scholar]
- Chen S. X. (2016). Peter Hall’s contributions to the bootstrap. The Annals of Statistics, 44, 1821–1836. 10.1214/16-AOS1489 [DOI] [Google Scholar]
- Dai J. Y., Stanford J. L., & LeBlanc M. (2020). A multiple-testing procedure for high-dimensional mediation hypotheses. Journal of the American Statistical Association, 1–16. 10.1080/01621459.2020.1765785 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Daniel R., De Stavola B., Cousens S., & Vansteelandt S. (2015). Causal mediation analysis with multiple mediators. Biometrics, 71(1), 1–14. 10.1111/biom.v71.1 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Dayrit F. M. (2015). The properties of lauric acid and their significance in coconut oil. Journal of the American Oil Chemists’ Society, 92(1), 1–15. 10.1007/s11746-014-2562-7
- Derkach A., Moore S. C., Boca S. M., & Sampson J. N. (2020). Group testing in mediation analysis. Statistics in Medicine, 39(18), 2423–2436. 10.1002/sim.v39.18
- Djordjilović V., Hemerik J., & Thoresen M. (2020). ‘On optimal two-stage testing of multiple mediators’, arXiv, arXiv:2007.02844, preprint.
- Djordjilović V., Page C. M., Gran J. M., Nøst T. H., Sandanger T. M., Veierød M. B., & Thoresen M. (2019). Global test for high-dimensional mediation: Testing groups of potential mediators. Statistics in Medicine, 38(18), 3346–3360. 10.1002/sim.8199
- Drton M., & Xiao H. (2016). Wald tests of singular hypotheses. Bernoulli, 22(1), 38–59. 10.3150/14-BEJ620
- Du J., Zhou X., Hao W., Liu Y., Jennifer S., & Mukherjee B. (2022). ‘Methods for large-scale single mediator hypothesis testing: Possible choices and comparisons’, arXiv, arXiv:2203.13293, preprint.
- Dufour J.-M., Renault E., & Zinde-Walsh V. (2013). ‘Wald tests when restrictions are locally singular’, arXiv, arXiv:1312.0569, preprint.
- Fritz M. S., & MacKinnon D. P. (2007). Required sample size to detect the mediated effect. Psychological Science, 18(3), 233–239. 10.1111/j.1467-9280.2007.01882.x
- Fulcher I. R., Shi X., & Tchetgen E. J. T. (2019). Estimation of natural indirect effects robust to unmeasured confounding and mediator measurement error. Epidemiology, 30(6), 825–834. 10.1097/EDE.0000000000001084
- Glonek G. (1993). On the behaviour of Wald statistics for the disjunction of two regular hypotheses. Journal of the Royal Statistical Society: Series B (Methodological), 55(3), 749–755. 10.1111/j.2517-6161.1993.tb01938.x
- Guo X., Li R., Liu J., & Zeng M. (2022). High-dimensional mediation analysis for selecting DNA methylation loci mediating childhood trauma and cortisol stress reactivity. Journal of the American Statistical Association, 1–32. 10.1080/01621459.2022.2053136
- Hao W., & Song P. X.-K. (2023). A simultaneous likelihood test for joint mediation effects of multiple mediators. Statistica Sinica, 33(4), 2305–2326.
- Hayes A. F. (2017). Introduction to mediation, moderation, and conditional process analysis: A regression-based approach. Guilford Publications.
- He Y., Song P. X.-K., & Xu G. (2023). ABtest. https://github.com/yinqiuhe/ABtest.
- Hines O., Vansteelandt S., & Diaz-Ordaz K. (2021). Robust inference for mediated effects in partially linear models. Psychometrika, 86, 595–618. 10.1007/s11336-021-09768-z
- Huang Y.-T. (2018). Joint significance tests for mediation effects of socioeconomic adversity on adiposity via epigenetics. The Annals of Applied Statistics, 12(3), 1535–1557. 10.1214/17-AOAS1120
- Huang Y.-T. (2019a). Genome-wide analyses of sparse mediation effects under composite null hypotheses. The Annals of Applied Statistics, 13, 60–84. 10.1214/18-AOAS1181
- Huang Y.-T. (2019b). Variance component tests of multivariate mediation effects under composite null hypotheses. Biometrics, 75(4), 1191–1204. 10.1111/biom.v75.4
- Huang Y.-T., & Cai T. (2016). Mediation analysis for survival data using semiparametric probit models. Biometrics, 72(2), 563–574. 10.1111/biom.v72.2
- Huang Y.-T., & Pan W.-C. (2016). Hypothesis test of mediation effect in causal mediation model with high-dimensional continuous mediators. Biometrics, 72(2), 402–413. 10.1111/biom.v72.2
- Imai K., Keele L., & Tingley D. (2010). A general approach to causal mediation analysis. Psychological Methods, 15(4), 309–334. 10.1037/a0020761
- Imai K., Keele L., & Yamamoto T. (2010). Identification, inference and sensitivity analysis for causal mediation effects. Statistical Science, 25(1), 51–71. 10.1214/10-STS321
- Imai K., & Yamamoto T. (2013). Identification and sensitivity analysis for multiple causal mechanisms: Revisiting evidence from framing experiments. Political Analysis, 21(2), 141–171. 10.1093/pan/mps040
- Imbens G. W., & Rubin D. B. (2015). Causal inference in statistics, social, and biomedical sciences. Cambridge University Press.
- Jérolon A., Baglietto L., Birmelé E., Alarcon F., & Perduca V. (2020). Causal mediation analysis in presence of multiple mediators uncausally related. The International Journal of Biostatistics, 17(2), 191–221. 10.1515/ijb-2019-0088
- Laber E. B., & Murphy S. A. (2011). Adaptive confidence intervals for the test error in classification. Journal of the American Statistical Association, 106(495), 904–913. 10.1198/jasa.2010.tm10053
- Liu Z., Shen J., Barfield R., Schwartz J., Baccarelli A. A., & Lin X. (2021). Large-scale hypothesis testing for causal mediation effects with applications in genome-wide epigenetic studies. Journal of the American Statistical Association, 117(537), 67–81. 10.1080/01621459.2021.1914634
- Loh W. W., Moerkerke B., Loeys T., & Vansteelandt S. (2021). Disentangling indirect effects through multiple mediators without assuming any causal structure among the mediators. Psychological Methods, 27(6), 982–999. 10.1037/met0000314
- MacKinnon D. (2008). Introduction to statistical mediation analysis. Multivariate applications book series. Taylor & Francis.
- MacKinnon D. P., & Fairchild A. J. (2009). Current directions in mediation analysis. Current Directions in Psychological Science, 18(1), 16–20. 10.1111/j.1467-8721.2009.01598.x
- MacKinnon D. P., Lockwood C. M., Hoffman J. M., West S. G., & Sheets V. (2002). A comparison of methods to test mediation and other intervening variable effects. Psychological Methods, 7(1), 83–104. 10.1037/1082-989X.7.1.83
- MacKinnon D. P., Lockwood C. M., & Williams J. (2004). Confidence limits for the indirect effect: Distribution of the product and resampling methods. Multivariate Behavioral Research, 39(1), 99–128. 10.1207/s15327906mbr3901_4
- McKeague I. W., & Qian M. (2015). An adaptive resampling test for detecting the presence of significant predictors. Journal of the American Statistical Association, 110(512), 1422–1433. 10.1080/01621459.2015.1095099
- McKeague I. W., & Qian M. (2019). Marginal screening of tables in large-scale case-control studies. Biometrics, 75(1), 163–171. 10.1111/biom.v75.1
- Miles C., & Chambaz A. (2021). ‘Optimal tests of the composite null hypothesis arising in mediation analysis’, arXiv, arXiv:2107.07575, preprint.
- Pearl J. (2001). Direct and indirect effects. In Probabilistic and causal inference: the works of Judea Pearl (pp. 373–392).
- Perng W., Tamayo-Ortiz M., Tang L., Sánchez B. N., Cantoral A., Meeker J. D., Dolinoy D. C., Roberts E. F., Martinez-Mier E. A., Lamadrid-Figueroa H., Song P. X. K., Ettinger A. S., Wright R., Arora M., Schnaas L., Watkins D. J., Goodrich J. M., Garcia R. C., Solano-Gonzalez M., & Peterson K. E. (2019). The early life exposure in Mexico to environmental toxicants (ELEMENT) project. British Medical Journal Open, 9(8). 10.1136/bmjopen-2019-030427
- Robins J. M., & Greenland S. (1992). Identifiability and exchangeability for direct and indirect effects. Epidemiology, 3(2), 143–155. 10.1097/00001648-199203000-00013
- Sampson J. N., Boca S. M., Moore S. C., & Heller R. (2018). FWER and FDR control when testing multiple mediators. Bioinformatics, 34(14), 2418–2424. 10.1093/bioinformatics/bty064
- Shi C., & Li L. (2022). Testing mediation effects using logic of boolean matrices. Journal of the American Statistical Association, 117(540), 2014–2027. 10.1080/01621459.2021.1895177
- Sobel M. E. (1982). Asymptotic confidence intervals for indirect effects in structural equation models. Sociological Methodology, 13, 290–312. 10.2307/270723
- Sohn M. B., & Li H. (2019). Compositional mediation analysis for microbiome studies. The Annals of Applied Statistics, 13(1), 661–681. 10.1214/18-AOAS1210
- Tingley D., Yamamoto T., Hirose K., Keele L., & Imai K. (2014). Mediation: R package for causal mediation analysis. Journal of Statistical Software, 59(5), 1–38. 10.18637/jss.v059.i05
- Valeri L., & VanderWeele T. J. (2013). Mediation analysis allowing for exposure–mediator interactions and causal interpretation: Theoretical assumptions and implementation with SAS and SPSS macros. Psychological Methods, 18(2), 137–150. 10.1037/a0031034
- van der Vaart A. W. (2000). Asymptotic statistics. Cambridge Series in Statistical and Probabilistic Mathematics (Vol. 3). Cambridge University Press.
- VanderWeele T. J. (2011). Causal mediation analysis with survival data. Epidemiology, 22(4), 582–585. 10.1097/EDE.0b013e31821db37e
- VanderWeele T. J. (2015). Explanation in causal inference: Methods for mediation and interaction. Oxford University Press.
- VanderWeele T. J., & Vansteelandt S. (2009). Conceptual issues concerning mediation, interventions and composition. Statistics and its Interface, 2(4), 457–468. 10.4310/SII.2009.v2.n4.a7
- VanderWeele T. J., & Vansteelandt S. (2010). Odds ratios for mediation analysis for a dichotomous outcome. American Journal of Epidemiology, 172(12), 1339–1348. 10.1093/aje/kwq332
- VanderWeele T. J., & Vansteelandt S. (2014). Mediation analysis with multiple mediators. Epidemiologic Methods, 2(1), 95–115. 10.1515/em-2012-0010
- VanderWeele T. J., Vansteelandt S., & Robins J. M. (2014). Effect decomposition in the presence of an exposure-induced mediator-outcome confounder. Epidemiology, 25(2), 300–306. 10.1097/EDE.0000000000000034
- Van Garderen K., & Van Giersbergen N. (2022). ‘A nearly similar powerful test for mediation’, arXiv, arXiv:2012.11342, preprint.
- Wang H. J., McKeague I. W., & Qian M. (2018). Testing for marginal linear effects in quantile regression. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 80(2), 433–452. 10.1111/rssb.12258
- Zhao S. D., Cai T. T., & Li H. (2014). More powerful genetic association testing via a new statistical framework for integrative genomics. Biometrics, 70(4), 881–890. 10.1111/biom.v70.4
- Zhou R. R., Wang L., & Zhao S. D. (2020). Estimation and inference for the indirect effect in high-dimensional linear mediation models. Biometrika, 107(3), 573–589. 10.1093/biomet/asaa016
Associated Data
Supplementary Materials
Data Availability Statement
Owing to privacy restrictions, the raw data cannot be shared publicly, but they may be obtained through the formal data request procedure outlined in the University of Michigan Data Use Agreement protocol. To support reproducibility, we instead provide a pseudo-dataset with added noise in the GitHub repository of He et al. (2023).