. 2023 Nov 14;86(2):411–434. doi: 10.1093/jrsssb/qkad129

Adaptive bootstrap tests for composite null hypotheses in the mediation pathway analysis

Yinqiu He 1, Peter X K Song 2, Gongjun Xu 3,✉,2
PMCID: PMC11090400  PMID: 38746015

Abstract

Mediation analysis aims to assess if, and how, a certain exposure influences an outcome of interest through intermediate variables. This problem has recently gained a surge of attention due to the tremendous need for such analyses in scientific fields. Testing for the mediation effect (ME) is greatly challenged by the fact that the underlying null hypothesis (i.e. the absence of MEs) is composite. Most existing mediation tests are overly conservative and thus underpowered. To overcome this significant methodological hurdle, we develop an adaptive bootstrap testing framework that can accommodate different types of composite null hypotheses in the mediation pathway analysis. Applied to the product of coefficients test and the joint significance test, our adaptive testing procedures provide type I error control under the composite null, resulting in much improved statistical power compared to existing tests. Both theoretical properties and numerical examples of the proposed methodology are discussed.

Keywords: bootstrap, composite hypothesis, mediation analysis, structural equation model

1. Introduction

Mediation analysis plays a crucial role in investigating the underlying mechanism or pathway between an exposure and an outcome through an intermediate variable called a mediator (MacKinnon, 2008; VanderWeele, 2015). It decomposes the ‘total effect’ of an exposure on an outcome into an indirect effect that is through a given mediator and a direct effect, not through the mediator. The former holds the key to uncovering the exposure-outcome mechanism and is often known as the mediation effect (ME). The ME was initially studied under structural equation models (SEMs) in social sciences (Baron & Kenny, 1986; Sobel, 1982) and has been given formal causal definitions (Imai et al., 2010; Pearl, 2001; Robins & Greenland, 1992) within the counterfactual framework (Imbens & Rubin, 2015). Examining the presence or absence of the mediation effect can facilitate a deeper understanding of the underlying causal pathway from the exposure to the outcome and can give essential insights into intervention consequences, e.g. manipulating the mediator to change the exposure-outcome mechanism. As a result, it is of interest to apply mediation analysis in many scientific fields, such as psychology (MacKinnon & Fairchild, 2009; Valeri & VanderWeele, 2013), genomics (Guo et al., 2022; Huang, 2018; Huang & Pan, 2016; Zhao et al., 2014), and epidemiology (Barfield et al., 2017; Fulcher et al., 2019), among others.

To analyse the ME, one classical setting models the relationship between the exposure, the potential mediator, and the outcome as a directed acyclic graph; see Figure 1. Specifically, let $\alpha_S$ parametrise the causal effect of the exposure on the mediator, and $\beta_M$ parametrise the causal effect of the mediator on the outcome conditioning on the exposure. Then in the classical linear SEM (Baron & Kenny, 1986; Sobel, 1982), the causal ME is proportional to $\alpha_S\beta_M$ under suitable identification assumptions (Imai et al., 2010). More generally, this product expression $\alpha_S\beta_M$ may also appear in the causal ME under many other models, such as generalised linear models and survival analysis models (Huang & Cai, 2016; VanderWeele, 2011; VanderWeele & Vansteelandt, 2010). Therefore, the important scientific question of whether or not the ME is absent can be formulated as the hypothesis testing problem $H_0:\alpha_S\beta_M=0$ against $H_A:\alpha_S\neq 0$ and $\beta_M\neq 0$ (MacKinnon, 2008). Note that $H_0:\alpha_S\beta_M=0$ holds if and only if $\alpha_S=0$ or $\beta_M=0$, corresponding to two parameter sets $\mathcal{P}_\alpha=\{(\alpha_S,\beta_M):\alpha_S=0\}$ and $\mathcal{P}_\beta=\{(\alpha_S,\beta_M):\beta_M=0\}$, respectively. It follows that the parameter set of $H_0:\alpha_S\beta_M=0$ is the union of the two sets $\mathcal{P}_\alpha$ and $\mathcal{P}_\beta$. We visualise $\mathcal{P}_\alpha$, $\mathcal{P}_\beta$, and their union $\mathcal{P}_\alpha\cup\mathcal{P}_\beta$ in Figure 2a–c, respectively.

Figure 1.

Directed acyclic graph for mediation analysis. The exposure is S; the mediator is M; the outcome is Y; the potential confounders are X.

Figure 2.

Visualisation of parameter spaces of $(\alpha_S,\beta_M)$ under different constraints. (a) $\alpha_S=0$. (b) $\beta_M=0$. (c) $\alpha_S\beta_M=0$. (d) $\alpha_S=\beta_M=0$.

To test $H_0:\alpha_S\beta_M=0$, a broad class of methods is based on the product of coefficients (PoC) $\hat\alpha_{S,n}\hat\beta_{M,n}$, where $\hat\alpha_{S,n}$ and $\hat\beta_{M,n}$ denote the sample estimates of parameters $\alpha_S$ and $\beta_M$, respectively. One popular PoC method is Sobel’s (1982) test, which is a Wald-type test and approximates the variance of $\hat\alpha_{S,n}\hat\beta_{M,n}$ by the first-order Delta method. In addition, the joint significance (JS) test (Fritz & MacKinnon, 2007), also known as the MaxP test, is another widely used test that rejects the $H_0$ of no ME if both $\hat\alpha_{S,n}$ and $\hat\beta_{M,n}$ pass a certain cut-off of statistical significance. Liu et al. (2021) pointed out that the MaxP test is a kind of likelihood ratio test under normality assumptions.

Although there are various procedures available for testing MEs, properly controlling the type I error remains a challenge due to the intrinsic structure of the null parameter space. In particular, $H_0:\alpha_S\beta_M=0$ is composed of three different parameter cases: (i) $\alpha_S=0$ and $\beta_M\neq 0$; (ii) $\alpha_S\neq 0$ and $\beta_M=0$; and (iii) $\alpha_S=0$ and $\beta_M=0$. Case (iii), illustrated in Figure 2d, is a singleton given by the intersection set $\mathcal{P}_\alpha\cap\mathcal{P}_\beta$. Under case (iii), both parameters $\alpha_S$ and $\beta_M$ are fixed at 0, whereas cases (i) and (ii) have one fixed parameter and the other parameter to be estimated. This intrinsic difference leads to distinct asymptotic behaviours of test statistics. Since the underlying truth is typically unknown in practice, it is difficult to obtain proper p-values under the composite null hypothesis.

Particularly, in the popular Sobel’s test and the MaxP test, the asymptotic distributions of the test statistics under cases (i) and (ii) are known to be different from those under case (iii). These tests have been shown to be overly conservative in case (iii), because statistical inference is carried out according to the asymptotic distributions determined in cases (i) and (ii) (Fritz & MacKinnon, 2007; MacKinnon et al., 2002). This issue has gained a surge of attention in recent genome-wide epidemiological studies, where for the majority of omics markers, it holds that $\alpha_S=\beta_M=0$, and the classical tests are generally underpowered (Barfield et al., 2017). Some recent work (Dai et al., 2020; Du et al., 2022; Liu et al., 2021) utilised the relative proportions of the cases (i)–(iii) in the population, but these methods rely on accurate estimation of the true proportions. Huang (2019a, 2019b) adjusted the composition of $H_0:\alpha_S\beta_M=0$ through the variances of test statistics but required that the non-zero coefficients are weak and sparse, which can be violated when the sample size is large. Another line of related research (Derkach et al., 2020; Djordjilović et al., 2020, 2019; Sampson et al., 2018) used a screening step to control the family-wise error rate or the false discovery rate (FDR) for a large group of hypotheses, but did not directly provide proper p-values for each of the composite null hypotheses. Van Garderen and Van Giersbergen (2022) proposed to construct a critical region for testing that can nearly control the type I error at one prespecified significance level. Miles and Chambaz (2021) constructed a rejection region that can achieve type I error control at significance level $\omega$ with $\omega^{-1}$ a positive integer. Despite these developments, the fundamental issue of correctly characterising the distributions of test statistics to obtain well-calibrated p-values under a composite null hypothesis remains an important and challenging problem in the current literature of mediation analyses.

In this paper, we develop a new hypothesis testing methodology to address the challenge of obtaining uniformly distributed p-values under the composite null hypothesis of no ME. Particularly, we propose an adaptive bootstrap procedure that can flexibly accommodate different types of null hypotheses. In the current literature, the non-parametric bootstrap is directly applied to the PoC test statistic $\hat\alpha_{S,n}\hat\beta_{M,n}$, which has been, unfortunately, found numerically to be overly conservative when $\alpha_S=\beta_M=0$ (Barfield et al., 2017; Fritz & MacKinnon, 2007). This paper unveils analytically the reason for the failure of the conventional non-parametric bootstrap method, which stems from non-regular limiting behaviours of the PoC test statistic in the neighbourhood of $(\alpha_S,\beta_M)=(0,0)$. To overcome the non-regularity near $(\alpha_S,\beta_M)=(0,0)$, we derive an explicit representation of the asymptotic distribution of the PoC test statistic through a local model, and perform a consistent bootstrap estimation by incorporating suitable thresholds. In addition, for the JS test, we show that the conventional non-parametric bootstrap also fails to control type I error properly, which can be fixed by an adaptive bootstrap test similar to the procedure of the PoC test. For both the PoC test and the JS test, the proposed methods can circumvent the non-standard limiting behaviours of the test statistics and therefore uniformly adapt to different types of null cases of no ME.

The structure of this paper is as follows. In Section 2, we briefly review the basic problem setting and several popular testing methods in the literature. In Section 3, we introduce the adaptive bootstrap method that can be applied to the representative PoC and JS tests under classical linear SEMs. In Section 4, we conduct extensive simulation studies to compare the finite-sample performances of the proposed tests with popular counterparts. In Section 5, we develop extensions of the adaptive bootstrap, including joint testing of multivariate mediators and testing MEs under non-linear models. In Section 6, we apply our adaptive bootstrap tests to investigate the mediation pathways of metabolites on the association of environmental exposures with a health outcome. We conclude the paper and discuss interesting extensions in Section 7.

1.1. Notation

For two sequences of real numbers $\{a_n\}$ and $\{b_n\}$, we write $a_n=o(b_n)$ if $\lim_{n\to\infty}a_n/b_n=0$. We let $\xrightarrow{d}$ denote convergence in distribution. We let $\xrightarrow{d^*}$ denote bootstrap consistency relative to the Kolmogorov–Smirnov distance; see an introduction to this consistency notion in Section 23 of van der Vaart (2000). To ensure clarity, we also provide the definitions of all the convergence modes in Section A of the online supplementary material.

2. Hypothesis tests of no ME

To examine the ME of the exposure $S$ on the outcome $Y$ through the intermediate variable $M$, the causal inference literature utilises the counterfactual framework (VanderWeele, 2015). In particular, let $M(s)$ denote the potential value of the mediator under the exposure $S=s$, and let $Y(s,m)$ denote the potential outcome that would have been observed if $S$ and $M$ had been set to $s$ and $m$, respectively. Throughout the paper, we adopt the stable unit treatment value assumption (Rubin, 1980), so that $M=M(S)$ and $Y=Y(S,M(S))$. Then the ME or the natural indirect effect of $S=s$ vs. $s^*$ (Imai et al., 2010) is defined as

$$E\{Y(s,M(s))-Y(s,M(s^*))\}. \tag{1}$$

For ease of illustration, we consider the popular linear SEM (MacKinnon, 2008; VanderWeele, 2015):

$$M=\alpha_S S+X^\top\alpha_X+\epsilon_M,\qquad Y=\beta_M M+X^\top\beta_X+\tau_S S+\epsilon_Y, \tag{2}$$

where $X$ denotes a vector of confounders with the first element being 1 for the intercept, and $\epsilon_Y$ and $\epsilon_M$ are independent error terms with mean zero and finite variances $\sigma^2_{\epsilon_Y}$ and $\sigma^2_{\epsilon_M}$, respectively. We assume that there are $n$ independent and identically distributed (i.i.d.) observations, $\{(S_i,X_i,M_i,Y_i):i=1,\ldots,n\}$, sampled from Model (2). The independence of $\epsilon_Y$ and $\epsilon_M$ holds under the following no unmeasured confounding assumptions. In particular, let $A\perp\!\!\!\perp B\mid C$ denote the independence of $A$ and $B$ conditional on $C$, and we assume that for all levels of $s$, $s^*$, and $m$, (i) $Y(s,m)\perp\!\!\!\perp S\mid\{X=x\}$, no confounder for the relation of $Y$ and $S$; (ii) $Y(s,m)\perp\!\!\!\perp M\mid\{S=s,X=x\}$, no confounder for the relation of $Y$ and $M$ conditioning on $S=s$; (iii) $M(s)\perp\!\!\!\perp S\mid\{X=x\}$, no confounder for the relation of $M$ and $S$; (iv) $Y(s,m)\perp\!\!\!\perp M(s^*)\mid\{X=x\}$, no confounder for the $M$–$Y$ relation that is affected by $S$ (VanderWeele & Vansteelandt, 2009). Under these assumptions, the model can be visualised by the directed acyclic graph in Figure 1, and the ME (1) equals $\alpha_S\beta_M(s-s^*)$.
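
The product form of the ME can be checked in a few lines. The following is a short sketch, assuming that the two structural equations in Model (2) also describe the potential mediator $M(s)$ and the potential outcome $Y(s,m)$; this reading is an assumption made here for illustration.

```latex
\begin{align*}
M(s)   &= \alpha_S s + X^\top\alpha_X + \epsilon_M, \\
Y(s,m) &= \beta_M m + X^\top\beta_X + \tau_S s + \epsilon_Y, \\
Y(s,M(s)) - Y(s,M(s^*)) &= \beta_M\{M(s)-M(s^*)\} = \alpha_S\beta_M(s-s^*).
\end{align*}
```

The direct-effect term $\tau_S s$, the confounder terms, and the errors cancel in the difference, so taking expectations leaves $\alpha_S\beta_M(s-s^*)$.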

Therefore, the scientific goal of detecting the presence of an ME can be formulated as the following hypothesis testing problem:

$$H_0:\alpha_S\beta_M=0\quad\text{vs.}\quad H_A:\alpha_S\beta_M\neq 0.$$

This null hypothesis is composite and can be decomposed into three disjoint cases:

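$$H_0:\begin{cases}H_{0,1}:\ \alpha_S=0,\ \beta_M\neq 0;\\ H_{0,2}:\ \alpha_S\neq 0,\ \beta_M=0;\\ H_{0,3}:\ \alpha_S=\beta_M=0,\end{cases} \tag{3}$$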

and the alternative hypothesis is $H_A:\alpha_S\neq 0$ and $\beta_M\neq 0$.

Remark 1

Composite null problems similar to equation (3) can occur in settings other than Model (2); the latter is considered to demonstrate the essential analytic details useful for possible generalisations. Similar issues have also been observed in many other scenarios, including partially linear models (Hines et al., 2021), survival analysis (VanderWeele, 2011), and high-dimensional models (Zhou et al., 2020). The analytic details of the methodology development in this paper can pave the path for useful generalisations to other important statistical models and applications.

To test the composite null (3), various methods have been proposed, and a comprehensive survey can be found in MacKinnon et al. (2002). There are two representative classes of tests: (I) the PoC test, which corresponds to a Wald-type test and (II) the JS test, which is the likelihood ratio test under normality of the error terms (Liu et al., 2021). (I) The first class of methods examines the PoC $\hat\alpha_{S,n}\hat\beta_{M,n}$, where $\hat\alpha_{S,n}$ and $\hat\beta_{M,n}$ denote consistent estimates of $\alpha_S$ and $\beta_M$, respectively. One common practice is to apply a normal approximation to $\hat\alpha_{S,n}\hat\beta_{M,n}$ divided by its standard error, where Sobel (1982) derives the standard error formula by the first-order Delta method. In addition to the large-sample approximation, the bootstrap has also been used to evaluate the distribution of $\hat\alpha_{S,n}\hat\beta_{M,n}$ (Fritz & MacKinnon, 2007; MacKinnon et al., 2004). (II) The JS test, also known as the MaxP test, rejects $H_0:\alpha_S\beta_M=0$ if $\max\{p_{\alpha_S},p_{\beta_M}\}<\omega$, where $\omega$ is a prespecified significance level, and $p_{\alpha_S}$ and $p_{\beta_M}$ denote the p-values for $H_0:\alpha_S=0$ (the link $S\to M$) and $H_0:\beta_M=0$ (the link $M\to Y$), respectively. Despite their popularity, these methods have been found numerically to be overly conservative under $H_{0,3}$ in equation (3) (Barfield et al., 2017; MacKinnon et al., 2002). See Section 3 for a further discussion of the non-regular asymptotic behaviours of the statistics that underlie this conservatism.
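
The two classical tests described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: it assumes a model with an intercept, the exposure, and the mediator only (no further confounders), uses normal rather than t reference distributions for the p-values, and the function names `residualise`, `fit_paths`, and `sobel_and_maxp` are hypothetical.

```python
import math
import random
from statistics import NormalDist


def residualise(v, s):
    """Residuals of v after an OLS fit on an intercept and s."""
    n = len(v)
    v_bar, s_bar = sum(v) / n, sum(s) / n
    sxx = sum((si - s_bar) ** 2 for si in s)
    slope = sum((si - s_bar) * (vi - v_bar) for si, vi in zip(s, v)) / sxx
    return [vi - v_bar - slope * (si - s_bar) for si, vi in zip(s, v)]


def fit_paths(s, m, y):
    """OLS estimates and standard errors for M ~ 1 + S and Y ~ 1 + S + M."""
    n = len(s)
    s_bar = sum(s) / n
    s_c = [si - s_bar for si in s]
    m_perp = residualise(m, s)   # mediator with intercept and S partialled out
    y_perp = residualise(y, s)   # outcome with intercept and S partialled out
    sxx = sum(v * v for v in s_c)
    mxx = sum(v * v for v in m_perp)
    alpha = sum(a * b for a, b in zip(s_c, m)) / sxx
    # Frisch-Waugh-Lovell: coefficient of M in the outcome regression
    beta = sum(a * b for a, b in zip(m_perp, y_perp)) / mxx
    rss_y = sum((yp - beta * mp) ** 2 for yp, mp in zip(y_perp, m_perp))
    se_alpha = math.sqrt(mxx / (n - 2) / sxx)   # mxx is the mediator-model RSS
    se_beta = math.sqrt(rss_y / (n - 3) / mxx)
    return alpha, se_alpha, beta, se_beta


def two_sided_p(z):
    """Two-sided p-value from a standard normal reference (an approximation)."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def sobel_and_maxp(s, m, y):
    """Return (Sobel p-value, MaxP p-value) for the null of no mediation effect."""
    alpha, se_a, beta, se_b = fit_paths(s, m, y)
    # First-order Delta-method standard error of the product alpha * beta
    sobel_se = math.sqrt(alpha ** 2 * se_b ** 2 + beta ** 2 * se_a ** 2)
    p_sobel = two_sided_p(alpha * beta / sobel_se)
    p_maxp = max(two_sided_p(alpha / se_a), two_sided_p(beta / se_b))
    return p_sobel, p_maxp


if __name__ == "__main__":
    rng = random.Random(1)
    s = [float(rng.random() < 0.5) for _ in range(200)]
    m = [0.5 * si + rng.gauss(0.0, 0.5) for si in s]
    y = [0.5 * mi + si + rng.gauss(0.0, 0.5) for mi, si in zip(m, s)]
    print(sobel_and_maxp(s, m, y))
```

The JS test rejects at level $\omega$ exactly when the returned MaxP value is below $\omega$, which matches the rule $\max\{p_{\alpha_S},p_{\beta_M}\}<\omega$ stated above.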

Similar issues have also been broadly recognised for Wald tests in various statistical problems including three-way contingency table analysis and factor analysis (Drton & Xiao, 2016; Dufour et al., 2013; Glonek, 1993). However, characterising non-regular asymptotic behaviours under the singular null hypothesis $H_{0,3}$ is still insufficient to address intrinsic technical challenges in testing (3). In particular, the composite null (3) includes not only the singular case $H_{0,3}$ but also the other two non-singular cases $H_{0,1}$ and $H_{0,2}$. Since a test statistic follows different distributions under different null cases, and the underlying true null case is unknown, it is difficult to obtain uniformly distributed p-values through one simple asymptotic distribution under equation (3). To address this technical difficulty, we adopt, justify, and evaluate an adaptive bootstrap procedure. For both the Wald-type PoC test and the non-Wald JS test, we will show that the proposed procedure can naturally adapt to the three types of null hypotheses in equation (3) and yield uniformly distributed p-values across all different null cases.

3. Adaptive bootstrap tests

In this section, we propose a general Adaptive Bootstrap (AB) procedure for testing the composite null hypothesis (3). For illustration, we apply the adaptive bootstrap to the representative PoC test and show it can address the non-regularity issue. We emphasise that this general strategy can be applied in a wide range of scenarios. We also derive adaptive bootstrap procedures in other examples, including the JS test (Section B in the online supplementary material), joint testing of multivariate mediators (Section 5.1) and testing ME under the generalised linear models (Sections 5.2 and 5.3).

To conduct hypothesis testing or estimate confidence intervals for statistics whose limiting distributions deviate from the normal, a simple and powerful approach is to apply the bootstrap resampling technique. However, the classical bootstrap is not a panacea, and on some occasions it can fail to work properly, including unfortunately the non-regular scenarios considered in this paper. In particular, it has been observed through simulation studies that the classical bootstrap technique is overly conservative under $H_{0,3}:\alpha_S=\beta_M=0$ (Barfield et al., 2017; MacKinnon et al., 2002). We next unveil the key insights underlying the failure of the classical bootstrap, which motivates our use of the adaptive bootstrap.

3.1. Non-regularity of the PoC test

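When $(\alpha_S,\beta_M)\neq(0,0)$, at least one of the first-order gradients $\partial(\alpha_S\beta_M)/\partial\alpha_S=\beta_M$ and $\partial(\alpha_S\beta_M)/\partial\beta_M=\alpha_S$ is non-zero. Thus, the Delta method can be applied to support the use of Sobel’s test (based on asymptotic normality) and classical bootstrap (Barfield et al., 2017). However, under $H_{0,3}:(\alpha_S,\beta_M)=(0,0)$, the gradients $\partial(\alpha_S\beta_M)/\partial\alpha_S=\partial(\alpha_S\beta_M)/\partial\beta_M=0$, and validity of Sobel’s test and the classical bootstrap cannot be obtained as above.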

We next illustrate the non-regular limiting behaviour of the PoC $\hat\alpha_{S,n}\hat\beta_{M,n}$ under $H_{0,3}$. For ease of exposition, consider a special case of equation (2): $M=\alpha_S S+\epsilon_M$ and $Y=\beta_M M+\epsilon_Y$. Let $(\hat\alpha_{S,n},\hat\beta_{M,n})$ denote the ordinary least squares estimators of $(\alpha_S,\beta_M)$, and let $(\hat\alpha^*_{S,n},\hat\beta^*_{M,n})$ denote the corresponding non-parametric bootstrap estimators. Here and throughout this paper, we use the superscript ‘*’ to indicate estimators obtained from the non-parametric bootstrap, namely ‘bootstrap in pairs’ in the regression setting. By classical asymptotic theory (van der Vaart, 2000), under mild conditions,

$$\sqrt{n}\,(\hat\alpha_{S,n}-\alpha_S,\ \hat\beta_{M,n}-\beta_M)^\top\xrightarrow{d}(Z_{S,0},Z_{M,0})^\top, \tag{4}$$

where $(Z_{S,0},Z_{M,0})^\top$ denotes a mean-zero normal random vector with a covariance matrix given by that of the random vector $(\epsilon_M S/V_{S,0},\ \epsilon_Y M/V_{M,0})^\top$, with $V_{S,0}=E(S^2)$ and $V_{M,0}=E(M^2)$.

Moreover,

$$\sqrt{n}\,(\hat\alpha^*_{S,n}-\hat\alpha_{S,n},\ \hat\beta^*_{M,n}-\hat\beta_{M,n})^\top\xrightarrow{d}(Z'_{S,0},Z'_{M,0})^\top, \tag{5}$$

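where $(Z'_{S,0},Z'_{M,0})$ is an independent copy of $(Z_{S,0},Z_{M,0})$ in equation (4) under the same distribution. By equation (4), $n(\hat\alpha_{S,n}\hat\beta_{M,n}-\alpha_S\beta_M)\xrightarrow{d}Z_{S,0}Z_{M,0}$ under $H_{0,3}$, with the convergence rate $n$ different from the standard parametric $\sqrt{n}$ rate. By equations (4)–(5), we have $n(\hat\alpha^*_{S,n}\hat\beta^*_{M,n}-\hat\alpha_{S,n}\hat\beta_{M,n})=n\{(\hat\alpha^*_{S,n}-\hat\alpha_{S,n})\hat\beta_{M,n}+(\hat\beta^*_{M,n}-\hat\beta_{M,n})\hat\alpha_{S,n}+(\hat\alpha^*_{S,n}-\hat\alpha_{S,n})(\hat\beta^*_{M,n}-\hat\beta_{M,n})\}\xrightarrow{d}Z'_{S,0}Z_{M,0}+Z_{S,0}Z'_{M,0}+Z'_{S,0}Z'_{M,0}$. We can see that the limit of $n(\hat\alpha^*_{S,n}\hat\beta^*_{M,n}-\hat\alpha_{S,n}\hat\beta_{M,n})$ is different from that of $n(\hat\alpha_{S,n}\hat\beta_{M,n}-\alpha_S\beta_M)$, implying inconsistency of the classical non-parametric bootstrap.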
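
The practical consequence of this non-regularity can be seen in a small Monte Carlo sketch. The code below is illustrative only: it uses the special case $M=\alpha_S S+\epsilon_M$, $Y=\beta_M M+\epsilon_Y$ with standard normal exposure and errors (these distributional choices and the function names are assumptions, not taken from the paper) and records how often Sobel's test rejects at the nominal 5% level.

```python
import math
import random


def slope_and_se(x, y):
    """No-intercept OLS slope of y on x and its standard error."""
    n = len(x)
    sxx = sum(xi * xi for xi in x)
    b = sum(xi * yi for xi, yi in zip(x, y)) / sxx
    rss = sum((yi - b * xi) ** 2 for xi, yi in zip(x, y))
    return b, math.sqrt(rss / (n - 1) / sxx)


def sobel_rejection_rate(alpha_s, beta_m, n=100, reps=2000, seed=0):
    """Share of replications in which |Sobel z| exceeds 1.96 (nominal 5%)."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        s = [rng.gauss(0.0, 1.0) for _ in range(n)]
        m = [alpha_s * si + rng.gauss(0.0, 1.0) for si in s]
        y = [beta_m * mi + rng.gauss(0.0, 1.0) for mi in m]
        a, se_a = slope_and_se(s, m)
        b, se_b = slope_and_se(m, y)
        # Sobel z-statistic with the first-order Delta-method standard error
        z = a * b / math.sqrt(a * a * se_b * se_b + b * b * se_a * se_a)
        rejections += abs(z) > 1.96
    return rejections / reps


if __name__ == "__main__":
    print("H_{0,3} (alpha_S = beta_M = 0):", sobel_rejection_rate(0.0, 0.0))
    print("H_{0,1} (alpha_S = 0, beta_M = 1):", sobel_rejection_rate(0.0, 1.0))
```

Under $H_{0,1}$ the rejection rate should sit near the nominal level, whereas under $H_{0,3}$ it should fall far below it, which is the conservatism discussed in this section.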

3.2. Adaptive bootstrap of the PoC test

To address the challenge of correctly evaluating the distribution of the PoC statistic, we utilise the local asymptotic analysis framework. Intuitively, the goal is to evaluate if a small change in the target parameters leads to little change in the limit of the statistics. To this end, given targeted parameters $(\alpha_S,\beta_M)$, we define their locally perturbed counterparts as $\alpha_{S,n}=\alpha_S+n^{-1/2}b_\alpha$ and $\beta_{M,n}=\beta_M+n^{-1/2}b_\beta$, respectively, where $(b_\alpha,b_\beta)$ denote the local parameters of perturbations from our targeted coefficients $(\alpha_S,\beta_M)$. We then frame the problem under the local linear SEM as follows:

$$M=\alpha_{S,n}S+X^\top\alpha_X+\epsilon_M,\qquad Y=\beta_{M,n}M+X^\top\beta_X+\tau_S S+\epsilon_Y, \tag{6}$$

where $\epsilon_M$ and $\epsilon_Y$ are independent error terms with mean zero and finite variances. Fixing the target parameters $(\alpha_S,\beta_M)$, according to van der Vaart (2000) the formulation given in equation (6) may also be viewed as a local statistical experiment with local parameters $(b_\alpha,b_\beta)$ under which we are interested in examining the limit of test statistics. Note that with the local parameters $(b_\alpha,b_\beta)=(0,0)$, equation (6) reduces to the original model (2) with the parameters $(\alpha_S,\beta_M)$. Our inference goal remains the same: that is, to test the underlying true coefficients $(\alpha_S,\beta_M)$. Technically, we consider a $\sqrt{n}$-vicinity of local neighbouring values $(\alpha_{S,n},\beta_{M,n})$ only for the theoretical investigation of local asymptotic behaviours. Such an idea has also been used for studying other non-regularity issues (McKeague & Qian, 2015; Wang et al., 2018). To examine the limit of $\hat\alpha_{S,n}\hat\beta_{M,n}-\alpha_{S,n}\beta_{M,n}$ under equation (6), we assume the following general regularity Condition 1.

Condition 1

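(C1.1) $E(\epsilon_M\mid X,S)=0$ and $E(\epsilon_Y\mid X,S,M)=0$. (C1.2) $E(DD^\top)$ is a positive definite matrix with bounded eigenvalues, where $D=(X^\top,M,S)^\top$. (C1.3) The second moments of $(\epsilon_M,\epsilon_Y,S_\perp,M_\perp,\epsilon_M S_\perp,\epsilon_Y M_\perp)$ are finite, where $S_\perp=S-X^\top Q_{1,S}$ with $Q_{1,S}=\{E(XX^\top)\}^{-1}\times E(XS)$, and $M_\perp=M-\tilde X^\top Q_{2,M}$ with $\tilde X=(X^\top,S)^\top$ and $Q_{2,M}=\{E(\tilde X\tilde X^\top)\}^{-1}\times E(\tilde X M)$.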

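Similarly to our above discussions under the simplified model, Theorem 1 establishes the limits of $\sqrt{n}\times(\hat\alpha_{S,n}\hat\beta_{M,n}-\alpha_S\beta_M)$ and $n\times(\hat\alpha_{S,n}\hat\beta_{M,n}-\alpha_S\beta_M)$ when $(\alpha_S,\beta_M)\neq(0,0)$ and $(\alpha_S,\beta_M)=(0,0)$, respectively.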

Theorem 1 Asymptotic Property —

Assume Condition 1. Under the local model (6),

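  (i) when $(\alpha_S,\beta_M)\neq(0,0)$, $\sqrt{n}\times(\hat\alpha_{S,n}\hat\beta_{M,n}-\alpha_{S,n}\beta_{M,n})\xrightarrow{d}\alpha_S Z_M+\beta_M Z_S$;

  (ii) when $(\alpha_S,\beta_M)=(0,0)$, $n\times(\hat\alpha_{S,n}\hat\beta_{M,n}-\alpha_{S,n}\beta_{M,n})\xrightarrow{d}b_\alpha Z_M+b_\beta Z_S+Z_M Z_S$,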

where $(Z_S,Z_M)^\top$ is a mean-zero normal random vector with a covariance matrix given by that of the random vector $(\epsilon_M S_\perp/V_S,\ \epsilon_Y M_\perp/V_M)^\top$ with $V_S=E(S_\perp^2)$ and $V_M=E(M_\perp^2)$.

Theorem 1 suggests that the limit of the scaled difference $\hat\alpha_{S,n}\hat\beta_{M,n}-\alpha_{S,n}\beta_{M,n}$ is not uniform with respect to $(\alpha_S,\beta_M)$, and the non-uniformity occurs around $(\alpha_S,\beta_M)=(0,0)$. On the other hand, in the neighbourhood of $(\alpha_S,\beta_M)=(0,0)$, the limit of $n(\hat\alpha_{S,n}\hat\beta_{M,n}-\alpha_{S,n}\beta_{M,n})$ is continuous as a function of $(b_\alpha,b_\beta)\in\mathbb{R}^2$ into the space of distribution functions. Therefore, using this local limit in the bootstrap, we expect better finite-sample accuracy, compared to the classical non-parametric bootstrap that does not take into account the local asymptotic behaviour.

Moreover, to discern the null cases, we will consider a decomposition of the statistic. The idea is to isolate the possibility that $(\alpha_S,\beta_M)\neq(0,0)$ by comparing the absolute value of the standardised statistics $T_{\alpha,n}=\sqrt{n}\,\hat\alpha_{S,n}/\hat\sigma_{\alpha_S,n}$ and $T_{\beta,n}=\sqrt{n}\,\hat\beta_{M,n}/\hat\sigma_{\beta_M,n}$ to some thresholds, where $\hat\sigma_{\alpha_S,n}$ and $\hat\sigma_{\beta_M,n}$ denote the sample standard deviations of $\hat\alpha_{S,n}$ and $\hat\beta_{M,n}$, respectively. Specifically, we decompose

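$$\hat\alpha_{S,n}\hat\beta_{M,n}-\alpha_{S,n}\beta_{M,n}=(\hat\alpha_{S,n}\hat\beta_{M,n}-\alpha_{S,n}\beta_{M,n})\times(1-I_{\alpha_S,\lambda_n}I_{\beta_M,\lambda_n})+(\hat\alpha_{S,n}\hat\beta_{M,n}-\alpha_{S,n}\beta_{M,n})\times I_{\alpha_S,\lambda_n}I_{\beta_M,\lambda_n} \tag{7}$$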

with the indicators $I_{\alpha_S,\lambda_n}=I\{|T_{\alpha,n}|\le\lambda_n,\ \alpha_S=0\}$ and $I_{\beta_M,\lambda_n}=I\{|T_{\beta,n}|\le\lambda_n,\ \beta_M=0\}$, where $I\{E\}$ represents the indicator function of an event $E$, and $\lambda_n$ is a certain threshold to be specified. When $(\alpha_S,\beta_M)\neq(0,0)$, the classical bootstrap is consistent for the first term in equation (7). For the second term in equation (7), we next develop a bootstrap statistic motivated by Theorem 1(ii).

To construct the bootstrap statistic, we introduce some notation following the convention in the empirical process literature (van der Vaart, 2000). Particularly, throughout the paper, $P_n$ denotes the population probability measure of $(S,X,M,Y)$, $\mathbb{P}_n$ denotes the empirical measure with respect to the i.i.d. observations $\{(S_i,X_i,M_i,Y_i):i=1,\ldots,n\}$, and $\mathbb{P}_n^*$ denotes the non-parametric bootstrap version of $\mathbb{P}_n$. For any measurable function $f(S,X,M,Y)$, we define the empirical process $\mathbb{G}_n f=\sqrt{n}(\mathbb{P}_n-P_n)f=\sqrt{n}\,[n^{-1}\sum_{i=1}^n f(S_i,X_i,M_i,Y_i)-E\{f(S,X,M,Y)\}]$, and its non-parametric bootstrap version is $\mathbb{G}_n^*=\sqrt{n}(\mathbb{P}_n^*-\mathbb{P}_n)$. With the above notation, we define the sample versions of $Q_{1,S}$, $Q_{2,M}$, $S_\perp$, and $M_\perp$ in Condition 1 as $\hat Q_{1,S}=\{\mathbb{P}_n(XX^\top)\}^{-1}\mathbb{P}_n(XS)$, $\hat Q_{2,M}=\{\mathbb{P}_n(\tilde X\tilde X^\top)\}^{-1}\mathbb{P}_n(\tilde X M)$, $\hat S_\perp=S-X^\top\hat Q_{1,S}$, and $\hat M_\perp=M-\tilde X^\top\hat Q_{2,M}$, respectively, where we use ‘^’ to denote the sample counterparts in this paper. Similarly, we define their non-parametric bootstrap counterparts $(Q^*_{1,S},Q^*_{2,M},S^*_\perp,M^*_\perp)$ by replacing $\mathbb{P}_n$ with $\mathbb{P}_n^*$ in the above definitions.

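When $(\alpha_S,\beta_M)=(0,0)$, motivated by Theorem 1(ii), we construct a bootstrap statistic $R_n^*(b_\alpha,b_\beta)$ as a bootstrap counterpart of $b_\alpha Z_M+b_\beta Z_S+Z_M Z_S$. In particular, $R_n^*(b_\alpha,b_\beta)=b_\alpha Z^*_{M,n}+b_\beta Z^*_{S,n}+Z^*_{S,n}Z^*_{M,n}$, where $Z^*_{S,n}=\mathbb{G}_n^*(\hat\epsilon_{M,n}S^*_\perp)/V^*_{S,n}$, $Z^*_{M,n}=\mathbb{G}_n^*(\hat\epsilon_{Y,n}M^*_\perp)/V^*_{M,n}$, $V^*_{S,n}=\mathbb{P}_n^*\{(S^*_\perp)^2\}$, $V^*_{M,n}=\mathbb{P}_n^*\{(M^*_\perp)^2\}$, and $\hat\epsilon_{M,n}$ and $\hat\epsilon_{Y,n}$ denote the sample residuals obtained from the ordinary least squares regressions of the two models in equation (6). When $(\alpha_S,\beta_M)\neq(0,0)$, we still consider the classical non-parametric bootstrap estimator $\hat\alpha^*_{S,n}\hat\beta^*_{M,n}$. To develop an adaptive bootstrap test, we utilise the decomposition (7) and propose to replace the indicators $I_{\alpha_S,\lambda_n}$ and $I_{\beta_M,\lambda_n}$ in equation (7) by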

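$$I^*_{\alpha_S,\lambda_n}=I\{|T^*_{\alpha,n}|\le\lambda_n,\ |T_{\alpha,n}|\le\lambda_n\}\quad\text{and}\quad I^*_{\beta_M,\lambda_n}=I\{|T^*_{\beta,n}|\le\lambda_n,\ |T_{\beta,n}|\le\lambda_n\}, \tag{8}$$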

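where $T^*_{\alpha,n}=\sqrt{n}\,\hat\alpha^*_{S,n}/\hat\sigma^*_{\alpha_S,n}$ and $T^*_{\beta,n}=\sqrt{n}\,\hat\beta^*_{M,n}/\hat\sigma^*_{\beta_M,n}$ denote the classical non-parametric bootstrap versions of $T_{\alpha,n}$ and $T_{\beta,n}$, respectively. Following the decomposition in equation (7), we define a statistic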

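$$U^*=(\hat\alpha^*_{S,n}\hat\beta^*_{M,n}-\hat\alpha_{S,n}\hat\beta_{M,n})\times(1-I^*_{\alpha_S,\lambda_n}I^*_{\beta_M,\lambda_n})+n^{-1}R_n^*(b_\alpha,b_\beta)\times I^*_{\alpha_S,\lambda_n}I^*_{\beta_M,\lambda_n},$$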

termed the Adaptive Bootstrap (AB) test statistic in this paper. Theorem 2 below establishes the bootstrap consistency of $U^*$.

Theorem 2 Adaptive Bootstrap Consistency —

Assume the conditions of Theorem 1 are satisfied. When $\lambda_n=o(\sqrt{n})$ and $\lambda_n\to\infty$ as $n\to\infty$,

$$c_nU^*\xrightarrow{d^*}c_n(\hat\alpha_{S,n}\hat\beta_{M,n}-\alpha_S\beta_M),$$

where cn is a non-random scaling factor satisfying

$$c_n=\begin{cases}\sqrt{n}, & \text{when }(\alpha_S,\beta_M)\neq(0,0);\\ n, & \text{when }(\alpha_S,\beta_M)=(0,0).\end{cases} \tag{9}$$

Theorem 2 suggests that under the original model (2), i.e. $(b_\alpha,b_\beta)=(0,0)$, the AB statistic $U^*$ is a consistent bootstrap estimator for $\hat\alpha_{S,n}\hat\beta_{M,n}-\alpha_S\beta_M$ with a proper scaling. Moreover, for any fixed targeted parameters $(\alpha_S,\beta_M)$, in their local neighbourhoods, i.e. $(b_\alpha,b_\beta)\neq(0,0)$, the bootstrap consistency still holds as a smooth function of $(b_\alpha,b_\beta)$. Intuitively, this suggests that a small change in the target parameters does not affect the consistency property, and $U^*$ is ‘regular’ under the local model. In practice, without knowing which case is the true null, we rely on $U^*$ as the bootstrap statistic for $\hat\alpha_{S,n}\hat\beta_{M,n}-\alpha_{S,n}\beta_{M,n}$ generally. This strategy is viable because with a given finite sample size $n$, using $\sqrt{n}\,U^*$ for bootstrapping $\sqrt{n}(\hat\alpha_{S,n}\hat\beta_{M,n}-\alpha_{S,n}\beta_{M,n})$ is equivalent to using $nU^*$ for bootstrapping $n(\hat\alpha_{S,n}\hat\beta_{M,n}-\alpha_{S,n}\beta_{M,n})$. Therefore, as desired, $U^*$ will approximate well the distribution of $\hat\alpha_{S,n}\hat\beta_{M,n}-\alpha_{S,n}\beta_{M,n}$ regardless of the underlying null case.

Remark 2

As a comparison, we also discuss the naive non-parametric bootstrap when $(\alpha_S,\beta_M)=(0,0)$. Specifically, we obtain the following expression (in Remark C.4 of the online supplementary material),

$$n(\hat\alpha^*_{S,n}\hat\beta^*_{M,n}-\hat\alpha_{S,n}\hat\beta_{M,n})=R_n^*(b_\alpha,b_\beta)+Z_{S,n}Z^*_{M,n}+Z_{M,n}Z^*_{S,n}, \tag{10}$$

where $Z_{S,n}=\mathbb{G}_n(\epsilon_M\hat S_\perp)/V_{S,n}$, $Z_{M,n}=\mathbb{G}_n(\epsilon_Y\hat M_\perp)/V_{M,n}$, $V_{S,n}=\mathbb{P}_n(\hat S_\perp^2)$, and $V_{M,n}=\mathbb{P}_n(\hat M_\perp^2)$. In addition to the term $R_n^*(b_\alpha,b_\beta)$, equation (10) has two extra random terms $Z_{S,n}Z^*_{M,n}+Z_{M,n}Z^*_{S,n}$, which suggests that using equation (10) in the bootstrap would not be consistent. The issue of the classical bootstrap being inconsistent at $(\alpha_S,\beta_M)=(0,0)$ is circumvented by the proposed local bootstrap statistic $R_n^*(b_\alpha,b_\beta)$.

3.2.1. Adaptive bootstrap test procedure

We introduce a consistent bootstrap test procedure for $\hat\alpha_{S,n}\hat\beta_{M,n}$ based on Theorem 2. Given a nominal level $\omega$, let $q_{\omega/2}$ and $q_{1-\omega/2}$ denote the lower and upper $\omega/2$ quantiles, respectively, of the bootstrap estimates $U^*$. If $\hat\alpha_{S,n}\hat\beta_{M,n}$ falls outside the interval $(q_{\omega/2},q_{1-\omega/2})$, we reject the composite null (3), and conclude that the ME is statistically significant at the level $\omega$. We clarify that the goal is to test the underlying true coefficients $(\alpha_S,\beta_M)$. The reason to consider their $\sqrt{n}$-local coefficients $(\alpha_{S,n},\beta_{M,n})$ is merely for theoretical investigation of local asymptotic behaviours. Therefore, to test (3) under the original model (2), it suffices to calculate $U^*$ with $b_\alpha=b_\beta=0$. We point out that the rejection region in the adaptive procedure may also be constructed through the asymptotic distribution as an alternative to the bootstrap; nevertheless, the proposed bootstrap procedure is more flexible and does not rely on a particular form of the limiting distributions, and therefore, it can be easily extended under various mediation models; see more examples in Section 5.
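
The procedure can be sketched as follows for the special case $M=\alpha_S S+\epsilon_M$, $Y=\beta_M M+\epsilon_Y$ with $b_\alpha=b_\beta=0$. This is a simplified illustration and not the authors' software: it omits the intercept, confounders, and direct effect (so that $S_\perp=S$ and $M_\perp=M$), takes $T_{\alpha,n}$ and $T_{\beta,n}$ to be the usual OLS t-ratios, uses the threshold form $\lambda\sqrt{n}/\log n$ with $\lambda=2$, and the names `ols` and `ab_poc_test` are hypothetical.

```python
import math
import random


def ols(x, y):
    """No-intercept OLS of y on x: slope, standard error, residuals."""
    n = len(x)
    sxx = sum(xi * xi for xi in x)
    b = sum(xi * yi for xi, yi in zip(x, y)) / sxx
    resid = [yi - b * xi for xi, yi in zip(x, y)]
    se = math.sqrt(sum(r * r for r in resid) / (n - 1) / sxx)
    return b, se, resid


def ab_poc_test(s, m, y, level=0.05, lam=2.0, n_boot=500, seed=0):
    """Adaptive bootstrap PoC test with b_alpha = b_beta = 0 (sketch)."""
    rng = random.Random(seed)
    n = len(s)
    lam_n = lam * math.sqrt(n) / math.log(n)        # threshold lambda_n
    a_hat, se_a, res_m = ols(s, m)
    b_hat, se_b, res_y = ols(m, y)
    sample_small = abs(a_hat / se_a) <= lam_n and abs(b_hat / se_b) <= lam_n
    u_star = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap in pairs
        sb = [s[i] for i in idx]
        mb = [m[i] for i in idx]
        yb = [y[i] for i in idx]
        a_b, se_ab, _ = ols(sb, mb)
        b_b, se_bb, _ = ols(mb, yb)
        boot_small = abs(a_b / se_ab) <= lam_n and abs(b_b / se_bb) <= lam_n
        if sample_small and boot_small:
            # n^{-1} R_n^*(0, 0): each factor equals Z* / sqrt(n), built from the
            # original-sample residuals (their sample cross-moment is zero by OLS).
            z_s = sum(res_m[i] * s[i] for i in idx) / sum(v * v for v in sb)
            z_m = sum(res_y[i] * m[i] for i in idx) / sum(v * v for v in mb)
            u_star.append(z_s * z_m)
        else:
            # Classical non-parametric bootstrap term
            u_star.append(a_b * b_b - a_hat * b_hat)
    u_star.sort()
    lower = u_star[int(math.floor(level / 2 * n_boot))]
    upper = u_star[int(math.ceil((1 - level / 2) * n_boot)) - 1]
    estimate = a_hat * b_hat
    return {"estimate": estimate, "lower": lower, "upper": upper,
            "reject": estimate < lower or estimate > upper}


if __name__ == "__main__":
    rng = random.Random(11)
    s = [rng.gauss(0.0, 1.0) for _ in range(200)]
    m = [0.3 * si + rng.gauss(0.0, 1.0) for si in s]
    y = [0.3 * mi + rng.gauss(0.0, 1.0) for mi in m]
    print(ab_poc_test(s, m, y))
```

The two indicator conditions mirror equation (8): the local statistic is used only when both the sample and the bootstrap t-ratios stay within the threshold, and the classical bootstrap difference is used otherwise.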

3.2.2. Choice of the tuning parameters

Under the conditions of Theorem 2, which specify $\lambda_n=o(\sqrt{n})$ and $\lambda_n\to\infty$ as $n\to\infty$, we have $\lim_{n\to\infty}\Pr(|T_{\alpha,n}|>\lambda_n,\ |T_{\beta,n}|>\lambda_n\mid\alpha_S=\beta_M=0)=0$, suggesting that $I_{\alpha_S,\lambda_n}I_{\beta_M,\lambda_n}$ can provide a consistent test for $\alpha_S=\beta_M=0$. If $\lambda_n$ remains bounded as $n\to\infty$, $U^*$ asymptotically reduces to $\hat\alpha^*_{S,n}\hat\beta^*_{M,n}-\hat\alpha_{S,n}\hat\beta_{M,n}$, i.e. the classical non-parametric bootstrap procedure, which may be conservative. In the simulation experiments, we set $\lambda_n=\lambda\sqrt{n}/\log n$ and find that a fixed constant $\lambda$, e.g. $\lambda=2$, can give a good performance. In general settings, we can choose the tuning parameter through the double bootstrap (Chen, 2016); see Section F.1 of the online supplementary material for more implementation details.
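
As a quick numerical check of the two rate conditions, the sketch below evaluates $\lambda_n=\lambda\sqrt{n}/\log n$ with $\lambda=2$ at a few sample sizes. The helper name `threshold` and the sample sizes beyond 200 and 500 are illustrative choices, not taken from the paper.

```python
import math


def threshold(n, lam=2.0):
    """lambda_n = lam * sqrt(n) / log(n), the form used in the simulations."""
    return lam * math.sqrt(n) / math.log(n)


# lambda_n should grow without bound while lambda_n / sqrt(n) shrinks to zero.
for n in (200, 500, 5_000, 500_000):
    lam_n = threshold(n)
    print(f"n={n:>7}  lambda_n={lam_n:8.3f}  lambda_n/sqrt(n)={lam_n / math.sqrt(n):.3f}")
```

The ratio $\lambda_n/\sqrt{n}=\lambda/\log n$ decays only logarithmically, so at moderate sample sizes the threshold is still a sizeable multiple of a typical t-ratio under the null.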

Remark 3

Our proposed adaptive procedures examine the non-regular asymptotic behaviours of test statistics through local models. In effect, the idea of local models may be traced back to econometrics (Andrews, 2001) and was utilised in other statistical problems, such as classification and post-selection inference (Laber & Murphy, 2011; McKeague & Qian, 2015, 2019; Wang et al., 2018). Nevertheless, we emphasise that there are unique statistical challenges of testing mediation effects. First, in terms of the parameter space, the null hypothesis of no ME is essentially a union of individual hypotheses. This results in a non-standard shape of the null parameter space, on which both regular and non-regular asymptotic behaviours can occur, as illustrated in Figure 2c. Second, in terms of the behaviour of the estimator, we unveil a fundamental zero-gradient phenomenon. This is caused by the special form of the product statistic and cannot be directly addressed by the existing adaptive procedures mentioned above. Third, in terms of the models, the mediation analysis involves a system of structural equations. Ignoring the model structure in the implementation could lead to slow computation; see Section F.2 of the online supplementary material for more details on computation. Due to these unique challenges, new developments in methodology, theory, and computation are necessary.

3.3. Adaptive bootstrap for the JS test

In addition to the Wald-type PoC test, we also address the non-regularity issue of the non-Wald JS/MaxP test through our proposed adaptive bootstrap. It is noteworthy that non-regular behaviours of the JS and PoC tests under the singleton $H_{0,3}$ are distinct, as the two statistics take different forms. Particularly, the PoC statistic has the zero-gradient issue discussed above, whereas the JS statistic has a certain inconsistent convergence issue. Despite that difference, we can similarly develop an adaptive bootstrap for the JS test and obtain uniformly distributed p-values; see Section B of the online supplementary material for details. This suggests that our proposed adaptive bootstrap is not restricted to the Wald-type test and may be further generalised to other tests with similar circumstances.

3.4. On multivariate mediators

It is worth noting that the proposed strategy can be generalised to deal with multiple mediators under suitable identifiability conditions. In the following, we delve into three scenarios of practical importance.

  1. We consider the group-level joint ME via a set of mediators M=(M1,,MJ) shown by the red path in Figure 3a. This type of joint ME has been considered in the literature by Huang and Pan (2016) and Hao and Song (2023), among others. We generalise the AB method to test the joint ME in Section 5.1.

  2. We consider multiple mediators that are causally uncorrelated (Jérolon et al., 2020) or governed by the parallel path model (Hayes, 2017). In this case, the indirect effect of one single mediator can be identified under the known identifiability assumptions outlined in Imai and Yamamoto (2013). In particular, under the multivariate linear SEM (13) with no causal interplay between mediators, the null hypothesis of no individual indirect effect via one mediator, say, M1, could be formed as H0:αS,1βM,1=0, illustrated in Figure 3b. To apply the AB test to αS,1βM,1, we note that equation (13) can be equivalently rewritten as M1=αS,1S+XαX,1+ϵM,1, and Y=βM,1M1+M(1)β(1)+XβX+τSS+ϵY, where β(1)=(βM,2,,βM,J) and M(1)=(M2,,MJ). This form resembles equation (2), and the AB method in Section 3 can be employed to test αS,1βM,1=0 by adjusting (M(1),X) in the outcome model. We provide details including the identification assumptions in Section D.1.1 of the online supplementary material.

  3. When the mediators are causally correlated, evaluating individual indirect effects along different posited paths requires correct specification of the mediators’ causal structure (VanderWeele et al., 2014). To relax such stringent assumptions, researchers have proposed alternative methods, one of which is to examine the interventional indirect effects specific to each distinct mediator (Loh et al., 2021). Intuitively, the interventional indirect effect via a target mediator M1 is supposed to capture all of the exposure-outcome effects that are mediated by M1 as well as any other mediators causally preceding M1; see the diagram in Figure 3c. Under a typical class of linear and additive mean models, estimators of interventional indirect effects take the same product-of-coefficients form as in the second scenario above. Thus, the proposed AB method in Section 3 can be similarly applied with little effort. We provide relevant details including the definition and identification assumptions of the interventional indirect effects in Section D.1.2 of the online supplementary material.

Figure 3.

Path diagram of the mediation model with multiple mediators: dashed lines represent possible non-causal correlations or independence, and solid arrowed lines represent possible causal relationships. (a) Joint ME via a group of mediators $(M_1,\ldots,M_J)$. (b) Individual ME via one mediator $M_1$. (c) Interventional ME via one mediator $M_1$. In Panel (c), $M_{1,\mathrm{pre}}$ represents mediators that are causally preceding $M_1$.

4. Numerical experiments

In this section, we conduct simulation experiments to evaluate the finite-sample performance of the proposed adaptive bootstrap PoC and JS tests. Particularly, we generate data through the following model:

$$M=\alpha_S S+\alpha_I+\alpha_{X,1}X_1+\alpha_{X,2}X_2+\epsilon_M,\qquad Y=\beta_M M+\beta_I+\beta_{X,1}X_1+\beta_{X,2}X_2+\tau_S S+\epsilon_Y. \tag{11}$$

In the model (11), the exposure variable $S$ is simulated from the Bernoulli distribution with success probability 0.5; the covariate $X_1$ is continuous and simulated from a standard normal distribution $N(0,1)$; the covariate $X_2$ is discrete and simulated from the Bernoulli distribution with success probability 0.5; the two error terms $\epsilon_M$ and $\epsilon_Y$ are simulated independently from $N(0,\sigma_{\epsilon_M}^2)$ and $N(0,\sigma_{\epsilon_Y}^2)$, respectively. We set the parameters $(\alpha_I,\alpha_{X,1},\alpha_{X,2})=(1,1,1)$, $(\beta_I,\beta_{X,1},\beta_{X,2})=(1,1,1)$, $\tau_S=1$, and $\sigma_{\epsilon_Y}=\sigma_{\epsilon_M}=0.5$. Moreover, we consider sample sizes $n\in\{200,500\}$, and set the bootstrap sample size at 500.
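The data-generating process just described can be sketched in a few lines. This is only an illustration under the stated parameter values; the function name `simulate_model_11` and its arguments are hypothetical and are not taken from the authors' code.

```python
import random


def simulate_model_11(n, alpha_s, beta_m, seed=None):
    """Draw n observations (S, X1, X2, M, Y) from model (11).

    alpha_s and beta_m are the two coefficients of interest; all other
    parameters are fixed at the values stated in the text.
    """
    rng = random.Random(seed)
    alpha_i = alpha_x1 = alpha_x2 = 1.0   # (alpha_I, alpha_X1, alpha_X2) = (1, 1, 1)
    beta_i = beta_x1 = beta_x2 = 1.0      # (beta_I, beta_X1, beta_X2) = (1, 1, 1)
    tau_s = 1.0                           # direct effect of S on Y
    sd_m = sd_y = 0.5                     # error standard deviations
    rows = []
    for _ in range(n):
        s = 1 if rng.random() < 0.5 else 0    # Bernoulli(0.5) exposure
        x1 = rng.gauss(0.0, 1.0)              # continuous covariate
        x2 = 1 if rng.random() < 0.5 else 0   # binary covariate
        m = alpha_s * s + alpha_i + alpha_x1 * x1 + alpha_x2 * x2 + rng.gauss(0.0, sd_m)
        y = beta_m * m + beta_i + beta_x1 * x1 + beta_x2 * x2 + tau_s * s + rng.gauss(0.0, sd_y)
        rows.append((s, x1, x2, m, y))
    return rows
```

For instance, `simulate_model_11(200, 0.0, 0.0, seed=1)` would produce one data set of size 200 with both coefficients set to zero.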

In simulation studies, we compare eight testing methods: the adaptive bootstrap for the PoC test (PoC-AB), the classical non-parametric bootstrap for the PoC test (PoC-B), Sobel’s test (PoC-Sobel), the adaptive bootstrap for the JS test (JS-AB), the classical non-parametric bootstrap for the JS test (JS-B), the MaxP test (JS-MaxP), the non-parametric bootstrap method in the causal mediation analysis R package of Tingley et al. (2014) (CMA), and the method in Huang (2019a) (MT-Comp). It is noteworthy that Huang’s (2019a) MT-Comp makes specific model assumptions that are not fully compatible with our simulation settings, and we include this method only for the purpose of comparison. Some other methods (e.g. Dai et al., 2020; Liu et al., 2021) rely on estimating the relative proportions of the three null cases, which is not directly applicable here, and are thus not included.
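For orientation, the two classical non-bootstrap competitors can be written down from coefficient estimates and their standard errors alone. The sketch below uses the textbook first-order Sobel standard error and a standard normal reference distribution for all p-values; both choices, and the function names, are assumptions made for illustration rather than the exact implementation used in the simulations.

```python
import math


def two_sided_normal_p(z):
    """Two-sided p-value of a z-statistic under N(0, 1)."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def sobel_p(alpha_hat, se_alpha, beta_hat, se_beta):
    """Sobel's test: Wald test of alpha*beta with the first-order delta-method SE."""
    se_prod = math.sqrt(alpha_hat ** 2 * se_beta ** 2 + beta_hat ** 2 * se_alpha ** 2)
    if se_prod == 0.0:
        return 1.0  # both estimates exactly zero: no evidence against the null
    return two_sided_normal_p(alpha_hat * beta_hat / se_prod)


def maxp_p(alpha_hat, se_alpha, beta_hat, se_beta):
    """MaxP / joint significance test: the larger of the two marginal p-values."""
    return max(two_sided_normal_p(alpha_hat / se_alpha),
               two_sided_normal_p(beta_hat / se_beta))
```

With these definitions the squared Sobel statistic equals $1/(z_\alpha^{-2}+z_\beta^{-2})$, which never exceeds $\min(z_\alpha^2,z_\beta^2)$, so in this sketch the Sobel p-value is never smaller than the MaxP p-value.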

4.1. Null hypotheses: type I error rates

4.1.1. Setting 1: Under a fixed type of null

In the first setting, we simulate data under a fixed null hypothesis over 2,000 Monte Carlo replications to estimate the distribution of p-values. Particularly, we consider three types of null hypotheses below:

$$H_{0,1}:(\alpha_S,\beta_M)=(0,0.5),\qquad H_{0,2}:(\alpha_S,\beta_M)=(0.5,0),\qquad H_{0,3}:(\alpha_S,\beta_M)=(0,0). \tag{12}$$

We draw the Q–Q plots with n=200 in Figure 4. Q–Q plots under n=500 are similar and presented in Figure G.1 of the online supplementary material. In Figure 4, three sub-figures in the first row present the results of the PoC tests under three fixed nulls H0,1,H0,2, and H0,3, respectively, and three sub-figures in the second row present the corresponding results of the JS tests, respectively.

Figure 4.

Q–Q plots of p-values under the fixed null with n=200.

Figure 4 shows that for the PoC type of tests, under H0,1 or H0,2, the PoC-AB, the PoC-B, and the PoC-Sobel can correctly approximate the distribution of the PoC test statistic. However, under H0,3, the PoC-B and the PoC-Sobel become conservative, while the proposed PoC-AB still approximates the distribution of the PoC statistic well. Similarly, for the JS type of tests, under H0,1 or H0,2, the JS-AB, the JS-B, and the JS-MaxP all work well. In contrast, under H0,3, the JS-B has inflated type I errors, and the JS-MaxP becomes conservative, while the JS-AB still performs well. In addition, Figure 4 and Figure G.1 in the online supplementary material also display the results of both Huang’s (2019a) MT-Comp and the causal mediation analysis R package CMA (Tingley et al., 2014) for comparison. We observe that the MT-Comp properly controls the type I error under H0,3, but fails to do so under H0,1 and H0,2, where it has inflated type I errors. This may be because the models considered in Huang (2019a) are not compatible with our simulation settings. On the other hand, the causal mediation R package (Tingley et al., 2014) produces uniformly distributed p-values under H0,1 and H0,2, but is conservative under H0,3. This means that the R package CMA test is underpowered.

4.1.2. Setting 2: Under a random type of null

In the second setting, we simulate data over 2,000 Monte Carlo replications, where in each replication, the null hypothesis is not fixed but randomly selected from H0,1–H0,3 in (12). Specifically, for (H0,1,H0,2,H0,3), we consider three sets of selection probabilities: (I) (1/3,1/3,1/3), (II) (0.2,0.2,0.6), and (III) (0.05,0.05,0.9). We provide Q–Q plots of p-values with n=200 in Figure 5, and Q–Q plots under n=500 are similar and provided in Figure G.2 of the online supplementary material. In Figure 5, three sub-figures in the first row present the results of the PoC tests with three null selection probabilities (I)–(III), respectively, and three sub-figures in the second row present the corresponding results of the JS tests, respectively.
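The random choice of the null in each replication can be sketched as follows. The coefficient pairs are those in (12) and the weights are the three selection probabilities listed above; the dictionary and function names are hypothetical.

```python
import random

# (alpha_S, beta_M) under each fixed null in (12).
NULLS = {"H0,1": (0.0, 0.5), "H0,2": (0.5, 0.0), "H0,3": (0.0, 0.0)}

# Selection probabilities for (H0,1, H0,2, H0,3) in mixtures (I)-(III).
MIXTURES = {
    "I": (1 / 3, 1 / 3, 1 / 3),
    "II": (0.2, 0.2, 0.6),
    "III": (0.05, 0.05, 0.9),
}


def draw_null(mixture, rng):
    """Pick one null type at random and return its name and coefficients."""
    names = list(NULLS)
    name = rng.choices(names, weights=MIXTURES[mixture], k=1)[0]
    return name, NULLS[name]
```

Each replication would call `draw_null` once, generate a data set with the returned coefficients, and record the resulting p-value, so that the pooled p-values reflect the chosen mixture of null types.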

Figure 5.

Q–Q plots of p-values under the mixture of nulls: n=200.

Figure 5 shows that the adaptive bootstrap procedures for the PoC and JS tests perform well under different settings. The PoC-B test, PoC-Sobel’s test, the JS-MaxP test, and the R package CMA (Tingley et al., 2014) are conservative, and they become more conservative as the probability of choosing H0,3 increases. We mention that in many biological studies such as genomics, H0,3 predominates the null cases, hence these tests that are conservative under H0,3 may not be preferred. Moreover, the JS-B test and the MT-Comp method can have inflated type I errors. The performance of JS-B becomes worse as the proportion of H0,3 rises, while the MT-Comp method deteriorates as the proportions of H0,1 and H0,2 increase.

4.2. Alternative hypotheses: statistical power

In this subsection, we evaluate the statistical power of the proposed AB tests under alternative hypotheses. Particularly, we simulate data under two settings: (I) fix αS=βM for the convenience of pictorial presentation, which takes various values beginning from zero and (II) fix the size of the ME αSβM, and vary the ratio αS/βM. In the setting (I), we consider n ∈ {200, 500}, and then plot the empirical rejection rates, based on 500 Monte Carlo replications, vs. the signal size of αS, which is equal to βM in the setting (I). In the setting (II), we fix αSβM=0.04 when n=200, and αSβM=0.015 when n=500. Then we plot the empirical rejection rates vs. the ratio αS/βM. The results in the two settings (I) and (II) are shown in Figures 6 and 7, respectively.
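In setting (II) the pair of coefficients is determined by the fixed product and the chosen ratio. A minimal sketch, assuming both coefficients are taken to be positive (the text does not state the signs) and using a hypothetical function name:

```python
import math


def coefficients_from_ratio(product, ratio):
    """Return (alpha_S, beta_M) with alpha_S * beta_M = product and alpha_S / beta_M = ratio.

    Assumes product > 0 and ratio > 0, and returns the positive solution.
    """
    alpha_s = math.sqrt(product * ratio)
    beta_m = math.sqrt(product / ratio)
    return alpha_s, beta_m
```

For example, a product of 0.04 with ratio 4 gives $\alpha_S=0.4$ and $\beta_M=0.1$, while ratio 1 gives $\alpha_S=\beta_M=0.2$.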

Figure 6.

Empirical rejection rates (power) vs. the signal strength of αS=βM.

Figure 7.

Empirical rejection rates (power) vs. the ratio αS/βM.

Figures 6 and 7 show that for the three PoC tests, the PoC-AB has higher power than the classical non-parametric bootstrap, and both are more powerful than Sobel’s test. Similarly, for the JS tests, the JS-AB has higher power than the classical bootstrap, and both have higher power than the MaxP test. In addition, the JS-B test has slightly inflated type I errors when (αS,βM)=(0,0), which is consistent with the results in Figure 4. Among the three classical methods (Sobel’s test, the MaxP test, and the PoC-B), the MaxP test seems to achieve the best balance between the type I error and the statistical power, while Sobel’s test has the lowest power. These findings are consistent with those reported in the current literature (Barfield et al., 2017; MacKinnon et al., 2002). Huang’s (2019a) MT-Comp test has shown seriously inflated type I errors in Figure 4, and therefore is not a fair competitor in our considered settings despite its high power. Overall, the proposed PoC-AB and JS-AB tests are superior to these existing methods in the settings considered, with the most robust control of type I error and the highest power.

5. Extensions

The adaptive bootstrap in Section 3 offers a general strategy that can be extended in a wide range of scenarios beyond the model (2). We next examine three examples, including testing the joint ME of multivariate mediators in Section 5.1, testing the ME in terms of odds ratio for a binary outcome in Section 5.2, and testing the ME in terms of risk difference when the outcome is continuous, and the mediator follows a generalised linear model in Section 5.3. In each scenario, we present details in the order of (1) Model, (2) Non-regularity issue, (3) Asymptotic theory and adaptive bootstrap, and (4) Numerical results.

5.1. Testing joint ME of multivariate mediators

When the number of mediators is large, it can also be of interest to conduct group-based mediation analyses for a set of mediators (Daniel et al., 2015; Hao & Song, 2023; Huang & Pan, 2016; Sohn & Li, 2019; VanderWeele & Vansteelandt, 2014); see also the review in Blum et al. (2020). In this section, we show that the proposed AB method can be generalised to test joint MEs.

(1) Model. As an extension of equation (2), we consider the multivariate linear SEM (Hao & Song, 2023; Huang & Pan, 2016; VanderWeele & Vansteelandt, 2014),

$$M_j=\alpha_{S,j}S+X^\top\alpha_{X,j}+\epsilon_{M,j},\qquad Y=\sum_{j=1}^{J}\beta_{M,j}M_j+X^\top\beta_X+\tau_S S+\epsilon_Y, \tag{13}$$

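where $X$ denotes a vector of confounders with the first element being 1 for the intercept, $\epsilon_Y$ and $\epsilon_M:=(\epsilon_{M,1},\ldots,\epsilon_{M,J})^\top$ are independent error terms with mean zero, $\mathrm{var}(\epsilon_Y)=\sigma_{\epsilon_Y}^2$, and $\mathrm{cov}(\epsilon_M)=\Sigma_M$. Assume identification conditions similar to those in Section 2 (see Condition D.2 in the online supplementary material). The joint ME through the group of mediators $M$ is $E\{Y(s,M(s))-Y(s,M(s^*))\}=(s-s^*)\alpha_S^\top\beta_M$ (Huang & Pan, 2016), where $\alpha_S=(\alpha_{S,1},\ldots,\alpha_{S,J})^\top$ and $\beta_M=(\beta_{M,1},\ldots,\beta_{M,J})^\top$.

(2) Non-regularity issue. We are interested in $H_0$: joint ME $=0$, which is equivalent to $H_0:\alpha_S^\top\beta_M=0$. Similarly to Section 3, when $(\alpha_S,\beta_M)\neq 0$, i.e. there exists at least one coefficient $\alpha_{S,j}\neq 0$ or $\beta_{M,j}\neq 0$, we have $\partial(\alpha_S^\top\beta_M)/\partial\alpha_{S,j}=\beta_{M,j}\neq 0$ or $\partial(\alpha_S^\top\beta_M)/\partial\beta_{M,j}=\alpha_{S,j}\neq 0$. However, when $(\alpha_S,\beta_M)=0$, i.e. $\alpha_{S,j}=\beta_{M,j}=0$ for all $j\in\{1,\ldots,J\}$, $\partial(\alpha_S^\top\beta_M)/\partial\alpha_{S,j}=\partial(\alpha_S^\top\beta_M)/\partial\beta_{M,j}=0$ for all $j\in\{1,\ldots,J\}$. We expect that a non-regularity issue similar to that in Section 3 would occur when $(\alpha_S,\beta_M)=0$. This issue is also illustrated by numerical experiments in Section D.2.4 of the online supplementary material.

(3) Asymptotic theory and adaptive bootstrap. To better understand the non-regularity issue, we similarly consider a local linear SEM $M_j=\alpha_{S,j,n}S+X^\top\alpha_{X,j}+\epsilon_{M,j}$ and $Y=\sum_{j=1}^{J}\beta_{M,j,n}M_j+X^\top\beta_X+\tau_S S+\epsilon_Y$, where $\alpha_{S,j,n}=\alpha_{S,j}+n^{-1/2}b_{\alpha,j}$ and $\beta_{M,j,n}=\beta_{M,j}+n^{-1/2}b_{\beta,j}$.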

Theorem 3 Asymptotic Property —

Under online supplementary Conditions D.2 and D.3 (the latter is a regularity condition on the design matrix similar to Condition 1), and the local model,

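  1. when $(\alpha_S,\beta_M)\neq 0$, $\sqrt{n}\times(\hat\alpha_{S,n}^\top\hat\beta_{M,n}-\alpha_{S,n}^\top\beta_{M,n})\xrightarrow{d}\alpha_S^\top Z_M+\beta_M^\top Z_S$;

  2. when $(\alpha_S,\beta_M)=0$, $n\times(\hat\alpha_{S,n}^\top\hat\beta_{M,n}-\alpha_{S,n}^\top\beta_{M,n})\xrightarrow{d}b_{0,\alpha}^\top Z_M+b_{0,\beta}^\top Z_S+Z_S^\top Z_M$,

where $(Z_S,Z_M)$ here denote random vectors that are the multivariate counterparts of $(Z_S,Z_M)$ in Theorem 1, and the detailed definitions are given in Section D.2.2 of the online supplementary material.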
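A heuristic way to see where the two different rates in Theorem 3 come from is the exact decomposition below. It is only a sketch: it assumes that $Z_S$ and $Z_M$ arise as the joint limits of $\sqrt{n}(\hat\alpha_{S,n}-\alpha_{S,n})$ and $\sqrt{n}(\hat\beta_{M,n}-\beta_{M,n})$, and it writes $b_\alpha=(b_{\alpha,1},\ldots,b_{\alpha,J})^\top$ and $b_\beta=(b_{\beta,1},\ldots,b_{\beta,J})^\top$ for the local perturbation vectors; the rigorous argument is in the online supplementary material.

```latex
\begin{align*}
\hat\alpha_{S,n}^\top\hat\beta_{M,n}-\alpha_{S,n}^\top\beta_{M,n}
 &= \alpha_{S,n}^\top(\hat\beta_{M,n}-\beta_{M,n})
  + \beta_{M,n}^\top(\hat\alpha_{S,n}-\alpha_{S,n})
  + (\hat\alpha_{S,n}-\alpha_{S,n})^\top(\hat\beta_{M,n}-\beta_{M,n}). \\
\intertext{If $(\alpha_S,\beta_M)\neq 0$, multiplying by $\sqrt{n}$ leaves a cross term of order $O_p(n^{-1/2})$:}
\sqrt{n}\,(\hat\alpha_{S,n}^\top\hat\beta_{M,n}-\alpha_{S,n}^\top\beta_{M,n})
 &= \alpha_{S,n}^\top\sqrt{n}(\hat\beta_{M,n}-\beta_{M,n})
  + \beta_{M,n}^\top\sqrt{n}(\hat\alpha_{S,n}-\alpha_{S,n}) + O_p(n^{-1/2}). \\
\intertext{If $(\alpha_S,\beta_M)=0$, then $\alpha_{S,n}=n^{-1/2}b_\alpha$ and $\beta_{M,n}=n^{-1/2}b_\beta$, so all three terms are of order $n^{-1}$:}
n\,(\hat\alpha_{S,n}^\top\hat\beta_{M,n}-\alpha_{S,n}^\top\beta_{M,n})
 &= b_\alpha^\top\sqrt{n}(\hat\beta_{M,n}-\beta_{M,n})
  + b_\beta^\top\sqrt{n}(\hat\alpha_{S,n}-\alpha_{S,n})
  + \sqrt{n}(\hat\alpha_{S,n}-\alpha_{S,n})^\top\sqrt{n}(\hat\beta_{M,n}-\beta_{M,n}).
\end{align*}
```

Under the stated assumption, the second display has the form of the limit in part 1 and the third has the form of the limit in part 2, with the quadratic term $Z_S^\top Z_M$ no longer negligible.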

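To present the theory of bootstrap consistency, we let $R_n^*(b_\alpha,b_\beta)$ denote the multivariate counterpart of the quantity with the same notation in Section 3. The detailed forms are given in Section D.2.3 of the online supplementary material. Similarly to $U^*$ in Section 3, we define the AB statistic under the multivariate setting as

$$U^*=(\hat\alpha_{S,n}^{*\top}\hat\beta_{M,n}^{*}-\hat\alpha_{S,n}^\top\hat\beta_{M,n})\times(1-I_{\lambda_n}^*)+n^{-1}R_n^*(b_\alpha,b_\beta)\times I_{\lambda_n}^*,$$

where $I_{\lambda_n}^*=I\{\max\{|T_{\alpha,j,n}|,|T_{\alpha,j,n}^*|,|T_{\beta,j,n}|,|T_{\beta,j,n}^*|:1\le j\le J\}\le\lambda_n\}$, where $T_{\alpha,j,n}=\sqrt{n}\hat\alpha_{S,j,n}/\hat\sigma_{\alpha_{S,j},n}$ and $T_{\beta,j,n}=\sqrt{n}\hat\beta_{M,j,n}/\hat\sigma_{\beta_{M,j},n}$ denote the sample T-statistics of the two coefficients $\alpha_{S,j}$ and $\beta_{M,j}$, respectively, and $T_{\alpha,j,n}^*=\sqrt{n}\hat\alpha_{S,j,n}^*/\hat\sigma_{\alpha_{S,j},n}^*$ and $T_{\beta,j,n}^*=\sqrt{n}\hat\beta_{M,j,n}^*/\hat\sigma_{\beta_{M,j},n}^*$ denote the bootstrap counterparts of the two sample T-statistics. We establish bootstrap consistency for the joint AB statistic $U^*$ below.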

Theorem 4 Adaptive Bootstrap Consistency —

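Under the conditions of Theorem 3, when the tuning parameter $\lambda_n$ satisfies $\lambda_n=o(\sqrt{n})$ and $\lambda_n\to\infty$ as $n\to\infty$,

$$c_nU^*\overset{d^*}{\rightsquigarrow}c_n(\hat\alpha_{S,n}^\top\hat\beta_{M,n}-\alpha_{S,n}^\top\beta_{M,n}),$$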

where cn is specified as in equation (9).

Based on Theorem 4, we can develop an AB test similar to that in Section 3.
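The pre-test indicator that switches between the two parts of the joint AB statistic can be sketched as below. The threshold $\lambda\sqrt{n}/\log n$ is only one hypothetical choice that satisfies the two rate conditions of Theorem 4, and the function and argument names are illustrative.

```python
import math


def pretest_indicator(t_alpha, t_beta, t_alpha_boot, t_beta_boot, lam_n):
    """Return 1 if every sample and bootstrap T-statistic is at most lam_n in absolute value.

    Each argument is a sequence of J T-statistics, one per mediator.
    """
    largest = max(abs(t) for t in (*t_alpha, *t_beta, *t_alpha_boot, *t_beta_boot))
    return 1 if largest <= lam_n else 0


def example_threshold(n, lam=2.0):
    """A hypothetical threshold: lam * sqrt(n) / log(n) grows to infinity but is o(sqrt(n))."""
    return lam * math.sqrt(n) / math.log(n)
```

When the indicator equals 0, at least one coefficient looks non-zero and the statistic reduces to the classical bootstrap difference; when it equals 1, the statistic uses the local-limit term instead.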

(4) Numerical results. To evaluate the performance of the joint AB test, we conduct numerical experiments, detailed in Section D.2.4 of the online supplementary material. We compare the AB test with the classical bootstrap and two tests in Huang and Pan (2016): the product test based on normal product distribution (PT-NP) and the product test based on normality (PT-N). We observe results similar to those in Section 4. Specifically, under $H_0:\alpha_S^\top\beta_M=0$, when $(\alpha_S,\beta_M)\neq 0$, both the proposed AB test and the compared methods yield uniformly distributed p-values. However, when $(\alpha_S,\beta_M)=0$, the compared methods become overly conservative, whereas the AB test still produces uniformly distributed p-values. Under $H_A$, the AB test can achieve higher empirical power than the compared methods. Besides simulations, we also provide an exemplary data analysis in Section G.3.2 of the online supplementary material.

5.2. Non-linear Scenario I: Binary outcome and general mediator

(1) Model. Suppose the outcome is binary, and consider the model

$$P(Y=1\mid S,M,X)=\mathrm{logit}^{-1}(\beta_M M+X^\top\beta_X+\tau_S S),\qquad E(M\mid S,X)=h^{-1}(\alpha_S S+X^\top\alpha_X), \tag{14}$$

where $h^{-1}(\cdot)$ is the inverse of a canonical link function in generalised linear models. Under Model (14), since the outcome is binary, it is conventional to define the ME as the odds ratio (VanderWeele & Vansteelandt, 2010). Specifically, under the identification assumption given in Section 2, the conditional natural indirect effect (ME) can be identified as

$$\mathrm{OR}^{\mathrm{NIE}}_{s|s^*}(s,x)=\frac{P\{Y(s,M(s))=1\mid X=x\}/[1-P\{Y(s,M(s))=1\mid X=x\}]}{P\{Y(s,M(s^*))=1\mid X=x\}/[1-P\{Y(s,M(s^*))=1\mid X=x\}]},$$

where M(s) denotes the potential value of the mediator under the exposure S=s, and Y(s,m) denotes the potential outcome that would have been observed if S and M had been set to s and m, respectively. Under H0 of no ME,

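$$H_0:\mathrm{OR}^{\mathrm{NIE}}_{s|s^*}(s,x)=1\ \Leftrightarrow\ \log\mathrm{OR}^{\mathrm{NIE}}_{s|s^*}(s,x)=0\ \Leftrightarrow\ P\{Y(s,M(s))=1\mid X=x\}-P\{Y(s,M(s^*))=1\mid X=x\}=0, \tag{15}$$

where the second equivalence follows from the strict monotonicity of the function $x/(1-x)$ on $0<x<1$.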

Remark 4

We consider natural indirect/MEs conditioning on covariates X=x following VanderWeele and Vansteelandt (2010). Alternatively, Imai et al. (2010) proposed to examine the average NIE that marginalises the distribution of X. Examining the conditional NIE is mainly for technical convenience. The conditional NIE =0 for all x can give a sufficient condition for the average NIE =0. Conclusions of conditional NIE may be obtained for average NIE similarly. Please see Remark E.1 in the online supplementary material for more details.

(2) Non-regularity issue. The null hypothesis of no ME (15) looks different from H0:αSβM=0 under the linear SEMs in Section 2. Nevertheless, we can show that the non-regularity issue similar to that in Section 2 would still arise. This is formally stated as Proposition 5 below.

Proposition 5

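Under the model (14), online supplementary Condition E.1 (a general regularity condition on the link function $h^{-1}(\cdot)$ and the distribution of $M$), and identification conditions in Section 2,

  1. $H_0$ (15) holds for $s\neq s^*$ if and only if $\alpha_S=0$ or $\beta_M=0$.

  2. For simplicity of notation, let NIE be a shorthand for $\log\mathrm{OR}^{\mathrm{NIE}}_{s|s^*}(s,x)$. We have (i) $\left.\frac{\partial\mathrm{NIE}}{\partial\alpha_S}\right|_{\beta_M=0}=\left.\frac{\partial\mathrm{NIE}}{\partial\beta_M}\right|_{\alpha_S=0}=0$; (ii) $\left.\frac{\partial\mathrm{NIE}}{\partial\alpha_S}\right|_{\alpha_S=0,\beta_M\neq 0}\neq 0$; (iii) $\left.\frac{\partial\mathrm{NIE}}{\partial\beta_M}\right|_{\alpha_S\neq 0,\beta_M=0}\neq 0$.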

It is interesting to see that even though the conditional ME $\mathrm{OR}^{\mathrm{NIE}}_{s|s^*}(s,x)$ does not take a product form, a non-regularity issue caused by zero gradient can still arise under $H_0$ in equation (15), which is similar to the PoC statistic in Section 3. Specifically, Proposition 5 implies that when $\alpha_S=\beta_M=0$, the first-order Delta method cannot be directly applied to the inference of NIE, which is different from the scenarios when $\alpha_S\neq 0$ or $\beta_M\neq 0$. Therefore, we expect that the ordinary estimator of NIE can behave differently under different types of null hypotheses, and a non-regularity issue can occur. This phenomenon is indeed demonstrated by numerical experiments in Section E.2 of the online supplementary material.

(3) Asymptotic theory and adaptive bootstrap. For ease of presentation, we next derive asymptotic theory under a special case of equation (14), where the mediator is binary and follows a logistic regression model. We point out that the analysis in this section can be readily extended to cases where the mediator $M$ follows a linear model or other canonical generalised linear models. Specifically, let $M$ and $Y$ be Bernoulli random variables with mean values in equation (14), and $h^{-1}(x)=\mathrm{logit}^{-1}(x)$. In this case, $\log\mathrm{OR}^{\mathrm{NIE}}_{s|s^*}(s,x)=l(P_s)-l(P_{s^*})$, where $P_s:=P\{Y(s,M(s))=1\mid X=x\}$, $P_{s^*}:=P\{Y(s,M(s^*))=1\mid X=x\}$, and $l(x)=\log\{x/(1-x)\}$. Similarly to Section 3, we are interested in understanding how the local limiting behaviours of the $\alpha_S$ and $\beta_M$ coefficients change. To this end, we consider a general local logistic model:

$$E(M\mid S,X)=g(\alpha_{S,n}S+X^\top\alpha_X),\qquad E(Y\mid S,M,X)=g(\beta_{M,n}M+X^\top\beta_X+\tau_S S), \tag{16}$$

where $\alpha_{S,n}=\alpha_S+b_\alpha/\sqrt{n}$, $\beta_{M,n}=\beta_M+b_\beta/\sqrt{n}$, and $g(x)=\mathrm{logit}^{-1}(x)=e^x/(1+e^x)$. Under the local model (16), we have for $\iota\in\{s,s^*\}$,

$$P_\iota=g(\iota\times\alpha_{S,n}+x^\top\alpha_X)\times d_{\beta,n}+P_*, \tag{17}$$

where $d_{\beta,n}=g(\beta_{M,n}+x^\top\beta_X+\tau_S s)-g(x^\top\beta_X+\tau_S s)$ and $P_*=g(x^\top\beta_X+\tau_S s)$. (Please see the proof of Theorem 6 for the derivations.) For simplicity of notation, let NIE be a shorthand for $\log\mathrm{OR}^{\mathrm{NIE}}_{s|s^*}(s,x)$, and by equation (15), $H_0\Leftrightarrow\mathrm{NIE}=0$. Let $\widehat{\mathrm{NIE}}=l(\hat P_s)-l(\hat P_{s^*})$ denote an estimator of NIE, where $\hat P_s$ and $\hat P_{s^*}$ are defined similarly to equation (17) with $(\alpha_{S,n},\alpha_X,\beta_{M,n},\beta_X,\tau_S)$ replaced by their corresponding sample regression coefficient estimators $(\hat\alpha_S,\hat\alpha_X,\hat\beta_M,\hat\beta_X,\hat\tau_S)$.
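The representation in equation (17) makes the log odds ratio NIE easy to evaluate for given coefficients. The sketch below is an illustration for the binary-mediator, binary-outcome case only; the function names and the convention of passing the covariate contributions as two pre-computed linear predictors (`lin_m` for the mediator model and `lin_y` for the outcome model including the exposure term) are assumptions.

```python
import math


def expit(z):
    """Inverse logit g(z) = e^z / (1 + e^z)."""
    return 1.0 / (1.0 + math.exp(-z))


def logit(p):
    """l(p) = log{p / (1 - p)}."""
    return math.log(p / (1.0 - p))


def log_or_nie(alpha_s, beta_m, s, s_star, lin_m, lin_y):
    """log OR^NIE via equation (17).

    lin_m stands for x' alpha_X and lin_y for x' beta_X + tau_S * s.
    """
    p_star = expit(lin_y)                   # P_* in (17)
    d_beta = expit(beta_m + lin_y) - p_star  # d_beta: jump in P(Y = 1) from M = 0 to M = 1
    p_s = expit(s * alpha_s + lin_m) * d_beta + p_star
    p_s_star = expit(s_star * alpha_s + lin_m) * d_beta + p_star
    return logit(p_s) - logit(p_s_star)
```

In this sketch the returned value is zero whenever either coefficient is zero, in line with part 1 of Proposition 5, and a central finite difference in either coefficient at the origin is also zero, which is the zero-gradient phenomenon described in part 2.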

Theorem 6 Asymptotic Property —

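Assume $P_s,P_{s^*}\in(0,1)$ and Condition E.2 in the online supplementary material (a regularity condition on the design matrix similar to Condition 1). Under the local model (16) and $H_0:\alpha_S\beta_M=0$,

  1. when $(\alpha_S,\beta_M)\neq 0$, $\sqrt{n}(\widehat{\mathrm{NIE}}-\mathrm{NIE})\xrightarrow{d}(d_\alpha Z_\beta+d_\beta Z_\alpha)\gamma_0$;

  2. when $(\alpha_S,\beta_M)=0$, $n(\widehat{\mathrm{NIE}}-\mathrm{NIE})\xrightarrow{d}(d_{b_\alpha}Z_\beta+d_{b_\beta}Z_\alpha+Z_\alpha Z_\beta)\gamma_0$,

where $d_\alpha=g(\alpha_S s+x^\top\alpha_X)-g(\alpha_{S,n}s^*+x^\top\alpha_X)$, $d_\beta=g(\beta_M+x^\top\beta_X+\tau_S s)-g(x^\top\beta_X+\tau_S s)$, $d_{b_\alpha}=g'(x^\top\alpha_X)(s-s^*)b_\alpha$, $d_{b_\beta}=g'(x^\top\beta_X+\tau_S s)(s-s^*)b_\beta$, $(Z_\alpha,Z_\beta)$ represent bivariate mean-zero normal distributions specified in online supplementary Lemma E.2, and $\gamma_0=\{P_*(1-P_*)\}^{-1}$ is a non-zero constant with $P_*$ given in equation (17).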

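We next study consistency of bootstrap estimators. Let $\widehat{\mathrm{NIE}}^*$ denote the classical non-parametric bootstrap estimator of NIE. Specifically, $\widehat{\mathrm{NIE}}^*=l(\hat P_s^*)-l(\hat P_{s^*}^*)$, where $\hat P_s^*$ and $\hat P_{s^*}^*$ are defined similarly to equation (17) with $(\alpha_{S,n},\alpha_X,\beta_{M,n},\beta_X,\tau_S)$ replaced by their classical non-parametric bootstrap estimators $(\hat\alpha_S^*,\hat\alpha_X^*,\hat\beta_M^*,\hat\beta_X^*,\hat\tau_S^*)$. Motivated by Theorem 6, we define the AB statistic

$$U_{e,1}^*=(\widehat{\mathrm{NIE}}^*-\widehat{\mathrm{NIE}})\times(1-I_{\alpha_S,\lambda_n}^*I_{\beta_M,\lambda_n}^*)+n^{-1}(d_{b_\alpha}Z_\beta^*+d_{b_\beta}Z_\alpha^*+Z_\alpha^*Z_\beta^*)\hat\gamma_0^*\times I_{\alpha_S,\lambda_n}^*I_{\beta_M,\lambda_n}^*,$$

where $I_{\alpha_S,\lambda_n}^*$ and $I_{\beta_M,\lambda_n}^*$ are defined similarly to equation (8). The following theorem proves consistency of the AB statistic $U_{e,1}^*$, based on which we can develop an AB test similar to that in Section 3.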

Theorem 7 Adaptive Bootstrap Consistency —

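Under the conditions of Theorem 6, when the tuning parameter $\lambda_n$ satisfies $\lambda_n=o(\sqrt{n})$ and $\lambda_n\to\infty$ as $n\to\infty$, $c_nU_{e,1}^*\overset{d^*}{\rightsquigarrow}c_n(\widehat{\mathrm{NIE}}-\mathrm{NIE})$, where $c_n$ is specified as in equation (9).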

(4) Numerical results. We conduct simulation studies to compare the AB and the classical non-parametric bootstrap under the model (14). The detailed results are provided in Section E.2 of the online supplementary material. Our findings align closely with those presented in Section 4. Specifically, under $H_0:\alpha_S\beta_M=0$, when $(\alpha_S,\beta_M)\neq 0$, both the proposed AB test and the classical non-parametric bootstrap yield uniformly distributed p-values. However, when $(\alpha_S,\beta_M)=0$, the classical bootstrap becomes overly conservative, whereas the AB test still yields uniformly distributed p-values. Under $H_A$, the AB test can achieve higher empirical power than the classical bootstrap.

5.3. Non-linear Scenario II: Linear outcome and general mediator

(1) Model. Suppose the outcome follows a linear model, and consider

$$E(M\mid S,X)=h^{-1}(\alpha_S S+X^\top\alpha_X),\qquad E(Y\mid S,M,X)=\beta_M M+X^\top\beta_X+\tau_S S, \tag{18}$$

where $h^{-1}(\cdot)$ can be the inverse of a canonical link function. Similarly to the non-linear Scenario I, we examine the conditional natural indirect effect/ME defined as the risk difference:

$$\mathrm{NIE}_{s|s^*}(s,x):=E\{Y(s,M(s))-Y(s,M(s^*))\mid X=x\}=\beta_M\{h^{-1}(\alpha_S s+x^\top\alpha_X)-h^{-1}(\alpha_S s^*+x^\top\alpha_X)\}. \tag{19}$$

(2) Non-regularity issue. We are interested in testing $H_0:\mathrm{NIE}_{s|s^*}(s,x)=0$, which looks different from $H_0:\alpha_S\beta_M=0$ in Section 2. Nevertheless, we can show that the non-regularity issue similar to that in Section 2 would arise. This is formally stated as Proposition 8 below.

Proposition 8

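Under the model (18), assume $h^{-1}(\cdot)$ is strictly monotone, and the identification conditions in Section 2 hold. Let NIE be a shorthand for $\mathrm{NIE}_{s|s^*}(s,x)$ in equation (19). Then

  1. $H_0$: NIE in equation (19) $=0$ holds if and only if $\alpha_S=0$ or $\beta_M=0$.

  2. (i) $\left.\frac{\partial\mathrm{NIE}}{\partial\alpha_S}\right|_{\beta_M=0}=\left.\frac{\partial\mathrm{NIE}}{\partial\beta_M}\right|_{\alpha_S=0}=0$. (ii) $\left.\frac{\partial\mathrm{NIE}}{\partial\alpha_S}\right|_{\alpha_S=0,\beta_M\neq 0}\neq 0$. (iii) $\left.\frac{\partial\mathrm{NIE}}{\partial\beta_M}\right|_{\alpha_S\neq 0,\beta_M=0}\neq 0$.

Similarly to Proposition 5, Proposition 8 implies that a non-regularity issue caused by zero gradient would arise under $H_0$. Specifically, the ordinary estimator of NIE can behave differently when $\alpha_S=\beta_M=0$ and when exactly one of $\alpha_S$ and $\beta_M$ is non-zero. This is similar to the PoC statistic in Section 3 and the odds ratio in Section 5.2.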
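The zero-gradient behaviour can be checked numerically from equation (19). The sketch below takes $h^{-1}$ to be the inverse logit (the binary-mediator case) and passes $x^\top\alpha_X$ as a single number `lin_m`; these choices and the function names are assumptions for illustration.

```python
import math


def expit(z):
    """Inverse logit, used here as h^{-1}."""
    return 1.0 / (1.0 + math.exp(-z))


def nie_risk_difference(alpha_s, beta_m, s, s_star, lin_m):
    """Risk-difference NIE in equation (19); lin_m stands for x' alpha_X."""
    return beta_m * (expit(alpha_s * s + lin_m) - expit(alpha_s * s_star + lin_m))


def second_order_ratio(t, s, s_star, lin_m):
    """NIE(t, t) / t^2, which isolates the leading (quadratic) term near the origin."""
    return nie_risk_difference(t, t, s, s_star, lin_m) / t ** 2
```

In this sketch the NIE vanishes whenever either coefficient is zero, and for small $t$ the ratio `second_order_ratio(t, ...)` approaches $(s-s^*)$ times the derivative of the inverse logit at `lin_m`. In other words, near $\alpha_S=\beta_M=0$ the NIE behaves like a product of the two coefficients, which is why the first-order Delta method is uninformative there.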

(3) Asymptotic theory and adaptive bootstrap. For ease of presentation, we next derive asymptotic theory under a specific instance of equation (18). Specifically, the mediator $M$ is a Bernoulli random variable with its conditional mean given in equation (14) and $h^{-1}(x)=\mathrm{logit}^{-1}(x)=e^x/(1+e^x)$, and the outcome $Y$ follows the linear model in equation (2). The analysis in this section can be readily extended when the mediator $M$ follows other canonical generalised linear models. As we are interested in how the local limiting behaviour of the $\alpha_S$ and $\beta_M$ coefficients changes, we consider the following general local model:

$$E(M\mid S,X)=\mathrm{logit}^{-1}(\alpha_{S,n}S+X^\top\alpha_X),\qquad Y=\beta_{M,n}M+X^\top\beta_X+\tau_S S+\epsilon_Y, \tag{20}$$

where $\alpha_{S,n}=\alpha_S+b_\alpha/\sqrt{n}$ and $\beta_{M,n}=\beta_M+b_\beta/\sqrt{n}$.

Theorem 9 Asymptotic Property —

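Assume Condition E.3 in the online supplementary material (a regularity condition on the design matrix similar to Condition 1). Under model (20),

  1. when $(\alpha_S,\beta_M)\neq 0$, $\sqrt{n}(\widehat{\mathrm{NIE}}-\mathrm{NIE})\xrightarrow{d}d_\alpha Z_\beta+\beta_M Z_\alpha$;

  2. when $(\alpha_S,\beta_M)=0$, $n(\widehat{\mathrm{NIE}}-\mathrm{NIE})\xrightarrow{d}d_{b_\alpha}Z_\beta+b_\beta Z_\alpha+Z_\alpha Z_\beta$,

where $d_\alpha=g(\alpha_S s+x^\top\alpha_X)-g(\alpha_{S,n}s^*+x^\top\alpha_X)$, $d_{b_\alpha}=g'(x^\top\alpha_X)(s-s^*)b_\alpha$, $Z_\alpha$ represents a normal distribution specified in online supplementary Lemma E.2, and $Z_\beta$ is redefined to be a mean-zero normal distribution with the same covariance as the random vector $V_M^{-1}\epsilon_Y M_\perp$, where $V_M$ and $M_\perp$ are defined in Theorem 1.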

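We next establish bootstrap consistency theory. Similarly to Section 5.2, let $\widehat{\mathrm{NIE}}^*$ denote the non-parametric bootstrap estimator of NIE. In particular, we redefine $\widehat{\mathrm{NIE}}^*=\hat\beta_M^*\{g(\hat\alpha_S^*s+x^\top\hat\alpha_X^*)-g(\hat\alpha_S^*s^*+x^\top\hat\alpha_X^*)\}$, where $(\hat\alpha_S^*,\hat\alpha_X^*,\hat\beta_M^*)$ denotes the classical non-parametric bootstrap estimators of $(\alpha_S,\alpha_X,\beta_M)$. Motivated by Theorem 9, we define the AB statistic

$$U_{e,2}^*=(\widehat{\mathrm{NIE}}^*-\widehat{\mathrm{NIE}})(1-I_{\alpha_S,\lambda_n}^*I_{\beta_M,\lambda_n}^*)+n^{-1}(d_{b_\alpha}Z_\beta^*+b_\beta Z_\alpha^*+Z_\alpha^*Z_\beta^*)I_{\alpha_S,\lambda_n}^*I_{\beta_M,\lambda_n}^*,$$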

where IαS,λn* and IβM,λn* are defined similarly to equation (8). The following theorem establishes consistency of the AB statistic Ue,2*.

Theorem 10 Adaptive Bootstrap Consistency —

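Under the conditions of Theorem 9, when the tuning parameter $\lambda_n$ satisfies $\lambda_n=o(\sqrt{n})$ and $\lambda_n\to\infty$ as $n\to\infty$, $c_nU_{e,2}^*\overset{d^*}{\rightsquigarrow}c_n(\widehat{\mathrm{NIE}}-\mathrm{NIE})$, where $c_n$ is specified as in equation (9).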

(4) Numerical results. We conduct simulation studies to compare the AB and the classical non-parametric bootstrap under the model (18). The detailed results are provided in Section E.2 of the online supplementary material. The obtained results are very similar to those in Section 4 and in part (4) of Section 5.2, and therefore, we refrain from repeating the details here.

6. Data analysis

We illustrate an application of our proposed method to the analysis of data from a cohort study ‘Early Life Exposures in Mexico to ENvironmental Toxicants’ (ELEMENT) (Perng et al., 2019). One of the central interests in this scientific study concerns the MEs of metabolites, in particular, the family of lipids, on the association between environmental exposure and children’s growth and development. In the literature of environmental health sciences, exposure to endocrine-disrupting chemicals (EDCs) such as phthalates has been found to be detrimental to children’s health outcomes. Such findings of direct associations need to be further assessed for possible MEs through metabolites, because environmental toxicants such as phthalates can alter metabolic profiles at the molecular level.

Our illustration focuses on the outcome of body mass index (BMI) and exposure to one phthalate, MEOHP (Mono-(2-ethyl-5-oxohexyl) phthalate), which is a chemical in food production and storage. Body mass index is a widely used biomarker in paediatric research to measure childhood obesity. The dataset contains 382 adolescents aged 10–18 years old living in Mexico City. Our mediation analysis involves a set of 149 lipids that are hypothesised to have potential MEs on children’s growth and development. Our goal is to identify the mediation pathways of exposure to MEOHP → lipids → BMI. Two key potential confounders, gender and age, are included throughout the analyses. It is worth noting that adjusting for gender and age may not be sufficient for proper confounding adjustments. To conduct a more plausible causal analysis and interpretation, a further investigation is deemed necessary to rigorously assess the underlying causal assumptions such as a sensitivity analysis for the sequential ignorability assumption. In our analyses, we compare the results of six tests: JS-AB, JS-MaxP, PoC-AB, PoC-B, PoC-Sobel, and CMA, which have been compared in our simulation studies in Section 4. In particular, all the bootstrap methods (including JS-AB, PoC-AB, PoC-B, CMA) are conducted based on $10^4$ bootstrap resamples. Here, we no longer include the JS-B test and the MT-Comp method, as they are known to have inflated type I errors according to our simulation studies in Section 4.

As the sample size is limited compared to the large number of mediators, we first apply a screening analysis to identify a subset of lipids as potential candidates. We then jointly model the chosen lipids in the second step of our analysis. To mitigate the potential issues arising from double dipping the data, we adopt a random data-splitting approach by dividing the dataset into two distinct parts, each dedicated to one of the two respective analytic tasks. In the first screening step, we examine the effect along the path MEOHP → lipid → BMI for one lipid at a time, and the corresponding p-values are obtained with the six tests, respectively. For each test, we select a proportion (q%) of lipids with the smallest p-values. The second step examines the path MEOHP → selected lipids → BMI, with the selected lipids being modelled jointly. To test the ME through a target lipid M within the selected set, we adjust for non-target mediators within the outcome model, following the discussions in Section 3.4; please see more details in Section G.3.3 of the online supplementary material. Subsequently, we select lipids based on their p-values obtained in the second step, after adjusting for multiple comparisons with controlled FDR (Benjamini & Hochberg, 1995). In our analysis, we explore a range of values q ∈ {5, 10, 15, 20, 25} and observe very similar results, indicating the robustness of our approach to the choice of the screening threshold in the first step. We next present the results obtained with q=10 (i.e. 15 selected lipids based on their p-values), while results for other q values are detailed in Section G.3.1 of the online supplementary material.
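The two selection rules in this pipeline, keeping the q% smallest screening p-values and then applying the Benjamini–Hochberg step-up procedure, can be sketched as follows. The function names are hypothetical, and rounding the screening count up is an assumption (with 149 lipids and q=10 it gives the 15 lipids mentioned above); the data splitting and the mediation tests themselves are not shown.

```python
import math


def screen_top_fraction(p_values, q_percent):
    """Names of the q% of mediators with the smallest screening p-values (count rounded up)."""
    k = math.ceil(len(p_values) * q_percent / 100.0)
    ranked = sorted(p_values, key=p_values.get)
    return ranked[:k]


def benjamini_hochberg(p_values, fdr):
    """Benjamini-Hochberg step-up procedure; returns the set of selected names."""
    m = len(p_values)
    ranked = sorted(p_values.items(), key=lambda item: item[1])
    cutoff = 0
    for rank, (_, p) in enumerate(ranked, start=1):
        if p <= fdr * rank / m:
            cutoff = rank  # keep the largest rank that passes its threshold
    return {name for name, _ in ranked[:cutoff]}
```

In use, `p_values` would be a dictionary mapping lipid names to p-values from one test: `screen_top_fraction` is applied to the first half of the split data, and `benjamini_hochberg` to the second-step p-values of the screened lipids.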

As an illustrative example, we first present the results from a single random split in Table 1. Table 1 provides the corresponding p-values for the lipids selected by at least one test in the second step of the analysis. In this instance, the non-AB tests fail to detect any lipids. In contrast, the PoC-AB test identifies lauric acid (L.A) and FA.7.0-OH_1 (FA.7) while controlling the FDR at 0.10, and the JS-AB test selects both L.A and FA.7 whether the FDR is controlled at 0.05 or at 0.10. To gauge the variability of results across random splits, we repeat the data-splitting analysis 400 times. As shown in Figure 8, L.A and FA.7 are the two most frequently selected mediators in our analysis. Furthermore, the AB tests exhibit a notably higher chance of selecting L.A compared to the non-AB tests. This aligns with our observations from simulations in Section 4, suggesting that the AB tests can attain higher power than their non-AB counterparts. Lauric acid is a saturated fatty acid and is found in many vegetable fats and in coconut and palm kernel oils (Dayrit, 2015). The results suggest that the exposure to MEOHP may influence the process of breaking down fat tissue in the human body, leading to obesity and other adverse health outcomes.

Table 1.

Lipids selected in the second step

Lipids | JS-AB | JS-MaxP | PoC-AB | PoC-B | PoC-Sobel | CMA
L.A | 0.0017 (×*) | 0.0399 | 0.0043 (*) | 0.0406 | 0.1254 | 0.0426
FA.7 | 0.0008 (×*) | 0.0146 | 0.0090 (*) | 0.0236 | 0.0937 | 0.0208

Note. L.A = LAURIC.ACID; FA.7 = FA.7.0-OH_1; CMA = R package "causal mediation analysis". p-values with (×) and (*) indicate that the lipid specified by the row is selected by the method specified by the column under 0.05 and 0.10 FDR levels, respectively.

Figure 8.

Number of times each mediator is selected in Step 2 by the six tests with FDR = 0.10 over 400 random splits of the data. FDR = false discovery rate.

Since the first screening step considers one mediator at a time, we also conduct sensitivity analyses to evaluate the effects of the unadjusted mediators similarly to Liu et al. (2021). We use the procedure proposed by Imai et al. (2010), which utilised the idea that the error term in the M-S model and that in the Y-M model are likely to be correlated if the sequential ignorability assumption is violated and vice versa. The detailed results are provided in Section G.3.4 in the online supplementary material. As a brief summary, the sensitivity analysis suggests that our first screening analysis could be robust to unadjusted mediators.

7. Discussion

This paper proposes a new adaptive framework for testing composite null hypotheses in mediation pathway analysis. The method incorporates a consistent pre-test threshold into the bootstrap procedure, which helps circumvent the non-regularity issue arising from the composite null hypotheses. If at least one of the two coefficients is significant, the procedure would reduce to the classical non-parametric bootstrap; otherwise, it approximates the local asymptotic behaviour of the statistics. Our proposed strategy accommodates different types of null hypotheses under various models. Particularly, we have established similar results for both the individual and joint MEs under classical linear SEMs, and we have generalised the conclusions under generalised linear models. Through comprehensive simulation studies, we have demonstrated that the adaptive tests can properly and robustly control the type I error under different types of null hypotheses and improve the statistical power.

The proposed methodology offers an exemplary analytic toolbox that can be broadly extended to handle other problems of similar types involving composite null hypotheses. There are several interesting future research directions that are worth exploration. First, the non-regularity issue can similarly arise in other scenarios, such as survival analysis (Huang & Pan, 2016; VanderWeele, 2011), different data types (Sohn & Li, 2019), partially linear models (Hines et al., 2021), and models with exposure–mediator interactions; see more discussions in Section H of the online supplementary material. These complicated models require special care in the causal interpretation of MEs and in the implementation of the bootstrap procedure, warranting further investigation. Second, when the dimension of mediators and covariates becomes high, it is of interest to extend the adaptive bootstrap under high-dimensional mediation models for both individual and joint MEs (Zhou et al., 2020). Similarly to our discussions on adjusting multivariate mediators at the end of Section 3, we might apply the adaptive bootstrap after properly adjusting high-dimensional covariates. In the data analysis, we have applied the marginal screening to reduce the dimension of mediators, which might potentially overlook the complicated causal dependence among mediators. When mediators have potential causal dependence, Shi and Li (2022) proposed to first estimate a directed acyclic graph of mediators and develop a testing procedure that can control the type I error to be less than or equal to the nominal level. It would be of interest to extend our proposed AB under such settings to mitigate potential conservatism. Third, the proposed AB strategy can also be utilised to examine the replicability across independent studies (Bogomolov & Heller, 2018), which is fundamental to scientific discovery. Specifically, let $\beta_i$, $i = 1, \ldots, K$, denote the true signals from $K$ independent studies, respectively. Testing whether the signals in these $K$ studies are all significant corresponds to $H_0: \prod_{i=1}^{K} \beta_i = 0$ vs. $H_A: \prod_{i=1}^{K} \beta_i \neq 0$. Moreover, for two studies with true signals $\beta_1$ and $\beta_2$, to investigate whether the effects of both studies are significant in the same direction, one can formulate the hypothesis testing problem as $H_0: \beta_1 \beta_2 \le 0$ vs. $H_A: \beta_1 \beta_2 > 0$. For these testing problems, the null hypotheses are composite. To properly control the type I error, the adaptive strategy proposed in this paper may serve as a valuable building block, while additional effort is needed to analyse those different cases carefully. Last, in our data analysis, all measurements are obtained cross-sectionally at one given clinical visit within a time window of approximately three months. To further study potential long-term influences of toxicant exposures, it may be of interest to investigate how the MEs might vary over time. Such time-varying MEs may be naturally analysed in the scenario of longitudinal studies that collect time-varying measurements. This is a very challenging research field with only minimal investigation in the current literature (Bind et al., 2016). Extending the proposed AB method to analyse time-varying MEs would be a compelling future direction.
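Returning to the replicability hypotheses mentioned above, the display below spells out why their nulls are composite by writing each null parameter set as a union. This is elementary set algebra added for illustration, not a result stated in the paper; for $K = 2$ the first line has the same structure as the mediation null that a product of two coefficients equals zero.

```latex
\begin{align*}
H_0:\ \prod_{i=1}^{K}\beta_i = 0
  &\iff (\beta_1,\ldots,\beta_K) \in \bigcup_{i=1}^{K}\{\beta_i = 0\},\\
H_0:\ \beta_1\beta_2 \le 0
  &\iff (\beta_1,\beta_2) \in
     \{\beta_1 = 0\}\cup\{\beta_2 = 0\}\cup\{\beta_1\beta_2 < 0\}.
\end{align*}
```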

Supplementary Material

qkad129_Supplementary_Data

Contributor Information

Yinqiu He, Department of Statistics, University of Wisconsin, Madison, WI, USA.

Peter X K Song, Department of Biostatistics, University of Michigan, Ann Arbor, MI, USA.

Gongjun Xu, Department of Statistics, University of Michigan, Ann Arbor, MI, USA.

Acknowledgments

We are grateful to the joint editors, Dr. Daniela Witten and Dr. Aurore Delaigle, an associate editor, and three anonymous referees for their helpful comments and suggestions. This work is partially supported by NSF DMS-1811734, DMS-2113564, SES-1846747, SES-2150601, NIH R01ES024732, R01ES033656, and Wisconsin Alumni Research Foundation.

Data availability

Due to privacy restrictions, we are unable to directly share the raw data publicly but they may be obtained offline according to a formal data request procedure outlined in the University of Michigan Data Use Agreement protocol. To satisfy the need of reproducibility, instead, we have introduced a pseudo-dataset with added noise on the GitHub repository: He et al. (2023).

Supplementary material

Supplementary material is available online at Journal of the Royal Statistical Society: Series B.

References

  1. Andrews D. W. (2001). Testing when a parameter is on the boundary of the maintained hypothesis. Econometrica, 69(3), 683–734. 10.1111/1468-0262.00210
  2. Barfield R., Shen J., Just A. C., Vokonas P. S., Schwartz J., Baccarelli A. A., VanderWeele T. J., & Lin X. (2017). Testing for the indirect effect under the null for genome-wide mediation analyses. Genetic Epidemiology, 41(8), 824–833. 10.1002/gepi.22084
  3. Baron R. M., & Kenny D. A. (1986). The moderator–mediator variable distinction in social psychological research: Conceptual, strategic, and statistical considerations. Journal of Personality and Social Psychology, 51(6), 1173–1182. 10.1037/0022-3514.51.6.1173
  4. Basu D. (1980). Randomization analysis of experimental data: The Fisher randomization test. Journal of the American Statistical Association, 75(371), 575–582.
  5. Benjamini Y., & Hochberg Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society: Series B (Methodological), 57(1), 289–300. 10.1111/j.2517-6161.1995.tb02031.x
  6. Bind M.-A., VanderWeele T., Coull B., & Schwartz J. (2016). Causal mediation analysis for longitudinal data with exogenous exposure. Biostatistics, 17(1), 122–134. 10.1093/biostatistics/kxv029
  7. Blum M. G., Valeri L., François O., Cadiou S., Siroux V., Lepeule J., & Slama R. (2020). Challenges raised by mediation analysis in a high-dimension setting. Environmental Health Perspectives, 128(5), 055001. 10.1289/EHP6240
  8. Bogomolov M., & Heller R. (2018). Assessing replicability of findings across two studies of multiple features. Biometrika, 105(3), 505–516. 10.1093/biomet/asy029
  9. Chen S. X. (2016). Peter Hall’s contributions to the bootstrap. The Annals of Statistics, 44, 1821–1836. 10.1214/16-AOS1489
  10. Dai J. Y., Stanford J. L., & LeBlanc M. (2020). A multiple-testing procedure for high-dimensional mediation hypotheses. Journal of the American Statistical Association, 1–16. 10.1080/01621459.2020.1765785
  11. Daniel R., De Stavola B., Cousens S., & Vansteelandt S. (2015). Causal mediation analysis with multiple mediators. Biometrics, 71(1), 1–14. 10.1111/biom.v71.1
  12. Dayrit F. M. (2015). The properties of lauric acid and their significance in coconut oil. Journal of the American Oil Chemists’ Society, 92(1), 1–15. 10.1007/s11746-014-2562-7
  13. Derkach A., Moore S. C., Boca S. M., & Sampson J. N. (2020). Group testing in mediation analysis. Statistics in Medicine, 39(18), 2423–2436. 10.1002/sim.v39.18
  14. Djordjilović V., Hemerik J., & Thoresen M. (2020). ‘On optimal two-stage testing of multiple mediators’, arXiv, arXiv:2007.02844, preprint.
  15. Djordjilović V., Page C. M., Gran J. M., Nøst T. H., Sandanger T. M., Veierød M. B., & Thoresen M. (2019). Global test for high-dimensional mediation: Testing groups of potential mediators. Statistics in Medicine, 38(18), 3346–3360. 10.1002/sim.8199
  16. Drton M., & Xiao H. (2016). Wald tests of singular hypotheses. Bernoulli, 22(1), 38–59. 10.3150/14-BEJ620
  17. Du J., Zhou X., Hao W., Liu Y., Jennifer S., & Mukherjee B. (2022). ‘Methods for large-scale single mediator hypothesis testing: Possible choices and comparisons’, arXiv, arXiv:2203.13293, preprint.
  18. Dufour J.-M., Renault E., & Zinde-Walsh V. (2013). ‘Wald tests when restrictions are locally singular’, arXiv, arXiv:1312.0569, preprint.
  19. Fritz M. S., & MacKinnon D. P. (2007). Required sample size to detect the mediated effect. Psychological Science, 18(3), 233–239. 10.1111/j.1467-9280.2007.01882.x
  20. Fulcher I. R., Shi X., & Tchetgen E. J. T. (2019). Estimation of natural indirect effects robust to unmeasured confounding and mediator measurement error. Epidemiology, 30(6), 825–834. 10.1097/EDE.0000000000001084
  21. Glonek G. (1993). On the behaviour of Wald statistics for the disjunction of two regular hypotheses. Journal of the Royal Statistical Society: Series B (Methodological), 55(3), 749–755. 10.1111/j.2517-6161.1993.tb01938.x
  22. Guo X., Li R., Liu J., & Zeng M. (2022). High-dimensional mediation analysis for selecting DNA methylation loci mediating childhood trauma and cortisol stress reactivity. Journal of the American Statistical Association, 1–32. 10.1080/01621459.2022.2053136
  23. Hao W., & Song P. X.-K. (2023). A simultaneous likelihood test for joint mediation effects of multiple mediators. Statistica Sinica, 33(4), 2305–2326.
  24. Hayes A. F. (2017). Introduction to mediation, moderation, and conditional process analysis: A regression-based approach. Guilford Publications.
  25. He Y., Song P. X.-K., & Xu G. (2023). ABtest. https://github.com/yinqiuhe/ABtest.
  26. Hines O., Vansteelandt S., & Diaz-Ordaz K. (2021). Robust inference for mediated effects in partially linear models. Psychometrika, 86, 595–618. 10.1007/s11336-021-09768-z
  27. Huang Y.-T. (2018). Joint significance tests for mediation effects of socioeconomic adversity on adiposity via epigenetics. The Annals of Applied Statistics, 12(3), 1535–1557. 10.1214/17-AOAS1120
  28. Huang Y.-T. (2019a). Genome-wide analyses of sparse mediation effects under composite null hypotheses. The Annals of Applied Statistics, 13, 60–84. 10.1214/18-AOAS1181
  29. Huang Y.-T. (2019b). Variance component tests of multivariate mediation effects under composite null hypotheses. Biometrics, 75(4), 1191–1204. 10.1111/biom.v75.4
  30. Huang Y.-T., & Cai T. (2016). Mediation analysis for survival data using semiparametric probit models. Biometrics, 72(2), 563–574. 10.1111/biom.v72.2
  31. Huang Y.-T., & Pan W.-C. (2016). Hypothesis test of mediation effect in causal mediation model with high-dimensional continuous mediators. Biometrics, 72(2), 402–413. 10.1111/biom.v72.2
  32. Imai K., Keele L., & Tingley D. (2010). A general approach to causal mediation analysis. Psychological Methods, 15(4), 309–334. 10.1037/a0020761
  33. Imai K., Keele L., & Yamamoto T. (2010). Identification, inference and sensitivity analysis for causal mediation effects. Statistical Science, 25(1), 51–71. 10.1214/10-STS321
  34. Imai K., & Yamamoto T. (2013). Identification and sensitivity analysis for multiple causal mechanisms: Revisiting evidence from framing experiments. Political Analysis, 21(2), 141–171. 10.1093/pan/mps040
  35. Imbens G. W., & Rubin D. B. (2015). Causal inference in statistics, social, and biomedical sciences. Cambridge University Press.
  36. Jérolon A., Baglietto L., Birmelé E., Alarcon F., & Perduca V. (2020). Causal mediation analysis in presence of multiple mediators uncausally related. The International Journal of Biostatistics, 17(2), 191–221. 10.1515/ijb-2019-0088
  37. Laber E. B., & Murphy S. A. (2011). Adaptive confidence intervals for the test error in classification. Journal of the American Statistical Association, 106(495), 904–913. 10.1198/jasa.2010.tm10053
  38. Liu Z., Shen J., Barfield R., Schwartz J., Baccarelli A. A., & Lin X. (2021). Large-scale hypothesis testing for causal mediation effects with applications in genome-wide epigenetic studies. Journal of the American Statistical Association, 117(537), 67–81. 10.1080/01621459.2021.1914634
  39. Loh W. W., Moerkerke B., Loeys T., & Vansteelandt S. (2021). Disentangling indirect effects through multiple mediators without assuming any causal structure among the mediators. Psychological Methods, 27(6), 982–999. 10.1037/met0000314
  40. MacKinnon D. (2008). Introduction to statistical mediation analysis. Multivariate applications book series. Taylor & Francis.
  41. MacKinnon D. P., & Fairchild A. J. (2009). Current directions in mediation analysis. Current Directions in Psychological Science, 18(1), 16–20. 10.1111/j.1467-8721.2009.01598.x
  42. MacKinnon D. P., Lockwood C. M., Hoffman J. M., West S. G., & Sheets V. (2002). A comparison of methods to test mediation and other intervening variable effects. Psychological Methods, 7(1), 83–104. 10.1037/1082-989X.7.1.83
  43. MacKinnon D. P., Lockwood C. M., & Williams J. (2004). Confidence limits for the indirect effect: Distribution of the product and resampling methods. Multivariate Behavioral Research, 39(1), 99–128. 10.1207/s15327906mbr3901_4
  44. McKeague I. W., & Qian M. (2015). An adaptive resampling test for detecting the presence of significant predictors. Journal of the American Statistical Association, 110(512), 1422–1433. 10.1080/01621459.2015.1095099
  45. McKeague I. W., & Qian M. (2019). Marginal screening of 2×2 tables in large-scale case-control studies. Biometrics, 75(1), 163–171. 10.1111/biom.v75.1
  46. Miles C., & Chambaz A. (2021). ‘Optimal tests of the composite null hypothesis arising in mediation analysis’, arXiv, arXiv:2107.07575, preprint.
  47. Pearl J. (2001). Direct and indirect effects. In Probabilistic and causal inference: the works of Judea Pearl (pp. 373–392).
  48. Perng W., Tamayo-Ortiz M., Tang L., Sánchez B. N., Cantoral A., Meeker J. D., Dolinoy D. C., Roberts E. F., Martinez-Mier E. A., Lamadrid-Figueroa H., Song P. X. K., Ettinger A. S., Wright R., Arora M., Schnaas L., Watkins D. J., Goodrich J. M., Garcia R. C., Solano-Gonzalez M., & Peterson K. E. (2019). The early life exposure in Mexico to environmental toxicants (ELEMENT) project. British Medical Journal Open, 9(8). 10.1136/bmjopen-2019-030427
  49. Robins J. M., & Greenland S. (1992). Identifiability and exchangeability for direct and indirect effects. Epidemiology, 3(2), 143–155. 10.1097/00001648-199203000-00013
  50. Sampson J. N., Boca S. M., Moore S. C., & Heller R. (2018). FWER and FDR control when testing multiple mediators. Bioinformatics, 34(14), 2418–2424. 10.1093/bioinformatics/bty064
  51. Shi C., & Li L. (2022). Testing mediation effects using logic of boolean matrices. Journal of the American Statistical Association, 117(540), 2014–2027. 10.1080/01621459.2021.1895177
  52. Sobel M. E. (1982). Asymptotic confidence intervals for indirect effects in structural equation models. Sociological Methodology, 13, 290–312. 10.2307/270723
  53. Sohn M. B., & Li H. (2019). Compositional mediation analysis for microbiome studies. The Annals of Applied Statistics, 13(1), 661–681. 10.1214/18-AOAS1210
  54. Tingley D., Yamamoto T., Hirose K., Keele L., & Imai K. (2014). Mediation: R package for causal mediation analysis. Journal of Statistical Software, 59(5), 1–38. 10.18637/jss.v059.i05
  55. Valeri L., & VanderWeele T. J. (2013). Mediation analysis allowing for exposure–mediator interactions and causal interpretation: Theoretical assumptions and implementation with SAS and SPSS macros. Psychological Methods, 18(2), 137–150. 10.1037/a0031034
  56. van der Vaart A. W. (2000). Asymptotic statistics. Cambridge Series in Statistical and Probabilistic Mathematics (Vol. 3). Cambridge University Press.
  57. VanderWeele T. J. (2011). Causal mediation analysis with survival data. Epidemiology, 22(4), 582–585. 10.1097/EDE.0b013e31821db37e
  58. VanderWeele T. J. (2015). Explanation in causal inference: Methods for mediation and interaction. Oxford University Press.
  59. VanderWeele T. J., & Vansteelandt S. (2009). Conceptual issues concerning mediation, interventions and composition. Statistics and its Interface, 2(4), 457–468. 10.4310/SII.2009.v2.n4.a7
  60. VanderWeele T. J., & Vansteelandt S. (2010). Odds ratios for mediation analysis for a dichotomous outcome. American Journal of Epidemiology, 172(12), 1339–1348. 10.1093/aje/kwq332
  61. VanderWeele T. J., & Vansteelandt S. (2014). Mediation analysis with multiple mediators. Epidemiologic Methods, 2(1), 95–115. 10.1515/em-2012-0010
  62. VanderWeele T. J., Vansteelandt S., & Robins J. M. (2014). Effect decomposition in the presence of an exposure-induced mediator-outcome confounder. Epidemiology, 25(2), 300–306. 10.1097/EDE.0000000000000034
  63. Van Garderen K., & Van Giersbergen N. (2022). ‘A nearly similar powerful test for mediation’, arXiv, arXiv:2012.11342, preprint.
  64. Wang H. J., McKeague I. W., & Qian M. (2018). Testing for marginal linear effects in quantile regression. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 80(2), 433–452. 10.1111/rssb.12258
  65. Zhao S. D., Cai T. T., & Li H. (2014). More powerful genetic association testing via a new statistical framework for integrative genomics. Biometrics, 70(4), 881–890. 10.1111/biom.v70.4
  66. Zhou R. R., Wang L., & Zhao S. D. (2020). Estimation and inference for the indirect effect in high-dimensional linear mediation models. Biometrika, 107(3), 573–589. 10.1093/biomet/asaa016



Articles from Journal of the Royal Statistical Society. Series B, Statistical Methodology are provided here courtesy of Oxford University Press
