Author manuscript; available in PMC: 2021 Aug 15.
Published in final edited form as: Stat Med. 2020 May 4;39(18):2423–2436. doi: 10.1002/sim.8546

Group Testing in Mediation Analysis

Andriy Derkach 1,*, Steven C Moore 2, Simina M Boca 3, Joshua N Sampson 1
PMCID: PMC8262108  NIHMSID: NIHMS1668675  PMID: 32363646

Abstract

We consider the scenario where there is an exposure, multiple biologically-defined sets of biomarkers, and an outcome. We propose a new two-step procedure that tests if any of the sets of biomarkers mediate the exposure/outcome relationship, while maintaining a prespecified Family-Wise Error Rate (FWER). The first step of the proposed procedure is a screening step that removes all groups that are unlikely to be strongly associated with both the exposure and the outcome. The second step adapts recent advances in post-selection inference to test if there are true mediators in each of the remaining, candidate sets. We use simulation to show that this simple two-step procedure has higher statistical power to detect true mediating sets when compared with existing procedures. We then use our two-step procedure to identify a set of Lysine-related metabolites that potentially mediate the known relationship between increased BMI and the increased risk of ER+ breast cancer in post-menopausal women.

Keywords: group testing, high dimensional mediation, pathway analysis

1. Introduction

Mediation analysis explores how an exposure (E) is associated with an outcome (Y) 1,2. Traditionally, mediation analysis assumes the exposure influences a single mediating variable (M) which, in turn, influences the outcome (Figure 1A), and then aims to decompose the total effect of the exposure into a direct and an indirect (i.e. via M) effect 2–4. Initial methods explored this decomposition using parametric models 5,6, while more modern methods use a counterfactual framework 2,7. Recently, epidemiological studies have considered high-dimensional biomarkers as potential mediators linking an exposure and disease 8–10. Here, we focus on the scenario where the biomarkers can be split into disjoint, biologically defined sets (i.e. groups) and the specific goal is to test whether any of these predefined sets potentially mediate the exposure/outcome relationship (Figure 1B). By pooling signals from multiple biomarkers within a set, these group level tests may increase the power to detect true mediators. Group level tests have already been demonstrated to increase the power 18–20 for testing associations when looking at groups of rare-variants in genes 12,13, groups of genetic variants in pathways 14,15, and groups of biomarkers with shared function or structure 16,17. We note, however, that we can only increase power when the mediating biomarkers belong to the same group, a condition that will strongly depend on the type and quality of set definitions.

Figure 1: Causal graphs of the mediation models. A) Traditional mediation analysis focuses on the exposure influencing individual biomarkers. B) Group-level mediation analysis tests whether any of the q predefined sets mediate the exposure/outcome relationship.

There is a growing body of literature exploring how high-dimensional biomarkers can be used in mediation analysis, specifically discussing how to identify sets of biomarkers that collectively mediate the exposure/outcome association 9,21,22, test if individual biomarkers mediate the association 8,21,23,24, and test if predefined groups of biomarkers mediate the association 10,25. Here, we add to the literature by proposing a new procedure for testing groups of biomarkers. Our motivating study 26 is an 843-individual case-control study of ER+ breast cancer where the goal is to identify if one of the 38 biologically defined sets of metabolites mediates the relationship between higher body mass index (BMI) and the increased risk of breast cancer.

The first step of our two-step (TS) procedure is a screening step that removes all sets unlikely to be strongly associated with both the exposure and the outcome. The second step of the procedure adapts a method for post-selection inference 27–29 to attach a corrected or conditional p-value to each biomarker in the remaining sets. We then claim a set to be a mediating set if one of the included biomarkers is a statistically significant mediator based on this conditional p-value. Specifically, we define our p-value for a set of biomarkers to be the minimum of these conditional p-values. This approach builds upon two recent advances in the genetics literature, improved group tests of rare variants 30,31 and post-selection inference to identify specific associated variants within the group 27–29. This approach, screening by group and testing individual biomarkers, has higher power to detect mediating sets than the standard methods used to test groups. Moreover, our approach has the additional benefit of identifying the specific biomarkers in a set that are the actual mediators. We note that we defined our set-level p-value to be the minimum p-value, as opposed to using Fisher’s method, because the statistical distribution for the sum of the logged conditional p-values could not be easily described.

The remainder of the paper is organized as follows. In the Materials and Methods, we describe the proposed TS procedure, the simulations used to evaluate the procedure, and the motivating study of breast cancer. In the Results, we compare the performance of the TS procedure with comparators in the simulations and report our findings from the breast cancer study. Finally, in the Discussion, we offer insights about the differences between the TS and other procedures.

2. Materials and methods

2.1. Notation

Let us consider $n$ individuals. For individual $i$, let $E_i$ be the exposure, $Y_i$ be the outcome, and $M_i = (M_{i1}, \ldots, M_{im})'$ be a vector of $m$ biomarkers or potential mediators. For a given biomarker $j$, we denote the set of $m-1$ biomarkers without $j$ by $M_{i\backslash j} = (M_{i1}, \ldots, M_{i,j-1}, M_{i,j+1}, \ldots, M_{im})'$. For this paper, our potential mediators will always be biomarkers and we will use the terms interchangeably. We classify all $m$ biomarkers into $q$ predefined disjoint sets, where the $m_s$ biomarkers of set $s \in \{1, \ldots, q\}$ are indexed by $G_s = \{s_1, \ldots, s_{m_s}\} \subset \{1, \ldots, m\}$, and we then define $M_{is} = \{M_{ij'}: j' \in G_s\}$ and $M_{i\backslash s} = \{M_{ij'}: j' \notin G_s\}$. Finally, we let $s(j)$ be the set containing biomarker $j$, $G_{s(j)\backslash j}$ be the indices for all biomarkers, other than $j$, in the set $s(j)$, and $M_{is(j)\backslash j} = \{M_{ij'}: j' \in G_{s(j)\backslash j}\}$.

2.2. Causal Inference

We introduce counterfactual notation. We define $M_i(e) = (M_{i1}(e), \ldots, M_{im}(e))'$ to be the value of the biomarkers in subject $i$ if $E_i$ is set to $e$ and we define $Y_i(e, M_i(e'))$ to be the value of the outcome if $E_i$ is set to $e$ and $M_i$ is set to $M_i(e')$. Given the number of biomarkers and their unknown and potentially bidirectional relationships, we cannot allow the biomarkers to be functions of each other (i.e. $M_{ij}(\cdot)$ is only a function of $e$) when using counterfactual notation.

We can then define the total effect from changing $e$ to $e'$ to be $TE = E[Y_i(e', M_i(e')) - Y_i(e, M_i(e))]$, the Natural Indirect Effect ($NIE(s)$) through a given set $s$ to be $NIE(s) = E[Y_i(e, M_{is}(e'), M_{i\backslash s}(e)) - Y_i(e, M_i(e))]$, and the Natural Indirect Effect ($NIE(j)$) through a given biomarker $j$ to be $NIE(j) = E[Y_i(e, M_{ij}(e'), M_{i\backslash j}(e)) - Y_i(e, M_i(e))]$.

We would like to claim $s$ to be a mediating set if $NIE(j) \ne 0$ for at least one $j \in G_s$. We emphasize that, as shown below, this statement would differ from claiming that $s$ is a mediating set if $NIE(s) \ne 0$. Our formal definition of mediating set is offered in Section 2.3.

2.3. Continuous Outcome

We will first assume that the biomarkers and outcome are continuous random variables defined by

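$$M_{ij} = \beta_{0j} + \beta_j E_i + \epsilon_{Mji} \quad \text{for } j = 1, \ldots, m, \tag{3}$$
$$Y_i = \gamma_0^* + \gamma_E^* E_i + \sum_{j=1}^{m} \gamma_j^* M_{ij} + \epsilon_{Yi}^* \tag{4}$$

where $\epsilon_{Mi} = (\epsilon_{M1i}, \ldots, \epsilon_{Mmi})' \sim N(0, \Sigma_M)$, $\epsilon_{Yi}^* \sim N(0, \sigma_Y^{*2})$, and $\epsilon_{Mi} \perp\!\!\!\perp \epsilon_{Yi}^*$. Equation (4) further implies that for any set $s$

$$Y_i = \gamma_0^s + \gamma_E^s E_i + \sum_{j \in G_s} \gamma_j M_{ij} + \epsilon_{Yi}^s \tag{5}$$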

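We can define the causal effects from Section 2.2 in terms of the parameters from equations (3)–(5): $TE = (e'-e)(\gamma_E^* + \sum_{j=1}^{m} \gamma_j^* \beta_j)$, $NIE(s) = (e'-e)\sum_{j \in G_s} \gamma_j^* \beta_j$, and $NIE(j) = (e'-e)\gamma_j^* \beta_j$. The expression for $TE$ follows from substituting equation (3) into equation (4), which leaves a direct contribution $\gamma_E^*$ plus one indirect contribution $\gamma_j^* \beta_j$ per biomarker. When equations (3)–(5) hold, we also note that the following assumptions, provided by Imai et al 32, will also hold and allow all causal effects to be estimable.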
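As an illustration of these parametric expressions, the following minimal sketch computes the total effect and the set-level and biomarker-level indirect effects from given coefficients. It takes the direct contribution to be $(e'-e)\gamma_E^*$, which is what substituting equation (3) into equation (4) gives. The function name `mediation_effects`, the toy coefficients, and the set labels are hypothetical and chosen only to show that a set can have $NIE(s) = 0$ while containing biomarkers with $NIE(j) \ne 0$.

```python
def mediation_effects(beta, gamma_star, gamma_e_star, sets, e=0.0, e_prime=1.0):
    """Return (TE, NIE per set, NIE per biomarker) under the linear models (3)-(4).

    beta[j]        : effect of the exposure on biomarker j
    gamma_star[j]  : effect of biomarker j on the outcome in the full model (4)
    gamma_e_star   : direct effect of the exposure on the outcome
    sets           : dict mapping a set label to the list of biomarker indices G_s
    """
    delta = e_prime - e
    nie_j = [delta * g * b for g, b in zip(gamma_star, beta)]
    nie_s = {s: sum(nie_j[j] for j in idx) for s, idx in sets.items()}
    te = delta * gamma_e_star + sum(nie_j)
    return te, nie_s, nie_j


# Toy example (hypothetical values): set "A" holds one mediator, while set "B"
# holds two mediators whose indirect effects have opposite signs and cancel.
beta = [0.065, 0.0, 0.065, 0.065]
gamma_star = [0.065, 0.065, 0.065, -0.065]
sets = {"A": [0, 1], "B": [2, 3]}

te, nie_s, nie_j = mediation_effects(beta, gamma_star, gamma_e_star=0.1, sets=sets)
print("TE:", te)
print("NIE(s):", nie_s)
print("NIE(j):", nie_j)
```

In this toy example, set "B" contains two mediating biomarkers even though its set-level indirect effect is zero, which is why the definition of a mediating set is stated in terms of $NIE(j)$ rather than $NIE(s)$.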

Assumption 1 (Sequential ignorability)

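$$\{Y_i(e', m), M_i(e)\} \perp\!\!\!\perp E_i \mid X_i = x,$$
$$Y_i(e', m) \perp\!\!\!\perp M_i(e) \mid E_i = e, X_i = x,$$
$$M_{ij}(e') \perp\!\!\!\perp M_{i\backslash j}(e) \mid E_i = e, X_i = x \quad \text{for any } j = 1, \ldots, m,$$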

where X denotes the vector of observed pre-treatment covariates.

However, when $m > n$, we may not be able to estimate the parameters in equation (4). Therefore, we may not be able to estimate $NIE(j)$ and test for its presence. Instead, as a pragmatic compromise, we will use the parametric models from equations (3) and (5), and formally define $s$ to be a mediating-set if $\gamma_j \beta_j \ne 0$ for at least one $j \in G_s$, or equivalently, if $\sum_{j \in G_s} (\beta_j \gamma_j)^2 \ne 0$. We offer three comments regarding this definition. First, although not easily stated using the language of causal inference, our definition for a mediating-set is still well defined. Second, when the biomarkers from different sets are independent given $E_i$, $NIE(j) = (e'-e)\gamma_j^* \beta_j = (e'-e)\gamma_j \beta_j$ and our definition has the desired meaning. Third, we note that there are other powerful methods for testing whether $NIE(s) = 0$. One approach 10 would be transforming the biomarkers using spectral decomposition and then individually testing each of the resulting linear combinations.

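We will fit models (3) and (5) using linear regressions to obtain the Maximum Likelihood Estimates (MLE) for set $s$. We denote the MLE by $\hat\beta_s = (\hat\beta_{s_1}, \ldots, \hat\beta_{s_{m_s}})'$ and $\hat\gamma_s = (\hat\gamma_{s_1}, \ldots, \hat\gamma_{s_{m_s}})'$; we denote the combined vector by $\hat\theta_s = (\hat\beta_{s_1}, \ldots, \hat\beta_{s_{m_s}}, \hat\gamma_{s_1}, \ldots, \hat\gamma_{s_{m_s}})'$. Furthermore, we denote the estimates of the covariances for $\hat\beta_s$, $\hat\gamma_s$, and $\hat\theta_s$ by $\hat\Sigma_{\beta s}$, $\hat\Sigma_{\gamma s}$, and $\hat\Sigma_{\Theta s}$, the standard error of the $j$th element of $\hat\beta_s$ by $\hat\sigma_{\beta s_j}$, and the standard error of the $j$th element of $\hat\gamma_s$ by $\hat\sigma_{\gamma s_j}$. Note that $\mathrm{cor}(\hat\beta_j, \hat\gamma_{j'}) = 0$ for all $j, j' \in G_s$. This holds because the likelihood, $f(Y, M \mid E; \beta, \gamma)$, can be factored into two components, $f(Y, M \mid E; \beta, \gamma) = f(Y \mid M, E; \gamma) f(M \mid E; \beta)$, each containing only one set of parameters, as described in previous literature 8,10,24.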

We can define biomarker-level p-values to test the null hypotheses $H_{0Ej}: \beta_j = 0$ and $H_{0Yj}: \gamma_j = 0$ by $p_{E,j} = F_Z(-|Z_{E,j}|)$ and $p_{Y,j} = F_Z(-|Z_{Y,j}|)$, where $Z_{E,j} = \hat\beta_j / \hat\sigma_{\beta j}$, $Z_{Y,j} = \hat\gamma_j / \hat\sigma_{\gamma j}$, and $F_Z$ is the Cumulative Distribution Function (CDF) for the standard normal distribution. We choose the normal distribution as opposed to the t-distribution because relevant studies of high dimensional biomarkers typically have significantly more subjects than markers per set, $n \gg m_s$.
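A minimal sketch of the biomarker-level p-values follows. It assumes the coefficient estimates and their standard errors have already been obtained from the regressions, and it assumes the Z-statistic enters through its absolute value so that effects in either direction count as evidence against the null. The helper names `normal_cdf` and `biomarker_p_values` and the numeric inputs are hypothetical.

```python
import math


def normal_cdf(z):
    """Standard normal CDF, F_Z(z), written with the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def biomarker_p_values(estimates, std_errors):
    """Return F_Z(-|Z_j|) for each biomarker, where Z_j = estimate_j / std_error_j.

    Each value lies in (0, 0.5]; doubling it gives the usual two-sided p-value.
    """
    return [normal_cdf(-abs(est / se)) for est, se in zip(estimates, std_errors)]


# Hypothetical estimates for three biomarkers in one set.
beta_hat = [0.070, -0.050, 0.004]
se_beta = [0.020, 0.020, 0.020]
print(biomarker_p_values(beta_hat, se_beta))
```

Because each value is at most 0.5, a cut-off of 0.025 on this scale corresponds to a two-sided test at the 0.05 level.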

We can further define a weighted set-level p-value to test the set’s association with the exposure and outcome using one of two variance component tests 30,31. For the first method, we test for an association between the group of biomarkers and E or Y using the following pooled test statistic

$$T_{Y,\beta}^s = T_{E,\gamma}^s = \sum_{j \in G_s} (\hat\beta_j \hat\gamma_j)^2, \tag{10}$$

where the complementary effect estimates are used as weights. Thus, when testing for the association between the set of biomarkers and the exposure, we treat $\hat\gamma_s$ as fixed weights and, similarly, when testing for the association with the outcome we treat $\hat\beta_s$ as fixed weights. Both test statistics, $T_{Y,\beta}^s$ and $T_{E,\gamma}^s$, explicitly upweight biomarkers that have large effects with exposure or outcome. The corresponding p-values for the set $s$, $p_{E,\gamma}^s = 1 - F_{\chi^2_{E,s}}(T_{E,\gamma}^s)$ and $p_{Y,\beta}^s = 1 - F_{\chi^2_{Y,s}}(T_{Y,\beta}^s)$, are calculated from two functions, $F_{\chi^2_{E,s}}$ and $F_{\chi^2_{Y,s}}$, that are the CDFs for linear combinations of $\chi^2$ distributions with weights determined by $\hat\gamma_s$ and $\hat\beta_s$. For the second method, we test for an association between the group of biomarkers and the exposure or outcome without any weighting, using the statistics

$$T_{E,1}^s = \sum_{j \in G_s} \hat\beta_j^2 \tag{11}$$
$$T_{Y,1}^s = \sum_{j \in G_s} \hat\gamma_j^2. \tag{12}$$

The corresponding p-values for the set $s$, $p_{E,1}^s = 1 - F_{\chi^2_{E1,s}}(T_{E,1}^s)$ and $p_{Y,1}^s = 1 - F_{\chi^2_{Y1,s}}(T_{Y,1}^s)$, are now calculated from two functions, $F_{\chi^2_{E1,s}}$ and $F_{\chi^2_{Y1,s}}$, that are the CDFs for a linear combination of $\chi^2$ distributions with weights set to 1. Note that sets of biomarkers only associated with the outcome or exposure will have relatively small values of $T_{Y,\beta}^s$ and $T_{E,\gamma}^s$, compared to $T_{Y,1}^s$ and $T_{E,1}^s$, making the former potentially more powerful at detecting true signals. However, we incorporate the unweighted tests $T_{Y,1}^s$ and $T_{E,1}^s$ in parts of the proposed methodology because of their statistical independence from each other.
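The set-level tests for the exposure can be sketched as below. This is a simplified illustration, not the exact computation used in the paper: it assumes the estimates within a set are independent (a diagonal covariance matrix), in which case the null distribution of each statistic is a linear combination of independent $\chi^2_1$ variables with known coefficients, and it approximates the tail probability by Monte Carlo rather than by an exact method for mixtures of $\chi^2$ distributions. The function names and the numeric inputs are hypothetical.

```python
import random


def exposure_set_statistics(beta_hat, gamma_hat, se_beta):
    """Weighted (eq. 10) and unweighted (eq. 11) statistics for the exposure side.

    Assuming independent beta_hat_j ~ N(0, se_j^2) under the null, the weighted
    statistic is distributed as sum_j (gamma_hat_j * se_j)^2 * chi2_1 and the
    unweighted statistic as sum_j se_j^2 * chi2_1.
    """
    t_weighted = sum((b * g) ** 2 for b, g in zip(beta_hat, gamma_hat))
    t_unweighted = sum(b ** 2 for b in beta_hat)
    coef_weighted = [(g * s) ** 2 for g, s in zip(gamma_hat, se_beta)]
    coef_unweighted = [s ** 2 for s in se_beta]
    return (t_weighted, coef_weighted), (t_unweighted, coef_unweighted)


def chisq_mixture_pvalue(t_obs, coefs, n_draws=20000, seed=0):
    """Monte Carlo estimate of P(sum_j coefs[j] * chi2_1 >= t_obs)."""
    rng = random.Random(seed)
    exceed = 0
    for _ in range(n_draws):
        draw = sum(c * rng.gauss(0.0, 1.0) ** 2 for c in coefs)
        if draw >= t_obs:
            exceed += 1
    return exceed / n_draws


# Hypothetical set of four biomarkers; only the first is linked to both E and Y.
beta_hat = [0.08, 0.01, -0.02, 0.00]
gamma_hat = [0.09, 0.00, 0.01, -0.02]
se_beta = [0.02, 0.02, 0.02, 0.02]

(t_w, c_w), (t_1, c_1) = exposure_set_statistics(beta_hat, gamma_hat, se_beta)
print("weighted p-value:  ", chisq_mixture_pvalue(t_w, c_w))
print("unweighted p-value:", chisq_mixture_pvalue(t_1, c_1))
```

The outcome-side statistics follow by swapping the roles of the two sets of estimates. With correlated estimates, the coefficients of the mixture would instead come from the eigenvalues of the weighted covariance matrix.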

2.4. Binary Outcome

Although we have thus far considered continuous outcomes, we will also consider the scenario where the outcome is a binary random variable. We define the binary outcome, $Y_i^*$, by the probit model $Y_i^* = 1(Y_i > 0)$ with $Y_i$ following equations (4) and (5). Here, our definitions of the null hypotheses (i.e. equations 7–9), test statistics, and p-values remain essentially the same. The exception is that we estimate $\hat\gamma$ by probit regression and, for retrospective sampling (i.e. case/control models), we estimate $\hat\beta$ by weighted linear regression, where the weights are proportional to the probability of being sampled. Note, the choice of the probit model ensures both that equations (4) and (5) are consistent and that the hypotheses stated in equations (2) and (7) are identical, although the only true requirement is for $E[\hat\gamma_j] = 0$ for biomarkers satisfying $H_{0Yj}$. We note that, in practice, the procedure performs equally satisfactorily when the coefficients and p-values for the outcome associations are calculated using logistic regression, a model more familiar in epidemiology 24.

2.5. Testing procedures for groups of biomarkers

We first describe existing procedures and then introduce our new, more powerful, two-step procedures for testing sets of biomarkers.

2.5.1. Minimum-P Procedure (MIN)

This procedure is a direct modification of the approach introduced by Sampson and others 24. We start by calculating the FWER-corrected p-value for each biomarker using the MCPS approach. Briefly, define $\omega_E = \{j: p_{E,j} \le 0.025\}$ and $\omega_Y = \{j: p_{Y,j} \le 0.025\}$. Let $|\omega_E|$ and $|\omega_Y|$ be the cardinality of each set (i.e. the number of elements in that set). We define a FWER-corrected p-value for each biomarker by $p_j^{FWER} = 2\max(|\omega_E| p_{E,j}, |\omega_Y| p_{Y,j})$ if $j \in \omega_E \cap \omega_Y$ and 1 otherwise. The MIN procedure then claims a set $s$ to be a mediating set if $\min_{j \in G_s}(p_j^{FWER}) < \alpha$. In other words, the MIN procedure claims a set to be a mediating set if one of the biomarkers included in that set qualifies as a mediator after adjusting for multiple testing.
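The MIN procedure can be sketched in a few lines once the biomarker-level p-values are available. The function name `min_procedure`, the set labels, and the toy p-values are hypothetical; capping the corrected p-value at 1 is an added convenience that is not part of the definition above.

```python
def min_procedure(p_e, p_y, sets, alpha=0.05, screen=0.025):
    """Return per-biomarker FWER-corrected p-values, per-set minima, and mediating sets.

    p_e[j], p_y[j] : biomarker-level p-values for the exposure and the outcome
    sets           : dict mapping a set label to the list of biomarker indices G_s
    """
    omega_e = {j for j, p in enumerate(p_e) if p <= screen}
    omega_y = {j for j, p in enumerate(p_y) if p <= screen}
    both = omega_e & omega_y

    p_fwer = []
    for j in range(len(p_e)):
        if j in both:
            corrected = 2 * max(len(omega_e) * p_e[j], len(omega_y) * p_y[j])
            p_fwer.append(min(1.0, corrected))  # cap at 1 (added convenience)
        else:
            p_fwer.append(1.0)

    set_p = {s: min(p_fwer[j] for j in idx) for s, idx in sets.items()}
    mediating = [s for s, p in set_p.items() if p < alpha]
    return p_fwer, set_p, mediating


# Toy example: biomarker 0 is associated with both the exposure and the outcome.
p_e = [0.0001, 0.02, 0.5, 0.001]
p_y = [0.0002, 0.4, 0.01, 0.03]
sets = {"A": [0, 1], "B": [2, 3]}
print(min_procedure(p_e, p_y, sets))
```

Here three biomarkers pass the exposure screen and two pass the outcome screen, but only biomarker 0 passes both, so only set "A" can be claimed as a mediating set.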

2.5.2. Linear Procedures (LIN)

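These procedures were introduced by Huang 25. They suggest two test statistics, each with a normal distribution under the null hypotheses and some additional assumptions. For set $s$, the two statistics are $Z_{L1}^s = \hat\beta_s'\hat\gamma_s / (\hat\beta_s'\hat\Sigma_{\beta s}^{-1}\hat\beta_s + \hat\gamma_s'\hat\Sigma_{\gamma s}^{-1}\hat\gamma_s)^{1/2}$ and $Z_{L2}^s = \hat\beta_s'\hat\gamma_s / V_{max}^{1/2}$, where $V_{max} = \max_{w \in \Omega_w} \hat\nu_s'\hat\Sigma_{\Theta s}\hat\nu_s$, $\hat\nu_s = (w_{Y1}\hat\gamma_{s_1}, \ldots, w_{Ym_s}\hat\gamma_{s_{m_s}}, w_{E1}\hat\beta_{s_1}, \ldots, w_{Em_s}\hat\beta_{s_{m_s}})'$, and the max is over all binary $2m_s$-length vectors $\Omega_w = \{(w_{Y1}, \ldots, w_{Ym_s}, w_{E1}, \ldots, w_{Em_s})': w_{Ej} \in \{0,1\}, w_{Yj} \in \{0,1\}, w_{Ej} + w_{Yj} = 1\}$. The LIN-1 and LIN-2 procedures will, respectively, claim a set $s$ to be a mediating set if $p_{L1}^s = F_Z(-|Z_{L1}^s|) < \alpha/q$ and $p_{L2}^s = F_Z(-|Z_{L2}^s|) < \alpha/q$. For the comparison below, we consider only the more powerful, LIN-2, which we abbreviate by LIN. We note that the LIN procedure is designed to answer a slightly different problem and test the null hypothesis that $NIE(s) = 0$.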

2.5.3. Quadratic Procedure (QUAD)

This procedure was introduced by Huang and Pan 10. To account for the possibility of effects in different directions, they suggest the statistic $T_Q^s = \sum_{j \in G_s} (\hat\beta_j \hat\gamma_j)^2$ and a novel parametric bootstrap to obtain the corresponding p-value. Specifically, they randomly generate $B$ bootstrap replicates of $\hat\theta^b = \{\hat\beta^b, \hat\gamma^b\}$ from a normal distribution with mean $\hat\theta_s$ and variance $\hat\Sigma_{\Theta s}$. They then calculate $T_{Q,b}^s = \sum_{j \in G_s} (\hat\beta_j^b \hat\gamma_j^b - \frac{1}{B}\sum_{b} \hat\beta_j^b \hat\gamma_j^b)^2$ for each bootstrapped set and define the p-value to be the proportion of centered bootstrap statistics that exceed the observed statistic, $p_Q^s = \frac{1}{B}\sum_{b} 1(T_{Q,b}^s > T_Q^s)$. The QUAD procedure will claim a set $s$ to be a mediating set if $p_Q^s < \alpha/q$. We note that the original QUAD procedure offered modified methods that could handle large sets, with $m_s > n$, a scenario not considered here.

2.5.4. Marginal Procedure (MARG)

This procedure, the first to be introduced here, is a set-level modification of the MCPs statistic 24. Importantly, for reasons discussed at the end of this section, we do not suggest that this overly-simplified approach controls the FWER and include it because it provides a reference for the maximum possible power that can be reasonably expected from a test. MARG is based on p-values $p_{E,1}^s$ and $p_{Y,1}^s$ calculated from the unweighted pooled association test statistics $T_{Y,1}^s$ and $T_{E,1}^s$. Define $\omega_E^G = \{s: p_{E,1}^s \le 0.025\}$ and $\omega_Y^G = \{s: p_{Y,1}^s \le 0.025\}$ so that they are the sets potentially associated with the exposure and outcome. Let $|\omega_E^G|$ and $|\omega_Y^G|$ be the cardinality of each set (i.e. the number of elements in that set) and the marginal p-value be $p_M^s = 2\max(|\omega_E^G| p_{E,1}^s, |\omega_Y^G| p_{Y,1}^s)$ if $s \in \omega_E^G \cap \omega_Y^G$ and 1 otherwise. The MARG procedure will claim a set $s$ to be a mediating set if $p_M^s < \alpha$. However, the problem is that this procedure only marginally tests if the set is associated with both the exposure and the outcome; the procedure does not ensure that there is a common set of mediating biomarkers associated with both the exposure and the outcome (i.e. that $s$ is a true mediating set). We do note that MARG uses p-values from group tests without weights (i.e. $p_{E,1}^s$ and $p_{Y,1}^s$) because the p-values $p_{E,\gamma}^s$ and $p_{Y,\beta}^s$ are not independent under the null hypothesis (see Proposition 1 in Supplemental Material) and therefore can occasionally have lower power than the TS method proposed below. We note that Huang proposed another marginal procedure, JTV-comp 33, which again tests for sets associated with both the exposure and the outcome without ensuring that there are true mediators in that set. We offer comparisons with JTV-comp in Supplementary Material Section 4.6 and note that the method did tend to have higher statistical power, albeit with a slightly inflated type-I error rate.

2.5.5. Two-Step Procedure (TS)

This novel procedure is described in Figure 2. In parallel analyses, we identify biomarkers associated with the exposure and we identify biomarkers conditionally associated with the outcome. Note, in each of these analyses, we perform two steps: (i) a screening step to remove sets of biomarkers that are unlikely to be strongly associated with both the exposure and outcome, and (ii) a testing step that assigns individual p-values to each biomarker in the remaining sets. After these parallel analyses, we define the mediating sets to be those sets with biomarkers associated with both the exposure and outcome. Below we describe the links in Figure 2 for identifying the biomarkers associated with the exposure, while omitting the near identical descriptions for identifying biomarkers associated with the outcome.

Figure 2: Diagram of Two-Step procedure (TS). In the first step, we select sets of biomarkers that are potentially associated with both the exposure and outcome. In the second step, we test individual biomarkers to identify mediators.

Step 1: Define $S_E = \{s: p_{E,\gamma}^s \le 0.025, p_{Y,1}^s \le 0.1\}$ based on p-values from weighted and unweighted group test statistics. Here, we screen out those sets that are unlikely to be strongly associated with both the exposure and outcome. Note, this set differs from $S_Y$. Further note that the choices of 0.025 and 0.1 can be modified, but we have found these thresholds to work well in practice for the FWER-corrected p-value of 0.05.

Step 2a: For each biomarker in one of the remaining sets, $G_{E1} \equiv \bigcup_{s \in S_E} G_s$, we calculate a conditional p-value for association (i.e. conditioned on its set passing the screening step). Here, we define $p_{E,j}^C = 1 - F_{Z_{E,j}}^T(\hat\beta_j)$ for $j \in G_{E1}$, where $F_{Z_{E,j}}^T$ is the truncated normal distribution described in the Supplementary Material Section 1.3, and, for completeness, define $p_{E,j}^C = 1$ for $j \notin G_{E1}$.

Step 2b: We then divide $G_{E1}$ into two complementary sets of biomarkers, $G_{E1} = G_{E2} \cup G_{E2}^C$, where $G_{E2} = \{j: p_{Y,j} < 0.025\}$ is the set of candidate biomarkers. Let $m_E$ be the number of biomarkers in $G_{E2}$.

Step 2c: For each biomarker in $G_{E2}$, we now calculate an adjusted conditional p-value, where the adjustment is needed to account for multiple testing. In its simplest form, the adjusted p-value would be $p_{E,j}^A = \min(m_E p_{E,j}^C, 1)$. However, we find it beneficial to decrease the multiple-testing penalty for those biomarkers strongly associated with the outcome (i.e. potentially true mediating biomarkers). Therefore, we define the adjusted p-value to be $p_{E,j}^A = \min(m_{E,j} p_{E,j}^C, 1)$, with $m_{E,j} = (\sum_{j' \in G_{E2}} \hat\gamma_{j'}^2)/\hat\gamma_j^2$, if $j \in G_{E2}$ and, for completeness, define $p_{E,j}^A = 1$ for $j \notin G_{E2}$.

Step 2d: After completing steps 2a-2c for both the exposure and the outcome, we can now define an adjusted p-value for mediation by $p_j^M = 2\max(p_{E,j}^A, p_{Y,j}^A)$ for markers $j \in G_{E2} \cap G_{Y2}$ and, for completeness, define $p_j^M = 1$ for $j \notin G_{E2} \cap G_{Y2}$. We then say a set, $s$, is a mediating set if $P_{TS}^s \equiv \min_{j \in G_s} p_j^M < \alpha$. Let us formally define the FWER for the TS procedure by $FWER_{TS} \equiv P(\min_s P_{TS}^s < \alpha)$. Then, in Supplementary Material Section 1.5, we prove the following theorem.

Theorem 1.

If $M_{is} \perp\!\!\!\perp M_{is'}$ given $E_i$ for all $s \ne s'$ in $\{1, \ldots, q\}$, then $\lim_{n \to \infty} FWER_{TS} \le \alpha$.

We note that the assumption, $M_{is} \perp\!\!\!\perp M_{is'} \mid E_i$, is unlikely to hold in practice and violations of this assumption can lead to the TS procedure having an inflated type I error. Consider the example where $E$ affects $M_j$ ($j \in G_s$) and a second biomarker $M_{j'}$ ($j' \in G_{s'}$) also affects $M_j$. Furthermore, of the two biomarkers, only $M_{j'}$ affects the outcome. Then, $s$ may be mistakenly classified as a mediating set. In Supplementary Material Section 3.1, we offer simulations to show that the effect of $M_{j'}$ on $M_j$ must be large for there to be an inflated type I error rate. We note that when $n \gg m$, we can modify our approach so that its performance does not require this assumption (Supplementary Material Section 2).

We offer a couple of remarks. First, the initial screening uses a quadratic (i.e. $T_{E,\gamma}^s = \sum_{j \in G_s} (\hat\beta_j \hat\gamma_j)^2$), as opposed to a linear (i.e. $T_L^s = \hat\beta_s'\hat\gamma_s$), test statistic. This choice offers increased power when the proportion of associated biomarkers is low or biomarkers within a set have opposing effects. Second, the initial screening steps select sets using one weighted and one unweighted test statistic. Ideally, we would have used two weighted statistics but that would greatly complicate post-selection inference because of the dependence between $T_{Y,\beta}^s$ and $T_{E,\gamma}^s$ (see Supplemental Material Section 1.2). Post-selection inference in the second step of TS allows us to quantify mediating effects (i.e. $NIE(j)$) for detected potential mediators (see Supplemental Material Section 1.4).
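Steps 2b-2d of the TS procedure can be sketched as follows, taking the conditional p-values from Step 2a as given (the truncated normal calculation is described only in the Supplementary Material and is not reproduced here). The sketch assumes the outcome-side adjustment mirrors the exposure-side one with the roles of the estimates swapped, as the text indicates the descriptions are near identical. The function names and toy inputs are hypothetical, and capping the mediation p-value at 1 is an added convenience.

```python
def adjusted_p_values(p_cond, screened, p_other, est_other, cut=0.025):
    """Steps 2b-2c for one side (shown here in exposure-side terms).

    p_cond[j]    : conditional p-value from Step 2a (e.g. p^C_{E,j})
    screened     : indices in the sets that passed Step 1 (e.g. G_E1)
    p_other[j]   : marginal p-value for the complementary association (e.g. p_{Y,j})
    est_other[j] : complementary effect estimate (e.g. gamma_hat_j)
    Returns a dict {j: adjusted p-value} for the candidate biomarkers (e.g. G_E2).
    """
    candidates = [j for j in screened if p_other[j] < cut]
    total = sum(est_other[j] ** 2 for j in candidates)
    adjusted = {}
    for j in candidates:
        penalty = total / est_other[j] ** 2  # m_{E,j}: smaller for strong complementary effects
        adjusted[j] = min(penalty * p_cond[j], 1.0)
    return adjusted


def ts_set_p_values(adj_e, adj_y, sets):
    """Step 2d: minimum mediation p-value per set; 1 if a set has no common candidate."""
    set_p = {}
    for s, idx in sets.items():
        vals = [min(1.0, 2 * max(adj_e[j], adj_y[j]))
                for j in idx if j in adj_e and j in adj_y]
        set_p[s] = min(vals) if vals else 1.0
    return set_p


# Toy example with four biomarkers in two sets; set "A" passed the exposure-side
# screen and both sets passed the outcome-side screen (hypothetical values).
sets = {"A": [0, 1], "B": [2, 3]}
p_e, p_y = [0.002, 0.3, 0.5, 0.6], [0.001, 0.02, 0.2, 0.01]
beta_hat, gamma_hat = [0.2, 0.03, 0.01, 0.01], [0.3, 0.1, 0.02, 0.15]
p_e_cond, p_y_cond = [0.004, 0.5, 1.0, 1.0], [0.003, 0.05, 0.4, 0.02]

adj_e = adjusted_p_values(p_e_cond, [0, 1], p_y, gamma_hat)
adj_y = adjusted_p_values(p_y_cond, [0, 1, 2, 3], p_e, beta_hat)
print(ts_set_p_values(adj_e, adj_y, sets))
```

The reciprocals of the penalties sum to one over the candidate biomarkers, so the adjustment acts as a weighted Bonferroni correction that spends more of the error budget on biomarkers with large complementary effects.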

2.6. Simulations

We compared the performance of the five previously defined procedures (MIN, LIN, QUAD, MARG, TS) for testing sets of biomarkers. Specifically, we used simulations to estimate the power to detect a mediating set of biomarkers in various scenarios defined by equations (3)–(5) and the parameters in Table 1.

Table 1: Simulation Parameters

| Parameter | Interpretation | Possible Values |
|---|---|---|
| $q$ | Number of sets of biomarkers | 15, **20**, 50 |
| $q_{OD}$, $q_{TD}$ | Number of one- or two-dimensional sets | **0**, 4, 6 |
| $m_s$ | Number of biomarkers per set | 15, **20**, 50 |
| $m_M$ | Number of mediating biomarkers in the mediating set | 1, **3**, 5 |
| $m_E$ | Number of noise biomarkers in the mediating set associated only with exposure | **0**, 2, 4 |
| $m_Y$ | Number of noise biomarkers in the mediating set associated only with outcome | **0**, 2, 4 |
| $m_{EY}$ | Number of noise biomarkers in the mediating set with half associated only with the exposure and half associated only with the outcome | **0**, 2, 4 |
| $m_D$ | Number of associated biomarkers in one- or two-dimensional sets | 6 |
| $\rho_M$ | Correlation between biomarkers within a set | **0**, 0.25, 0.5 |
| $\beta_j$ | Non-null effect of exposure on metabolite | 0.065 (A), 0.045 (B) |
| $\gamma_j^*$ | Non-null effect of metabolite on outcome | 0.065 (A), 0.085 (B) |

A: Models with a continuous outcome

B: Models with a binary outcome

Bold values indicate default settings.

We assumed there were a total of $q \in \{15, 20, 50\}$ sets of biomarkers, each containing $m_s \in \{15, 20, 50\}$ biomarkers.

We assumed that there was $q_m = 1$ mediating set, $q_{OD} \in \{0, 4, 6\}$ one-dimensional sets, and $q_{TD} \in \{0, 4, 6\}$ two-dimensional sets, where we say a set is one-dimensional if it contains $m_D = 6$ biomarkers associated with only the exposure or only the outcome and two-dimensional if it contains $m_D/2 = 3$ biomarkers associated with only the exposure and $m_D/2 = 3$ biomarkers associated with only the outcome. For the one mediating set, we assumed there were $m_M \in \{1, 3, 5\}$ true mediators, $m_E \in \{0, 2, 4\}$ “noise” biomarkers associated with only the exposure, $m_Y \in \{0, 2, 4\}$ “noise” biomarkers associated with only the outcome, and $m_{EY} = m_E + m_Y$. In Supplemental Figure S1, we provide additional details on the simulation model.

Unless otherwise designated, we used the default parameters shown in bold in Table 1 for the simulations.

In all scenarios, the exposure followed a normal distribution defined by $E_i \sim N(0,1)$ and the $m$ biomarkers followed a multivariate normal distribution defined by equation (3). The correlation (i.e. off-diagonal element in $\Sigma_M$) for metabolites within a set was chosen to have AR(1) structure (i.e. $\rho_{ij} = \rho_M^{|i-j|}$) with $\rho_M \in \{0, 0.25, 0.5\}$. For continuous outcomes, the sample contained a total of $n = 2500$ individuals and the outcome followed the normal distribution defined by equation (4). For non-null associations, we let the magnitudes of all effects be the same with $\beta_j = \gamma_j^* = 0.065$. For a binary outcome, the sample contained $n = 2500$ cases and $n = 2500$ controls retrospectively sampled from a large cohort with outcomes generated from a logistic model $\mathrm{logit}(P(Y_i \mid E_i, M_i)) = \gamma_0^* + \gamma_E^* E_i + \sum_{j=1}^{m} \gamma_j^* M_{ij}$ and the incidence defined by $\gamma_0^* = \log(0.01/0.99) = -4.6$, the non-null exposure effects defined by $\beta_j = 0.045$ and the non-null outcome effects defined by $\gamma_j^* = 0.085$. The choice of effect sizes ensured similar power when testing continuous and binary outcomes. We generated 1000 simulations per scenario to estimate the power of the five methods at a FWER $= \alpha = 0.05$.
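The data-generating model for a continuous outcome can be sketched as below for the baseline scenario. Several details are assumptions made only for this illustration, since the text does not state them: error variances are set to 1, intercepts to 0, the direct effect `gamma_e_star` to 0, and biomarkers in different sets are generated independently. The function names are hypothetical. The AR(1) errors are produced by a recursion that yields unit variances and correlation $\rho_M^{|i-j|}$ within a set.

```python
import math
import random


def ar1_errors(m_s, rho, rng):
    """Draw m_s unit-variance normal errors with correlation rho**|i-j| (AR(1))."""
    eps = [rng.gauss(0.0, 1.0)]
    for _ in range(m_s - 1):
        eps.append(rho * eps[-1] + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0))
    return eps


def simulate_individual(q, m_s, rho, beta, gamma_star, rng, gamma_e_star=0.0):
    """Draw (E_i, M_i, Y_i) from equations (3)-(4) with zero intercepts (assumed)."""
    e = rng.gauss(0.0, 1.0)
    biomarkers = []
    for s in range(q):
        eps = ar1_errors(m_s, rho, rng)  # sets are generated independently (assumed)
        for k in range(m_s):
            biomarkers.append(beta[s * m_s + k] * e + eps[k])
    y = (gamma_e_star * e
         + sum(g * m for g, m in zip(gamma_star, biomarkers))
         + rng.gauss(0.0, 1.0))
    return e, biomarkers, y


# Baseline scenario: q = 20 sets of m_s = 20 biomarkers, m_M = 3 mediators in the
# first set, effects of 0.065, and no within-set correlation.
q, m_s, m_M, effect = 20, 20, 3, 0.065
beta = [effect if j < m_M else 0.0 for j in range(q * m_s)]
gamma_star = [effect if j < m_M else 0.0 for j in range(q * m_s)]

rng = random.Random(2020)
sample = [simulate_individual(q, m_s, 0.0, beta, gamma_star, rng) for _ in range(2500)]
print(len(sample), len(sample[0][1]))

# Intercept used for the binary-outcome model: log odds of a 1% incidence.
print(round(math.log(0.01 / 0.99), 1))
```

For the binary-outcome scenarios, the same biomarker model would feed the logistic model with this intercept, followed by retrospective sampling of cases and controls, which is not shown here.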

We also conducted additional sensitivity analyses. We explore the effect of confounding in Supplementary Material Section 3.1. Specifically, we consider the scenario where there are no mediators, but there are confounders that link the exposure, biomarkers, and outcome. We also explore the methods under a wider set of parameter values in Supplementary Material Section 3.2. Specifically, we further explore the power when varying the proportion of biomarkers in a set that are mediators and the strength of the biomarkers’ combined association with both the exposure and outcome.

2.7. Metabolomic Study of Breast Cancer

Our motivating study aims to identify metabolites that mediate the known relationship between high BMI and the increased risk of estrogen-receptor positive (ER+) breast cancer. This study, nested inside the Prostate, Lung, Colorectal, and Ovarian Cancer Screening Trial (PLCO), includes 410 (ER+) breast cancers and 433 controls matched on study age (+/− 2 years), date of blood collection (+/− 3 months), and hormone therapy use at baseline. The study collected serum samples at the first follow-up visit, one year post-baseline, and, using these specimens, measured the serum metabolites (< 1 Kilodalton molecular weight) with liquid chromatography-tandem mass-spectrometry. Metabolite peaks were normalized by dividing by batch median and then log transformed. Of the 1057 measured serum metabolites, we consider the 481 metabolites that have known identities and are present in at least 90% of the case-control population. These 481 metabolites can be divided into 38 disjoint sets defined by their biologic properties 34. Details on the study have been previously published 26.

3. Results

3.1. Simulations

In general, TS successfully controlled the FWER at the targeted level (0.05). The MIN, LIN, and QUAD procedures similarly controlled the FWER. However, we note that all four procedures tend to have conservative FWERs when most biomarkers are associated with neither the exposure nor the outcome. The conservative nature of the tests observed here is consistent with previous observations when testing individual biomarkers21. As expected, the MARG procedure had inflated FWER when the number of two-dimensional sets was greater than 0 (Supplemental Figures S3F and S8F). Again, this inflated type I error results from falsely declaring a set as a mediating set if it is marginally associated with both the exposure and outcome, regardless of whether true mediating biomarkers are present. Additionally, TS, MIN, LIN and QUAD were relatively robust to the presence of unmeasured confounders affecting the biomarkers, exposure and outcome (see Supplemental Figures S3-S12).

The simulations demonstrate that the newly proposed TS procedure has comparable or better performance characteristics than the competing methods in all tested scenarios. Note, we show the results for only a selection of illustrative scenarios in the main text and show additional results in the supplementary material. We focus on a baseline scenario with $q = 20$ disjoint sets of size $m_s = 20$, $q_m = 1$ mediating set, $q_{OD} = 0$ one-dimensional sets, $q_{TD} = 0$ two-dimensional sets, $m_M = 3$ mediating biomarkers, $m_E = 0$ and $m_Y = 0$ noise-biomarkers, and a correlation of $\rho_M = 0$. We then vary individual parameters to assess their specific impact on the relative performance of the five procedures (Figures 3, 4 and Supplemental Figures S13-S22).

Figure 3: Simulations for Continuous Outcome. The bar-plots show the power to detect the mediating set when using the TS (yellow), MIN (green), MARG (orange), LIN (red), and QUAD (brown) procedures. The baseline scenario includes $m_s = 20$ biomarkers per set, $q = 20$ disjoint sets, $q_m = 1$ mediating set, $q_{OD} = 0$ one-dimensional sets, $q_{TD} = 0$ two-dimensional sets, $m_M = 3$ mediating biomarkers, $m_E = 0$ noise-biomarkers, and a correlation of $\rho_M = 0$. We evaluate the effect of varying a single parameter, keeping all other parameters set to their baseline value.

Figure 4: Simulations for Binary Outcome. The bar-plots show the power to detect the mediating set when using the TS (yellow), MIN (green), MARG (orange), LIN (red), and QUAD (brown) procedures. The baseline scenario includes $m_s = 20$ biomarkers per set, $q = 20$ disjoint sets, $q_m = 1$ mediating set, $q_{OD} = 0$ one-dimensional sets, $q_{TD} = 0$ two-dimensional sets, $m_M = 3$ mediating biomarkers, $m_E = 0$ noise-biomarkers, and a correlation of $\rho_M = 0$. We evaluate the effect of varying a single parameter, keeping all other parameters set to their baseline value.

The MARG procedure unexpectedly did not have the highest power under all settings. The main disadvantage of MARG is that it uses suboptimal statistics $T_{E,1}^s$ and $T_{Y,1}^s$ to detect associations between the group of biomarkers and either the exposure or outcome. Nevertheless, MARG did have the highest power in most settings, and performed significantly better when the number of associated markers, $m_M$, $m_E$, $m_Y$, and $m_{EY}$, in the mediating set was large (Figures 3D and 3G–3I and Figures 4D and 4G–4I). JTV-comp 33, a similar procedure to MARG, also had high, if not the highest, power when included in simulations (see Supplemental Figure S21). However, this procedure had an inflated type I error not only when the number of two-dimensional sets was greater than 0 but also in some scenarios when the number of one-dimensional sets was greater than 0 (see Supplemental Figure S22).

QUAD generally had the lowest power to detect the mediating set in all our simulations. The power tends to be low because the parametric bootstrap procedure 10 is known to be overly conservative. LIN generally has power only slightly lower than TS. However, LIN has higher power when either the number of mediators in a set is large (see Figures 3D and 4D for the setting with $m_M = 5$ and Supplemental Figures S15 and S16) or the number of test sets is small (see Figures 3A and 4A for the setting with $q = 15$). The power for LIN was significantly affected by the number of noise biomarkers in a mediating set (Figures 3G–3I and 4G–4I) because large values of $\hat\beta_s$ and $\hat\gamma_s$ increase the variance in the denominator of the statistic. We note that we did not evaluate LIN in examples where the biomarkers in the mediating set have opposing effects (i.e. $\hat\beta_s'\hat\gamma_s \approx 0$) as clearly LIN would have little to no power in this potentially unrealistic scenario.

The MIN procedure had reasonable power, achieving more than 80% of the power of the TS procedure for the baseline scenario. Moreover, the MIN procedure had the highest power when there was only mM=1 mediating biomarker and q= 20 disjoint sets (Figures 3D and 4D). However, even when there was only a single mediating biomarker, increasing the number of sets and, therefore, biomarkers, decreased the difference in power achieved by the MIN and TS procedures (see Supplemental Figures S13 and S14).

The final observation is that the TS procedure had consistently higher power than the other methods for the realistic scenarios evaluated here. Changing the number of one- and two-dimensional sets did not have a meaningful impact on the overall or relative performance of the TS procedure (Figures 3E-3F, 4E-4F). Increasing the number of noise biomarkers resulted in a slight loss of power for the TS procedure but had no effect on the MIN procedure (Figures 3D-3I, 4D-4I). In contrast to the other tests, increasing the total number of sets had minimal effect on the TS procedure, but greatly reduced the power of the MIN, LIN and QUAD procedures. In the supplementary material, we compared the four procedures in a wider set of scenarios that have been more rarely observed in practice. When the proportion of mediating biomarkers ($m_M/m_s$) was large or close to 1, LIN and QUAD did have higher power (see Supplemental Figures S17 and S18). However, when the proportion was low or the number of sets was large, the TS procedure had notably higher power. Specifically, when there were $q = 100$ disjoint sets and $m_s = 100$, the TS procedure had power above 80% while the other procedures had power below 40% (see Supplemental Figures S13 and S14). As expected, the power for all procedures, including TS, was significantly improved by increasing the number of mediating biomarkers in the set and/or reducing the correlation between biomarkers (Figures 3C, 4C). We saw similar results when the overall effect of mediators increased (i.e. from 0.1% to 2% of phenotypic variation) while the number of mediators remained constant (see Supplemental Figures S19 and S20). For our TS procedure, which is based on the minimum conditional p-value, this increase in power can be mainly attributed to the increased probability that the set is selected in the first step.

Increased correlation reduces the power for all tests because $\hat{\sigma}_\gamma$ increases significantly when the regression for the outcome includes highly correlated biomarkers. Moreover, for the TS approach, the p-values from post-selection inference tend to be larger because, in theory, an association with the outcome may be caused by the association with another mediating biomarker. Finally, compared to MIN, the TS procedure identified a larger number of true mediators (see Supplemental Figures S23 and S24), consistent with the overall improvement in power (see Figures 3 and 4).
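The effect of correlation on the standard error can be illustrated with the textbook two-predictor case. The sketch below is not taken from our simulations. It assumes an ordinary least-squares outcome model with two standardized biomarkers whose correlation is `rho`, for which the sampling variance of each coefficient is inflated by 1/(1 - rho^2) relative to the uncorrelated case. The function name `se_inflation` is hypothetical.

```python
import math


def se_inflation(rho: float) -> float:
    """Factor by which a coefficient's standard error grows when two
    standardized predictors have correlation rho (two-predictor OLS).

    Assumption: this is the classical variance inflation factor,
    not the simulation design used in the paper.
    """
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    # Variance is inflated by 1 / (1 - rho^2); the SE scales with its square root.
    return math.sqrt(1.0 / (1.0 - rho ** 2))


for rho in (0.0, 0.3, 0.6, 0.9):
    print(f"rho = {rho:.1f}: standard error inflated by {se_inflation(rho):.2f}x")
```

Under these assumptions, a correlation of 0.6 already inflates the standard error by a quarter, and the inflation grows quickly as the correlation approaches 1.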

3.2. Breast Cancer Study

We tested the 38 sets of metabolites to determine if any mediated the relationship between BMI and risk of breast cancer. We observed that TS selected three pathways associated with the exposure and seven pathways associated with the outcome at step one, but only one pathway, Lysine metabolism, was identified as a mediating set, with adjusted p-value $P_{TS}^s = 0.043$. TS identified that the specific mediating biomarker, 3-Methylglutarylcarnitine-1, drove the association, with $p_j^M = 0.043$. LIN and MIN also detected the same pathway, with adjusted p-values of 0.0001 and 0.041, respectively. Associations detected by LIN, TS and MIN were largely driven by the single biomarker 3-Methylglutarylcarnitine-1 (see Table 2). On the other hand, MARG discovered a different pathway, Sterol/steroid ($p_M^s = 0.038$), that contains several sex hormones, suggesting that non-overlapping sets of hormones may be associated with the exposure and outcome. Lastly, QUAD did not identify any pathways as potential mediators, although its lowest adjusted p-value, 0.18, was for the Lysine metabolism group. In Table 2, we present the metabolites in the pathway discovered by LIN, TS and MIN, and we present similar results for the Sterol/steroid pathway in the Supplementary material (Table S2).

Table 2: Individual biomarker results for the Lysine metabolism pathway

| Metabolite | P-value for association with risk of breast cancer | P-value for association with BMI |
|---|---|---|
| 2-Aminoadipate | 0.43 | 5.3 × 10⁻⁴ |
| 3-Methylglutarylcarnitine-1* | 9.5 × 10⁻⁵ | 2.0 × 10⁻⁸ |
| 3-Methylglutarylcarnitine-2 | 0.02 | 0.31 |
| Glutarate pentanedioate | 0.82 | 0.96 |
| Lysine | 0.35 | 0.25 |
| N6-Trimethyl-L-lysine | 0.13 | 0.02 |
| N2-Acetyl-L-lysine | 0.28 | 0.09 |
| N6-Acetyl-L-lysine | 0.17 | 0.55 |
| Pipecolate | 0.64 | 0.26 |
| 3-Methylglutaconate | 0.62 | 0.82 |

* Metabolite discovered by TS and MIN to mediate the effect of BMI

4. Discussion

We introduced a new procedure, TS, to test if sets of biomarkers mediate the relationship between an exposure and an outcome. The TS procedure is computationally efficient, controls the family-wise error rate (FWER), and has high statistical power for detecting potentially mediating sets of biomarkers. Additionally, the TS procedure identifies individual biomarkers that are potential mediators. The strength of the method comes from the first, screening, step that removes all sets that are unlikely to be strongly associated with both the exposure and the outcome. Compared with standard association tests, this screening step removes a significantly larger number of sets (e.g. 100 × (1 − 0.025 × 0.1) = 99.75% of sets removed compared to 100 × (1 − 0.025) = 97.5% of sets removed). The statistical complication, which is solved and discussed in the supplementary material, is to calculate the adjusted, conditional p-values based on this screening approach. In addition to higher power, the added benefit of our approach is that it identifies the individual mediating biomarkers and allows practitioners to make more precise claims about the underlying biology by measuring effects mediated through biomarkers. In the remainder of this section, we focus on providing insight into the trends observed in our simulations, potential extensions to the TS procedure, and a discussion of the benefits of group testing.
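The screening arithmetic quoted above can be reproduced with a short calculation. This is a minimal sketch, assuming that 0.025 and 0.1 act as the probabilities that a null set passes each of two independent screening tests; that reading, and the function name `percent_removed`, are assumptions made for illustration.

```python
def percent_removed(*pass_probs: float) -> float:
    """Expected percentage of null sets removed when a set must pass every
    screening test, assuming the tests are independent (an assumption)."""
    joint_pass = 1.0
    for p in pass_probs:
        joint_pass *= p  # probability of passing all tests so far
    return 100.0 * (1.0 - joint_pass)


two_tests = percent_removed(0.025, 0.1)  # set must pass both screening tests
one_test = percent_removed(0.025)        # a single standard association test
print(f"both tests: {two_tests:.2f}% removed; one test: {one_test:.2f}% removed")
```

Requiring both associations shrinks the expected share of surviving null sets from 2.5% to 0.25%, which is the source of the lighter multiple-testing burden in the second step.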

One observation is that the QUAD and LIN procedures had lower power in many tested scenarios. Here, we offer some comments about those tests. First, and importantly, these procedures were not designed for scenarios where only a small subset of the biomarkers in the mediating set qualify as actual mediators. Second, despite having lower power, these procedures still have the advantage of offering a set-level p-value. In contrast, the TS procedure offers biomarker-level p-values for elements of the set, and then uses the minimum biomarker-level p-value as the set-level p-value. Third, the test statistic used in the QUAD procedure is similar to the statistics used in the first step of the TS procedure, $T_Q^s = T_{Y,\beta}^s = T_{E,\gamma}^s = \sum_{j \in G_s} (\hat{\beta}_j \hat{\gamma}_j)^2$, and therefore the similarity in the procedures' statistics needs to be reconciled with the dissimilarity in their performance. For the null distribution, the QUAD procedure assumes that both of the estimated vectors, $\hat{\beta}^s$ and $\hat{\gamma}^s$, follow multivariate normal distributions with non-zero means, while the TS procedure assumes that one vector is a group of fixed weights and the other vector follows a multivariate normal distribution with zero mean. As further illustrated by an example in the Supplementary material of the original paper 10, the variance of $T_{Q,b}^s$ under its associated null is an order of magnitude larger than the variance of $T_Y^s$ or $T_E^s$ under their associated nulls. Fourth, when testing a set of biomarkers, the quadratic versions of the test statistics (see references 30,31) generally have higher statistical power than their linear counterparts when only a subset of biomarkers are true positives, regardless of the sign of the effects. A second observation, which applies to all procedures, is that testing groups will only have higher power, as compared to testing individual biomarkers, when the groupings combine mediators together. Although we did not state it explicitly, our examples showed the cost of forming groups randomly. In this extreme scenario, a set would likely have at most one mediating biomarker (i.e. $m_M = 1$). As Figures 3D, 4D, S17, and S18 illustrate, the power of all group tests is noticeably lower than the power of MIN, where we recall that MIN is equivalent to testing each biomarker individually. A third observation is that TS and MIN had low power when the majority of biomarkers in a set each had a small mediating effect. The last point, which we did not show by simulation, is that in the presence of alternating effects, where $\beta_j\gamma_j$ can be both positive and negative, LIN will clearly have lower power to detect mediating sets.
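The last point can be made concrete with a toy calculation. The sketch below is a simplification, not the LIN or QUAD procedures themselves: it compares only the sum of products, which stands in for the numerator of a linear-type statistic (the standardizing denominator is omitted), with the sum of squared products, which is assumed here to be the quadratic form. The coefficient values and function names are hypothetical.

```python
def linear_stat(beta, gamma):
    """Sum of products beta_j * gamma_j (simplified linear-type numerator)."""
    return sum(b * g for b, g in zip(beta, gamma))


def quadratic_stat(beta, gamma):
    """Sum of squared products (beta_j * gamma_j)^2 (assumed quadratic form)."""
    return sum((b * g) ** 2 for b, g in zip(beta, gamma))


# Hypothetical effect sizes for a set with four mediating biomarkers.
beta = [0.5, 0.5, 0.5, 0.5]
gamma_same_sign = [0.25, 0.25, 0.25, 0.25]
gamma_alternating = [0.25, -0.25, 0.25, -0.25]

for label, gamma in (("same sign", gamma_same_sign),
                     ("alternating", gamma_alternating)):
    print(f"{label}: linear = {linear_stat(beta, gamma)}, "
          f"quadratic = {quadratic_stat(beta, gamma)}")
```

With alternating signs the products cancel in the linear sum, while the quadratic sum is unchanged, which is why a linear-type statistic loses its signal in that scenario.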

There are potential modifications to the TS procedure. First, as remarked previously, in the initial step of the TS method, we used a combination of weighted statistics, $T_{Y,\beta}^s = T_{E,\gamma}^s = \sum_{j \in G_s} (\hat{\beta}_j \hat{\gamma}_j)^2$, and unweighted statistics, $T_{E,1}^s = \sum_{j \in G_s} \hat{\beta}_j^2$ and $T_{Y,1}^s = \sum_{j \in G_s} \hat{\gamma}_j^2$, to test the marginal null. A more powerful approach would be to select candidate sets by weighted tests only (i.e. $S = \{s : p_{E,\gamma}^s \le 0.025,\ p_{Y,\beta}^s \le 0.025\}$). However, in the second step of our new procedure, the adjustment of p-values would need to take into consideration selection by both of these test statistics, a complication that requires further thought. The second modification of TS would be to calculate the adjusted p-values under the global null 28. Such a test can provide better power when the proportion of mediators in a set is very low. A third modification is to allow one biomarker to appear in multiple sets. However, such a modification is not straightforward, as the assumptions in Theorem 1 would clearly fail to hold. Another potential modification is to allow exposure/biomarker interactions in the outcome model. As a final note, we refer to each selected set as a potentially mediating set, as opposed to simply referring to it as a mediating set. The test for an association between a set of biomarkers and the outcome ignores biomarkers from all other sets. Therefore, for these association tests, the other biomarkers are "unmeasured" confounders that could potentially bias our findings. For large m, such bias is difficult to avoid because models cannot easily include all biomarkers.
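The weighted-only selection rule mentioned in the first modification can be sketched as a simple filter. This is an illustration under stated assumptions: the set-level p-values are taken as already computed, the comparison is assumed to be "less than or equal to" 0.025, and the set names, p-values, and the function `select_candidate_sets` are hypothetical.

```python
def select_candidate_sets(p_exposure, p_outcome, threshold=0.025):
    """Return the sets whose weighted exposure and outcome p-values both
    fall at or below the screening threshold (assumed rule)."""
    return sorted(
        s for s in p_exposure
        if s in p_outcome
        and p_exposure[s] <= threshold
        and p_outcome[s] <= threshold
    )


# Hypothetical set-level p-values, for illustration only.
p_exposure = {"set_A": 0.001, "set_B": 0.020, "set_C": 0.300}
p_outcome = {"set_A": 0.010, "set_B": 0.040, "set_C": 0.002}

print(select_candidate_sets(p_exposure, p_outcome))  # ['set_A']
```

Only the set that passes both tests is carried forward. The sketch covers selection alone and leaves out the adjustment of the second-step p-values for that selection.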

We expect there to be growing interest in testing whether groups of biomarkers are mediators. In some scenarios, we might expect the exposure to affect an underlying, latent process that affects both the individual biomarkers and the outcome22. In other scenarios, we might expect the exposure to directly affect the biomarkers and the biomarkers to directly influence the outcome8,10,22,33. Recently, for example, Chen and colleagues9 investigated whether a thermal stimulus excited a region of the brain (e.g. sets of fMRI voxels in common areas), which in turn affected the reaction. Huang33 recently investigated whether smoking affected methylation levels in a gene (e.g. sets of probes linked to a common gene), which in turn affected cancer risk. As another example, a study8 investigated whether high intake of fish, as measured by a questionnaire, influenced serum levels of sets of metabolites (e.g. sets that were associated with consumption of specific fish), which in turn were associated with a reduced risk of colorectal cancer.

Supplementary Material

appendix

Footnotes

Data availability statement

The data and code have been submitted to the journal to be openly available.

5. References

1. Steen J, Loeys T, Moerkerke B, Vansteelandt S. Flexible Mediation Analysis With Multiple Mediators. Am J Epidemiol. 2017;186(2):184–193.
2. VanderWeele TJ, Vansteelandt S. Mediation Analysis with Multiple Mediators. Epidemiol Methods. 2014;2(1):95–115.
3. Pearl J. Interpretation and identification of causal mediation. Psychol Methods. 2014;19(4):459.
4. Robins JM, Greenland S. Identifiability and exchangeability for direct and indirect effects. Epidemiology. 1992;3(2):143–155.
5. Baron RM, Kenny DA. The moderator-mediator variable distinction in social psychological research: conceptual, strategic, and statistical considerations. J Pers Soc Psychol. 1986;51(6):1173–1182.
6. MacKinnon DP. Introduction to statistical mediation analysis. New York: Lawrence Erlbaum Associates; 2008.
7. Imai K, Keele L, Tingley D. A general approach to causal mediation analysis. Psychol Methods. 2010;15(4):309–334.
8. Boca SM, Sinha R, Cross AJ, Moore SC, Sampson JN. Testing multiple biological mediators simultaneously. Bioinformatics. 2014;30(2):214–220.
9. Chen OY, Crainiceanu C, Ogburn EL, Caffo BS, Wager TD, Lindquist MA. High-dimensional multivariate mediation with application to neuroimaging data. Biostatistics. 2018;19(2):121–136.
10. Huang YT, Pan WC. Hypothesis test of mediation effect in causal mediation model with high-dimensional continuous mediators. Biometrics. 2016;72(2):402–413.
11. Subramanian A, Tamayo P, Mootha VK, et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci U S A. 2005;102(43):15545–15550.
12. Flannick J, Mercader JM, Fuchsberger C, et al. Exome sequencing of 20,791 cases of type 2 diabetes and 24,440 controls. Nature. 2019;570(7759):71–76.
13. Vidmar L, Maver A, Drulovic J, et al. Multiple Sclerosis patients carry an increased burden of exceedingly rare genetic variants in the inflammasome regulatory genes. Sci Rep. 2019;9(1):9171.
14. Saunders EJ, Dadaev T, Leongamornlert DA, et al. Gene and pathway level analyses of germline DNA-repair gene variants and prostate cancer susceptibility using the iCOGS-genotyping array. Br J Cancer. 2018;118(6):e9.
15. Fransen E, Bonneux S, Corneveaux JJ, et al. Genome-wide association analysis demonstrates the highly polygenic character of age-related hearing impairment. Eur J Hum Genet. 2015;23(1):110–115.
16. Wu C, Delano DL, Mitro N, et al. Gene set enrichment in eQTL data identifies novel annotations and pathway regulators. PLoS Genet. 2008;4(5):e1000070.
17. Huang J, Weinstein SJ, Moore SC, et al. Pre-diagnostic Serum Metabolomic Profiling of Prostate Cancer Survival. J Gerontol A Biol Sci Med Sci. 2019;74(6):853–859.
18. Moutsianas L, Agarwala V, Fuchsberger C, et al. The Power of Gene-Based Rare Variant Methods to Detect Disease-Associated Variation and Test Hypotheses About Complex Disease. PLoS Genet. 2015;11(4):e1005165.
19. Derkach A, Lawless JF, Merico D, Paterson AD, Sun L. Evaluation of gene-based association tests for analyzing rare variants using Genetic Analysis Workshop 18 data. BMC Proc. 2014;8(Suppl 1):S9.
20. Derkach A, Zhang H, Chatterjee N. Power Analysis for Genetic Association Test (PAGEANT) provides insights to challenges for rare variant association studies. Bioinformatics. 2018;34(9):1506–1513.
21. Barfield R, Shen J, Just AC, et al. Testing for the indirect effect under the null for genome-wide mediation analyses. Genet Epidemiol. 2017;41(8):824–833.
22. Derkach A, Pfeiffer RM, Chen TH, Sampson JN. High dimensional mediation analysis with latent variables. Biometrics. 2019;75(3):745–756.
23. Chakrabortty A, Nandy P, Li H. Inference for Individual Mediation Effects and Interventional Effects in Sparse High-Dimensional Causal Graphical Models. arXiv preprint arXiv:10652. 2018.
24. Sampson JN, Boca SM, Moore SC, Heller R. FWER and FDR control when testing multiple mediators. Bioinformatics. 2018;34(14):2418–2424.
25. Huang YT. Joint significance tests for mediation effects of socioeconomic adversity on adiposity via epigenetics. Ann Appl Stat. 2018;12(3):1535–1557.
26. Moore SC, Playdon MC, Sampson JN, et al. A Metabolomics Analysis of Body Mass Index and Postmenopausal Breast Cancer Risk. J Natl Cancer Inst. 2018;110(6):588–597.
27. Heller R, Chatterjee N, Krieger A, Shi J. Post-Selection Inference Following Aggregate Level Hypothesis Testing in Large-Scale Genomic Data. J Am Stat Assoc. 2018:1–14.
28. Heller R, Meir A, Chatterjee N. Post-selection estimation and testing following aggregated association tests. arXiv preprint arXiv:00497. 2017.
29. Lee JD, Sun DL, Sun Y, Taylor JE. Exact post-selection inference, with application to the lasso. Ann Stat. 2016;44(3):907–927.
30. Derkach A, Lawless JF, Sun L. Pooled Association Tests for Rare Genetic Variants: A Review and Some New Results. Statist Sci. 2014;29(2):302–321.
31. Wu MC, Lee S, Cai T, Li Y, Boehnke M, Lin X. Rare-variant association testing for sequencing data with the sequence kernel association test. Am J Hum Genet. 2011;89(1):82–93.
32. Imai K, Keele L, Yamamoto T. Identification, Inference and Sensitivity Analysis for Causal Mediation Effects. Statist Sci. 2010;25(1):51–71.
33. Huang YT. Variance component tests of multivariate mediation effects under composite null hypotheses. Biometrics. 2019.
34. Derkach A, Sampson J, Joseph J, Playdon MC, Stolzenberg-Solomon RZ. Effects of dietary sodium on metabolites: the Dietary Approaches to Stop Hypertension (DASH)-Sodium Feeding Study. Am J Clin Nutr. 2017;106(4):1131–1141.
