Author manuscript; available in PMC: 2024 Mar 18.
Published in final edited form as: Biometrics. 2023 Feb 7;79(2):1057–1072. doi: 10.1111/biom.13716

Efficient and robust methods for causally interpretable meta-analysis: transporting inferences from multiple randomized trials to a target population

Issa J Dahabreh 1,2,3,*, Sarah E Robertson 1,2, Lucia C Petito 4, Miguel A Hernán 1,2,5, Jon A Steingrimsson 6
PMCID: PMC10948002  NIHMSID: NIHMS1972901  PMID: 35789478

Summary:

We present methods for causally interpretable meta-analyses that combine information from multiple randomized trials to estimate potential (counterfactual) outcome means and average treatment effects in a target population. We consider identifiability conditions, derive implications of the conditions for the law of the observed data, and obtain identification results for transporting causal inferences from a collection of independent randomized trials to a new target population in which experimental data may not be available. We propose an estimator for the potential (counterfactual) outcome mean in the target population under each treatment studied in the trials. The estimator uses covariate, treatment, and outcome data from the collection of trials, but only covariate data from the target population sample. We show that it is doubly robust, in the sense that it is consistent and asymptotically normal when at least one of the models it relies on is correctly specified. We study the finite sample properties of the estimator in simulation studies and demonstrate its implementation using data from a multi-center randomized trial.

Keywords: meta-analysis, transportability, causal inference

1. Introduction

When examining a body of evidence that consists of multiple trials, decision makers are typically interested in learning about the effects of interventions in some well-defined target population. In other words, they are interested in synthesizing the evidence across trials and transporting causal inferences from the collection of trials to a target population in which further experimentation may not be possible. Typically, each trial samples participants from a different underlying population, by recruiting participants from centers with different referral patterns or in different geographic locations. The goal of evidence synthesis in this context is to use the information from these diverse trials to draw causal inferences about the target population, accounting for any differences between the target population and the populations underlying the trials.

“Meta-analysis,” an umbrella term for statistical methods for synthesizing evidence across multiple trials (Cooper et al., 2009), traditionally focuses on modeling the distribution of treatment effects (effect sizes) across studies or on obtaining unbiased and minimum variance summaries of data from multiple trials (Higgins et al., 2009; Rice et al., 2018). Standard meta-analysis methods produce estimates that do not have a clear causal interpretation outside of the sample of participants enrolled in the trials because the estimates do not pertain to any well-defined target population (Dahabreh et al., 2020). Recent work on “generalizability” and “transportability” has considered methods for extending causal inferences to a target population, when data are available from a single randomized trial (Westreich et al., 2017; Rudolph and van der Laan, 2017; Dahabreh et al., 2020). When multiple trials are available, methods have been proposed for assessing case-mix heterogeneity in individual patient data meta-analyses without specifying or using data from a target population of substantive interest (Vo et al., 2019, 2021).

Here, we propose methods for causally interpretable meta-analysis that can be used to extend inferences about potential (counterfactual) outcome means and average treatment effects from a collection of randomized trials to a target population in which experimental data may not be available (Dahabreh et al., 2020). We consider identifiability conditions, derive implications of the conditions for the law of the observed data, and obtain identification results. We then propose novel estimators for potential outcome means and average treatment effects in the target population. The estimators use data on baseline covariates, treatments, and outcomes from the collection of trials, but only data on baseline covariates from the sample of the target population. We show that the estimators are doubly robust, in the sense that they remain consistent and asymptotically normal provided at least one of the two working models on which they rely is correctly specified. Last, we study the finite sample properties of the estimators in simulation studies and demonstrate the implementation of the methods using data from a multi-center trial of treatments for hepatitis C infection.

2. Data, sampling scheme, and causal estimands

Suppose we have data from a collection of randomized trials $\mathcal{S}$, indexed by $s = 1, \ldots, m$. For each trial participant we have information on the trial $S$ in which they participated, treatment assignment $A$, baseline covariates $X$, and outcomes $Y$. We assume that the same finite set of treatments, $\mathcal{A}$, has been compared in each trial of the collection (extensions to cases where only a subset of treatments are evaluated in each trial are straightforward but require more cumbersome notation). From each trial $s \in \mathcal{S}$, the data are independent and identically distributed random tuples $(X_i, S_i = s, A_i, Y_i)$, $i = 1, \ldots, n_s$, where $n_s$ is the total number of randomized individuals in trial $s$.

We also obtain a simple random sample from the target population (of individuals not participating in the trials), in which treatment assignment may not be under the control of investigators (e.g., individuals may self-select into treatment). We use the convention that $S = 0$ for individuals from the target population. The data from the sample of the target population consist of independent and identically distributed tuples $(X_i, S_i = 0, A_i, Y_i)$, $i = 1, \ldots, n_0$, where $n_0$ is the total number of individuals sampled from the target population. The total number of observations from the trials and the sample of the target population is $n = \sum_{s=0}^{m} n_s$. We define a new random variable $R$, such that $R = 1$ if $S \in \mathcal{S}$, and $R = 0$ if $S = 0$.

Informally, we assume that the observations from the trials in the collection 𝒮 and from the target population S=0 are Bernoulli-type (independent random) samples (Breslow and Wellner, 2007; Saegusa and Wellner, 2013) from (near-infinite) underlying populations (Robins, 1988). Specifically, we assume that the sample from the target population is representative of a well-specified population of substantive interest. In contrast, we do not require that the trial samples are obtained through a formal sampling process; rather, that the investigators are willing to model the data from each trial as if sampled from a hypothetical underlying population. This modeling choice is consistent with the fairly standard super-population approach to the analysis of individual randomized trials (Robins, 1988). Though alternative frameworks (e.g., randomization-based inference) can be appealing in some cases, we find the super-population approach attractive when the goal is to extend inferences from a collection of trials to a new target population (Dahabreh and Hernán, 2019). Investigators conducting meta-analyses usually do not have control over the selection of participants of each trial or the relative sample size of different trials. Thus, we view the observations available for analysis as obtained by sampling the corresponding underlying populations with sampling fractions that are not under the control of the investigators and are unknown to them. In Section 4 we further formalize this sampling scheme; for now it suffices to say that all expectations and probabilities below are defined under the sampling scheme. Throughout, we use f() to generically denote densities. In Section 4 we discuss the implications of the sampling scheme for identifiability of these densities.

To define causal estimands, let $Y^a$ denote the potential (counterfactual) outcome under intervention to set treatment to $a \in \mathcal{A}$ (Rubin, 1974; Robins and Greenland, 2000). We are interested in the potential outcome mean in the target population, $E[Y^a \mid R = 0]$, for each $a \in \mathcal{A}$, as well as the average causal effect, $E[Y^a - Y^{a'} \mid R = 0]$, for each pair of treatments $a \in \mathcal{A}$ and $a' \in \mathcal{A}$. The treatments used in the target population need not be the same as the treatments used in the trials (e.g., some treatments may not be available outside experimental settings). Furthermore, as we show below, data on treatments and outcomes from the target population are not necessary for identification and estimation of the potential outcome means and average treatment effects of interest.

3. Identification

3.1. Identifiability conditions

The following are sufficient conditions for identifying the potential outcome mean in the target population, $E[Y^a \mid R = 0]$.

A1. Consistency of potential outcomes: if $A_i = a$, then $Y_i^a = Y_i$, for every individual $i$ and every treatment $a \in \mathcal{A}$.

Implicit in condition A1 are assumptions that (i) there is no direct effect of participation in any trial (R=1) or participation in some specific trial (S=s) on the outcome (Dahabreh et al., 2019; Dahabreh and Hernán, 2019); (ii) there are no multiple versions of treatment (or that treatment variation is irrelevant with respect to the outcome (VanderWeele, 2009)); and (iii) there is no treatment-outcome interference (Rubin, 1986, 2010). These assumptions may be most plausible in the case of large pragmatic trials where fidelity to the assigned treatment can be high across settings. Note that consistency of potential outcomes is distinct from the notion of consistency used in so-called network meta-analyses; consistency of potential outcomes refers to the relationship between observed (factual) and potential (counterfactual) outcomes under each treatment, not the relationship between contrasts (“direct” and “indirect”, in meta-analytic parlance) of different treatments.

A2. Exchangeability over treatment $A$ in each trial: for each trial $s \in \mathcal{S}$ and each $a \in \mathcal{A}$, $Y^a \perp\!\!\!\perp A \mid (X, S = s)$.

Condition A2 is typically plausible because of randomization (marginal or conditional on $X$) in each of the trials. The condition is not equivalent to the condition $Y^a \perp\!\!\!\perp A \mid (X, R = 1)$ because the trials may have different randomization ratios and trial participation may have direct effects on the outcome or share unmeasured common causes with the outcome (the last two possibilities, however, are precluded by conditions A1 and A4, respectively). It is also worth noting that condition A2 would also hold if $\mathcal{S}$ were a collection of observational studies in which the covariates $X$ were sufficient to adjust for baseline confounding. Thus, our results can also apply to pooled analyses of observational studies, provided that background knowledge suggests "adjustment" for $X$ is sufficient to control confounding.

A3. Positivity of treatment assignment in each trial: for each treatment $a \in \mathcal{A}$ and for each trial $s \in \mathcal{S}$, if $f(x, S = s) \neq 0$, then $\Pr[A = a \mid X = x, S = s] > 0$.

Condition A3 is also plausible in marginally and conditionally randomized trials.

A4. Exchangeability over $S$: for each $a \in \mathcal{A}$, $Y^a \perp\!\!\!\perp S \mid X$.

In applied work, condition A4 will typically be a critical assumption connecting the randomized trials in the collection $\mathcal{S}$ both with each other and with the target population ($S = 0$). On its own, condition A4 is not testable; we will show, however, that it has testable implications when combined with conditions A1 and A2. In practice, condition A4 will need to be examined in light of substantive knowledge, and examining the impact of violations of the condition may require undertaking sensitivity analyses (Robins et al., 2000).

A5. Positivity of trial participation: for each $s \in \mathcal{S}$, if $f(x, R = 0) \neq 0$, then $\Pr[S = s \mid X = x] > 0$.

Informally, condition A5 states that covariate patterns in the target population can also occur in each trial. This condition only involves the observable data and thus is in principle testable. Formal examination of the condition, however, is challenging when X is high dimensional (Petersen et al., 2012).

3.2. Implications of the identifiability conditions

We now explore some implications of the identifiability conditions. By Lemma 4.2 of Dawid (1979), condition A4 implies exchangeability over S among the collection of trials, that is,

$$Y^a \perp\!\!\!\perp S \mid X \implies Y^a \perp\!\!\!\perp S \mid (X, R = 1), \text{ for every } a \in \mathcal{A}, \qquad (1)$$

and also implies exchangeability of participants in the collection of trials and the target population,

$$Y^a \perp\!\!\!\perp S \mid X \implies Y^a \perp\!\!\!\perp I(S = 0) \mid X, \text{ that is, } Y^a \perp\!\!\!\perp R \mid X. \qquad (2)$$

The first of these results will be useful to derive restrictions on the law of the observed data imposed by the identifiability conditions; the second result will be useful in obtaining identification results for causal quantities in the target population, using information from the collection of trials.

By Lemma 4.3 of Dawid (1979), the result in (1) and condition A2 imply conditional exchangeability of potential outcomes over (S,A) in the collection of trials, that is,

$$Y^a \perp\!\!\!\perp A \mid (X, R = 1, S) \text{ and } Y^a \perp\!\!\!\perp S \mid (X, R = 1) \implies Y^a \perp\!\!\!\perp (S, A) \mid (X, R = 1). \qquad (3)$$

Noting that $Y^a \perp\!\!\!\perp (S, A) \mid (X, R = 1)$ implies $Y^a \perp\!\!\!\perp S \mid (X, R = 1, A = a)$, and using condition A1, we obtain

$$Y \perp\!\!\!\perp S \mid (X, R = 1, A = a). \qquad (4)$$

Thus, among individuals participating in any trial ($R = 1$), the observed outcome $Y$ is independent of the trial random variable $S$, within treatment groups ($A = a$) and conditional on covariates $X$. Because this condition does not involve potential outcomes, it is testable using the observed data (e.g., using methods for comparing conditional densities). Furthermore, because $\{R = 0\} = \{R = 0, S = 0\}$, we also obtain that $Y \perp\!\!\!\perp S \mid (X, R, A = a)$. The independence condition in (4) implies that for every $a \in \mathcal{A}$ and every $x$ such that $f(x, S = 0) \neq 0$,

$$E[Y \mid X = x, S = 1, A = a] = \cdots = E[Y \mid X = x, S = m, A = a]. \qquad (5)$$

This restriction is testable using widely available parametric or non-parametric approaches for modeling the conditional expectations (e.g., Racine et al., 2006; Luedtke et al., 2019). Such statistical testing may be a useful adjunct to assessments of the causal assumptions based on substantive knowledge.
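
As an illustration, the following is a minimal sketch of one such parametric check, assuming a pandas DataFrame `trial_df` of trial participants with outcome `Y`, treatment `A`, trial indicator `S`, and covariates `X1`–`X3` (hypothetical column names, not part of the paper): within a treatment arm, it compares nested outcome regressions and tests whether trial-specific terms improve the fit.

```python
# A sketch of a parametric check of restriction (5); column names are illustrative.
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

def check_restriction_5(trial_df, treatment):
    """F-test comparing outcome models with and without trial-specific terms
    within one treatment arm; a small p-value flags a possible violation of (5)."""
    arm = trial_df[trial_df["A"] == treatment]
    # Model assuming E[Y | X, R = 1, A = a] is common across trials
    reduced = smf.ols("Y ~ X1 + X2 + X3", data=arm).fit()
    # Model allowing the conditional outcome mean to vary by trial S
    full = smf.ols("Y ~ C(S) * (X1 + X2 + X3)", data=arm).fit()
    return anova_lm(reduced, full)
```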

In practical terms, violations of the restrictions in displays (4) or (5) can occur, for example, when there are unmeasured common causes of trial participation $S$ and the outcome $Y$ (failure of condition A4), when trial participation directly affects the outcome (failure of condition A1), or when there exists outcome-relevant variation in the treatments evaluated across trials (again, failure of condition A1). Though formal statistical assessments of the restrictions are possible, they will often be challenging when $X$ is high-dimensional. Furthermore, formal assessments of the restrictions in (4) and (5) will in general not be able to pinpoint the specific identifiability condition whose failure explains the violation of the observed data restrictions. Thus, considerable background knowledge and scientific judgment will be needed to decide on the appropriateness of combining information across trials to learn about the target population.

3.3. Identification of potential outcome means using the collection of trials

When conditions A1 through A5 hold, the potential outcome mean in the target population can be identified using covariate, treatment, and outcome data from the collection of trials, and baseline covariate data from the target population.

Theorem 1 (Identification of potential outcome means): Under conditions A1 through A5, the potential outcome mean in the target population under treatment $a \in \mathcal{A}$, $E[Y^a \mid R = 0]$, is identifiable by the observed data functional

$$\psi(a) \equiv E\big[\, E[Y \mid X, R = 1, A = a] \mid R = 0 \,\big], \qquad (6)$$

which can be equivalently expressed as

$$\psi(a) = \frac{1}{\Pr[R = 0]} E\left[ \frac{I(R = 1, A = a)\, Y \Pr[R = 0 \mid X]}{\Pr[R = 1 \mid X] \Pr[A = a \mid X, R = 1]} \right]. \qquad (7)$$

The proof is given in Web Appendix A.

It also follows that, under the conditions of Theorem 1, the average treatment effect in the target population comparing treatments $a$ and $a'$ in $\mathcal{A}$ is also identifiable: $E[Y^a - Y^{a'} \mid R = 0] = E[Y^a \mid R = 0] - E[Y^{a'} \mid R = 0] = \psi(a) - \psi(a')$.

3.4. Identification under weaker conditions

Practitioners of meta-analysis believe that by combining evidence from multiple randomized trials it should be possible to draw inferences about target populations that are – in some vague sense – broader than the population underlying each randomized trial. To give a concrete example: suppose that we have data from two trials of the same medications, one recruiting individuals with mild disease and the other recruiting individuals with severe disease. Suppose also that our target population includes some individuals with mild and some with severe disease and that effectiveness varies by disease severity. Intuition suggests that if inferences are “transportable” from each of the two trials to each subset of the target population defined by disease severity, then it should be possible to draw some conclusions about treatment effectiveness in the target population. Yet, in this setting, condition A5 is grossly violated (e.g., individuals with mild disease have zero probability of participating in one of the two trials). In the following two sub-sections, we give identification results that rely on weaker identifiability conditions and capture the intuition that when combining evidence from multiple trials we can draw inferences about target populations that are broader than the population underlying each randomized trial.

3.4.1. Weakening the positivity conditions of Theorem 1.

Suppose that the independence conditions $Y^a \perp\!\!\!\perp R \mid X$ and $Y^a \perp\!\!\!\perp A \mid (X, R = 1)$ hold, taken either as primitive conditions or as implications of identifiability conditions A2 and A4 (as was done for Theorem 1).

Furthermore, consider the following positivity conditions:

A3*. Positivity of the probability of treatment in the collection of trials: for each treatment $a \in \mathcal{A}$, if $f(x, R = 1) \neq 0$, then $\Pr[A = a \mid X = x, R = 1] > 0$.

A5*. Positivity of the probability of participation in the collection of trials: if $f(x, R = 0) \neq 0$, then $\Pr[R = 1 \mid X = x] > 0$.

It is worth noting that these positivity conditions are weaker than the corresponding conditions A3 and A5, in the sense that A3* is implied by, but does not imply, A3; and A5* is implied by, but does not imply, A5. The difference between the two sets of assumptions is practically important: informally, condition A3 requires every treatment to be available in every trial and for every covariate pattern that can occur in that trial, and condition A5 requires every covariate pattern that can occur in the target population to be represented in every trial. In contrast, condition A3* requires every treatment to be available in the aggregated collection of trials and for any covariate pattern that can occur in the collection (but not necessarily in every trial); condition A5* requires every covariate pattern that can occur in the target population to occur in the aggregated collection of trials (but not necessarily in every trial).

Using these modified conditions, we obtain the following result (we give the formal arguments in Web Appendix A and Web Appendix B):

Theorem 2 (Identification of potential outcome means under weaker positivity conditions): If $Y^a \perp\!\!\!\perp R \mid X$ and $Y^a \perp\!\!\!\perp A \mid (X, R = 1)$ hold, and conditions A1, A3*, and A5* also hold, then the potential outcome mean in the target population under treatment $a \in \mathcal{A}$, $E[Y^a \mid R = 0]$, is identifiable by $\psi(a)$.

3.4.2. Identification under weaker overlap and exchangeability conditions.

We will now show that the exchangeability conditions for potential outcome means can also be relaxed, alongside the positivity conditions; to do so, we need to introduce some additional notation in order to keep track of the subsets of the collection of trials where different covariate patterns can occur.

Let $\mathcal{X}_j$, for $j \in \{0, 1, \ldots, m\}$, denote the support of the random vector $X$ in the subset of the population with $S = j$. That is, $\mathcal{X}_j \equiv \{x : f(x \mid S = j) > 0\}$. For each covariate pattern $X = x$ that can occur in the collection of trials, that is, for each $x \in \bigcup_{s \in \mathcal{S}} \mathcal{X}_s$, define $\mathcal{S}_x$ as the subset of trials in $\mathcal{S}$ whose support contains $x$. That is, $\mathcal{S}_x \equiv \{s : s \in \mathcal{S}, x \in \mathcal{X}_s\}$. Intuitively, $\mathcal{S}_x$ denotes the subset of trials in the collection $\mathcal{S}$ where the covariate pattern $X = x$ can occur. Using this additional notation, consider the following identifiability conditions:

A4′. Exchangeability in mean over $\mathcal{S}_x$: for every $x$ such that $f(x, S = 0) \neq 0$ and every $s \in \mathcal{S}_x$, $E[Y^a \mid X = x, S = 0] = E[Y^a \mid X = x, S = s]$.

A5′. Overlap of the collection $\mathcal{S}$ with the target population: $\big(\bigcup_{s \in \mathcal{S}} \mathcal{X}_s\big) \cap \mathcal{X}_0 = \mathcal{X}_0$.

Using the above two conditions in the place of conditions A4 and A5 still allows for identification of the potential outcome means, using the following result.

Theorem 3 (Identification of potential outcome means under weaker conditions): Under identifiability conditions A1 through A3, A4′, and A5′, for every $a \in \mathcal{A}$, $E[Y^a \mid R = 0]$ is identifiable by

$$\phi(a) \equiv \int E\big[Y \mid X = x, I(S \in \mathcal{S}_x) = 1, A = a\big]\, f(x \mid S = 0)\, dx.$$

The proof is given in Web Appendix B.

In simple cases, as in our hypothetical example of effect modification by disease severity where the positivity violation was due to a single binary covariate, the above result simply suggests that the conditional mean of the outcome $Y$ can be modeled separately in each group of trials defined by that binary covariate. For more complicated violations of positivity condition A5, especially those involving multiple continuous covariates, the above result is mostly useful not for suggesting a particular modeling strategy, but for showing that some degree of interpolation when modeling the conditional expectation may be relatively benign when there is adequate overlap between the collection of trials and the target population, in the sense of condition A5′.

3.5. Identification under exchangeability in effect measure

Up to this point, we have examined conditions that are sufficient for identifying both potential outcome means and average treatment effects. If interest is restricted to average treatment effects, identification is possible under a weaker condition of exchangeability in effect measure over $S$ given baseline covariates: essentially, a requirement that conditional average treatment effects, but not necessarily potential outcome means, can be transported from each trial in the collection $\mathcal{S}$ to the target population; see Dahabreh et al. (2018, 2020) for a similar argument in the context of transporting inferences from a single trial. Under this weaker condition, however, the individual potential outcome means are not identifiable. Specifically, consider the following condition:

A4″. Exchangeability in effect measure over $S$: for every pair of treatments $a$ and $a'$, with $a \in \mathcal{A}$ and $a' \in \mathcal{A}$, for every $s \in \mathcal{S}$, and for every $x$ such that $f(x, S = s) \neq 0$, we have $E[Y^a - Y^{a'} \mid X = x, S = s] = E[Y^a - Y^{a'} \mid X = x, R = 0]$.

It is easy to see that this condition, combined with conditions A1 through A3 and A5, implies that the observed data trial-specific conditional mean difference comparing treatments $a \in \mathcal{A}$ and $a' \in \mathcal{A}$, $E[Y \mid X, S = s, A = a] - E[Y \mid X, S = s, A = a']$, does not vary over $s \in \mathcal{S}$ for all $X$ values with positive density in the target population. Using this implication, we now give an identification result for the average treatment effect under the weaker condition of exchangeability in effect measure.

Theorem 4 (Identification under exchangeability in effect measure): Under conditions A1 through A3, A4″, and A5, the average treatment effect in the target population comparing treatments $a \in \mathcal{A}$ and $a' \in \mathcal{A}$, $E[Y^a - Y^{a'} \mid R = 0]$, is identifiable by the observed data functional $\rho(a, a') \equiv E[\tau(a, a'; X) \mid R = 0]$, where $\tau(a, a'; X) \equiv E[Y \mid X, S = s, A = a] - E[Y \mid X, S = s, A = a']$ does not vary over $s \in \mathcal{S}$. Furthermore, $\rho(a, a')$ can be re-expressed as $\rho(a, a') = \dfrac{1}{\Pr[R = 0]} E\big[ w(a, a'; R, X, S, A)\, Y \big]$, where

$$w(a, a'; R, X, S, A) \equiv \left[ \frac{I(R = 1, A = a)}{\Pr[A = a \mid X, S, R = 1]} - \frac{I(R = 1, A = a')}{\Pr[A = a' \mid X, S, R = 1]} \right] \frac{\Pr[R = 0 \mid X]}{\Pr[R = 1 \mid X]}.$$

The proof is given in Web Appendix C. Note in passing that positivity condition A5 in Theorem 4 can be relaxed in a way analogous to what was done in Theorem 3.

Because the potential outcome means $E[Y^a \mid R = 0]$ in the target population are often of inherent interest, in the rest of this paper we focus on estimating the identifying functionals $\psi(a)$, for $a \in \mathcal{A}$, following Theorems 1 and 2.

4. Remarks on the sampling model

We now revisit the assumed sampling scheme that we described in Section 2 and introduce some notation to help us distinguish between the population sampling model, where data are obtained by simple random sampling from a common super-population, and a biased sampling model, where data are obtained by stratified random sampling from the target population and the populations underlying the trials, with unknown and possibly variable sampling fractions.

Under a nonparametric model for the observed data, the density of the law of the observable data $O = (X, S, R, A, Y)$, using $p$ to generically denote densities, can be written as $p(r, x, s, a, y) = p(r)\, p(x \mid r)\, p(s \mid x, r)\, p(a \mid r, x, s)\, p(y \mid r, x, s, a)$. By definition, $S$ contains all the information contained in $R$, which implies that $p(a \mid r, s, x) = p(a \mid s, x)$. Furthermore, by (4), $p(y \mid r = 1, x, a) = p(y \mid r = 1, s, x, a)$; and the equivalence $\{R = 0, S = 0\} = \{S = 0\}$ gives $p(y \mid r = 0, x, a) = p(y \mid r = 0, s, x, a)$; thus, $p(y \mid r, x, a) = p(y \mid r, s, x, a)$. We conclude that, under simple random sampling, the density of the observable data would be

$$p(r, x, s, a, y) = p(r)\, p(x \mid r)\, p(s \mid x, r)\, p(a \mid x, s)\, p(y \mid r, x, a).$$

This density can be viewed as reflecting a population sampling model.

The data collection approach described in Section 2, however, induces a biased sampling model (Bickel et al., 1993). What we mean here is that the sampling fraction from the sub-population underlying each randomized trial ($S = 1, \ldots, m$) and from the target population ($R = S = 0$) is not under the control of the investigators (and is typically unknown to them) and reflects the particular circumstances (e.g., recruiting practices) of how sampling was conducted (e.g., convenience sampling is typical in randomized trials). Thus, in the data, the ratios $n_j / n$, for $j \in \{0, 1, \ldots, m\}$, do not reflect the population probabilities of belonging to the subset of the population with $S = j$, because the sampling fraction from each subset is unknown (and possibly variable between subsets). As a technical condition, we require that, as $n \to \infty$, $n_j / n \to \pi_j > 0$, for $j \in \{0, 1, \ldots, m\}$. Nevertheless, under the biased sampling model, the limiting values $\pi_j$ are not necessarily equal to the super-population probabilities under the population sampling model.

Under this more plausible sampling model, the density of the observable data can be written as

$$\begin{aligned} q(r, x, s, a, y) &= q(r)\, q(x \mid r)\, q(s \mid x, r)\, p(a \mid x, s)\, p(y \mid r, x, a) \\ &= q(r)\, \{q(x \mid r = 1)\}^{r} \{q(x \mid r = 0)\}^{1 - r}\, q(s \mid x, r)\, p(a \mid x, s)\, p(y \mid r, x, a) \\ &= q(r)\, \{q(x \mid r = 1)\}^{r} \{p(x \mid r = 0)\}^{1 - r}\, q(s \mid x, r)\, p(a \mid x, s)\, p(y \mid r, x, a). \end{aligned}$$

Here, $q(r) \neq p(r)$ and $q(s \mid x, r) \neq p(s \mid x, r)$, reflecting the fact that the sampling fractions for participants in the trials and members of the target population are not equal to the super-population probabilities under the population sampling model, but instead depend on the complex processes of randomized trial design and conduct (i.e., the biased sampling model). For the same reason, $q(x \mid r) \neq p(x \mid r)$, because in general we do not expect $q(x \mid r = 1) = p(x \mid r = 1)$; the distribution of $X$ in the collection of trials depends on the sampling from the different populations underlying the trials. Nevertheless, we expect $p(x \mid r = 0) = q(x \mid r = 0)$ because we take a simple random sample from the target population under the biased sampling model. In contrast, the terms $p(a \mid x, s)$ and $p(y \mid a, x, r)$ are the same in both densities, reflecting the property of stratified (by $S$) random sampling of the population underlying each trial or the target population, and the independence condition in (4).

The functional $\psi(a)$ in Theorems 1 and 2 only depends on components of $p(r, x, s, a, y)$ that are also identifiable under the distribution $q(r, x, s, a, y)$ induced by the biased sampling. Specifically, using the representation in (6), $\psi(a)$ depends on $p(y \mid r = 1, x, a)$ and $p(x \mid r = 0) = p(x \mid s = 0)$, both of which are identifiable under the biased sampling model.

5. Estimation and inference

We now turn our attention to the estimation of the functional $\psi(a)$, for $a \in \mathcal{A}$. The results presented in this section apply to data obtained under the population sampling model $p(r, x, s, a, y)$ and the biased sampling model $q(r, x, s, a, y)$, because $\psi(a)$ is identifiable under both. In fact, the results in Breslow et al. (2000) imply that influence functions for $\psi(a)$ under sampling from $q(r, x, s, a, y)$ are equivalent to those under sampling from $p(r, x, s, a, y)$, but with densities from $q(r, x, s, a, y)$ replacing those under $p(r, x, s, a, y)$; see Kennedy et al. (2015) for a similar argument in the context of matched cohort studies and Dahabreh et al. (2020) in the context of transporting inferences from a single trial. In the remainder of the paper, we assume that analysts will be working under $q(r, x, s, a, y)$, because the biased sampling model is more realistic for applied meta-analyses.

5.1. Proposed estimator

In Web Appendix D we show that the first-order influence function (Bickel et al., 1993) of $\psi(a)$, for $a \in \mathcal{A}$, under the nonparametric model for the observable data, is

$$\Psi^1_{q_0}(a) = \frac{1}{\pi_{q_0}} \left\{ \frac{I(R = 1, A = a)\, \Pr_{q_0}[R = 0 \mid X]}{\Pr_{q_0}[R = 1 \mid X]\, \Pr_{q_0}[A = a \mid X, R = 1]} \big( Y - E_{q_0}[Y \mid X, R = 1, A = a] \big) + I(R = 0)\big( E_{q_0}[Y \mid X, R = 1, A = a] - \psi_{q_0}(a) \big) \right\},$$

where $\pi_{q_0} = \Pr_{q_0}[R = 0]$ and the subscript $q_0$ denotes that all quantities are evaluated at the "true" data law. Specifically, we show that $\Psi^1_{q_0}(a)$ satisfies $\frac{\partial \psi_{q_t}(a)}{\partial t}\big|_{t = 0} = E\big[\Psi^1_{q_0}(a)\, u(O)\big]$, where $u(O)$ denotes the score of the observable data $O = (R, X, S, A, Y)$ and the left-hand side of the equation is the pathwise derivative of the target parameter $\psi(a)$. Theorem 4.4 in Tsiatis (2007) shows that $\Psi^1_{q_0}(a)$ lies in the tangent set; it follows (see, e.g., Van der Vaart (2000), page 363) that $\Psi^1_{q_0}(a)$ is the efficient influence function under the nonparametric model (it is in fact the unique influence function under that model). For a more thorough discussion of semiparametric efficiency theory and precise definitions of pathwise derivatives and the tangent space we refer to Chapter 25 of Van der Vaart (2000). Furthermore, in Web Appendix D we show that $\Psi^1_{q_0}(a)$ is the efficient influence function under semiparametric models incorporating the restriction $Y \perp\!\!\!\perp S \mid (X, A = a, R)$ or models in which the probability of treatment conditional on covariates and trial participation status is known (e.g., if all trials have the same randomization probabilities).

The influence function above suggests the estimator

$$\hat\psi_{aug}(a) = \frac{1}{n \hat\pi} \sum_{i=1}^{n} \left\{ I(R_i = 1, A_i = a)\, \frac{1 - \hat p(X_i)}{\hat p(X_i)\, \hat e_a(X_i)} \big\{ Y_i - \hat g_a(X_i) \big\} + I(R_i = 0)\, \hat g_a(X_i) \right\}, \qquad (8)$$

where $\hat\pi = n^{-1} \sum_{i=1}^{n} I(R_i = 0)$, $\hat g_a(X)$ is an estimator for $E[Y \mid X, R = 1, A = a]$, $\hat e_a(X)$ is an estimator for $\Pr[A = a \mid X, R = 1]$, and $\hat p(X)$ is an estimator for $\Pr[R = 1 \mid X]$. Note that $\hat\psi_{aug}(a)$ involves data on $(R = 1, X, A, Y)$ from trial participants and data on $(R = 0, X)$ from the sample of the target population; thus, treatment and outcome data from the target population are not necessary. Furthermore, it is natural to estimate the average treatment effect in the target population comparing treatments $a$ and $a'$ in $\mathcal{A}$, that is, $E[Y^a - Y^{a'} \mid R = 0]$, using the contrast estimator $\hat\delta(a, a') = \hat\psi_{aug}(a) - \hat\psi_{aug}(a')$.

In practical applications, analysts may be able to improve the performance of the estimator by normalizing the weights so that their sum equals the total number of observations in the target population sample (Dahabreh et al., 2018, 2019, 2020). This normalization, which was originally proposed for survey analyses (Hájek, 1971), may be particularly useful when $\frac{1 - \hat p(X)}{\hat p(X)\, \hat e_a(X)}$ is highly variable over the observations in the trials (Robins et al., 2007).
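
To make the construction concrete, the following is a minimal sketch of the estimator in equation (8) with parametric working models and an optional Hájek-style normalization; it is our illustration, not the authors' code. It assumes numpy arrays X (covariates), R (trial participation indicator), A (treatment), and Y (outcomes; values for target-population rows are not used and may be set to zero), and it fits a single logistic regression for the probability of treatment across all trials, which is appropriate only when the randomization ratio does not vary across trials (Section 6.2 describes an alternative).

```python
# A minimal sketch of the augmented estimator in equation (8); not the authors' code.
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

def psi_aug(X, R, A, Y, a, normalize=False):
    R, A, Y = np.asarray(R), np.asarray(A), np.asarray(Y)
    trial = R == 1
    arm = trial & (A == a)

    # p_hat(X): working model for Pr[R = 1 | X], fit on all observations
    p_hat = LogisticRegression(max_iter=1000).fit(X, R).predict_proba(X)[:, 1]
    # e_hat_a(X): working model for Pr[A = a | X, R = 1], fit among trial participants
    e_hat = (LogisticRegression(max_iter=1000)
             .fit(X[trial], (A[trial] == a).astype(int))
             .predict_proba(X)[:, 1])
    # g_hat_a(X): working model for E[Y | X, R = 1, A = a], fit in the A = a trial arm
    g_hat = LinearRegression().fit(X[arm], Y[arm]).predict(X)

    n0 = (R == 0).sum()
    w = np.where(arm, (1 - p_hat) / (p_hat * e_hat), 0.0)  # inverse-odds weights
    if normalize:
        w = w * n0 / w.sum()                                # Hajek-style normalization
    return (w * (Y - g_hat) + (R == 0) * g_hat).sum() / n0
```

The average treatment effect contrast can then be estimated as, for example, psi_aug(X, R, A, Y, a=1) - psi_aug(X, R, A, Y, a=0).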

Asymptotic properties of estimators for potential outcome means: Let $g_a^*(X)$, $e_a^*(X)$, and $p^*(X)$ denote the asymptotic limits (assumed to exist) of $\hat g_a(X)$, $\hat e_a(X)$, and $\hat p(X)$, respectively. Finally, define $\hat\gamma = \big( n^{-1} \sum_{i=1}^{n} I(R_i = 0) \big)^{-1} = \hat\pi^{-1}$ as an estimator for $\gamma^* = \Pr[R = 0]^{-1}$.

For general functions $\gamma$, $g_a(X)$, $e_a(X)$, and $p(X)$, define $$H\big(\gamma, g_a(X), e_a(X), p(X)\big) = \gamma \left[ I(R = 0)\, g_a(X) + I(R = 1, A = a)\, \frac{1 - p(X)}{p(X)\, e_a(X)} \big\{ Y - g_a(X) \big\} \right].$$ Using notation from van der Vaart and Wellner (1996), define $\mathbb{P}_n(v(W)) = n^{-1} \sum_{i=1}^{n} v(W_i)$ and $\mathbb{G}_n(v(W)) = \sqrt{n}\, \big\{ \mathbb{P}_n(v(W)) - E[v(W)] \big\}$, for some function $v$ and a random variable $W$. Using this notation, $\hat\psi_{aug}(a) = \mathbb{P}_n\big( H(\hat\gamma, \hat g_a(X), \hat e_a(X), \hat p(X)) \big)$.

To establish the asymptotic properties of $\hat\psi_{aug}(a)$, we make the following assumptions:

  1. (i) The sequence $H(\hat\gamma, \hat g_a(X), \hat e_a(X), \hat p(X))$ and its limit $H(\gamma^*, g_a^*(X), e_a^*(X), p^*(X))$ fall in a Donsker class (van der Vaart and Wellner, 1996).

  2. (ii) $\big\| H(\hat\gamma, \hat g_a(X), \hat e_a(X), \hat p(X)) - H(\gamma^*, g_a^*(X), e_a^*(X), p^*(X)) \big\|_2 \xrightarrow{a.s.} 0$.

  3. (iii) $E\big[ H(\gamma^*, g_a^*(X), e_a^*(X), p^*(X))^2 \big] < \infty$.

  4. (iv) At least one of the following two assumptions holds: (a) $\hat p(X) \xrightarrow{a.s.} p^*(X) = \Pr[R = 1 \mid X]$ and $\hat e_a(X) \xrightarrow{a.s.} e_a^*(X) = \Pr[A = a \mid X, R = 1]$; or (b) $\hat g_a(X) \xrightarrow{a.s.} g_a^*(X) = E[Y \mid X, R = 1, A = a]$, $p^*(X) \geq \varepsilon$, and $e_a^*(X) \geq \varepsilon$ almost surely, for some $\varepsilon > 0$.

The following Theorem gives the asymptotic properties of ψ^aug(a); a detailed proof is given in Web Appendix E.

Theorem 5: If assumptions (i) through (iv) hold, then

(1) $\hat\psi_{aug}(a) \xrightarrow{a.s.} \psi(a)$; and

(2) ψ^aug(a) has the asymptotic representation

$$\sqrt{n}\, \big( \hat\psi_{aug}(a) - \psi(a) \big) = \mathbb{G}_n\big( H(\gamma^*, g_a^*(X), e_a^*(X), p^*(X)) \big) + \mathrm{Rem} + o_P(1), \qquad (9)$$

where $\mathbb{G}_n\big( H(\gamma^*, g_a^*(X), e_a^*(X), p^*(X)) \big)$ is asymptotically normal and

$$\mathrm{Rem} = \sqrt{n}\, O_P\Big( \big[ \big\| \Pr[R = 1 \mid X] - \hat p(X) \big\|_2 + \big\| \Pr[A = a \mid X, R = 1] - \hat e_a(X) \big\|_2 \big] \times \big\| \hat g_a(X) - E[Y \mid X, R = 1, A = a] \big\|_2 \Big). \qquad (10)$$

If $\hat g_a(X)$, $\hat p(X)$, $\hat e_a(X)$ and $g_a^*(X)$, $p^*(X)$, $e_a^*(X)$ are Donsker; $p^*(X)$ and $e_a^*(X)$ are uniformly bounded away from zero – assumption (iv); and $g_a^*(X)$ and $Y$ are uniformly bounded, then $H(\hat\gamma, \hat g_a(X), \hat e_a(X), \hat p(X))$ and its limit are Donsker (Kosorok, 2008).

Assumptions (i), (ii), and (iii) are standard assumptions used to show asymptotic normality of M-estimators (Van der Vaart, 2000). Assumption (i) restricts the flexibility of the models used to estimate the nuisance parameters (and their corresponding limits). But it still covers a wide range of commonly used estimators, such as parametric models, Lipschitz classes, and VC classes (van der Vaart and Wellner, 1996). For data-adaptive estimators, Donsker assumptions can be relaxed using sample splitting (Robins et al., 2008).

Assumption (iv) indicates that the estimator $\hat\psi_{aug}(a)$ is doubly robust, in the sense that it converges almost surely to $\psi(a)$ when either (1) the model for the conditional outcome mean $E[Y \mid X, R = 1, A = a]$ is correctly specified, so that $\hat g_a(X)$ converges to the true conditional expectation almost surely; or (2) the model for the probability of participation in any trial, $\Pr[R = 1 \mid X]$, and the model for the probability of treatment assignment, $\Pr[A = a \mid X, R = 1]$, are correctly specified, so that $\hat p(X)$ and $\hat e_a(X)$ converge to the true conditional probabilities almost surely. When the treatment assignment mechanism is the same for all trials, $\Pr[A = a \mid X, R = 1]$ is known; and when the treatment assignment mechanism only depends on a few categorical covariates, $\Pr[A = a \mid X, R = 1]$ is easy to estimate consistently. But when the treatment assignment mechanism depends on continuous covariates, as would be the case in pooled analyses of observational studies, estimating $\Pr[A = a \mid X, R = 1]$ may be more challenging.

The asymptotic representation in (9) gives several useful insights into how estimation of the nuisance parameters affects the asymptotic distribution of $\hat\psi_{aug}(a)$. By the central limit theorem, the first term on the right-hand side of (9) is asymptotically normal. Hence, the asymptotic distribution of the estimator depends on the behavior of the term given in equation (10). If both the pair of nuisance estimators $\hat p(X)$ and $\hat g_a(X)$ and the pair of nuisance estimators $\hat e_a(X)$ and $\hat g_a(X)$ converge at a combined rate fast enough that the term in (10) is $o_P(1)$, then the estimator is $\sqrt{n}$-consistent, asymptotically normal, and has asymptotic variance equal to the variance of the efficient influence function.

The asymptotic representation gives the rate of convergence result

$$\hat\psi_{aug}(a) - \psi(a) = O_P\left( \frac{1}{\sqrt{n}} + \big[ \big\| \Pr[R = 1 \mid X] - \hat p(X) \big\|_2 + \big\| \Pr[A = a \mid X, R = 1] - \hat e_a(X) \big\|_2 \big] \times \big\| \hat g_a(X) - E[Y \mid X, R = 1, A = a] \big\|_2 \right).$$

Thus, if both the combined rate of convergence of $\hat p(X)$ and $\hat g_a(X)$ and the combined rate of convergence of $\hat e_a(X)$ and $\hat g_a(X)$ are at least $\sqrt{n}$, then $\hat\psi_{aug}(a)$ is $\sqrt{n}$-consistent. For example, this holds when each nuisance estimator converges at the $n^{1/4}$ rate or faster.

5.2. Inference

To construct Wald-style confidence intervals for $\psi(a)$ when using parametric models, we can easily obtain the sandwich estimator (Stefanski and Boos, 2002) of the sampling variance of $\hat\psi_{aug}(a)$. Alternatively, we can use the non-parametric bootstrap. If the remainder term in equation (9) of Theorem 5 is $o_P(1)$, then the estimator $\hat\psi_{aug}(a)$ is asymptotically normally distributed. The asymptotic normality, combined with assumption (iii), ensures that confidence intervals calculated using the non-parametric bootstrap asymptotically attain the correct coverage rate.
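
For illustration, a minimal sketch of percentile bootstrap intervals, reusing the psi_aug() sketch from Section 5.1 (function names and defaults are ours):

```python
# A sketch of nonparametric bootstrap percentile intervals for psi(a).
import numpy as np

def bootstrap_ci(X, R, A, Y, a, n_boot=1000, alpha=0.05, seed=0):
    rng = np.random.default_rng(seed)
    n = len(R)
    estimates = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)   # resample individuals with replacement
        estimates[b] = psi_aug(X[idx], R[idx], A[idx], Y[idx], a)
    return np.quantile(estimates, [alpha / 2, 1 - alpha / 2])
```

Resampling the pooled data is the simplest choice; resampling separately within each stratum of S (each trial and the target sample) is an alternative that preserves the observed stratum sizes.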

6. Simulation study

We performed simulation studies to evaluate the finite sample performance of the augmented estimators described in the previous section and compare them with alternative approaches.

6.1. Data generation

The data generation process involved six steps: generation of covariates, selection for trial participation, sampling of individuals from the target population, allocation of trial participants to specific trials, random treatment assignment, and potential/observed outcomes. Simulations contained either n=10,000 or n=100,000 individuals, including both trial participants and the sample of the target population.

  1. Covariates: Three covariates $(X_1, X_2, X_3)$ for each individual in the target population were drawn from a mean-zero multivariate normal distribution with all marginal variances equal to 1 and all pairwise correlations equal to 0.5.

  2. Selection for trial participation: We considered three trials, $\mathcal{S} = \{1, 2, 3\}$, a reasonable number given the requirement that all trials have examined the same treatments. We examined scenarios where the total number of trial participants, $\sum_{s=1}^{3} n_s$, was 1,000, 2,000, or 5,000. We "selected" observations for participation in any trial using a logistic-linear model, $R \sim \mathrm{Bernoulli}(\Pr[R = 1 \mid X])$ with $\Pr[R = 1 \mid X] = \dfrac{\exp(\beta X^{T})}{1 + \exp(\beta X^{T})}$, $X = (1, X_1, \ldots, X_3)$, $\beta = (\beta_0, \ln(2), \ln(2), \ln(2))$, where we solved for $\beta_0$ to obtain (on average) the desired total number of trial participants given the total cohort sample size (exact numerical values of $\beta_0$ for all scenarios are available in the code to reproduce the simulations in Web Appendix H).

  3. Sampling of individuals from the target population: We used baseline covariate data from all remaining non-randomized individuals in the sample, $n_0 = n - \sum_{s=1}^{3} n_s$; this corresponds to taking a census of the non-randomized individuals in the simulated cohort and treating them as the sample from the target population.

  4. Allocation of trial participants to specific trials: We allocated trial participants ($R = 1$) to one of the three randomized trials in $\mathcal{S}$ using a multinomial logistic model, $S \mid (X, R = 1) \sim \mathrm{Multinomial}\big(p_1, p_2, p_3; \sum_{s=1}^{3} n_s\big)$, with $p_1 = \Pr[S = 1 \mid X, R = 1] = 1 - p_2 - p_3$, $p_2 = \Pr[S = 2 \mid X, R = 1] = \dfrac{e^{\xi X^{T}}}{1 + e^{\xi X^{T}} + e^{\zeta X^{T}}}$, and $p_3 = \Pr[S = 3 \mid X, R = 1] = \dfrac{e^{\zeta X^{T}}}{1 + e^{\xi X^{T}} + e^{\zeta X^{T}}}$, where $\xi = (\xi_0, \ln(1.5), \ln(1.5), \ln(1.5))$ and $\zeta = (\zeta_0, \ln(0.75), \ln(0.75), \ln(0.75))$. We evaluated a scenario in which the trials had the same sample size and one in which the sample size varied across trials. We used Monte Carlo methods to obtain intercepts $\xi_0$ and $\zeta_0$ that resulted in approximately equal-sized trials or in unequal-sized trials with a 4:2:1 ratio of sample sizes (exact numerical values of $\xi_0$ and $\zeta_0$ for all scenarios are available in the code to reproduce the simulations) (Robertson et al., 2021).

  5. Random treatment assignment: We generated an indicator of unconditionally randomized treatment assignment, $A$, among randomized individuals. In one scenario the treatment assignment mechanism was marginally randomized and constant across trials, $A \sim \mathrm{Bernoulli}(\Pr[A = 1 \mid S = s])$ with $\Pr[A = 1 \mid S = s] = 1/2$. In the second scenario, the treatment assignment mechanism was marginally randomized but varied across trials, with probabilities $\Pr[A = 1 \mid S = 1] = 1/2$, $\Pr[A = 1 \mid S = 2] = 1/3$, and $\Pr[A = 1 \mid S = 3] = 2/3$.

  6. Outcomes: We generated potential outcomes as $Y^a = \theta_a X^{T} + \epsilon_a$, for $a \in \{0, 1\}$, with $\theta_0 = (1.5, 1, 1, 1)$ and $\theta_1 = (0.5, 1, 1, 1)$. In all simulations, $\epsilon_a$ had an independent standard normal distribution for $a = 0, 1$. We generated observed outcomes under consistency, such that $Y = A Y^1 + (1 - A) Y^0$.
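
The following is a minimal sketch of this data-generating process for one scenario ($n = 10{,}000$; randomization ratios varying across trials). The intercepts B0, XI0, and ZETA0 are illustrative placeholders, not the numerically solved values used in the paper.

```python
# A sketch of the simulation's data-generating steps; intercepts are placeholders.
import numpy as np

rng = np.random.default_rng(2023)
n = 10_000
B0, XI0, ZETA0 = -2.8, 0.0, 0.0   # hypothetical intercepts (solved numerically in the paper)

# Step 1: covariates from a mean-zero MVN with unit variances and correlation 0.5
cov = np.full((3, 3), 0.5)
np.fill_diagonal(cov, 1.0)
X = rng.multivariate_normal(np.zeros(3), cov, size=n)

# Step 2: selection into any trial (R = 1) via a logistic-linear model
lin = B0 + X @ np.log([2.0, 2.0, 2.0])
R = rng.binomial(1, 1.0 / (1.0 + np.exp(-lin)))
trial = R == 1

# Step 3: the target population sample is every non-randomized individual (R = 0)

# Step 4: allocation of trial participants to trials S in {1, 2, 3}
eta2 = XI0 + X @ np.log([1.5, 1.5, 1.5])
eta3 = ZETA0 + X @ np.log([0.75, 0.75, 0.75])
den = 1.0 + np.exp(eta2) + np.exp(eta3)
probs = np.column_stack([1.0 / den, np.exp(eta2) / den, np.exp(eta3) / den])
S = np.zeros(n, dtype=int)
S[trial] = [1 + rng.choice(3, p=p) for p in probs[trial]]

# Step 5: randomized treatment; probabilities vary by trial (second scenario)
pr_a = np.select([S == 1, S == 2, S == 3], [1/2, 1/3, 2/3], default=0.0)
A = rng.binomial(1, pr_a)                         # A is unused for R = 0 rows

# Step 6: potential outcomes and observed outcomes under consistency
X1 = np.column_stack([np.ones(n), X])
Y0 = X1 @ np.array([1.5, 1.0, 1.0, 1.0]) + rng.standard_normal(n)
Y1 = X1 @ np.array([0.5, 1.0, 1.0, 1.0]) + rng.standard_normal(n)
Y = np.where(trial, A * Y1 + (1 - A) * Y0, 0.0)   # outcomes are used only for R = 1
```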

6.2. Methods implemented and comparisons

Estimators:

In each simulated dataset, we applied the estimators from Section 5. Heuristically, the identification results in Theorems 1 and 2 suggest two non-augmented (and non-doubly robust) estimators: the g-formula estimator,

$$\hat\psi_{g}(a) = \left\{ \sum_{i=1}^{n} I(R_i = 0) \right\}^{-1} \sum_{i=1}^{n} I(R_i = 0)\, \hat g_a(X_i),$$

and the weighting estimator

$$\hat\psi_{w}(a) = \left\{ \sum_{i=1}^{n} I(R_i = 0) \right\}^{-1} \sum_{i=1}^{n} I(R_i = 1, A_i = a)\, \frac{1 - \hat p(X_i)}{\hat p(X_i)\, \hat e_a(X_i)}\, Y_i,$$

where $\hat g_a(X)$, $\hat p(X)$, and $\hat e_a(X)$ were previously defined. The estimators $\hat\psi_{g}(a)$ and $\hat\psi_{w}(a)$ can be viewed as special cases of the augmented estimator $\hat\psi_{aug}(a)$: we obtain $\hat\psi_{g}(a)$ by identically setting $\hat p(X)$ to one, and we obtain $\hat\psi_{w}(a)$ by identically setting $\hat g_a(X)$ to zero, in equation (8). It follows that $\hat\psi_{g}(a)$ and $\hat\psi_{w}(a)$ are not robust to misspecification of the models used to estimate $\hat g_a(X)$ or $\hat p(X) \times \hat e_a(X)$, respectively (Dahabreh et al., 2020).
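
For completeness, a minimal sketch of the two non-augmented estimators, assuming the fitted values g_hat, p_hat, and e_hat are available as numpy arrays (as in the psi_aug() sketch of Section 5.1; names are ours, not the authors' code):

```python
# Sketches of the g-formula and weighting estimators; not the authors' code.
import numpy as np

def psi_g(R, g_hat):
    """Outcome-model (g-formula) estimator: average of g_hat over the target sample."""
    return g_hat[R == 0].mean()

def psi_w(R, A, Y, a, p_hat, e_hat):
    """Inverse-odds weighting estimator using outcome data from trial participants only."""
    arm = (R == 1) & (A == a)
    w = np.where(arm, (1 - p_hat) / (p_hat * e_hat), 0.0)
    return (w * np.where(arm, Y, 0.0)).sum() / (R == 0).sum()
```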

Model specification:

All working models included main effects for the observable covariates $X_j$, $j = 1, 2, 3$. Models for the probability of participation in any trial and models for the probability of treatment included main effects for all covariates. We used logistic regression to model the probability of participating in any trial ($R = 1$ vs. $R = 0$). In our simulation scenarios, when the probability of treatment (the randomization ratio) varies across trials, a logistic regression model of treatment with main effects of the covariates (and no adjustment for trial $S$) is not correctly specified. To avoid misspecification of the model for the probability of treatment, we used the fact that $\Pr[A = a \mid X, R = 1] = \sum_{s=1}^{m} \Pr[A = a \mid X, S = s] \Pr[S = s \mid X, R = 1]$, and modeled the probability of treatment in each trial, $\Pr[A = a \mid X, S = s]$, using logistic regression (separately in each trial) and the probability of participation in a particular trial among the collection of trials, $\Pr[S = s \mid X, R = 1]$, using a multinomial logistic regression model (in the collection of trials). We refer to this approach as "averaging the trial-specific treatment probabilities." For comparison, in the Appendix we report results using a (misspecified) logistic regression model with main effects only. In our simulation setup, when the randomization ratio varies across trials, use of this misspecified model is expected to produce some bias for $\hat\psi_{w}(a)$, should not affect $\hat\psi_{g}(a)$ at all, and should affect only the precision, but not induce bias, for $\hat\psi_{aug}(a)$ when the outcome model is correctly specified. Outcome models were fit separately in each treatment group using linear regression, allowing effect modification by all covariates.
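
A minimal sketch of this averaging approach, assuming numpy arrays X, S (0 for the target sample, 1 through m for the trials), and A (names and model choices are ours, not the authors' code):

```python
# A sketch of "averaging the trial-specific treatment probabilities".
import numpy as np
from sklearn.linear_model import LogisticRegression

def treatment_prob_averaged(X, S, A, a):
    """Estimate Pr[A=a | X, R=1] = sum_s Pr[A=a | X, S=s] * Pr[S=s | X, R=1]."""
    trial = S > 0
    # Multinomial logistic regression for Pr[S = s | X, R = 1] among trial participants
    s_model = LogisticRegression(max_iter=1000).fit(X[trial], S[trial])
    s_probs = s_model.predict_proba(X)        # columns ordered as s_model.classes_
    e_hat = np.zeros(len(X))
    for j, s in enumerate(s_model.classes_):
        in_s = S == s
        # Logistic regression for Pr[A = a | X, S = s] within trial s
        a_model = LogisticRegression(max_iter=1000).fit(X[in_s], (A[in_s] == a).astype(int))
        e_hat += s_probs[:, j] * a_model.predict_proba(X)[:, 1]
    return e_hat
```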

Comparisons:

We compared the bias and variance of estimators over 10,000 runs for each scenario and each treatment.

6.3. Simulation results

Tables 1 and 2 present selected results from our simulation study for scenarios with $n = 10{,}000$ or $n = 100{,}000$ and total trial sample size, $\sum_{s=1}^{3} n_s$, of 2,000 individuals, when averaging the trial-specific treatment probabilities; complete simulation results are presented in Web Appendix F, including results for modeling the probability of treatment using a single logistic regression fit across all trials. Bias estimates for $\hat\psi_{aug}(a)$ and $\hat\psi_{g}(a)$ were near-zero in all scenarios, regardless of how the probability of treatment was modeled; $\hat\psi_{w}(a)$ also had near-zero bias except when the randomization ratio varied across trials and the probability of treatment was modeled using a single logistic regression fit across all trials (see Appendix Table S3). The low bias of the g-formula estimator $\hat\psi_{g}(a)$ and the weighting estimator $\hat\psi_{w}(a)$ (when the model for the probability of treatment was correctly specified) indirectly verifies the double robustness property of the augmented estimator $\hat\psi_{aug}(a)$; this is also supported by the near-zero bias of $\hat\psi_{aug}(a)$ even when the randomization ratio varied across trials and the model for the probability of treatment was misspecified (Appendix Table S3). Across all scenarios, the sampling variance of the augmented estimator $\hat\psi_{aug}(a)$ was larger than that of the outcome model-based estimator $\hat\psi_{g}(a)$, but substantially smaller than that of the weighting estimator $\hat\psi_{w}(a)$.

Table 1:

Bias estimates based on 10,000 simulation runs; selected sample size scenarios; the probability of treatment in the collection of trials was estimated by averaging the trial-specific treatment probabilities.

a n Σs ns Balanced TxAM varies ψ^aug(a) ψ^g(a) ψ^w(a)
1 10000 2000 Yes No 0.0022 0.0003 −0.0011
1 10000 2000 Yes Yes 0.0004 −0.0003 −0.0026
1 10000 2000 No No −0.0006 −0.0001 −0.0082
1 10000 2000 No Yes 0.0011 0.0002 −0.0067
1 100000 2000 Yes No −0.0003 0.0002 −0.0104
1 100000 2000 Yes Yes −0.0002 0.0008 0.0102
1 100000 2000 No No −0.0012 −0.0002 −0.0074
1 100000 2000 No Yes 0.0009 0.0013 0.0039
0 10000 2000 Yes No 0.0001 0.0008 0.0083
0 10000 2000 Yes Yes 0.0006 0.0004 0.0060
0 10000 2000 No No 0.0010 −0.0003 0.0002
0 10000 2000 No Yes 0.0002 0.0005 0.0106
0 100000 2000 Yes No 0.0013 0.0005 0.0016
0 100000 2000 Yes Yes −0.0009 −0.0000 0.0034
0 100000 2000 No No −0.0009 −0.0003 0.0064
0 100000 2000 No Yes 0.0012 0.0006 0.0109

In the column titled Balanced, Yes denotes scenarios in which the trials had on average equal sample sizes; No denotes scenarios with unequal trial sample sizes. In the column titled TxAM varies, Yes denotes scenarios in which the treatment assignment mechanism varied across trials; No denotes scenarios in which the mechanism did not vary.

Table 2:

Variance estimates based on 10,000 simulation runs; the probability of treatment in the collection of trials was estimated by averaging the trial-specific treatment probabilities.

a n Σs ns Balanced TxAM varies ψ^aug(a) ψ^g(a) ψ^w(a)
1 10000 2000 Yes No 0.0105 0.0038 0.2893
1 10000 2000 Yes Yes 0.0084 0.0035 0.1945
1 10000 2000 No No 0.0101 0.0038 0.2167
1 10000 2000 No Yes 0.0088 0.0038 0.1925
1 100000 2000 Yes No 0.0144 0.0039 0.3316
1 100000 2000 Yes Yes 0.0120 0.0035 0.3610
1 100000 2000 No No 0.0143 0.0039 0.4028
1 100000 2000 No Yes 0.0128 0.0037 0.2925
0 10000 2000 Yes No 0.0103 0.0039 0.1165
0 10000 2000 Yes Yes 0.0131 0.0042 0.1509
0 10000 2000 No No 0.0101 0.0039 0.1136
0 10000 2000 No Yes 0.0118 0.0040 0.1551
0 100000 2000 Yes No 0.0151 0.0039 0.1873
0 100000 2000 Yes Yes 0.0198 0.0043 0.2699
0 100000 2000 No No 0.0146 0.0038 0.1663
0 100000 2000 No Yes 0.0218 0.0040 0.3243

In the column titled Balanced, Yes denotes scenarios in which the trials had on average equal sample sizes; No denotes scenarios with unequal trial sample sizes. In the column titled TxAM varies, Yes denotes scenarios in which the treatment assignment mechanism varied across trials; No denotes scenarios in which the mechanism did not vary.

7. Application of the methods to the HALT-C trial

7.1. Using the HALT-C trial to emulate meta-analyses

The HALT-C trial data:

The HALT-C trial (Di Bisceglie et al., 2008) enrolled 1050 patients with chronic hepatitis C and advanced fibrosis who had not responded to previous therapy and randomized them to treatment with peginterferon alfa-2a ($a = 1$) versus no treatment ($a = 0$). Patients were enrolled at 10 research centers and followed up every 3 months after randomization. Here, we used the secondary outcome of platelet count at 9 months of follow-up as the outcome of interest; we report all outcome measurements as platelets × 10³/ml. We used the following baseline covariates: baseline platelet count, age, sex, previous use of pegylated interferon, race, white blood cell count, history of injected recreational drugs, ever received a transfusion, body mass index, creatinine levels, smoking status, previous use of combination therapy (interferon and ribavirin), diabetes, serum ferritin, hemoglobin, aspartate aminotransferase levels, ultrasound evidence of splenomegaly, and ever drinking alcohol. For simplicity, we restricted our analyses to patients with complete baseline covariate and outcome data (n = 948).

Using the HALT-C trial to emulate a meta-analysis and evaluate the proposed methods:

We treated observations from the largest center with complete data in the HALT-C trial, with sample size $n_0 = 199$, as a sample from the target population, $S = R = 0$. We then used the data from the remaining 9 centers as a collection of "trials," $\mathcal{S} = \{1, \ldots, 9\}$, with a total sample size of $\sum_{j=1}^{9} n_j = 749$. Appendix Table S5 summarizes covariate information stratified by $R$; Appendix Tables S6 and S7 summarize covariate information stratified by $S$. Because HALT-C used 1:1 randomized allocation, all trials had the same treatment assignment mechanism. Our goal was to transport causal inferences from the nine trials to the population represented by the target center. Because treatment was also randomized in the target center ($R = 0$), we could empirically benchmark the proposed methods, using the unadjusted estimate of the average treatment effect in the target population as the reference standard. In view of how we configured the data for this illustrative analysis, our results should not be interpreted clinically.

Methods implemented and comparisons:

We implemented the augmented estimator ψ^aug(a), along with the g-formula estimator ψ^g(a) and the weighting estimator ψ^w(a), defined in Section 6.2. These estimators use covariate, treatment, and outcome data from the collection of trials 𝒮 but only covariate data from the sample of the target population R=0. In these analyses we specified linear regression models for the conditional expectation of the outcome in each treatment group, and logistic regression models for the probability of trial participation (in any trial) and the probability of treatment among randomized individuals. All models included main effects for the baseline covariates listed above. For comparison, we also used an unadjusted linear regression analysis to estimate the potential outcome means and average treatment effects in R=0 only; this analysis is justified by randomization and uses treatment and outcome data from the target population.

To provide an assessment of the observed data independence condition in equation (5), we used analysis of covariance (ANCOVA) to compare a linear regression model that included the main effects of baseline covariates, treatment, and all possible product terms between baseline covariates and treatment, against a linear regression model that additionally included the main effect of the trial indicators and all product terms between baseline covariates, treatment, and the trial indicators.
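
A minimal sketch of this model comparison, assuming a pandas DataFrame df of trial participants with outcome Y, treatment A, trial indicator S, and baseline covariates listed in covs (illustrative names, not the actual HALT-C variable names):

```python
# A sketch of the ANCOVA-style comparison of nested outcome models.
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

def ancova_trial_check(df, covs):
    """Compare outcome models with and without trial terms; see display (5)."""
    xterms = " + ".join(covs)
    reduced = smf.ols(f"Y ~ A * ({xterms})", data=df).fit()       # covariates, treatment, and their products
    full = smf.ols(f"Y ~ C(S) * A * ({xterms})", data=df).fit()   # adds trial indicators and their products
    return anova_lm(reduced, full)                                 # F-test for the added trial terms
```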

7.2. Results in the emulated meta-analysis

Table 3 summarizes results from the emulation of meta-analyses using HALT-C trial data. All transportability estimators, which only use baseline covariate data from R=0 and covariate, treatment, and outcome data from R=1, produced estimates that were very similar to the benchmark estimates from the unadjusted estimator that uses treatment and outcome data from the target center R=0. The three transportability estimators produced similar point estimates, which suggests that the transported inferences are not driven by model specification choices. The three transportability estimators also produced estimates similar to those of the benchmark unadjusted estimator in R=0, which suggests (but does not prove) that the identifiability conditions needed for the different transportability estimators hold, at least approximately, and that model specification is approximately correct. The ANCOVA p-value of 0.425 did not indicate gross violations of the observed data implication in display (5).

Table 3:

Results from analyses using the HALT-C trial data.

Estimator a=1 a=0 Mean difference
R=0 (benchmark) 121.6 (111.0, 133.3) 164.4 (151.2, 178.1) −42.8 (−60.2, −25.7)
ψ^aug(a) 124.5 (116.3, 133.5) 167.1 (157.3, 177.7) −42.6 (−51.2, −34.4)
ψ^g(a) 124.6 (116.6, 133.0) 167.1 (157.3, 177.3) −42.5 (−50.5, −34.9)
ψ^w(a) 123.2 (113.6, 136.0) 165.9 (152.6, 182.9) −42.7 (−60.1, −26.2)

R=0 denotes analyses using data only from the sample of the target population. Numbers in parentheses are 95% quantile-based confidence intervals from 10,000 bootstrap samples; 95% normal confidence intervals from the sandwich estimator were similar.

8. Discussion

We have described methods to transport causal inferences from a collection of randomized trials to a target population in which baseline covariate data are available but no experimentation has been conducted. Our results provide a solution to one of the leading problems in the field of “evidence synthesis”: how to use a body of evidence consisting of multiple randomized trials to draw causal inferences about the effects of the interventions investigated in the trials for a new target population of substantive interest.

Traditional approaches to evidence synthesis, including meta-analysis, focus on modeling aspects of the distribution of study effects or producing statistically optimal summaries of available data (Higgins et al., 2009; Rice et al., 2018). In general, such analyses do not produce causally interpretable estimates of the effect of well-defined interventions on any target population because the contributions of individual studies to the meta-analysis summary are weighted by the precision of the study-specific estimates, without considering the relevance of the studies to any target population or the strong assumptions needed to combine information across studies to estimate treatment effects.

In contrast, our approach explicitly targets a well-defined population chosen on scientific or policy grounds, which may be different from the populations underlying the trials. In our experience, policy-makers who use evidence syntheses to inform their decisions are not interested in the populations represented by the completed trials and almost always have a different target population in mind. Thus, the statistical methods we propose can be viewed as a form of causally interpretable meta-analysis, consistent with the conceptual framework outlined in Sobel et al. (2017). An important aspect of our approach is the specification of the target population and the use of sample data from that population to estimate treatment effects. In practical applications, obtaining data from a target population can be challenging (Barker et al., 2021), but we expect that this challenge will become easier to address with the increasing availability of administrative and registry data from populations of clinical or policy relevance. Another important aspect of our approach is the explicit statement of the assumptions needed for the causal interpretation of estimates produced by the estimators we propose. Though the assumptions are fairly strong even for the simple case we study here (point treatments with complete adherence and outcomes assessed at the end of the study), our approach naturally connects with the broad literature on causal inference, and thus extensions to address time-varying treatments, incomplete adherence, and longitudinal or failure-time outcomes are possible. We believe that making assumptions explicit allows reasoning about them and can be the basis of future work on sensitivity analyses. Future work may also address differential covariate measurement error in the randomized trials and the sample of the target population, systematically missing data (e.g., when some covariates are not collected in some trials), and variable selection methods (e.g., methods incorporating the independence condition in display (4)).

Our methods are a generalization of methods for extending inferences from a single trial to a target population (e.g., Cole and Stuart (2010); Dahabreh et al. (2018, 2020); Rudolph and van der Laan (2017)). They also relate to the growing literature on matching-adjusted indirect comparison (e.g., Signorovitch et al. (2010, 2012); Cheng et al. (2020); Jackson et al. (2021)). This literature has focused on the common situation where data on the target population are only available in summary form (e.g., estimates of the first moments of the covariates in X), typically does not use explicit causal (counterfactual) notation (with some recent exceptions, e.g., Cheng et al. (2020)), relies heavily on parametric assumptions (or other investigator-imposed constraints), and only applies to a single source trial (not multiple trials in meta-analysis) (Jackson et al., 2021). Our identification results suggest obvious extensions of matching-adjusted indirect comparison methods to the multiple-trial case, provided investigators are willing to make the necessary causal assumptions. Conversely, methods from the literature on matching-adjusted indirect comparison and related matching methods (Jackson et al., 2021) can be used to relax the requirement of individual participant data from the target population inherent in our approach (at the cost of reliance on additional parametric assumptions or reduced efficiency). It may also be interesting to combine our approach with recent extensions of meta-regression methods to incorporate matching-adjusted indirect comparisons (Phillippo et al., 2020).

We caution that our methods will be most useful when the available trials are quite similar (e.g., in terms of treatments, outcomes, and follow-up protocols), as may be the case in prospective research programs investigating a particular intervention or close replication efforts. It should come as no surprise that a strong degree of similarity is needed in order for a "pooled" analysis to produce estimates with a clear causal interpretation with respect to a particular target population; after all, our approach focuses on differences in participant characteristics across the populations underlying the trials and the target population, not variations in treatment. Extensions of our approach to address variations in the randomized treatments, as well as non-randomized co-interventions, would be valuable. We suggest, however, that summarizing bodies of evidence much larger and more diverse than a few closely related studies, particularly when individual participant data are only available from a minority of sources, is best undertaken with a descriptive or predictive attitude, focusing on modeling the response surface of study effects (Rubin, 1992), rather than on obtaining a single causally interpretable estimate. That said, we sincerely hope that future work will find ways to improve on this rather unambitious view.

Acknowledgments

This work was supported in part by Patient-Centered Outcomes Research Institute (PCORI) awards ME-1306-03758, ME-1502-27794, and ME-2019C3-17875; National Institutes of Health (NIH) grant R37 AI102634; and Agency for Healthcare Research and Quality (AHRQ) National Research Service Award T32AGHS00001. The content is solely the responsibility of the authors and does not necessarily represent the official views of PCORI, its Board of Governors, the PCORI Methodology Committee, NIH, or AHRQ.

The data analyses in our paper used HALT-C data obtained from the National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK) Central Repositories (https://repository.niddk.nih.gov/studies/halt-c/; last accessed: January 18, 2021). This paper does not necessarily reflect the opinions or views of the HALT-C investigators, the NIDDK Central Repositories, or the NIDDK.

Footnotes

Supplementary Materials

Web Appendices A through G are available with this paper at the Biometrics website on Wiley Online Library. A copy of the Web Appendices, current as of February 4, 2022, is also available here: web link.

Code to reproduce our simulations and the HALT-C analyses, as well as an artificial data-set to illustrate the application of the methods are provided on GitHub: https://github.com/serobertson/EfficientCausalMetaAnalysis.

References

1. Barker DH, Dahabreh IJ, Steingrimsson JA, Houck C, Donenberg G, DiClemente R, and Brown LK (2021). Causally interpretable meta-analysis: Application in adolescent HIV prevention. Prevention Science pages 1–12.
2. Bickel PJ, Klaassen CA, Wellner JA, and Ritov Y (1993). Efficient and adaptive estimation for semiparametric models. Johns Hopkins University Press, Baltimore.
3. Breslow NE, Robins JM, Wellner JA, et al. (2000). On the semi-parametric efficiency of logistic regression under case-control sampling. Bernoulli 6, 447–455.
4. Breslow NE and Wellner JA (2007). Weighted likelihood for semiparametric models and two-phase stratified samples, with application to Cox regression. Scandinavian Journal of Statistics 34, 86–102.
5. Cheng D, Ayyagari R, and Signorovitch J (2020). The statistical performance of matching-adjusted indirect comparisons: Estimating treatment effects with aggregate external control data. The Annals of Applied Statistics 14, 1806–1833.
6. Cole SR and Stuart EA (2010). Generalizing evidence from randomized clinical trials to target populations: the ACTG 320 trial. American Journal of Epidemiology 172, 107–115.
7. Cooper H, Hedges LV, and Valentine JC (2009). The handbook of research synthesis and meta-analysis. Russell Sage Foundation.
8. Dahabreh I, Petito L, Robertson S, Hernán M, and Steingrimsson J (2020). Toward causally interpretable meta-analysis: Transporting inferences from multiple randomized trials to a new target population. Epidemiology (Cambridge, Mass.).
9. Dahabreh IJ and Hernán MA (2019). Extending inferences from a randomized trial to a target population. European Journal of Epidemiology 34, 719–722.
10. Dahabreh IJ, Robertson SE, and Hernán MA (2019). On the relation between g-formula and inverse probability weighting estimators for generalizing trial results. Epidemiology (Cambridge, Mass.) 30, 807–812.
11. Dahabreh IJ, Robertson SE, Steingrimsson JA, Stuart EA, and Hernán MA (2020). Extending inferences from a randomized trial to a new target population. Statistics in Medicine 39, 1999–2014.
12. Dahabreh IJ, Robertson SE, Tchetgen Tchetgen EJ, Stuart EA, and Hernán MA (2018). Generalizing causal inferences from individuals in randomized trials to all trial-eligible individuals. Biometrics 75, 685–694.
13. Dahabreh IJ, Robins JM, Haneuse SJ-P, and Hernán MA (2019). Generalizing causal inferences from randomized trials: counterfactual and graphical identification. arXiv preprint arXiv:1906.10792.
14. Dawid AP (1979). Conditional independence in statistical theory. Journal of the Royal Statistical Society: Series B (Methodological) 41, 1–15.
15. Di Bisceglie AM, Shiffman ML, Everson GT, Lindsay KL, Everhart JE, Wright EC, Lee WM, Lok AS, Bonkovsky HL, Morgan TR, et al. (2008). Prolonged therapy of advanced chronic hepatitis C with low-dose peginterferon. New England Journal of Medicine 359, 2429–2441.
16. Hájek J (1971). Comment on “An essay on the logical foundations of survey sampling by D. Basu”. In Godambe VP and Sprott DA, editors, Foundations of statistical inference, page 236. Holt, Rinehart, and Winston, New York City, NY.
17. Higgins JP, Thompson SG, and Spiegelhalter DJ (2009). A re-evaluation of random-effects meta-analysis. Journal of the Royal Statistical Society: Series A (Statistics in Society) 172, 137–159.
18. Jackson D, Rhodes K, and Ouwens M (2021). Alternative weighting schemes when performing matching-adjusted indirect comparisons. Research Synthesis Methods 12, 333–346.
19. Kennedy EH, Sjölander A, and Small D (2015). Semiparametric causal inference in matched cohort studies. Biometrika 102, 739–746.
20. Kosorok MR (2008). Introduction to empirical processes and semiparametric inference. Springer.
21. Luedtke A, Carone M, and van der Laan MJ (2019). An omnibus non-parametric test of equality in distribution for unknown functions. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 81, 75–99.
22. Petersen ML, Porter KE, Gruber S, Wang Y, and van der Laan MJ (2012). Diagnosing and responding to violations in the positivity assumption. Statistical Methods in Medical Research 21, 31–54.
23. Phillippo DM, Dias S, Ades A, Belger M, Brnabic A, Schacht A, Saure D, Kadziola Z, and Welton NJ (2020). Multilevel network meta-regression for population-adjusted treatment comparisons. Journal of the Royal Statistical Society: Series A (Statistics in Society) 183, 1189–1210.
24. Racine JS, Hart J, and Li Q (2006). Testing the significance of categorical predictor variables in nonparametric regression models. Econometric Reviews 25, 523–544.
25. Rice K, Higgins J, and Lumley T (2018). A re-evaluation of fixed effect(s) meta-analysis. Journal of the Royal Statistical Society: Series A (Statistics in Society) 181, 205–227.
26. Robertson SE, Steingrimsson JA, and Dahabreh IJ (2021). Using numerical methods to design simulations: revisiting the balancing intercept. American Journal of Epidemiology.
27. Robins J, Li L, Tchetgen Tchetgen E, and van der Vaart A (2008). Higher order influence functions and minimax estimation of nonlinear functionals. In Probability and statistics: essays in honor of David A. Freedman, pages 335–421. Institute of Mathematical Statistics.
28. Robins JM (1988). Confidence intervals for causal parameters. Statistics in Medicine 7, 773–785.
29. Robins JM and Greenland S (2000). Causal inference without counterfactuals: comment. Journal of the American Statistical Association 95, 431–435.
30. Robins JM, Rotnitzky A, and Scharfstein DO (2000). Sensitivity analysis for selection bias and unmeasured confounding in missing data and causal inference models. In Statistical models in epidemiology, the environment, and clinical trials, pages 1–94. Springer.
31. Robins JM, Sued M, Lei-Gomez Q, and Rotnitzky A (2007). Comment: Performance of double-robust estimators when “inverse probability” weights are highly variable. Statistical Science 22, 544–559.
32. Rubin DB (1974). Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66, 688–701.
33. Rubin DB (1986). Statistics and causal inference: Comment: Which ifs have causal answers. Journal of the American Statistical Association 81, 961–962.
34. Rubin DB (1992). Meta-analysis: literature synthesis or effect-size surface estimation? Journal of Educational Statistics 17, 363–374.
35. Rubin DB (2010). Reflections stimulated by the comments of Shadish (2010) and West and Thoemmes (2010). Psychological Methods 15, 38–46.
36. Rudolph KE and van der Laan MJ (2017). Robust estimation of encouragement design intervention effects transported across sites. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 79, 1509–1525.
37. Saegusa T and Wellner JA (2013). Weighted likelihood estimation under two-phase sampling. Annals of Statistics 41, 269–295.
38. Signorovitch JE, Sikirica V, Erder MH, Xie J, Lu M, Hodgkins PS, Betts KA, and Wu EQ (2012). Matching-adjusted indirect comparisons: a new tool for timely comparative effectiveness research. Value in Health 15, 940–947.
39. Signorovitch JE, Wu EQ, Andrew PY, Gerrits CM, Kantor E, Bao Y, Gupta SR, and Mulani PM (2010). Comparative effectiveness without head-to-head trials. Pharmacoeconomics 28, 935–945.
40. Sobel M, Madigan D, and Wang W (2017). Causal inference for meta-analysis and multi-level data structures, with application to randomized studies of Vioxx. Psychometrika 82, 459–474.
41. Stefanski LA and Boos DD (2002). The calculus of M-estimation. The American Statistician 56, 29–38.
42. Tsiatis A (2007). Semiparametric theory and missing data. Springer Science & Business Media.
43. van der Vaart AW (2000). Asymptotic statistics, volume 3. Cambridge University Press.
44. van der Vaart AW and Wellner JA (1996). Weak Convergence and Empirical Processes. Springer.
45. VanderWeele TJ (2009). Concerning the consistency assumption in causal inference. Epidemiology 20, 880–883.
46. Vo T-T, Porcher R, Chaimani A, and Vansteelandt S (2019). A novel approach for identifying and addressing case-mix heterogeneity in individual participant data meta-analysis. Research Synthesis Methods 10, 582–596.
47. Vo T-T, Porcher R, and Vansteelandt S (2021). Assessing the impact of case-mix heterogeneity in individual participant data meta-analysis: Novel use of I2 statistic and prediction interval. Research Methods in Medicine & Health Sciences 2, 12–30.
48. Westreich D, Edwards JK, Lesko CR, Stuart E, and Cole SR (2017). Transportability of trial results using inverse odds of sampling weights. American Journal of Epidemiology 186, 1010–1014.
