Author manuscript; available in PMC: 2023 Apr 1.
Published in final edited form as: Prev Sci. 2021 Jul 9;23(3):403–414. doi: 10.1007/s11121-021-01270-3

Causally interpretable meta-analysis: Application in adolescent HIV prevention.

David H Barker 1,2, Issa J Dahabreh 3,4, Jon A Steingrimsson 5, Christopher Houck 1,2, Geri Donenberg 6, Ralph DiClemente 7, Larry K Brown 1,2
PMCID: PMC8742835  NIHMSID: NIHMS1731795  PMID: 34241752

Abstract

Endowing meta-analytic results with a causal interpretation is challenging when there are differences in the distribution of effect modifiers among the populations underlying the included trials and the target population where the results of the meta-analysis will be applied. Recent work on transportability methods has described identifiability conditions under which the collection of randomized trials in a meta-analysis can be used to draw causal inferences about the target population. When the conditions hold, the methods enable estimation of causal quantities such as the average treatment effect and conditional average treatment effect in target populations that differ from the populations underlying the trial samples. The methods also facilitate comparison of treatments not directly compared in a head-to-head trial and assessment of comparative effectiveness within subgroups of the target population. We briefly describe these methods and present a worked example using individual participant data from three HIV prevention trials among adolescents in mental health care. We describe practical challenges in defining the target population, obtaining individual participant data from included trials and a sample of the target population, and addressing systematic missing data across datasets. When fully realized, methods for causally interpretable meta-analysis can provide decision-makers valid estimates of how treatments will work in target populations of substantive interest as well as in subgroups of these populations.


Most users of meta-analyses of randomized trials want to synthesize available evidence in order to draw causal inferences about a target population of substantive interest. Unfortunately, results obtained by conventional meta-analysis methods do not have a clear causal interpretation when the distribution of effect modifiers differs among the populations underlying the included trials and the target population (Dahabreh, Petito, et al., 2020; Sobel et al., 2017). This problem cannot be addressed in meta-analyses of effect sizes or other trial-level summary statistics because aggregate data cannot be used to fully account for individual-level effect modifiers. The problem is also not addressed by standard approaches to individual participant data meta-analyses, which can account for individual characteristics (e.g., via covariate-adjusted outcome regression) and heterogeneity of treatment effects across trials (e.g., via mixed-effects models; Tierney et al., 2015), because the output of these approaches cannot be interpreted in the context of the target population. In fact, meta-analyses often do not explicitly specify a target population.

The results of each trial in a meta-analysis apply to the population underlying the trial, reflecting the trial’s eligibility criteria and recruitment practices; this population will have a different distribution of effect modifiers than most target populations of substantive interest, such as patients who are candidates for treatment in a particular setting. Specialized methods are needed to transport inferences from each trial to the target population (Dahabreh, Robertson, et al., 2020; Pearl & Bareinboim, 2011).

To illustrate, consider a hypothetical meta-analysis where, on average, an experimental treatment shows a benefit compared to some control treatment, and where the treatment effect is moderated by baseline (pre-treatment) mental health symptom severity, such that individuals with low severity experience benefit from the treatment, whereas individuals with high severity experience no effect. Suppose we are interested in using meta-analysis to inform treatment decisions for a target population characterized by high symptom severity. If individuals with higher symptom severity are less likely to be recruited and less likely to participate in trials, then trial samples would have lower symptom severity compared to the target population. Treatment effect estimates from the trials, and any conventional meta-analysis of these estimates, would show benefit from the experimental treatment, but these estimates are unlikely to apply to the target population where the treatment benefit will be attenuated due to the higher proportion of individuals with high symptom severity.
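
The arithmetic behind this hypothetical example can be sketched in a few lines of Python. All numbers below are invented for illustration (a risk difference of -0.20 for low-severity individuals, no effect for high-severity individuals, and made-up severity shares); none come from a real trial. The outcome is taken to be adverse, so a negative risk difference means benefit.

```python
# Hypothetical stratum-specific effects (treatment minus control risk of an adverse outcome).
# These values are assumptions chosen only to illustrate the argument in the text.
EFFECT_BY_SEVERITY = {"low": -0.20, "high": 0.0}


def average_effect(share_low):
    """Average treatment effect in a population with the given share of low-severity people."""
    shares = {"low": share_low, "high": 1.0 - share_low}
    return sum(shares[level] * EFFECT_BY_SEVERITY[level] for level in shares)


# Trials under-recruit high-severity individuals; the target population is mostly high severity.
trial_effect = average_effect(share_low=0.8)
target_effect = average_effect(share_low=0.2)

print(f"trial-sample effect:      {trial_effect:+.2f}")
print(f"target-population effect: {target_effect:+.2f}")
```

With these assumed inputs the trial-sample effect is -0.16 while the target-population effect is only -0.04: the stratum-specific effects are identical in both populations, and the difference arises entirely from the distribution of the effect modifier.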

As in our hypothetical example, participants in the vast majority of trials are purposely recruited rather than randomly sampled. Recruitment of participants who meet a trial’s inclusion and exclusion criteria results in trials with underlying populations that differ from one another and from the target population. Moreover, participation in trials is voluntary and subject to self-selection: individuals who choose to participate in research likely differ from those who will ultimately receive the treatment (Elwood, 1982). Thus, the target population typically differs from the populations underlying the trials. Conventional meta-analysis methods do not address these differences and causal inference in the context of meta-analysis requires use of specialized “transportability methods” to draw causal inferences about the target population.

Transportability methods address differences between the population underlying a trial and the target population of interest by combining background knowledge, statistical methods, data from the trials, and data from a sample of the target population to extend causal inferences from the trial to the target population (Cole & Stuart, 2010; Dahabreh, Robertson, Tchetgen, et al., 2019; Dahabreh, Robertson, et al., 2020; Rudolph & van der Laan, 2017; Westreich et al., 2017). We recently proposed extensions of these methods that combine individual participant data from multiple trials with baseline covariate data from a sample of the target population to estimate treatment effects relevant to the target population (Dahabreh, Robertson, Petito, et al., 2019; Dahabreh, Petito, et al., 2020). In this manuscript we briefly describe transportability methods for individual participant data meta-analysis, provide a worked example in HIV prevention using data for which conventional meta-analytic methods would have limited usefulness, and discuss challenges and opportunities in using transportability methods for causally interpretable meta-analysis.

Transportability methods for meta-analysis

Identification of treatment effects in each trial

The potential outcomes framework facilitates the formal definition of causal estimands (e.g., average treatment effects) and articulation of assumptions needed for these causal estimands to be identifiable (Neyman, 1923; Robins & Greenland, 2000; Rubin, 1974). Briefly, the framework posits that each individual has a well-defined potential outcome under each treatment being considered. In a trial, at most one potential outcome may be observed for each trial participant (because they are assigned to one treatment) while the other potential outcomes (those under treatments to which the individual is not assigned) remain unobserved (counterfactual). The potential outcome mean for the population underlying the trial is the average outcome if everyone in the population had received that treatment, and is identifiable when certain conditions are met, including consistency of potential outcomes, exchangeability between treatment groups, and positivity of the treatment assignment probability (Hernán & Robins, 2020). The consistency condition states that the observed outcome for an individual receiving a specific treatment is equal to that individual’s potential outcome under that treatment. This condition holds when there is no interference, that is, each participant’s potential outcome does not depend on the treatment of others; and when all participants receive the same version of treatment (or when treatment variation is irrelevant to the outcome of interest; Hernán & VanderWeele, 2011; VanderWeele, 2009). These components of the consistency condition are sometimes referred to as the stable unit treatment value assumption (SUTVA; Rubin, 1980, 2010). Conditional exchangeability among treatment groups states that potential outcomes are independent of treatment assignment conditional on baseline covariates. Finally, positivity of the treatment assignment probability states that all participants in the trial have a non-zero probability of being assigned to each treatment.
Randomization of well-defined treatments in a controlled trial helps ensure that these assumptions are met, and thus the potential outcome means and average treatment effects can be expressed as functions of the observable data obtained from the population underlying the trial.

Identification of treatment effects in the target population

Transporting causal inferences from controlled trials to a target population requires conditions beyond those needed for identification of causal estimands in the population underlying each trial. First, it requires a stronger version of consistency that holds across the populations underlying the trials and the target population and also encodes an assumption that trial participation does not affect the outcome except through the treatments (Dahabreh, Robins, Haneuse, et al., 2019; Dahabreh, Robins, & Hernán, 2020). Second, it requires conditional exchangeability among populations underlying the randomized trials and the target population, which states that the potential outcome is independent of population membership (trial or target), conditional on baseline covariates. Third, it requires positivity of the trial participation probability, which states that, for the baseline covariates needed to satisfy the condition of exchangeability among populations, every covariate pattern that occurs in the target population should have non-zero probability of occurring in at least one trial that evaluated each treatment of interest. Dahabreh, Robertson, Petito, et al. (2019) and Dahabreh, Petito, et al. (2020) discuss several versions of the exchangeability and positivity conditions and explore their implications for identifying causal estimands. When these assumptions are met, it is possible to transport inferences about potential outcome means using data from a collection of trials to a target population, allowing for direct comparison of potential outcome means in the context of the target population.

The identifiability conditions stated above can be used to express the potential outcome means in the target population as a function of the observed data distribution (Dahabreh, Robertson, Petito, et al., 2019). To introduce some notation, let X be a set of baseline covariates collected from all trial participants and from a sample of the target population that are sufficient to satisfy the identifiability conditions. Informally, a sufficient set X contains all covariates that either predict the outcome (prognostic indicators) or modify response to treatment and relate to trial participation or treatment assignment. Let S denote the random variable for the data source from which an observation is obtained and $\mathcal{S}$ the collection of trials. Our data application involves three randomized trials; we use S to denote the source trial for trial participants, with $\mathcal{S} = \{1, 2, 3\}$. We use the convention that S = 0 denotes the target population. Let Y be the outcome of interest examined in the trials and A the assigned treatment. We use lower case letters to denote realizations of these random variables; for example, s denotes a specific study and a denotes a specific treatment. We require data on covariates, treatment assignment, and outcomes from trial participants, but only data on covariates from the target population sample. Data from the trials and the sample from the target population are combined in a composite dataset where the total number of observations is denoted by n and observations are indexed by i. We denote the potential outcome for participant i under treatment a as $Y_i^a$. The target of inference is the potential outcome mean in the target population, $E[Y^a \mid S = 0]$. Other parameters of interest are functions of these potential outcome means. For example, the average treatment effect comparing two treatments, a and a′, is defined as $E[Y^a - Y^{a'} \mid S = 0] = E[Y^a \mid S = 0] - E[Y^{a'} \mid S = 0]$.

Under the identifiability conditions, the potential outcome mean for treatment a is identified by the following function of the observed data distribution (Dahabreh, Robertson, Petito, et al., 2019): ϕ(a) = E[E[Y|X, S ≠ 0, A = a]|S = 0]. Informally, we are marginalizing the expectation of the outcome conditional on covariates and assignment to treatment a in the collection of trials (S ≠ 0) over the covariate distribution of the target population (S = 0). When the identifiability conditions hold, ϕ(a) can be interpreted as the potential outcome mean under intention to assign members of the target population to treatment a.
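
To make the link between the conditions and ϕ(a) explicit, the following LaTeX sketch walks through one way the identification result can be derived. It is an informal outline rather than the cited authors' proof: it assumes, for simplicity, that exchangeability across populations holds in mean and that positivity holds for every trial s that evaluated treatment a.

```latex
\begin{align*}
E[Y^a \mid S = 0]
  &= E\big[\, E[Y^a \mid X, S = 0] \,\big|\, S = 0 \big]
     && \text{iterated expectations} \\
  &= E\big[\, E[Y^a \mid X, S = s] \,\big|\, S = 0 \big]
     && \text{exchangeability across populations, for any trial } s \text{ that evaluated } a \\
  &= E\big[\, E[Y^a \mid X, S = s, A = a] \,\big|\, S = 0 \big]
     && \text{exchangeability between treatment groups within trial } s \\
  &= E\big[\, E[Y \mid X, S = s, A = a] \,\big|\, S = 0 \big]
     && \text{consistency} \\
  &= E\big[\, E[Y \mid X, S \neq 0, A = a] \,\big|\, S = 0 \big] = \phi(a)
     && \text{the inner mean is common to all such trials, so pooling leaves it unchanged}
\end{align*}
```

The positivity conditions ensure that each conditional expectation is well defined for the covariate patterns that occur in the target population.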

Estimation of potential outcome means in the target population

The identification results above can be used to build an estimator of the potential outcome mean using a model for the expectation of the outcome conditional on covariates among trial participants assigned to treatment a, that is, E[Y|X, S ≠ 0, A = a]. Note, there are alternative approaches that rely on modeling the probability of trial participation instead of modeling the expectation of the outcome (Dahabreh, Petito, et al., 2020; Dahabreh, Robertson, Petito, et al., 2019), but for the purposes of this paper we will focus on modeling the expectation of the outcome. Specifically, for every treatment of interest, a, we propose to estimate ϕ(a) as

$$\hat{\phi}(a) = \left\{ \sum_{i=1}^{n} I(S_i = 0) \right\}^{-1} \sum_{i=1}^{n} I(S_i = 0)\, \hat{g}_a(X_i).$$

The estimator uses an outcome regression model $\hat{g}_a(X)$ estimated using trial data, generates model-based predictions using the covariates of everyone in the sample from the target population, and then averages the predictions in the sample of the target population to generate an estimate of the potential outcome mean in that population. We can estimate $\hat{g}_a(X)$ using parametric approaches (e.g., generalized linear models) or more flexible data-adaptive approaches (e.g., regularized regression methods, random forests, or other machine learning methods). We can compare treatments a and a′ in the target population by taking the difference $\hat{\psi}(a, a') = \hat{\phi}(a) - \hat{\phi}(a')$ to estimate the average treatment effect in the target population. The estimators $\hat{\phi}(a)$ and $\hat{\psi}(a, a')$ can be interpreted as estimators of the potential outcome mean and the average treatment effect in the target population, respectively, provided the identifiability conditions hold and that the model $\hat{g}_a(X)$ is correctly specified. Standard errors for $\hat{\phi}(a)$ and $\hat{\psi}(a, a')$ can be obtained using bootstrapping.
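
The three steps of the estimator (fit on trial data, predict for the target sample, average) can be sketched as follows. This is a minimal illustration, not the implementation used in the analysis: the outcome model here is a simple table of cell means over a single discrete covariate, standing in for the regression model, and the row layout, function name, and toy data are all hypothetical.

```python
from collections import defaultdict


def transported_mean(rows, treatment):
    """Estimate phi(a): fit g_a on trial rows, then average predictions over the target sample.

    Each row is a dict with "s" (0 = target sample, >0 = trial id) and "x" (a hashable
    covariate pattern); trial rows also carry "a" (assigned treatment) and "y" (outcome).
    """
    # Step 1: outcome "model" g_a(x) = mean outcome among trial participants assigned to
    # `treatment` with covariate pattern x (a saturated model; an assumption of this sketch).
    totals = defaultdict(lambda: [0.0, 0])
    for r in rows:
        if r["s"] != 0 and r["a"] == treatment:
            totals[r["x"]][0] += r["y"]
            totals[r["x"]][1] += 1
    g_hat = {x: total / count for x, (total, count) in totals.items()}

    # Step 2: predict for everyone in the target sample (S = 0).
    target = [r for r in rows if r["s"] == 0]
    missing = {r["x"] for r in target} - set(g_hat)
    if missing:
        # A target covariate pattern with no trial data for this treatment:
        # positivity of trial participation fails in the sample.
        raise ValueError(f"no trial data for treatment {treatment!r} at x in {missing}")

    # Step 3: average the predictions over the target sample.
    return sum(g_hat[r["x"]] for r in target) / len(target)


def trial_rows(s, x, a, outcomes):
    return [{"s": s, "x": x, "a": a, "y": y} for y in outcomes]


# Toy data (invented): two trials, one binary covariate, covariate-only target sample.
rows = (
    trial_rows(1, "low", "ST", [0, 0, 0, 1]) + trial_rows(2, "high", "ST", [1, 1, 1, 0])
    + trial_rows(1, "low", "HP", [0, 1, 0, 1]) + trial_rows(2, "high", "HP", [1, 1, 1, 1])
    + [{"s": 0, "x": "high"}] * 3 + [{"s": 0, "x": "low"}]
)

phi_st = transported_mean(rows, "ST")
phi_hp = transported_mean(rows, "HP")
ate = phi_st - phi_hp  # psi-hat(ST, HP) in the target population
print(phi_st, phi_hp, ate)
```

Replacing the cell-mean table with a fitted regression model changes only Step 1; the prediction and averaging steps stay the same.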

Defining the target population

An important aspect of transportability methods is the focus on specifying the target population and obtaining a sample from it. Target populations should be chosen based on substantive considerations by specifying the population in which the users of the meta-analysis want to better understand the impact of one or more treatments. For example, eligibility criteria for target populations can be defined on the level of a country, region, hospital, or program. It is also possible to examine subgroups of patients within a target population who share characteristics enabling more specific treatment recommendations. Once the target population has been specified a suitable sample needs to be obtained. Causal inferences are interpreted in the context of the target population and are expected to change depending on the distribution of prognostic indicators or treatment modifiers in the target population. When the identifiability conditions hold and representative data from the target population are available, transportability analyses enable the comparison of multiple treatments by estimating how the target population would have responded to each of the treatments being considered. Thus, estimates for causal parameters in the target population can be compared overcoming between-trial differences in participant characteristics. In the spirit of precision medicine (Dahabreh, Hayward & Kent, 2016), such comparative effectiveness results can help decision-makers identify the most promising treatments for specific target populations or subpopulations. Transportability methods thus provide a promising solution to between-trial differences in participant characteristics and differences between trials and the target population. The remainder of this manuscript will focus on applying these approaches to three adolescent HIV prevention trials among youth receiving mental health care.

Transportability analysis with data from HIV prevention

Methods

We harmonized individual participant data across three clinical trials of HIV prevention designed for adolescents in mental health care (NCT00603369, NCT00500487, NCT00496691). The trials evaluated approaches focusing on emotion regulation (ER; Brown et al., 2011, 2017), family processes (FM; e.g., parent-adolescent communication, parental monitoring and supervision; Barker, Hadley, et al., 2019; Brown et al., 2014), and HIV-related knowledge and skills training (ST; e.g., partner negotiation, condom-use; Brown et al., 2011). All trials included a general-health promotion control (HP; e.g., sleep, nutrition, sexual health). Trials included youth ages 13 to 19 sampled from mental health hospitals and clinics, or from therapeutic schools. All trials included interim (3–6 months) and extended follow-up (9–12 months) assessments. The harmonized dataset represents the largest collection of adolescents receiving mental health services who participated in HIV prevention trials (n=1323), with 4081 completed assessments out of a possible 4875. The largest of the three trials individually randomized adolescents to three arms, while the other two were implemented in therapeutic schools and used cluster-randomized cross-over designs. Specifically, to avoid crosstalk among students, only one treatment was administered at each school during each semester, and each participant received only one treatment. Schools were randomized to treatment and rerandomized each semester so that by the end of each trial all treatments had been administered in each school. The larger of the two cluster-randomized trials was a three-arm trial and the smaller a two-arm trial. Both had small intra-class correlations (i.e., < .01), suggesting that within-cluster dependence was not influential. Although individuals were not randomized in these two trials, we proceeded with analyses assuming that treatment assignment was essentially random given covariates. Baseline characteristics for the three trials are presented in Table 1 and show marked between-trial differences in patient characteristics.

Table 1.

Baseline Characteristics

| Characteristic | Trial 1 | Trial 2 | Trial 3 | SMD |
|---|---|---|---|---|
| Sample size, in each treatment group | | | | |
| Emotion Regulation (ER) | -- | 157 | 96 | |
| Family Based (FM) | 227 | -- | -- | |
| Skills Training (ST) | 259 | 136 | -- | |
| General Health Promotion (HP) | 235 | 124 | 89 | |
| Age | 15.39 (1.32) | 15.66 (1.48) | 15.79 (1.3) | 0.20 |
| Female | 57% (410) | 30% (126) | 61% (112) | 0.42 |
| Race | | | | 0.59 |
| White | 28% (203) | 51% (211) | 50% (93) | |
| African American | 57% (413) | 30% (125) | 17% (31) | |
| Other | 12% (88) | 18% (77) | 21% (39) | |
| Hispanic | 11% (78) | 19% (78) | 27% (50) | 0.28 |
| Psychiatric | | | | |
| Externalizing diagnosis | 48% (335) | 45% (178) | 48% (84) | 0.05 |
| Internalizing diagnosis | 36% (251) | 25% (102) | 34% (60) | 0.15 |
| Mania symptoms | 15% (104) | 9% (36) | 7% (12) | 0.17 |
| Functional impairment | 16.5 (8.84) | 14.88 (7.65) | 15.09 (8.99) | 0.13 |
| Ever engaged in self-cutting | 26% (190) | 22% (93) | 32% (59) | 0.15 |
| Condom use self-efficacy | 1.60 (0.90) | 1.86 (0.92) | 1.71 (0.87) | 0.19 |
| HIV knowledge (proportion correct) | 0.63 (0.21) | 0.56 (0.25) | 0.73 (0.18) | 0.49 |
| HIV tested | 28% (199) | 29% (121) | 23% (43) | 0.04 |
| Substance use | | | | |
| Ever used marijuana | 38% (275) | 48% (200) | 53% (98) | 0.20 |
| Drank alcohol in past month | 21% (148) | 41% (171) | 28% (52) | 0.30 |
| Sexual behavior | | | | |
| Ever engaged in oral sex | 30% (213) | 48% (202) | 44% (82) | 0.26 |
| Ever engaged in vaginal or anal sex | 54% (386) | 59% (248) | 54% (100) | 0.08 |
| Vaginal or anal sex in past 3–6 months | 32% (228) | 44% (185) | 35% (65) | 0.17 |
| Any condomless sex during past 3–6 months | 16% (118) | 23% (97) | 23% (42) | 0.34 |

Notes: Categorical variables are presented as % (n) and scale variables as mean (standard deviation). SMD = standardized mean difference, average of all pairwise comparisons.
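
As a rough illustration of the SMD column, the sketch below averages the absolute standardized mean difference over all pairs of trials for a continuous variable. The pooled standard deviation used here, the square root of the average of the two variances, is an assumption; the table notes do not state which pooling was used. The function name is hypothetical.

```python
from itertools import combinations
from math import sqrt


def pairwise_average_smd(groups):
    """Average absolute standardized mean difference over all pairs of groups.

    `groups` is a list of (mean, sd) tuples. Pooling as sqrt((sd1^2 + sd2^2) / 2)
    is an assumption of this sketch.
    """
    smds = [
        abs(m1 - m2) / sqrt((s1 ** 2 + s2 ** 2) / 2)
        for (m1, s1), (m2, s2) in combinations(groups, 2)
    ]
    return sum(smds) / len(smds)


# Age row of Table 1: mean (SD) in Trials 1, 2, and 3.
age_smd = pairwise_average_smd([(15.39, 1.32), (15.66, 1.48), (15.79, 1.30)])
print(round(age_smd, 2))
```

Under this pooling assumption, the age row comes out at roughly 0.20, in line with the value reported in the table.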

Measures

Measures in all three trials included demographics, psychiatric diagnosis using the Computerized Diagnostic Interview Schedule for Children (C-DISC; Schwab-Stone et al., 1996), functional impairment measured using the Columbia Impairment Scale (CIS; Bird et al., 1993), and risk behaviors associated with HIV transmission using the Adolescent Risk Behavior Assessment (ARBA; Donenberg et al., 2001). Also common across trials were 11 items addressing HIV knowledge and four items addressing self-efficacy for condom use. Items were averaged for HIV knowledge and for self-efficacy for condom use. All trials used audio-computer assisted self-interviews to collect measures. The primary outcome was defined as reporting any occurrence of condomless sex across the extended follow-up period (i.e., across all follow-up assessments). We selected this cumulative definition to help address the developmentally expected sparsity and instability in adolescent sexual partnerships (Barker, Scott-Sheldon, et al., 2019) by increasing the opportunity for observing the risk behavior.

Analytic approach

The goal of the analysis was to estimate: 1) potential outcome means for each treatment (ER, FM, ST, HP) in a target population, 2) average treatment effects comparing treatments in the target population, and 3) conditional average treatment effects within prespecified subgroups identified as key subpopulations in HIV prevention due to elevated risk for HIV infection, including Black or African American young men, Black or African American young women, Hispanic youth, and youth reporting substance use. Other key subpopulations in HIV prevention, such as youth identifying as a sexual or gender minority, were not represented in sufficient numbers to support statistical analysis.

Target population.

We were interested in drawing inferences about a target population of U.S. adolescents receiving mental health treatment in routine clinical practice. Unfortunately, we were unable to obtain data from such a target population where covariates (e.g., previous sexual history, mental health symptoms) that we would deem sufficient to satisfy the identifiability conditions had been assessed. In particular, some of the larger surveillance datasets for adolescent populations provide information for some of these variables but differ from the trial data in the assessment of mental health symptoms. Thus, to illustrate the methods, we used a 15% holdout sample from Trial 1 as a sample from a target population of youth actively receiving mental health care and a 15% holdout sample from Trial 2 as a sample from a target population of youth attending therapeutic schools. Only the covariate data from the holdout samples were used, and the holdout samples were not used in the estimation of the outcome models. We used this somewhat artificial approach to illustrate the methods appropriate for drawing causal inferences about a new target population.
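
The holdout construction can be sketched as below. This is an illustrative outline only: the row layout, function name, and seed are hypothetical, and the article does not describe how the 15% samples were drawn beyond what is stated above.

```python
import random


def split_holdout(trial_rows, fraction=0.15, seed=0):
    """Split one trial's rows into (analysis rows, covariate-only target rows)."""
    rng = random.Random(seed)
    shuffled = trial_rows[:]
    rng.shuffle(shuffled)
    n_holdout = round(fraction * len(shuffled))
    holdout, analysis = shuffled[:n_holdout], shuffled[n_holdout:]
    # Keep covariates only and relabel the source as the target population (S = 0),
    # so held-out treatments and outcomes cannot enter the outcome models.
    target = [{"s": 0, "x": r["x"]} for r in holdout]
    return analysis, target


# Toy trial of 100 participants (invented data).
trial_1 = [{"s": 1, "x": i % 4, "a": "ST", "y": i % 2} for i in range(100)]
analysis_rows, target_rows = split_holdout(trial_1)
print(len(analysis_rows), len(target_rows))
```

Dropping the treatment and outcome fields at the point of the split makes it structurally impossible for held-out outcomes to leak into model fitting.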

Estimation.

We used logistic regression to estimate outcome models. The models included predictors that defined the subgroups of interest (i.e., gender, race, ethnicity, substance use) plus important predictors of the outcome including gender, ever engaging in vaginal or anal sex, any condomless sex in the past 3–6 months, and having an externalizing diagnosis. These important predictors were identified using a conditional random forest approach with conditional permutation variable importance (Strobl, Hothorn, & Zeileis, 2009).

We built two types of outcome models using trial data. The first type was built to conduct one-trial-at-a-time transportability analyses, by extending inferences from each trial to the two target populations. Thus, we built separate outcome models for each trial that included interactions between the covariates and treatments evaluated in the trial. These models were then applied to covariate distributions in the target populations to generate predicted probabilities (under each treatment) which were then averaged over each target population. As a point of comparison, we also estimated potential outcome means in each trial, in effect using the covariate distribution of each trial as representative of the trial’s underlying population.

The second type of outcome model was built to synthesize information across trials. We fit one model using data from all three trials that included interactions between the covariates and treatments. This model was then applied to the covariate distributions of the target populations. To estimate average treatment effects, we subtracted the four transported potential outcome means within each target population to estimate pairwise contrasts (i.e., ERvFM, ERvST, ERvHP, FMvST, FMvHP, STvHP). Conditional average treatment effects were calculated by marginalizing over subgroups of each target population.
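
The pairwise contrasts can be enumerated mechanically once the four transported potential outcome means are available. In the sketch below the means are invented placeholders, not results from the analysis, and the function name is hypothetical.

```python
from itertools import combinations


def pairwise_contrasts(potential_outcome_means):
    """Return {"AvB": mean_A - mean_B} for every unordered pair of treatments."""
    return {
        f"{a}v{b}": potential_outcome_means[a] - potential_outcome_means[b]
        for a, b in combinations(potential_outcome_means, 2)
    }


# Hypothetical transported means (probability of any condomless sex) in one target population.
means = {"ER": 0.30, "FM": 0.28, "ST": 0.33, "HP": 0.31}
contrasts = pairwise_contrasts(means)
for label, value in contrasts.items():
    print(f"{label}: {value:+.2f}")
```

Because every mean refers to the same target population, contrasts such as ERvFM are defined even though no single trial compared those two treatments directly.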

Provided the conditions of consistency, exchangeability across populations, and positivity of trial participation hold for each of the trials, and that outcome models are correctly specified, we expect potential outcome mean estimates for the target population to be the same for trials that evaluated the same treatment (Dahabreh, Petito, et al., 2020; Dahabreh, Robertson, Petito, et al., 2019). Thus, we compared the estimates from one-trial-at-a-time transportability analyses for each treatment evaluated in each trial. Marked differences in these estimates may suggest one or more identifiability conditions are violated for one or more of the trials and/or for one or more of the treatments. For example, differences among estimates can suggest lack of exchangeability (i.e., omission of important covariates from outcome models), or positivity violations, or variation in how treatments were administered across trials. Nevertheless, lack of differences in the estimates does not guarantee that the conditions hold. Furthermore, it is possible for the identifiability conditions to hold for the aggregate collection of trials, but not for each trial. For example, positivity of participation may be violated for individual trials but not for the aggregate collection (e.g., when a specific trial does not have data on a subgroup of the target population, but at least one other trial in the meta-analysis does; Dahabreh, Robertson, Petito, et al., 2019).

Our data have features that complicate application of conventional approaches to individual participant data meta-analysis, including a small number of trials, treatments that were not evaluated in all trials, and a mixture of two- and three-arm trials. More importantly, when effects are heterogeneous over baseline covariates, conventional meta-analysis methods estimate different parameters compared to the transportability methods. To numerically compare our estimates with those of conventional approaches, we estimated the average treatment effect for the STvHP contrast using data from the two trials that reported a direct comparison between ST and HP groups. We fit a linear model using study fixed effects with the same covariates as the transportability analyses. Missing data were addressed using multiple imputation.

Bootstrap sampling and missing data.

We used nonparametric bootstrap to obtain standard errors and 95% confidence intervals. Participants were sampled with replacement from each treatment group within each trial. Nonparametric bootstrap was selected because it does not require the parametric assumptions that parametric bootstrapping does. Missing data on baseline characteristics ranged from 0.2% for gender to 8% for condom use self-efficacy. For the outcome, 65% of participants completed all assessments and 11% completed no follow-up assessments. Assuming data were missing at random (given baseline covariates), we performed a single imputation within each bootstrapped sample using predictive mean matching with chained equations (van Buuren & Groothuis-Oudshoorn, 2011). The imputation model included trial and treatment, all outcome assessments, and the covariates listed in Table 1. We then obtained estimates of potential outcome means and treatment effects in each of 1,000 bootstrapped and imputed samples (Schomaker & Heumann, 2018).
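
The resampling scheme can be outlined as follows. This is a sketch under stated assumptions, not the analysis code: the `impute` hook is an identity placeholder (predictive mean matching is not implemented here), resampling the target sample as its own stratum is an assumption, and the interval uses a normal approximation with the bootstrap standard error. Function and field names are hypothetical.

```python
import random
from collections import defaultdict
from statistics import mean, stdev


def stratified_bootstrap(rows, estimator, n_boot=1000, seed=0, impute=lambda sample: sample):
    """Nonparametric bootstrap, resampling with replacement within trial-by-treatment strata."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for r in rows:
        # Target rows carry no "a"; they form their own stratum (an assumption of this sketch).
        strata[(r["s"], r.get("a"))].append(r)

    estimates = []
    for _ in range(n_boot):
        sample = []
        for members in strata.values():
            sample.extend(rng.choices(members, k=len(members)))  # stratum size is preserved
        # Impute inside each bootstrap sample, then estimate.
        estimates.append(estimator(impute(sample)))

    est, se = mean(estimates), stdev(estimates)
    return est, se, (est - 1.96 * se, est + 1.96 * se)


# Toy usage: bootstrap the mean outcome among hypothetical trial participants.
toy = [{"s": 1, "a": "ST", "y": y} for y in [0, 1, 1, 0, 1, 0, 0, 1]]
est, se, (lo, hi) = stratified_bootstrap(
    toy, estimator=lambda rs: mean(r["y"] for r in rs), n_boot=200
)
print(f"estimate {est:.3f}, SE {se:.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")
```

Running the imputation inside the loop, rather than once before resampling, lets the bootstrap variability reflect uncertainty from the imputation step as well as from sampling.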

Results

Estimated potential outcome means in the target populations from one-trial-at-a-time transportability analyses are shown in Figure 1. There are marked differences between estimates from transportability analyses and those for each trial’s underlying population. For both target populations, there appear to be some differences between one-trial-at-a-time transportability analyses for trials that examined ST and HP, suggesting that there may be violations of the identifiability conditions. We will return to the implications of these between-trial differences in the Discussion. Estimated potential outcome means combining data across trials are also depicted in Figure 1 and suggest that treatments are largely similar to one another.

Figure 1.


Estimated potential outcome means.

Notes: Original estimates were based on the covariate distribution within each trial. By-trial estimates were transported from each trial to the target populations. Combined estimates use causally interpretable meta-analysis to transport the potential outcome means to the target population. FM = Family process; ER = Emotion regulation; ST = Skills training; HP = Health promotion.

Average treatment effects and conditional average treatment effects for subgroups of the target populations are shown in Figure 2. These results suggest some variability in treatment response among subgroups, with some evidence that Black or African American young women in therapeutic schools may not respond as well to the ST treatment. There is also some evidence that Hispanic youth in both target populations may not respond as well to the FM treatment, which is consistent with recent work adapting the FM treatment to better meet the needs of this population (Lescano et al., 2020). The subgroup analyses, however, result in wide confidence intervals, suggesting that more evidence is needed before making clinical determinations regarding heterogeneity of treatment effects. Importantly, we were able to obtain estimates for subgroups defined by multiple covariates, such as race and gender, and to estimate treatment effects comparing treatments that had not been directly compared in any trial (ER vs. FM).

Figure 2.


Estimated treatment effects.

Notes: The figure summarizes average treatment effects for each treatment transported to the target samples, for the full target sample as well as for youth who recently used substances, Hispanic youth, Black or African American young men, and Black or African American young women. Shown are mean average treatment effects across bootstrapped samples, with 95% confidence intervals calculated using bootstrapped standard errors. FM = Family based; ER = Emotion regulation; ST = Skills training; HP = Health promotion.

For comparison, using conventional meta-analysis methods to combine evidence from the two trials that directly compared ST to HP, the treatment effect was estimated to be −0.04 [95% CI −0.10; 0.02], which was different from the causally interpretable meta-analysis estimates when using a target population of youth receiving mental health care, 0.01 [−0.06; 0.09] or a target population of youth in therapeutic schools, 0.03 [−0.05; 0.12]. The numerical differences between the estimates likely reflect the fact that the coefficients in the conventional meta-analysis do not estimate the same parameter as the novel estimators described in this paper. In fact, in causally interpretable meta-analysis, the parameter is expected to change with each target population. Furthermore, the two approaches rely on different model specification assumptions and do not use the same data (e.g., conventional methods do not use the data from the target population).

Discussion

This manuscript presents an application of recently proposed methods that allow for causal interpretation of individual-participant-data meta-analysis (Dahabreh, Petito, et al., 2020; Dahabreh, Robertson, Petito, et al., 2019). Under explicit causal and statistical modeling assumptions, the methods provide estimates of potential outcome means and treatment effects in a target population of substantive interest and its subgroups, and enable comparisons among treatments that have not been compared directly in the same trial. As with any new analytical tool, more work is needed to understand the utility of the methods in practical applications.

Defining the target population

Clearly specifying a target population and obtaining a representative sample of that population are defining features of transportability analyses. Ideally, sample data from the target population include all predictors of the outcome and effect modifiers that have a different distribution in the collection of trials and target population; the same variables need to be measured in the trial data. In practice, it can be difficult to obtain such trial and target population data. For example, in our analyses there were marked differences in which covariates were measured across the three trials, which limited the number of potential covariates available for analysis. We were also unable to obtain data from a target population of adolescents in mental health care that assessed the same key covariates as the trials. Data quality also impacts the ability to quantify whether a given sample from the target population is representative of that population. In settings with rich data resources, such as hospital systems with electronic medical records, target populations can be specified and sampled, presuming that the medical records contain assessments of the same patient characteristics as the trial data. In other settings, it is important to continue developing data resources that utilize common data elements in clinical trials as well as in data sampled from target populations of interest (Cella et al., 2007; Ohmann et al., 2017; Polanin & Williams, 2016).

The analyses reported here attempt to address differences in the covariate distributions of the trial samples and the target population. The target population distribution can be estimated from data that are masked or otherwise altered to maintain confidentiality. Currently, the approach assumes that an individual contributes data to only one of the samples from the trials or target population. There are situations where an individual may contribute data to the target population sample (e.g., through a medical record) and to a trial sample. If overlap can be identified, then it can be easily addressed in statistical analyses; unidentifiable overlap is usually more challenging to address (Saegusa, 2019).

Evaluating assumptions

The identifiability conditions necessary for causal inference in meta-analysis must be carefully considered in light of substantive knowledge. For identification of treatment effects in each trial, the conditions of consistency, exchangeability among treatment groups, and positivity of treatment assignment probability will usually hold in well-designed randomized controlled trials with well-defined treatments. The no-interference component of the consistency condition depends on the nature of the treatments being compared and the structure of the population underlying the trials. For instance, interference may be present when treatments are implemented on the level of pre-existing groups such as schools or communities.

Evaluating the additional identifiability conditions needed for transportability analyses is challenging. Comparing potential outcome means from each trial in target populations like those depicted in Figure 1 can provide an indication that assumptions are violated either for individual trials or for the aggregate collection of trials. It is possible for assumptions to hold for some treatments but not others. For example, the differences between one-trial-at-a-time transportability analyses and analyses pooling data across trials for the ST and HP treatments suggest that one or more identifiability conditions were violated for one or more of the trials, in relation to the target population; these differences, however, do not pinpoint where violations occurred or their exact nature. In general, differences can indicate that exchangeability across populations does not hold (necessitating sensitivity analyses; Dahabreh, Robins, Haneuse, Saeed et al., 2019); that there exists outcome-relevant treatment variation across trials; that positivity of participation does not hold for one or more trials; or that models are incorrectly specified.

In our worked example, the analysis included many variables that have been linked to condomless sex. However, there are other prognostic indicators not included in the modeling because they were not assessed in all trials, including emotion regulation, family functioning, parent-adolescent communication, parental monitoring, partner characteristics, HIV stigma, abuse history, and parental psychopathology among others (Barker et al., 2012; Barker, Hadley, et al., 2019; Hadley et al., 2015, 2017). If one or more of these variables predicts the outcome and differs in distribution between the trials and the target, including them in the model will generally improve the validity of estimates.

Another explanation for the between-trial discrepancy in potential outcome means is between-trial differences in treatment content and/or implementation leading to outcome-relevant variation. The single-version-of-each-treatment component of the consistency condition needs additional consideration in the context of individual participant data meta-analysis, as it requires a treatment to be similarly structured and implemented across trials (Hernán & VanderWeele, 2011; VanderWeele, 2009). This is perhaps one of the most difficult assumptions for behavioral trials, as there are almost always some differences in how a treatment is designed and implemented across trials. In our example, the core content of each treatment type was similar across trials, yet there were differences in how treatments were implemented, such as number of sessions, length of sessions, and time between sessions. By combining the treatments across trials, we assume that these variations in implementation do not impact treatment potency. Indeed, there were differences in how HP was administered, with Study 3 having a shorter dose than the other two trials. An intriguing direction for future investigation would be further developing transportability methods to examine treatment variation between trials.

We evaluated the positivity of trial participation assumption by comparing distributions of the estimated probability of participation in the collection of trials and the samples of the target populations. We also compared the distribution for each individual trial with the samples of the target populations. Estimated probabilities were generated by logistic regression with the same covariates as the outcome models. We found adequate overlap in the distributions of the estimated probabilities, suggesting that the positivity condition was not grossly violated for variables included in our models.
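
A check of this kind can be sketched as follows. The sketch fits a logistic regression for trial participation by Newton-Raphson and then compares the estimated probabilities between trial participants and the target sample. It assumes a single hypothetical covariate and invented data, whereas the analysis reported here used the same covariates as the outcome models; the function names and the overlap summary are illustrative choices, not the authors' procedure.

```python
import math


def fit_participation_model(x, r, iters=25):
    """Fit P(R = 1 | x) = expit(b0 + b1 * x) by Newton-Raphson,
    where R = 1 marks trial participants and R = 0 the target sample."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, ri in zip(x, r):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += ri - p            # score for the intercept
            g1 += (ri - p) * xi     # score for the slope
            h00 += w                # information matrix entries
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        if det == 0.0:
            break
        # Newton step: solve the 2x2 system (information) * step = score.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def participation_probability(b0, b1, x):
    """Estimated probability of trial participation for each covariate value."""
    return [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]


def overlap_summary(p_trial, p_target):
    """Ranges of estimated probabilities and the share of the target sample
    falling below the smallest probability seen among trial participants."""
    floor = min(p_trial)
    below = sum(1 for p in p_target if p < floor)
    return {
        "trial_range": (min(p_trial), max(p_trial)),
        "target_range": (min(p_target), max(p_target)),
        "share_target_below_trial_min": below / len(p_target),
    }


# Hypothetical covariate values for trial participants and the target sample.
trial_x = [1, 2, 2, 3, 3, 3, 4, 4, 5]
target_x = [2, 3, 3, 4, 4, 4, 5, 5, 6]

b0, b1 = fit_participation_model(
    trial_x + target_x, [1] * len(trial_x) + [0] * len(target_x)
)
p_trial = participation_probability(b0, b1, trial_x)
p_target = participation_probability(b0, b1, target_x)
summary = overlap_summary(p_trial, p_target)
```

A large share of the target sample with estimated probabilities outside the range seen among trial participants would suggest limited overlap; in practice the full distributions would typically be inspected graphically as well.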

Obtaining and harmonizing individual participant data

Obtaining data from investigators continues to be a primary challenge to using these and other methods that rely on individual data (Polanin & Williams, 2016). There has been a consistent effort to improve data accessibility by requiring researchers to make their data publicly available (Cella et al., 2007; Ohmann et al., 2017; Polanin & Williams, 2016). As these efforts continue to mature, more individual participant data will become available for use. It is an open question how data from trials that predate these efforts will be included, as it is costly to acquire and prepare older datasets. Once trial data are publicly available and appropriately documented, data must be harmonized across trials. Harmonization entails equating variable names, coding, and, where possible, the meaning of the questions and measures used in each trial (Curran & Hussong, 2009). Using common data elements across trials facilitates harmonization. When common data elements are not present, there is significant cost in time and expertise required to appropriately harmonize data across trials. These costs need to be accounted for when planning analyses with individual participant data.
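
The mechanical part of harmonization (equating variable names and coding) can be sketched with a per-trial codebook, as below. The study labels, variable names, and code values are hypothetical and do not come from the trials analyzed here. Equating the meaning of measures requires substantive judgment that a lookup table cannot supply.

```python
# Hypothetical per-trial codebooks:
# source variable name -> (common variable name, recoding of raw values)
CODEBOOKS = {
    "study1": {
        "sex": ("gender", {"M": "male", "F": "female"}),
        "subst_use_90d": ("recent_substance_use", {1: True, 0: False}),
    },
    "study2": {
        "gender_cd": ("gender", {1: "male", 2: "female"}),
        "druguse": ("recent_substance_use", {"yes": True, "no": False}),
    },
}


def harmonize(record, study):
    """Map one participant record onto the common variable names and codes.
    Values that are absent or not covered by the codebook become None."""
    harmonized = {"study": study}
    for source_name, (common_name, recode) in CODEBOOKS[study].items():
        raw_value = record.get(source_name)
        harmonized[common_name] = recode.get(raw_value)
    return harmonized


row_1 = harmonize({"sex": "F", "subst_use_90d": 1}, "study1")
row_2 = harmonize({"gender_cd": 1, "druguse": "no"}, "study2")
```

Keeping the codebook separate from the recoding logic documents every mapping decision in one place, which matters when harmonization choices need to be reviewed or revised.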

Systematically missing data

Properly addressing missing data is essential to obtaining valid inference from clinical trials and for individual participant data meta-analysis. Much has been written about addressing missing data when some covariate is unavailable from some, but not all, individuals in a trial; we addressed potential bias introduced by such missingness by imputing each bootstrapped sample under a missing-at-random assumption. When dealing with data from multiple trials, differences in which patient characteristics are assessed and in which instruments are used to measure them result in systematically missing data, where some covariates are not available for any individuals in one or more trials. There has been work addressing systematically missing data between trials (Curran et al., 2016, 2018; Hong et al., 2018; Jolani et al., 2015; Kunkel & Kaizar, 2017), but to our knowledge this work has not considered transportability analyses. To address systematically missing data in transportability analyses, it will be necessary to define additional identifiability conditions and develop novel estimators.
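
The distinction between sporadically and systematically missing covariates can be made concrete with a small check, sketched below, that lists the covariates with no observed value for any individual in a given trial. The record layout, study labels, and variable names are hypothetical.

```python
def systematically_missing(records, covariates):
    """Return {study: [covariates never observed in that study]}.
    A covariate missing for only some individuals is not listed."""
    observed = {}
    for record in records:
        seen = observed.setdefault(record["study"], set())
        for covariate in covariates:
            if record.get(covariate) is not None:
                seen.add(covariate)
    return {
        study: sorted(set(covariates) - seen)
        for study, seen in observed.items()
    }


# Hypothetical harmonized records; None marks a missing value.
records = [
    {"study": "study1", "age": 15, "parental_monitoring": 3.2},
    {"study": "study1", "age": None, "parental_monitoring": 2.8},
    {"study": "study2", "age": 16, "parental_monitoring": None},
    {"study": "study2", "age": 17, "parental_monitoring": None},
]

missing_by_study = systematically_missing(
    records, ["age", "parental_monitoring"]
)
```

In this invented example, `age` is sporadically missing in `study1`, whereas `parental_monitoring` is systematically missing in `study2`.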

Implications for HIV prevention

Results from the worked example suggest that differences among treatments for condomless sex are minimal across the extended follow-up. There was limited evidence of effect modification by factors associated with elevated risk for HIV acquisition, including that Black or African American young women in therapeutic schools do not appear to benefit from the skills training compared to other treatments, and Hispanic youth tend not to respond to the family-based treatment as implemented in Study 1. These results are valid if the identifiability conditions and missing data assumptions are met, and models are correctly specified. As discussed previously, there is evidence of violations of the identifiability conditions in our analyses. Our results, therefore, should be interpreted with caution. Future work, collecting additional information on prognostic indicators of condomless sex and including more trials, could improve precision and allow a more thorough evaluation of the identifiability conditions by examining patterns when transporting inferences from different trials to the same target population (e.g., to explore differences in treatment content and implementation across trials).

Implications for evidence synthesis

The conditions that allow estimation of treatment effects in a target population using treatment and outcome data from multiple trials suggest new directions and opportunities for evidence synthesis. Perhaps the most important consideration is that the utility of inferences drawn by synthesizing evidence from diverse sources depends on both the quality of evidence in each source as well as the relevance of that evidence to the target population of interest. Inference is connected to the target population in that results will vary depending on how the target population is defined and which participant characteristics influence treatment response. Guidelines and reporting standards for evidence synthesis tend to focus on evidence quality with little emphasis on the relevance of evidence to the target population. Evidence synthesis would be strengthened by including more explicit discussion of the relation between trials included in a meta-analysis and the target population(s) of substantive interest, as well as more explicit consideration of the numerous assumptions undergirding results and conclusions.

Conclusions

Transportability methods promise to provide individual participant data meta-analysis results that have a causal interpretation, and thus facilitate the assessment of comparative effectiveness among treatments in target populations of substantive interest and their subgroups, even when the treatments have not been compared in the same head-to-head trial. To realize this promise, effort is needed to improve measurement consistency and data availability across trials and samples from target populations.

Acknowledgments:

Heather McGee, PhD, Office of Medical Education, The Warren Alpert Medical School of Brown University, Providence, Rhode Island, was instrumental in harmonizing the datasets used in this manuscript.

Funding:

This study was funded by the National Institute of Mental Health (K23MH102131). Included trials were funded by R01MH066641, R01MH63008, and R01MH61149. The study was also supported in part by Patient-Centered Outcomes Research Institute (PCORI) awards ME-1502-27794 and ME-2019C3-17875. The content is solely the responsibility of the authors and does not necessarily represent the official views of PCORI, its Board of Governors, or the PCORI Methodology Committee.

Footnotes

Ethical approval: All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards.

Conflict of Interest: The authors declare that they have no conflict of interest.

Informed consent: Informed consent for each trial was obtained from all parents/caregivers and assent was obtained from all participants under the age of 18.

REFERENCES:

  1. Barker DH, Hadley W, McGee H, Donenberg GR, DiClemente RJ, & Brown LK (2019). Evaluating the role of family context within a randomized adolescent HIV-risk prevention trial. AIDS and Behavior, 23(5), 1195–1209.
  2. Barker DH, Scott-Sheldon LAJ, Gittins Stone D, & Brown LK (2019). Using composite scores to summarize adolescent sexual risk behavior: Current state of the science and recommendations. Archives of Sexual Behavior, 48(8), 2305–2320.
  3. Barker DH, Swenson RR, Brown LK, Stanton BF, Vanable PA, Carey MP, Valois RF, DiClemente RJ, Salazar LF, & Romer D (2012). Blocking the benefit of group-based HIV-prevention efforts during adolescence: The problem of HIV-related stigma. AIDS and Behavior, 16(3), 571–577.
  4. Bird HR, Shaffer D, Fisher P, Gould MS, et al. (1993). The Columbia Impairment Scale (CIS): Pilot findings on a measure of global impairment for children and adolescents. International Journal of Methods in Psychiatric Research, 3(3), 167–176.
  5. Brown LK, Hadley W, Donenberg GR, DiClemente RJ, Lescano C, Lang DM, Crosby R, Barker D, & Oster D (2014). Project STYLE: a multisite RCT for HIV prevention among youths in mental health treatment. Psychiatric Services, 65(3), 338–344.
  6. Brown LK, Nugent NR, Houck CD, Lescano CM, Whiteley LB, Barker D, Viau L, & Zlotnick C (2011). Safe thinking and affect regulation (STAR): HIV prevention in alternative/therapeutic schools. Journal of the American Academy of Child and Adolescent Psychiatry, 50(10), 1065–1074.
  7. Brown LK, Whiteley L, Houck CD, Craker LK, Lowery A, Beausoleil N, & Donenberg G (2017). The role of affect management for HIV risk reduction for youth in alternative schools. Journal of the American Academy of Child and Adolescent Psychiatry, 56(6), 524–531.
  8. Cella D, Yount S, Rothrock N, Gershon R, Cook K, Reeve B, Ader D, Fries JF, Bruce B, & Rose M (2007). The patient-reported outcomes measurement information system (PROMIS). Medical Care, 45(5 Suppl 1), S3–S11.
  9. Cole SR, & Stuart EA (2010). Generalizing evidence from randomized clinical trials to target populations: The ACTG 320 trial. American Journal of Epidemiology, 172(1), 107–115.
  10. Curran PJ, Cole V, Bauer DJ, Hussong AM, & Gottfredson N (2016). Improving factor score estimation through the use of observed background characteristics. Structural Equation Modeling: A Multidisciplinary Journal, 23(6), 827–844.
  11. Curran PJ, Cole VT, Bauer DJ, Rothenberg WA, & Hussong AM (2018). Recovering predictor–criterion relations using covariate-informed factor score estimates. Structural Equation Modeling: A Multidisciplinary Journal, 25(6), 860–875.
  12. Curran PJ, & Hussong AM (2009). Integrative data analysis: The simultaneous analysis of multiple data sets. Psychological Methods, 14(2), 81–100.
  13. Dahabreh IJ, Hayward R, & Kent DM (2016). Using group data to treat individuals: understanding heterogeneous treatment effects in the age of precision medicine and patient-centred evidence. International Journal of Epidemiology, 45(6), 2184–2193.
  14. Dahabreh IJ, Petito LC, Robertson SE, Hernán MA, & Steingrimsson JA (2020). Toward causally interpretable meta-analysis: Transporting inferences from multiple randomized trials to a new target population. Epidemiology, 31(3), 334–344.
  15. Dahabreh IJ, Robertson SE, Steingrimsson JA, Stuart EA, & Hernán MA (2020). Extending inferences from a randomized trial to a new target population. Statistics in Medicine, 39, 1999–2014.
  16. Dahabreh IJ, Robertson SE, Petito LC, Hernán MA, & Steingrimsson JA (2019). Efficient and robust methods for causally interpretable meta-analysis: transporting inferences from multiple randomized trials to a target population. ArXiv:1908.09230 [Stat].
  17. Dahabreh IJ, Robertson SE, Tchetgen EJ, Stuart EA, & Hernán MA (2019). Generalizing causal inferences from individuals in randomized trials to all trial-eligible individuals. Biometrics, 75(2), 685–694.
  18. Dahabreh IJ, Robins JM, Haneuse SJ-PA, & Hernán MA (2019). Generalizing causal inferences from randomized trials: counterfactual and graphical identification. ArXiv:1906.10792 [Stat].
  19. Dahabreh IJ, Robins JM, Haneuse SJ-PA, Saeed I, Robertson SE, Stuart EA, & Hernán MA (2019). Sensitivity analysis using bias functions for studies extending inferences from a randomized trial to a target population. ArXiv:1905.10684 [Stat].
  20. Dahabreh IJ, Robins JM, & Hernán MA (2020). Benchmarking observational methods by comparing randomized trials and their emulations. Epidemiology, 31(5), 614–619.
  21. Donenberg GR, Emerson E, Bryant FB, Wilson H, & Weber-Shifrin E (2001). Understanding AIDS-risk behavior among adolescents in psychiatric care: Links to psychopathology and peer relationships. Journal of the American Academy of Child and Adolescent Psychiatry, 40(6), 642–653.
  22. Elwood PC (1982). Randomised controlled trials: Sampling. British Journal of Clinical Pharmacology, 13(5), 631–636.
  23. Hadley W, Barker DH, Brown LK, Almy B, Donenberg G, & DiClemente RJ (2015). The moderating role of parental psychopathology on response to a family-based HIV prevention intervention among youth in psychiatric treatment. Journal of Family Studies, 21(2), 178–194.
  24. Hadley W, Barker D, Thamotharan S, & Houck CD (2017). Relationship between unsupervised time and participation in an emotion regulation intervention and risk outcomes. Journal of Developmental & Behavioral Pediatrics, 38(9), 714.
  25. Hernán MA, & Robins JM (2020). Causal Inference: What If. Chapman and Hall/CRC.
  26. Hernán MA, & VanderWeele TJ (2011). Compound treatments and transportability of causal inference. Epidemiology, 22(3), 368–377.
  27. Hong J-L, Jonsson Funk M, LoCasale R, Dempster SE, Cole SR, Webster-Clark M, Edwards JK, & Stürmer T (2018). Generalizing randomized clinical trial results: Implementation and challenges related to missing data in the target population. American Journal of Epidemiology, 187(4), 817–827.
  28. Jolani S, Debray TPA, Koffijberg H, van Buuren S, & Moons KGM (2015). Imputation of systematically missing predictors in an individual participant data meta-analysis: a generalized approach using MICE. Statistics in Medicine, 34(11), 1841–1863.
  29. Kunkel D, & Kaizar EE (2017). A comparison of existing methods for multiple imputation in individual participant data meta-analysis. Statistics in Medicine, 36(22), 3507–3532.
  30. Lescano CM, Castillo HL, Calcano E, Mayor M, Porter M, Rivera-Torgerson Y, Dion C, Marhefka SL, Barker D, Brown LK, & Latino STYLE Research Group (2020). Latino STYLE: Preliminary findings from an HIV prevention RCT among Latino youth. Journal of Pediatric Psychology, 45(4), 411–422.
  31. Neyman J (1923). Sur les applications de la theorie des probabilites aux experiences agricoles: Essai des principes. Master’s Thesis. Excerpts reprinted in English, Statistical Science, 5(4), 463–472. (Dabrowska DM, & Speed TP, Translators.)
  32. Ohmann C, Banzi R, Canham S, Battaglia S, Matei M, Ariyo C, Becnel L, Bierer B, Bowers S, Clivio L, Dias M, Druml C, Faure H, Fenner M, Galvez J, Ghersi D, Gluud C, Groves T, Houston P, … Demotes-Mainard J (2017). Sharing and reuse of individual participant data from clinical trials: principles and recommendations. BMJ Open, 7(12).
  33. Pearl J, & Bareinboim E (2011). Transportability of causal and statistical relations: A formal approach. 2011 IEEE 11th International Conference on Data Mining Workshops, 540–547.
  34. Polanin JR, & Williams RT (2016). Overcoming obstacles in obtaining individual participant data for meta-analysis. Research Synthesis Methods, 7(3), 333–341.
  35. Robins JM, & Greenland S (2000). Causal inference without counterfactuals: Comment. Journal of the American Statistical Association, 95(450), 431–435.
  36. Rubin DB (1974). Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology, 66(5), 688–701.
  37. Rubin DB (1980). Randomization analysis of experimental data: The Fisher randomization test comment. Journal of the American Statistical Association, 75(371), 591–593.
  38. Rubin DB (2010). Reflections stimulated by the comments of Shadish (2010) and West & Thoemmes (2010). Psychological Methods, 15(1), 38–46.
  39. Rudolph KE, & van der Laan MJ (2017). Robust estimation of encouragement-design intervention effects transported across sites. Journal of the Royal Statistical Society. Series B, Statistical Methodology, 79(5), 1509–1525.
  40. Saegusa T (2019). Large sample theory for merged data from multiple sources. Annals of Statistics, 47(3), 1585–1615.
  41. Schomaker M, & Heumann C (2018). Bootstrap inference when using multiple imputation. Statistics in Medicine, 37(14), 2252–2266.
  42. Schwab-Stone ME, Shaffer D, Dulcan MK, Jensen PS, Fisher P, Bird HR, Goodman SH, Lahey BB, Lichtman JH, Canino G, Rubio-Stipec M, & Rae DS (1996). Criterion validity of the NIMH Diagnostic Interview Schedule for Children Version 2.3 (DISC-2.3). Journal of the American Academy of Child and Adolescent Psychiatry, 35(7), 878–888.
  43. Sobel M, Madigan D, & Wang W (2017). Causal inference for meta-analysis and multi-level data structures, with application to randomized studies of Vioxx. Psychometrika, 82(2), 459–474.
  44. Strobl C, Hothorn T, & Zeileis A (2009). Party on! A new, conditional variable-importance measure for random forests available in the party package. The R Journal, 1, 14–17.
  45. Tierney JF, Vale C, Riley R, Smith CT, Stewart L, Clarke M, & Rovers M (2015). Individual Participant Data (IPD) meta-analyses of randomised controlled trials: Guidance on their use. PLoS Medicine, 12(7), e1001855.
  46. van Buuren S, & Groothuis-Oudshoorn K (2011). mice: Multivariate imputation by chained equations in R. Journal of Statistical Software, 45(3).
  47. VanderWeele TJ (2009). Concerning the consistency assumption in causal inference. Epidemiology, 20(6), 880–883.
  48. Westreich D, Edwards JK, Lesko CR, Stuart E, & Cole SR (2017). Transportability of trial results using inverse odds of sampling weights. American Journal of Epidemiology, 186(8), 1010–1014.
