Abstract
Combining and analyzing data from heterogeneous randomized controlled trials of complex multiple-component intervention studies, or discussing them in a systematic review, is not straightforward. The present article describes certain issues to be considered when combining data across studies, based on discussions in an NIH-sponsored workshop on pooling issues across studies in consortia (see Belle et al. in Psychol Aging, 18(3):396–405, 2003). Several statistical methodologies are described and their advantages and limitations are explored. Whether one weights the data from the different studies differently or employs random effects, one must recognize that different pooling methodologies may yield different results. Pooling can be used for comprehensive exploratory analyses of data from RCTs and should not be viewed as replacing the standard analysis plan for each study. Pooling may help to identify intervention components that are more effective, especially for subsets of participants with certain behavioral characteristics. Pooling, when supported by statistical tests, allows exploratory investigation of potential hypotheses and can inform the design of future interventions.
Keywords: Statistical pooling of studies, Random-effects meta-analysis, Study-level meta-regression, Multilevel meta-regression, Multilevel structural models
INTRODUCTION
Randomized controlled trials (RCTs) are considered the gold standard experimental study design for establishing the causal effect of an intervention on an outcome of interest. RCTs are usually designed to have high internal validity in addressing specific hypotheses but may have less external validity as their inclusion and exclusion criteria may be very restrictive. Often there are many similar trials addressing the same type of research hypotheses but with different target populations, settings, or outcome measures. Such trials may not evaluate exactly the same intervention, especially in trials of interventions that include combinations of multiple behavioral, social, pharmacological and/or environmental components.
A question to consider is whether there are benefits from combining data from several studies. Combining data from various RCTs can be useful in applications beyond estimation of the overall intervention effect. For example, it may be informative to combine data to increase the sample sizes of subgroups in which to examine the intervention effect, to increase the number of events for secondary outcomes, or to reduce variances and obtain more precise confidence intervals for outcomes and adverse events. An alternative to combining the results of various small trials would be to undertake a large definitive trial, i.e., one that establishes conclusively the safety and efficacy of a proposed intervention. However, such trials are not always feasible because of the very large sample sizes, long duration, and high costs required, or because of the nature of the intervention (e.g., policy interventions).
In many situations, RCTs evaluate multicomponent interventions aimed at preventing conditions such as diabetes and obesity or at subjects with a high cardiovascular risk profile. Combining and analyzing the data from heterogeneous randomized controlled trials of complex multiple-component intervention studies, or discussing them in a systematic review, is not straightforward. The first important question is whether it is appropriate at all to combine data from a set of heterogeneous randomized controlled trials. Once the decision to combine the data or results of the various trials is made, the issue of how to combine the trials needs to be considered. A review of possible procedures concluded that the most serious methodological limitation is the question of which studies should be combined rather than how to combine them [1].
The objective of this manuscript is to describe certain statistical issues to be considered when combining data across studies, especially studies that share many commonalities, as in consortia; these issues were discussed in an NIH-sponsored workshop on “pooling issues across studies in consortia” (see [2]). Several statistical methodologies are described, and their advantages and limitations are explored. In addition, illustrations of combining data are given with reference to examples from the Childhood Obesity Prevention and Treatment Research (COPTR) consortium.
METHODOLOGICAL ISSUES IN COMBINING SIMILAR BUT DIFFERENT INTERVENTION STUDIES
To be combined, trials should address the same, or similar, research question(s) in similar populations and settings using similar intervention components and implementation approaches and having the same or similar outcome variables. However, strict inclusion criteria that attempt to define trials that are “very similar” may lead to an overly conservative decision that trials should not be combined unless all components are identical in all studies [3]. The more aspects they share in common, such as conceptual or theoretical framework, inclusion and exclusion criteria, recruitment methods, measures, timing of assessments, intervention approaches, and procedures of study implementation (e.g., training, quality assurance, and data management), the less heterogeneity and the more convincing the argument for combining will be. For example, tight definitions of behavioral therapies, or classifying the components using a common taxonomy, may better define exposure variables and strengthen the argument for combining across behavioral interventions.
An example of addressing this question comes from consortia funded by the National Institutes of Health to test the efficacy of a diverse set of obesity-related interventions at multiple sites across the country. The consortia include the COPTR (four studies) [4], the Early Adult Reduction of weight through LifestYle intervention (EARLY; seven studies) [5], the Obesity-Related Behavioral Intervention Trials (ORBIT; seven studies) [6], and the Lifestyle Interventions For Expectant Moms (LIFE-Moms; seven studies). COPTR is testing multilevel intervention approaches to prevent excess weight gain in youth and to reduce weight among overweight and obese youth. Targeted age groups are preschoolers (2–5-year-olds), preadolescents, and adolescents (7–14-year-olds) of diverse racial and ethnic groups in four different locations in the USA. EARLY is testing innovative behavioral approaches for weight control in young adults, 18–35 years of age, at high risk for weight gain. ORBIT is testing methods to translate findings from basic research on human behavior into more effective clinical, community, and population interventions to reduce obesity in a diverse group of subjects. LIFE-Moms is testing behavioral/lifestyle interventions in overweight and obese pregnant women designed to improve weight and metabolic outcomes among women and their children.
Within each of these consortia, the individual trials are each designed to be stand-alone studies with adequate power to address their respective primary hypotheses. There is interest in combining study data for several reasons: the potential to explore certain important secondary hypotheses that are not testable in any one study (i.e., new research questions, such as testing for geographical and other contextual effects that are typically constant within a single study), and the potential to learn more from the combined information across trials than from each individual study. For example, pooling may allow the investigation of effect modification by type of study approach or by population or contextual characteristics. Because the studies within consortia have different study populations and intervention approaches, there are analytic challenges in exploring relationships when combining studies, even though they may all have a common outcome measure.
There are both potential advantages and disadvantages of combining data across studies. Potential advantages are larger sample sizes to provide more power to explore relationships and secondary hypotheses and the increased potential for improving the external validity of results by taking advantage of the heterogeneity among the studies in generalizing results to a wider context. Potential disadvantages are that combining different studies increases the overall variability, may produce spurious results, and could affect how the overall results are received by the scientific community. The heterogeneity among the studies can be such that it may actually reduce overall statistical power. In addition, conflicting results may make the overall result inconclusive, regardless of the analytic methodology, with wider confidence intervals due to the increased heterogeneity.
Thus, in deciding whether to combine data, the primary issues to consider include not only whether the combined data can address an important research question but also whether the studies are “sufficiently” comparable with respect to their conceptual framework and overall objectives as well as their design and implementation features. The latter include participant eligibility criteria and characteristics; intervention settings, approach, components, timing, and actual implementation; outcome(s) of interest (e.g., how and when measured); and study conduct procedures (e.g., staff training, quality assurance, data management). Some studies are implemented in a more “pragmatic style” (flexibility in design characteristics), whereas others are implemented in a more “explanatory style.” It is important to devise statistical tests that can inform the decisions for combining the data from RCTs.
METHODS TO SUMMARIZE STUDIES WITHOUT PRODUCING A SUMMARY ESTIMATE
Systematic reviews are commonly conducted as a method for summarizing information from multiple studies that address the same or similar scientific questions. Combining the results from multiple similar randomized controlled trials to synthesize the empirical evidence related to a particular intervention is a well-established methodology in systematic reviews. Studies are required to meet strict prespecified criteria to be included in such a review. The methods and results of each study are considered separately, but results are not necessarily combined quantitatively to produce a summary result in a systematic review. Often the commonalities and differences among studies are summarized in text or in table format. A systematic review conducted over studies within a consortium would highlight the common aspects of design, approach, and data management and measurement that might be unique to that group of studies.
A comparative, as opposed to a summarizing, approach involves comparing the effect from one index study that was of particular interest to the effects found in other studies, one by one. This would provide a test of whether other studies corroborate (validate) the result of the index study. Although this does not provide an overall estimate of the intervention effect, it does provide insights into the underlying heterogeneity. It also prompts the investigator to search for reasons why some studies may be in agreement, while others are not.
A descriptive graphical approach that is useful for the comparison of studies is the “forest plot.” In these plots, the estimate of the intervention effect and its corresponding 95% confidence interval are presented as a line segment for each study, with the studies displayed one below another. Studies may be arranged alphabetically, chronologically, or by size of effect. Forest plots facilitate visual assessment of results from multiple studies and can be used with or without the addition of a summary estimate of effect over the multiple studies.
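As a minimal illustration, the sketch below draws a basic forest plot with matplotlib; the study names, effect estimates, and confidence limits are hypothetical values invented purely for display.

```python
# A minimal forest-plot sketch; the four studies and their effects
# (e.g., differences in BMI change) are hypothetical.
import matplotlib.pyplot as plt

studies = ["Study A", "Study B", "Study C", "Study D"]
effects = [-0.40, -0.15, -0.55, -0.10]   # point estimates
ci_low = [-0.80, -0.50, -0.95, -0.45]    # lower 95% confidence limits
ci_high = [0.00, 0.20, -0.15, 0.25]      # upper 95% confidence limits

fig, ax = plt.subplots(figsize=(6, 3))
for i in range(len(studies)):
    ax.plot([ci_low[i], ci_high[i]], [i, i], color="black")  # CI segment
    ax.plot(effects[i], i, "s", color="black")               # point estimate
ax.axvline(0, linestyle="--", color="gray")  # null-effect reference line
ax.set_yticks(range(len(studies)))
ax.set_yticklabels(studies)
ax.invert_yaxis()  # first study at the top, as is conventional
ax.set_xlabel("Intervention effect (95% CI)")
plt.tight_layout()
plt.show()
```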
POOLING METHODOLOGIES TO PRODUCE A COMBINED ESTIMATE
Having made the decision to combine study estimates from multiple RCTs to produce a single estimate, several methods can be considered. Combining by collapsing all observations into a single data set and ignoring study differences is often referred to as “lumping” the data. As illustrated by DeMets [1], this approach may produce misleading results. For example, different interventions in different studies could produce strong results, but in opposite directions, resulting in the analysis of collapsed data showing a null effect. Combining by “pooling” rather than by lumping is preferable, with the term pooling meant to convey a method that statistically adjusts for the study differences. There are several alternatives for pooling, described below.
If there are two interventions and the outcome variable is binary, a specific methodology that may be employed is the Mantel–Haenszel [7] method for combining data over several 2 × 2 contingency tables. For continuous variables, Mantel and Haenszel [7] suggested ANOVA-based approaches for summarizing intervention effects across studies. Instead of ANOVA, one can choose the flexibility of regression models to incorporate study and intervention interaction effects by including appropriate indicator variables. These models can also include study-level and subject-level covariates if information is available at these levels (see Models 1a–c in the Appendix).
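As a brief sketch of the Mantel–Haenszel approach, the example below pools hypothetical per-study 2 × 2 tables using the StratifiedTable class from the Python statsmodels package; the counts, and the choice of statsmodels as tooling, are our assumptions rather than anything specified in the text.

```python
# Pooling a binary outcome over per-study 2x2 tables with the
# Mantel-Haenszel estimator; all counts below are hypothetical.
import numpy as np
from statsmodels.stats.contingency_tables import StratifiedTable

# One 2x2 table per study: rows = intervention/control arm,
# columns = event/no event.
tables = [
    np.array([[25, 75], [15, 85]]),  # study 1
    np.array([[30, 70], [22, 78]]),  # study 2
    np.array([[18, 82], [10, 90]]),  # study 3
]

st = StratifiedTable(tables)
print("MH pooled odds ratio:", st.oddsratio_pooled)
print("CMH test of null odds ratio:", st.test_null_odds())
print("Breslow-Day test of homogeneity:", st.test_equal_odds())
```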
Meta-analysis is a well-known approach for obtaining a common intervention effect from several similar trials. The heterogeneity among the individual studies’ estimates of effects, the within-study variance of the outcome measure(s), and a quality assessment of the studies are determined. Combining widely disparate measures into a single summary measure masks conceivably important differences and is often discouraged. If the studies meet a prespecified criterion of effect size homogeneity and other criteria for meaningful cross-study analyses, their individual results may be combined to produce an estimate of the intervention effect. A weighted pooled estimate is obtained using the inverse of each study’s variance as its weight, under the assumption that the larger the variance of a study, the lower the “quality” of its evidence and therefore the less weight it should have on the overall effect estimate. This variance may be calculated using either a fixed-effects or a random-effects approach. The random-effects approach attenuates the differences among the study weights by combining within-study and among-study information (see further below).
However, these approaches assume the same or similar interventions for all active-arm participants and all control-arm participants; they do not work for multicomponent interventions that vary across sites, where the active and/or control arm subjects at one site may be receiving a different intervention than those at another site.
Another possible meta-analytic methodology is multiple-intervention meta-analysis, e.g., network meta-analysis. It is used when there are not enough head-to-head comparisons of multiple interventions, and it considers each randomized arm in calculating intervention effect estimates. It may be considered in situations where the same randomized arms are not included in all studies. In our situation, a given arm of a randomized study consists of an intervention with multiple components occurring simultaneously. Thus, network meta-analysis, like “standard” meta-analysis, is not appropriate for multicomponent interventions.
Table 1 presents a summary of the advantages and disadvantages of four common methods that can be used to address multicomponent interventions: random-effects meta-analysis, study-level meta-regression, multilevel meta-regression (a technique that includes both individual participant-level and study-level data), and modeling of structural relationships.
Table 1. Advantages and disadvantages of methods for combining data from multicomponent intervention studies

Method | Advantages | Disadvantages
---|---|---
Random-effects meta-analysis | • Provides a weighted estimate of the overall intervention effect • Weights are attenuated by incorporating information on variation within a study and variation (heterogeneity) across studies | • Assumes a common intervention across all studies • Unable to handle multicomponent interventions • Does not incorporate information on covariates, whether study-level or subject-level
Study-level meta-regression | • All the above, but residual heterogeneity across studies is now modeled by study-level covariates • Model can include indicator variables for the various components in multicomponent interventions, thus enabling study of the relative effects of each component • Modeling can include study-level covariates as mediators and moderators of the intervention component effects | • Does not incorporate information on subject-level covariates • Assumes that components of the interventions are similar enough to be considered the same (a taxonomy describing the modality of the component may be helpful)
Multilevel meta-regression | • All the above, but additional residual heterogeneity across studies is modeled by subject-level covariates • Modeling can include subject-level covariates as mediators and moderators of the intervention component effects | • Assumes that components of the interventions are similar enough to be considered the same (a taxonomy describing the modality of the component may be helpful) • Assumes that the subject-level covariates are measured in a similar or comparable manner across studies
Modeling structural relationships | • Uses subject-level information • May identify the underlying factors possibly affecting the treatment effects • Accounts for hypothesized causal pathways | • Subject-level explanatory variables should be comparable
Random-effects meta-analysis
In meta-analysis, one is modeling the intervention effect, which is the same as modeling the expected value of the outcome in two-arm studies. In standard fixed-effects meta-analysis, the assumption is that there is a common intervention effect and that each observed study outcome effect differs from the true effect by an amount defined as the “error term,” which is assumed to be normally distributed. If one is willing to assume that the studies are a random sample from a potential pool of all other similar studies, one can assume that each study’s observed effect varies around its own true study effect, and that the true study effects in turn vary around the common effect, thereby decomposing the total variance for the estimate of the intervention effect into a within-study variance and a between-study variance.
While the fixed-effects meta-analysis approach is widely used, it assumes that there is little heterogeneity in study effects across the various trials. The random-effects approach differs in that it incorporates heterogeneity information across the trials in calculating a trial’s variance, whereas the fixed-effects approach uses only within-study information. Using a random-effects approach in a meta-analysis does statistically adjust for some of the heterogeneity across studies. However, there may still be residual heterogeneity among the studies, and these meta-analytic techniques cannot account for multiple-component interventions.
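The sketch below implements the fixed-effects (inverse-variance) estimate and a random-effects estimate of the pooled effect; the study effects and variances are hypothetical, and the DerSimonian–Laird method used for the between-study variance is one common choice among several estimators.

```python
# A minimal DerSimonian-Laird random-effects pooling of study-level
# effect estimates; the estimates and variances are hypothetical.
import numpy as np

effects = np.array([-0.40, -0.15, -0.55, -0.10])  # per-study effects
variances = np.array([0.04, 0.03, 0.05, 0.03])    # within-study variances

# Fixed-effects (inverse-variance) pooling.
w_fixed = 1.0 / variances
theta_fixed = np.sum(w_fixed * effects) / np.sum(w_fixed)

# Cochran's Q and the method-of-moments between-study variance tau^2.
Q = np.sum(w_fixed * (effects - theta_fixed) ** 2)
df = len(effects) - 1
C = np.sum(w_fixed) - np.sum(w_fixed ** 2) / np.sum(w_fixed)
tau2 = max(0.0, (Q - df) / C)

# Random-effects weights add tau^2 to each within-study variance,
# attenuating the differences among the study weights.
w_random = 1.0 / (variances + tau2)
theta_random = np.sum(w_random * effects) / np.sum(w_random)
se_random = np.sqrt(1.0 / np.sum(w_random))

print(f"fixed effect: {theta_fixed:.3f}, random effect: {theta_random:.3f}")
print(f"tau^2: {tau2:.4f}, 95% CI: ({theta_random - 1.96 * se_random:.3f}, "
      f"{theta_random + 1.96 * se_random:.3f})")
```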
Study-level meta-regression
The technique of modeling the study-level outcome by incorporating study-level covariate information is called meta-regression. To account for additional or residual heterogeneity among the studies arising from different intervention approaches or participant characteristics, one can model the outcome using study-level covariates and thus adjust for the effects of each study on the outcome as well as on the effect of the intervention [8]. A generalized linear mixed-effects regression model is constructed for the primary study outcome, with an indicator variable for intervention arm and study-level covariates that may be potential effect modifiers (moderators) or potential confounders of the intervention effect. The introduction of study-level covariates in a meta-regression may explain some of the heterogeneity due to study differences [9].
The issue of whether to include each study’s effects as random or fixed is not straightforward. DeMets [1] cautioned against including random effects for the studies, since this can imply that they are a random sample of a specified universe of studies. Moreover, a large number of studies is required to estimate the variances of the study random effects. It would thus seem appropriate to include study effects as fixed effects in the pooled estimation. On the other hand, including study effects as random variables accounts for the heterogeneity among the study effects due to unobserved sources. Including study effects as random effects can be justified by the interest in adjusting for the studies’ source of variability rather than in interpreting those effects. Modeling random effects can lead to narrower confidence intervals around the estimates of intervention effectiveness [8].
In order to understand the intervention effect when there are multiple components to the intervention, Bangdiwala et al. [8] proposed to include indicator variables for each component across the different studies. To avoid component effects being confounded with the study effect, it is important that a study uses more than one component and that a particular component be used in more than one study. However, since it is likely that components are not exactly the same across studies, in order to consider components as “similar,” a common taxonomy could be utilized ([10]; see Tate et al. in this issue). Note that the control or “standard care” arm may include some “active” components, and they would also need to be accounted for in the analysis. Having fit a meta-regression, one can then look at the coefficients of each of the component indicator variables to assess their relative contribution to the overall outcome. See Appendix Model 2 for an illustration using the COPTR consortium.
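A minimal sketch of such a study-level meta-regression follows, with hypothetical study effects regressed on invented component indicators (C1, C2) and a study-level covariate via inverse-variance weighted least squares; with the small numbers of studies typical of consortia, a fit like this is exploratory at best.

```python
# Study-level meta-regression with indicator variables for intervention
# components; all data below are hypothetical.
import pandas as pd
import statsmodels.formula.api as smf

studies = pd.DataFrame({
    "effect": [-0.40, -0.15, -0.55, -0.10, -0.30, -0.20],  # per-study effects
    "variance": [0.04, 0.03, 0.05, 0.03, 0.04, 0.05],      # within-study variances
    "C1": [1, 1, 1, 0, 1, 0],        # education component offered?
    "C2": [1, 0, 1, 1, 0, 1],        # physical-activity component offered?
    "mean_age": [4.0, 11.5, 13.0, 9.0, 12.0, 10.5],        # study-level covariate
})

# Inverse-variance weighted regression of study effects on the component
# indicators and a study-level covariate.
fit = smf.wls("effect ~ C1 + C2 + mean_age",
              data=studies, weights=1.0 / studies["variance"]).fit()
print(fit.summary())
```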
Multilevel meta-regression
Within consortia, investigators have the possibility of obtaining participant-level information in addition to study-level information. The latter might include various aspects of the interventions such as delivery characteristics, implementation strategies, and mechanisms of action [11]. The meta-regression model can be expanded to include such information and is then called multilevel meta-regression (see Model 3 in Appendix). Moderately large heterogeneity among the studies’ target populations, intervention content and modalities, and other aspects may be addressed using study-level along with participant-level covariates [9].
In the Resources for Enhancing Alzheimer’s Caregiver Health (REACH) consortium, this analytic approach was used to allow investigators to include in a single model both participant-level information and individual elements of multicomponent interventions at the study-level to examine the relationships between those elements and outcomes [2, 12]. The REACH interventions were complex multifaceted behavioral interventions, with various components. A natural question is which components are more effective, but since not all studies had the same components in their interventions, REACH investigators decomposed the complex interventions into 12 components (e.g., caregiver affect, care-recipient behavior, knowledge about the social environment), and relationships between the components and outcome were examined. By so doing, main effects and interactions, both within levels (participant, study) and across those levels, were examined.
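The sketch below illustrates a multilevel meta-regression of this kind on simulated data: a mixed model with a random intercept for study, a study-level component indicator, a subject-level covariate, and an arm-by-component interaction. All names and values are hypothetical, and MixedLM from statsmodels is only one of several packages that could be used.

```python
# A multilevel meta-regression sketch on simulated participant-level data.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
rows = []
for s in range(4):                       # four hypothetical studies
    c1 = s % 2                           # education component offered here?
    for _ in range(200):
        arm = rng.integers(0, 2)         # randomized arm
        age = rng.uniform(7, 14)         # subject-level covariate
        y = (-0.3 * arm - 0.1 * arm * c1 + 0.02 * age
             + rng.normal(0.2 * s, 1.0))  # study-specific shift + noise
        rows.append({"study": s, "arm": arm, "C1": c1, "age": age, "dBMI": y})
df = pd.DataFrame(rows)

# Random intercept for study; the arm-by-component interaction estimates
# how the education component modifies the intervention effect.
fit = smf.mixedlm("dBMI ~ arm * C1 + age", data=df, groups=df["study"]).fit()
print(fit.summary())
```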
Modeling structural relationships
Multilevel meta-regression models may account for the heterogeneity among studies and for the effects of the various components across studies, but they fall short of considering the causal pathways, whether or not testing those mechanisms is an explicit objective. These paths are present in the overall framework for the study, which is why having a common “framework” is crucial when pooling data. Whether those paths are measured and tested or not, they exist and affect the intervention impact. To the extent that they can be modeled, they provide richer explanations for the variation in response.
Population-based interventions initially induce behavioral changes among the subjects that, in turn, affect the outcomes of interest [13]. For example, making parents aware of the importance of healthy diets and greater physical activity for children, as is common in childhood obesity interventions, may lead to changes in parental behavior that in turn reduce childhood obesity. Similarly, highly motivated women in the Women’s Health Trial: Feasibility Study in Minority Populations were seen to make healthful dietary changes especially in the intervention group [14]. It is important to analyze the data from RCTs in a broad framework and investigate the pathways underlying the intervention effects. Moreover, exploratory analyses of pooled data from multicenter and/or similar trials can provide insights for the future design of effective interventions.
Multigroup structural equation modeling with mean structures (MG-SEM) [15, 16] is an alternative to the regression approach that accommodates multiple components and pathways. The structural model is specified in each study population separately, and common parameters are constrained to be equal across study groups. Lagrange multipliers are used to determine if constraints significantly worsen the model fit. When a constraint does not hold, parameters are estimated separately in each group. The study variables can be defined at the latent variable level by different combinations of observed variables, and the differences in construct reliability can be taken into account. The validity of latent constructs can be tested under certain identifying assumptions on variances of the variables. This methodology has been used in social sciences but is not common in the evaluation of clinical trials [17, 18]. As discussed next, rigorous testing of the constancy of model parameters can also proceed in the regression framework by applying likelihood ratio tests and taking into account the unobserved between-subject differences via random effects.
STATISTICAL TESTS FOR JUSTIFYING POOLING
The interpretations of treatment or intervention effects in randomized controlled trials can be complex [19, 20]. Pooling data from various studies may be useful for obtaining information that could not be gleaned from individual studies and can improve precision of estimates of intervention or intervention component effects. However, it is important to apply likelihood ratio and other statistical tests to assess the validity of the pooled estimates and avoid potentially misleading inferences.
From the standpoint of rigorous justifications for pooling data from similarly designed RCTs, likelihood ratio statistics can be applied to test for the constancy of model parameters across sites [21]. This is especially appealing in true multisite trials and in situations where similar study designs are used for different population groups and relevant explanatory variables are available. For example, in the Women’s Health Trial: Feasibility Study in Minority Populations, the effects of subjects’ “unhealthy eating habits” and dietary intakes on body weight were likely to differ for control and intervention groups. By including separate intercept terms for control and intervention groups, the empirical models enabled testing of the null hypothesis that model parameters are the same for the two groups. In this case, the value of the likelihood ratio statistic was significant [21]. Of course, if the null hypothesis had not been rejected, it would have provided some justification for pooling the data for the two groups.
Further, the null hypothesis of constancy of model parameters may be rejected in certain applications via the use of likelihood ratio tests because the populations differ in important respects such as behavioral and socioeconomic aspects. In such circumstances, it would seem prudent not to pool the data for increasing the sample sizes since that might entail increasing biases in the estimated parameters. However, as noted above, some a priori information can be incorporated in pooled analyses. For example, suppose that in an RCT, the effect of an intervention is significant and the estimated model parameters indicate that an explanatory variable such as subjects’ “participation motivation” was associated with the changes. Then, it may be useful to test the null hypothesis that the coefficient of participation motivation does not differ for other population groups, for which smaller numbers of observations might be available. This null hypothesis can be tested using Lagrange Multiplier type tests [22] that require model estimation only under the null hypothesis. Moreover, Wald statistics can be applied to test the null hypothesis by estimating the model under the more general alternative hypothesis. In addition, likelihood ratio statistics are insightful since investigators can assess the robustness of the estimated parameters under the null and alternative hypotheses [23, 24]. Such statistical tests can be extended to situations where the errors may not be normally distributed but possess finite fourth-order moments [23].
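The following sketch shows the basic mechanics of a likelihood ratio test of parameter constancy on simulated data, comparing a pooled model with a common slope against a model with group-specific slopes; the data, variable names, and model form are invented for illustration.

```python
# Likelihood ratio test of parameter constancy across two groups
# (e.g., control vs. intervention, or two studies); simulated data.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from scipy.stats import chi2

rng = np.random.default_rng(1)
n = 300
group = rng.integers(0, 2, n)                 # group membership
x = rng.normal(size=n)                        # an explanatory variable
y = 1.0 + (0.5 + 0.4 * group) * x + rng.normal(size=n)
df = pd.DataFrame({"y": y, "x": x, "group": group})

pooled = smf.ols("y ~ x + C(group)", data=df).fit()    # common slope
separate = smf.ols("y ~ x * C(group)", data=df).fit()  # group-specific slope

lr = 2 * (separate.llf - pooled.llf)          # likelihood ratio statistic
df_diff = separate.df_model - pooled.df_model
pvalue = chi2.sf(lr, df_diff)
print(f"LR = {lr:.2f} on {df_diff:.0f} df, p = {pvalue:.4f}")
# A non-significant p-value offers some justification for pooling.
```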
CONCLUSION
The question of whether to combine data across studies, such as may be seen in a consortium, does not have a simple answer. Difficult issues to consider are how to approach the problems and how to decide whether it will be useful to combine the data. Combining data from heterogeneous studies can lead to spurious results and conclusions. The argument for combining to achieve higher statistical power for the primary research hypotheses within a consortium of studies might be a weak one, since each trial within a consortium is typically adequately powered to address those hypotheses. The potential for addressing the intervention effects within subgroups by pooling, for improving external validity, for asking research questions that are not possible to test in the individual studies such as examining intervention components, and for addressing secondary outcomes with increased power, may be quite attractive. If one decides to pool the data, heterogeneity among the studies’ procedures and uniqueness of subject selection criteria and other important characteristics make it necessary to apply analytical and statistical tools that attempt to address these issues.
There are many potential methodologies for pooling data across studies [25]. Whether one weights the data from the different studies differently or employs random effects, one must recognize that different pooling methodologies may yield different results. It is important to spell out a priori the conceptual framework employed as well as the specific research questions for the pooled data, and to apply appropriate statistical techniques.
As stated earlier, the objective of this manuscript is to describe certain issues to be considered when combining data across studies, especially studies that share many commonalities, as in consortia. In such situations, the number of studies is predetermined and outside the control of investigators. In an actual implementation of the methods described, the number of studies needed would be based on the desired precision for the estimation of effects and would depend on the variability of the outcome variable. The more studies, the better the precision.
This manuscript is not proposing “new” methods, but bringing them together in one place to provide researchers with an overview of the advantages and limitations of the currently available methodologies. The approaches presented here, whether modeling by random-effects meta-regression or using multilevel structural equation models, involve adjusting for the increased heterogeneity in the data due to aggregating information across multiple studies. As for all data syntheses, the number of studies available for pooling is a consideration when using any of these techniques. In the modeling, it is possible to test for interaction and constancy of model parameters across studies in the pooled models via likelihood ratio and other tests. One can also set up sequential tests in certain cases where the hypotheses are nested [23, 26].
In summary, pooling can be used for comprehensive exploratory analyses of data from RCTs and should not be viewed as replacing the standard analysis plan for each study. As noted above in the context of dietary interventions, pooling may help to identify new hypotheses about intervention components that may be more effective, especially for subsets of participants with certain behavioral characteristics. Pooling, when supported by statistical tests, allows exploratory investigation of interesting potential hypotheses and can inform the design of future interventions.
Acknowledgments
This manuscript is one of three presented in this journal and was supported by the NIH National Heart, Lung, and Blood Institute; the Eunice Kennedy Shriver National Institute of Child Health and Human Development; the NIH Office of Behavioral and Social Sciences Research; the NIH Office of Disease Prevention; and the Centers for Disease Control and Prevention. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
APPENDIX
Modeling overall intervention effects
Model 1a: Modeling individual-level outcomes from multiple single-component intervention trials using fixed effects
Let Yij denote the outcome for the jth person in the ith study, and let Iij be an indicator variable denoting the study arm assignment for that individual (=1 for active arm, =0 for control arm). Then

Yij = β0 + β1Iij + β2X1i + β3X2ij + β4(Iij × X1i) + β5(Iij × X2ij) + eij (1a)

could be a potential model examining the effect β1 of the intervention after accounting for the interaction of the intervention with a study-level covariate X1i and with a subject-level covariate X2ij. The error terms eij are assumed to follow an N(0, σ²) distribution. If X1i is a categorical variable, this is essentially a stratified (or blocked) analysis. If X1i is a series of dummy variables identifying the various studies, one is then treating each study as a stratum in a single “stratified large study.”
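As a hedged illustration, a model of this form could be fit by ordinary least squares along the following lines; the simulated data frame and the variable names (trt standing in for Iij) are hypothetical stand-ins for the stacked individual-level trial data.

```python
# Fitting Model 1a by OLS on simulated stacked individual-level data.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
n = 400
df = pd.DataFrame({
    "study": rng.integers(0, 4, n),   # study identifier
    "trt": rng.integers(0, 2, n),     # arm indicator (Iij)
    "X2": rng.normal(size=n),         # subject-level covariate (X2ij)
})
df["X1"] = df["study"] * 0.5          # a study-level covariate (X1i)
df["Y"] = (1 + 0.6 * df["trt"] + 0.2 * df["X1"] + 0.3 * df["X2"]
           + 0.1 * df["trt"] * df["X2"] + rng.normal(size=n))

# Model 1a: intervention effect plus its interactions with X1i and X2ij.
fit = smf.ols("Y ~ trt * X1 + trt * X2", data=df).fit()

# Treating each study as a stratum instead: replace X1 with study dummies.
fit_strat = smf.ols("Y ~ trt * C(study) + trt * X2", data=df).fit()
print(fit.params)
```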
Model 1b: Modeling study-level outcomes from multiple single-component intervention trials using fixed effects
Since we do not have the individual-level information, let Ei denote the observed effect in the ith study, which could be a difference in means between the intervention and control arms for continuous variables, or the log odds ratio of the probability of an event for binary outcomes. Then

Ei = μ + β1X1i + β2X2i + β3X3i + ei (1b)

could be a potential model examining the overall effect μ of the intervention after accounting for the effects of three study-level covariates (X1i, X2i, X3i). The error terms ei are assumed to have an N(0, σ²) distribution for the variation in each study’s estimate of the common effect μ.
Model 1c: Modeling study-level outcomes from multiple single-component intervention trials using random effects
As in Model 1b, let Ei denote the observed effect in the ith study, which could be a difference in means between the intervention and control arms for continuous variables, or the log odds ratio of the probability of an event for binary outcomes. Then

Ei = μ + β1X1i + β2X2i + β3X3i + ζi + ei (1c)

could be a potential model examining the overall effect μ of the intervention after accounting for the fixed effects of three study-level covariates (X1i, X2i, X3i) and the random effects ζi, assumed to be N(0, τ²) and independent of the errors ei. The random effects decompose the total variance in study effects into a component due to across-study variation (τ²) and a within-study variation (σ²).
Model 2: Modeling study-level outcomes from multicomponent intervention trials using random effects meta-regression
For illustration purposes, we use the COPTR consortium, where the primary outcome is body mass index change [ΔBMI]. For simplicity of illustration, assume that each of the four studies in the consortium has multicomponent interventions addressing obesity but that three main modalities are common across them: C1 = education modality, C2 = physical activity modality, and C3 = dietary modality. Note that the studies do not all have to offer every modality as part of their multicomponent intervention, and that what they offer may differ within a modality. The Cs denote indicator variables for whether a modality is offered as part of the intervention in a given study; for example, C1i = 1 if an education component is offered in the ith study, and = 0 if not. Suppose that in addition to the intervention components we have two study-level covariates, say W1 = proportion of males (assumed to be a mediator) and W2 = mean age of individuals (assumed to be a moderator of the education component). Then, letting Ei denote the observed ΔBMI effect in the ith study, the random-effects meta-regression model with only study-level covariates would be

Ei = μ + β1C1i + β2C2i + β3C3i + β4W1i + β5W2i + β6(C1i × W2i) + ζi + ei (2)

where we are interested in the overall effect μ but also in the fixed coefficients βs. The ζs are the study random effects that help study the variance components.
Model 3: Modeling individual-level outcomes from multicomponent intervention trials using random effects multilevel meta-regression
In a consortium, one expects to be able to have individual-level information and can thus model the individual change or effect within a person. Using the COPTR example of Model 2, but for individual-level outcomes, we now have that the jth subject in the ith study may have been randomized to receive or not receive the kth component Ck. Thus, we can model the within-person effect with individual-level covariates sex (S) and age (A):

ΔBMIij = μ + β1C1ij + β2C2ij + β3C3ij + β4Sij + β5Aij + ζi + eij (3)

where we are interested in the overall effect μ but also in the fixed coefficients βs. We should point out that any subject-level covariate would ideally be measured using the same instrument or method (i.e., have a set of common metrics) across studies.
Compliance with ethical standards
Conflicts of interest
All authors have completed the disclosure form on all relationships or interests that could influence or bias the work, and there are no conflicts of interest for any of the authors.
Adherence to ethical principles
All authors have adhered to ethical principles and maintain the integrity of the research and its presentation by following the rules of good scientific practice.
Footnotes
Implication statements
Policy: Pooling the data across studies, when supported by statistical tests, may facilitate the investigation of hypotheses for improving the design of future interventions and informing policy makers about the interventions that are most likely to be effective.
Practice: Pooling the data from similarly designed randomized controlled trials may be useful for identifying intervention components that may be more effective for diverse participants.
Research: Because combining the data from heterogeneous studies may lead to spurious results, it is important to develop statistical procedures for assessing the validity of models estimated using the pooled data.
References
- 1. DeMets DL. Methods for combining randomized clinical trials: strengths and limitations. Stat Med. 1987;6:341–348. doi:10.1002/sim.4780060325.
- 2. Belle SH, Czaja SJ, Schulz R, et al. Using a new taxonomy to combine the uncombinable: integrating results across diverse interventions. Psychol Aging. 2003;18(3):396–405. doi:10.1037/0882-7974.18.3.396.
- 3. Spinks A, Turner C, Nixon J, McClure RJ. The ‘WHO Safe Communities’ model for the prevention of injury in whole populations. Cochrane Database Syst Rev. 2009;(3):CD004445. doi:10.1002/14651858.CD004445.pub3.
- 4. Pratt CA, Boyington J, Esposito L, et al. Childhood Obesity Prevention and Treatment Research (COPTR): interventions addressing multiple influences in childhood and adolescent obesity. Contemp Clin Trials. 2013;36(2):406–413. doi:10.1016/j.cct.2013.08.010.
- 5. Lytle LA, Svetkey LP, Patrick K, et al. The EARLY trials: a consortium of studies targeting weight control in young adults. Transl Behav Med. 2014;4(3):304–313. doi:10.1007/s13142-014-0252-5.
- 6. Czajkowski SM, Powell LH, Adler N, et al. From ideas to efficacy: the ORBIT model for developing behavioral treatments for chronic diseases. Health Psychol. 2015 (in press).
- 7. Mantel N, Haenszel W. Statistical aspects of the analysis of data from retrospective studies of disease. J Natl Cancer Inst. 1959;22:719–748.
- 8. Bangdiwala SI, Villaveces A, Garrettson M, et al. Statistical methods for designing and assessing the effectiveness of community-based interventions with small numbers. Int J Inj Control Saf Promot. 2012;19(3):242–248. doi:10.1080/17457300.2012.704050.
- 9. Morton SC, Adams JL, Suttorp MJ, Shekelle PG. Meta-Regression Approaches: What, Why, When, and How? Technical Review 8 (AHRQ Publication No. 04-0033). Rockville: Agency for Healthcare Research and Quality; 2004.
- 10. O’Connor DP, Lee RE, Mehta P, et al. Childhood obesity research demonstration project: cross-site evaluation methods. Childhood Obes. 2015;11:92–103. doi:10.1089/chi.2014.0061.
- 11. Schulz R, Czaja SJ, McKay JR, et al. Intervention taxonomy (ITAX): describing essential features of interventions. Am J Health Behav. 2010;34(6):811–821. doi:10.5993/AJHB.34.6.15.
- 12. Czaja SJ, Schulz R, Lee CC, et al. A methodology for describing and decomposing complex psychosocial and behavioral interventions. Psychol Aging. 2003;18(3):385–395. doi:10.1037/0882-7974.18.3.385.
- 13. Bhargava A. Randomized controlled experiments in health and social sciences: some conceptual issues. Econ Hum Biol. 2008;6:293–298. doi:10.1016/j.ehb.2008.01.001.
- 14. Bhargava A, Hays J. Behavioral variables and education are predictors of dietary change in the Women’s Health Trial: Feasibility Study in Minority Populations. Prev Med. 2004;38(4):442–451. doi:10.1016/j.ypmed.2003.11.014.
- 15. Jöreskog KG. Simultaneous factor analysis in several populations. Psychometrika. 1971;36:409–426. doi:10.1007/BF02291366.
- 16. Sörbom D. A general method for studying differences in factor means and factor structure between groups. Br J Math Stat Psychol. 1974;27:229–239. doi:10.1111/j.2044-8317.1974.tb00543.x.
- 17. Duncan TE, Duncan SC, Strycker LA. An Introduction to Latent Variable Growth Curve Modeling: Concepts, Issues, and Application. 2nd ed. Mahwah: Lawrence Erlbaum Associates; 2006.
- 18. Rabe-Hesketh S, Skrondal A, Pickles A. Generalized multilevel structural equation modeling. Psychometrika. 2004;69:167–190. doi:10.1007/BF02295939.
- 19. Cox DR. Planning of Experiments. New York: John Wiley & Sons; 1958.
- 20. Fisher RA. The Design of Experiments. Edinburgh: Oliver and Boyd; 1935.
- 21. Bhargava A, Guthrie J. Unhealthy eating habits, physical exercise and macronutrient intakes are predictors of anthropometric indicators in the Women’s Health Trial: Feasibility Study in Minority Populations. Br J Nutr. 2002;88(6):719–728. doi:10.1079/BJN2002739.
- 22. Rao CR. Large sample tests of statistical hypotheses concerning several parameters with applications to problems of estimation. Proc Camb Philos Soc. 1948;44:50–57. doi:10.1017/S0305004100023987.
- 23. Bhargava A. Wald tests and systems of stochastic equations. Int Econ Rev. 1987;28:789–808. doi:10.2307/2526579.
- 24. Sargan JD. Some tests of dynamic specification for a single equation. Econometrica. 1980;48:879–898. doi:10.2307/1912938.
- 25. Weiner BJ, Lewis MA, Clauser SB, et al. In search of synergy: strategies for combining interventions at multiple levels. J Natl Cancer Inst Monogr. 2012;44:34–41. doi:10.1093/jncimonographs/lgs001.
- 26. Wald A. Sequential Analysis. New York: Dover Publications; 1947.