Abstract
Objective
Findings from multilevel and latent growth modeling analysis (GMA) need to be included in literature reviews, and this article explicates four rarely discussed approaches for using GMA studies in meta-analysis.
Method
Extant and new equations are presented for calculating the effect size (d) and its variance (v) from reported statistics from GMA studies with each method, and a fixed effects meta-analysis of results from five randomized clinical trials was conducted to demonstrate their applications.
Results
Two common practices that were known to introduce bias in effect sizes because of attrition, confounding of treatment effects with the intraclass correlation, measurement errors, and probable violations of assumptions limited to classical analysis were found to yield smaller effect sizes from retrieved studies than were obtained with a newer model-based framework and its associated GMA d statistic.
Conclusions
The optimal strategy for including a GMA study in a meta-analysis is to use GMA d and its v calculated with the standard error of the unstandardized coefficient for the treatment effect. When that standard error is unknown, the use of GMA d and its v estimated with an alternative equation that requires only GMA d and sample size is recommended.
Keywords: effect sizes, meta-analysis, randomized clinical trials, multilevel analysis
Two revolutions in methodology that are improving the quality of evaluations of efficacy of interventions include: (a) growth modeling analysis (GMA)–multilevel analysis, latent growth models, and hierarchical linear models (Bollen & Curran, 2006; Goldstein, 2011; Hedeker & Gibbons, 2006)–in primary research and (b) meta-analysis (Cumming, 2013; Schmidt & Hunter, 2014) in literature reviews. The effect size and its variance (v) are the key statistics used in a meta-analysis (Borenstein, Hedges, Higgins, & Rothstein, 2009). Until recently, however, there has been little consensus regarding both the conceptualization and calculation of effect sizes for hypotheses tested with GMA (Feingold, 2009; Odgaard & Fowler, 2010), and inclusion of GMA findings in meta-analyses has rarely been addressed. Thus, meta-analysts who retrieve GMA studies for a research synthesis often calculate a standardized mean difference (Cohen’s d) between the treatment and control group for each study using the reported means and the pooled within-group standard deviation (SD) for study completers at the final time point.
However, GMA requires fewer statistical assumptions that are likely to be violated in practice than classical analysis (Gibbons et al., 1993), a key reason the former has been supplanting the latter in intervention evaluations (Feingold, 2009). For example, GMA does not require the unrealistic assumption of homogeneity of variance across time and conditions. GMA also has less restrictive statistical assumptions regarding attrition than classical analysis because GMA typically uses maximum likelihood estimation that benefits from advances in missing data theory (Little & Rubin, 2002).
In addition, meta-analysis texts (e.g., Rosenthal, 1991) typically provide an equation for calculating d from a reported t ratio for the treatment effect from a retrieved study,
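$$d = t\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} \tag{1}$$
where t is the reported t ratio and n1 and n2 are the sample sizes of the two groups.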
In a GMA, the ratio of the coefficient for the effect of group on slope to its standard error (SE) is a critical ratio (CR; e.g., t or z) that is routinely reported as the significance test of the treatment effect. Thus, Equation 1 has been applied to calculate standardized effect sizes for hypothesis tests in GMA studies (see review by Feingold, 2009), and it would appear to offer an alternative approach that can be used in meta-analyses that include GMA findings. However, Equation 1 should only be used with CRs from between-subjects (completely randomized) designs. When applied to findings from repeated-measures designs (including GMA, repeated-measures analysis of variance, paired t-tests, and analysis of covariance), the effect size obtained with Equation 1 is not in the same metric as the classical d because effect potency is confounded with the magnitude of the intraclass correlation (ICC) or the pretest-posttest correlation (Feingold, 2009; Morris & DeShon, 2002) and is thus not suitable for use in meta-analysis.
A more recently introduced approach for obtaining effect sizes from a GMA that does not confound effect magnitude with the ICC entails a transformation of the unstandardized coefficient (b) for the slope difference between the two independent groups into a standardized mean difference (a Cohen’s d equivalent) between the groups at the end of the study,
$$d = \frac{b \times \text{duration}}{SD} \tag{2}$$
where b is the unstandardized coefficient for a fixed effect of a binary time-invariant covariate (e.g., condition, dummy coded) on slope for linear trend (Feingold, 2009, 2013). Thus, b is the difference in the rate of change in the outcome between the two groups per unit of time (e.g., per week, given weekly assessments that are coded to differ by one point between them, such as 0, 1, 2, and 3). (Neither b nor the GMA d derived from it is affected by centering of the time variable.)
Duration is the number of time points minus one when time codes differ by one point, and is thus the length of the study based on units associated with the coefficient (e.g., number of weeks from baseline if b represents the difference in rate of change per week). In meta-analytic applications, SD may be calculated by pooling the variability from the two groups at baseline (Feingold, 2009; see Footnote 1). Given random assignment and a correctly specified linear GMA with complete data, GMA d has the same expected value as that of the standardized mean difference (classical d) between the two groups at the end of the study (Feingold, 2015).
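The calculation in Equation 2 can be sketched in a few lines of Python. This is a minimal illustration only: the function name `gma_d` and the input values are hypothetical and are not taken from any study discussed in this article.

```python
def gma_d(b: float, duration: float, sd: float) -> float:
    """Equation 2: slope difference scaled to the end of the study, in SD units."""
    if sd <= 0:
        raise ValueError("sd must be positive")
    return b * duration / sd


# Hypothetical inputs (assumptions for illustration):
# four weekly assessments coded 0, 1, 2, 3, so duration = 4 - 1 = 3;
# b = 1.5 scale points per week; pooled baseline SD = 9.
n_time_points = 4
duration = n_time_points - 1
d = gma_d(b=1.5, duration=duration, sd=9.0)
print(round(d, 3))  # 0.5
```

Because duration must be expressed in the same time units as b, a study that codes time in months rather than weeks would need a correspondingly different duration value.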
A meta-analysis requires the v for each effect size to be included in it. For classical d,
$$v = \frac{n_1 + n_2}{n_1 n_2} + \frac{d^2}{2(n_1 + n_2)} \tag{3}$$
where n1 is the sample size of group 1, n2 is the sample size of group 2 and N is the total sample size (Borenstein et al., 2009).
For GMA d,
$$v = \left(\frac{\text{duration}}{SD}\right)^2 SE_b^2 \tag{4}$$
where $SE_b^2$ is the estimated variance of the unstandardized coefficient for the effect of group on slope, and duration and SD are the values used in the calculation of the respective GMA d (see Feingold, 2015, for the derivation and validation of this equation, and for a worked example of calculations of GMA d and v from published study statistics). The v obtained with Equation 4 is an approximation in part because the equation should include σ (the pooled population SD of the outcome) rather than SD. However, because σ is ordinarily unknown, it must be estimated from SD, which adds a negligible amount of bias to the estimation of v (Feingold, 2015).
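Equation 4 rescales the sampling variance of b by the square of the same factor (duration/SD) that converts b into the GMA d. A minimal Python sketch follows; the function name `gma_d_variance` and the numeric inputs are hypothetical and serve only to illustrate the arithmetic.

```python
import math


def gma_d_variance(se_b: float, duration: float, sd: float) -> float:
    """Equation 4: v = (duration / SD)^2 * SE_b^2, treating SD as a fixed constant."""
    if sd <= 0:
        raise ValueError("sd must be positive")
    return (duration / sd) ** 2 * se_b ** 2


# Hypothetical inputs (assumptions for illustration):
# SE of b = 0.6, duration = 3 time units, pooled baseline SD = 9.
v = gma_d_variance(se_b=0.6, duration=3, sd=9.0)
se_d = math.sqrt(v)  # standard error of the GMA d
print(round(v, 3), round(se_d, 3))  # 0.04 0.2
```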
If a GMA study reported GMA d,
$$v = SE_b^2 \left(\frac{d}{b}\right)^2 \tag{5}$$
and Equation 5, in which d is the reported GMA d, obviates the need to extract or calculate values for duration and SD required by Equation 4. When a retrieved study does not report SEb, SEb can often be calculated from other provided statistics. For example, when a document reports both b and a CR as the test of its statistical significance,
$$SE_b = \frac{b}{CR} \tag{6}$$
where b is the unstandardized coefficient for the effect of group on slope and CR is the ratio of b to its SE (which is labeled t, z, or Est./S.E. in different GMA software output files).
If a study reports a 95% confidence interval (CI) rather than a CR for b,
$$SE_b = \frac{UCL_b - b}{1.96} \tag{7}$$
where UCLb is the upper confidence limit for b.
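The routes to SEb and to v described above can be checked against one another numerically. The sketch below is illustrative only: the function names and input values are hypothetical, and the confidence-interval route assumes a symmetric 95% interval built with a critical value of 1.96.

```python
def se_from_cr(b: float, cr: float) -> float:
    """Equation 6: SE_b = b / CR (b and CR share a sign, so the ratio is positive)."""
    return b / cr


def se_from_ci(b: float, ucl: float, z: float = 1.96) -> float:
    """Equation 7: SE_b from the upper 95% confidence limit (symmetric CI assumed)."""
    return (ucl - b) / z


def gma_d_variance_from_d(d: float, b: float, se_b: float) -> float:
    """Equation 5: v = SE_b^2 * (d / b)^2; duration and SD are not needed."""
    return (se_b * d / b) ** 2


# Hypothetical inputs (assumptions for illustration): b = 1.5, CR = 2.5,
# upper confidence limit = 2.676, reported GMA d = 0.5.
se_a = se_from_cr(b=1.5, cr=2.5)
se_b = se_from_ci(b=1.5, ucl=2.676)
v = gma_d_variance_from_d(d=0.5, b=1.5, se_b=se_a)
print(round(se_a, 3), round(se_b, 3), round(v, 3))  # 0.6 0.6 0.04
```

With these inputs the result matches what Equation 4 gives for duration = 3 and SD = 9, as expected, because d/b equals duration/SD whenever d was computed with Equation 2.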
Equations 4 and 5 require SEb (whether SEb is extracted or calculated with Equation 6 or Equation 7), and GMA studies have not always reported either SEb or the statistics from which it can be obtained. An alternative method for calculating v for GMA d that does not require SEb is the independence-assumed approach (IAA; Feingold, 2009). The IAA presumes that the GMA d has the same sampling distribution as the classical d obtained from an independent-groups (between-subjects) design (with the number of subjects in each group assessed at the end of the study). Although the presentation of the IAA did not include an equation for calculating v for GMA d, it was implicit that the IAA would determine v with the same formula used to obtain v for classical d (Equation 3) but with the d used in Equation 3 being the GMA d calculated with Equation 2.
The use of the IAA to estimate the v for the GMA d from published statistics can be illustrated with reported findings from a GMA of data on attitudes towards deviance from the National Youth Study (Elliott, Huizinga, & Menard, 1989) conducted by Raudenbush (1995) that was previously used to demonstrate the computation of the GMA d and its v when the latter was calculated with SEb (Feingold, 2015). The participants were 122 boys and 117 girls (N = 239) whose attitudes were measured five times over a four-year period. The GMA d for the sex difference in attitude growth was .149. With the IAA, v is calculated with Equation 3, v = (239/14274) + (.0222/478) = .017. In other words, the v for the GMA d is calculated as if the GMA d were a classical d obtained by comparing the two sexes in deviant attitudes at the final time point. The IAA is thus a hybrid approach using the GMA d associated with the SEb approach and v calculated with the classical method (but substituting the GMA d obtained with Equation 2 in Equation 3).
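The worked example above can be reproduced with a short Python sketch of the classical variance formula, v = N/(n1 × n2) + d²/(2N). The function name `classical_variance` is hypothetical; the inputs are the values reported in the preceding paragraph.

```python
def classical_variance(d: float, n1: int, n2: int) -> float:
    """Classical variance of d for two independent groups (Equation 3)."""
    n_total = n1 + n2
    return n_total / (n1 * n2) + d ** 2 / (2 * n_total)


# Values from the National Youth Study example: 122 boys, 117 girls, GMA d = .149.
v = classical_variance(d=0.149, n1=122, n2=117)
print(round(v, 3))  # 0.017
```

Note that the second term contributes very little here because d is small, so v is driven almost entirely by the group sizes.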
The IAA can also be used when the GMA d is indeterminate but can be estimated from p values (Rosenthal, 1991). If, for example, the only information available about the effect from a GMA study is its sample size and a note that the treatment effect was not statistically significant, the GMA d can be presumed to be zero and v can be calculated with the first term in Equation 3, as the second term is zero and drops out of the equation. If the GMA effect was only reported to be statistically significant at, say, the .05 level but the number of participants in each group at the end of the study is known, the IAA would first entail the calculation of the classical d that would attain significance at p < .05 from a t test for independent groups, and that d (an estimate of the GMA d) would be used in the second term in Equation 3.
An important weakness of the IAA is that an identical GMA d from different studies using the same model would have different SEs when there are differences in the residual variances that are affected by the ICC. However, neither the ICC nor the residual variance is included in Equation 3. Although the effect size should be independent of the correlations among the repeated measures (the ICC), v decreases as the ICC increases in both classical analysis (Morris & DeShon, 2002) and GMA (Feingold, 2009).
The primary purpose of this study is to conduct a small meta-analysis of RCTs examined with GMA to demonstrate (a) the use of four approaches (IAA, SEb, classical, and t to d) for conducting quantitative reviews that include GMA findings, (b) that Equation 1 (i.e., the t to d approach) does not produce the unbiased effect size (GMA d) obtained with Equation 2, and (c) the application of new equations useful to meta-analysts (e.g., for the calculation of SEb using the CR and b).
Method
Location of Studies and Inclusion Criteria
A computer search of articles published in the Journal of Consulting and Clinical Psychology was conducted to locate RCTs that cited the article that introduced the GMA d (Feingold, 2009). Five studies that met three criteria were included in the meta-analysis: (1) used three or more time points, (2) reported the GMA d for the difference in linear growth rate on a continuous outcome between an intervention and a control group (i.e., not between two active interventions), and (3) included the observed means, SDs, and ns of the outcome variable from the data collected from study completers at the final time point. Although all five studies meeting these criteria had multiple dependent measures examined separately, the primary outcome was always specified, and the findings from that outcome were used in the meta-analysis.
Calculation of Effect Sizes and their Variances for Each Approach
IAA
The reported GMA d and subsample size for each of the two groups of completers at the end of the study were used to calculate v with Equation 3 (using d calculated with Equation 2).
SEb
The b for the treatment effect (unstandardized coefficient for the slope difference between groups) for the primary outcome was extracted from all studies. However, only two of the five studies also reported SEb. In two of the remaining studies (Olthuis, Watt, Mackinnon, & Stewart, 2014; Safren et al., 2012), the authors reported the CR for b, and SEb was calculated with Equation 6. Hien et al. (2015) reported the CI for b, and SEb was thus calculated with Equation 7 for that study. The SEb extracted or calculated for each study was then used to obtain the v for each reported GMA d with Equation 5.
Classical
The effect size used for this approach was Cohen’s d,
$$d = \frac{M_T - M_C}{SD} \tag{8}$$
where MT is the mean of the treatment group at the end of the study, MC is the mean of the control group at the end of the study, and SD is the pooled within-group standard deviation among study completers at the end of the study. The variance of the classical d was calculated traditionally with Equation 3.
t to d
Equation 1 was used to calculate d for this approach, and its v was obtained with Equation 3 (using baseline sample sizes).
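The t to d calculation can be sketched as follows, assuming the common independent-groups conversion d = t × sqrt(1/n1 + 1/n2) for Equation 1. The CR used here is not taken from the primary study: it is back-calculated from the rounded Table 1 values for Olthuis et al. (2014), using the fact that Equations 5 and 6 together imply v = (d/CR)². The result is therefore approximate, and the function names are hypothetical.

```python
import math


def t_to_d(t: float, n1: int, n2: int) -> float:
    """Assumed form of Equation 1: d = t * sqrt(1/n1 + 1/n2)."""
    return t * math.sqrt(1 / n1 + 1 / n2)


def classical_variance(d: float, n1: int, n2: int) -> float:
    """Equation 3: v = N / (n1 * n2) + d^2 / (2N)."""
    n_total = n1 + n2
    return n_total / (n1 * n2) + d ** 2 / (2 * n_total)


# Olthuis et al. (2014) row of Table 1: GMA d = .77, SEb-based v = .066,
# baseline group sizes 40 and 40.
cr = 0.77 / math.sqrt(0.066)  # back-calculated critical ratio (about 3.0)
d = t_to_d(cr, 40, 40)
v = classical_variance(d, 40, 40)
print(round(d, 2), round(v, 3))  # 0.67 0.053
```

The same CR that yields a GMA d of .77 yields a smaller value under the t to d conversion, which illustrates that the two statistics are not in the same metric.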
Data Analysis
A fixed effects meta-analysis was conducted using the effect sizes and their variances (ds and vs) obtained with each of the four approaches. The mean ds were obtained by weighting the ds from the individual studies by the inverse of their vs (Borenstein et al., 2009) and the pooled vs were used to calculate the 95% CIs for the weighted mean ds. A test of the homogeneity of the five effect sizes was also conducted in each meta-analysis.
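The inverse-variance computations described above can be sketched in Python. This is a minimal illustration rather than the software used for the article: the function name `fixed_effects` is hypothetical, and the inputs are the rounded IAA ds and vs reported in Table 1, so the output only approximates the Table 2 values.

```python
import math


def fixed_effects(ds, vs):
    """Inverse-variance fixed effects summary of effect sizes ds with variances vs."""
    weights = [1 / v for v in vs]
    total_w = sum(weights)
    mean_d = sum(w * d for w, d in zip(weights, ds)) / total_w
    var_mean = 1 / total_w               # variance of the weighted mean d
    se_mean = math.sqrt(var_mean)
    ci = (mean_d - 1.96 * se_mean, mean_d + 1.96 * se_mean)
    # Q is referred to a chi-square distribution with k - 1 degrees of freedom.
    q = sum(w * (d - mean_d) ** 2 for w, d in zip(weights, ds))
    return {"mean_d": mean_d, "var": var_mean, "se": se_mean, "ci": ci, "q": q}


# IAA columns of Table 1 (rounded values as published).
iaa_ds = [0.65, 0.38, 0.77, 0.64, 0.84]
iaa_vs = [0.098, 0.029, 0.075, 0.064, 0.068]
result = fixed_effects(iaa_ds, iaa_vs)
print(round(result["mean_d"], 3), round(result["var"], 3), round(result["se"], 3))
print(tuple(round(limit, 3) for limit in result["ci"]), round(result["q"], 2))
# With 4 degrees of freedom, Q must exceed 9.49 to be significant at the .05 level.
```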
Results
Table 1 reports the main characteristics from each study used in the four analyses, including (a) the authors and print publication year, (b) the primary outcome, for which the GMA d was extracted, (c) the sample sizes at baseline and at final time point, and (d) the d and v for each approach. The results from the meta-analysis are reported in Table 2.
Table 1.
Study | Primary Outcome | NBT | NBC | NET | NEC | d (IAA) | v (IAA) | d (SEb) | v (SEb) | d (Classical) | v (Classical) | d (t to d) | v (t to d)
---|---|---|---|---|---|---|---|---|---|---|---|---|---
Hien et al., 2015 | CAPS Total | 32 | 37 | 21 | 22 | .65 | .098 | .65 | .109 | .31 | .094 | .49 | .060
Kerr et al., 2014 | CES-D | 81 | 85 | 70 | 70 | .38 | .029 | .38 | .030 | .05 | .029 | .34 | .024
Olthuis et al., 2014 | ASI-3 | 40 | 40 | 25 | 33 | .77 | .075 | .77 | .066 | .27 | .071 | .67 | .053
Safren et al., 2012 | Adherence | 44 | 45 | 36 | 30 | .64 | .064 | .64 | .072 | .10 | .061 | .51 | .046
Twohig et al., 2010 | Y-BOCS | 41 | 38 | 33 | 31 | .84 | .068 | .84 | .144 | .54 | .065 | .50 | .052
Total N | | 238 | 245 | 185 | 186 | | | | | | | |
Notes. The ds for the Independence Assumed Approach (IAA) and SEb frameworks were calculated with Equation 2; the classical ds were calculated with Equation 8; the t to d approach ds were calculated with Equation 1. The vs for the IAA were calculated with Equations 2 and 3; the vs for the SEb approach were calculated with Equation 5; the vs for the classical and t to d approaches were calculated with Equation 3. NBT = n at baseline for treatment group, NBC = n at baseline for control group, NET = n at end for treatment group, NEC = n at end for control group, CAPS = Clinician Administered PTSD Scale, CES-D = Center for Epidemiologic Studies-Depression, ASI-3 = Anxiety Sensitivity Index-3, Y-BOCS = Yale-Brown Obsessive Compulsive Scale.
Table 2.
Statistic | IAA | SEb | Classical | t to d
---|---|---|---|---
d | .593 | .573 | .205 | .474
Variance of d | .011 | .013 | .011 | .009
Standard Error of d | .106 | .113 | .105 | .092
95% CI for d | .384, .801 | .352, .795 | .000, .410 | .293, .655
Q for test of homogeneity | 2.94 | 2.44 | 2.91 | 1.53
Note. IAA = Independence Assumed Approach, d = weighted mean d, CI for d = confidence interval for weighted mean d. None of the Qs was statistically significant at p < .05.
The applications of the IAA, SEb, and t to d approaches for combining results from the five studies obtained medium mean effect sizes, ds = .47–.59. By contrast, the classical approach found a small average effect size, d = .20. Paired t-tests indicated the classical method yielded significantly smaller effect sizes than were obtained with both the GMA d (i.e., IAA and SEb) and t to d approaches, t(4) = 8.20, p < .001, and t(4) = 3.04, p < .05, respectively. The t to d approach yielded smaller ds than were obtained using GMA d, t(4) = 3.00, p < .05. The statistics for within- and between-study variations in effect sizes were comparable across the four approaches, with no significant heterogeneity of effect sizes within methods.
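The first of these paired comparisons can be reproduced from the rounded ds in Table 1 with a short Python sketch. The function name `paired_t` is hypothetical, and only the test statistic and its degrees of freedom are computed (the p value would require a t distribution function that is not in the standard library).

```python
import math
from statistics import mean, stdev


def paired_t(x, y):
    """Paired t statistic and degrees of freedom for two matched lists of values."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))  # stdev uses n - 1
    return t, n - 1


# Table 1 values: GMA d (shared by the IAA and SEb approaches) and classical d.
gma_ds = [0.65, 0.38, 0.77, 0.64, 0.84]
classical_ds = [0.31, 0.05, 0.27, 0.10, 0.54]
t, df = paired_t(gma_ds, classical_ds)
print(round(t, 2), df)  # 8.2 4
```

The comparisons involving the t to d approach give values close to, but not exactly equal to, those reported in the text when computed from the rounded table entries.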
Discussion
Although the main objective of the meta-analysis was to illustrate the use of four different approaches to research synthesis with GMA studies, serendipitous findings were that the two GMA d approaches produced larger effect sizes than were obtained with the t to d approach, and the latter yielded larger effect sizes than were obtained with the classical approach. However, the small number of studies included in the meta-analysis limits generalizability of the results from comparisons of mean effect sizes across the four analyses. In particular, the differences between mean effect sizes obtained with the t to d approach and any of the other methods would vary according to the average ICCs in the GMA studies included in a meta-analysis.
Thus, the meta-analytic findings suggest that empirical researchers who do not use the GMA d to evaluate intervention efficacy are likely to obtain underestimated effect sizes, and meta-analysts using the traditional methods to obtain the effect size for GMA studies for inclusion in their reviews would obtain attenuated mean effect sizes. All studies used in the meta-analysis had missing data, and the effects of attrition in the included studies may have impacted the meta-analytic results differently across analyses. Another plausible explanation for the differences in meta-analytic results is that the GMA d is a transformation of a coefficient from a latent variable model that corrects for measurement errors, whereas the classical d is an underestimate of its parameter because of the effects of such errors (Schmidt & Hunter, 2014; see Footnote 2).
The key disadvantage of the IAA is that its estimation of v is inferior to estimation based on SEb because the formula associated with the IAA (Equation 3) does not consider the effects of the ICC (or the residual variance related to it) from the GMA model on the sampling error of the GMA d. However, it was encouraging that the average vs obtained for the GMA d from the two GMA d approaches were virtually identical (.011 vs. .013, see Table 2). Thus, it would be preferable to use an approach (the IAA) that improves upon the estimation of the effect size parameter rather than one that uses essentially the same v but in conjunction with a biased effect size.
Unlike in the current meta-analysis, where the variance estimation method was the same for all findings in a given synthesis, “real” meta-analyses that include GMA studies would generally need to use both the SEb and the IAA approaches, with the latter used only to obtain v for effect sizes from studies in which SEb was not reported (or calculable from the equations introduced in this article). As there would probably be only a few retrieved studies entered in a given meta-analysis that require the use of the IAA because of unavailability of SEb, lack of precision in estimation of v for the GMA d with the IAA for them would probably not be consequential. Thus, the IAA is arguably preferable to the alternatives (the classical approach, the t to d equation, and excluding the study from the meta-analysis) when SEb is indeterminate.
Public Health Significance Statement.
Conclusions about the efficacy of interventions are often based on meta-analyses of randomized clinical trials, including studies that used growth modeling analysis (GMA) to determine the treatment effect. This study describes and illustrates the application of four different methods for including GMA treatment effects in meta-analysis. The two traditional approaches were found to underestimate treatment effect sizes and the use of two newer model-based approaches that work with the GMA d is preferable for use in quantitative reviews.
Acknowledgments
This project was supported by awards from the National Institutes of Health (NIH): Grant RC1DA028344 from the National Institute on Drug Abuse (NIDA), Grant R01AA018669 from the National Institute on Alcohol Abuse and Alcoholism (NIAAA), and Grant R01HD46364 from the National Institute of Child Health and Human Development (NICHD). The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH, NIDA, NIAAA, or NICHD.
Footnotes
1. The GMA d calculated with Equation 2 that uses the pooled within-group SD (as does Cohen’s d) of the outcome as the denominator can be contrasted with an alternative GMA d often used for power analysis for GMA studies that uses the standard deviation of growth rate (b) as the denominator (Muthén & Muthén, 2002; Raudenbush & Liu, 2001). The latter effect size should not be used in meta-analysis because it is not in the same metric as the classical d (see Feingold, 2009, for a full discussion of the differences between the two GMA ds).
2. It is also possible that publication bias may have notably inflated the mean GMA d. Given the sample and effect sizes for each of the retrieved GMA studies summarized, the statistical power to detect the slope differences was small (Muthén & Muthén, 2002). Yet, every one of the five studies reported finding a statistically significant treatment effect from the GMA for the primary study outcome. Unsurprisingly, the only study that reported a small GMA d had a sample size almost double that of the next largest study. Although publication bias could have inflated mean effect sizes for all approaches, it was probably least likely to be a factor in the classical approach because statistical significance of the intervention effect was determined from the GMA rather than from the observed mean difference between groups at the end of the study (which was not even reported by the primary researchers in any of the included studies). Another possible explanation for cross-analyses differences in effect sizes is that some of the GMA models were misspecified because the correct model was non-linear and thus the GMA d was based on an incorrect model, which would have resulted in a GMA d that is not the standardized mean difference at the end of the study.
References
- Bollen KA, Curran PJ. Latent curve models: A structural equation perspective. Hoboken, NJ: Wiley; 2006.
- Borenstein M, Hedges LV, Higgins JPT, Rothstein HR. Introduction to meta-analysis. New York: Wiley; 2009.
- Cumming G. Understanding the new statistics: Effect sizes, confidence intervals, and meta-analysis. New York: Routledge; 2013.
- Elliott DS, Huizinga D, Menard S. Multiple problem youth: Delinquency, substance use, and mental health problems. New York: Springer-Verlag; 1989.
- Feingold A. Effect sizes for growth-modeling analysis for controlled clinical trials in the same metric as for classical analysis. Psychological Methods. 2009;14:43–53. doi: 10.1037/a0014699.
- Feingold A. A regression framework for effect size assessments in longitudinal modeling of group differences. Review of General Psychology. 2013;17:111–121. doi: 10.1037/a0030048.
- Feingold A. Confidence interval estimation for standardized effect sizes in multilevel and latent growth modeling. Journal of Consulting and Clinical Psychology. 2015;83:157–168. doi: 10.1037/a0037721.
- Gibbons RD, Hedeker D, Elkin I, Waternaux CM, Kraemer HC, Greenhouse JB, et al. Some conceptual and statistical issues in analysis of longitudinal psychiatric data. Archives of General Psychiatry. 1993;50:729–750. doi: 10.1001/archpsyc.1993.01820210073009.
- Goldstein H. Multilevel statistical models. 4th ed. Hoboken, NJ: Wiley; 2011.
- Hedeker D, Gibbons RD. Longitudinal data analysis. Hoboken, NJ: Wiley; 2006.
- Hien DA, Levin FR, Ruglass LM, López-Castro T, Papini S, Hu MC, Herron A. Combining seeking safety with sertraline for PTSD and alcohol use disorders: A randomized controlled trial. Journal of Consulting and Clinical Psychology. 2015;83:359–369. doi: 10.1037/a0038719.
- Kerr DCR, DeGarmo DS, Leve LD, Chamberlain P. Juvenile justice girls’ depressive symptoms and suicidal ideation 9 years after multidimensional treatment foster care. Journal of Consulting and Clinical Psychology. 2014;82:684–693. doi: 10.1037/a0036521.
- Little RJA, Rubin DB. Statistical analysis with missing data. 2nd ed. Hoboken, NJ: Wiley; 2002.
- Morris SB, DeShon RP. Combining effect size estimates in meta-analysis with repeated measures and independent-groups designs. Psychological Methods. 2002;7:105–125. doi: 10.1037/1082-989x.7.1.105.
- Muthén LK, Muthén BO. How to use a Monte Carlo study to decide on sample size and determine power. Structural Equation Modeling. 2002;4:599–620.
- Odgaard EC, Fowler RL. Confidence intervals for effect sizes: Compliance and clinical significance in the Journal of Consulting and Clinical Psychology. Journal of Consulting and Clinical Psychology. 2010;78:287–297. doi: 10.1037/a0019294.
- Olthuis JV, Watt MC, Mackinnon SP, Stewart SH. Telephone-delivered cognitive behavioral therapy for high anxiety sensitivity: A randomized controlled trial. Journal of Consulting and Clinical Psychology. 2014;82:1005–1022. doi: 10.1037/a0037027.
- Raudenbush SW. Hierarchical linear models to study the effects of social context on development. In: Gottman JM, editor. The analysis of change. Mahwah, NJ: Erlbaum; 1995. pp. 165–201.
- Raudenbush SW, Liu X. Effects of study duration, frequency of observation, and sample size on power in studies of group differences in polynomial change. Psychological Methods. 2001;6:387–401.
- Rosenthal R. Meta-analytic procedures for social research. 2nd ed. Newbury Park, CA: Sage; 1991.
- Safren SA, Cleirigh CM, Bullis JR, Otto MW, Stein MD, Pollack MH. Cognitive behavioral therapy for adherence and depression (CBT-AD) in HIV-infected injection drug users: A randomized controlled trial. Journal of Consulting and Clinical Psychology. 2012;80:404–415. doi: 10.1037/a0028208.
- Schmidt FL, Hunter JE. Methods of meta-analysis: Correcting error and bias in research findings. 3rd ed. Thousand Oaks, CA: Sage; 2014.
- Twohig MP, Hayes SC, Plumb JC, Pruitt LD, Collins AB, Hazlett-Stevens H, Woidneck MR. A randomized clinical trial of acceptance and commitment therapy versus progressive relaxation training for obsessive-compulsive disorder. Journal of Consulting and Clinical Psychology. 2010;78:705–716. doi: 10.1037/a0020508.