Abstract
The difference between groups in their random slopes is frequently examined in latent growth modeling to evaluate treatment efficacy. However, when end centering is used for model parameterization with a randomized design, the difference in the random intercepts is the model-estimated mean difference between the groups at the end of the study, which has the same expected value as the product of the coefficient for the slope difference and study duration. A Monte Carlo study found that (a) the statistical power to detect the treatment effect was greater when determined from the intercept instead of the slope difference, and (b) the standard error of the model-estimated mean difference was smaller when obtained from the intercept difference. Investigators may reduce Type II errors by comparing groups in random intercepts instead of random slopes to test treatment effects, and should therefore conduct power assessments using end centering to detect each difference.
Keywords: statistical power, latent growth models, randomized controlled trials, effect sizes
Growth modeling analysis (GMA) of repeated measures data using the EM algorithm (Dempster, Laird, & Rubin, 1977) and maximum likelihood estimation--including multilevel modeling/hierarchical linear models (Goldstein, 2011; Hedeker & Gibbons, 2006; Hox, Moerbeek, & van de Schoot, 2010) and latent growth models (Bollen & Curran, 2006; Grimm, Ram, & Estabrook, 2016; Preacher, Wichman, MacCallum, & Briggs, 2008)--is the most commonly used framework for studying change in participants in the behavioral sciences (Feingold, in press; Kuljanin, Braun, & DeShon, 2011). GMA is widely used to assess treatment efficacy from randomized controlled trials (RCTs) by comparing the mean trajectories of the intervention and control groups (Feingold, 2009; Gibbons, Hedeker, Elkin, Waternaux, Kraemer, Greenhouse, et al., 1993; Gueorguieva & Krystal, 2004). Thus, power analysis for GMA has addressed the detection of the true slope difference--the effect of group (a binary time-invariant covariate) on the random slopes for linear trend of the outcome (Muthén & Muthén, 2002; Muthén & Curran, 1997; Raudenbush & Liu, 2001; Satorra & Saris, 1985; Spybrook, Raudenbush, Liu, & Congdon, 2008).
Parameterizations for Assessing Treatment Effects with GMA
GMA is generally used to examine random intercepts as well as random slopes, and thus compares the groups on two latent dependent variables--the intercept growth factor and the slope growth factor--extracted from the repeated measures. The intercept growth factor is the model-estimated status of participants when time is coded as “0.” For example, in a study with four equidistant timepoints using time codes of 0, 1, 2, and 3, for T1, T2, T3, T4, respectively, the intercept growth factor would be a latent variable reflecting initial status (a latent baseline measure), and −3, −2, −1, and 0 would be the codes used to compare the groups at final status. The coding of time is referred to as centering, which defines the model parameterization (Raudenbush & Bryk, 2002).
Thus, centering influences the coefficient (bi) for the effect of group (using codes for group differing by 1 unit, e.g., control = 0, treatment = 1) on the intercept growth factor because it determines the timepoint at which the two groups are compared on the outcome variable, and bi is the model-estimated difference between the means of the groups at that timepoint. If investigators use end centering for model parameterization (see Muthén & Muthén, 2000, for an example using end centering in survey research) in a randomized GMA, bi is the model-estimated mean difference between the two groups at the end of the study--an estimate of the same parameter for the treatment effect estimated by the difference between the means (M1 – M2) from a completely randomized design (Hinkelmann & Kempthorne, 2008).
The difference between the two designs is that classical analysis (e.g., the t test for independent groups) compares the two groups in the means of observed scores at the end of the study, whereas a GMA using random intercepts with end centering compares corresponding model-estimated means on a latent variable of final status extracted from data collected at all timepoints. Both models test the same null hypothesis for the treatment effect, i.e., E(bi) = E(M1 – M2) = 0.
Moreover, in linear GMA with randomization and end centering, bi and bs also test the same null hypothesis. If the null hypothesis is true, the two groups are expected to have the same model-estimated mean at every timepoint, and thus there would be no expected difference between their means at the study’s end. However, if the null hypothesis is false, the two groups would start at the same point but the two lines capturing their outcome trajectories would not be parallel, which would result in an expected mean difference between the groups at every timepoint following baseline (including the end of the study, where it would be largest in a linear model). This equivalence can also be logically deduced. If the treatment is effective, there must be an expected difference between the treatment and control groups at the end of the study (bi), and their linear trajectories must differ (bs).
In addition, in a randomized GMA design, multiplying the effect of group on slope (bs) by study duration yields an estimate of the same parameter (the population mean difference at the end of study) as both bi and M1 – M2 (Feingold, 2009, 2013), i.e.,
E(bs*duration) = E(bi) = E(M1 – M2)    (1)
Thus, in a GMA of data from a randomized design, the effect size associated with the null hypothesis significance test can be obtained from either bi or bs, and both effect sizes have the same expected value. Hence, there is no a priori justification for determining treatment efficacy with one of these coefficients over the other. The coefficient used to determine the treatment effect should be the superior estimator of the common parameter and/or a more precise (and hence more powerful) estimator of that parameter.
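Because this equivalence is central to what follows, a small simulation may help build intuition. The Python sketch below is not a latent growth model; it merely generates noisy linear trajectories for two randomized groups that differ only in slope (all names and values are ours, for illustration) and checks that the slope difference multiplied by duration tracks the observed mean difference at the final timepoint.

```python
import random

random.seed(1)
duration, n = 3, 20000
times = [0, 1, 2, 3]  # four equidistant timepoints, initial status centering

def simulate(slope):
    # Linear trajectories with unit-variance noise; both groups start at 0
    # (i.e., perfect randomization at baseline).
    return [[slope * t + random.gauss(0, 1) for t in times] for _ in range(n)]

control, treatment = simulate(0.0), simulate(0.2)

def group_means(data):
    return [sum(row[j] for row in data) / len(data) for j in range(len(times))]

mc, mt = group_means(control), group_means(treatment)

def ols_slope(means):
    # Closed-form OLS slope of the mean trajectory over time.
    tbar = sum(times) / len(times)
    ybar = sum(means) / len(means)
    num = sum((t - tbar) * (y - ybar) for t, y in zip(times, means))
    den = sum((t - tbar) ** 2 for t in times)
    return num / den

bs = ols_slope(mt) - ols_slope(mc)  # slope difference between groups
end_diff = mt[-1] - mc[-1]          # mean difference at the end of the study

print(round(bs * duration, 2), round(end_diff, 2))  # both close to 0.60
```

With large groups, both quantities converge on the same population mean difference at the end of the study (here, .2 × 3 = .60), which is the content of Equation 1.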
Parameter Bias, Standard Errors, and Statistical Power of GMA Effects
Parameter bias is the difference between the expected value of a statistic and the parameter of which the statistic is an estimate (Muthén & Muthén, 2002), and can vary with study characteristics (e.g., sample size). Bias in the point estimate indicates that a statistic overestimates or underestimates its parameter on average, and bias in the SE of that point estimate indicates that the confidence interval (CI) for the statistic will be too wide or too narrow to correctly capture the parameter.
A good estimator should have ignorable amounts of bias in the point estimate and in its SE, and bias in that estimate and/or its SE might differ between bi and bs, thus influencing the choice of the coefficient that should be used to determine the treatment effect. However, even if both of the coefficients are equally valid estimators of the difference between groups at the end of the study, one may have a smaller SE for that mean difference, which would mean that (a) it would have a narrower CI, and (b) there would be greater statistical power to detect the treatment effect when using that coefficient.
Monte Carlo Study of Parameter Bias, Standard Errors, and Statistical Power
The status quo in program evaluation--determining the treatment effect with bs--lacks an empirical foundation, as there is no consensus that bs (a) exhibits less parameter bias than bi in its point estimate or SE, (b) has a narrower CI for the model-estimated mean difference associated with it, or (c) is easier to detect. Monte Carlo analysis is the most popular contemporary approach for examining both parameter bias and statistical power (e.g., Arend & Schäfer, 2019; Cheung, 2009; Hedges, Pustejovsky, & Shadish, 2012; Miočević, O’Rourke, MacKinnon, & Brown, 2018; MacKinnon, Lockwood, & Williams, 2004). However, when the conventional use of the slope difference to test the treatment effect was adopted a generation ago, Monte Carlo (simulation) studies were relatively rare, particularly for findings from then-emerging statistical frameworks like GMA. In addition, the SE of bs and the SE of bi cannot be meaningfully compared directly because the two statistics are in different metrics. Both effects must be expressed in a common metric (i.e., the model-estimated mean difference) for appropriate Monte Carlo-based comparisons of their SEs.
Thus, this study uses Monte Carlo analysis to examine bias and statistical power for bs and bi, and to determine whether the model-estimated mean differences derived from these two coefficients estimate a common parameter for the treatment effect (given randomization and end centering parameterization). A second objective was to compare the magnitude of the SEs for the estimate of the mean difference obtained with the two coefficients, which would explain differences in the power to detect bs and bi in testing the same treatment effect.
Effect Sizes for GMA
The terms in Equation 1 are both unstandardized effect sizes (effect sizes expressed in the metric of the outcome variable)--model-estimated mean differences at the end of the study. Validation of the equation--and thus the hypothesis that both terms estimate the same parameter (mean difference between groups at final status)--necessitates a comparison of effect sizes derived from bi and bs because these two coefficients are in different metrics.
Equations for GMA Effect Sizes.
In a classical analysis comparing two independent groups, the unstandardized effect size (D) is the raw score mean difference between the groups, which can be converted to a standardized effect size (d) by dividing it by the pooled within-group SD (Cohen, 1988; Borenstein, Hedges, Higgins, & Rothstein, 2009). However, in a randomized GMA, the model-estimated mean difference at the end of the study can be derived from bs (irrespective of parameterization) or obtained as bi (given end centering). Thus, in analysis of the same longitudinal data from a randomized study with either classical analysis or a GMA with end centering,
E(M1 – M2) = E(bi)    (2)
and
E(M1 – M2) = E(bs*duration)    (3)
and
E(bi) = E(bs*duration)    (4)
Thus, bs*duration, bi, bs*duration/SD, and bi/SD are the unstandardized (GMA D) and standardized (GMA d) effect sizes for the group difference at the end of the study, and estimate the same parameters as D and d from a classical analysis (provided the assumptions of both models are met and the SD estimates the same dispersion parameter). Accordingly,
GMA D = bs*duration    (5)
and
GMA d = bs*duration/SD    (6)
and
GMA D = bi    (7)
and
GMA d = bi/SD    (8)
Equation 6 was introduced in Feingold (2009), and has since been used as the GMA effect size for group differences in trajectories in hundreds of studies, particularly RCTs (e.g., Chorpita et al., 2017; Felder et al., 2017; Goodnight et al., 2017; Parra-Cardona et al., 2017; Stice, Rohde, Shaw, & Gau, 2017). In a randomized GMA using end centering, Equation 8 produces a GMA d from bi that has the same expected value as the GMA d from bs (Equation 6). (However, there will typically be small differences in the observed values of GMA d obtained with Equations 6 and 8 because randomization is never perfect, and the latter equation does not adjust the model-estimated mean difference at the end of the study for baseline differences attributable to imperfect randomization.)
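As a concrete illustration of Equations 5–8, the Python sketch below packages the four effect sizes in one function. The function name and inputs are ours, and the numeric values echo the specifications used in the simulations described later (bs = .20, duration = 3, SD ≈ .866, which yields the medium population GMA d of ≈ .6928).

```python
def gma_effect_sizes(bs, bi, duration, sd):
    """Return the four GMA effect sizes (cf. Equations 5-8);
    the intercept-based forms assume end centering and randomization."""
    D_slope = bs * duration   # Equation 5: GMA D from slopes
    d_slope = D_slope / sd    # Equation 6: GMA d from slopes
    D_int = bi                # Equation 7: GMA D from intercepts
    d_int = bi / sd           # Equation 8: GMA d from intercepts
    return D_slope, d_slope, D_int, d_int

# Under Equation 1, bi has the same expected value as bs*duration (.20 * 3 = .60),
# so the two D estimates agree, as do the two d estimates:
print(gma_effect_sizes(0.20, 0.60, 3, 0.8660))
```
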
Application of equations for GMA D and GMA d.
Either of two methods can be used to obtain effect sizes derived from bs and bi. The first is a two-step post hoc equations approach, which is the status quo for obtaining the GMA d derived from bs (Feingold, 2009, 2013). At step 1, a GMA is conducted and the terms needed to calculate GMA D and/or GMA d with Equations 5–8 are extracted from the output file. At step 2, these terms are entered into those equations to produce the GMA effect sizes.
A recently discussed alternative approach requires statistical software that has user-prescribed parameter functions, such as lavaan in R (Rosseel, 2012), LISREL (Jöreskog & Sörbom, 2006), PROC CALIS in SAS (SAS Institute Inc., 2011), and Mplus (Muthén & Muthén, 2017). The equations for the calculations of GMA D and GMA d are specified in the input so that the program calculates the effect sizes and reports them as “new parameters” in the output file (Feingold, 2019a, 2019b). The current study uses both approaches for effect size estimation, with Mplus used to produce the effect sizes with the user-prescribed parameter functions method.
Standard Errors of GMA Effect Sizes
Equations for SEs of GMA D and GMA d.
Equations are needed to compare the SEs of the two model-estimated mean differences: bs*duration and bi. Feingold (2015) formulated an equation for the variance (v) of the GMA d from bs,
v = SEbs^2*duration^2/SD^2    (9)
where SEbs^2 is the square of the SE of bs from the GMA. The SE of GMA d is thus the square root of this v.
The respective equation for the calculation of the v of GMA D is
v = SEbs^2*duration^2    (10)
and the SE of GMA D is the square root of its v.
An equation for the SE of the unstandardized GMA effect size (GMA D) from a design with randomization and end centering is not needed because GMA D = bi, and all GMA software produces the SE of bi by default. However, an equation is needed to obtain the SE of the GMA d derived from bi. The terms in Equation 8 can be rearranged to define the intercept-derived GMA d as the product of the regression coefficient for the group difference in intercepts (bi) and a standardizing multiplier (1/SD). Statistical theory dictates that the multiplication of a random variable (e.g., the sampling distribution of bi) by a constant produces a new random variable that is a linear transformation of the original variable and has a variance that is the product of the variance of the original variable multiplied by the square of the constant (Hodges & Lehmann, 2005). Because the variance of bi is the square of its SE (SEbi^2), the v of the GMA d derived from bi is
v = SEbi^2/SD^2    (11)
and the SE of GMA d from bi can be calculated from that v.
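A minimal Python sketch of Equations 9–11, assuming the variance expressions described in the text (function names are ours; the plugged-in SE and SD values are rough approximations of the illustrative analysis reported later):

```python
import math

def se_d_from_slope(se_bs, duration, sd):
    # Equation 9: v = SEbs^2 * duration^2 / SD^2; SE is sqrt(v)
    return math.sqrt(se_bs ** 2 * duration ** 2 / sd ** 2)

def se_D_from_slope(se_bs, duration):
    # Equation 10: v = SEbs^2 * duration^2
    return math.sqrt(se_bs ** 2 * duration ** 2)

def se_d_from_intercept(se_bi, sd):
    # Equation 11: v = SEbi^2 / SD^2
    return math.sqrt(se_bi ** 2 / sd ** 2)

# With SE(bs) = .066 and SD = 2.97 (hypothetical values):
print(se_D_from_slope(0.066, 3))        # ≈ .198
print(se_d_from_slope(0.066, 3, 2.97))  # ≈ .067
```

Note that each standardized SE is simply the corresponding unstandardized SE divided by SD, which is the linear-transformation argument given above.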
Application of equations for SEs of GMA D and GMA d.
The post hoc equations approach for calculating GMA effect sizes can also be used to obtain their SEs (with Equations 9–11). However, if the GMA effect sizes are calculated by specifying equations in program input to generate new parameters, the software will automatically produce the SEs of GMA D and GMA d using the same method it used to calculate the SEs of bi and bs. Thus, no equations for SEs of the effect sizes need to be included in the input with the user-prescribed parameters approach.
GMA with a Non-Equivalent Groups Design
Equations 1, 2, 4, 7, and 8 are valid only when end centering is used in the GMA and the means of the two groups thus have the same expected value at baseline (e.g., because of randomization). Equations 5 and 6 derive effect sizes exclusively from the difference between the groups in their slopes, which makes baseline differences irrelevant. (The effect of group on the intercept, bi, is not included in these equations.) Therefore, the GMA effect sizes derived from slope differences (bs) are effectively adjusted for baseline differences, much like the mean differences in analysis of covariance (ANCOVA) in classical analysis. Thus, the GMA effect sizes obtained with Equations 5 and 6 express the difference between the groups at final status that is due entirely to differences in growth rate over the course of the study. By contrast, effect sizes based on intercept differences (calculated with Equations 7 and 8) are the raw model-estimated mean differences at the end of the study that confound differences between the groups in initial status with differences in growth rate when the groups are not equivalent at onset. (This is also true for the completely randomized design in classical analysis.)
The confounding of growth with initial status would suggest that bi can only be used to define the treatment effect in a GMA with end centering when the two groups are expected to be equal at baseline (as with the completely randomized design), but this limitation can be surmounted so the intercept-based method can also be used with non-equivalent groups GMA designs. Consider a GMA study of sex differences (e.g., Huttenlocher, Haight, Bryk, & Seltzer, 1991; Leahey & Guo, 2001), where the men and women start out the same on the modeled variable and both groups improve over time, but the linear slope (rate of growth) is steeper for the men. Thus, the lines for the trajectories from the two groups are not parallel, and there is a gender difference in model-estimated means at the end of the study that is attributable solely to the trajectory difference. Now entertain a second study where the slopes for men and women are the same as in the first study but the men now have a larger mean than women at T1. In this case, the conventional hypothesis tests and effect sizes obtained with the slope difference would be the same as in the first study because bs and the residual variances are identical. However, the line for the trajectory for men will now start higher on the y axis than in the first study, and the baseline difference would add to the difference between the sexes at each timepoint relative to the respective difference in the first study. Thus, the mean difference at the end of the second study would equal the model-estimated mean difference in the first study at the final timepoint (bs*duration) plus the model-estimated baseline difference (isbi) obtained when the second study uses initial status centering.
Therefore, baseline-corrected GMA effect sizes from intercept differences in a study with non-equivalent groups can be obtained with a multi-step approach that uses two parameterizations. At step 1, a GMA is conducted using initial status centering to obtain the model-estimated difference between the groups at study onset (isbi). At step 2, isbi is subtracted from all observations from the higher scoring group. (If the same GMA were run using the modified dataset, the only difference would be that bi would now be 0. And because the initial status difference would be 0, the modified dataset would have the same structure as if it were obtained from a design with perfect randomization.) At step 3, the GMA is conducted with the modified dataset using end centering parameterization to obtain bi, which is effectively corrected for the baseline difference in initial status.
However, instead of modifying the dataset as described above, an equation can be used to obtain the same significance tests and effect sizes for the group differences based on growth. At step 1, a GMA is conducted using initial status centering to obtain isbi. At step 2, another GMA of the data is conducted with end centering, and the isbi obtained at step 1 is used as a correction factor in Equations 7 and 8 to obtain the intercept-derived effect sizes (corrected GMA D and GMA d),
Corrected GMA D = bi – isbi    (12)
and
Corrected GMA d = (bi – isbi)/SD    (13)
where (1) bi is the coefficient for the model-derived end of study difference using a GMA with end centering, (2) isbi is the coefficient for the initial status difference obtained using GMA with initial status centering, and (3) SD is the pooled within groups statistic that is traditionally used to calculate the GMA effect sizes from bs (using Equations 5 and 6).
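The two-step correction can be sketched in a few lines of Python (the function name is ours; the inputs are consistent with the rounded values later reported in Table 1, with SD ≈ 2.97 implied by d = .513):

```python
def corrected_intercept_effect_sizes(bi, isbi, sd):
    """Baseline-corrected GMA D and GMA d from intercept differences
    (Equations 12 and 13)."""
    D = bi - isbi       # Equation 12: subtract the initial status difference
    return D, D / sd    # Equation 13: standardize by the pooled within-group SD

# isbi = .955 and bi - isbi = 1.522 imply bi ≈ 2.477 (values rounded):
print(corrected_intercept_effect_sizes(2.477, 0.955, 2.967))
```

With perfect randomization isbi has an expected value of 0, and Equations 12 and 13 reduce to Equations 7 and 8.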
A Demonstration of the Equivalence of Effect Sizes Obtained from Group Differences in Slopes and Intercepts in GMA with End Centering
An illustrative GMA was conducted to demonstrate that (1) baseline-corrected model-estimated mean differences (GMA D and GMA d) at end of study determined from the group differences in the slope (bs) and the intercept (bi) are identical when obtained from non-equivalent groups, (2) corresponding SEs for these differences are not the same, (3) the post hoc equations and user-prescribed parameter functions methods for SE estimation produce the same effect sizes and SEs, and (4) the two correction methods (data modification and bi correction) produce the same effect sizes and SEs.
Datasets and models.
Example 6.10 in the Mplus user’s guide (Muthén & Muthén, 2017) is a textbook example of a linear GMA--with 4 equidistant time points (coded 0, 1, 2, and 3 for T1, T2, T3, and T4, respectively), 2 continuous time-invariant covariates, a single time-varying covariate, and N = 500--that was provided to illustrate analysis of an accompanying dataset included with the Mplus software (ex6.10.dat). (Given 4 equidistant timepoints differing by 1 unit between them, duration = 3.)
A modification of this example was used for the current illustrative analysis, for which one of the two continuous time-invariant covariates in the dataset had to be dichotomized to create a binary covariate for group, and the other covariates were excluded to simplify the model. Because the two groups formed by the dichotomization of the continuous covariate had appreciably different means at baseline on the outcome, the example used artificial data that would be obtained from a non-equivalent groups design, which thus required correction for the group difference in initial status (isbi). Thus, GMA had to be conducted with two parameterizations, one with initial status centering (0, 1, 2, and 3 for T1, T2, T3, and T4, respectively) and one with end centering (−3, −2, −1, and 0 for T1, T2, T3, and T4, respectively). The isbi extracted from the GMA with initial status centering was used as the correction factor to illustrate the application of each of the two correction approaches. In one GMA, isbi was used to modify the dataset by subtracting it from every observation in the higher scoring group, and the effect sizes and SEs from analysis of the altered dataset were calculated with the same equation that would be used with data from a randomized design (see lower half of Appendix A for the input statement). In the second GMA, the analysis was of the raw data that had not been previously corrected for initial status differences, and an equation specified that the effect sizes should be calculated from bi - isbi instead of bi (see top half of Appendix A for the input statement). The SD used in the calculations of GMA ds was derived from the GMA model by taking the square root of the sum of the variance of the intercept growth factor and the average of the 4 y residual variances (see Feingold, 2015), which estimates the same dispersion parameter as the pooled within-group SD of the observed variable at the end of the study that is used to calculate classical d.
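The SD derivation just described can be expressed compactly in Python (the function name is ours, and the plugged-in variances are hypothetical):

```python
import math

def model_implied_sd(intercept_variance, residual_variances):
    """SD used to standardize GMA effect sizes: the square root of the
    intercept growth factor variance plus the mean y residual variance
    (see Feingold, 2015)."""
    mean_resid = sum(residual_variances) / len(residual_variances)
    return math.sqrt(intercept_variance + mean_resid)

# Hypothetical intercept variance of .25 and residual variances of .5
# at all four timepoints give SD = sqrt(.75):
print(model_implied_sd(0.25, [0.5, 0.5, 0.5, 0.5]))  # ≈ .866
```
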
The input statement for the GMA with end centering produced both unstandardized and standardized effect sizes (GMA Ds and GMA ds)--and their SEs with the default delta method (Benichou & Gai, 1989; Kendall & Stuart, 1977)--for the group difference at the end of the study. However, the post hoc equations approach was also used, in which the standard output statistics from the GMA were extracted and entered in the effect size equations in a second step.
Table 1 reports the GMA effect sizes and SEs from the illustrative analysis with correction factor specified in the effect size equations. The first two columns report the results produced by Mplus, and the second two columns report the results from post hoc equations using standard statistics from the GMA. (As expected, findings from the GMA in which the baseline corrections were made to the data prior to analysis and no baseline correction was specified in the input were the same. Thus, the duplicative findings from that analysis were not included in the table.)
Table 1.
Effect Sizes from Slope and Intercept Differences from Illustrative Analysis
| Estimator | Equation | Input Statements: Effect Size | Input Statements: SE | Post Hoc Equations: Effect Size | Post Hoc Equations: SE |
|---|---|---|---|---|---|
| GMA D from slopes (bs) | bs*duration | 1.523 | .199 | 1.524 | .198 |
| GMA D from intercepts (bi) | bi - isbi | 1.522 | .263 | 1.522 | NA |
| GMA d from slopes (bs) | bs*duration/SD | .513 | .069 | .513 | .065 |
| GMA d from intercepts (bi) | (bi - isbi)/SD | .513 | .090 | .512 | .089 |
Note. In the equations used in the illustrative study, duration = 3 and isbi = .955. SE = Standard Error, bi = effect of group on intercepts from end centered GMA, isbi = bi from GMA with initial status centering, bs = effect of group on slopes from GMA irrespective of parameterization, NA = not applicable (SE for GMA D from intercepts obtained directly from GMA as SE of bi.)
The four different methods (slope difference with Mplus input, slope difference with post hoc equations, intercept difference with Mplus input, and intercept difference with post hoc equations) yielded essentially identical effect sizes, thus supporting the validity of the frameworks for effect size estimation from GMA (and of the formulated equations for their calculations). As predicted, the SEs of the effect sizes derived from bi and bs were not the same, but the finding that the SE of the mean difference estimated from bi was greater than that derived from bs appeared to support the conventional use of slope differences to compare groups.
However, because the dataset for this example was Monte Carlo-generated, the illustrative analysis can be thought of as a Monte Carlo study with a single replication and a relatively large sample size (N = 500), limiting the generalizability of the observed differences in the SEs of the effect sizes obtained with the two types of differences. Thus, a Monte Carlo study was conducted to compare differences in statistical power to detect the treatment effect between bs and bi with end centering, and to determine whether power differentials varied as a function of effect magnitude and sample size. Moreover, unlike with the illustrative analysis of data from a non-equivalent groups design, the Monte Carlo study was conducted with models from the more commonly used randomized design.
Method
Previous Monte Carlo Studies of GMA Effects
Evaluation of bs.
To demonstrate the use of Monte Carlo analysis with GMA, Muthén and Muthén (2002) conducted a Monte Carlo study with four linear models formed by crossing two levels of the unstandardized coefficient for the group differences in slopes (bs = .10 and .20) with two sample sizes (Ns = 150 and 250). All other parameter specifications (e.g., residual variances, 10,000 replications) were the same across models.
Feingold (2015) first conducted a Monte Carlo study of bs with only slight modifications of the GMA models used in Muthén and Muthén (2002). Specifically, the parameterization was changed by replacing time codes of 0, 1, 2, and 3 (initial status centering) with respective codes of −3, −2, −1, and 0 (end centering). In addition, the effects of group on the intercepts (bi values) were changed to .30 (when bs = .10) and .60 (when bs = .20), which were the products of the specifications for bs (.10 or .20) and duration (= 3). Therefore, the modified models simulated findings from a randomized GMA (i.e., where the expected difference between the groups on y1/T1 was zero). No other model specifications were changed, although 20 analyses were conducted to examine bias in bs and its SE because three additional sample sizes (50, 100, and 500) and an additional SE estimation method (the bootstrap) were used. (See Appendix 1 in Muthén and Muthén, 2002, for the full original model specifications, using bs = .20 and N = 150, and Appendix A in Feingold, 2015, for the modified model specifications, using bs = .10 and N = 250.)
Evaluation of GMA d with post hoc equations.
The primary objective of Feingold (2015) was to examine bias in the GMA d derived from bs, using post hoc equations to calculate GMA d and its SE, with SD calculated from each model as the square root of the sum of the variance of the intercept growth factor and the average of the four y residual variances. Mplus had generated the standard parameters for each replication from the simulations used to examine bias in bs, and a text file containing each replication’s parameters was imported into SPSS. Equations were then applied to the requisite parameter estimates for slope differences using SPSS to calculate GMA d, SE, and coverage by reproducing the methods used by Mplus to obtain the respective results for bs in the same models (e.g., averaging GMA ds over the 10,000 replications). Given the specifications for bs and the residual variances in these models, the population values for GMA d were .3464 for the small effect and .6928 for the medium effect, and these effect size parameter specifications were used to determine bias in the estimates of GMA d (based on the averages of the ds across replications in each analysis) with SPSS (see Appendix B in Feingold, 2015, for the SPSS syntax used to conduct the Monte Carlo study of GMA d obtained with post hoc equations from bs and its SE).
Evaluation of GMA d with user-prescribed parameter functions.
With the user-prescribed parameter functions approach, bias in bs and the GMA d derived from bs can both be obtained directly by software in a single stage, with the SEs and coverage for bs and GMA d calculated with identical methods. Accordingly, Feingold (2019a) conducted a Monte Carlo evaluation of bias in SEs and CIs for GMA d by specifying the equation for GMA d in an Mplus input statement and obtained bias in GMA d and its SE (with the delta method and the bootstrap) with the same 10 models used in the Feingold (2015) Monte Carlo study.
Assessment of parameter bias.
The validity of the estimates of bs, GMA d, and their SEs obtained in the Monte Carlo studies with Mplus and SPSS (using either post hoc equations or the parameter creation method) was examined with conventional practices for interpreting Monte Carlo results, which posit that ignorable parameter bias should be less than 10% for point estimates and less than 5% for SEs (Muthén & Muthén, 2002). Percent bias was calculated by subtracting the specified parameter from the average estimate (across replications), dividing the result by the parameter, and then multiplying by 100. Coverage was the proportion of replications in which the 95% CI contained the parameter, and coverage was used to assess bias in the CI. (Perfect coverage is .950 for the 95% CI, and thus CI bias is the difference between coverage and .950.)
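The bias and coverage computations just described can be sketched as follows (function names ours; the toy inputs are illustrative, not results from the study):

```python
def percent_bias(estimates, parameter):
    """Percent bias: 100 * (mean estimate - parameter) / parameter."""
    mean_est = sum(estimates) / len(estimates)
    return 100 * (mean_est - parameter) / parameter

def coverage(intervals, parameter):
    """Proportion of replications whose CI contains the parameter."""
    return sum(lo <= parameter <= hi for lo, hi in intervals) / len(intervals)

# Three toy "replications" with parameter .30; mean estimate is .31:
print(percent_bias([0.31, 0.29, 0.33], 0.30))      # ≈ 3.33 (percent)
# Two toy CIs; only the first contains .30:
print(coverage([(0.1, 0.5), (0.35, 0.6)], 0.30))   # 0.5
```

Under the conventions cited above, the toy percent bias (≈ 3.33%) would be ignorable for a point estimate (< 10%), and CI bias would be the difference between the observed coverage and .950.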
Current Monte Carlo Study
The present study used the same 10 models from the previous two Monte Carlo studies of GMA with end centering but they were now used to compare (1) bias in the estimates and SEs for bi with respective bias in bs, (2) bias in the standardized effect size (GMA d) determined with bi and bs, (3) the magnitude of SEs of these effect sizes from bi and bs, and (4) power to detect bi and bs. This study used the same two approaches from the previous work (i.e., parameter creation with Mplus and post hoc equations).
Mplus input for parameter creation method.
Because bi is a default parameter, no new parameter needed to be created to evaluate the unstandardized effect size (GMA D = bi). However, the GMA d derived from bi (= bi/SD) was produced directly in Mplus by labeling coefficient parameters and specifying the equations to create new parameters in the input statement (see Appendix B for an input statement for one of the Monte Carlo simulations). The analysis for each condition was rerun with a bootstrap specification (with 500 draws) to obtain bootstrap SEs for GMA d (and coverage for GMA d using bootstrap CIs), as the bootstrap has been found to be particularly useful for CIs for effect sizes from multilevel analysis (Feingold, 2019a, 2019b; Lai, 2020).
Post hoc equations.
This Monte Carlo study used SPSS to examine bias in the GMA ds derived from bi with post hoc equations, drawing on the same SPSS data files of parameter outputs from the Monte Carlo study of GMA d derived from bs and applying the same methods previously used to examine the slope-derived GMA d (Feingold, 2015).
Power analysis.
The statistical power for each effect in a Monte Carlo analysis is the proportion of replications in which that effect is statistically significant; significance in each replication was determined by whether the absolute value of the coefficient divided by its SE exceeded 1.96, with the SE calculated by the default delta method. Power was examined for both bi and bs.
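This power computation can be sketched in a few lines (the function name is ours; the three toy replications are illustrative):

```python
def monte_carlo_power(coefficients, standard_errors, z_crit=1.96):
    """Power estimate: proportion of replications in which |b / SE|
    exceeds the critical value (1.96 for a two-tailed .05 test)."""
    significant = [abs(b / se) > z_crit
                   for b, se in zip(coefficients, standard_errors)]
    return sum(significant) / len(significant)

# Toy replications with z = 3.00, 0.83, and 3.00: two of three significant.
print(monte_carlo_power([0.30, 0.10, 0.45], [0.10, 0.12, 0.15]))
```

In the actual study this proportion was computed over 10,000 replications per condition, separately for bi and bs.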
Results
Bias in Estimates of bi and bs*3 (GMA D)
Table 2 reports the bias in GMA D, bs*3 (= bs*duration = S) and bi (= I)--the model-estimated mean differences at the end of the study from the group difference in the slope and intercept growth factors, respectively. Each row reports the parameter, the average of the 10,000 estimates of the parameter (mean GMA D), and the percent bias in the estimate for small (.30) and medium (.60) unstandardized coefficients--at the sample size condition specified in column 1. When rounded to the customary two decimal places, the average estimates were always the same for the slope- and intercept-based mean differences, which were also always identical to the specified parameters. In other words, after typical rounding, bias in GMA D was zero across all conditions used in the simulations.
Table 2.
Monte Carlo Analysis of Bias in the Estimates of the End of Study Mean Difference (GMA D)
Parameter = .30 | Parameter = .60 | Both Parameters | ||||||||
---|---|---|---|---|---|---|---|---|---|---|
N | Average GMA D | Percent Bias | Average GMA D | Percent Bias | SD of GMA D | |||||
S | I | S | I | S | I | S | I | S | I | |
50 | .2988 | .3039 | .40 | 1.30 | .5988 | .6039 | .20 | .65 | .3774 | .2244 |
100 | .3018 | .3026 | .60 | .87 | .6018 | .6026 | .30 | .43 | .2634 | .1567 |
150 | .3006 | .3020 | .20 | .67 | .6006 | .6020 | .10 | .33 | .2160 | .1275 |
250 | .2988 | .3017 | .40 | .57 | .5988 | .6017 | .20 | .28 | .1662 | .0983 |
500 | .2994 | .3011 | .20 | .37 | .5994 | .6011 | .10 | .18 | .1182 | .0695 |
Mdn | .2988 | .3020 | .40 | .67 | .5994 | .6020 | .20 | .33 | .2160 | .1275 |
Note. Average = mean of GMA D estimates from 10,000 replications, SD of GMA D = standard deviation of GMA Ds across 10,000 replications. S = estimated mean difference and bias from slope difference (= bs*duration = bs*3), I = estimated mean difference and bias from intercept difference (= bi).
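The percent bias entries in Table 2 follow the standard Monte Carlo definition, 100 × (mean estimate − parameter)/parameter; for example, the N = 50 intercept-based average of .3039 against the .30 parameter yields 1.30%. A one-line Python check:

```python
def percent_bias(mean_estimate, parameter):
    """Relative bias of a Monte Carlo estimate, in percent."""
    return 100 * (mean_estimate - parameter) / parameter

# Reproducing the N = 50 intercept-based (I) entry in Table 2:
bias = percent_bias(0.3039, 0.30)  # ≈ 1.30
```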
Bias in SEs and CIs of bi and bs*3 (GMA D)
Table 3 reports the findings from the Monte Carlo analysis of the SEs of the estimates of the GMA Ds. Each row reports the results from a simulation using the sample size specified in Column 1. (Results are not reported separately by effect size because the findings were invariant across effect magnitude.) The bias in SE was always below 3.80%, and coverage was always .94–.95. Bias in SEs of estimates from slope differences was not notably different from bias in SEs of estimates from intercept differences, with median biases of 1.37% and 1.33%, respectively, when SEs were determined with the delta method, and .64% and .63%, respectively, when SEs were calculated with the bootstrap. There was, however, evidence of small-sample bias when SEs were calculated with the delta method (but not with the bootstrap), in that the percent bias was at least twice as great at N = 50 as at N = 100–500; the magnitude of this small-sample bias did not depend on whether the estimates were based on group differences in slopes or intercepts.
Table 3.
Monte Carlo Analysis of Bias in the Standard Errors for the Unstandardized Effect Size (GMA D)
N | Coverage | Percent Bias | Confidence Interval Bias | |||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|
Delta | BTSP | Delta | BTSP | Delta | BTSP | |||||||
S | I | S | I | S | I | S | I | S | I | S | I | |
50 | .937 | .937 | .942 | .940 | 3.34 | 3.74 | .64 | 1.34 | .013 | .013 | .008 | .010 |
100 | .943 | .943 | .944 | .944 | 1.37 | 1.91 | .23 | .77 | .007 | .007 | .006 | .006 |
150 | .942 | .946 | .943 | .947 | 1.67 | 1.33 | .97 | .63 | .008 | .004 | .007 | .003 |
250 | .947 | .947 | .949 | .948 | .72 | .61 | .36 | .31 | .003 | .003 | .001 | .002 |
500 | .945 | .948 | .946 | .947 | 1.02 | .43 | .76 | .29 | .005 | .002 | .004 | .003 |
Mdn | .943 | .946 | .944 | .947 | 1.37 | 1.33 | .64 | .63 | .007 | .004 | .006 | .003 |
Note. Coverage = 95% coverage, Confidence Interval Bias = difference between coverage and .950, delta = delta method, BTSP = bootstrap, S = GMA D from slope difference, I = GMA D from intercept difference.
Bias in Estimates of GMA d
Table 4 reports the findings for GMA d derived from both bs and bi that parallel the results for estimates of GMA D reported in Table 2. For the smaller effect, after rounding to two decimal places, the average GMA d was exactly the same as its parameter in 9 of the 10 simulations (the exception being an absolute bias--the difference between the estimate average and the parameter--of .01 observed at N = 50 for the intercept-derived GMA d). For the larger effect, after rounding, the absolute biases in GMA d were .00–.03. Differences in bias between respective intercept- and slope-derived GMA ds were trivial (.00–.01).
Table 4.
Monte Carlo Analysis of Bias in the Point Estimates of the Standardized Effect Size (GMA d)
Parameter = .3464 | Parameter = .6928 | |||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|
N | Average GMA d | SD of GMA d | Percent Bias | Average GMA d | SD of GMA d | Percent Bias | ||||||
S | I | S | I | S | I | S | I | S | I | S | I | |
50 | .3545 | .3611 | .4506 | .2700 | 2.34 | 4.24 | .7108 | .7173 | .4539 | .2764 | 2.60 | 3.54 |
100 | .3533 | .3544 | .3095 | .1852 | 1.99 | 2.31 | .7048 | .7058 | .3117 | .1890 | 1.73 | 1.88 |
150 | .3504 | .3521 | .2532 | .1500 | 1.15 | 1.65 | .7003 | .7020 | .2551 | .1533 | 1.08 | 1.33 |
250 | .3468 | .3503 | .1938 | .1150 | .12 | 1.13 | .6951 | .6987 | .1952 | .1175 | .33 | .85 |
500 | .3466 | .3484 | .1373 | .0809 | .06 | .58 | .6939 | .6956 | .1383 | .0825 | .16 | .40 |
Mdn | .3504 | .3521 | .2532 | .1500 | 1.15 | 1.65 | .7003 | .7020 | .2551 | .1533 | 1.08 | 1.33 |
Note. Average GMA d = mean of GMA ds from 10,000 replications. SD of GMA d = standard deviation of GMA ds across 10,000 replications. S = GMA d from slope difference (bs), I = GMA d from intercept difference (bi).
As would be expected from the minimal absolute biases, the median relative bias in the smaller GMA d was only 1.15% when based on slopes and 1.65% when based on intercepts. The median bias in the larger GMA d was 1.08% when based on slopes and 1.33% when based on intercepts. Bias in the estimates of GMA d never exceeded 4.30% and was thus ignorable.
Bias in SEs and CIs of GMA d
Table 5 reports the biases in the SEs and CIs for the GMA ds, which were consistently greater than the corresponding biases in the SEs and CIs of GMA D. However, the observed bias in these two statistics was moderated primarily by sample size and estimation method. Bias was practically nonexistent when the SEs and CIs were obtained with the bootstrap, as coverage for GMA d was nearly perfect, irrespective of sample size and the coefficient used to calculate GMA d. With the post hoc equations approach and the delta method, by contrast, bias was notable at N = 50 but decreased progressively as sample size increased, particularly with the post hoc equations. When sample size was 150 or larger, the percent bias never exceeded 4.6%.
Table 5.
Monte Carlo Analysis of Bias in the Standard Errors of the Standardized Effect Size (GMA d) from Slopes and Intercepts
N | Parameter | Coverage | Percent Bias | Confidence Interval Bias | |||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Eq | Delta | BTSP | Eq | Delta | BTSP | Eq | Delta | BTSP | |||||||||||
S | I | S | I | S | I | S | I | S | I | S | I | S | I | S | I | S | I | ||
50 | .3464 | .936 | .933 | .942 | .939 | .954 | .951 | 4.02 | 5.81 | 3.35 | 4.63 | 2.66 | 1.15 | .014 | .017 | .008 | .011 | −.004 | .001 |
50 | .6928 | .935 | .925 | .941 | .938 | .954 | .949 | 4.71 | 8.00 | 3.26 | 4.67 | 2.75 | 1.12 | .015 | .025 | .009 | .012 | −.004 | .001 |
100 | .3464 | .942 | .941 | .945 | .945 | .952 | .950 | 1.78 | 3.13 | 1.29 | 2.21 | 1.52 | .59 | .008 | .009 | .005 | .005 | −.002 | .000 |
100 | .6928 | .943 | .936 | .947 | .947 | .952 | .950 | 2.47 | 5.08 | 1.22 | 1.96 | 1.60 | .79 | .007 | .014 | .003 | .003 | −.002 | .000 |
150 | .3464 | .940 | .944 | .942 | .947 | .946 | .951 | 2.17 | 2.47 | 1.78 | 1.53 | .00 | .27 | .010 | .006 | .008 | .003 | .004 | .001 |
150 | .6928 | .939 | .939 | .942 | .947 | .946 | .949 | 2.90 | 4.57 | 1.76 | 1.50 | .04 | .26 | .011 | .011 | .008 | .003 | .004 | .001 |
250 | .3464 | .947 | .946 | .948 | .949 | .950 | .951 | 1.19 | 1.57 | .83 | .70 | .15 | .26 | .003 | .004 | .002 | .001 | .000 | .001 |
250 | .6928 | .945 | .941 | .948 | .948 | .949 | .952 | 1.90 | 3.66 | .77 | .68 | .20 | .34 | .005 | .009 | .002 | .002 | .001 | .002 |
500 | .3464 | .944 | .948 | .945 | .951 | .948 | .950 | 1.46 | 1.11 | 1.17 | .25 | .66 | .12 | .006 | .002 | .005 | .001 | .002 | .000 |
500 | .6928 | .944 | .943 | .946 | .951 | .947 | .950 | 2.17 | 3.03 | 1.16 | .00 | .58 | .36 | .006 | .007 | .004 | .001 | .003 | .000 |
Mdn | | .942 | .941 | .942 | .947 | .950 | .950 | 2.17 | 3.40 | 1.26 | 1.52 | .62 | .35 | .008 | .009 | .005 | .003 | .000 | .001 |
Note. Coverage = 95% coverage, Confidence Interval Bias = difference between coverage and .950, delta = delta method, BTSP = bootstrap, Eq = post hoc equations, S = GMA d from slope difference, I = GMA d from intercept difference.
Statistical Power and Standard Errors for Model-Estimated Mean Differences
The first two columns in Table 6 report the power estimates for bi and bs from the 10 Monte Carlo simulations using the default delta method for estimation, which is the proportion of significant effects in the 10,000 replications for each condition defined by sample size and effect size. The power to detect bi was much greater than the power to detect bs. Using the conventional value of .80 as the threshold for acceptable power, bi was adequately powered for the larger effect with a sample size of about 50 but a sample size of 150 was necessary when the treatment effect was defined as bs. For the smaller effect, a sample size between 150 and 250 was required for adequate power when examining bi, but even the largest sample size of 500 was insufficient to detect bs.
Table 6.
Power to Detect bs and bi and Average Standard Errors of GMA D from bs and bi
Power | Average SE of GMA D | ||||
---|---|---|---|---|---|
N | Pop GMA D | bs | bi | bs*3 | bi |
50 | .30 | .146 | .305 | .365 | .216 |
50 | .60 | .384 | .793 | .365 | .216 |
100 | .30 | .219 | .508 | .260 | .154 |
100 | .60 | .637 | .972 | .260 | .154 |
150 | .30 | .297 | .663 | .212 | .126 |
150 | .60 | .802 | .998 | .212 | .126 |
250 | .30 | .449 | .869 | .165 | .098 |
250 | .60 | .947 | 1.000 | .165 | .098 |
500 | .30 | .726 | .990 | .117 | .069 |
500 | .60 | .999 | 1.000 | .117 | .069 |
Mdn | | .543 | .920 | .212 | .126 |
Note. Power is the proportion of replications in which bs and bi were statistically significant. Pop GMA D = population GMA D. Average SE of GMA D is the Monte Carlo-calculated mean of the standard errors of the model-estimated mean differences at the end of the study--bs*duration (= bs*3) when derived from bs (the slope difference), or bi (the intercept difference).
The third and fourth columns in Table 6 report the average SEs (the means of the SEs across replications) of the model-estimated mean differences, as determined from either the slope difference or the intercept differences. The SEs were consistently smaller when the mean differences were estimated from intercept differences than from slope differences. These differences in SEs are the most plausible explanation for the power differences apparent in the first two columns of Table 6.
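The link between the SEs and the power values can be checked directly: under a normal approximation, power for a two-sided test is roughly Φ(|D|/SE − 1.96), which closely reproduces the Table 6 entries (a back-of-the-envelope check on the simulation results, not the simulation itself):

```python
from statistics import NormalDist

def approx_power(effect, se, z_crit=1.96):
    """Normal-approximation power for a two-sided z test of effect/SE
    (ignoring the negligible lower rejection region)."""
    return 1 - NormalDist().cdf(z_crit - abs(effect) / se)

# Table 6, N = 150, population GMA D = .30:
power_i = approx_power(0.30, 0.126)  # intercept-based; Table 6 reports .663
power_s = approx_power(0.30, 0.212)  # slope-based; Table 6 reports .297
```

The smaller SE for the intercept-based difference translates directly into the larger power values in the first two columns of Table 6.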
Discussion
The conventional approach for evaluating the treatment effect in GMA is to compare the two groups in their random slopes by examining the effect of group on slope (bs) for both null hypothesis significance tests and determination of effect sizes (Feingold, 2015). This article demonstrates that the same hypothesis tests and associated effect sizes with the same expected values can be obtained using the effect of group on the random intercepts (bi) in a randomized design with end centering. Baseline corrections can be used to achieve the same effect sizes from bs and bi with data from a non-equivalent groups GMA design.
Randomized Designs
The Monte Carlo study found that bs and bi were equally good estimators of their respective parameters, and thus parameter bias should not be a factor in determining which coefficient should be used to define the treatment effect. This simulation study also confirmed the prediction that the expected values of the model-estimated mean differences obtained with bs and bi were the same, as each was compared to a common parameter to assess parameter bias, and the average estimates of this parameter were identical (after rounding to two decimal places). However, the Monte Carlo study found smaller SEs of those estimated mean differences when obtained from bi than from bs, with differences in statistical power thus favoring the use of bi to determine the treatment effect.
Non-Equivalent Groups Designs
Effect size estimation.
Although the focus of the current work was on findings from randomized designs (because groups are often compared with GMA to evaluate treatment efficacy), the groups may instead be determined from sample characteristics, such as gender (e.g., Huttenlocher, Haight, Bryk, & Seltzer, 1991; Leahey & Guo, 2001). When the two groups are formed by measurement instead of manipulation, they may not be comparable at onset. Although null hypothesis significance tests and associated effect sizes based on bs are effectively adjusted for baseline differences, as in ANCOVA, that is not true for findings based on bi, which are analogous to results from a completely randomized ANOVA.
Accordingly, two correction methods were proposed to adjust bi for initial status differences, and an illustrative analysis using artificial data from a non-equivalent groups design demonstrated that both approaches produced identical corrected effect sizes (unstandardized and standardized model-estimated mean differences) and SEs. Moreover, as predicted, these corrected effect sizes were identical to respective effect sizes calculated from bs but their SEs differed.
Interestingly, the SEs in this example were smaller for effect sizes derived from bs than from bi--which was the opposite of the results obtained in the Monte Carlo study--and would suggest greater statistical power for the conventional approach using bs in the example’s model. There are two possible explanations for the contradictory results obtained between the illustrative and Monte Carlo analyses. SE differences may vary with the characteristics of the study, such as whether groups were formed by randomization. If that is the case, then it would be incorrect to conclude that assessing treatment effects with bi instead of bs is always preferable.
However, because the dataset for the example was Monte Carlo-generated, the illustrative analysis can be thought of as a Monte Carlo study with a single replication, with a relatively large sample size (N = 500). If the SEs had been averaged over thousands of replications, they might have been smaller when based on group differences in intercepts rather than slopes, and these potential differences might have been greater in smaller samples (where power is more of an issue). Thus, conclusions about SE differentials should be based on the results from the Monte Carlo study rather than from the illustrative analysis.
Power estimation.
Calculating the statistical power to detect the baseline-adjusted effects in a GMA study of groups that differ at baseline is only slightly more complicated than determining power for a treatment effect in a GMA of data from a randomized design. The investigators must posit the mean difference between groups at onset (the initial status difference) and at the end of the study, and then subtract initial status from final status to determine the designated coefficient for the adjusted treatment effect. The Monte Carlo model is then set up exactly as for a randomized design, where there is no difference in initial status but bi is the adjusted rather than the raw difference at the end of the study and bs = bi/duration.
As an example, consider a GMA of a sex difference where the authors believe the mean difference at the end of the study will be 10 points (bi = 10 in an end-centered GMA) and that men's and women's means will differ by 4 points at baseline (bi = 4 in an initial status-centered GMA). The Monte Carlo model for the power assessment would be set up exactly as for a randomized design in which the authors posited no difference at baseline, with bi = 6 and bs = 6/duration. (This approach is valid because the illustrative study found that adjusting the final status differences for baseline differences does not affect the SE of bi.)
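The arithmetic of this setup can be expressed as a small sketch (the 3-unit duration is an assumption matching the article's four-timepoint models):

```python
def adjusted_coefficients(final_diff, baseline_diff, duration):
    """Posited coefficients for the power-analysis Monte Carlo model:
    bi is the baseline-adjusted end-of-study difference, and bs is the
    corresponding per-unit-of-time slope difference (bs = bi / duration)."""
    bi = final_diff - baseline_diff
    bs = bi / duration
    return bi, bs

# The sex-difference example: 10-point final difference, 4-point
# baseline difference, and an assumed 3-unit study duration.
bi, bs = adjusted_coefficients(10, 4, 3)  # bi = 6, bs = 2.0
```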
Determining the SD Used in the Calculation of GMA d
There are different ways the SD used to transform GMA D to GMA d can be calculated, each of which estimates a different dispersion parameter (see discussion in Feingold, 2013). Note that this issue of SD selection is not applicable to unstandardized effect sizes (GMA Ds), which are sometimes preferable to standardized effect sizes (Baguley, 2009; Cohen, 1988; Hayes, 2013; Pek & Flora, 2018) and are directly used for power analysis.
The current illustrative and Monte Carlo analyses used a method in which SD was estimated from the GMA model. However, both of these analyses were conducted with a model in which treatment was the only time-invariant covariate, and the inclusion of other covariates affects the error terms from which SD is estimated (Feingold, 2019a; Maxwell, Delaney, & Kelley, 2018). The SD estimation method used in the illustrative and Monte Carlo simulations is thus only appropriate for such a single-covariate model. With additional covariates, the residual variance of the random slopes is reduced not only by the effect of group but also by the effects of the other covariate(s), and thus can no longer be used in the estimation of the pooled within-group SD of the outcome at a given time point. Instead, the pooled SD from the observed data (e.g., at baseline) can be specified in the equations for GMA d (post hoc or in program inputs). This is especially appropriate when examining real data from RCTs, as the variances at later time points can be biased by treatment effects--an issue that was not applicable to the Monte Carlo study.
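A pooled baseline SD of the kind described can be computed from the observed group SDs with the standard pooled-variance formula (the SDs and group sizes below are hypothetical, not from the article):

```python
import math

def pooled_sd(sd1, n1, sd2, n2):
    """Pooled within-group SD from two groups' observed baseline SDs
    (standard pooled-variance formula, weighting by degrees of freedom)."""
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    return math.sqrt(pooled_var)

# Hypothetical baseline SDs for treatment and control groups of 125 each:
sd = pooled_sd(1.2, 125, 1.0, 125)
```

The resulting value would be substituted for the model-based SD in the GMA d equations (post hoc or in the program input).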
Hypothesis Testing and Effect Size Estimation in Complex GMA Designs
Nonlinear GMA models.
Linear models are not always appropriate because they make the unrealistic assumption that effect sizes increase linearly over time (Feingold, 2013). For example, treatment may have a strong effect at the beginning of an RCT and then plateau or decrease over the course of the study. Thus, nonlinear models are often needed for correct specification of the GMA (Marcoulides & Khojasteh, 2018).
Feingold (2019b) recently formulated equations for effect sizes from linear slope differences (bs) for use with quadratic GMA, which includes coefficients for the effects of group on both the linear and quadratic slopes. The GMA d derived from both coefficients of a quadratic model more accurately estimates the group difference in means at the end of the study. Moreover, these equations can be used to obtain a separate effect size for each timepoint (e.g., end of trial and at follow-up in an RCT), which is necessary because effect sizes at the end of study from quadratic GMA cannot be used to estimate effect sizes at intermediate timepoints with backwards extrapolation (as can be done with linear GMA models).
However, with the intercept-based approach, no changes to the equations are needed to estimate effect sizes from a quadratic model because the difference in the intercept growth factor at the end of the study (bi) can be determined from either a linear or a quadratic model. There is, however, a difference when estimating effect sizes for intermediate timepoints in the study. When the approach based on linear and quadratic slope differences is used to obtain time-varying effect sizes, an equation specifying a different value for each measurement occasion is needed (Feingold, 2019b). When the intercept-based approach is applied to quadratic GMA, by contrast, a different GMA is required to obtain effect sizes for each timepoint because the parameterization must also be changed. For example, if a quadratic model is used to examine data with 4 timepoints, the intercept-based approach would require a quadratic GMA with time codes of −2, −1, 0, and 1 for T1, T2, T3, and T4, respectively, to obtain the effect size at the third measurement occasion.2
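The required time codes can be generated mechanically; a small sketch (the function name is illustrative) that reproduces the codings discussed in the text:

```python
def time_codes(n_timepoints, center_at):
    """Time codes that place the intercept growth factor at measurement
    occasion `center_at` (1-indexed), as in the article's parameterizations."""
    return [t - (center_at - 1) for t in range(n_timepoints)]

codes_end = time_codes(4, 4)  # end centering: [-3, -2, -1, 0]
codes_t3 = time_codes(4, 3)   # intercept at T3: [-2, -1, 0, 1]
```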
Categorical outcomes.
In methods of categorical data analysis (Agresti, 2002), including latent class analysis (Masyn, 2013) and logistic regression (Hosmer & Lemeshow, 2000), a popular metric for an effect size is the odds ratio (OR; Fleiss & Berlin, 2009). The OR is often used with binary outcomes, especially in substance abuse research, where abstinence vs. use is a common dichotomous dependent variable (Feingold, MacKinnon, & Capaldi, 2019). The OR then communicates whether the treatment group has a higher probability of maintaining abstinence following treatment than does the control group. GMA can also use categorical outcomes (Feingold, Tiberio, & Capaldi, 2014), with the same null hypothesis (i.e., the odds of being in one category, such as alcohol consumers, are the same for the treatment and control groups at the end of the study). The use of the intercept growth factor instead of the slope growth factor as the latent variable for testing treatment effects is more straightforward to implement with binary than with continuous outcomes. If an end centering parameterization is used, the effect of group on the random intercepts is the model-estimated mean difference at the end of the study in the logit metric. The standardized effect size (OR) is then obtained through exponentiation to yield the GMA OR. However, unlike with GMA d, GMA software will generally report the GMA OR in analyses with a binary outcome. Thus, investigators can obtain the GMA OR by using end centering and requesting that the OR be outputted. As there is no need to divide an effect by an SD to obtain the standardized effect size, no parameter creation or post hoc equation is required, and a wider range of GMA software can output the GMA OR than can output GMA d.
Most important, the GMA OR estimates the same parameter as the OR from a completely randomized design that compared the two groups in their probabilities at the end of the study, much as GMA d estimates the same parameter as d from a comparison of two groups with the completely randomized design in classical analysis.
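The exponentiation step described above is a one-liner; the logit-metric coefficient below is hypothetical:

```python
import math

def gma_or(bi_logit):
    """GMA OR: exponentiate the end-centered group difference in random
    intercepts (logit metric) into the odds-ratio metric."""
    return math.exp(bi_logit)

# Hypothetical logit-metric intercept difference of 0.8 at end of study:
odds_ratio = gma_or(0.8)  # ≈ 2.23, i.e., higher odds for the treatment group
```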
Confidence Intervals for GMA Effect Sizes
GMA effect sizes were used to demonstrate that identical model-estimated mean differences at the end of the study can be obtained from bs and bi, and thus either coefficient can be used to estimate that difference and determine its significance. However, effect sizes--measures of effect magnitude independent of sample size (Grissom & Kim, 2012)--are important in their own right and are now widely reported in the behavioral sciences (Kelley & Preacher, 2012). Effect sizes are useful because there is a crucial distinction between statistical significance and practical significance (Preacher & Kelley, 2011).
There is now a consensus that effect sizes should always be accompanied by their CIs because an effect size observed in a sample is a point estimate, with a stability that varies by design and sample size (Cumming, 2013; Odgaard & Fowler, 2010). SEs are useful in calculating CIs, and also because CIs afford significance tests without a need for p values. If, for example, the 95% CI for a mean difference does not include the value of 0, the difference is statistically significant at the .05 level.
When GMA d--whether calculated from bs or bi--is obtained directly by the user’s GMA software through parameter creation, the CI for GMA d can be obtained from the program with the same commands used to obtain CIs for the unstandardized coefficients. Otherwise, the SE of GMA d can be calculated with Equation 9 and then multiplied by 1.96. The resulting product can be added and subtracted to GMA d to obtain the 95% CI for GMA d. Similarly, the SE of GMA D obtained with Equation 10 can be used to obtain the corresponding CI for GMA D.
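This CI construction can be sketched as follows (the GMA d value and SE below are hypothetical; in practice the SE would come from Equation 9 or the software output):

```python
def wald_ci(effect, se, z=1.96):
    """95% Wald CI: effect ± 1.96 * SE, as described in the text."""
    return effect - z * se, effect + z * se

lo, hi = wald_ci(0.35, 0.15)  # hypothetical GMA d and its SE
significant = not (lo <= 0 <= hi)  # CI excluding 0 implies p < .05
```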
Adjusting for Baseline Differences in Randomized GMA Designs
In an ANCOVA comparing two groups formed by randomization (and with the baseline of the outcome variable as the covariate), the raw and adjusted mean differences have the same expected value but not the same observed value because randomization is never perfect. Yet, the convention in ANCOVA is to draw conclusions based on the adjusted rather than the raw mean difference. When the same design is analyzed with a regression framework, the regression coefficient for the treatment effect (effect of treatment controlling for baseline) equals the adjusted mean difference from an ANCOVA.
By comparison, in a GMA with end centering, bi is the estimate of the raw mean difference at the end of the study, which is an analogue of the mean difference in a completely randomized design rather than of the more widely used adjusted mean difference in ANCOVA and regression. The adjusted mean difference from bi from a randomized design (e.g., an RCT) can be obtained by using either of the two methods described that adjust for baseline differences in a non-equivalent groups GMA design. However, the model-estimated raw and adjusted mean differences (and associated p values for their statistical significance) in data from an RCT should be very similar given randomization because they have the same expected value, and the difference is due to sampling error in the initial status difference in the unadjusted effects.
Limitations and Conclusions
The current work confirmed the logically derived expectation that the same null hypothesis for the treatment effect from linear GMA can be tested with either of two GMA parameters but that the power to reject that hypothesis varies with the parameter used for hypothesis testing. Although the Monte Carlo study found greater power to detect treatment effects with the intercept difference than with the conventionally used slope difference, that finding may not generalize to all GMA models and conditions. In addition, although the article discussed the use of the intercept-based approach with quadratic models for effect size and CI estimation, statistical power for nonlinear models with this approach was not addressed.
Because the current findings may lack external validity, researchers using Monte Carlo analysis--or any power assessment method that can examine bi as well as bs--to determine the power to detect the group difference in growth in a GMA should use an end centering parameterization and examine the power to detect both the intercept and slope differences between the groups in the models for their planned studies, to make an informed decision about which parameter (bi or bs) to use to define the treatment effect.
Acknowledgments
This work was supported by National Institutes of Health (NIH)/National Institute on Alcohol Abuse and Alcoholism (NIAAA) grant R01AA025069. The content is solely the responsibility of the author and does not necessarily represent the official views of the NIH or NIAAA.
Appendix A. Mplus Input Statements for Generating GMA Effect Sizes for Illustrative Example
Model Using Correction Factor for Initial Status Difference
DATA: FILE IS ex6.10.dat;
VARIABLE: NAMES ARE y11-y14 x1 x2 a31-a34;
USEVARIABLES = y11-y14 x1;
DEFINE: IF (x1 GE -.073) THEN x1 = 1;
IF (x1 LT -.073) THEN x1 = 0;
MODEL: i s | y11@-3 y12@-2 y13@-1 y14@0;
i s ON x1;
s ON x1 (bs);
i ON x1 (bi);
i (v1);
y11-y14 (r1-r4);
MODEL CONSTRAINT:
!calculation of GMA D from slopes
NEW(rawDs);
rawDs = bs*3;
!calculation of GMA d from slopes
NEW(GMAds);
GMAds = (bs*3)/SQRT(v1 + r1/4 + r2/4 + r3/4 + r4/4);
!calculation of GMA D from intercepts
NEW(rawDi);
rawDi = bi - .955;
!calculation of GMA d from intercepts
NEW(GMAdi);
GMAdi = (bi - .955)/SQRT(v1 + r1/4 + r2/4 + r3/4 + r4/4);
Model Using Data Corrected for Initial Status Differences
DATA: FILE IS ex6.10.dat;
VARIABLE: NAMES ARE y11-y14 x1 x2 a31-a34;
USEVARIABLES = y11-y14 x11;
DEFINE: IF (x1 GE -.073) THEN x11 = 1;
IF (x1 LT -.073) THEN x11 = 0;
IF (x11 EQ 1) THEN y11 = y11 - .955;
IF (x11 EQ 1) THEN y12 = y12 - .955;
IF (x11 EQ 1) THEN y13 = y13 - .955;
IF (x11 EQ 1) THEN y14 = y14 - .955;
MODEL: i s | y11@-3 y12@-2 y13@-1 y14@0;
i s ON x11;
s ON x11 (bs);
i ON x11 (bi);
i (v1);
y11-y14 (r1-r4);
MODEL CONSTRAINT:
!calculation of GMA D from slopes
NEW(rawDs);
rawDs = bs*3;
!calculation of GMA d from slopes
NEW(GMAds);
GMAds = (bs*3)/SQRT(v1 + r1/4 + r2/4 + r3/4 + r4/4);
!calculation of GMA d from intercepts
NEW(GMAdi);
GMAdi = bi/SQRT(v1 + r1/4 + r2/4 + r3/4 + r4/4);
Note. GMA = growth modeling analysis. Calculation of GMA D from intercepts in “Model
Using Data Corrected for Initial Status Differences” need not be specified in MODEL
CONSTRAINT because GMA D = bi, and bi is reported in the output.
Appendix B. Mplus Input Statement for a Monte Carlo Study with bi = .60 and n = 250
MONTECARLO: NAMES ARE y1-y4 x;
CUTPOINTS = x (0);
NOBSERVATIONS = 250;
NREPS = 10000;
SEED = 53487;
CLASSES = C(1);
GENCLASSES = C(1);
ANALYSIS: TYPE = MIXTURE;
ESTIMATOR = ML;
MODEL MONTECARLO:
%OVERALL%
[x@0]; x@1;
i BY y1-y4@1;
s BY y1@-3 y2@-2 y3@-1 y4@0;
[y1-y4@0];
[i*0 s*.2];
i*.25;
s*.09;
i WITH s*0;
y1-y4*.5;
i ON x*.6;
s ON x*.2;
%C#1%
[i*0 s*.2];
MODEL:
%OVERALL%
i BY y1-y4@1;
s BY y1@-3 y2@-2 y3@-1 y4@0;
[y1-y4@0];
[i*0 s*.2];
i*.25;
s*.09;
i WITH s*0;
y1-y4*.5;
i ON x*.6;
s ON x*.2;
%C#1%
[i*0 s*.2];
s ON x*.2 (bs);
i ON x*.6 (bi);
i (v1);
y1-y4 (r1-r4);
MODEL CONSTRAINT:
NEW(di*.6928);
di = bi/SQRT(v1 + r1/4 + r2/4 + r3/4 + r4/4);
Footnotes
The coefficient for the effect of group on slope (bs) is the difference in the rate of change in the outcome between the two groups per unit of time (e.g., per week when time is coded in weeks), and duration is the length of the study based on units associated with bs (e.g., number of weeks from baseline if bs is the group difference in rate of change per week). When time is coded by measurement occasions with values differing by 1 (e.g., 0, 1, and 2 for T1, T2, and T3, respectively), the duration is one less than the number of measurement occasions (timepoints).
The advantages of using the intercept rather than the slope difference approach are likely to be greater for quadratic GMA than for linear GMA. The Monte Carlo analysis of estimates and SEs of time-varying effect sizes from quadratic GMA indicated that much larger sample sizes were needed to obtain GMA ds from quadratic models than GMA ds from linear models derived exclusively from the coefficient for the linear slope difference (Feingold, 2019b). The reason there should be greater bias in GMA ds calculated from slope differences in quadratic models is that the effect sizes are obtained with an equation that includes two coefficients--each subject to sampling error--rather than the single coefficient for the effect on linear slopes in linear GMA.
References
- Agresti, A. (2002). Categorical data analysis (2nd ed.). New York: Wiley.
- Arend, M. G., & Schäfer, T. (2019). Statistical power in two-level models: A tutorial based on Monte Carlo simulation. Psychological Methods, 24, 1–19. 10.1037/met0000195
- Baguley, T. (2009). Standardized or simple effect size: What should be reported? British Journal of Psychology, 100, 603–617. 10.1348/000712608X377117
- Banjanovic, E. S., & Osborne, J. W. (2016). Confidence intervals for effect sizes: Applying bootstrap resampling. Practical Assessment, Research & Evaluation, 21, 1–18.
- Benichou, J., & Gail, M. H. (1989). A delta method for implicitly defined random variables. The American Statistician, 43, 41–44.
- Bollen, K. A., & Curran, P. J. (2006). Latent curve models: A structural equation perspective. Hoboken, NJ: Wiley.
- Borenstein, M., Hedges, L. V., Higgins, J. P. T., & Rothstein, H. R. (2009). Introduction to meta-analysis. New York: Wiley.
- Cheung, M. W. (2009). Comparison of methods for constructing confidence intervals of standardized indirect effects. Behavior Research Methods, 41, 425–438. 10.3758/BRM.41.2.425
- Chorpita, B. F., Daleiden, E. L., Park, A. L., Ward, A. M., Levy, M. C., Cromley, T., … & Krull, J. L. (2017). Child STEPs in California: A cluster randomized effectiveness trial comparing modular treatment with community implemented treatment for youth with anxiety, depression, conduct problems, or traumatic stress. Journal of Consulting and Clinical Psychology, 85, 13–25. 10.1037/ccp0000133
- Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Erlbaum.
- Cumming, G. (2013). Understanding the new statistics: Effect sizes, confidence intervals, and meta-analysis. New York: Routledge.
- Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 39, 1–38.
- Efron, B., & Tibshirani, R. (1993). An introduction to the bootstrap. Boca Raton, FL: Chapman & Hall.
- Feingold, A. (2009). Effect sizes for growth-modeling analysis for controlled clinical trials in the same metric as for classical analysis. Psychological Methods, 14, 43–53. 10.1037/a0014699
- Feingold, A. (2013). A regression framework for effect size assessments in longitudinal modeling of group differences. Review of General Psychology, 17, 111–121. 10.1037/a0030048
- Feingold A (2015). Confidence interval estimation for standardized effect sizes in multilevel and latent growth modeling. Journal of Consulting and Clinical Psychology, 83, 157–168. 10.1037/a0037721 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Feingold A (2017). Meta-analysis with standardized effect sizes from multilevel and latent growth models. Journal of Consulting and Clinical Psychology, 85, 262–266. 10.1037/ccp0000162 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Feingold A (2019a). New approaches for estimation of effect sizes and their confidence intervals for treatment effects from randomized controlled trials. Quantitative Methods for Psychology, 15, 96–111. 10.20982/tqmp.15.2.p096 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Feingold A (2019b). Time-varying effect sizes for quadratic growth models in multilevel and latent growth modeling. Structural Equation Modeling, 26, 418–429. 10.1080/10705511.2018.1547110 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Feingold A (in press). Growth curve modeling. In Asmundson GJG (Ed.), Comprehensive clinical psychology (2nd ed.). Amsterdam: Elsevier. Advance online publication. 10.1016/B978-0-12-818697-8.00014-5 [DOI] [Google Scholar]
- Feingold A, MacKinnon DP, & Capaldi DM (2019). Mediation analysis with binary outcomes: Direct and indirect effects of pro-alcohol influences on alcohol use disorders. Addictive Behaviors, 94, 26–35. 10.1016/j.addbeh.2018.12.018 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Feingold A, Tiberio SS, & Capaldi DM (2014). New approaches for examining associations with latent categorical variables: Applications to substance abuse and aggression. Psychology of Addictive Behaviors, 28, 257–267. 10.1037/a0031487 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Felder JN, Epel E, Lewis JB, Cunningham SD, Tobin JN, et al. (2017). Depressive symptoms and gestational length among pregnant adolescents: Cluster randomized control trial of CenteringPregnancy® plus group prenatal care. Journal of Consulting and Clinical Psychology, 85, 574–584. 10.1037/ccp0000191 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Fleiss JL, & Berlin JA (2009). Effect sizes for dichotomous data. In Cooper H, Hedges LV, & Valentine JC (Eds.), The handbook of research synthesis (2nd ed., pp. 237–253). New York, NY: Sage. [Google Scholar]
- Goldstein H (2011). Multilevel statistical models (4th ed.). Hoboken, NJ: Wiley. [Google Scholar]
- Goodnight JA, Bates J,E, Holtzworth-Munroe A, Pettit GS, Ballard RH, et al. (2017). Dispositional, demographic, and social predictors of trajectories of intimate partner aggression in early adulthood. Journal of Consulting and Clinical Psychology, 85, 950–965. 10.1037/ccp0000226 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Grimm KJ, Ram N, & Estabrook R (2016). Growth modeling: Structural equation and multilevel modeling approaches. NY: Guilford. [Google Scholar]
- Grissom RJ, & Kim JJ (2012). Effect sizes for research: Univariate and multivariate application (2nd ed.). New York: Routledge. [Google Scholar]
- Gueorguieva R, & Krystal JH (2004). Move over ANOVA: Progress in analyzing repeated-measures data and its reflection in papers published in the Archives of General Psychiatry. Archives of General Psychiatry, 61, 310–317. 10.1001/archpsyc.61.3.310 [DOI] [PubMed] [Google Scholar]
- Hayes AF (2013). Introduction to mediation, moderation, and conditional process analysis: A regression-based approach. New York: Guilford. [Google Scholar]
- Hedeker D, & Gibbons RD (2006). Longitudinal data analysis. Hoboken, NJ: Wiley. [Google Scholar]
- Hedges LV, Pustejovsky JE, & Shadish WR (2012). A standardized mean difference effect size for single case designs. Research Synthesis Methods, 3, 224–239. 10.1002/jrsm.1052 [DOI] [PubMed] [Google Scholar]
- Hinkelmann K, & Kempthorne O (2008). Design and analysis of experiments (Volume I): Introduction to experimental design (2nd. ed.). Hoboken, NJ: Wiley. [Google Scholar]
- Hodges JL, & Lehmann EL (2005). Basic concepts of probability and statistics (2nd. ed.). Philadelphia, PA: Society for Industrial and Applied Mathematics. [Google Scholar]
- Hox JJ, Moerbeek M, & van de Schoot R (2010). Multilevel analysis: Techniques and applications. New York: Routledge. [Google Scholar]
- Hosmer DW, & Lemeshow S (2000). Applied logistic regression (2nd ed.) New York: Wiley. [Google Scholar]
- Huttenlocher J, Haight W, Bryk A, & Seltzer M (1991). Early vocabulary growth: Relation to language input and gender. Developmental Psychology, 27, 236–248. 10.1037/0012-1649.27.2.236 [DOI] [Google Scholar]
- Jöreskog KG & Sörbom D (2006). LISREL 8.80 for Windows. Lincolnwood, IL: Scientific Software International, Inc. [Google Scholar]
- Kelley K, & Preacher KJ (2012). On effect size. Psychological Methods, 17, 137–152. 10.1037/a0028086 [DOI] [PubMed] [Google Scholar]
- Kuljanin G, Braun MT, & DeShon RP (2011). A cautionary note on modeling growth trends in longitudinal data. Psychological Methods, 16, 249–264. 10.1037/a0023348 [DOI] [PubMed] [Google Scholar]
- Kendall M, & Stuart A (1977). The advanced theory of statistics: Volume 1 (4th edition). New York: MacMillan. [Google Scholar]
- Lai MH (2020). Bootstrap confidence intervals for multilevel standardized effect size. Multivariate Behavioral Research, 10.1080/00273171.2020.1746902 [DOI] [PubMed]
- Lau RS, & Cheung GW (2012). Estimating and comparing specific mediation models in complex latent variable models. Organizational Research Methods, 15, 3–16. 10.1177/1094428110391673 [DOI] [Google Scholar]
- Leahey E, & Guo G (2001). Gender differences in mathematical trajectories. Social Forces, 80, 713–732. 10.1353/sof.2001.0102 [DOI] [Google Scholar]
- MacKinnon DP, Lockwood CM, & Williams J (2004). Confidence limits for the indirect effect: Distribution of the product and resampling methods. Multivariate Behavioral Behavioral Research, 39, 99–128. 99–128. 10.1207/s15327906mbr3901_4 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Marcoulides KM, & Khojasteh J (2018). Analyzing longitudinal data using natural cubic smoothing splines. Structural Equation Modeling, 25, 965–971. 10.1080/10705511.2018.1449113 [DOI] [Google Scholar]
- Masyn KE (2013). Latent class analysis and finite mixture modeling. In Little TD (Ed.), The Oxford handbook of quantitative methods, Volume2: Statistical analysis (pp. 551–611). New York: Oxford University Press. [Google Scholar]
- Maxwell SE, Delaney HD, & Kelley K (2018). Designing experiments and analyzing data: A model comparison perspective (3rd ed). New York: Routledge. [Google Scholar]
- Muthén BO, & Curran PJ (1997). General longitudinal modeling of individual differences in experimental designs: A latent variable framework for analysis and power estimation. Psychological Methods, 2, 371–402. [Google Scholar]
- Muthén BO, & Muthén LK (2000). The development of heavy drinking and alcohol-related problems from ages 18 to 37 in a U.S. national sample. Journal of Studies on Alcohol, 61, 290–300. [DOI] [PubMed] [Google Scholar]
- Muthén BO, Muthén LK,, & Asparouhov T (2016). Regression and mediation analysis using Mplus. Los Angeles, CA: Muthén & Muthén. [Google Scholar]
- Muthén LK, & Muthén BO (2002). How to use a Monte Carlo study to decide on sample size and determine power. Structural Equation Modeling, 4, 599–620. 10.1207/S15328007SEM0904_8 [DOI] [Google Scholar]
- Muthén LK, & Muthén BO (2017). Mplus user’s guide (8th ed). Los Angeles, CA: Muthén & Muthén. [Google Scholar]
- Odgaard E,C, & Fowler RL (2010). Confidence intervals for effect sizes: Compliance and clinical significance in the Journal of Consulting and Clinical Psychology. Journal of Consulting and Clinical Psychology, 78, 287–297. 10.1037/a0019294 [DOI] [PubMed] [Google Scholar]
- Parra-Cardona JR, Bybee D, Sullivan CM, Rodríguez MMD, Tams L, & Bernal G (2017). Examining the impact of differential cultural adaptation with Latina/o immigrants exposed to adapted parent training interventions. Journal of Consulting and Clinical Psychology, 85, 58–71. 10.1037/ccp0000160 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Pek J, & Flora DB (2018). Reporting effect sizes in original psychological research: A discussion and tutorial. Psychological Methods, 23, 208–225. 10.1037/met0000126 [DOI] [PubMed] [Google Scholar]
- Preacher KJ, & Kelley K (2011). Effect size measures for mediation models: Quantitative strategies for communicating indirect effects. Psychological Methods, 16, 93–115. 10.1037/a0022658. [DOI] [PubMed] [Google Scholar]
- Preacher KJ, Wichman AL, MacCallum RC, & Briggs NE (2008). Latent growth modeling. Los Angeles, CA: Sage. 10.1037/a0022658 [DOI] [Google Scholar]
- Raudenbush SW, & Bryk AS (2002). Hierarchical linear models: Applications and data analysis methods (2nd ed). Thousand Oaks, CA: Sage. [Google Scholar]
- Raudenbush SW, & Liu X (2001). Effects of study duration, frequency of observation, and sample size on power in studies of group differences in polynomial change. Psychological Methods, 6, 387–401. [PubMed] [Google Scholar]
- Rosseel Y (2012). lavaan: An R package for structural equation modeling. Journal of Statistical Software, 48, 1–36. 10.18637/jss.v048.i02 [DOI] [Google Scholar]
- SAS Institute Inc. (2011). SAS/STAT® 9.3 user’s guide. Cary, NC: SAS Institute Inc. [Google Scholar]
- Satorra A, & Saris W (1985). Power of the likelihood ratio test in covariance structure analysis. Psychometrika, 51, 83–90. [Google Scholar]
- Rosenthal R, Rosnow RL, & Rubin DB (2000). Contrasts and effect sizes in behavioral research: A correlational approach. Cambridge,England: Cambridge University Press. [Google Scholar]
- Spybrook J, Raudenbush SW, Liu X, & Congdon R (2008). Optimal design for longitudinal and multilevel research: Documentation for the “Optimal Design” software. Unpublished manuscript, University of Michigan, Ann Arbor. [Google Scholar]
- Stice E, Rohde P, Shaw H, & Gau JM (2017). Clinician-led, peer-led, and internet- delivered dissonance-based eating disorder prevention programs: Acute effectiveness of these delivery modalities. Journal of Consulting and Clinical Psychology, 85, 883–895. 10.1037/ccp0000211 [DOI] [PMC free article] [PubMed] [Google Scholar]