Author manuscript; available in PMC: 2022 Mar 1.
Published in final edited form as: Stata J. 2021 Mar 30;21(1):195–205. doi: 10.1177/1536867x211000022

Calculating level-specific SEM fit indices for multilevel mediation analyses

W. Scott Comulada
PMCID: PMC8087284  NIHMSID: NIHMS1690647  PMID: 33935596

Abstract

Stata’s gsem command provides the ability to fit multilevel structural equation models (SEM) and related multilevel models. A motivating example is provided by multilevel mediation analyses (MA) conducted on patient data from Methadone Maintenance Treatment clinics in China. Multilevel MA conducted through the gsem command examined the mediating effects of patients’ treatment progression and rapport with counselors on their treatment satisfaction. Multilevel models accounted for the clustering of patient observations within clinics. SEM fit indices, such as the comparative fit index and the root mean squared error of approximation, are commonly used in the SEM model selection process. Multilevel models present challenges in constructing fit indices because there are multiple levels of hierarchy to account for in establishing goodness of fit. Level-specific fit indices have been proposed in the literature but have not been incorporated into the gsem command. I created the gsemgof command to fill this role. Model results from the gsem command are used to calculate the level-specific comparative fit index and root mean squared error of approximation fit indices. I illustrate the gsemgof command through multilevel MA applied to two-level Methadone Maintenance Treatment data.

Keywords: st00!!, gsemgof, gsem, sem, multilevel, structural equation model, mediation analysis, fit index

1. Introduction

Stata’s generalized structural equation model (SEM) command, gsem, extends the capabilities of the sem command to fit multilevel SEM (Goldstein and McDonald 1988; Longford and Muthén 1992; Muthén 1994; Muthén and Satorra 1995) and related models to observations that are clustered across two or more levels of hierarchy. A motivating example is provided by Li et al. (2017) in their mediation analysis (MA) of baseline data from a randomized controlled trial. The trial evaluated an intervention designed to improve patient care provided by Methadone Maintenance Treatment (MMT) clinics in China. Patients at the first level were nested within clinics at the second level. Multilevel MA (MMA) examined the mediating effects of patients’ treatment progression and rapport with counselors on their treatment satisfaction. The gsem command was used to conduct MMA that accounted for patient observations at the first level clustered within clinics at the second level.

The sem command produces many popular fit indices to aid the model selection process, such as the comparative fit index (CFI; Bentler and Bonett 1980; Bentler 1990) and the root mean squared error of approximation (RMSEA; Browne and Cudeck 1993; Steiger 1990). Multilevel models present challenges in constructing fit indices because there are multiple levels of hierarchy to account for in establishing goodness of fit. A lack of fit, especially at higher levels, can be masked if the levels are combined to produce a fit index. As a result, there is a lack of consensus on suitable fit indices for multilevel models. Two main approaches have been proposed in the literature: the segregating procedure by Yuan and Bentler (2007) and level-specific model fit evaluation by Ryu and West (2009). At present, fit indices are not produced by the gsem command. No fit indices are presented by Li et al. (2017).

In this article, I show how the gsemgof command, a new command that uses results from models fit through the gsem command, can be used to produce level-specific CFI and RMSEA fit indices. Both the level-specific and segregating procedures require additional computations to produce fit indices. The level-specific evaluation method is favored here because it requires fewer programming steps and is less computationally intensive than the segregating procedure. I first review common fit indices in section 2. The level-specific evaluation method is discussed in section 3. Syntax for the gsemgof command is given in section 4. I then illustrate the level-specific evaluation method and the gsem command on MMA applied to the MMT data in section 5. Discussion follows in section 6.

2. Standard SEM fit indices

The likelihood-ratio (LR) test serves as a basic fit index. The log likelihood ($\log \ell$) for a hypothesized model with a given set of parameters $\theta$ is compared with the $\log \ell$ for a saturated model with parameters $\theta_S$. The saturated model has the maximum number of parameters allowed by the data so that the model is not overidentified. Examples of hypothesized and saturated models are given in section 5. Under model assumptions, such as multivariate normality and a large sample size, the LR test statistic follows a central $\chi^2$ distribution and is expressed as

$$\chi^2_{\text{Hypothesized}} = -2 \times \{\log \ell(\hat{\theta}) - \log \ell(\hat{\theta}_S)\} \tag{1}$$

When the hypothesized model provides an exact fit in line with the saturated model, the $\chi^2_{\text{Hypothesized}}$ test statistic will take on a value of 0 in the population. Therefore, large $\chi^2_{\text{Hypothesized}}$ values with a p-value ≤ 0.05 indicate poor model fit. A problem with the LR test is sample-size sensitivity. For example, the LR test may yield larger p-values in smaller samples, even though the hypothesized model does not provide adequate fit. Numerous fit indices have been developed to scale the LR test so that the fit indices are less sample-size dependent. This article discusses the CFI and RMSEA as two popular fit indices. Level-specific evaluation methods discussed in section 3 are applicable for the construction of other fit indices too.
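The p-value attached to an LR chi-squared statistic can be sketched in a few lines of Stata. The sketch below is illustrative only: the two statistics and their d.f. are the single-level MA values reported later in table 1, chi2tail() is Stata's upper-tail chi-squared function, and the scalar names are assumptions made for this example.

```stata
* Upper-tail p-values for two LR chi-squared statistics
* (values taken from the single-level MA column of table 1)
scalar p_with_path = chi2tail(5, 8.524)     // chi2(5)  = 8.524
scalar p_no_path   = chi2tail(6, 229.650)   // chi2(6)  = 229.650

display "p-value, model with p-s path:    " %6.4f scalar(p_with_path)
display "p-value, model without p-s path: " %6.4f scalar(p_no_path)
```

The first p-value exceeds 0.05 and the second falls below 0.01, in line with the significance markers in table 1.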

The CFI incorporates both the $\chi^2_{\text{Hypothesized}}$ test statistic and a $\chi^2_{\text{Baseline}}$ statistic that compares the $\log \ell$ for a baseline model with the $\log \ell$ for a saturated model. The baseline model fits means and variances for all variables and covariances between exogenous (independent) variables. No additional parameters are estimated, such as paths from exogenous variables to an outcome. The degrees of freedom (d.f.) for the hypothesized ($\text{d.f.}_{\text{Hypothesized}}$) and baseline ($\text{d.f.}_{\text{Baseline}}$) models are also incorporated into the CFI calculation. The CFI is expressed as

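$$\text{CFI} = 1 - \frac{\max\{(\chi^2_{\text{Hypothesized}} - \text{d.f.}_{\text{Hypothesized}}),\, 0\}}{\max\{(\chi^2_{\text{Baseline}} - \text{d.f.}_{\text{Baseline}}),\, 0\}} \tag{2}$$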

Larger CFI values indicate better model fit. As a rule of thumb, CFI values ≥ 0.90 indicate good model fit.
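Two boundary cases follow directly from (2) and may help in reading reported CFI values. They are stated here as a reading aid rather than as part of the original derivation, and both assume that the baseline chi-squared statistic exceeds its d.f. so that the ratio is defined.

```latex
\begin{align*}
\chi^2_{\text{Hypothesized}} \le \text{d.f.}_{\text{Hypothesized}}
  &\;\Longrightarrow\; \text{CFI} = 1 - \frac{0}{\chi^2_{\text{Baseline}} - \text{d.f.}_{\text{Baseline}}} = 1 \\
\chi^2_{\text{Hypothesized}} - \text{d.f.}_{\text{Hypothesized}}
  = \chi^2_{\text{Baseline}} - \text{d.f.}_{\text{Baseline}} > 0
  &\;\Longrightarrow\; \text{CFI} = 1 - 1 = 0
\end{align*}
```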

The RMSEA is sometimes referred to as a “parsimony-adjusted” index because it adjusts the $\chi^2_{\text{Hypothesized}}$ test statistic by both the d.f. and the total sample size ($N$). The RMSEA is expressed as

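$$\text{RMSEA} = \sqrt{\max\left\{\frac{\chi^2_{\text{Hypothesized}} - \text{d.f.}_{\text{Hypothesized}}}{\text{d.f.}_{\text{Hypothesized}}\,(N - 1)},\, 0\right\}} \tag{3}$$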

Similar to the chi-squared test statistic, RMSEA values closer to 0 indicate better fit. RMSEA values < 0.08 indicate good model fit.
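As a worked instance of the RMSEA formula, take the single-level MA model with the p-s path reported in table 1, which has a chi-squared statistic of 8.524 on 5 d.f., together with the sample size of 2,427 patients given in section 5. The square root is assumed here as in the usual definition of the RMSEA, and that reading reproduces the tabled value.

```latex
\begin{align*}
\text{RMSEA}
  &= \sqrt{\max\left\{\frac{8.524 - 5}{5 \times (2427 - 1)},\, 0\right\}}
   = \sqrt{\frac{3.524}{12130}} \\
  &\approx \sqrt{0.000291} \approx 0.017
\end{align*}
```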

3. Level-specific fit indices for multilevel models

For the sake of brevity, indices are discussed in terms of two-level models, but the method is applicable to multilevel models with additional levels. Chi-squared statistics for level-specific indices are formulated in a similar manner as the chi-squared test statistic shown in (1). Let $\chi^2_{\text{Hypothesized},1}$ represent a chi-squared test statistic for a level-1 hypothesis, such as a hypothesized path between observed variables. Level-2 hypotheses correspond to random effects or latent variables, such as a hypothesized covariance between random effects.

Let $\theta_1$ and $\theta_2$ represent sets of parameters that correspond to level-1 and level-2 hypotheses, respectively. Let $\theta_{1,S}$ and $\theta_{2,S}$ represent sets of parameters for models that are saturated at levels 1 and 2, respectively. The $\chi^2_{\text{Hypothesized},1}$ test statistic for a level-1 evaluation is

$$\chi^2_{\text{Hypothesized},1} = -2 \times \{\log \ell(\hat{\theta}_1, \hat{\theta}_{2,S}) - \log \ell(\hat{\theta}_{1,S}, \hat{\theta}_{2,S})\} \tag{4}$$

The chi-squared test statistic for a level-2 evaluation is expressed as

$$\chi^2_{\text{Hypothesized},2} = -2 \times \{\log \ell(\hat{\theta}_{1,S}, \hat{\theta}_2) - \log \ell(\hat{\theta}_{1,S}, \hat{\theta}_{2,S})\}$$

The chi-squared test statistic for the baseline model that is used to calculate the level-specific CFI also incorporates a partially saturated model. For example, the log likelihood for a level-1 baseline model, $\log \ell(\hat{\theta}_1, \hat{\theta}_{2,S})$, incorporates baseline parameters at level 1 ($\theta_1$) and parameters for a model that is saturated at level 2 ($\theta_{2,S}$). Level-specific CFI and RMSEA formulas are similar to their standard counterparts in (2) and (3), respectively, except that level-specific chi-squared test statistics and d.f. replace the chi-squared statistics and d.f. for models that are not multilevel. The sample size is also adjusted in the level-specific RMSEA formula. For level-1 and level-2 evaluations, $N - 1$ is replaced by $N - J$ and $J$, respectively, where $J$ is the number of level-2 clusters.
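Written out under the sample-size substitution just described, with $N - J$ as the level-1 term and $J$ as the level-2 term, the level-specific RMSEA formulas would take the following form. The labels $\text{RMSEA}_1$, $\text{RMSEA}_2$, and the level-subscripted d.f. are notation assumed here for illustration; the structure is that of (3).

```latex
\begin{align*}
\text{RMSEA}_1 &= \sqrt{\max\left\{
  \frac{\chi^2_{\text{Hypothesized},1} - \text{d.f.}_{\text{Hypothesized},1}}
       {\text{d.f.}_{\text{Hypothesized},1}\,(N - J)},\, 0\right\}} \\
\text{RMSEA}_2 &= \sqrt{\max\left\{
  \frac{\chi^2_{\text{Hypothesized},2} - \text{d.f.}_{\text{Hypothesized},2}}
       {\text{d.f.}_{\text{Hypothesized},2}\,J},\, 0\right\}}
\end{align*}
```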

4. Syntax of gsemgof

The gsemgof command is run after an MMA is fit to two-level data by using the gsem command. The MMA should include an equation for a primary outcome and an equation for each mediator. Random effects need to be specified for the outcome in each equation. Importantly, level-1 fit indices require that level 2 is saturated, and vice versa for level-2 fit indices. The gsemgof command syntax is

gsemgof level [df]

where level is specified as 1 or 2 to request level-1 or level-2 fit indices, respectively. If level-2 fit indices are requested, df is specified to indicate the d.f. for the hypothesized model. Otherwise, df is left blank. The d.f. is automatically calculated by the gsemgof command for level-1 fit indices. Sample syntax for an MMA with a single outcome and two mediators is shown in section 5.

The gsemgof command works by modifying gsem command-line text for the hypothesized model to create command lines for additional models that are needed to calculate fit indices, such as a model that is saturated at levels 1 and 2. Additional models are run using the gsem and sem commands. Stored results in e() and r(), such as log likelihoods, are extracted, saved as scalar quantities, and plugged into fit index formulas. The following steps occur when level-1 fit indices are requested (that is, when level is set to 1).

  1. Three numbers are extracted from e() and saved for later calculations: the total sample size ($N$); the total number of dependent variables, including the primary outcome and mediators; and the log likelihood for the hypothesized model [$\log \ell(\hat{\theta}_1, \hat{\theta}_{2,S})$].

  2. The name of the cluster variable is identified in the gsem command-line text and is used to calculate the number of clusters in the dataset (J).

  3. The gsem command fits an MMA model that is saturated at levels 1 and 2. The log likelihood [$\log \ell(\hat{\theta}_{1,S}, \hat{\theta}_{2,S})$] that is stored in e() is extracted and saved for later calculations.

  4. The gsem command fits a level-1 baseline model that is saturated at level 2. The log likelihood is saved for later calculations.

  5. The d.f. for chi-squared test statistics is not produced by the gsem command. Appropriate d.f. can be obtained by first fitting the hypothesized model without level-2 random effects through the sem command. Afterward, fit statistics are requested using estat gof, and the d.f. for the hypothesized and baseline model evaluations are extracted from r(). Level 2 can be ignored in the calculation of the d.f. for level-1 tests because level 2 is saturated.

  6. Chi-squared statistics are calculated from log likelihoods by using (4).

  7. Finally, fit indices are calculated using stored chi-squared statistics, d.f., N, and J.
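Steps 6 and 7 above reduce to scalar arithmetic once the log likelihoods and d.f. are in hand, which the following Stata sketch tries to make concrete. It is not the gsemgof source code. The three log likelihoods are hypothetical placeholders chosen only for illustration, while the d.f. (5 and 21), N = 2,427, and J = 68 are the values given for the MMT example; all scalar names are assumptions.

```stata
* Hypothetical log likelihoods (placeholders, NOT values from the article)
scalar ll_hyp  = -15005.154   // level-1 hypothesized model, level 2 saturated
scalar ll_base = -15400.000   // level-1 baseline model, level 2 saturated
scalar ll_sat  = -15000.000   // model saturated at levels 1 and 2

* Quantities reported for the MMT example
scalar df_hyp  = 5            // d.f., hypothesized level-1 model
scalar df_base = 21           // d.f., level-1 baseline model
scalar N = 2427               // level-1 observations
scalar J = 68                 // level-2 clusters

* Step 6: chi-squared statistics from log likelihoods, following (4)
scalar chi2_hyp  = -2*(ll_hyp  - ll_sat)
scalar chi2_base = -2*(ll_base - ll_sat)

* Step 7: level-1 CFI and RMSEA (N - 1 replaced by N - J)
* Note: cfi1 is missing if chi2_base does not exceed df_base
scalar cfi1   = 1 - max(chi2_hyp - df_hyp, 0)/max(chi2_base - df_base, 0)
scalar rmsea1 = sqrt(max((chi2_hyp - df_hyp)/(df_hyp*(N - J)), 0))

display "chi2(5) = " %7.3f scalar(chi2_hyp)
display "CFI     = " %7.3f scalar(cfi1)
display "RMSEA   = " %7.3f scalar(rmsea1)
```

In practice the three log likelihoods would come from e(ll) after each gsem fit rather than being typed in by hand.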

Calculations for level-2 fit indices follow the same steps except that the d.f. in step 5 cannot be retrieved through the sem command. The d.f. for the level-2 baseline chi-squared statistic is calculated by the gsemgof command. The level-2 baseline d.f. is equal to the number of random-effects covariance parameters because they are not estimated and are assumed to be 0. Level-2 baseline model parameters that get estimated are the random-effects variances. Level 1 is saturated and does not factor into the d.f. calculations. If m is the number of random-effects variance parameters, then the level-2 baseline d.f. is m(m − 1)/2. The d.f. for the hypothesized model is specified in the gsemgof command by the user.

5. Level-specific fit index examples using the MMT data

The gsemgof command is illustrated through MMA applied to the MMT data. There are 2,427 patient observations and 68 clusters in the dataset. Analyses modeled the mediating effects of both treatment progression and counselor rapport on treatment satisfaction. This section focuses on the relationship between treatment progression and satisfaction at levels 1 and 2. Figure 1 presents diagrams for MMA that are evaluated. Diagrams were produced through Stata’s SEM Builder visualization tool. Two level-1 evaluations are conducted. Figure 1a represents the model presented in Li et al. (2017). A path from treatment progression to satisfaction is modeled at level 1 through progression (progress) and satisfaction variables measured at the patient level. A path is also included from counselor rapport (rapport) to satisfaction at level 1.

Figure 1.

MMA diagrams for predictors of treatment satisfaction mediated by treatment progression and counseling rapport. Diagram 1a represents the hypothesized model in Li et al. (2017). Diagram 1b represents an MMA that excludes the path from treatment progression to satisfaction (p-s); diagram 1c represents an MMA that sets the random-effects covariance (r.e. cov) between treatment progression and satisfaction to 0; and diagram 1d represents an MMA that fits a saturated model where all possible r.e. covariances at level 2 and paths at level 1 are modeled. In line with standard SEM diagram representation, covariances between exogenous variables (for example, between age and male gender) are modeled but not represented in the diagrams.

The relationship between treatment progression and satisfaction is modeled at level 2 through the covariance between clinic-level random effects for treatment progression and satisfaction. Level 2 is saturated. The model includes random effects for both mediators and the outcome and three covariances between all possible pairs of random effects. There are six exogenous variables in the model: age, a yes-no indicator for male gender, a yes-no indicator for being married, the duration of MMT (mmt_duration), a depressive symptoms score (depression), and a social and environmental support score (env_support). Figure 1b excludes the path from treatment progression to satisfaction.

A single level-2 evaluation is conducted. As shown in figure 1c, level 1 is saturated, and the covariance between random effects for treatment progression and satisfaction is fixed at 0. Models represented by figures 1a, 1b, and 1c are compared with a model that is fully saturated at levels 1 and 2 (figure 1d). Models are fit using Stata/SE 16. Syntax for level-1 and level-2 fit indices is presented in sections 5.1 and 5.2, respectively. Results are presented in section 5.3.

5.1. Level-1 fit indices

In this section, I present the syntax for level-1 evaluations of models represented by diagrams 1a and 1b in figure 1. I first read in a dataset containing the MMT data.

. use mmt

I then fit the hypothesized model in diagram 1a by using the gsem command. A grouping variable (clinic) is used to index random effects by clinic. The gsemgof command follows each gsem command with level set to 1.

. // Model (a): Hypothesized model
. gsem (satisfaction <- progress rapport age male married depression
> env_support Rs[clinic])
> (progress <- age male mmt_duration env_support Rp[clinic])
> (rapport <- age male depression env_support Rr[clinic]),
> cov(e.progress*e.rapport)
(output omitted)
. gsemgof 1
(output omitted)
. // Model (b): Hypothesized model without path from progress to satisfaction
. gsem (satisfaction <- rapport age male married depression
> env_support Rs[clinic])
> (progress <- age male mmt_duration env_support Rp[clinic])
> (rapport <- age male depression env_support Rr[clinic]),
> cov(e.progress*e.rapport)
(output omitted)
. gsemgof 1
(output omitted)

5.2. Level-2 fit indices

In this section, I present the syntax for a level-2 evaluation of the model represented in diagram 1c of figure 1. The random-effects covariance structure for the hypothesized model is specified through covariance matrix Vre and incorporated into the gsem model specification. The gsemgof command follows with level set to 2. In contrast to requests for level-1 tests, the user specifies the d.f. for the $\chi^2_{\text{Hypothesized},2}$ test statistic. Referring to the level-2 hypothesized model in diagram 1c and the fully saturated model in diagram 1d, the hypothesized model has one fewer parameter. Therefore, the d.f. is set to 1. The gsemgof command automatically sets the d.f. for the $\chi^2_{\text{Baseline},2}$ test statistic to 3, the number of random-effects covariance parameters [that is, m(m − 1)/2 = 3(2)/2 = 3].

. // Model (c): Level 2 hypothesis, no random-effects covariance between
> progress and satisfaction
. matrix Vre = (., 0, . \ 0, ., . \ ., ., .)
. gsem (satisfaction <- progress rapport age male married mmt_duration
> depression env_support Rs[clinic])
> (progress <- age male married mmt_duration depression env_support
> Rp[clinic])
> (rapport <- age male married mmt_duration depression env_support
> Rr[clinic]), covstr(Rs[clinic] Rp[clinic] Rr[clinic],
> fixed(Vre)) cov(e.progress*e.rapport)
(output omitted)
. gsemgof 2 1
(output omitted)

5.3. Results

Table 1 shows fit indices based on models fit to the MMT data. Indices in the left-hand column are based on MA that ignore clustering at level 2. Models are fit through the sem command. Indices in the right-hand column are based on MMA conducted through the gsem command. Fit indices are calculated by the gsemgof command.

Table 1.

Model fit indices obtained from MA and MMA conducted on MMT study data. Level-1 indices evaluate fit for models with and without paths from treatment progression to satisfaction (p-s). Level-2 indices evaluate fit for a multilevel model where the random-effects covariance between treatment progression and satisfaction is fixed at 0.

| | MA | MMA |
|---|---|---|
| Level 2: Saturated | | |
| Level 1: p-s path | | |
| $\chi^2$ (5) | 8.524 | 10.308 |
| CFI | 0.999 | 0.999 |
| RMSEA | 0.017 | 0.021 |
| Level 1: No p-s path | | |
| $\chi^2$ (6) | 229.650** | 152.144** |
| CFI | 0.931 | 0.998 |
| RMSEA | 0.124 | 0.102 |
| Level 1: Saturated | | |
| Level 2: r.e. cov(p,s) = 0 | | |
| $\chi^2$ (1) | | 4.334* |
| CFI | | 0.846 |
| RMSEA | | 0.221 |

Note: * p < 0.05; ** p < 0.01

Overall, standard and level-1 fit index values are similar. For level-1 evaluations, both the standard fit indices in the left-hand column and the level-1 fit indices in the right-hand column indicate that the model fitting a path from treatment progression to satisfaction provides a better fit than the model where the path is fixed at 0. The chi-squared test statistics and RMSEA are smaller for the model with the path. The CFI is larger for the model with the path, but neither the standard CFI nor the level-1 CFI values are small enough to indicate a lack of fit for the model without the path.

Table 1 highlights a key feature of the level-specific evaluation method in its ability to evaluate level-2 fit. Level-2 fit indices indicate that the covariance between random effects for treatment progression and satisfaction should be modeled. A model fixing the covariance at 0 results in a statistically significant chi-squared test at level 2. The CFI value of 0.846 and the RMSEA value of 0.221 are also outside the ranges of values that indicate good model fit.
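The RMSEA entries in table 1 can be reproduced from the tabled chi-squared statistics and d.f. together with N = 2,427 and J = 68. The Stata sketch below is only a check on the arithmetic, not output from gsemgof. It assumes the sample-size term is N − 1 for the single-level MA, N − J for level-1 evaluations, and J for the level-2 evaluation; the scalar names are illustrative.

```stata
scalar N = 2427   // patient observations
scalar J = 68     // clinics

* Single-level MA (sample-size term N - 1)
scalar r_ma_path   = sqrt(max((8.524   - 5)/(5*(N - 1)), 0))
scalar r_ma_nopath = sqrt(max((229.650 - 6)/(6*(N - 1)), 0))

* Level-1 evaluations from MMA (sample-size term N - J)
scalar r_l1_path   = sqrt(max((10.308  - 5)/(5*(N - J)), 0))
scalar r_l1_nopath = sqrt(max((152.144 - 6)/(6*(N - J)), 0))

* Level-2 evaluation from MMA (sample-size term J)
scalar r_l2 = sqrt(max((4.334 - 1)/(1*J), 0))

display "MA,  p-s path:        " %5.3f scalar(r_ma_path)
display "MA,  no p-s path:     " %5.3f scalar(r_ma_nopath)
display "MMA, level 1 path:    " %5.3f scalar(r_l1_path)
display "MMA, level 1 no path: " %5.3f scalar(r_l1_nopath)
display "MMA, level 2:         " %5.3f scalar(r_l2)
```

Rounded to three decimals, the five values are 0.017, 0.124, 0.021, 0.102, and 0.221, matching the table. The level-2 RMSEA is large mainly because its sample-size term is the 68 clinics rather than the 2,427 patients.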

6. Discussion

In this article, I showed how the gsemgof command can be used to calculate level-specific CFI and RMSEA fit indices. I illustrated calculations on MMA applied to the MMT data. Standard fit indices obtained from MA that did not include random effects and level-1 fit indices obtained from MMA produced similar values. This is not completely surprising. Simulation studies by Ryu and West (2009) showed that level 1 tends to dominate in determining fit index values. However, it should not be assumed that the standard and level-specific approaches will always lead to similar conclusions for level-1 evaluations. Moreover, level-2 evaluations cannot be conducted using standard fit indices. In regard to the MMT data, level-specific indices indicated that relationships between treatment progression and satisfaction should be captured at both the patient (level 1) and clinic (level 2) levels.

Despite the benefits of the level-specific evaluation method, a few caveats are in order. The d.f. is an important component of the fit index calculations and may need to be calculated by hand for more-complex models than the MMA that were presented in this article. I refer to d.f. for level-1 tests shown in table 1 to illustrate hand calculations. The $\chi^2_{\text{Hypothesized},1}$ test statistic based on diagram 1a of figure 1 has 5 d.f. for five possible paths from exogenous variables to mediators and the outcome that are excluded from figure 1a. The level-1 evaluation for diagram 1b excludes an additional path from treatment progression to satisfaction and has 6 d.f.

The d.f. for the $\chi^2_{\text{Baseline},1}$ test (not shown in table 1) is the difference in the number of parameters between the partially saturated baseline model and the fully saturated model in diagram 1d. The partially saturated baseline model excludes 18 paths from exogenous variables to mediators and the outcome, two paths from mediators to the outcome, and a covariance between the two mediators. The d.f. is 21. An alternative method to count d.f. is outlined in Acock (2013). Between the six exogenous variables, two mediators, and an outcome, there are nine variables contributing to the variance–covariance matrix for the fully saturated model. The variance–covariance matrix has 9(10)/2 = 45 parameters. The baseline model has 24 variance and covariance parameters, including 21 parameters for the six exogenous variables, a variance for each mediator, and an outcome variance. The $\chi^2_{\text{Baseline},1}$ test has 45 − 24 = 21 d.f. General guidelines for calculating the d.f. for SEM can be found in Rigdon (1994) and Acock (2013).
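The two counting routes for the level-1 baseline d.f. can be set side by side. The numbers are those given in the paragraph above; the 18 excluded paths are read here as six exogenous variables times three dependent variables, and the 21 exogenous parameters as 6(7)/2.

```latex
\begin{align*}
\text{Excluded parameters:} \quad
  & \underbrace{6 \times 3}_{\text{exogenous paths}}
    + \underbrace{2}_{\text{mediator paths}}
    + \underbrace{1}_{\text{mediator covariance}} = 21 \\
\text{Variance--covariance count:} \quad
  & \frac{9(10)}{2} - \left\{\frac{6(7)}{2} + 2 + 1\right\} = 45 - 24 = 21
\end{align*}
```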

The RMSEA formula also incorporates the sample size, which is set to $N - J$ for level-1 evaluations and $J$ for level-2 evaluations. Balance in the number of observations per cluster is implied by the formula. I was fortunate in that the MMT study design aimed to sample the same number of patients per clinic. Moreover, missing data were minimal. As a result, there was a large degree of balance in the number of patients per clinic. More complex RMSEA formulas may be needed to adequately assess model fit in the presence of large imbalances in the number of observations per cluster.

7. Acknowledgments

My time was supported by National Institutes of Health grants (P30MH058107; U19HD089886). MMT data were collected through a National Institute on Drug Abuse grant (R01DA033130).

About the author

W. Scott Comulada is an associate professor-in-residence in the Department of Psychiatry and Biobehavioral Sciences and in the Department of Health Policy and Management at the University of California, Los Angeles. He is also a methods core co-director for the Center for HIV Identification, Prevention and Treatment Services (CHIPTS; P30MH058107) and an analytic core project lead for an Adolescent Trials Network U19 (U19HD089886).

8. Programs and supplemental materials

To install a snapshot of the corresponding software files as they existed at the time of publication of this article, type
. net sj 21-1
. net install st00!! (to install program files, if available)
. net get st00!! (to install ancillary files, if available)

9. References

  1. Acock AC. 2013. Discovering Structural Equation Modeling Using Stata. Rev. ed. College Station, TX: Stata Press.
  2. Bentler PM. 1990. Comparative fit indexes in structural models. Psychological Bulletin 107: 238–246. doi: 10.1037/0033-2909.107.2.238.
  3. Bentler PM, and Bonett DG. 1980. Significance tests and goodness of fit in the analysis of covariance structures. Psychological Bulletin 88: 588–606. doi: 10.1037/0033-2909.88.3.588.
  4. Browne MW, and Cudeck R. 1993. Alternative ways of assessing model fit. In Testing Structural Equation Models, ed. Bollen KA and Long JS, 136–162. Newbury Park, CA: SAGE.
  5. Goldstein H, and McDonald RP. 1988. A general model for the analysis of multilevel data. Psychometrika 53: 455–467. doi: 10.1007/BF02294400.
  6. Li L, Comulada WS, Lin C, Hsieh J, Luo S, and Wu Z. 2017. Factors related to client satisfaction with methadone maintenance treatment in China. Journal of Substance Abuse Treatment 77: 201–206. doi: 10.1016/j.jsat.2017.02.012.
  7. Longford NT, and Muthén BO. 1992. Factor analysis for clustered observations. Psychometrika 57: 581–597. doi: 10.1007/BF02294421.
  8. Muthén BO. 1994. Multilevel covariance structure analysis. Sociological Methods & Research 22: 376–398. doi: 10.1177/0049124194022003006.
  9. Muthén BO, and Satorra A. 1995. Complex sample data in structural equation modeling. Sociological Methodology 25: 267–316. doi: 10.2307/271070.
  10. Rigdon EE. 1994. Calculating degrees of freedom for a structural equation model. Structural Equation Modeling: A Multidisciplinary Journal 1: 274–278. doi: 10.1080/10705519409539979.
  11. Ryu E, and West SG. 2009. Level-specific evaluation of model fit in multilevel structural equation modeling. Structural Equation Modeling: A Multidisciplinary Journal 16: 583–601. doi: 10.1080/10705510903203466.
  12. Steiger JH. 1990. Structural model evaluation and modification: An interval estimation approach. Multivariate Behavioral Research 25: 173–180. doi: 10.1207/s15327906mbr2502_4.
  13. Yuan K-H, and Bentler PM. 2007. Multilevel covariance structure analysis by fitting multiple single-level models. Sociological Methodology 37: 53–82. doi: 10.1111/j.1467-9531.2007.00182.x.
