Educational and Psychological Measurement
2021 Jun 4;82(5):1020–1030. doi: 10.1177/00131644211019447

Design Effect in Multilevel Settings: A Commentary on a Latent Variable Modeling Procedure for Its Evaluation

Tenko Raykov, Christine DiStefano
PMCID: PMC9386880  PMID: 35989726

Abstract

A latent variable modeling-based procedure is discussed that permits one to readily obtain point and interval estimates of the design effect index in multilevel settings using widely circulated software. The method provides useful information about the relationship of important parameter standard errors when accounting for clustering effects relative to conducting single-level analyses. The approach can also be employed as an addendum to point and interval estimation of the intraclass correlation coefficient in empirical research. The discussed procedure makes it easily possible to evaluate the design effect in two-level studies by utilizing the popular latent variable modeling methodology and is illustrated with an example.

Keywords: design effect, interval estimation, intraclass correlation coefficient, latent variable modeling, multilevel setting, two-level study.


Multilevel studies have markedly increased in use in the educational and psychological sciences over the past several decades (e.g., Hox et al., 2018). This trend can be seen as a consequence of the widespread realization that collecting clustered data in these and related disciplines may be viewed at present as the rule rather than the exception (e.g., Raudenbush & Bryk, 2002). A main feature of such studies is the fact that the traditional assumption of independence of the units of analysis—which is a key premise underlying many traditional statistical methods—is violated due to subjects being nested within higher order units (e.g., Goldstein, 2011). When this independence assumption does not hold, standard or conventional (ordinary) methods based on it can be associated with seriously misleading results (e.g., Rabe-Hesketh & Skrondal, 2012).

Two-level models currently represent the majority of analytic means used in educational and psychological research with multilevel data. These methods permit the examination of potentially strong clustering effects resulting from the nesting of Level-1 units within Level-2 aggregates (clusters, groups). Thereby, unconditional versions of the models allow one to evaluate the degree of lack of independence (uncorrelatedness) of subject outcome scores within Level-2 units using the intraclass correlation coefficient (ICC; e.g., Hox et al., 2018). Recently, Raykov et al. (2017) discussed a related index, defined as the ratio of between to within variance, that may provide additional information regarding the extent to which between-group differences overwhelm within-group variability, or conversely. Neither of these two indices, however, directly informs an empirical researcher about the degree to which standard errors (SEs) of relevant model parameters and associated confidence intervals (CIs) could be affected (namely, deflated and made spuriously shorter, respectively) if the nesting effects are not accounted for. This degree is captured by the concept of design effect (DEFF; e.g., B. O. Muthén & Satorra, 1995, and references therein), which is the focus of the following discussion.

The present article is concerned with a latent variable modeling (LVM)-based procedure for the evaluation of a DEFF index in multilevel settings. The method permits its point and interval estimation and is readily applicable in two-level studies. The index informs about the relationship of important parameter SEs when accounting for clustering effects, relative to carrying out single-level analyses instead, that is, disregarding these effects. The goal of the article is to make it easier and routinely possible for applied educational and behavioral researchers to obtain with widely circulated software CIs of DEFFs in the two-level studies that are frequently conducted in educational and psychological research. The discussed approach is illustrated with an empirical example.

Background, Notation, and Assumptions

Throughout this note, we denote by yij the score on a response variable under consideration for the ith studied subject (Level-1 unit) in the jth group (Level-2 unit; i = 1, . . ., nj, j = 1, . . ., J; cf., e.g., Hox et al., 2018; J > 1, nj > 1, j = 1, . . ., J). In two-level settings, Level-1 units (e.g., students, patients, clients, interviewees, or respondents) are clustered within Level-2 units (e.g., schools, clinicians, counselors, interviewers, or neighborhoods). For the sake of convenience in what follows, we assume that yij is an (approximately) continuous outcome measure (see also the Conclusion section for extensions and Raykov & Marcoulides, 2015).

Using widely adopted notation, the following response score decompositions always hold (presuming the existence of the group and grand means, which is essentially always ensured in contemporary educational and behavioral research; e.g., Raudenbush & Bryk, 2002):

yij = β0j + rij, (1)

and

β0j = γ00 + u0j, (2)

where rij is the deviation of yij from the outcome mean β0j of the jth group, while u0j is the discrepancy between the jth group mean and the grand mean, γ00, which is at times also referred to as group effect (i = 1, . . ., nj, j = 1, . . ., J). We assume as usual that the Level-1 mean difference rij is normally distributed with mean 0 and variance σ2 as well as uncorrelated with the Level-2 residual u0j and denote by τ00 the group effect variance (frequently also referred to as between-cluster or between-group variance; i = 1, . . ., nj, j = 1, . . ., J).

The ICC, denoted ρ, is defined as the ratio of between-group variance to observed outcome variance

ρ = τ00/(τ00 + σ2), (3)

and may also be interpreted as the degree of violation of the single-level modeling assumption of independent units of analysis (e.g., Rabe-Hesketh & Skrondal, 2012). The ICC, ρ, as well as the within-cluster variance, σ2, are presumed positive throughout the remainder of this note.

Design Effect in Two-Level Studies

A main part of the motivation for using two-level rather than single-level analysis (i.e., one not accounting for clustering effects) is the realization that in the latter case the SEs tend to be spuriously small (cf. Rabe-Hesketh & Skrondal, 2012). This yields potentially seriously misleading statistical results, such as spuriously narrow CIs, deflated SEs, and spuriously small p values for hypothesis tests. Such results in effect leave the impression of (unduly) high estimation precision, with all adverse consequences for ensuing statistical results and their substantive interpretation (e.g., Hox et al., 2018). For this reason, the question naturally arises as to how to quantify parameter SE deflation when using single-level or standard methods (e.g., ordinary least squares) in lieu of two-level modeling.

An index that accomplishes this aim with regard to the response mean is the DEFF (e.g., Kish, 1965). Prior discussions in the methodological literature have also been concerned with similar indices for other parameters, such as regression slopes (e.g., B. O. Muthén & Satorra, 1995), which will not be pursued in this article. The DEFF, denoted d in the remainder, represents the ratio of the squared SE (i.e., the sampling variance) of the grand mean estimate when evaluated within an explicit two-level setting to its squared SE (i.e., sampling variance) when this parameter is instead evaluated disregarding the clustering effects, that is, with traditional or standard single-level modeling. The DEFF is also representable as follows (e.g., Hox et al., 2018):

d = 1 + ρ(c − 1), (4)

where c is the common cluster size. In two-level settings with differing cluster sizes (i.e., when n1, . . ., nJ are not all the same integer number), the following useful DEFF version has been advanced as well (using for simplicity the same notation):

d = 1 + ρ(n/J − 1), (5)

where n = n1+. . . +nJ is the overall study sample size (cf. Kish, 1965, p. 162).

As one can readily see from Equations 4 and 5, the DEFF, d, is not bounded from above. Also, by definition the DEFF represents the number of times that the squared SE of the outcome grand mean under single-level analysis (i.e., disregarding the clustering effects) is smaller than the squared SE of this parameter when estimated with two-level modeling (thus taking into account the clustering effects). Since the DEFF reflects an important aspect of the loss in quality of estimation when a standard (ordinary) method is used incorrectly in two-level settings, it may well be argued that this index represents a relevant quantity to evaluate in multilevel educational and psychological studies as well. For this reason, point and especially interval estimation of the DEFF is of concern in this commentary. Its main goal is to contribute to promoting the regular evaluation of the DEFF in two-level empirical educational and psychological studies, based on straightforward applications of popular statistical modeling software. 1
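To make Equations 3 to 5 concrete, the following short sketch computes the ICC and both DEFF versions from given variance components. The variance components and cluster sizes here are made-up illustrative values, not estimates from any study discussed in this article, and Python is used purely for exposition (the article itself relies on Mplus and Stata):

```python
def icc(tau00: float, sigma2: float) -> float:
    """Intraclass correlation, Equation 3: rho = tau00 / (tau00 + sigma2)."""
    return tau00 / (tau00 + sigma2)

def deff_common(rho: float, c: int) -> float:
    """Design effect with a common cluster size c, Equation 4."""
    return 1.0 + rho * (c - 1)

def deff_average(rho: float, n: int, J: int) -> float:
    """Kish's serviceable approximation using the average cluster size n/J, Equation 5."""
    return 1.0 + rho * (n / J - 1)

# Illustrative (made-up) values: between-group variance 2.0, within-group variance 18.0.
rho = icc(2.0, 18.0)          # 2 / (2 + 18) = 0.10
d4 = deff_common(rho, 25)     # 1 + 0.10 * (25 - 1) = 3.4
d5 = deff_average(rho, 1000, 40)  # average cluster size 25, so also 3.4
```

Note that Equations 4 and 5 coincide whenever all clusters have the same size, which is why Equation 5 serves as a natural extension when sizes differ.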

Evaluation of the Design Effect Index

Point and Interval Estimation

As discussed in more detail, for instance, in Raykov (2011), the (fully) unconditional two-level model defined by Equations 1 and 2 involves two latent variables. This can be readily noticed by realizing that neither of the variables u0j and rij appearing in their right-hand sides is measured or observed (observable), unlike the recorded score yij measured on the studied Level-1 units, while u0j and rij are both random variables (i = 1, . . ., nj, j = 1, . . ., J; cf. Raykov et al., 2017). Furthermore, each Level-2 or Level-1 unit is associated with an individual realization of the corresponding random variable u0j or rij, respectively (i = 1, . . ., nj, j = 1, . . ., J). Hence, one can view Equations 1 and 2 as defining a latent variable model (cf. B. O. Muthén, 2002).

As such, this model can be fitted to data using popular LVM software, such as Mplus (L. K. Muthén & Muthén, 2021). In this way, one obtains point estimates of the between- and within-group variances, τ00 and σ2, respectively. Substituting these estimates into Equation 5 in general (or preferably into Equation 4 when a common cluster size applies) furnishes a point estimate of the DEFF d (see also Note 1). This is the maximum likelihood (ML) DEFF estimate if ML is employed for parameter estimation (when applicable), owing to the invariance property of ML estimation (e.g., Casella & Berger, 2002). This activity is readily carried out with Mplus by introducing the DEFF as an external parameter, that is, a quantity that is itself not a model parameter but a nonlinear function of such parameters (see Appendix A for details on the software implementation, and the following section for an example illustration). This Mplus application also has the advantage of allowing one to readily account for some violations of the conventional normality assumption. In addition, it provides a natural connection to more complicated latent variable models, which may subsequently be fitted to the overall data set in order to address research questions involving the examined outcome variable(s) and predictors possibly measured with error (e.g., Raykov & Marcoulides, 2006).
Next, given potential violations of the normality assumption generally found in behavioral and social research, it may arguably be advisable to use the bootstrap approach in order to obtain an approximate large-sample confidence interval (abbreviated ACI in what follows) at a prespecified 100(1 − α)% level (0 < α < 1) for the DEFF index in a given empirical study, under the assumption of a large initial sample size n and number of resamples (e.g., Efron & Tibshirani, 1993; see also Das Gupta, 2008; notice that the DEFF is a continuously differentiable function of the model parameters involved in its definition, as in Equation 4, implying the efficacy of the bootstrap approach). For convenience, denote this interval as

(dl,du), (6)

where dl and du symbolize its lower and upper endpoints, respectively. To furnish the ACI (Equation 6), one can employ the popular software Stata (see Appendix B for the needed source code, and the following section for an illustrative application).
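To make the resampling idea concrete, here is a minimal percentile-bootstrap sketch in Python that resamples Level-2 units (clusters) with replacement. It is only an illustrative simplification: the variance components below are crude moment estimates (not the ML estimates discussed above), and the percentile interval differs from the bias-corrected accelerated interval produced by the Stata code in Appendix B.

```python
import random
import statistics

def deff_hat(clusters):
    """Crude plug-in DEFF estimate (Equation 5).  The within-group variance
    is pooled across clusters, and the between-group variance is the raw
    variance of the cluster means -- rough moment estimates used here only
    for illustration, not the article's ML estimates."""
    J = len(clusters)
    n = sum(len(c) for c in clusters)
    means = [statistics.fmean(c) for c in clusters]
    sigma2 = sum((x - m) ** 2 for c, m in zip(clusters, means) for x in c) / (n - J)
    tau00 = statistics.pvariance(means)
    rho = tau00 / (tau00 + sigma2)
    return 1.0 + rho * (n / J - 1)

def bootstrap_ci(clusters, stat, reps=2000, alpha=0.05, seed=160957):
    """Percentile bootstrap CI for stat, resampling whole Level-2 units
    (clusters) with replacement; cf. the BCa interval in Appendix B."""
    rng = random.Random(seed)
    estimates = sorted(
        stat([rng.choice(clusters) for _ in clusters]) for _ in range(reps)
    )
    lower = estimates[int(reps * alpha / 2)]
    upper = estimates[int(reps * (1 - alpha / 2)) - 1]
    return lower, upper
```

For example, `bootstrap_ci(clusters, deff_hat)` returns lower and upper endpoints playing the role of (dl, du) in Equation 6 under this simplified scheme.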

Design Effect Point and Interval Estimate Interpretation and Their Utility in Empirical Educational and Psychological Research

As indicated earlier, the DEFF is defined as the number of times the correct sampling variance of the grand mean estimate on a response variable is larger than the spuriously small sampling variance obtained when the hierarchical nature of the data is disregarded and single-level analysis is used instead (e.g., Kish, 1965). This is also the way in which the DEFF point estimate is to be interpreted in an empirical setting with two-level data. Furthermore, the DEFF ACI in Equation 6, (dl, du), obtained as outlined above, can be interpreted as providing an approximate range of highly plausible (at the 100(1 − α)% confidence level) candidate values for the DEFF index in a studied population with a given, common Level-2 unit size (c, or perhaps alternatively average size n/J when group sizes are fairly similar; see Note 1). A more pragmatic, numerically oriented interpretation of this interval is that it encloses the "middle" 100(1 − α)% of the resample-based DEFF point estimates (Equation 5) obtained when resampling with replacement from a given empirical study data set (under the assumptions pointed out in the preceding subsection; see also StataCorp, 2019, for specifics of the bootstrapping process used then, and Note 1).

An apparent rule of thumb that has been referred to in the literature may be interpreted as suggesting that a multilevel model may not be needed when d < 2 (Lai & Kwok, 2015; Maas & Hox, 2004, 2005; Peugh, 2010). While in our opinion there is no widely accepted, rigorously justified, and generally applicable rule of thumb for interpreting in substantive terms the point estimate of the DEFF (or its positive square root, which yields the ratio of the respective estimator SEs), one may argue as follows in relation to the interpretation of its ACI (Equation 6). Specifically, if the lower endpoint of, say, the 95% ACI (Equation 6) is above a substantively meaningful number q > 1 in a given domain of application, then one may suggest that it is practically rather plausible that the SE of the outcome mean would be unduly deflated at least √q times relative to its correct SE obtained when the clustering effects are accounted for, in the case of a common cluster size (c; the symbol "√" denotes the positive square root). For instance, taking q = 2, this interpretation would suggest that it is practically rather plausible that the SE of the response mean disregarding the clustering effects could be spuriously at least 1.41 (= √2) times smaller than the proper SE resulting when accounting for them.

We demonstrate next on empirical data the discussed procedure for DEFF evaluation in two-level studies.

Application on Data

For our illustration purposes here, we use adapted data from an educational research study conducted by Mortimore et al. (1988; the data set employed below can be obtained from the authors on request). To this end, we utilize the student scores on a Raven's intelligence test from a random sample of n = 1,192 students attending J = 49 schools randomly sampled from the greater London (United Kingdom) area. We commence by fitting the (fully) unconditional model (1) and (2) (cf. Raykov, 2011), introducing the DEFF, d, as an external parameter via corresponding model parameter constraints representing Equation 5. (The needed Mplus source code is provided in Appendix A, with annotating comments following an exclamation mark within the pertinent command lines.) The fit of this latent variable model to the mean and covariance structure is perfect, since the model is saturated and the constraint defining the external parameter does not affect its fit: chi-square value (χ2) = 0 with degrees of freedom (df) = 0 (as found in the "Model Fit Information" output section produced by the software). The point estimate of the DEFF results thereby as 2.946 (and is provided in the "Model Results" section by the software). This suggests that the SE of the Raven test score mean would be deflated approximately 1.716 (= √2.946) times in this data set if the clustering effects were disregarded. Notable clustering is indeed indicated in the analyzed data set, as the ICC is also estimated with this LVM-based approach at .083 (as supplied in the "New/Added Parameter" section by the software). Moreover, using this estimate and its SE of .026, also provided in that output section, the method outlined in Raykov (2011) renders a 95% CI for the ICC as (.044, .150).
Since this CI lies mostly above .05, one may well suggest that there is added evidence for notable clustering effects (Asparouhov, 2021, personal communication), which thus need to be accounted for or modeled in following analyses with Raven's test score as outcome variable.

To obtain further evidence for such notable clustering effects, the ACI (Equation 6) for the DEFF is next obtained using the bootstrap module of the popular software Stata (StataCorp, 2019; Appendix B contains the two Stata commands needed for accomplishing this aim.) The 95% bias-corrected ACI for the DEFF is furnished thereby as (1.917, 3.697). One could interpret this finding as suggesting that, in a two-level study with constant Level-2 unit size approximately equal to n/J of the present one and otherwise the same characteristics, the SE of the test mean when accounting for the clustering effects could practically highly plausibly be expected to be at least 1.38 (= √1.917) times, and as much as nearly twice (≈ √3.697), the SE obtained when ignoring these effects (see also Note 1). Hence, in light of (1) what appears to be a frequently referred to "rule of thumb" advocating use of two-level modeling when DEFF > 2 (e.g., Lai & Kwok, 2015; Maas & Hox, 2004, 2005; Peugh, 2010), (2) the finding that the 95% CI of the DEFF is positioned essentially above 2, and (3) the same-level CI for the ICC lying mostly above .05, it may be suggested that a two-level rather than a single-level approach be used in ensuing analyses involving the Raven test as an outcome measure.
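As a quick arithmetic check of the reported estimates (a reconstruction, not the authors' computation): plugging the rounded ICC estimate of .083 into Equation 5 closely reproduces the reported DEFF, with the small discrepancy attributable to the software working with the unrounded ICC.

```python
import math

# Values reported in the illustration (ICC rounded to three decimals in the text).
n, J, rho_hat = 1192, 49, 0.083
d = 1 + rho_hat * (n / J - 1)      # Equation 5 with average cluster size n/J
print(round(d, 3))                 # 2.936, close to the reported 2.946
print(round(math.sqrt(2.946), 3))  # 1.716, the implied SE deflation factor
```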

Conclusion

This note was concerned with a readily applicable LVM-based procedure for evaluation of a DEFF index in multilevel studies using popular and widely circulated statistical software. A two-level analysis approach was outlined for point and interval estimation of the ratio of the sampling variance of the response mean accounting for clustering effects relative to its sampling variance when they are disregarded. The discussed method is useful in two-level educational and behavioral studies also as a complement to the popular ICC. The procedure can be recommended for regular use in applied work, based on its direct applicability, as it permits empirical researchers to reach more informed conclusions about the extent to which ignoring violations of the classical assumption of independence of units of analysis may be consequential for the precision with which the outcome mean, and possibly other related parameters, is estimated.

The discussed method possesses several limitations that need to be noted. While it could be argued that up to mild violations of normality can be handled with this procedure using robust ML estimation (as in the illustration section; see also Appendix A; cf. L. K. Muthén & Muthén, 2021), further research is needed before more confidence can be placed in such a conjecture and in the extent to which, and conditions under which, it is trustworthy in empirical research. Moreover, the approach is best used with continuous outcomes and large samples, because it rests on ML estimation and bootstrapping, which are themselves grounded in asymptotic statistical theory (e.g., Casella & Berger, 2002; Efron & Tibshirani, 1993). We emphasize the relevance of this asymptotic assumption, specifically with respect to the overall study sample size as well as the number of resamples (and, where applicable, the number of Level-2 units), because the pertinent interval (Equation 6) is based on the bootstrap method, which itself necessitates this requirement (e.g., Das Gupta, 2008). We encourage future research concerned with the development of possible guidelines for evaluating the sample sizes at which one can rely in practice on the underlying asymptotic theory (see also Note 1). Moreover, while the method could be trustworthy with discrete outcome variables having a considerable number of response options (e.g., in the double digits, with distributions exhibiting limited asymmetry if any) and robust ML estimation, further studies are needed to shed light on such a suggestion and its specifics, which are yet to be clarified. Last but not least, the present article does not intend to imply that the bootstrap-based CI for the DEFF discussed in it is the only possible, viable, or available such interval (cf. Arnab, 2017, Chapter 25).
Rather, as indicated earlier, the goal of this note is to comment on the contemporary ready availability of this CI in educational and behavioral research using popular and widely circulated software, thus contributing to the recommendation for regular interval estimation of DEFFs in empirical two-level research.

In conclusion, this commentary provides educational, behavioral, and social science researchers with a readily applicable means using popular software for evaluation of an informative design effect index reflecting the extent of important parameter SE and CI deflation if disregarding clustering effects in two-level studies.

Acknowledgments

We are grateful to G. A. Marcoulides for helpful comments on an earlier version of the article that have contributed considerably to its improvement, to two anonymous referees for their constructive criticism, and to B. Muthén, S. Penev, P. Doebler and U. Pötter for valuable discussions on estimation of design effect indices.

Appendix A

Mplus Source Code for Evaluation of the Design Effect Index in Two-Level Studies

TITLE:  DEFF ESTIMATION IN TWO-LEVEL STUDIES.

DATA:  FILE = <name of raw data file>;

VARIABLE:  NAMES = SCH_ID STU_ID GENDER Y M1-M3 PH1-PH3 U; ! variable names

 USEVARIABLE = Y;

 CLUSTER = SCH_ID; ! name of level-2 identifier

 MISSING = ALL(-9); ! missing value flag used in the data file

ANALYSIS:  TYPE = TWOLEVEL; ! requests two-level analysis

 ESTIMATOR = MLR;

MODEL:

 %WITHIN% ! defines the level 1 model (within-group model)

 Y (P1); ! assigns a parametric symbol to within-group variance

 %BETWEEN% ! defines the level 2 model (between-group model)

 Y (P2); ! assigns a parametric symbol to between-group variance

MODEL CONSTRAINT:

 NEW (DEFF); ! DEFF formally defined next as an external parameter

 DEFF = 1+(1192/49-1)*P2/(P1+P2); ! see Equation (5)

Note 1. After the descriptive title, the name of the raw data file is stated, and the outcome variable of interest is selected with the subcommand USEVARIABLE (cf. Raykov et al., 2011). The MODEL command first defines the within-group model (Level-1 model) and then the between-group model (Level-2 model; see Equations 1 and 2), assigning thereby parameter symbols used subsequently for defining and estimating the external parameter of relevance, the DEFF d, in the MODEL CONSTRAINT section following the corresponding Equation 5. (The DEFF index is per se not a model parameter, but a nonlinear function of model parameters.) The remaining commands are self-explanatory or readily found explicated in L. K. Muthén and Muthén (2021).

Note 2. Use the two Stata commands in Appendix B after fitting the unconditional two-level model as above in this appendix to furnish the bootstrap-based interval (Equation 6) for the DEFF index of concern (see main text).

Appendix B

Stata Source Code for Interval Estimation of the DEFF in Two-Level Studies

. bootstrap DEFF=(1+(1192/49-1)*(_b[sigma_u:_cons]^2/(_b[sigma_u:_cons]^2+_b[sigma_e:_cons]^2))), reps(2000) seed(160957): xtreg iq, vce(cluster cid) mle

. estat bootstrap, bca

Note. The dot at the start of each of these two commands is the Stata prompt. Taking a total of r = 2,000 resamples (or more) with replacement from the analyzed data set, as in the first command here, can be recommended (Efron & Tibshirani, 1993; for specifics of the bootstrap process implementation, see StataCorp, 2019). The subsequent use of the postestimation command "estat bootstrap, bca" furnishes the bias-corrected accelerated bootstrap CI (at the 95% confidence level, by default), as utilized in the illustration section.

1.

The interpretation of the DEFF in Equation 4, which underlies this note, is as an index of relevance for an empirical study with constant cluster size, c. The inferential procedure discussed in the following section is based on an approximation to the sampling distribution of the DEFF for constant cluster size, which approximation is obtained via resampling with replacement from the analyzed data set under the assumption of large initial sample size, n, as well as number of resamples (and where applicable number of Level-2 units; cf. StataCorp, 2019; see also Conclusion section). Throughout this note, the DEFF index in Equation 5 is used merely as a “serviceable approximation” to the DEFF in Equation 4 (Kish, 1965, p. 162).

Footnotes

Declaration of Conflicting Interests: The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding: The authors received no financial support for the research, authorship, and/or publication of this article.

References

1. Casella G., Berger R. L. (2002). Statistical inference. Wadsworth.
2. Das Gupta A. (2008). Asymptotic theory of statistics and probability. Springer.
3. Efron B., Tibshirani R. J. (1993). An introduction to the bootstrap. Chapman & Hall. 10.1007/978-1-4899-4541-9
4. Goldstein H. (2011). Multilevel statistical models. Arnold. 10.1002/9780470973394
5. Hox J. J., Moerbeek M., van de Schoot R. (2018). Multilevel analysis: Techniques and applications. Taylor & Francis. 10.4324/9781315650982
6. Kish L. (1965). Survey sampling. Wiley.
7. Lai M. H. C., Kwok O. (2015). Examining the rule of thumb of not using multilevel modeling: The "design effect smaller than two" rule. Journal of Experimental Education, 83(3), 423-438. 10.1080/00220973.2014.907229
8. Maas C. J., Hox J. J. (2004). The influence of violations of assumptions on multilevel parameter estimates and their standard errors. Computational Statistics & Data Analysis, 46(3), 427-440. 10.1016/j.csda.2003.08.006
9. Maas C., Hox J. (2005). Sufficient sample sizes for multilevel modeling. Methodology, 1(3), 86-92. 10.1027/1614-2241.1.3.86
10. Mortimore P., Sammons P., Stoll L., Lewis D., Ecob R. (1988). School matters: The junior years. Open Books. 10.1525/9780520330375
11. Muthén B. O. (2002). Beyond SEM: General latent variable modeling. Behaviormetrika, 29, 81-117. 10.2333/bhmk.29.81
12. Muthén B. O., Satorra A. (1995). Complex sample data in structural equation modeling. Sociological Methodology, 25, 267-316. 10.2307/271070
13. Muthén L. K., Muthén B. O. (2021). Mplus user's guide. Muthén & Muthén.
14. Peugh J. L. (2010). A practical guide to multilevel modeling. Journal of School Psychology, 48(1), 85-112. 10.1016/j.jsp.2009.09.002
15. Rabe-Hesketh S., Skrondal A. (2012). Multilevel and longitudinal modeling with Stata. Stata Press.
16. Raudenbush S. W., Bryk A. S. (2002). Hierarchical linear models: Applications and data analysis methods (2nd ed.). Sage.
17. Raykov T. (2011). Intra-class correlation coefficients in hierarchical designs: Evaluation using latent variable modeling. Structural Equation Modeling, 18, 73-90. 10.1080/10705511.2011.534319
18. Raykov T., Marcoulides G. A. (2006). A first course in structural equation modeling. Lawrence Erlbaum.
19. Raykov T., Marcoulides G. A. (2015). Intra-class correlation coefficients in hierarchical design studies with discrete response variables: A note on a direct interval estimation procedure. Educational and Psychological Measurement, 75(6), 1063-1070. 10.1177/0013164414564052
20. Raykov T., Marcoulides G. A., Akaeze H. (2017). Comparing between and within group variances in a two-level study: A latent variable modeling approach to evaluating their relationship. Educational and Psychological Measurement, 77(2), 351-361. 10.1177/0013164416634166
21. StataCorp. (2019). Stata base reference manual: Release 16. StataCorp LLC.

Articles from Educational and Psychological Measurement are provided here courtesy of SAGE Publications
