Author manuscript; available in PMC: 2019 Sep 1.
Published in final edited form as: Psychol Methods. 2017 Apr 6;23(3):458–479. doi: 10.1037/met0000128

Multilevel Factorial Designs with Experiment-Induced Clustering

Inbal Nahum-Shani 1,*, John J Dziak 2, Linda M Collins 3
PMCID: PMC5630520  NIHMSID: NIHMS854875  PMID: 28383950

Abstract

Factorial experimental designs have many applications in the behavioral sciences. In the context of intervention development, factorial experiments play a critical role in building and optimizing high-quality, multi-component behavioral interventions. One challenge in implementing factorial experiments in the behavioral sciences is that individuals are often clustered in social or administrative units and may be more similar to each other than to individuals in other clusters. This means that data are dependent within clusters. Power planning resources are available for factorial experiments in which the multilevel structure of the data is due to individuals’ membership in groups that existed before experimentation. However, in many cases clusters are generated in the course of the study itself. Such experiment-induced clustering (EIC) requires different data analysis models and power planning resources from those available for multilevel experimental designs in which clusters exist prior to experimentation. Despite the common occurrence of both experimental designs with EIC and factorial designs, a bridge has yet to be built between EIC and factorial designs. Therefore, resources are limited or nonexistent for planning factorial experiments that involve EIC. This article seeks to bridge this gap by extending prior models for EIC, developed for single-factor experiments, to factorial experiments involving various types of EIC. We also offer power formulas to help investigators decide whether a particular experimental design involving EIC is feasible. We demonstrate that factorial experiments can be powerful and feasible even with EIC. We discuss design considerations and directions for future research.

Introduction

Factorial experimental designs have many potential advantages for behavioral scientists. In the context of intervention development, factorial designs play a critical role in building and optimizing multi-component interventions based on empirical evidence. Multi-component interventions are interventions that include several aspects (i.e., components) pertaining to the intervention’s content, type, methods of delivery, and/or implementation strategies (see Collins, Kugler, & Gwadz, 2016). Factorial experiments help investigators screen several candidate intervention components simultaneously and decide which are likely to offer a detectable benefit. With each factor corresponding to one intervention component, the purpose of screening experiments is to identify which intervention components have a substantial influence on the response variable and are, therefore, candidates for selection into a high-quality intervention package, which will be evaluated in a subsequent randomized controlled trial (RCT; Myers, Montgomery, & Anderson-Cook, 2016; Wu & Hamada, 2011). These screening experiments are an important step before confirming the efficacy/effectiveness of the intervention package as a whole via an RCT (Collins, Nahum-Shani, & Almirall, 2014; Collins et al., 2016).

One challenge in implementing factorial experiments in the behavioral sciences is that individuals are often clustered rather than independent: students are clustered in schools, employees are clustered in organizations, patients are clustered in clinics, and so on. These individuals may be more similar to each other than to individuals in other clusters, on average. Statistically, this means that their data may be dependent within clusters. When the experiment involves randomizing these pre-existing clusters to experimental conditions, statistical power is affected not only by the standard considerations that are relevant in any experiment (i.e., effect size, chosen Type I error rate, and sample size (N)), but also by the number of clusters (J), the number of individuals within each cluster (n), and the intraclass correlation (ICC). The ICC reflects the degree of dependence in the response (i.e., outcome) among individuals within clusters. To the extent that ICC is large or J is small, statistical power for an experiment with clustered individuals is expected to be lower compared to an experiment with unclustered individuals. Thus, sample size planning becomes somewhat more complex when an investigator is considering a multilevel experiment.
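
A standard way to quantify this power cost, drawn from the cluster-randomized trial literature (e.g., Donner & Klar, 2000), is the design effect 1 + (n − 1) × ICC, which converts the nominal sample size into an effective number of independent observations. A minimal numeric sketch (function names are ours):

```python
# Design effect for clustered data: the factor by which the variance of a
# mean is inflated relative to a simple random sample of the same size.
# Standard result from the cluster-randomized trial literature;
# function names are ours.

def design_effect(n: int, icc: float) -> float:
    """Variance inflation for J clusters of n individuals each."""
    return 1.0 + (n - 1) * icc

def effective_n(J: int, n: int, icc: float) -> float:
    """Number of independent individuals the clustered sample is worth."""
    return (J * n) / design_effect(n, icc)

# Example: 40 clusters of 5 individuals with a posttest ICC of 0.10.
print(design_effect(5, 0.10))    # 1.4
print(effective_n(40, 5, 0.10))  # about 143 of the nominal 200
```

As the sketch illustrates, even a modest ICC shrinks the effective sample size appreciably, which is why J, n, and the ICC all enter the power calculations discussed later.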

Dziak, Nahum-Shani, and Collins (2012) examined the feasibility of conducting factorial experiments in a multilevel setting. They showed that in scenarios with a reasonable number of clusters, number of individuals within clusters, and ICC, it is often possible to conduct a factorial experiment with adequate power for addressing scientific questions of primary interest, even when the target population is multilevel. However, their work focused on multifactor experiments in which the multilevel structure of the data is due to clusters that exist prior to experimentation (e.g., schools, clinics, organizations). In such settings, data may be dependent within clusters both at pretest and at posttest assessments.

By contrast, in many cases the multilevel structure of the data is due to clusters that are generated in the course of the study itself, such that while individuals are independent at pretest, their data may be dependent within clusters at posttest. This can occur for reasons that are practical, scientific, or both. Practical reasons for inducing clusters typically include the availability of resources and/or the feasibility of intervention delivery. For example, intervention science experiments commonly include a staff of therapists, each of whom delivers the intervention to a subset of individuals (e.g., Cloitre, Koenen, Cohen, & Han, 2002). Hence, individuals’ outcomes may be correlated due to shared provider effects. At other times, there are scientific or therapeutic reasons to induce clusters. Interventions are sometimes designed to be delivered in group settings, in order to facilitate therapeutic group processes and capitalize on social reinforcers such as social support, sense of belonging, cohesiveness, and social accountability (Schulz, Cowan, & Cowan, 2006). In these cases, the outcome for treated individuals may be correlated due to common experiences, informal processes of socialization, and group dynamics.

Experimental designs in which clusters are generated as part of the study are known as “individually randomized group treatment (IRGT) trials” (Pals, Murray, Alfano, Shadish, & Hannan, 2008; Candel & Van Breukelen, 2009), or “clinical trials with clustering effects due to treatment” (Roberts & Roberts, 2005). In this article we use the broader term experiment-induced clustering (EIC) to refer generally to designs in which one or more of the experimental conditions involve generating dependence between the units of analysis (i.e., individuals).

Many experiments in the behavioral sciences involve EIC. In applied and social psychology, these experiments are often used to study group dynamics and intergroup relations; among these, factorial designs are highly prevalent, although usually with only two (e.g., Kramer, Fleming, & Mannis, 2001; Nye, 2002; Valacich, Wheeler, Mennecke, & Wachter, 1995) or three factors (e.g., Erez & Arad, 1986; Derlega, Winstead, Wong, & Hunter, 1985; Karakowsky & McBey, 2001). In the context of intervention development, many experiments involve generating dependence between individuals, via the assignment of individuals to receive treatment in groups or from providers. Although only a few of these experiments have involved multiple factors (e.g., Charlesworth et al., 2011; Kasari, Rotheram-Fuller, Locke, & Gulsrud, 2012; Nackers et al., 2015; Wilson et al., 2015), interest in factorial designs with EIC is growing in the context of behavioral intervention development. This trend is facilitated by the increased use of factorial designs to screen multiple intervention components (e.g., Cook et al., 2016; Pellegrini, Hoffman, Collins, & Spring, 2014, 2015) and by technological advances that not only make the integration of group-based support (e.g., online group therapy; Chebli, Blaszczynski, & Gainsbury, 2016; Gainsbury & Blaszczynski, 2011) more feasible, but also enhance the experimenter’s control over the combination of intervention components delivered (Crespi, 2016; Dallery, Riley & Nahum-Shani, 2015; Peters, de Bruin, & Crutzen, 2015).

Experimental designs involving EIC require data analysis models and power planning resources that differ from those available for multilevel experimental designs in which clusters exist prior to experimentation. Existing resources for planning experimental designs that involve EIC are limited to standard two-arm (i.e., single-factor) RCTs (Candel & Van Breukelen, 2009; Moerbeek & Wong, 2008; Tokola, Larocque, Nevalainen, & Oja, 2011; Roberts & Roberts, 2005). Factorial designs with EIC have received little methodological attention, and their power properties have not specifically been explored. Given the common occurrence of experimental designs with EIC, as well as the growing interest in (e.g., Baker, Gustafson, & Shah, 2014; Czajkowski et al., 2015; Jacobs & Graham, 2016) and use of (e.g., Cook et al., 2016; Howard & Jacobs, 2016; Pellegrini et al., 2014; 2015) factorial designs to inform the construction of high-quality multi-component behavioral interventions, it is critical to close this gap and build a bridge between these two experimental design features.

The Present Article

The present article has three purposes. The first is to extend prior models for EIC, developed for single-factor experiments, to factorial experiments involving various types of EIC. The second is to demonstrate that factorial experiments with EIC can be powerful and feasible. The third is to offer power formulas and Monte Carlo simulation results to help investigators decide whether a particular experimental design involving EIC is feasible to implement in their situation. The present article considers only models with normally distributed responses. However, this work can serve as the basis for future extension to generalized models with other response types (e.g., binary, count).

We begin by briefly reviewing complete and fractional factorial designs. We then discuss two common situations in which there is EIC: full EIC, in which all experimental conditions involve the creation of clusters, and partial EIC, in which only a subset of the experimental conditions involves the creation of clusters. We discuss modeling for a single-factor experiment and then for factorial experiments. To simplify the discussion, we assume that all factors in an experiment are dichotomous, and that the outcome is continuous and measured at the individual level. Finally, we discuss power-planning resources for factorial designs with EIC, using simulation studies to evaluate these resources and explore design elements that are likely to affect the power of factorial designs with EIC.

Factorial and Fractional Factorial Designs

Consider the following hypothetical example. Suppose an investigator wishes to develop an intervention program to promote weight loss among overweight individuals. There are three factors of theoretical interest to the investigator, each with two levels, which could be labeled On (experimental) and Off (control) for convenience (the levels could instead be “low” and “high” or any other dichotomy). The factors are whether or not the individual is offered (1) weekly videos that provide instructional and motivational training (Video), (2) encouraging text messages (Texts), and (3) meal replacement (Meals). We assume the investigator is interested in screening these intervention components in order to construct an efficient and high-quality intervention package. In particular, the investigator would like to address three scientific questions concerning the selection of these intervention components: (1) would the targeted outcome be improved by including weekly videos in the intervention? (2) would it be improved by including text messages? and (3) would it be improved by including meal replacement?

To simultaneously address all three scientific questions, the investigator can use a complete factorial design. With K dichotomous factors, a complete factorial design requires 2^K conditions. In the current example, this would be a 2 × 2 × 2 (or 2^3) factorial design, which involves eight experimental conditions. Table 1 shows the conditions of this hypothetical factorial design. This factorial design is balanced at the condition level, meaning that for each factor, four of the eight conditions have the Off level, and four have the On level. This property makes the experiment more efficient; in other words, it provides greater power for testing the effects of interest relative to other design alternatives (see Collins, Dziak, & Li, 2009, for more details). Here, we use the term level when referring to the value of one of the independent factors (e.g., Video = On), and the term condition when referring to a combination of levels of all the factors (e.g., Video = Off, Texts = On, Meals = Off). We use the term main effect when referring to the difference between levels of a particular factor, averaging across all experimental conditions (e.g., the main effect of Video is the average difference in response between the four conditions with Video = On and the four with Video = Off; see Myers & Well, 2003).

Table 1.

Experimental Conditions in the Hypothetical 2×2×2 Factorial Design

Experimental condition  Brief description  V (Video)  T (Texts)  M (Meals)
1                       Untreated          Off        Off        Off
2                       M only             Off        Off        On
3                       T only             Off        On         Off
4                       M and T            Off        On         On
5                       V only             On         Off        Off
6                       V and M            On         Off        On
7                       V and T            On         On         Off
8                       All three          On         On         On
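
The conditions in Table 1 are simply all combinations of the two levels of the three factors. A minimal sketch that enumerates them (function and variable names are ours):

```python
# Enumerate the 2^K conditions of a complete factorial design,
# reproducing Table 1 for the three-factor weight loss example.
from itertools import product

def complete_factorial(factors):
    """One dict per experimental condition, covering all Off/On combinations."""
    return [dict(zip(factors, combo))
            for combo in product(["Off", "On"], repeat=len(factors))]

conditions = complete_factorial(["Video", "Texts", "Meals"])
print(len(conditions))   # 8, i.e., 2^3
print(conditions[0])     # {'Video': 'Off', 'Texts': 'Off', 'Meals': 'Off'}
print(conditions[-1])    # {'Video': 'On', 'Texts': 'On', 'Meals': 'On'}
```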

Table 2 shows the effect-coded design matrix, coding On = +1, Off = −1. The use of effect coding (−1 and +1) instead of dummy coding (0 and 1) is highly recommended when analyzing data from factorial screening experiments (Collins, Dziak, Kugler & Trail, 2014). Effect coding conveniently enables the significance tests of the regression coefficients to be directly interpreted as significance tests of main effects and interactions in an ANOVA framework (see Chakraborty, Collins, Strecher & Murphy, 2009; Myers & Well, 2003).

Table 2.

Effect Coding for a 2×2×2 Factorial Design in the Weight Loss Program Example

Condition  Brief description  V (Video)  T (Texts)  M (Meals)  V×T  V×M  T×M  V×T×M
(V, T, and M are the main-effect columns; V×T, V×M, T×M, and V×T×M are the interaction columns.)

Conditions in Complete Factorial
1          Untreated          −1         −1         −1         +1   +1   +1   −1
2          M only             −1         −1         +1         +1   −1   −1   +1
3          T only             −1         +1         −1         −1   +1   −1   +1
4          M and T            −1         +1         +1         −1   −1   +1   −1
5          V only             +1         −1         −1         −1   −1   +1   +1
6          V and M            +1         −1         +1         −1   +1   −1   −1
7          V and T            +1         +1         −1         +1   −1   −1   −1
8          All three          +1         +1         +1         +1   +1   +1   +1

Conditions Retained in a Fractional Factorial
2          M only             −1         −1         +1         +1   −1   −1   +1
3          T only             −1         +1         −1         −1   +1   −1   +1
5          V only             +1         −1         −1         −1   −1   +1   +1
8          All three          +1         +1         +1         +1   +1   +1   +1

As the number of factors in a factorial experiment increases, the number of experimental conditions increases rapidly, although the total sample size needed to maintain power may not increase appreciably. With a large number of factors a complete factorial experiment may not be feasible, due to the expense and complexity of implementing so many experimental conditions. In these cases, the investigator might consider a fractional factorial design as an alternative. Fractional factorial designs offer many of the advantages of a complete factorial design, while requiring considerably fewer experimental conditions (Kirk, 2003; Wu & Hamada, 2011). These designs are a variation upon factorial designs, involving the use of a subset of the experimental conditions of a complete factorial design, carefully chosen to preserve key statistical properties. Consider our weight loss example. One possible fractional factorial design would consist of only half of the conditions in the complete factorial design, represented by rows 2, 3, 5, and 8 from Table 2. This subset preserves the property that all effects are represented by a balanced number of conditions (e.g., each factor is −1 for half of the rows and +1 for the other half). Hence, the main effects can be efficiently tested without implementing all eight conditions. The 2, 3, 5, 8 design would be described as a 2^(3−1) fractional factorial, indicating that this particular fractional factorial design is a 2^(−1) = 1/2 fraction of the complete 2^3 factorial; for this reason it is also called a half-fraction factorial. The advantages of complete and fractional factorial designs for studying multi-component behavioral interventions are detailed in Collins et al. (2009).
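
The balance and orthogonality properties of the half-fraction consisting of conditions 2, 3, 5, and 8 can be verified directly from the effect-coded design: each main-effect column sums to zero over the retained rows, and the columns are mutually orthogonal. A small sketch using numpy (row ordering follows Table 2):

```python
# Check balance and orthogonality of the 2^(3-1) half-fraction
# (conditions 2, 3, 5, 8 of Table 2), with effect coding Off = -1, On = +1.
import numpy as np
from itertools import product

# Full 2^3 effect-coded design; rows are ordered as in Table 2 (V, T, M).
full = np.array(list(product([-1, +1], repeat=3)))   # shape (8, 3)
half = full[[1, 2, 4, 7]]                            # conditions 2, 3, 5, 8

# Balance: each factor is -1 in half the retained rows and +1 in the rest.
print(half.sum(axis=0))   # [0 0 0]

# Orthogonality: X'X is diagonal, so main effects are estimated independently.
print(half.T @ half)      # 4 times the 3x3 identity matrix
```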

Full and Partial EIC

The factorial experiment described above for the weight loss intervention example is a straightforward randomized factorial design in which there is no EIC. When a factorial design involves EIC, dependence in data is generated within clusters of individuals. In this context we distinguish between factorial designs with full EIC and factorial designs with partial EIC.

Example: Factorial Designs with Full EIC

Consider the hypothetical factorial experiment described above. Suppose the two levels of the Video factor both involve a treatment that is delivered in groups of about five individuals each. These are experiment-induced clusters. Each group meets weekly and involves discussions and social support facilitated by a trained practitioner. The videos are offered only to individuals randomized to the On level of Video, who view and discuss the videos at their weekly group meetings. Groups receiving the Off level still meet and discuss their experiences, but do not view the videos. In this design, individuals will be randomly assigned not only to a level of Video, but also to a group within their level of Video. To avoid contamination of experimental factors and/or perceptions of inequalities within the group, the other two factors (i.e., Texts and Meals) must also be held constant within each group. This can be accomplished by first randomly assigning individuals to groups, and then randomly assigning each group to the levels of each of the three factors. Thus, each individual is nested within a cluster (group), and each cluster will belong to one of the eight conditions in Table 3. In this scenario, the individual-level outcomes (e.g., weight measurements) are independent at pretest, but no longer independent at posttest, because group members potentially influence each other, and all will be influenced by the shared practitioner. For simplicity, we assume that it is not necessary to model a practitioner effect in addition to a group effect; we return to this in the Discussion section.

Table 3.

Conditions in a Factorial Experiment with Full EIC and a Factorial Experiment with Partial EIC

Condition  Clustered? (full EIC)  Clustered? (partial EIC)  X1   X2   X3
1          Yes                    No                        Off  Off  Off
2          Yes                    No                        Off  Off  On
3          Yes                    No                        Off  On   Off
4          Yes                    No                        Off  On   On
5          Yes                    Yes                       On   Off  Off
6          Yes                    Yes                       On   Off  On
7          Yes                    Yes                       On   On   Off
8          Yes                    Yes                       On   On   On

Notes. The table assumes that there are a total of three factors arranged in a complete factorial design. In the weight loss examples given in the text, X2 and X3 represent the Texts and Meals factors. X1 represents the Video factor in the full-EIC factorial example and the Support factor in the partial-EIC factorial example.

This situation differs from a typical cluster-randomized factorial experiment (as described by Dziak et al. 2012), in which the clusters are units that exist prior to experimentation, such as schools, clinics, or workplaces. In a cluster-randomized factorial experiment, the response is expected to have a positive intraclass correlation (ICC) both at pretest (i.e., prior to the intervention) and at posttest (following the intervention). However, in the current example, the clusters are created as part of the study by random assignment. Thus, the expected pretest ICC is zero because individuals have no shared experience prior to the intervention, whereas the expected posttest ICC is positive because individuals from the same group are likely to have shared experiences during the study. Here, we use factorial designs with full EIC to label factorial experiments in which clusters are generated for all individuals in the course of the study. The models and power formulas from Dziak et al. (2012) do not apply to these studies, because Dziak et al. assumed that a positive ICC exists at both pretest and posttest.

Example: Factorial Designs with Partial EIC

Consider yet another variation of the weight loss intervention example, in which all individuals receive the weekly videos, so that Video is no longer a factor in the design. Instead, the investigator considers a different factor, Support, aiming to assess the efficacy of group support (i.e., Support = On) vs. no group support (i.e., Support = Off). Suppose that only individuals randomized to the On level of Support are assigned to groups of about five individuals each, who meet weekly to watch and discuss the videos. Those randomly assigned to the Off level of Support are simply given the videos and asked to view them at their own convenience. The other two factors (Texts and Meals) remain the same. As before, each individual in a given support group is given the same levels of all of the assigned experimental factors (i.e., the same levels of Texts and Meals) as his/her fellow group members. In this scenario, the individuals in the On level of Support are clustered, whereas those in the Off level remain independent. The multilevel data generated from such studies are known as partially nested (Bauer, Sterba, & Hallfors, 2008). We use factorial designs with partial EIC to label factorial experiments in which clusters are generated by experimentation for only a subset of the individuals. Once again, these studies require new regression models and power formulas.

Summary of Examples

In both full and partial EIC it is necessary to take cluster-level variation into account when planning the sample size for powering the experiment and when analyzing the resulting data. However, there is an important difference. In full EIC the multilevel structure of the data is the same for all experimental conditions; in partial EIC it may not be (Baldwin, Bauer, Stice, & Rohde, 2011; Moerbeek & Wong, 2008). Because factorials with full EIC and factorials with partial EIC may induce different structures of variation in the outcome, power planning and data analysis should be conducted in a way that appropriately reflects each approach.

In the following section we discuss modeling issues separately for factorial designs with full EIC and factorial designs with partial EIC. For clarity, we begin with the simple case of a single-factor experiment (i.e., a RCT), and continue with factorial experiments. Throughout, for consistency, we use the multilevel modeling approach, rather than the ANOVA framework for analyzing multilevel data. The former offers more flexibility in that it does not require each cluster to contain exactly the same number of individuals. Readers more familiar with the latter can rely on the extant literature, which explains the link between ANOVA and multilevel models in detail (e.g., Kenny, Bolger & Kashy, 2002; Hox & Kreft, 1994).

Modeling for a Single-Factor Experiment

A Single-Factor Experiment with Full EIC

As noted earlier, when clusters are generated for all individuals, cluster-level variance might be non-zero for all individuals, but only at posttest. Therefore, if there is no pretest, it is reasonable to simply treat this design as a between-clusters design, as in Model 2 of Dziak et al. (2012) or standard references on cluster-randomized experiments (e.g., Donner & Klar, 2000; Murray, 1998; Raudenbush, 1997). Specifically, let X denote the treatment variable, where 0 represents the control condition and 1 represents the experimental condition, and let Y denote the outcome of interest at posttest, which might correspond to the individual’s weight at a 6-month follow-up. With Level 1 representing the individual and Level 2 representing the cluster, the response Yij for individual i in cluster j can be modeled as

Level 1:  Yij = β0j + eij
Level 2:  β0j = γ00 + γ01 Xj + uj
Combined: Yij = γ00 + γ01 Xj + uj + eij,   (1)

where eij ~ N(0, σe²) and uj ~ N(0, τu²), with N(·) representing the normal distribution with the specified mean and variance. Because cluster-randomized designs have been described in detail in the references mentioned above, we do not elaborate on this scenario.

Model 1 can be extended to include individual-level pretest response Pij as a covariate, where Pij might correspond to the individual’s weight at baseline. Other covariates can be included in a similar manner:

Level 1:  Yij = β0j + β1j Pij + eij
Level 2:  β0j = γ00 + γ01 Xj + uj
          β1j = γ10
Combined: Yij = γ00 + γ01 Xj + γ10 Pij + uj + eij,   (2)

where, as before, eij ~ N(0, σe²) and uj ~ N(0, τu²). Here, σe² describes the random error after adjusting for the pretest and cluster membership, and uj represents the between-cluster variation. Throughout, for simplicity, we work with the assumptions that the effect of pretest does not vary between clusters and that the pretest does not interact with treatment (e.g., Model 30 in Bauer et al., 2008). An interesting feature of Model 2 is that it does not require assuming zero ICC at pretest or no systematic differences between the control and the experimental conditions in pretest responses. Hence, it remains appropriate even when individuals are not randomly assigned to clusters and conditions. An alternative to Model 2 might be a repeated-measures model for changes in the response between pretest and posttest. Such a model would include three levels: observation time (Level 1), individual (Level 2), and cluster (Level 3). Throughout, we focus solely on covariate-adjusted models because they require fewer levels of nesting and hence are computationally simpler than the corresponding repeated-measures models. However, either approach might be deemed by investigators to be more relevant to the specific process they are studying. Therefore, as a supplement we provide a technical report (Dziak & Nahum-Shani, 2016) containing models and power formulas for the repeated-measures approach as well.
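
Under our simplifying assumptions, Model 2 is an ordinary random-intercept model and can be fit with standard mixed-model software. The following sketch simulates data from Model 2 and recovers γ01 using the statsmodels library; all parameter values (J = 40 clusters, n = 5, γ01 = 0.5, and so on) are illustrative choices of ours, not values from the article:

```python
# Simulate data from Model 2 (full EIC, pretest as covariate) and recover
# the treatment effect gamma_01 with a random-intercept mixed model.
# All parameter values are illustrative, not taken from the article.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
J, n = 40, 5                                  # clusters, individuals per cluster
gamma00, gamma01, gamma10 = 0.0, 0.5, 0.6     # intercept, treatment, pretest
tau_u, sigma_e = 0.3, 1.0                     # cluster SD, residual SD

cluster = np.repeat(np.arange(J), n)
X = np.repeat(rng.permutation([0, 1] * (J // 2)), n)  # cluster-level treatment
P = rng.normal(size=J * n)                            # independent pretest
u = rng.normal(0.0, tau_u, J)[cluster]                # shared cluster effect
Y = gamma00 + gamma01 * X + gamma10 * P + u + rng.normal(0.0, sigma_e, J * n)

df = pd.DataFrame({"Y": Y, "X": X, "P": P, "cluster": cluster})
fit = smf.mixedlm("Y ~ X + P", df, groups=df["cluster"]).fit()
print(fit.params["X"])   # estimate of gamma_01; should be near 0.5
```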

A Single-Factor Experiment with Partial EIC

Where there is partial EIC, it is necessary to take cluster-level variation into account for individuals in the experimental condition, but not for those in the control condition, because the latter are not clustered. However, following Roberts and Roberts (2005) and Bauer et al. (2008), and in order to conveniently use multilevel notation for the entire sample, we treat individuals in both conditions as clustered, with control participants comprising clusters of size one. This requires setting up the model in a special way. We continue to dummy-code the levels of X as 0 (for control condition) and 1 (for experimental condition), and assume for the moment that there are no pretest measurements. With Levels 1 and 2 representing the individual and the cluster, respectively, the response Yij for individual i in cluster j can be modeled as

Level 1:  Yij = β0j + eij
Level 2:  β0j = γ00 + (γ01 + uj) Xj
Combined: Yij = γ00 + (γ01 + uj) Xj + eij,   (3)

where eij ~ N(0, σe²) and uj ~ N(0, τu²). The control-condition mean response is γ00 and the experimental-condition mean response is γ00 + γ01; thus, γ01 is the overall treatment effect. An interesting feature of Model 3 is that the Level 2 slope is allowed to vary across clusters, while the Level 2 intercept is fixed. This feature allows cluster-level variation in mean response, but only for those individuals who were randomized to the experimental condition (i.e., who have Xj = 1, not Xj = 0). If a random β0j had been present for individuals with Xj = 0 (control condition) also, then they too would have cluster-level variation in mean response. However, because individuals in the control condition are, in fact, not clustered (except trivially in clusters of size one), such cluster-level variation cannot be properly estimated or interpreted in their case. To avoid this difficulty, Model 3 is specified in a way that eliminates the cluster-level random component for individuals in the control condition. This allows the response to be treated as cluster-correlated for individuals in the experimental condition (Yij = γ00 + γ01 + uj + eij) and as independent for individuals in the control condition (Yij = γ00 + eij).

Model 3 can be extended further to allow the variance of the Level 1 residuals σe² to differ between the unclustered control and the clustered experimental conditions (see Roberts & Roberts, 2005, and Bauer et al., 2008, for more details). It would be somewhat unreasonable to expect the variance of the Level 1 residuals to be the same for clustered and unclustered individuals (see Bauer et al., 2008), and therefore, for the remainder of this paper, we allow them to differ whenever partial EIC is being used. We denote the Level 1 error variance for unclustered individuals as σe0² and for clustered individuals as σe1².
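
Because Model 3 has a random slope on Xj but no random intercept, it maps directly onto standard software once treated individuals are given their group identifiers and each control participant is coded as a cluster of size one. The sketch below simulates a partial-EIC design and fits Model 3 with statsmodels, where re_formula="0 + X" requests the slope-only random effect; parameter values are illustrative, and for simplicity it uses the homogeneous Level 1 variance version rather than separate σe0² and σe1²:

```python
# Simulate a single-factor partial-EIC design and fit Model 3:
# treated individuals in clusters of 5, controls as clusters of size one.
# Illustrative parameter values; homogeneous Level 1 variance for simplicity.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
J_t, n, N_c = 20, 5, 100                 # treated clusters, size, controls
gamma00, gamma01 = 0.0, 0.5
tau_u, sigma_e = 0.3, 1.0

# Treated arm (X = 1): shared cluster effect within each group.
clus_t = np.repeat(np.arange(J_t), n)
Y_t = (gamma00 + gamma01 + rng.normal(0.0, tau_u, J_t)[clus_t]
       + rng.normal(0.0, sigma_e, J_t * n))
# Control arm (X = 0): singleton "clusters", no cluster effect.
clus_c = np.arange(J_t, J_t + N_c)
Y_c = gamma00 + rng.normal(0.0, sigma_e, N_c)

df = pd.DataFrame({
    "Y": np.concatenate([Y_t, Y_c]),
    "X": np.concatenate([np.ones(J_t * n), np.zeros(N_c)]),
    "cluster": np.concatenate([clus_t, clus_c]),
})
# re_formula="0 + X": random slope on X with no random intercept, so the
# cluster-level term u_j applies only where X = 1, exactly as in Model 3.
fit = smf.mixedlm("Y ~ X", df, groups=df["cluster"],
                  re_formula="0 + X").fit()
print(fit.params["X"])   # estimate of gamma_01; should be near 0.5
```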

As with Model 2, individual-level pretest Pij (as well as other covariates) can be added to Model 3 as follows:

Yij = γ00 + (γ01 + uj) Xj + γ10 Pij + eij,   (4)

where eij ~ N(0, σe²) and uj ~ N(0, τu²).

Modeling for a Multiple-Factor Case

The models discussed so far have assumed only a single randomized independent variable, but they can all be extended to allow multiple factors in a factorial experiment. The models developed in this section can accommodate either factorial designs with full EIC or factorial designs with partial EIC. However, in order to simplify the description, additional assumptions are made about how the clustering is determined in each case. Specifically, in the case of full EIC, we assume that each individual is assigned to only one experiment-induced cluster, and that to minimize the risk of contamination (see Slymen & Howell, 1997) all factors X1, …, XK are assigned in such a way that all members of a cluster will share the same levels. In the case of partial EIC, we assume that only one of the factors, denoted X1, induces clustering, in the sense that individuals assigned to the +1 level of X1 are clustered and those assigned to the −1 level of X1 are not. The other factors X2, …, XK are assumed to have levels that can be delivered both to clustered and unclustered individuals. Hence, in this setting, each individual might belong (or not) to only one experiment-induced cluster. As before, to minimize the risk of contamination, we assume that the other factors X2, …, XK are assigned in such a way that all members of a cluster will share the same levels. Factorial designs in which an individual may belong to several different clusters are beyond the scope of this article.

Earlier we explained that in the case of partial EIC it is convenient to dummy-code the X variable for group treatment (i.e., to represent Off and On levels as 0 and 1 respectively) so that the random cluster effect will automatically become zero for individuals in the unclustered condition. However, the dummy-coding approach is disadvantageous when multiple factors are under investigation, because the resulting coefficients in a linear model would not be independent in their distributions or interpretations and would not correspond to main effects or interactions in the usual ANOVA sense (see Chakraborty et al., 2009). In this case, the test of main effects and interactions has to be done by using linear combinations of model coefficients, which can be a source of confusion and inconvenience. Alternatively, as noted by Dziak et al. (2012), effect coding (−1 and +1) can be used to make the interpretation of the regression coefficients in this setting more convenient. As we explain below and in Appendix A, when effect coding is used, scientists can directly interpret each coefficient in the linear model for multiple factors as an independent test of a main effect or an interaction, without having to test more complicated linear combinations of model parameters. Hence, to be able to both conveniently model the random cluster effect only for individuals in the clustered condition and interpret the effects of multiple factors in a straightforward manner, we employ an approach that combines both a dummy-coded and an effect-coded clustering factor. The justification for this seemingly unusual approach is given below and elaborated upon in Appendix A.

Continuing with our three-factor example for simplicity, let X1, X2 and X3 be the effect-coded representations (−1 for Off and +1 for On) of the first, second, and third factors. Also, let C be an indicator of whether the individual has been assigned to a nontrivial cluster (C = 1) or to a trivial cluster of size 1 (C = 0). In the case of partial EIC, C is a dummy-coded version of the assumed cluster-generating factor X1; that is, C = 1 if X1 = +1, and C = 0 if X1 = −1. In the case of full EIC, C = 1 for everyone, since all individuals are assigned to nontrivial clusters. Thus, we integrate Models 1 and 3 and extend them to multiple factors as follows:

Level 1: Yij = β0j + eij
Level 2: β0j = γ00 + γ01X1j + γ02X2j + γ03X3j + γ04X1jX2j + γ05X1jX3j + γ06X2jX3j + γ07X1jX2jX3j + ujCj.

The corresponding mixed model is

Yij = γ00 + γ01X1j + γ02X2j + γ03X3j + γ04X1jX2j + γ05X1jX3j + γ06X2jX3j + γ07X1jX2jX3j + ujCj + eij, (5)

where eij ~ N(0, σe²) and uj ~ N(0, τu²).

As noted earlier, Model 5 can be used to describe factorial experiments with either full or partial EIC. If Cj = 1 for all individuals, it represents a factorial design with full EIC, and uj becomes a familiar additive random effect (random intercept term) representing cluster-level variability. If Cj = 1 for only those having X1= +1 and Cj = 0 otherwise, then there is partial EIC, and the cluster-level random effect uj becomes relevant only for a subset of the individuals.

Even though the variables Cj and X1j represent the same factor, they are not confounded in Model 5. This is because X1j is used to model the fixed effect of that factor, and Cj is used to model the cluster-specific random effects of that factor. One way to clarify this distinction is to interpret γ01X1j as the overall average effect of receiving group treatment, and to interpret ujCj − γ01X1j as the cluster-specific random deviation from that average (i.e., how well or poorly a particular group functioned). Hence, inferences for γ01 and τu2 are not confounded even though the values of Cj and X1j are collinear. This approach is similar to the one employed in Models 3 and 4, as well as the models employed by Roberts and Roberts (2005) and Bauer et al. (2008), where X1j is effectively used twice, once to model the fixed effect of group treatment (γ01X1j) and another time to model the random effect of group treatment (ujX1j).
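The role of the Cj switch in Model 5 can be illustrated with a short simulation sketch. This is a hypothetical illustration, not the article's code (the article uses SAS); the function name, parameter values, and the restriction to a single factor are our own simplifications.

```python
import random

def simulate_model5_partial_eic(J1=10, n=5, J0=50, gamma1=0.2,
                                tau_u=0.35, sigma_e=0.76, seed=7):
    """Sketch of Model 5 under partial EIC with a single factor X1.

    Individuals with X1 = +1 form J1 clusters of size n and share a
    cluster-level random effect u_j (C_j = 1); individuals with
    X1 = -1 are unclustered singletons, so C_j = 0 and u_j drops out.
    """
    rng = random.Random(seed)
    rows = []
    for j in range(J1):                     # clustered condition
        u_j = rng.gauss(0.0, tau_u)         # shared by all cluster members
        for _ in range(n):
            y = gamma1 * (+1) + u_j * 1 + rng.gauss(0.0, sigma_e)
            rows.append({"X1": +1, "C": 1, "cluster": j, "y": y})
    for j in range(J0):                     # unclustered condition
        y = gamma1 * (-1) + 0.0 + rng.gauss(0.0, sigma_e)  # u_j * C_j = 0
        rows.append({"X1": -1, "C": 0, "cluster": J1 + j, "y": y})
    return rows

data = simulate_model5_partial_eic()
```

Setting C = 1 for every individual (and clustering everyone) would instead yield the full-EIC case of the same model.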

As noted earlier, because effect coding is used for the fixed-effects part of the model, the model coefficients can be interpreted as tests of main effects in the usual ANOVA sense. For any given cell of the 2×2×2 design, Cj is a constant and uj is a random variable with expectation zero. Hence, the expectation of the product Cjuj is zero, and the mean of any cell in the design is given by E(Yij|X1j, X2j, X3j) = γ00 + γ01X1j + γ02X2j + γ03X3j + γ04X1jX2j + γ05X1jX3j + γ06X2jX3j + γ07X1jX2jX3j. In other words, the structure of the clustering does not affect the interpretation of the linear regression coefficients.

For example, based on Model 5 the main effect of X1 is E(Yij|X1j = +1) − E(Yij|X1j = −1) = (+γ01) − (−γ01) = 2γ01. This is because the other effects, including main effects and interactions, average out when contrasting the average of the four cell means with X1 = +1 against the average of the four cell means with X1 = −1. More generally, a test of the main effect of any factor k is a test of the difference in average expected response between the two levels of Xk (defining each level's average by averaging the means of the cells composing that level), namely (+γk) − (−γk) = 2γk. Because effect coding is used, all coefficients other than γk cancel out when calculating the main effect of factor k.

Similarly, the interaction of X1 and X2, which represents the extent to which the difference in the response between the two levels of X1 varies across the two levels of X2, averaging across X3, is [E(Yij|X1j = +1, X2j = +1) − E(Yij|X1j = −1, X2j = +1)] − [E(Yij|X1j = +1, X2j = −1) − E(Yij|X1j = −1, X2j = −1)] = (γ01 + γ02 + γ04) − (−γ01 + γ02 − γ04) − (γ01 − γ02 − γ04) + (−γ01 − γ02 + γ04) = 4γ04. More generally, a test of the interaction between any two factors Xa and Xb is a test of the difference in averaged simple effects of Xa between levels of Xb or vice versa, namely 4γa,b. Each simple effect is defined by averaging across the means of the cells composing a specific combination of levels of Xa and Xb, and calculating the difference in averages between the On and Off levels of one factor (Xa) at each level of the other factor (Xb). Researchers sometimes define the interaction effect as half this quantity (i.e., 2γa,b) to make its scale comparable to that of the main effects. Regardless, a test of whether γa,b = 0 is a test of whether factors Xa and Xb interact in the ANOVA sense, averaging over the levels of the other factors under investigation.
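These cancellation properties of effect coding can be checked numerically. The sketch below uses hypothetical coefficient values of our own choosing and verifies that the main-effect contrast equals 2γ01 and the interaction contrast equals 4γ04, as stated above.

```python
from itertools import product

# Hypothetical true coefficients for Model 5 (values are illustrative only).
g = {"00": 5.0, "01": 0.4, "02": -0.2, "03": 0.1,
     "04": 0.3, "05": 0.0, "06": 0.05, "07": -0.1}

def cell_mean(x1, x2, x3):
    """Expected cell mean; the random terms u_j*C_j and e_ij have mean zero."""
    return (g["00"] + g["01"]*x1 + g["02"]*x2 + g["03"]*x3
            + g["04"]*x1*x2 + g["05"]*x1*x3 + g["06"]*x2*x3
            + g["07"]*x1*x2*x3)

# Main effect of X1: average of the four cell means at X1 = +1 minus the
# average at X1 = -1; every term except gamma_01 cancels, leaving 2*gamma_01.
levels = list(product((-1, 1), repeat=2))
me_x1 = (sum(cell_mean(+1, x2, x3) for x2, x3 in levels) / 4
         - sum(cell_mean(-1, x2, x3) for x2, x3 in levels) / 4)

def simple_effect_x1(x2):
    """Simple effect of X1 at a given level of X2, averaging over X3."""
    return (sum(cell_mean(+1, x2, x3) for x3 in (-1, 1)) / 2
            - sum(cell_mean(-1, x2, x3) for x3 in (-1, 1)) / 2)

# Interaction of X1 and X2: difference in simple effects, equal to 4*gamma_04.
int_x1x2 = simple_effect_x1(+1) - simple_effect_x1(-1)
```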

Note that Model 5 includes all interactions, but it could alternatively include only lower-order ones (e.g., two-way but not three-way, by constraining γ07 = 0). Because of the effect coding, the lower-order coefficients would have roughly the same interpretation, regardless of whether or not the higher-order interactions were included (see more details in Myers & Well 2003, and Dziak et al., 2012). This is because we assume either that cell sizes are balanced, or that a weighted average across cells is used in defining the main effects and interactions.

In summary, despite the unusual form of the error structure, the fixed effects coefficients have the same interpretations as in the linear model representation of classic ANOVA without clustering (see Myers & Well, 2003) and as in the clustered factorial designs previously described in the literature (e.g., Dziak et al., 2012). This is explained further in Appendix A.

Model 5 can also be extended to include a pretest or other covariates as follows:

Yij = γ00 + γ10Pij + γ01X1j + γ02X2j + γ03X3j + γ04X1jX2j + γ05X1jX3j + γ06X2jX3j + γ07X1jX2jX3j + ujCj + eij, (6)

where eij ~ N(0, σe²) and uj ~ N(0, τu²). As noted earlier, the error variance in Model 6 (as well as in Models 3–5) can be allowed to differ between unclustered (σe0²) and clustered (σe1²) individuals. Further, one could extend Model 6 (as well as Models 2 and 4) to allow the effect of the pretest to vary between clusters, or to allow the pretest to interact with treatment. Note that if Cj is a dummy-coded version of X1j (i.e., the partial EIC setting), then Model 6 becomes a multiple-factor generalization of Model 4; if Cj is 1 for all individuals (i.e., the full EIC setting), then Model 6 becomes a multiple-factor generalization of Model 2. Thus, Models 5 and 6 can be used with factorial experiments having either full or partial EIC. In the following section we propose formulas for calculating power for these models.

Estimating Power

Because the test for a main effect or an interaction in any of the designs considered in this paper can be viewed as a significance test for a coefficient in a linear mixed model, it is reasonable to estimate the power for this test using the noncentral F distribution, as in Dziak et al. (2012). That is, we assume that the power can be approximated by the probability that a noncentral F1,v variate, having noncentrality parameter λ, exceeds the critical value κ of the test to be performed. Here, κ is the value such that a central F1,v variate has a probability α of exceeding κ under H0. The λ parameter represents the amount of evidence against H0 that the sample is expected to provide; it is calculated as

λ = γ² / Var(γ̂), (7)

where γ is the regression parameter in question. The numerator degrees of freedom of the test of a given main effect or interaction, assuming dichotomous factors, is 1, because a single coefficient is being set to zero under H0. A good estimate for the denominator degrees of freedom v in the full-EIC design is the number of clusters minus the number of regression coefficients to be estimated. This is also a good conservative initial estimate of the degrees of freedom v when planning power for partial EIC designs. However, for actual data analysis in partial EIC designs v should be empirically estimated using Satterthwaite’s approximation (Roberts & Roberts, 2005).
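Because F(1, v) is the square of a t(v) variate, the power calculation described above can be sketched with a large-v normal approximation using only the standard library. This is a simplification of our own, not the article's method (an exact calculation would use the noncentral F distribution itself, e.g., via statistical software):

```python
from statistics import NormalDist

def approx_power(gamma, var_gamma_hat, alpha=0.05):
    """Approximate power for the two-sided test of H0: gamma = 0.

    lambda = gamma^2 / Var(gamma_hat) is the noncentrality parameter of
    Expression 7.  For large denominator df v, the power of the F(1, v)
    test is close to Phi(sqrt(lambda) - z_{1 - alpha/2}), ignoring the
    negligible probability of rejecting in the wrong direction.
    """
    lam = gamma ** 2 / var_gamma_hat
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return NormalDist().cdf(lam ** 0.5 - z_crit)
```

With a small number of clusters the normal approximation overstates power somewhat, so the exact noncentral F calculation is preferable when v is small.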

The formula for Var(γ^) depends on whether a pretest is present and whether full or partial EIC is being used. The formulas for full EIC with or without a pretest are presented in Table 4, and the formulas for partial EIC with or without a pretest are presented in Table 5. These formulas apply to both main effects and interactions. The derivations for these formulas are described in Appendix B. The variance formulas in Tables 4 and 5 are presented in two forms: one expressed directly in terms of the different variance components and one re-expressed in terms of the posttest variance, the pretest-posttest correlation, and the posttest ICC. The latter form may be easier to use in sample size planning because the posttest variance cancels out in practice if standardized effect sizes are being used, and plausible values for the correlations can be found in the literature.

Table 4.

Sampling Variances for Regression Coefficients in Factorial Designs with Full EIC

Pretest Variance of regression coefficient for effect
No
(Model 5, Cj ≡ 1)
Var(γ̂) = τu²/J + σe²/(Jn) = σY²[ρY/((1 − ρY)J) + 1/(Jn)],
where σY² = σe²
Yes
(Model 6, Cj ≡ 1)
Var(γ̂) = τu²/J + σe²/(Jn) = σY²[ρY/((1 − ρY)J) + (1 − ρpre,post²)/(Jn)],
where σY² = γP²σP² + σe²

Note. The “variance of regression coefficient for effect” column shows how to calculate Var(γ̂) for Expressions (7), (8), or (9) from the variance components in the context of each model. J represents the total number of clusters summed across all conditions, and n is the number of members per cluster. In Model 5, σe² is the individual-level error variance, and τu² is the cluster-level variance. In Model 6, γP is the effect of pretest on posttest; σP² is the variance of the pretest; σe² is the individual-level error variance after adjusting for pretest; and τu² is the cluster-level variance in posttest after adjusting for pretest. In each case, σY² represents the total posttest variance after adjusting for any cluster or treatment effects but not adjusting for pretest; ρY is the ICC at posttest; and ρpre,post is the pretest-posttest correlation after adjusting for treatment and cluster membership (i.e., the within-person correlation). Note that γP is written as γ10 in Model 6, but it seemed clearer to write it as γP here to emphasize that it represents the pretest and not the first factor.
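The two forms of the Table 4 formula can be coded side by side and checked for agreement. The function name and defaults below are our own; the algebra is Table 4's, with ρpre,post = 0 recovering the no-pretest row (Model 5).

```python
def var_coef_full_eic(J, n, sigma_y2=1.0, rho_y=0.1, rho_pre_post=0.0):
    """Var(gamma_hat) for any effect coefficient under full EIC (Table 4)."""
    # Variance-component form: tau_u^2 / J + sigma_e^2 / (J n)
    tau_u2 = sigma_y2 * rho_y / (1.0 - rho_y)
    sigma_e2 = sigma_y2 * (1.0 - rho_pre_post ** 2)
    by_components = tau_u2 / J + sigma_e2 / (J * n)
    # Re-expressed form in terms of the posttest ICC and the
    # pretest-posttest correlation (second column of Table 4).
    by_icc = sigma_y2 * (rho_y / ((1.0 - rho_y) * J)
                         + (1.0 - rho_pre_post ** 2) / (J * n))
    assert abs(by_components - by_icc) < 1e-12  # algebraically identical
    return by_icc
```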

Table 5.

Sampling Variances for Regression Coefficients in Factorial Designs with Partial EIC

Pretest Variance of regression coefficient for effect
No
(Model 5; Cj = 1 if X1 = +1, Cj = 0 if X1 = −1)
Var(γ̂) = τu²/(4J1) + σe0²/(4J0) + σe1²/(4J1n) = σY²[ρY/(4(1 − ρY)J1) + 1/(4J1n) + 1/(4J0)]
if σe0² = σe1², where σY² = σe²
Yes
(Model 6; Cj = 1 if X1 = +1, Cj = 0 if X1 = −1)
Var(γ̂) = τu²/(4J1) + σe0²/(4J0) + σe1²/(4J1n) = σY²[ρY/(4(1 − ρY)J1) + (1 − ρpre,post²)/(4J1n) + (1 − ρpre,post²)/(4J0)]
if σe0² = σe1², where σY² = γP²σP² + σe²

Note. The “variance of regression coefficient for effect” column shows how to calculate Var(γ̂) for Expressions (7), (8), or (9) from the variance components in the context of each model. J1, J0, and n represent the number of clusters, the number of unclustered individuals, and the number of members per cluster, respectively, so that J1n is the number of clustered individuals. In Model 5, σe0² and σe1² are the individual-level error variances for unclustered and clustered individuals, both designated σe² if they are assumed equal; and τu² is the cluster-level variance. In Model 6, γP is the effect of pretest on posttest; σP² is the variance of the pretest; σe0² and σe1² are the individual-level error variances after adjusting for pretest; and τu² is the cluster-level variance for the clustered conditions after adjusting for pretest. In each case, σY² represents the total posttest variance after adjusting for any cluster or treatment effects but not adjusting for pretest; ρY is the posttest ICC for clustered individuals, adjusting for treatment; and ρpre,post represents the pretest-posttest correlation after adjusting for treatment for unclustered individuals, and after adjusting for treatment and cluster membership for clustered individuals. Note that γP is written as γ10 in Model 6, but it seemed clearer to write it as γP here to emphasize that it represents the pretest and not the first factor.
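The partial-EIC formula from Table 5 can be sketched the same way, under the equal-error-variance assumption noted in the table. The function name and defaults are ours; J0 here is the number of unclustered individuals, matching Table 5's notation.

```python
def var_coef_partial_eic(J1, J0, n, sigma_y2=1.0, rho_y=0.1,
                         rho_pre_post=0.0):
    """Var(gamma_hat) for any effect coefficient under partial EIC
    (Table 5), assuming equal individual-level error variances in the
    clustered and unclustered conditions (sigma_e0^2 = sigma_e1^2)."""
    tau_u2 = sigma_y2 * rho_y / (1.0 - rho_y)
    sigma_e2 = sigma_y2 * (1.0 - rho_pre_post ** 2)
    # Variance-component form: cluster term, unclustered term, clustered term.
    by_components = (tau_u2 / (4 * J1) + sigma_e2 / (4 * J0)
                     + sigma_e2 / (4 * J1 * n))
    # Re-expressed form (second column of Table 5).
    by_icc = sigma_y2 * (rho_y / (4 * (1 - rho_y) * J1)
                         + (1 - rho_pre_post ** 2) / (4 * J1 * n)
                         + (1 - rho_pre_post ** 2) / (4 * J0))
    assert abs(by_components - by_icc) < 1e-12  # algebraically identical
    return by_icc
```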

The size of the coefficient γ in Expression 7 expresses the magnitude of its corresponding main effect or interaction. As noted earlier, in Models 5 and 6, the main effect of any factor Xk is quantified by ME = 2γk. Thus, if for power planning purposes the minimum detectable main effect of Xk is desired to be a quantity ME, then set γk = ME/2 in the numerator of Expression 7. The expression can also be restated in terms of Cohen’s standardized difference d = ME/σY, where σY is the posttest standard deviation, adjusting for any cluster and treatment effects that may exist but not adjusting for pretest. Specifically,

λ = σY²d² / (4 Var(γ̂)). (8)

Similarly, an interaction can be quantified by 4γa,b, with γa,b representing the coefficient for the interaction between Xa and Xb (e.g., 4γ04 in the case of the interaction between X1 and X2 in Models 5 and 6). Hence, if for power planning purposes the minimum detectable interaction is desired to be some quantity denoted q, then set γa,b = q/4 in the numerator of Expression 7 (see Dziak et al., 2012, Appendix A). Alternatively, if the interaction is defined as half the difference in simple effects, it is represented by 2γa,b, and one would set γa,b = q/2. Recall that these equalities hold regardless of the random effects structure because the population mean for each of the random effects is always zero for each cell in the design.

Finally, under the usual assumption of asymptotic normality of the maximum likelihood estimate, Var(γ̂) can be used to estimate the minimum detectable effect γMD. Supposing without loss of generality that γ > 0, assuming for simplicity that Var(γ̂) is known rather than estimated (as is usual in power calculations), and considering H0 rejections only in the correct direction, γ̂ would be judged statistically significant if γ̂/√Var(γ̂) > z1−α/2, where z1−α/2 ≈ 1.96 for an α = .05 two-sided test. Then γMD for a desired power 1 − β (e.g., .80) is the value of γ such that

P((γ̂ − γ)/√Var(γ̂) > z1−α/2 − γ/√Var(γ̂)) = 1 − β,

where (γ̂ − γ)/√Var(γ̂) has a standard normal distribution under H1. So z1−α/2 − γMD/√Var(γ̂) = −z1−β, where z1−β ≈ .8416 for 1 − β = .80. Then

γMD = √Var(γ̂) (z1−α/2 + z1−β). (9)

The minimum detectable difference and minimum detectable scaled difference (e.g., Oakes & Feldman, 2001) between levels of the factor of interest would then be 2γMD and 2γMD/σY, respectively. That is, 2γMD/σY would be the minimum detectable Cohen’s d.
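Expression 9 and the scaled version can be written directly with standard-normal quantiles. The function names below are ours; the formulas follow the derivation above.

```python
from statistics import NormalDist

def min_detectable_gamma(var_gamma_hat, alpha=0.05, power=0.80):
    """Expression 9: gamma_MD = sqrt(Var(gamma_hat)) * (z_{1-a/2} + z_{1-b})."""
    z = NormalDist().inv_cdf
    return var_gamma_hat ** 0.5 * (z(1 - alpha / 2) + z(power))

def min_detectable_d(var_gamma_hat, sigma_y=1.0, alpha=0.05, power=0.80):
    """Minimum detectable Cohen's d for a main effect: 2 * gamma_MD / sigma_Y."""
    return 2 * min_detectable_gamma(var_gamma_hat, alpha, power) / sigma_y
```

For α = .05 and 80% power the multiplier z1−α/2 + z1−β is approximately 1.96 + 0.84 ≈ 2.80.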

In the formulas for factorial designs with full EIC (Table 4), it is assumed that all experiment-induced clusters have the same size n. Additionally, balance is assumed on all of the factors. This means that each cell defined by the combinations of X1, X2, …, XK has the same number of clusters, each containing the same number of individuals. In the formulas for factorial designs with partial EIC (Table 5), it is assumed that all experiment-induced clusters have the same size n. Additionally, balance is assumed on all of the factors except X1. Thus, it is assumed that each cell defined by the combinations of X2,…,XK has the same number of clusters and the same number of individuals. Of course, such balance will not hold exactly in practice, but in the simulation experiments presented later we found that the power formulas still perform very well under small imbalances.

The formulas for partial EIC (Table 5) do not assume that the same number of individuals have X1 = +1 as have X1 = −1 (i.e., that as many individuals are clustered as unclustered), because such balance might not be feasible or even desirable. Cells with clustering are subject to cluster-level variance in addition to individual-level variance, so the means of cells with X1 = +1 are estimated with larger design effects. (The design effect is defined as the ratio of the sampling variance in the clustered population to the corresponding sampling variance that would be obtained if individuals were independent; the larger the design effect, the larger the sample size required to achieve adequate power; see Dziak et al., 2012, for a detailed discussion.) Therefore, perhaps more total individuals should be assigned to the X1 = +1 cells to compensate for the unequal amount of estimation error. Of course, it is also not assumed that the number of clusters in the X1 = +1 conditions equals the number of individuals in the X1 = −1 conditions. A reasonable conjecture, which we test in Simulation Study 2, is that the optimal allocation lies somewhere between these two extremes.

The formulas presented here are informally derived in Appendix B. Further evidence that they are valid can be obtained via simulation. In the following section we report the results of simulation studies illustrating the performance of the power formulas in factorial designs with full EIC (Simulation Experiment 1) or partial EIC (Simulation Experiment 2), respectively.

Monte Carlo Simulation Studies

We conducted two simulation studies to address three primary questions about the feasibility of factorial designs with EIC. First, are the null hypothesis tests for the main effects and interactions valid (i.e., is the Type I error rate no higher than α when H0 is true)? Second, is there acceptable statistical power for main effects and interactions in the context of a screening experiment with a realistic number of individuals? Third, do the proposed power formulas give reasonably accurate estimates of the power over a range of situations (i.e., for either main effects or interactions, and in complete or fractional factorial designs with EIC, and for a range of sample sizes, cluster sizes, and ICCs)? We address these questions separately for factorial designs with full EIC (Study 1) and partial EIC (Study 2). In the case of partial EIC, we investigate two additional questions—whether individuals should be assigned in a balanced or intentionally unbalanced way on the clustering factor X1, and whether the answers to the above questions change appreciably depending on whether error variances are assumed equal between clustered and unclustered individuals. The data were simulated and analyzed using SAS.

Simulation Study 1: Factorial Designs with Full EIC

Methods

Data-generating model

Each simulated dataset was based on a simulated randomized experiment with five effect-coded dichotomous factors X1j, X2j, …, X5j, using the following ANCOVA model:

Yij = γPPij + uj + γ1X1j + γ3X3j + γ1,3X1jX3j + eij. (10)

Here, eij ~ N(0, σe²) is the individual-level random error, and uj ~ N(0, τu²) is the random effect of cluster. This model is essentially the full-EIC (Cj = 1 for all j) version of Model 6, but with five factors instead of three and some coefficients set to zero. We use slightly simplified notation here instead of the more formal multilevel subscripts of the earlier models; for example, we use γP (rather than γ10 as in Model 6) to denote the effect of pretest on posttest. Also, for simplicity we set the intercept to zero, because the intercept cancels out of the contrasts of interest and therefore does not affect the power of the tests of interest.

For convenience we assume that the pretest variance σP² is 1 and that σY², the posttest variance after controlling for treatment and cluster effects, is 1. As in the simulations of Dziak et al. (2012), we assume that ρpre,post, the pretest-posttest correlation after controlling for treatment and cluster, is 0.65. Because the model implies that σY² = Var(Yij|uj) = γP²σP² + σe² and that ρpre,post = γPσP/σY, some algebra gives γP = 0.65 and σe² = 1 − 0.65² = 0.5775. We assume that the posttest ICC ρY is either 0.10 or 0.20. Because the model implies that ρY = τu²/(τu² + γP²σP² + σe²), some algebra gives τu² = ρY/(1 − ρY). This means we set τu² to either 0.1/(1 − 0.1) ≈ .1111 or 0.2/(1 − 0.2) = .25.
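This algebra can be sketched as a small helper that backs out the data-generating parameters from the target correlations. The function name is ours; the defaults mirror the Simulation Study 1 setup (σP² = σY² = 1).

```python
def sim1_variance_components(rho_pre_post=0.65, rho_y=0.10,
                             sigma_p2=1.0, sigma_y2=1.0):
    """Back out gamma_P, sigma_e^2, and tau_u^2 from the targets used
    in Simulation Study 1."""
    # rho_pre,post = gamma_P * sigma_P / sigma_Y  =>  gamma_P:
    gamma_p = rho_pre_post * (sigma_y2 ** 0.5) / (sigma_p2 ** 0.5)
    # sigma_Y^2 = gamma_P^2 sigma_P^2 + sigma_e^2  =>  sigma_e^2:
    sigma_e2 = sigma_y2 - gamma_p ** 2 * sigma_p2   # 1 - 0.65^2 = 0.5775
    # rho_Y = tau_u^2 / (tau_u^2 + sigma_Y^2)  =>  tau_u^2:
    tau_u2 = sigma_y2 * rho_y / (1.0 - rho_y)       # rho_Y / (1 - rho_Y)
    return gamma_p, sigma_e2, tau_u2
```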

We assume for simplicity that γ1 = γ3 = γ1,3 and set this common value to 0, 0.1, 0.15, or 0.25. The conditions in which the coefficients are set to 0 enable estimation of the Type I error rate, and the conditions in which they are nonzero allow estimation of power for different effect sizes. Because σY = 1, setting γ = 0.1 for a main effect corresponds to d = 2γ/σY = 0.2 (“small” in Cohen, 1988), γ = 0.15 corresponds to d = 0.3 (“small to moderate”), and γ = 0.25 corresponds to d = 0.5 (“moderate”).

Design

We assumed two scenarios regarding the overall design of the study. The first was a complete factorial. In this case, each of the 2×2×2×2×2 = 32 possible conditions defined by the five factors was used. The second was a fractional factorial, specifically a half fraction, in which only 16 of these conditions were used. This half-fraction design was the same as the one shown in Table 4 of Dziak et al. (2012). In this design, each main effect is aliased with a four-way interaction, and each two-way interaction is aliased with a three-way interaction.

Simulated sample size

Depending on the simulation scenario, we modeled an experiment with 300, 400, 500, or 600 total individuals available, to be assigned to clusters of size 5 or 10. Note that the upper end of this range of sample sizes is not uncommon in health behavior intervention research; for example, in a meta-analysis of randomized trials (or quasi-experimental designs) in the area of tailored health behavior change interventions, Noar, Benac, and Harris (2007) found that over 50% of the reviewed studies had a sample size greater than 500. We simulated the data such that each simulated individual drops out of the study with independent random probability 0.20, which sometimes causes the cluster sizes to be unequal. When predicting power using the power formula, the 20% dropout was taken into account by treating the cluster size as 4 or 8 instead of 5 or 10, but the inequality of the cluster sizes was not further taken into account. This provides an opportunity to use simulation results to check for the robustness of the formula to somewhat unbalanced cluster sizes.

Assumed model for performing tests

In real life, a researcher analyzing data from a factorial study does not know for certain which interactions are negligible and which are not. That is, the investigator does not know in advance that many of the possible interactions have coefficients of zero. Therefore, we assume that the investigator fits the following model.

Yij = γ0 + γPPij + uj + γ1X1j + γ2X2j + γ3X3j + γ4X4j + γ5X5j + γ1,2X1jX2j + γ1,3X1jX3j + γ1,4X1jX4j + γ1,5X1jX5j + γ2,3X2jX3j + γ2,4X2jX4j + γ2,5X2jX5j + γ3,4X3jX4j + γ3,5X3jX5j + γ4,5X4jX5j + eij (11)

Model 11 does not include three- or four-way interactions, for three reasons. First, they are zero in the true data-generating model, although of course this would not be known to a real-world investigator. Second, researchers in practice may choose to fit parsimonious models lacking third- and higher-order interactions, because such complex interaction effects are typically difficult to detect and interpret. Third, in the half-fraction scenario it would be impossible to fit a model that includes these interactions, because each would be aliased with an effect that is already in Model 11 (see Collins et al., 2009). The SAS code for fitting this model is provided in Appendix C.

Other technical details

The significance of each effect was decided by a marginal (“Type III” in SAS) significance test on the appropriate coefficient. For the purpose of predicting power, the assumed denominator degrees of freedom was counted as the number of clusters minus the number of regression parameters (the latter was counted as 17: one intercept, one coefficient for the pretest, five main effects, and 10 two-way interactions). When actually performing the test, a Satterthwaite approximation was used to provide slightly more power.

Summary

Table 6 summarizes all of the simulation study scenarios defined by the characteristics described above (128 scenarios). For each scenario, 5,000 datasets were generated.

Table 6.

Scenarios in Simulation Study 1

Independent variable Levels
Design of experiment Complete factorial or fractional (half) factorial
Number of individuals 300 or 400 or 500 or 600
Size of each cluster (before dropout) 5 or 10
True effect size 0 or 0.10 or 0.15 or 0.25
True intraclass correlation 0.10 or 0.20

Results

For simplicity, we focus only on γ1 when reporting the Type I error rate and power, although we could equivalently have examined a different parameter. Results for the main effect γ3 or the interaction γ1,3 are very similar to those for γ1, in each of the 128 scenarios. The absolute difference in Type I error or in power between these parameters was never more than .026 in any scenario, and was often less than .005. Moreover, as expected, there was little systematic difference, either in Type I error or in power, between complete and fractional factorial experiments for comparable conditions. When comparing these designs, the absolute difference in Type I error was always less than .015, and the absolute difference in power was always less than .04 and usually less than .02. Thus, we consider the two types of design together when reporting the results. Below, we summarize the results by considering each of the motivating questions in turn.

First, the tests of the main effects and interactions were valid, in that for conditions with a true effect size of zero, the Type I error rate for γ1 was very close to nominal (between .038 and .059 for a nominal .05 test). Similarly, Type I error for the effects omitted from Model 10 (e.g., γ2, γ4, γ5, γ1,2, γ1,4) was always between .039 and .060, regardless of scenario.

Second, the results (in Table 7) indicated that acceptable statistical power can be obtained with feasible sample sizes. For example, suppose that 500 individuals are assigned to clusters of size 5. If ICC is 0.1, then we observe a power of 0.82 for detecting a small to moderate (d = .3) effect size. As expected, the same scenario with a larger ICC (0.2) yielded lower power (0.65) for detecting a small to moderate (d = .3) effect size, yet acceptable power for detecting a moderate (d = .5) effect size.
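The predicted value for this scenario can be reproduced from the earlier formulas. The sketch below uses the large-v normal approximation described above rather than the exact noncentral F (the tabled prediction of .83 uses v = 100 − 17 = 83 denominator df, so the approximation runs slightly high); the variable names are our own.

```python
from statistics import NormalDist

# Scenario from Table 7: N = 500 in clusters of 5 (4 after 20% dropout),
# posttest ICC = .10, pretest-posttest correlation = .65, d = .3.
J, n, rho_y, rho_pre, d = 100, 4, 0.10, 0.65, 0.3

# Table 4 (full EIC, with pretest), with sigma_Y^2 = 1:
var_gamma = rho_y / ((1 - rho_y) * J) + (1 - rho_pre ** 2) / (J * n)
lam = d ** 2 / (4 * var_gamma)          # Expression 8 with sigma_Y = 1

# Large-df normal approximation to the noncentral F(1, v) power:
power = NormalDist().cdf(lam ** 0.5 - NormalDist().inv_cdf(0.975))
```

This yields roughly .84, in line with the tabled prediction of .83 and the simulated power of .82.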

Table 7.

Power Estimates in Simulation Study 1

d = .2 d = .3 d = .5

N ni Complete Fractional Predicted Complete Fractional Predicted Complete Fractional Predicted
ICC= .1
300 5 0.29 0.30 0.32 0.55 0.59 0.61 0.94 0.96 0.96
300 10 – 0.18 0.22 – 0.36 0.43 – 0.76 0.84
400 5 0.40 0.41 0.41 0.72 0.73 0.74 0.99 0.99 0.99
400 10 0.27 0.27 0.29 0.55 0.52 0.56 0.93 0.93 0.94
500 5 0.49 0.49 0.50 0.82 0.82 0.83 1.00 1.00 1.00
500 10 0.34 0.35 0.36 0.63 0.67 0.67 0.97 0.98 0.98
600 5 0.57 0.56 0.57 0.88 0.89 0.90 1.00 1.00 1.00
600 10 0.40 0.40 0.42 0.70 0.73 0.76 0.99 0.99 0.99

ICC= .2
300 5 0.20 0.21 0.23 0.39 0.42 0.44 0.81 0.84 0.85
300 10 – 0.13 0.15 – 0.21 0.27 – 0.52 0.61
400 5 0.28 0.28 0.29 0.54 0.55 0.56 0.93 0.94 0.94
400 10 0.18 0.18 0.19 0.34 0.35 0.36 0.74 0.73 0.76
500 5 0.33 0.35 0.35 0.65 0.64 0.66 0.97 0.97 0.98
500 10 0.21 0.22 0.23 0.41 0.44 0.44 0.83 0.86 0.86
600 5 0.41 0.41 0.41 0.72 0.74 0.74 0.99 0.99 0.99
600 10 0.25 0.25 0.27 0.47 0.50 0.52 0.89 0.90 0.92

Notes. N denotes the total sample size before dropout, ni the cluster size before dropout (so that N/ni is the number of clusters), “Complete,” “Fractional,” and “Predicted” refer to the simulated power for the complete factorial, the simulated power for the fractional factorial, and the predicted power from the proposed formula. d refers to the Cohen’s d for a main effect (i.e., twice the γ parameter), and ICC refers to posttest intraclass correlation. Impossible designs are marked with a dash.

Third, the simulations indicated that the power formula provides a very good estimate of power, except in cases with only 30 clusters. In these cases, power was limited because of the extremely limited degrees of freedom (17 parameters to estimate but only 30 independent experimental units). In most cases, the power formula over-predicted power by about 1% or 2%, perhaps due to failure to correct for the effects of unequal numbers of members per cluster or unequal numbers of clusters per condition, but this is a very small difference considering the large amount of uncertainty inherent in power analysis.

Simulation Study 2: Factorial Designs with Partial EIC

In addition to the three primary questions noted earlier, this simulation study was also designed to address two questions concerning the effects of the allocation proportion and the equality of error variances, two topics that are of special interest in partial EIC settings. Specifically, the fourth question is as follows: for maximum power, should individuals be assigned in a balanced way on X1? Ordinarily, it is desirable to have balanced assignment on factors. However, as discussed earlier, individuals with X1 = +1 will be subject to cluster-level variance in addition to their individual-level variance, so the means of cells with X1 = +1 essentially have larger sampling variance. Thus, we hypothesized that more individuals should be assigned to the +1 level to counteract this unequal amount of estimation error, such that the optimal allocation proportion (of individuals to X1 = +1) would be more than 50%. The fifth question was as follows: does the proposed power formula continue to accurately predict power in cases where the error variances for unclustered (σe0²) and clustered (σe1²) individuals are not equal? Specifically, we considered the possibility that σe0² > σe1², so that being in a cluster makes individuals more similar, above and beyond the shared cluster intercept.

Methods

Each simulated dataset was based on a simulated randomized experiment with five dichotomous effect-coded factors X1j, X2j, …, X5j, using the following data-generating model:

Yij = γPPij + ujCj + γ1X1j + γ3X3j + γ1,3X1jX3j + eij. (12)

Here, uj ~ N(0, τu²), and eij ~ N(0, σe0²) for unclustered individuals and eij ~ N(0, σe1²) for clustered individuals. As in Simulation Study 1, we assume that γ1 = γ3 = γ1,3, the pretest variance (σP²) is 1, the posttest variance after controlling for treatment and cluster effects (σY²) is 1, and γP = 0.65. However, unlike the scenario used for Simulation Study 1, here the data-generating model includes a cluster-generating treatment factor X1j, as well as Cj, a dummy-coded version of X1j, which is set to 0 if X1j = −1 and to 1 if X1j = +1. Hence, the posttest ICC (ρY) is relevant only for the clustered individuals, and for them it is set to either 0.10 or 0.20. In the equal-variance scenario, we set σe0² = σe1² = 1 − γP² = 0.5775 to achieve the desired pretest-posttest correlation ρpre,post = γPσP/σY = .65, and we set τu² = ρY/(1 − ρY) to achieve the desired posttest ICC of ρY = .10 or ρY = .20. In the unequal-variance scenario, we set the error variance to be twice as large for unclustered as for clustered individuals, so that σe1² = (2/3)(1 − γP²) and σe0² = (4/3)(1 − γP²). As a result, the pretest-posttest correlation differs somewhat between clustered and unclustered individuals, namely ρpre,post = γP/√(γP² + (2/3)(1 − γP²)) = .723 and ρpre,post = γP/√(γP² + (4/3)(1 − γP²)) = .595, respectively. We set τu² = [γP² + (2/3)(1 − γP²)]·ρY/(1 − ρY) to achieve the desired posttest ICC in the unequal-variance scenario.
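The implied pretest-posttest correlations in the unequal-variance scenario can be verified numerically; the helper below (with a function name of our own) evaluates ρpre,post = γP/√(γP² + σe²) for each error variance, with σP² = 1.

```python
def rho_pre_post(gamma_p, sigma_e2):
    """Pretest-posttest correlation implied by the model, sigma_P^2 = 1."""
    return gamma_p / (gamma_p ** 2 + sigma_e2) ** 0.5

gp = 0.65
base = 1 - gp ** 2                                   # 0.5775
rho_clustered = rho_pre_post(gp, (2 / 3) * base)     # smaller error variance
rho_unclustered = rho_pre_post(gp, (4 / 3) * base)   # doubled error variance
```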

Additionally, for simplicity we set γ1, γ3, and γ1,3 in the current scenario to either 0 or 0.15 (roughly corresponding to a Cohen’s d of 0 or 0.3). Lastly, we assume that the investigator chooses to assign levels of X1j either in a naively balanced way (50% clustered, 50% unclustered) or else in a seemingly unbalanced way that allocates either 60% or 70% of the individuals to clustered conditions. Note that allocating a larger portion of individuals to the clustered conditions is done in a way that increases the number of clusters in the clustered conditions, rather than the number of individuals within each cluster. Allocating more individuals to the clustered condition by increasing cluster size would be a less efficient way to improve power, because the benefit of additional observations per cluster is partially balanced out by the increased design effect associated with greater cluster sizes for a given ICC (see Baldwin et al., 2011).
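The allocation trade-off can be previewed with the Table 5 variance formula. The sketch below uses hypothetical inputs of our own (N = 400 retained individuals, clusters of 4, ICC = .20, no pretest, equal error variances) and compares the three allocation proportions examined in the study; fractional cluster counts are allowed purely for illustration.

```python
def var_effect_partial(p, N=400, n=4, rho_y=0.2):
    """Var(gamma_hat) from Table 5 (no pretest, equal error variances,
    sigma_Y^2 = 1) when a proportion p of N individuals is clustered,
    with the extra individuals forming additional clusters of size n."""
    J1 = p * N / n        # number of clusters
    J0 = (1 - p) * N      # number of unclustered individuals
    return (rho_y / (4 * (1 - rho_y) * J1)
            + 1 / (4 * J1 * n) + 1 / (4 * J0))

variances = {p: var_effect_partial(p) for p in (0.5, 0.6, 0.7)}
```

With these inputs the sampling variance is smallest modestly above a 50/50 split and rises again by 70%, consistent with the conjecture, tested in Simulation Study 2, that the optimum lies between the extremes.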

Design

We assumed either a complete or half factorial, using the same designs as in Simulation Study 1, except that now X1 not only served as an experimental factor with a fixed effect but also determined whether an individual was assigned to a cluster.

Simulated sample size

As before, we modeled an experiment with 300, 400, 500, or 600 total individuals available; those assigned to the clustered conditions were placed in clusters of size 5. Each individual had an independent probability of 0.20 of dropping out, which was taken into account when calculating power, just as before.

Assumed model for performing tests

As before, we assume that the investigator fits a model with all main effects and two-way interactions. That is, we use Model 11 as before, except that we replace $u_j$ with $C_j u_j$. The SAS code for fitting this model is shown in Appendix C.

Other technical details

As in Simulation Study 1, the significance of each effect was decided by a marginal (“Type III” in SAS) significance test on the appropriate regression coefficient. The assumed denominator degrees of freedom of the test for predicting power was conservatively estimated as the number of clusters minus the number of regression parameters. When actually performing the test, a Satterthwaite approximation was used. Residual error variances were allowed to differ between clustered and unclustered individuals in the analysis.

Summary

Table 8 summarizes all simulation study scenarios defined by the characteristics described above (192 scenarios). For each scenario, 5000 datasets were generated.

Table 8.

Scenarios in Simulation Study 2

Independent variables Levels
Design of experiment Complete factorial or fractional (half) factorial
Number of individuals 300 or 400 or 500 or 600
Ratio of error variances (unclustered to clustered individuals) 1:1 or 2:1
True effect size 0 or 0.15
True intraclass correlation 0.10 or 0.20
Allocation proportion 50% or 60% or 70%

Note. The cluster size (before dropout) was fixed at 5, unlike in Simulation Study 1.

Results

The absolute difference in Type I error between γ1 and either γ3 or γ1,3 in the d = 0 scenarios did not exceed 0.012. Also, results for γ1 with respect to power were similar to those for γ3 or γ1,3. Specifically, the absolute difference in power between γ1 and either γ3 or γ1,3 in the d=.30 scenarios did not exceed 0.023 in any scenario. Therefore, only results for γ1 are presented and discussed. The equivalence among these effects is not immediately intuitive, given that γ1 is the main effect of a factor that determines clustering (X1), while γ3 is the main effect of a factor that was assigned after clustering (X3), and γ1,3 is the interaction between the two. Still, as we explain in detail in Appendix B, such equivalence is reasonable mathematically, because of the factorial structure of the experiment. That is, all effects are estimated using data from all individuals; therefore all effect estimates are subject in a similar manner to the entire cluster structure of the design.

Simulated and predicted power are compared in Table 9. The difference in power between complete and fractional factorial designs was usually extremely small, but sometimes there was a slight advantage to fractional factorials, especially when N = 300. This advantage has no obvious theoretical explanation but may be an artifact of randomization. With 300 individuals, 50% allocation to clustering, and cluster size 5, there are 30 clusters. For the fractional factorial design, 3 clusters were randomized to each of the 8 clustered conditions, and the 6 leftover clusters were randomized independently to any clustered condition. For the complete factorial design, 1 cluster was assigned to each of the 16 clustered conditions, and the 14 leftover clusters were then assigned independently to any clustered condition, allowing a higher probability of poorer balance. However, this could be avoided in practice by restricted randomization. Because the two designs performed so similarly, the five simulation questions can be addressed for complete and fractional factorial designs together.

Table 9.

Power Estimates in Simulation Study 2

Equal variances Unequal variances
N Alloc. Complete Fractional Predicted Complete Fractional Predicted
ICC = .1
300 50% 0.69 0.69 0.67 0.71 0.72 0.70
60% 0.72 0.71 0.70 0.73 0.72 0.70
70% 0.69 0.69 0.68 0.66 0.68 0.65
400 50% 0.82 0.82 0.82 0.84 0.86 0.84
60% 0.84 0.85 0.83 0.84 0.84 0.83
70% 0.81 0.82 0.81 0.79 0.79 0.78
500 50% 0.91 0.91 0.90 0.92 0.93 0.92
60% 0.91 0.91 0.91 0.91 0.91 0.91
70% 0.89 0.89 0.89 0.87 0.87 0.87
600 50% 0.95 0.95 0.95 0.96 0.96 0.96
60% 0.95 0.96 0.95 0.95 0.96 0.95
70% 0.94 0.94 0.94 0.92 0.93 0.93
ICC = .2
300 50% 0.55 0.57 0.55 0.59 0.61 0.58
60% 0.61 0.61 0.59 0.61 0.63 0.61
70% 0.59 0.60 0.59 0.57 0.60 0.58
400 50% 0.71 0.72 0.70 0.75 0.75 0.74
60% 0.73 0.74 0.73 0.77 0.78 0.75
70% 0.73 0.73 0.73 0.73 0.73 0.72
500 50% 0.81 0.80 0.80 0.84 0.84 0.84
60% 0.84 0.84 0.83 0.85 0.85 0.84
70% 0.82 0.82 0.82 0.81 0.82 0.81
600 50% 0.87 0.87 0.87 0.90 0.91 0.90
60% 0.89 0.90 0.89 0.90 0.91 0.90
70% 0.88 0.88 0.89 0.87 0.87 0.88

Note. N denotes total sample size before dropout. “Alloc.” refers to the proportion of the sample assigned to clusters. “Complete,” “Fractional,” and “Predicted” refer to simulated power for complete factorial, simulated power for fractional factorial, and predicted power from the proposed formula. ICC refers to posttest ICC. In each case, the effect size is d = .3 (where d refers to the Cohen’s d for a main effect, which is twice the γ parameter) and the pre-dropout cluster size is ni = 5.

Regarding the first question, about validity of the null hypothesis tests, we found that for scenarios in which the true effect of Factor 1 was zero, the Type I error rate was between .039 and .058 for this effect in every scenario. In addition, coefficients that were missing from the data-generating model had Type I error rates between .040 and .064 in every scenario. It is reasonable to conclude that Type I error rate is essentially nominal, as desired.

Regarding the second question, about the feasibility of factorial designs with partial EIC, the results (in Table 9) indicate that acceptable statistical power can be obtained in such a setting. For example, a total of 400 individuals, with half allocated to clusters of size 5, yields acceptable power (slightly above 0.8) for detecting a small to moderate effect size, given a small ICC.

Regarding the third question, about the performance of the proposed power formulas, as shown in Table 9, the power formulas appear to be very accurate in terms of predicting power for complete and fractional factorial designs with partial EIC, although the estimates tended to be slightly conservative due to the intentionally conservative degrees of freedom estimate.

Regarding the fourth question, about allocation proportion, there was usually little difference between 50%, 60%, and 70% allocation to clusters. 60% allocation to clusters was often, although not always, slightly more powerful than 50% allocation. 70% allocation was generally slightly less powerful than 50% or 60% allocation. It is reasonable to recommend either equal allocation or only slightly greater allocation to the clustered condition.

Regarding the fifth question, about the performance of the proposed power formula in cases where the error variances for unclustered and clustered individuals are not equal, we found the error variance scenarios to have very little systematic influence on power; the power formula performed well in either scenario. Consistent with the power formula in Table 5, these results indicate that power generally improves to the extent that the overall level of error variance is lower, regardless of whether the clustered and unclustered conditions differ in terms of their respective error variances. To clarify this, recall that we compared the unequal and equal error variance scenarios while holding the overall amount of error variance fixed. Specifically, we set the error variance either to .5775 for all individuals, or to .385 for clustered and .770 for unclustered individuals, two numbers whose average is .5775. Table 5 makes it clear that what matters to power is a combination of $\sigma_{e0}^2$ and $\sigma_{e1}^2$. Although this combination is weighted according to allocation ($J_1 n$ versus $J_0$), allocation proportions in this simulation study were set to be equal or near-equal, and therefore $\sigma_{e0}^2$ and $\sigma_{e1}^2$ contributed about equally to the overall sampling variance of the treatment effect. Reducing one while increasing the other, then, had little net effect on sampling variance, and therefore little effect on predicted or simulated power.
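This intuition can be made concrete with a small calculation. The sketch below is our own reconstruction, not the article's code: the sampling variance is assembled from the Appendix B logic as $\mathrm{Var} = \tfrac{1}{4}[\tau_u^2/J_1 + \sigma_{e1}^2/(J_1 n) + \sigma_{e0}^2/J_0]$ (treat the exact expression in Table 5 as authoritative), and power uses a normal approximation rather than the article's noncentral t with conservative degrees of freedom, so it runs slightly higher than the Table 9 predictions.

```python
import math

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def approx_power(gamma, tau2_u, s2_e1, s2_e0, J1, n, J0):
    """Approximate two-sided power (alpha = .05) for an effect-coded
    coefficient in a partial-EIC factorial design. J1 = number of clusters,
    n = cluster size, J0 = number of unclustered individuals.
    Sampling variance reconstructed from the Appendix B derivation."""
    var = 0.25 * (tau2_u / J1 + s2_e1 / (J1 * n) + s2_e0 / J0)
    delta = gamma / math.sqrt(var)
    z = 1.959963984540054  # two-sided critical value at alpha = .05
    return norm_cdf(delta - z) + norm_cdf(-delta - z)

# Equal vs. unequal error variances, overall level held fixed, near-equal
# allocation (here J1*n == J0): power is essentially unchanged.
p_equal = approx_power(0.15, 1/9, 0.5775, 0.5775, J1=40, n=4, J0=160)
p_unequal = approx_power(0.15, 1/9, 0.385, 0.770, J1=40, n=4, J0=160)
```

With `J1*n == J0` the two error variances enter with equal weight, so swapping variance between them leaves the total sampling variance, and hence power, unchanged, exactly as observed in the simulations.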

In practice, it is not clear how someone planning a partial EIC factorial experiment could predict whether error variances would be equal or not. Therefore, since it is difficult to decide on their relative sizes, and their relative sizes do not matter very much, it seems reasonable to assume equal variances for simplicity when using the power formula, and then allow unequal variances for greater robustness when analyzing the data.

Discussion

In the current article we discussed modeling and power considerations for factorial designs with full and partial EIC. There has recently been increased interest in factorial designs as a tool for improving and developing high-quality multi-component interventions (e.g., Cook et al., 2016; Howard & Jacobs, 2016; Pellegrini et al., 2014; 2015). However, EIC is prevalent in psychological and intervention research (Baldwin et al., 2011; Roberts & Roberts, 2005), and, to our knowledge, no past literature has explained how to implement factorial designs in such a setting. Therefore, the current study serves as a bridge between these two design literatures, helping investigators plan and properly analyze data arising from factorial designs with EIC. The results of our simulation studies indicated that under reasonable scenarios of number of clusters, number of individuals within clusters, and ICC, adequate power can be achieved for detecting main effects and selected two-way interactions in factorial designs with full or partial EIC. Factorial designs with partial EIC usually offered better power than those with full EIC; however, it is possible to obtain adequate power even with full EIC.

The pattern of results obtained here is consistent with the results seen in between-cluster RCTs (Murray, 1998) and multilevel factorial designs (Dziak et al., 2012). Specifically, power increases to the extent that ICC is lower. Moreover, power increases to the extent that the number of clusters and the number of individuals within a cluster increase, with the number of clusters having more influence on power than the number of individuals within a cluster. Although our results showed that in some scenarios the number of clusters had to be rather large to obtain adequate power, this is not a result of the presence of multiple factors, but rather a result of the small to moderate effect size assumed in these scenarios. Indeed, the simulated power per factor in the five-factor screening scenario can be predicted reasonably with power formulas that assume a single factor (except for adjustment of the degrees of freedom). However, because small effect sizes were assumed in most scenarios, the sample size requirements were relatively high for some conditions. Small effect sizes were assumed here because in the context of screening experiments, the goal is to detect effects for each individual intervention component. These effects would reasonably be expected to be smaller than the effect of an entire intervention package consisting of multiple active components.

Complete vs. Fractional Factorials with EIC

As expected, we found in both the full and partial EIC simulations that the complete and fractional designs were about equally powerful given an equal total sample size. Hence, the decision of whether to use complete or fractional factorial should depend on other practical, ethical and scientific considerations. Specifically, although a fractional factorial design requires the same number of individuals as a complete factorial, fewer experimental conditions are needed. Hence a fractional factorial design might be easier to implement and/or less costly than a complete factorial design (see Collins et al., 2009).

A potential disadvantage of fractional factorial designs is that they always involve some aliasing of effects. The strategy behind the use of fractional factorial designs involves deliberately aliasing effects of primary scientific interest, typically main effects and lower-order interactions, with higher-order interactions that are not of primary scientific interest and that can be assumed to be negligible in size. In these cases, the estimate of the aliased effect, which is an estimate of a combination of two or more effects, is attributed to the effect of primary interest. In cases where the assumption about the size of these higher-order interactions is incorrect, this attribution is also incorrect, and the resulting effect estimate will be an under- or over-estimate of the effect of primary interest. This, in turn, has an impact on the Type I error rate and power (Collins et al., 2009; Dziak et al., 2012). In our simulation studies we did not include substantial higher-order interactions when we generated the data. Hence, as expected, there was little systematic difference in terms of Type I error and power for the effects of interest between complete and fractional factorial experiments. In practice, software can be used to properly plan the aliasing structure of a fractional factorial design so as to reduce the risk of substantial bias (see Wu & Hamada, 2011 for a more detailed discussion of aliasing).

Power Planning Resources

In the current article, we provide power planning formulas for factorial designs with either full or partial EIC. Our simulation results indicated that the proposed power planning approach provides a reasonable approximation to the actual power for both cases. While the simulation results in Tables 7 and 9 cover only a few selected scenarios, the formulas in Tables 4 and 5 can be used very widely to guide investigators in planning factorial designs with EIC. Still, the results indicated that the power formula slightly over-predicted power for factorial designs with full EIC and slightly under-predicted power for factorial designs with partial EIC. These discrepancies were generally quite minor in practical terms.

Cluster Allocation in Factorial Designs with Partial EIC

Our results with respect to cluster allocation in factorial designs with partial EIC are consistent with prior investigations of cluster allocation in RCTs with partial EIC (Baldwin et al., 2011). Specifically, our results indicate that allocating more individuals to the clustered condition provides a small increase in power compared with equal allocation.

Roberts and Roberts (2005) provided the following formula for optimal allocation of individuals in large RCTs with partial EIC: $p/q = \sqrt{1 + (m-1)\,\mathrm{ICC}}$, where p is the proportion of individuals assigned to the clustered condition, q = 1 − p is the proportion of individuals assigned to the unclustered condition, m is the number of individuals within each cluster, and ICC is the posttest ICC. Based on this formula, for a cluster size of 5, the optimal cluster allocation proportion (p) is 54% clustered in the case of ICC = 0.1, and 57% clustered in the case of ICC = 0.2, regardless of sample size. This may apply to the factorial case as well, although our simulations were not precise enough to distinguish between these values. More precise estimates in specific circumstances could be achieved from specially designed simulation studies. There is probably a range of reasonable values for the proportion allocated to clusters, and these values can be chosen based on considerations other than power (e.g., ethical or practical). For example, if group treatment is highly expensive, a balanced allocation might be more warranted.
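The allocation values quoted above follow directly from the Roberts and Roberts (2005) formula; a minimal sketch (our own illustration):

```python
import math

def optimal_clustered_proportion(m, icc):
    """Roberts & Roberts (2005) allocation for partial EIC:
    p / q = sqrt(1 + (m - 1) * ICC), with q = 1 - p.
    Returns p, the optimal proportion assigned to the clustered condition."""
    ratio = math.sqrt(1.0 + (m - 1) * icc)
    return ratio / (1.0 + ratio)

p_low = optimal_clustered_proportion(m=5, icc=0.1)   # about 0.54
p_high = optimal_clustered_proportion(m=5, icc=0.2)  # about 0.57
```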

Randomization Plans for Factorial Designs with EIC

When planning the randomization scheme for factorial designs with EIC, careful consideration should be given to the potential for contamination. If individuals within a cluster receive different experimental conditions, then the potential for contamination could be high (see Dziak et al., 2012). Therefore, it will often be helpful to have a randomization plan that assures that everyone in a given cluster is also in the same condition. This can be done in various ways, depending on whether full or partial EIC is used. When planning factorial designs with full EIC, investigators can either begin by assigning individuals to clusters and then randomly assigning clusters to conditions, or they can equivalently begin by assigning individuals to the conditions and then assign individuals to clusters within each condition. Either method assures that all the members of a given cluster receive the same condition.

In the case of factorial designs with partial EIC, individuals cannot be assigned to clusters before they are assigned to conditions, because certain conditions are clustered and others are not. Therefore, one option would be to assign individuals first to conditions, and then randomly assign individuals to clusters within each clustered condition. Another option would be to divide the randomization scheme into three steps. First, assign individuals to the two levels of the clustering factor X1. Second, randomly assign individuals in the On level (i.e., the clustering level) of X1 to clusters. Finally, randomly assign clusters in the On level of X1, as well as individuals in the Off level of X1, to the experimental conditions resulting from crossing the remaining (non-clustering) factors X2, …, XK. Once again, either method leads to clusters in which all members have the same treatment condition.
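The three-step scheme can be sketched in a few lines of code. This is our own illustration of the procedure (function and variable names are hypothetical), assuming balanced 50/50 allocation to the clustering factor and effect-coded factors with two levels each:

```python
import random

def randomize_partial_eic(ids, cluster_size, n_factors, seed=0):
    """Three-step randomization for a partial-EIC factorial design:
    1) split individuals between the On/Off levels of the clustering
       factor X1; 2) group the On half into clusters; 3) randomize whole
    clusters (On level) and individuals (Off level) to the
    2**(n_factors - 1) conditions formed by the remaining factors."""
    rng = random.Random(seed)
    ids = list(ids)
    rng.shuffle(ids)
    half = len(ids) // 2
    on_ids, off_ids = ids[:half], ids[half:]

    # Step 2: form clusters within the On level of X1.
    clusters = [on_ids[i:i + cluster_size]
                for i in range(0, len(on_ids), cluster_size)]

    # Step 3: assign clusters (as wholes) and Off-level individuals to the
    # conditions defined by crossing X2, ..., XK (coded 0 .. 2**(K-1)-1).
    n_conditions = 2 ** (n_factors - 1)
    assignment = {}
    for cluster in clusters:
        condition = rng.randrange(n_conditions)
        for person in cluster:
            assignment[person] = ("On", condition)
    for person in off_ids:
        assignment[person] = ("Off", rng.randrange(n_conditions))
    return clusters, assignment

clusters, assignment = randomize_partial_eic(range(40), 5, n_factors=4)
# Everyone in a given cluster shares the same condition, as required:
assert all(len({assignment[p] for p in c}) == 1 for c in clusters)
```

Restricted randomization (e.g., blocking the leftover clusters across conditions) could replace the independent `randrange` draws in Step 3 to improve balance.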

Limitations and Directions for Future Research

The discussion of factorial designs with partial EIC in this article is limited to designs in which only one of the factors involves generating clusters. Note that this does not mean that only one factor is cluster-level or is affected by clustering, but rather that there is only a single way in which the individuals are clustered. However, other kinds of partial EIC factorial designs are possible. For example, consider the investigation of five intervention components, of which one involves generating in-person support groups, and another involves generating online support groups via social media tools. In this case, there will be two cluster-generating factors, one aiming to assess the efficacy of in-person group support (i.e., In-Person Support = On) vs. no group support (i.e., In-Person Support = Off); and the other aiming to assess the efficacy of online group support (i.e., Online Support = On) vs. no online group support (i.e., Online Support = Off). Such an experiment is possible but beyond the scope of this paper.

Further, in the current study we assumed only one level of clustering. More complicated scenarios might occur if, for example, individuals are assigned to groups, and then multiple groups are led by each of a limited number of therapists. In these cases, there are technically two levels of clustering, as individuals are nested within groups and groups are nested within therapists. Studies involving such clustering structures use various means to address this dependency: balancing conditions across therapists, so that all therapists facilitate an approximately equal number of groups in each condition, in an attempt to reduce therapist-level effects (e.g., Herbert et al., 2009); ignoring the therapist level in primary analyses (Peterson, Mitchell, Crow, Crosby, & Wonderlich, 2009), sometimes after showing no differences in outcome between therapists within each treatment condition (Lecomte, Leclerc, Corbiere, Wykes, Wallace, & Spidel, 2008); or adding therapist as a covariate to control for possible therapist effects (e.g., Bergraff et al., 2014). The adequacy of these approaches is debated (e.g., Murray, 1998; De Jong, Moerbeek, & Van der Leeden, 2010; Wampold & Bolt, 2006). Hence, future investigations of factorial designs with EIC might focus on modeling considerations and power planning resources that accommodate more complicated designs with multiple levels of nesting.

Supplementary Material


Translational Abstract: Multilevel Factorial Designs with Experiment-Induced Clustering.

There are many ways to design an experiment, depending on what question a study is trying to answer. For a variety of reasons, however, most experiments in psychological science are conducted using a single design, the randomized controlled trial. Another type of design, the factorial experiment, holds great promise in psychological science research. In many instances, factorial experiments can be used to develop more efficient, more effective, and more personalized treatments, to name just a few advantages. However, during an experiment, participants are often clustered in units like schools, social support groups, or neighborhoods, such that participants in the same cluster may be more similar to each other than to participants in other clusters. These similarities can alter the outcomes of experiments in ways that reduce the reliability of the experiment’s results. Sometimes, these clusters are created by the experiment itself. Until now, there has been no way for researchers to account for this type of experiment-induced clustering (EIC) when conducting factorial experiments. Tools for dealing with EIC do exist for other types of experiments, however. This article provides statistical tools to allow researchers to decide whether a particular factorial experimental design involving EIC is feasible, as well as tools for analyzing data from such trials. The authors demonstrate that factorial experiments can be powerful and feasible even with EIC.

Acknowledgments

This project was supported by Awards P50 DA010075, P50 DA039838, P01 CA180945, R01 DK097364, R01 AA022931, and R01 DA039901 from the National Institutes of Health. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

The authors thank Dr. Michael Russell (Methodology Center, Penn State) for extremely valuable assistance in adapting their SAS code to estimate different error variances between conditions. They thank Amanda Applegate for her expert proofreading and editing help. John Dziak thanks Jessica Dolan for planning assistance and guidance.

Appendix A

Justification of the Proposed Regression Approach for Factorial Designs with Partial EIC

In this paper we proposed a new approach to modeling treatment effects in a factorial experiment with partial EIC. In that approach, we represent the cluster-generating factor twice, once with an effect-coded variable X1 (+1 for clustered and −1 for unclustered), and once with a dummy-coded C (1 for clustered and 0 for unclustered). We asserted that this approach allows investigators both to conveniently model the random cluster effect only for individuals in the clustered condition and to interpret the effects of multiple factors in a straightforward manner while using standard procedures in common analysis software (e.g., PROC MIXED in SAS; see Littell, Milliken, Stroup, Wolfinger & Schabenberger, 2006) to analyze data arising from a factorial design with partial EIC. Here, we explain in more detail why this approach does not result in a confounded model, despite the fact that the same information seems to be used twice. We begin by proposing an intuitively reasonable model for factorial experiments with partial EIC, one that many researchers would find acceptable but which cannot easily be fit using standard software. We then consider a simple way to re-express this model using only dummy codes, which closely resembles our Model 3, based in turn on Roberts and Roberts (2005) and Bauer et al. (2008). We show that this dummy-coding approach has some inconvenient features when dealing with more than one treatment factor, which is why we elected instead to use our hybrid parameterization using a dummy code and an effect code. We describe how the results of our hybrid parameterization can be translated to and from the results of the dummy-coding approach. For simplicity, we do not consider attrition or missingness in this presentation. 
Also, for simplicity of notation and without loss of generality, we suppose that there are two randomized factors, that each cluster consists of three individuals per cluster, and that error variances are the same in clustered and unclustered conditions. We ignore the pretest as in Model 5, but the basic structure of our arguments also applies to models such as 6.

To begin, consider that a reasonable model for analyzing data from a randomized experiment ought to take into account the way in which the randomization took place. In the factorial experiment with partial EIC as we described it, a typical individual is first randomly assigned to a level of Factor 1. Suppose for now that Factor 1 is dummy-coded and denote it as D1. If D1 = 1, then individual i is randomly assigned to a cluster j, which includes other individuals. If D1 = 0, then individual i is still given a cluster number j for bookkeeping reasons, but is the only individual in this “trivial cluster” of size one. Cluster j (whether trivial or nontrivial) is then assigned to a level of the second factor, D2, also dummy-coded as 1 or 0. Note that the assignment is at the cluster level. If D1 = 1, then clusters are assigned to the levels of the second factor to avoid contamination within a treatment group. If D1 = 0, then assignment can still be said to be at the cluster level, because there is no difference between randomizing the individual and randomizing the one-person cluster.

Now consider modeling the cluster responses Yj for cluster j conditionally on D1 and D2. Such a model would most naturally be expressed in parts. If D1 = 0, then Yj = [Y1j] is a vector of length 1, and a reasonable linear model would be

$(Y_{1j} \mid D_{1j}=0,\ D_{2j}=d_2) \sim N(\lambda_1 + \lambda_2 d_2,\ \sigma_e^2),$

for some regression parameters $\lambda_1$ and $\lambda_2$ and variance $\sigma_e^2$. Equivalent to the above would be

$\text{If } D_{1j}=0 \text{ then } Y_{1j} = \lambda_1 + \lambda_2 D_{2j} + e_{1j},$ (13)

with $e_{1j} \sim N(0, \sigma_e^2)$.

If D1 = 1 then Yj = [Y1j, Y2j, Y3j] is a vector of length 3, and a reasonable multivariate normal model would be

$\left(\begin{bmatrix} Y_{1j} \\ Y_{2j} \\ Y_{3j} \end{bmatrix} \,\middle|\, D_{1j}=1,\ D_{2j}=d_2\right) \sim N\!\left(\begin{bmatrix} \lambda_3+\lambda_4 d_2 \\ \lambda_3+\lambda_4 d_2 \\ \lambda_3+\lambda_4 d_2 \end{bmatrix},\ \begin{bmatrix} \sigma_e^2+\tau_u^2 & \tau_u^2 & \tau_u^2 \\ \tau_u^2 & \sigma_e^2+\tau_u^2 & \tau_u^2 \\ \tau_u^2 & \tau_u^2 & \sigma_e^2+\tau_u^2 \end{bmatrix}\right)$

for some $\lambda_3$ and $\lambda_4$, some cluster-level variance $\tau_u^2$, and some individual-level variance $\sigma_e^2$. Equivalent to the above would be

$\text{If } D_{1j}=1 \text{ then } Y_{ij} = \lambda_3 + \lambda_4 D_{2j} + u_j + e_{ij}, \quad i = 1, 2, 3,$ (14)

with $u_j \sim N(0, \tau_u^2)$ and each $e_{ij} \sim N(0, \sigma_e^2)$. Taken together, Expressions 13 and 14 completely specify the assumed population distribution of the responses Y conditional on D1 and D2. One could numerically optimize the log-likelihood function implied by Expressions 13 and 14 in terms of the parameters, and thus obtain maximum-likelihood or restricted maximum-likelihood estimates of the parameters. This could be done either by writing one’s own code to implement an appropriate algorithm or by writing code in a general log-likelihood optimizer such as SAS PROC NLMIXED (see Littell et al., 2006). Unfortunately, this would be time-consuming and error-prone, relative to being able to use standard off-the-shelf software written for more familiar models. However, standard off-the-shelf software does not easily handle models constructed in a two-part way as in Expressions 13 and 14. Because of this dilemma, we sought a way to express Expressions 13 and 14 together in a single linear mixed effects model that could be fit more easily with a more familiar procedure such as SAS PROC MIXED. Notice that Expressions 13 and 14 can be expressed simultaneously as

$Y_{ij} = \theta_0 + \theta_1 D_{1j} + \theta_2 D_{2j} + \theta_{12} D_{1j} D_{2j} + u_j D_{1j} + e_{ij}.$ (15)

Specifically, λ1 = θ0, λ2 = θ2, λ3 = θ0 + θ1, and λ4 = θ2 + θ12. There is no term for uj alone, without being multiplied by D1j, but intuitively there should not be such a term because it is not reasonable to make inferences about the cluster-level variability of people who are alone. A somewhat similar situation which can arise with conceptually nested covariates is described in Henry and Dziak (2016).

The dummy-coding notation in Model 15 is entirely adequate for running an interpretable linear mixed model. However, the disadvantage of dummy coding is that the coefficients do not correspond to ANOVA main effects or interactions and are not independent in their distributions or interpretations. For example, testing the main effect of the second factor should be equivalent to testing whether $E(Y_{ij} \mid D_{2j} = 1) - E(Y_{ij} \mid D_{2j} = 0)$ is zero. However, from Model 15 it can be seen that $E(Y_{ij} \mid D_{2j} = 1) - E(Y_{ij} \mid D_{2j} = 0) = \theta_2 + \tfrac{1}{2}\theta_{12}$; there is no single regression parameter for this difference (see, e.g., Chakraborty, Collins, Strecher & Murphy, 2009; Kugler, Trail, Dziak & Collins, 2012). This is not inherently a problem, because the test can still be done using a linear combination of parameters, but it is a source of potential confusion and inconvenience. We have observed in practice that researchers often interpret the first-order coefficients of variables as main effects, regardless of whether they are truly main effects in the ANOVA sense or not. This is problematic because the value and interpretation of the coefficients depend on whether the factors are dummy-coded or effect-coded, and the first-order coefficients of dummy-coded factors are not actually main effects but simple effects. In an attempt to avoid this risk of confusion, we consider another way to express Models 13 and 14, which represents the same assumptions as Model 15 but whose parameters are more convenient to interpret when the design involves multiple factors.

The approach we propose involves using effect coding for the two factors. We denote the effect-coded factors X1 and X2, so that, for example, X1j = +1 if D1j = 1 and X1j = −1 if D1j = 0; in other words, X1j = 2D1j − 1 and, similarly, X2j = 2D2j − 1. We also define the clustering indicator Cj = D1j. Now let $\gamma_0 = \theta_0 + \tfrac{1}{2}\theta_1 + \tfrac{1}{2}\theta_2 + \tfrac{1}{4}\theta_{12}$, $\gamma_1 = \tfrac{1}{2}\theta_1 + \tfrac{1}{4}\theta_{12}$, $\gamma_2 = \tfrac{1}{2}\theta_2 + \tfrac{1}{4}\theta_{12}$, and $\gamma_{12} = \tfrac{1}{4}\theta_{12}$. Then Expression 15 is algebraically equivalent to

$Y_{ij} = \gamma_0 + \gamma_1 X_{1j} + \gamma_2 X_{2j} + \gamma_{12} X_{1j} X_{2j} + u_j C_j + e_{ij},$ (16)

which is essentially the two-factor version of Model 5. Thus either Model 15 or Model 16 will serve as a valid way of expressing Models 13 and 14 together. As argued in the main text of this paper and elsewhere, tests of the γ parameters correspond directly to tests of main effects and interactions in the factorial ANOVA sense.
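The equivalence of the two parameterizations is easy to verify numerically. In this sketch (our own illustration), we substitute arbitrary θ values from the dummy-coded Model 15, derive the effect-coded γ values via the substitution $D = (X+1)/2$, and confirm that both models imply identical cell means, and that the dummy-coded combination $\theta_2 + \tfrac{1}{2}\theta_{12}$ equals the effect-coded main effect $2\gamma_2$:

```python
from itertools import product

# Arbitrary parameter values for the dummy-coded Model 15 (fixed part).
th0, th1, th2, th12 = 1.3, 0.7, -0.4, 0.9

# Mapping to the effect-coded parameterization (derived via D = (X + 1)/2).
g0 = th0 + 0.5 * th1 + 0.5 * th2 + 0.25 * th12
g1 = 0.5 * th1 + 0.25 * th12
g2 = 0.5 * th2 + 0.25 * th12
g12 = 0.25 * th12

for d1, d2 in product((0, 1), (0, 1)):
    x1, x2 = 2 * d1 - 1, 2 * d2 - 1   # effect codes for this cell
    mean_model_15 = th0 + th1 * d1 + th2 * d2 + th12 * d1 * d2
    mean_model_16 = g0 + g1 * x1 + g2 * x2 + g12 * x1 * x2
    assert abs(mean_model_15 - mean_model_16) < 1e-12

# Under dummy coding, the main effect of factor 2 (with balanced D1) is a
# combination of thetas; under effect coding it is simply 2 * gamma_2.
assert abs((th2 + th12 / 2) - 2 * g2) < 1e-12
```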

It would alternatively be possible to simply specify that

$Y_{ij} = \gamma_0 + \gamma_1 X_{1j} + \gamma_2 X_{2j} + \gamma_{12} X_{1j} X_{2j} + u_j + e_{ij}$

but constrain the variance of the uj to be zero for the subsample with X1j = −1. However, this would not be possible to do directly in many software packages, and it is logically the same as Expression 16, because Expression 16 multiplies uj by zero whenever X1j does not equal +1. Therefore, we argue that Expression 16, although initially counterintuitive, actually represents the most convenient way to write the conceptual model for this partially nested factorial design.

Appendix B

Derivation of the Sampling Variance Formulas for Calculating Power

Here, we provide rationales for the sampling variances given in Tables 4 and 5.

Full EIC, No Pretest (Model 5; Cj = 1)

Consider the regression coefficient $\gamma_k$ for the main effect of an effect-coded factor $X_k$. Let $\hat{\mu}(X_k = L)$ be the average of all cell means having $X_k = L$. For example, if there are three factors in total, then we use $\hat{\mu}(X_k = +1)$ to denote the average of the (+1,+1,+1), (+1,+1,−1), (+1,−1,+1) and (+1,−1,−1) cells. If the cell sizes (number of individuals per cell) are equal, then this is also the average of all individuals having $X_k = L$; if they are unequal, then it is a weighted average (by the effective size of each cell). In either case, $\hat{\mu}(X_k = L)$ is the maximum likelihood estimate of $E(Y \mid X_k = L)$ for a population with balanced allocation. The corresponding sample estimate of $\gamma_k$ is

\hat{\gamma}_k = \frac{\hat{\mu}(X_k=+1) - \hat{\mu}(X_k=-1)}{(+1)-(-1)} = \frac{1}{2}\left(\hat{\mu}(X_k=+1) - \hat{\mu}(X_k=-1)\right).

Notice that μ^(Xk=+1) and μ^(Xk=−1) are independent, because we assume random assignment to clusters and no contamination. Therefore, the sampling variance of γ^k is

Var(\hat{\gamma}_k) = Var\left(\tfrac{1}{2}\hat{\mu}(X_k=+1)\right) + Var\left(\tfrac{1}{2}\hat{\mu}(X_k=-1)\right) = \tfrac{1}{4} Var(\hat{\mu}(X_k=+1)) + \tfrac{1}{4} Var(\hat{\mu}(X_k=-1)).

By the Var(·) operator here we implicitly mean Var(·|X) (i.e., we are conditioning on the design). This is typical in linear models, especially for experiments with fixed effects factors assigned directly by the experimenter, such as those considered in the context of this paper.

To proceed further, we now assume balance in cell sizes and cluster sizes. Then, because the random effects terms in Model 5 are the same for every individual, and there is the same number of individuals in every cell, Var(μ^(Xk=+1)) = Var(μ^(Xk=−1)), so Var(γ^k) simplifies further to Var(γ^k) = (1/2)Var(μ^(Xk=+1)). Two terms in Model 5 contribute to this variance: the uj terms with variance τu² and the eij terms with variance σe². For purposes of computing this variance conditionally upon treatment assignment, the fixed effects terms do not matter, so we treat Var(μ^(Xk=+1)) as the variance of the sum of the averages of the random terms. The cluster-level term uj is averaged over J/2 clusters having Xk = +1, where J is the total number of clusters. The individual-level term eij is averaged over nJ/2 cluster members having Xk = +1, where n is the cluster size. In other words,

Var(\hat{\mu}(X_k=+1)) = Var\left(\frac{\sum_j u_j}{J/2}\right) + Var\left(\frac{\sum_{i,j} e_{ij}}{nJ/2}\right) = \frac{\tau_u^2}{J/2} + \frac{\sigma_e^2}{nJ/2},

such that

Var(\hat{\gamma}_k) = \frac{1}{2}\left(\frac{\tau_u^2}{J/2} + \frac{\sigma_e^2}{nJ/2}\right) = \frac{\tau_u^2}{J} + \frac{\sigma_e^2}{nJ},

as given in Table 4.
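As a check on this derivation, the following Monte Carlo sketch (Python, not from the original article; all parameter values are arbitrary illustrative choices) compares the empirical sampling variance of γ̂k with τu²/J + σe²/(nJ) for a single effect-coded factor under full EIC:

```python
import random
import statistics

# Monte Carlo check of Var(gamma_hat) = tau_u^2/J + sigma_e^2/(n*J)
# for the full-EIC, no-pretest model with one effect-coded factor.
# All parameter values are arbitrary illustrative choices.
random.seed(12345)
J, n = 20, 5               # total clusters, members per cluster
tau_u, sigma_e = 0.7, 1.0  # SDs of cluster-level and individual-level errors
reps = 4000

estimates = []
for _ in range(reps):
    level_means = []
    for level in (+1, -1):
        total = 0.0
        for _ in range(J // 2):          # J/2 clusters per factor level
            u = random.gauss(0.0, tau_u)  # shared cluster effect
            for _ in range(n):
                total += u + random.gauss(0.0, sigma_e)
        level_means.append(total / (n * J / 2))
    # gamma_hat = half the difference between the two level means
    estimates.append(0.5 * (level_means[0] - level_means[1]))

empirical = statistics.variance(estimates)
theoretical = tau_u ** 2 / J + sigma_e ** 2 / (n * J)
print(round(empirical, 4), round(theoretical, 4))
assert abs(empirical - theoretical) / theoretical < 0.15
```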

A similar argument can be made for interactions. Let μ^(Xa=La, Xb=Lb) be the average of all cell means having Xa = La and Xb = Lb. For example, if there are three factors in total, then we use μ^(X1=+1, X3=+1) to denote the average of the (+1,+1,+1) and (+1,−1,+1) cells. Then

\hat{\gamma}_{a,b} = \frac{\dfrac{\hat{\mu}(X_a=+1, X_b=+1) - \hat{\mu}(X_a=+1, X_b=-1)}{2} - \dfrac{\hat{\mu}(X_a=-1, X_b=+1) - \hat{\mu}(X_a=-1, X_b=-1)}{2}}{(+1)-(-1)}.

So

Var(\hat{\gamma}_{a,b}) = \left(\tfrac{1}{4}\right)^2 Var(\hat{\mu}(X_a=+1, X_b=+1)) + \left(\tfrac{1}{4}\right)^2 Var(\hat{\mu}(X_a=+1, X_b=-1)) + \left(\tfrac{1}{4}\right)^2 Var(\hat{\mu}(X_a=-1, X_b=+1)) + \left(\tfrac{1}{4}\right)^2 Var(\hat{\mu}(X_a=-1, X_b=-1)).

Recall that μ^(Xa=La) = (1/2)μ^(Xa=La, Xb=+1) + (1/2)μ^(Xa=La, Xb=−1) (this is true by our definition even if cell sizes in the observed sample are not balanced). This implies that Var(μ^(Xa=La)) = (1/2)²Var(μ^(Xa=La, Xb=+1)) + (1/2)²Var(μ^(Xa=La, Xb=−1)). Thus,

Var(\hat{\gamma}_{a,b}) = \tfrac{1}{4} Var(\hat{\mu}(X_a=+1)) + \tfrac{1}{4} Var(\hat{\mu}(X_a=-1)) = Var(\hat{\gamma}_a).

This means that the same formula for the sampling distribution of the regression coefficient applies both to main effects and to two-way interactions here.

This finding implies that the test of an interaction will have the same power as the test of a main effect, if they have the same true effect size (expressed as an effect-coded regression coefficient). This seems to conflict with findings elsewhere (see Peterson & George, 1993; Murray, 1998), in which an interaction has less power than a main effect with the same true effect size; however, there is no actual conflict. The latter result comes from a different metric, which treats main effects and interactions somewhat differently (see Appendix A in Dziak, Nahum-Shani, & Collins, 2012). Some authors consider the interaction to be

\frac{\hat{\mu}(X_a=+1, X_b=+1) - \hat{\mu}(X_a=+1, X_b=-1)}{2} - \frac{\hat{\mu}(X_a=-1, X_b=+1) - \hat{\mu}(X_a=-1, X_b=-1)}{2} = 2\hat{\gamma}_{a,b},

while others define it as an unscaled difference of differences:

\left(\hat{\mu}(X_a=+1, X_b=+1) - \hat{\mu}(X_a=+1, X_b=-1)\right) - \left(\hat{\mu}(X_a=-1, X_b=+1) - \hat{\mu}(X_a=-1, X_b=-1)\right) = 4\hat{\gamma}_{a,b}.

However, the effect-coded main effect is always expressed as, for example, 2γ^a rather than 4γ^a. If the interaction is “equal to the main effect” in terms of the former definition, then γ^a,b = γ^a; but if they are equal in terms of the latter definition, it follows that γ^a,b = (1/2)γ^a. The ambiguous meaning of “equal size” here can lead to contradictory recommendations, which demonstrates the need for thoughtful consideration of the meaning of effect sizes in sample size planning.
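The scaling ambiguity described above can be made concrete with a few lines of arithmetic (Python, not from the original article; the value 0.3 is arbitrary):

```python
# The "equal size" ambiguity for interactions, in numbers.
# Suppose the effect-coded coefficients are equal: gamma_a = gamma_ab = 0.3
# (an arbitrary illustrative value).
gamma_a, gamma_ab = 0.3, 0.3

main_effect_contrast = 2 * gamma_a  # mu(Xa=+1) - mu(Xa=-1)
half_diff_of_diffs = 2 * gamma_ab   # first definition of the interaction
diff_of_diffs = 4 * gamma_ab        # unscaled difference of differences

# Under the first definition the interaction "equals" the main effect;
# under the second it is twice as large.
assert half_diff_of_diffs == main_effect_contrast
assert diff_of_diffs == 2 * main_effect_contrast
print(main_effect_contrast, half_diff_of_diffs, diff_of_diffs)
```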

Therefore, for both main effects and interactions, we have the variance formula Var(γ^k) = τu²/J + σe²/(nJ). However, one remaining inconvenience is that it may be difficult to find reasonable values for τu² and σe² for power planning. Fortunately, it is only necessary to specify the posttest ICC, for the following reason. The covariance between two observations in the same cluster is Cov(Yij, Yi′j) = Cov(uj + eij, uj + ei′j) = τu². The total variance of one observation is σe² + τu². This means that the intraclass correlation is ρY = τu²/(σe² + τu²). Therefore, ρY/(1 − ρY) = τu²/σe², such that

\frac{\tau_u^2}{J} + \frac{\sigma_e^2}{nJ} = \sigma_e^2\left(\frac{\rho_Y}{(1-\rho_Y)J} + \frac{1}{nJ}\right),

as in Table 4.

Researchers may be able to find realistic values for ρY in their field of study by doing a literature review. Specifying σe² can also be avoided if, as in Expression 8, one specifies γ as a standardized coefficient (i.e., as a multiple of σe rather than a raw score difference). Notice that in this setting, because there is no pretest, σe² equals σY² in Expression 7, which is the posttest variance, adjusting for any cluster and treatment effects that may exist but not adjusting for pretest. Hence, when specifying a standardized coefficient, σe² = σY² cancels out and only ρY is needed.
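As an illustration of how only ρY and a standardized coefficient are needed, the sketch below (Python, not from the original article) computes approximate power for a full-EIC design with no pretest. It uses a simplified normal approximation (a careful analysis would use t distributions with Satterthwaite degrees of freedom), and the planning values are hypothetical:

```python
import math

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def power_full_eic_no_pretest(delta, rho_y, J, n):
    """Approximate power (normal approximation, two-sided alpha = .05) for a
    main-effect or interaction test in a full-EIC factorial with no pretest.
    delta is the standardized effect-coded coefficient, gamma / sigma_e."""
    se = math.sqrt(rho_y / ((1.0 - rho_y) * J) + 1.0 / (n * J))
    z_crit = 1.959963984540054  # z for two-sided alpha = .05
    return 1.0 - normal_cdf(z_crit - abs(delta) / se)

# Hypothetical planning values: 40 clusters of size 8, posttest ICC .05,
# standardized coefficient .15.
print(round(power_full_eic_no_pretest(0.15, 0.05, 40, 8), 3))
```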

Full EIC with Pretest as a Covariate (Model 6; Cj = 1)

Continuing to assume balance (i.e., that cluster sizes are equal and the number of individuals in each condition is equal), now let the pretest-adjusted response be Aij = Yij − γ10Pij and define μ^(A)(Xk=L) to equal the sample estimate of E(A|Xk = L). Then Var(γ^k) ≈ Var((1/2)(μ^(A)(Xk=+1) − μ^(A)(Xk=−1))). The approximation is not exact because γ10 must also be estimated and is not really a known constant; however, this is likely to be only a minor inaccuracy, similar to the way that estimation error in coefficients for effects other than Xk was ignored in the previous derivations. Thus, the only remaining random terms are uj and eij, and so, following the same logic as in the no-pretest scenario above, we have

Var(\hat{\gamma}_k) \approx Var\left(\tfrac{1}{2}\left(\hat{\mu}^{(A)}(X_k=+1) - \hat{\mu}^{(A)}(X_k=-1)\right)\right) = \tfrac{1}{4}\left(Var(\hat{\mu}^{(A)}(X_k=+1)) + Var(\hat{\mu}^{(A)}(X_k=-1))\right) = \tfrac{1}{2} Var(\hat{\mu}^{(A)}(X_k=+1)) = \tfrac{1}{2}\left(\frac{\tau_u^2}{J/2} + \frac{\sigma_e^2}{nJ/2}\right) = \frac{\tau_u^2}{J} + \frac{\sigma_e^2}{nJ}.

As above, this can be re-expressed in terms of correlations and marginal variances. Within a cluster, the total posttest variance without knowing the pretest (i.e., the variance conditioning on X and u but not conditioning on P) is σY² = γ10²σP² + σe², where σP² is the variance of the pretest and γ10 is the regression coefficient relating the pretest to the posttest. The pretest-posttest correlation conditioning on cluster membership will be

\rho_{\mathrm{pre,post}} = \frac{\gamma_{10}\sigma_P^2}{\sigma_P \sigma_Y} = \frac{\gamma_{10}\sigma_P}{\sigma_Y},

so σe² = σY² − γ10²σP² = (1 − ρpre,post²)σY². Thus, at least in theory, using the pretest never hurts, although it is more helpful when the pretest-posttest correlation is larger. This agrees with the findings of Vickers (2001) in classic ANCOVA with independent data. Finally, the total posttest variance without knowing the pretest (i.e., adjusting for X but not P) is σY² + τu², so the posttest ICC is ρY = τu²/(σY² + τu²). Thus, τu² = σY²ρY/(1 − ρY). Therefore, Var(γ^k) can be re-expressed as

Var(\hat{\gamma}_k) = \frac{\tau_u^2}{J} + \frac{\sigma_e^2}{nJ} = \sigma_Y^2\left(\frac{\rho_Y}{(1-\rho_Y)J} + \frac{1-\rho_{\mathrm{pre,post}}^2}{nJ}\right),  (17)

as in Table 4.

Recall that without a pretest, τu²/J + σe²/(nJ) = σY²(ρY/((1 − ρY)J) + 1/(nJ)). Intuitively, this is the same as Expression 17 but with ρpre,post = 0. More precisely, the reason why the formulas differ when expressed in terms of correlations, but appear not to differ when expressed in terms of variance components, is that σe² has a slightly different interpretation when a pretest is present. When a pretest is not included, σe² is the individual-level error in the posttest response. When a pretest is included, σe² is the individual-level error in the pretest-adjusted posttest response.
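The benefit of the pretest can be quantified directly from Expression 17. The sketch below (Python, not from the original article; all planning values are hypothetical) evaluates the sampling variance with and without a pretest, illustrating the claim that using the pretest never hurts in theory:

```python
# Evaluating Expression 17 (sampling variance of an effect-coded coefficient
# under full EIC) with and without a pretest. All values are hypothetical.
def var_gamma(sigma_y2, rho_y, J, n, rho_pre_post=0.0):
    """Expression 17; rho_pre_post = 0 recovers the no-pretest formula."""
    return sigma_y2 * (rho_y / ((1.0 - rho_y) * J)
                       + (1.0 - rho_pre_post ** 2) / (n * J))

sigma_y2, rho_y, J, n = 1.0, 0.05, 40, 8
no_pretest = var_gamma(sigma_y2, rho_y, J, n)
with_pretest = var_gamma(sigma_y2, rho_y, J, n, rho_pre_post=0.6)
print(round(no_pretest, 5), round(with_pretest, 5))
assert with_pretest < no_pretest  # the pretest never hurts, in theory
```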

In the full-EIC case, Model 6 appears essentially identical to the pretest-adjusted model for between-clusters experiments in which clusters exist prior to experimentation, as given in Dziak, Nahum-Shani, and Collins (2012). This is because, although the model implies that the posttest scores are clustered, it does not explicitly specify whether the pretest scores are clustered or not. However, in our power formula derivation we do use the assumption that pretest scores are unclustered. Dziak and colleagues (2012) were not able to provide an exact power formula in their context, because the relationship of the pretest cluster-level variability to the posttest cluster-level variability is important for power but is not specified in the analysis model. Dziak and colleagues (2012) therefore had to provide an approximate formula based on a related model (specifically, a three-level model in which pretest is included as a repeated measure). In the context of this paper, this limitation is removed because in the EIC setting it is assumed that pretest scores are unclustered.

Therefore, deriving a power formula was more straightforward for factorial designs with full EIC than for between-clusters factorial experiments. Deriving a power formula for partial EIC is slightly more complicated because in this setting some individuals have a different variance structure from others. However, as we explain below, a power formula can be derived by adapting the reasoning used earlier in the case of full EIC.

Partial EIC, No Pretest (Model 5; Cj is 0 or 1 depending on X1)

We now continue to the partial EIC (partial nesting) scenario with no pretest. We need to determine variance formulas for the main effect coefficients γk, both for k = 1 (the cluster-generating factor) and k > 1 (the remaining factors). First consider k = 1. Then Var(γ^1) = Var((1/2)(μ^(X1=+1) − μ^(X1=−1))).

Therefore,

Var(\hat{\gamma}_1) = \left(\tfrac{1}{2}\right)^2 Var(\hat{\mu}(X_1=+1)) + \left(\tfrac{1}{2}\right)^2 Var(\hat{\mu}(X_1=-1)).

Analogously to the full-EIC case, Var(μ^(X1=+1)) = τu²/J1 + σe1²/(J1n), where J1 is the number of nontrivial clusters and n is the cluster size. However, cluster-level variability does not apply to those individuals with X1 = −1, so we simply have Var(μ^(X1=−1)) = σe0²/J0, where J0 is the number of unclustered individuals (trivial clusters). Combining these, we have

Var(\hat{\gamma}_1) = \frac{\tau_u^2}{4J_1} + \frac{\sigma_{e0}^2}{4J_0} + \frac{\sigma_{e1}^2}{4J_1 n},

as in Table 5.

On a superficial comparison with the corresponding full-EIC formula, the reader may notice a “4” in the denominator of this expression that was not found in Expression 17. This is simply a consequence of using different mathematical expressions (e.g., J1 and J0 instead of J/2 and J/2 for the number of clusters per factor level) and is not necessarily an indication of more precise results for one design relative to the other.
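For concreteness, the partial-EIC variance formula can be evaluated directly. The sketch below (Python, not from the original article; the variance components and sample sizes are hypothetical) computes Var(γ̂1) for one possible design:

```python
# Evaluating the partial-EIC sampling variance (no pretest) for a
# hypothetical design: 20 nontrivial clusters of size 8 on the clustered
# arm, and 160 unclustered individuals (trivial clusters) on the other arm.
tau_u2, sig_e0_2, sig_e1_2 = 0.05, 1.0, 1.0  # illustrative variance components
J1, n, J0 = 20, 8, 160

var_gamma1 = (tau_u2 / (4 * J1)          # cluster-level term, clustered arm
              + sig_e0_2 / (4 * J0)      # individual-level term, unclustered arm
              + sig_e1_2 / (4 * J1 * n)) # individual-level term, clustered arm
print(round(var_gamma1, 5))  # 0.00375
```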

Now suppose that the main effect of one of the other factors, say Xk, is being tested. At first, this appears to be a different case from testing X1, because the variance structure depends only on X1 and not on Xk. However, μ^(Xk=xk) = (1/2)μ^(Xk=xk, X1=+1) + (1/2)μ^(Xk=xk, X1=−1) and Var(μ^(Xk=xk)) = (1/4)Var(μ^(Xk=xk, X1=+1)) + (1/4)Var(μ^(Xk=xk, X1=−1)) for both xk = +1 and xk = −1. Because we define the main effect as an average of cell means, not individuals, we do not need to assume either equal variances or balance between levels of X1 in order to conclude this. Therefore, even if k ≠ 1,

Var(\hat{\gamma}_k) = \tfrac{1}{4} Var(\hat{\mu}(X_k=+1)) + \tfrac{1}{4} Var(\hat{\mu}(X_k=-1))
= \tfrac{1}{16} Var(\hat{\mu}(X_1=+1, X_k=+1)) + \tfrac{1}{16} Var(\hat{\mu}(X_1=+1, X_k=-1)) + \tfrac{1}{16} Var(\hat{\mu}(X_1=-1, X_k=+1)) + \tfrac{1}{16} Var(\hat{\mu}(X_1=-1, X_k=-1))
= \tfrac{1}{4} Var(\hat{\mu}(X_1=+1)) + \tfrac{1}{4} Var(\hat{\mu}(X_1=-1)) = Var(\hat{\gamma}_1).

Thus, the variance for each main effect is equal to the variance for the first main effect. A similar argument can be made that the variance for each interaction is equal to the variance for the first main effect, although the algebra would be slightly more involved, generally requiring eight terms of the form μ^(X1,Xa,Xb) rather than four terms of the form μ^(X1,Xk). Therefore, from here on we consider only Var(γ^1), because the theoretical sampling variance for the other coefficients will be the same.

Partial EIC with Pretest as Covariate (Model 6; Cj is 0 or 1 depending on X1)

Finally, we consider ANCOVA with partial EIC and the adjusted outcomes Aij as before. The variance of interest is Var(γ^k) = (1/4)Var(μ^(A)(X1=+1)) + (1/4)Var(μ^(A)(X1=−1)), and the random effects part of Aij is Cjuj + eij, so Var(μ^(A)(X1=+1)) = τu²/J1 + σe1²/(nJ1), and Var(μ^(A)(X1=−1)) = σe0²/J0.

Therefore,

Var(\hat{\gamma}_k) = \frac{\tau_u^2}{4J_1} + \frac{\sigma_{e0}^2}{4J_0} + \frac{\sigma_{e1}^2}{4J_1 n}  (18)

as in Table 5.

In order to find a way to express this formula in terms of correlations, we make the simplifying assumption that σe0² = σe1² = σe². Then the posttest variance, after adjusting for cluster and treatment, is σY² = γ10²σP² + σe². The pretest-posttest correlation is ρpre,post = γ10σP/σY as before, so σY²(1 − ρpre,post²) = σY² − γ10²σP² = σe². Also, as before, the posttest intraclass correlation is ρY = τu²/(σY² + τu²), so τu² = σY²ρY/(1 − ρY). We conclude that

Var(\hat{\gamma}_k) = \sigma_Y^2\left(\frac{\rho_Y}{4(1-\rho_Y)J_1} + \frac{1-\rho_{\mathrm{pre,post}}^2}{4J_1 n} + \frac{1-\rho_{\mathrm{pre,post}}^2}{4J_0}\right),

as in Table 5. Note that this holds only under the assumption of equal error variances, and Expression 18 is more general.

Under the same assumptions, for the no-pretest scenario, we have (as in Table 5)

Var(\hat{\gamma}_k) = \sigma_Y^2\left(\frac{\rho_Y}{4(1-\rho_Y)J_1} + \frac{1}{4J_1 n} + \frac{1}{4J_0}\right).
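Combining the two Table 5 correlation forms above, the sketch below (Python, not from the original article) computes approximate power for a partial-EIC design with and without a pretest. It uses a normal approximation (an exact calculation would use t distributions), assumes equal error variances, and all planning values are hypothetical:

```python
import math

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def power_partial_eic(delta, rho_y, J1, n, J0, rho_pre=0.0):
    """Approximate power (normal approximation, two-sided alpha = .05) using
    the Table 5 correlation form; delta is the coefficient standardized by
    sigma_Y. Equal error variances are assumed; rho_pre = 0 gives the
    no-pretest case."""
    var = (rho_y / (4.0 * (1.0 - rho_y) * J1)
           + (1.0 - rho_pre ** 2) / (4.0 * J1 * n)
           + (1.0 - rho_pre ** 2) / (4.0 * J0))
    z_crit = 1.959963984540054  # z for two-sided alpha = .05
    return 1.0 - normal_cdf(z_crit - abs(delta) / math.sqrt(var))

# Hypothetical design: 20 nontrivial clusters of size 8, 160 unclustered
# individuals, posttest ICC .05, standardized coefficient .15.
print(round(power_partial_eic(0.15, 0.05, 20, 8, 160), 3))
print(round(power_partial_eic(0.15, 0.05, 20, 8, 160, rho_pre=0.6), 3))
```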

Appendix C

Sample SAS Code for Modeling Experiments with Full and Partial EIC

Below is the SAS code used for Model 11 in the context of Simulation Study 1. Here j represents the cluster, x1 through x5 represent the factors, Pij represents the pretest, and Yij represents the posttest.

PROC MIXED DATA=wide NOCLPRINT;
  CLASS x1 x2 x3 x4 x5 j;
  MODEL Yij = Pij x1 x2 x3 x4 x5
    x1*x2 x1*x3 x1*x4 x1*x5
    x2*x3 x2*x4 x2*x5
    x3*x4 x3*x5
    x4*x5 / DDFM=SATTERTHWAITE;
  RANDOM INTERCEPT / SUBJECT=j(x1 x2 x3 x4 x5);
  ODS OUTPUT TESTS3=OutputAncova COVPARMS=cp;
QUIT;

Below is the SAS code used for Model 12 in the context of Simulation Study 2. The variables have the same meaning as above, except that j can now identify either a real cluster or a trivial cluster (i.e., unclustered individual).

PROC MIXED DATA=wide NOCLPRINT;
  CLASS x1 x2 x3 x4 x5 j;
  MODEL Yij = Pij x1 x2 x3 x4 x5
    x1*x2 x1*x3 x1*x4 x1*x5
    x2*x3 x2*x4 x2*x5
    x3*x4 x3*x5
    x4*x5 / DDFM=SATTERTHWAITE;
  RANDOM clustered / SUBJECT=j(x1 x2 x3 x4 x5);
  REPEATED / SUB=i LOCAL=EXP(clustered) TYPE=VC; /* Remove this REPEATED statement in order to assume equal error variances */
  ODS OUTPUT TESTS3=OutputAncova COVPARMS=cp;
QUIT;

Footnotes

Author Note

The simulated data were analyzed using the SAS 9.2 package. The results were organized and summarized using R 3.0.2. R 3.0.2 software is copyright 2013 by The R Foundation for Statistical Computing. SAS 9.2 software is copyright 2002–2010 by SAS Institute Inc. SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc., Cary, NC, USA.

Contributor Information

Inbal Nahum-Shani, Institute for Social Research, 426 Thompson Street, Suite 2204, Ann Arbor, MI 48104-2321, University of Michigan.

John J. Dziak, The Methodology Center, 404 Health and Human Development Building, Penn State, University Park, PA 16802, Pennsylvania State University

Linda M. Collins, Department of Human Development & Family Studies and The Methodology Center, 404 Health and Human Development Building, Penn State, University Park, PA 16802, Pennsylvania State University

References

  1. Baker TB, Gustafson DH, Shah D. How can research keep up with eHealth? Ten strategies for increasing the timeliness and usefulness of eHealth research. Journal of Medical Internet Research. 2014;16(2):e36. doi: 10.2196/jmir.2925.
  2. Baldwin SA, Bauer DJ, Stice E, Rohde P. Evaluating models for partially clustered designs. Psychological Methods. 2011;16(2):149–165. doi: 10.1037/a0023464.
  3. Bauer DJ, Sterba SK, Hallfors DD. Evaluating group-based interventions when control participants are ungrouped. Multivariate Behavioral Research. 2008;43:210–236. doi: 10.1080/00273170802034810.
  4. Berggraf L, Ulvenes PG, Øktedalen T, Hoffart A, Stiles T, McCullough L, Wampold BE. Experience of affects predicting sense of self and others in short-term dynamic and cognitive therapy. Psychotherapy. 2014;51(2):246. doi: 10.1037/a0036581.
  5. Candel MJ, Van Breukelen GJ. Varying cluster sizes in trials with clusters in one treatment arm: Sample size adjustments when testing treatment effects with linear mixed models. Statistics in Medicine. 2009;28(18):2307–2324. doi: 10.1002/sim.3620.
  6. Chakraborty B, Collins LM, Strecher VJ, Murphy SA. Developing multicomponent interventions using fractional factorial designs. Statistics in Medicine. 2009;28:2687–2708. doi: 10.1002/sim.3643.
  7. Charlesworth G, Burnell K, Beecham J, Hoare Z, Hoe J, Wenborn J, Orrell M. Peer support for family carers of people with dementia, alone or in combination with group reminiscence in a factorial design: study protocol for a randomized controlled trial. Trials. 2011;12(1):1. doi: 10.1186/1745-6215-12-205.
  8. Chebli JL, Blaszczynski A, Gainsbury SM. Internet-based interventions for addictive behaviours: a systematic review. Journal of Gambling Studies. 2016:1–26. doi: 10.1007/s10899-016-9599-5.
  9. Cloitre M, Koenen KC, Cohen LR, Han H. Skills training in affective and interpersonal regulation followed by exposure: A phase-based treatment for PTSD related to childhood abuse. Journal of Consulting and Clinical Psychology. 2002;70(5):1067–1074. doi: 10.1037//0022-006x.70.5.1067.
  10. Cohen J. Statistical power analysis for the behavioral sciences. 2nd ed. Hillsdale, NJ: Erlbaum; 1988.
  11. Collins LM, Dziak JJ, Kugler KC, Trail JB. Factorial experiments: Efficient tools for evaluation of intervention components. American Journal of Preventive Medicine. 2014;47(4):498–504. doi: 10.1016/j.amepre.2014.06.021.
  12. Collins LM, Dziak JJ, Li R. Design of experiments with multiple independent variables: A resource management perspective on complete and reduced factorial designs. Psychological Methods. 2009;14:202–224. doi: 10.1037/a0015826.
  13. Collins LM, Kugler KC, Gwadz MV. Optimization of multicomponent behavioral and biobehavioral interventions for the prevention and treatment of HIV/AIDS. AIDS and Behavior. 2016;20(1):197–214. doi: 10.1007/s10461-015-1145-4.
  14. Collins LM, Nahum-Shani I, Almirall D. Optimization of behavioral dynamic treatment regimens based on the sequential, multiple assignment, randomized trial (SMART). Clinical Trials. 2014;11(4):426–434. doi: 10.1177/1740774514536795.
  15. Cook JW, Collins LM, Fiore MC, Smith SS, Fraser D, Bolt DM, Loh WY. Comparative effectiveness of motivation phase intervention components for use with smokers unwilling to quit: a factorial screening experiment. Addiction. 2016;111(1):117–128. doi: 10.1111/add.13161.
  16. Crespi CM. Improved designs for cluster randomized trials. Annual Review of Public Health. 2016;37:1–16. doi: 10.1146/annurev-publhealth-032315-021702.
  17. Czajkowski SM, Powell LH, Adler N, Naar-King S, Reynolds KD, Hunter CM, Epel E. From ideas to efficacy: The ORBIT model for developing behavioral treatments for chronic diseases. Health Psychology. 2015;34(10):971. doi: 10.1037/hea0000161.
  18. Dallery J, Riley WT, Nahum-Shani I. Research Designs to Develop and Evaluate Technology-Based Health Behavior Interventions. In: Marsch L, Lord S, Dallery J, editors. Leveraging Technology to Transform Behavioral Healthcare. Oxford University Press; 2015.
  19. De Jong K, Moerbeek M, Van der Leeden R. A priori power analysis in longitudinal three-level multilevel models: an example with therapist effects. Psychotherapy Research. 2010;20:273–284. doi: 10.1080/10503300903376320.
  20. Derlega VJ, Winstead BA, Wong PT, Hunter S. Gender effects in an initial encounter: A case where men exceed women in disclosure. Journal of Social and Personal Relationships. 1985;2(1):25–44.
  21. Donner A, Klar N. Design and analysis of cluster randomization trials in health research. London, England: Arnold; 2000.
  22. Dziak JJ, Nahum-Shani I. Three-level modeling for factorial experiments with experimentally induced clustering. University Park, PA: The Methodology Center, Penn State; 2016. (Technical Report No. 16-133).
  23. Dziak JJ, Nahum-Shani I, Collins LM. Multilevel factorial experiments for developing behavioral interventions: power, sample size, and resource considerations. Psychological Methods. 2012;17(2):153–175. doi: 10.1037/a0026972.
  24. Erez M, Arad R. Participative goal-setting: Social, motivational, and cognitive factors. Journal of Applied Psychology. 1986;71(4):591.
  25. Gainsbury S, Blaszczynski A. A systematic review of Internet-based therapy for the treatment of addictions. Clinical Psychology Review. 2011;31(3):490–498. doi: 10.1016/j.cpr.2010.11.007.
  26. Herbert JD, Gaudiano BA, Rheingold AA, Moitra E, Myers VH, Dalrymple KL, Brandsma LL. Cognitive behavior therapy for generalized social anxiety disorder in adolescents: A randomized controlled trial. Journal of Anxiety Disorders. 2009;23:167–177. doi: 10.1016/j.janxdis.2008.06.004.
  27. Henry K, Dziak JJ. A nested covariate approach to the inclusion of two-part predictors in a regression model. 2016. Manuscript submitted for publication.
  28. Hox JJ, Kreft IG. Multilevel analysis methods. Sociological Methods & Research. 1994;22(3):283–299.
  29. Howard MC, Jacobs RR. The multiphase optimization strategy (MOST) and the sequential multiple assignment randomized trial (SMART): two novel evaluation methods for developing optimal training programs. Journal of Organizational Behavior. 2016; online version.
  30. Jacobs MA, Graham AL. Iterative development and evaluation methods of mHealth behavior change interventions. Current Opinion in Psychology. 2016;9:33–37.
  31. Karakowsky L, McBey K. Do my contributions matter? The influence of imputed expertise on member involvement and self-evaluations in the work group. Group & Organization Management. 2001;26(1):70–92.
  32. Kasari C, Rotheram-Fuller E, Locke J, Gulsrud A. Making the connection: randomized controlled trial of social skills at school for children with autism spectrum disorders. Journal of Child Psychology and Psychiatry. 2012;53:431–439. doi: 10.1111/j.1469-7610.2011.02493.x.
  33. Kenny DA, Bolger N, Kashy DA. Traditional methods for estimating multilevel models. In: Moskowitz DS, Hershberger S, editors. Modeling intraindividual variability with repeated measures data: Method and applications. Englewood Cliffs, NJ: Erlbaum; 2002. pp. 1–24.
  34. Kirk R. Experimental design: Procedures for the behavioral sciences. Los Angeles, CA: SAGE; 2003.
  35. Kramer TJ, Fleming GP, Mannis SM. Improving face-to-face brainstorming through modeling and facilitation. Small Group Research. 2001;32(5):533–557.
  36. Kugler KC, Trail JB, Dziak JJ, Collins LM. Effect coding versus dummy coding in analysis of data from factorial experiments (No. 12-120). University Park, PA: The Methodology Center, Pennsylvania State University; 2012. Accessed at http://methodology.psu.edu/media/techreports/12-120.pdf.
  37. Lecomte T, Leclerc C, Corbiere M, Wykes T, Wallace CJ, Spidel A. Group cognitive behavior therapy or social skills training for individuals with a recent onset of psychosis? Results of a randomized controlled trial. Journal of Nervous and Mental Disease. 2008;196:866–875. doi: 10.1097/NMD.0b013e31818ee231.
  38. Littell RC, Milliken GA, Stroup WW, Wolfinger RD, Schaabenberger O. SAS for Mixed Models. 2nd ed. Cary, NC: SAS Institute Inc; 2006.
  39. Moerbeek M, Wong WK. Sample size formulae for trials comparing group and individual treatments in a multilevel model. Statistics in Medicine. 2008;27(15):2850–2864. doi: 10.1002/sim.3115.
  40. Murray DM. Design and analysis of group-randomized trials. Vol. 29. Oxford University Press; 1998.
  41. Myers RH, Montgomery DC, Anderson-Cook CM. Response surface methodology: process and product optimization using designed experiments. Hoboken, NJ: Wiley; 2016.
  42. Myers JL, Well AD. Research design and statistical analysis. 2nd ed. Mahwah, NJ: Erlbaum; 2003.
  43. Nackers LM, Dubyak PJ, Lu X, Anton SD, Dutton GR, Perri MG. Group dynamics are associated with weight loss in the behavioral treatment of obesity. Obesity. 2015;23(8):1563–1569. doi: 10.1002/oby.21148.
  44. Noar SM, Benac CN, Harris MS. Does tailoring matter? Meta-analytic review of tailored print health behavior change interventions. Psychological Bulletin. 2007;133:673–693. doi: 10.1037/0033-2909.133.4.673.
  45. Nye JL. The eye of the follower: information processing effects on attributions regarding leaders of small groups. Small Group Research. 2002;33(3):337–360.
  46. Oakes JM, Feldman HA. Statistical power for nonequivalent pretest-posttest designs: The impact of change-score versus ANCOVA models. Evaluation Review. 2001;25:3–28. doi: 10.1177/0193841X0102500101.
  47. Pals SL, Murray DM, Alfano CM, Shadish WR, Hannan PJ, Baker WL. Individually randomized group treatment trials: a critical appraisal of frequently used design and analytic approaches. American Journal of Public Health. 2008;98:1418–1424. doi: 10.2105/AJPH.2007.127027.
  48. Pellegrini CA, Hoffman SA, Collins LM, Spring B. Optimization of remotely delivered intensive lifestyle treatment for obesity using the Multiphase Optimization Strategy: Opt-IN study protocol. Contemporary Clinical Trials. 2014;38(2):251–259. doi: 10.1016/j.cct.2014.05.007.
  49. Pellegrini CA, Hoffman SA, Collins LM, Spring B. Corrigendum to Optimization of remotely delivered intensive lifestyle treatment for obesity using the multiphase optimization strategy: Opt-IN study protocol. Contemporary Clinical Trials. 2015;45:468–469. doi: 10.1016/j.cct.2015.09.001.
  50. Peters GJY, de Bruin M, Crutzen R. Everything should be as simple as possible, but no simpler: towards a protocol for accumulating evidence regarding the active content of health behaviour change interventions. Health Psychology Review. 2015;9(1):1–14. doi: 10.1080/17437199.2013.848409.
  51. Peterson B, George SL. Sample size requirements and length of study for testing interaction in a 2 × k factorial design when time-to-failure is the outcome (corrected). Controlled Clinical Trials. 1993;14:511–522. doi: 10.1016/0197-2456(93)90031-8. See also erratum in Controlled Clinical Trials, 15:326.
  52. Peterson CB, Mitchell JE, Crow SJ, Crosby RD, Wonderlich SA. The efficacy of self-help group treatment and therapist-led group treatment for binge eating disorder. American Journal of Psychiatry. 2009;166:1347–1354. doi: 10.1176/appi.ajp.2009.09030345.
  53. Raudenbush SW. Statistical analysis and optimal design for cluster randomized trials. Psychological Methods. 1997;2:173–185. doi: 10.1037/1082-989x.5.2.199.
  54. Roberts C, Roberts SA. Design and analysis of clinical trials with clustering effects due to treatment. Clinical Trials. 2005;2:153–162. doi: 10.1191/1740774505cn076oa.
  55. Schulz MS, Cowan CP, Cowan PA. Promoting healthy beginnings: A randomized controlled trial of a preventive intervention to preserve marital quality during the transition to parenthood. Journal of Consulting and Clinical Psychology. 2006;74(1):20–31. doi: 10.1037/0022-006X.74.1.20.
  56. Slymen DJ, Hovell MF. Cluster versus individual randomization in adolescent tobacco and alcohol studies: illustrations for design decisions. International Journal of Epidemiology. 1997;26(4):765–771. doi: 10.1093/ije/26.4.765.
  57. Tokola K, Larocque D, Nevalainen J, Oja H. Power, sample size and sampling costs for clustered data. Statistics & Probability Letters. 2011;81:852–860.
  58. Valacich JS, Wheeler BC, Mennecke BE, Wachter R. The effects of numerical and logical group size on computer-mediated idea generation. Organizational Behavior and Human Decision Processes. 1995;62(3):318–329.
  59. Vickers AJ. The use of percentage change from baseline as an outcome in a controlled trial is statistically inefficient: a simulation study. BMC Medical Research Methodology. 2001;1:6. doi: 10.1186/1471-2288-1-6. Accessed at http://www.biomedcentral.com/1471-2288/1/6.
  60. Wampold BE, Bolt DM. Therapist effects: Clever ways to make them (and everything else) disappear. Psychotherapy Research. 2006;16:184–187.
  61. Wilson DK, Kitzman-Ulrich H, Resnicow K, Van Horn ML, George SMS, Siceloff ER, Coulon S. An overview of the Families Improving Together (FIT) for weight loss randomized controlled trial in African American families. Contemporary Clinical Trials. 2015;42:145–157. doi: 10.1016/j.cct.2015.03.009.
  62. Wu CFJ, Hamada M. Experiments: Planning, analysis, and parameter design optimization. Vol. 552. New York, NY: Wiley; 2011.

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials
