Author manuscript; available in PMC: 2017 Jul 19.
Published in final edited form as: J Clin Epidemiol. 2017 Jan 16;83:108–109. doi: 10.1016/j.jclinepi.2016.12.011

“Cross-sectional” stepped wedge designs always reduce the required sample size when there is no time effect

Xin Zhou 1,*, Xiaomei Liao 2, Donna Spiegelman 3
PMCID: PMC5517056  NIHMSID: NIHMS865886  PMID: 28093263

To the Editor

A recent article [1] suggested that the stepped wedge cluster-randomized trial (SWD) is more efficient than a parallel cluster-randomized trial (pCRD). A subsequent letter [2] then pointed out that the article [1] had incorrectly applied the variance for a cross-sectional SWD to a cohort design, and provided a counterexample in which a cross-sectional SWD requires a larger sample size than a pCRD. To review terms: in a “cross-sectional” SWD, only one measurement of the outcome is taken per subject and new subjects enroll at each new step, whereas in a “cohort” SWD, subjects enroll at the beginning of the study and have repeated measures taken over different steps. The authors of the original article [1] then revised their conclusion in a follow-up letter [3], agreeing that a “cross-sectional” SWD is not always more powerful than a pCRD. That conclusion was drawn from a variance formula that accounts for time effects, as given by Hussey and Hughes [4]. However, time effects, if they exist, need not be considered in a pCRD, because randomization will, on average, balance the intervention groups with respect to such effects. Hence, comparing an SWD that incorporates time effects with a pCRD that does not may be unfair.

Many SWDs are conducted over a relatively short period, during which even modest long-term background time trends would likely have little or no effect on outcome rates. For example, it is unlikely that the 1-year retention and 6-month viral suppression rates will change in any materially important way during the 3-year study period of the MaxART Early Access to ART in Swaziland SWD currently underway [5]. Similarly, 3 years was the duration of one of the largest and best-known applications of the SWD: the evaluation of PROGRESA (Programa de Educación, Salud, y Alimentación), a program launched when Julio Frenk, former Dean of the Harvard School of Public Health, served as Minister of Health for Mexico. PROGRESA was a conditional cash transfer program targeting the country’s poorest families; it gave them financial incentives to adopt recommended public health and nutrition practices and to keep their children in school, and it was evaluated through a staggered roll-out of these policies among 495 communities [6]. Finally, the FIGO study, an SWD soon to be completed in Nepal, Tanzania, and Sri Lanka, is assessing the effectiveness of postpartum intrauterine device insertion as a safe and acceptable means of preventing unintended pregnancy shortly after a birth; it has a 1-year enrollment period for a projected 64,000 pregnant women, each followed for 1.5 years [7].

Correlations between intervention status and time are high in SWDs; for example, the empirical correlation between the intervention and time is 0.67 in FIGO and 0.69 in MaxART. For a standard SWD with T steps as in Hussey and Hughes [4], we show in the Appendix to this letter at www.jclinepi.com that this correlation is $\sqrt{(T+1)/(3(T-1))}$, which always exceeds $1/\sqrt{3} \approx 0.58$. Hence, the results discussed below apply only when time trends are very small; otherwise, time confounds the intervention effect and the estimates will be biased.
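As an informal check of the correlation claim, the following Python sketch computes the empirical Pearson correlation between the 0/1 intervention indicator and the period index over all cluster-periods of a standard SWD, and compares it with $\sqrt{(T+1)/(3(T-1))}$. It assumes T periods and T − 1 equally sized groups, with the first period entirely under control and the last entirely under intervention; the function names are hypothetical, and this is an illustration rather than the Appendix derivation.

```python
import math


def intervention_time_correlation(T: int) -> float:
    """Hypothetical helper: Pearson correlation between the intervention
    indicator X and the period index j = 1..T over all cluster-periods.

    Assumed layout: group k (k = 1..T-1) is under control for j <= k and
    under intervention for j > k. Cluster size and the number of clusters
    per group cancel out, so one cluster per group is enough.
    """
    pairs = [(j, 1 if j > k else 0) for k in range(1, T) for j in range(1, T + 1)]
    n = len(pairs)
    mean_t = sum(j for j, _ in pairs) / n
    mean_x = sum(x for _, x in pairs) / n
    cov = sum((j - mean_t) * (x - mean_x) for j, x in pairs) / n
    var_t = sum((j - mean_t) ** 2 for j, _ in pairs) / n
    var_x = sum((x - mean_x) ** 2 for _, x in pairs) / n
    return cov / math.sqrt(var_t * var_x)


def closed_form_correlation(T: int) -> float:
    """sqrt((T + 1) / (3 (T - 1))), the closed form stated in the letter."""
    return math.sqrt((T + 1) / (3 * (T - 1)))


for T in (3, 5, 10, 100):
    print(T, round(intervention_time_correlation(T), 4), round(closed_form_correlation(T), 4))
```

The two functions agree for each T tried, and the correlation decreases toward $1/\sqrt{3} \approx 0.577$ only as T grows large, so even long designs retain a substantial correlation between time and intervention.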

Following the notation of Hussey and Hughes [4], we consider a “cross-sectional” SWD with I clusters, T steps, and N participants sampled per cluster per step. Note that N × T is the cluster size, that is, the number of participants in each cluster, and N × T × I is the total sample size of the study. Here, we give simplified expressions for Var(θ̂) and the design effect of an SWD with T steps in which time is fit as discrete effects, as in Hussey and Hughes’ model [4]. Simplifying Equation (8) of Hussey and Hughes, as shown in the Appendix to this letter at www.jclinepi.com,

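$$\operatorname{Var}(\hat{\theta}) = \frac{4\operatorname{Var}(Y)}{NTI}\cdot\frac{(T-1)\left[1+(NT-1)\rho\right]}{(T-2)\left(\frac{2}{3}+\frac{1}{3}N(T+1)\frac{\rho}{1-\rho}\right)}, \tag{1}$$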

and its corresponding design effect is

$$DE_{SW} = \frac{(T-1)\left[1+(NT-1)\rho\right]}{(T-2)\left(\frac{2}{3}+\frac{1}{3}N(T+1)\frac{\rho}{1-\rho}\right)}.$$
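The simplification in Equation (1) can be checked numerically. The sketch below evaluates the Hussey and Hughes variance in the form we understand their Equation (8) to take, $\operatorname{Var}(\hat{\theta}) = I\sigma^2(\sigma^2+T\tau^2)/[(IU-W)\sigma^2 + (U^2+ITU-TW-IV)\tau^2]$, where $\sigma^2=\sigma_e^2/N$ is the residual variance of a cluster-period mean, U is the number of treated cluster-periods, W sums the squared number of treated clusters over periods, and V sums the squared number of treated periods over clusters. It then compares that value with the closed form of Equation (1), taking $\operatorname{Var}(Y)=\sigma_e^2+\tau^2$ and $\rho=\tau^2/\operatorname{Var}(Y)$. The standard layout (T − 1 equal groups, I divisible by T − 1, T ≥ 3) and all helper names are assumptions made for illustration.

```python
def standard_sw_design(I: int, T: int) -> list[list[int]]:
    """Hypothetical helper: 0/1 intervention matrix X[i][j] for I clusters and
    T periods, with T - 1 equal groups; group k switches on after period k."""
    if T < 3 or I % (T - 1) != 0:
        raise ValueError("need T >= 3 and I divisible by T - 1")
    per_group = I // (T - 1)
    return [[1 if j > k else 0 for j in range(1, T + 1)]
            for k in range(1, T) for _ in range(per_group)]


def var_hussey_hughes(X: list[list[int]], N: int, sigma_e2: float, tau2: float) -> float:
    """Hussey-Hughes variance of theta-hat with discrete time effects, using
    sigma2 = sigma_e2 / N as the residual variance of a cluster-period mean."""
    I, T = len(X), len(X[0])
    sigma2 = sigma_e2 / N
    U = sum(sum(row) for row in X)                              # treated cluster-periods
    W = sum(sum(row[j] for row in X) ** 2 for j in range(T))   # squared per-period counts
    V = sum(sum(row) ** 2 for row in X)                        # squared per-cluster counts
    numerator = I * sigma2 * (sigma2 + T * tau2)
    denominator = (I * U - W) * sigma2 + (U**2 + I * T * U - T * W - I * V) * tau2
    return numerator / denominator


def de_sw_time(N: int, T: int, rho: float) -> float:
    """Design effect with discrete time effects (the expression above)."""
    return ((T - 1) * (1 + (N * T - 1) * rho)
            / ((T - 2) * (2 / 3 + N * (T + 1) * rho / (3 * (1 - rho)))))


def var_eq1(I: int, T: int, N: int, sigma_e2: float, tau2: float) -> float:
    """Equation (1): 4 Var(Y) / (NTI) times the design effect."""
    var_y = sigma_e2 + tau2
    return 4 * var_y / (N * T * I) * de_sw_time(N, T, tau2 / var_y)


# Illustrative settings: I = 12 clusters, T = 5, N = 20, ICC = 0.01.
X = standard_sw_design(12, 5)
print(var_hussey_hughes(X, N=20, sigma_e2=0.99, tau2=0.01))
print(var_eq1(12, 5, 20, sigma_e2=0.99, tau2=0.01))
print(round(de_sw_time(20, 5, 0.01), 2))
```

The two variances coincide up to floating-point error, and the design effect for these settings rounds to 2.48.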

In Equation (1), θ is the treatment effect and Y is the continuous outcome with $\operatorname{Var}(Y) = \sigma_e^2 + \tau^2$, where $\sigma_e^2$ and $\tau^2$ are the within- and between-cluster variances, respectively, and $\rho = \tau^2/(\sigma_e^2 + \tau^2)$ is the intraclass correlation (ICC). Liao et al. [8] derived a variance formula for the estimated intervention effect in an SWD that applies when time is unrelated to the outcome and can thus be ignored in the analysis. Using the same notation, but simplifying Liao et al.’s formula as shown in the Appendix to this letter at www.jclinepi.com,

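$$\operatorname{Var}(\tilde{\theta}) = \frac{4\operatorname{Var}(Y)}{NTI}\cdot\frac{1+(NT-1)\rho}{1+\frac{2}{3}N(T+1)\frac{\rho}{1-\rho}}, \tag{2}$$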

where the tilde is used to distinguish this estimator from the previous one in Equation (1). Thus, the design effect for a cross-sectional SWD with no time effect is

$$DE_{SW} = \frac{1+(NT-1)\rho}{1+\frac{2}{3}N(T+1)\frac{\rho}{1-\rho}}.$$
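Equation (2) can be checked in a similar way. This sketch computes the generalized least squares variance of θ̃ directly from a no-time model for cluster-period means, $\bar{Y}_{ij} = \mu + \theta X_{ij} + \alpha_i + \bar{e}_{ij}$, in which each cluster's T means have covariance $(\sigma_e^2/N)\,\mathbf{I} + \tau^2\mathbf{J}$; the inverse of that matrix is written out with the Sherman-Morrison identity. Treating this as the model behind the no-time-effect formula is our assumption, and the helper names are hypothetical.

```python
def standard_sw_design(I: int, T: int) -> list[list[int]]:
    """Hypothetical helper: 0/1 intervention matrix for I clusters, T periods,
    T - 1 equal groups; group k switches on after period k."""
    if T < 2 or I % (T - 1) != 0:
        raise ValueError("need T >= 2 and I divisible by T - 1")
    per_group = I // (T - 1)
    return [[1 if j > k else 0 for j in range(1, T + 1)]
            for k in range(1, T) for _ in range(per_group)]


def var_gls_no_time(X: list[list[int]], N: int, sigma_e2: float, tau2: float) -> float:
    """GLS variance of theta in the assumed model mean = mu + theta * X_ij
    (no period effects), with exchangeable covariance within each cluster."""
    T = len(X[0])
    s = sigma_e2 / N                 # residual variance of a cluster-period mean
    gamma = tau2 / (s + T * tau2)    # Sherman-Morrison: V^{-1} = (I - gamma * J) / s
    m11 = m12 = m22 = 0.0            # entries of sum_i Z_i' V^{-1} Z_i, Z_i = [1, x_i]
    for x in X:
        sx = sum(x)
        m11 += (T - gamma * T * T) / s
        m12 += (sx - gamma * T * sx) / s
        m22 += (sx - gamma * sx * sx) / s   # x is 0/1, so sum of x_j^2 equals sx
    return m11 / (m11 * m22 - m12 ** 2)     # (2, 2) entry of the inverse 2x2 matrix


def de_sw_no_time(N: int, T: int, rho: float) -> float:
    """Design effect without time effects (the expression above)."""
    return (1 + (N * T - 1) * rho) / (1 + 2 * N * (T + 1) * rho / (3 * (1 - rho)))


def var_eq2(I: int, T: int, N: int, sigma_e2: float, tau2: float) -> float:
    """Equation (2): 4 Var(Y) / (NTI) times the design effect."""
    var_y = sigma_e2 + tau2
    return 4 * var_y / (N * T * I) * de_sw_no_time(N, T, tau2 / var_y)


X = standard_sw_design(12, 5)
print(var_gls_no_time(X, N=20, sigma_e2=0.99, tau2=0.01))
print(var_eq2(12, 5, 20, sigma_e2=0.99, tau2=0.01))
print(round(de_sw_no_time(20, 5, 0.01), 2))
```

Working with cluster-period means rather than individual outcomes is a convenience here: with equal N per cell, the means carry all the information about μ and θ in this model.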

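The design effect of a pCRD is $DE_P = 1 + (NT-1)\rho$. For a fixed cluster size NT, this is exactly the numerator of $DE_{SW}$ above, while the denominator $1 + \frac{2}{3}N(T+1)\frac{\rho}{1-\rho}$ exceeds 1 whenever $0 < \rho < 1$; hence $DE_{SW} < DE_P$. Thus, when there is no time effect and the ICC is positive, the SWD always requires a smaller sample size. We used these design effect formulas in the following illustrative example.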
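A brute-force scan over a grid of illustrative parameter values (N, T, and ρ chosen arbitrarily for this sketch) gives a quick numerical confirmation that the no-time-effect SWD design effect falls below the pCRD design effect for every ρ > 0, with equality at ρ = 0.

```python
import itertools


def de_pcrd(N: int, T: int, rho: float) -> float:
    """Design effect of a parallel CRT with cluster size N * T."""
    return 1 + (N * T - 1) * rho


def de_sw_no_time(N: int, T: int, rho: float) -> float:
    """Design effect of a cross-sectional SWD with no time effect."""
    return de_pcrd(N, T, rho) / (1 + 2 * N * (T + 1) * rho / (3 * (1 - rho)))


# Arbitrary grid: 1-50 participants per cluster per step, 2-12 periods.
rhos = [0.0, 0.001, 0.01, 0.05, 0.2, 0.5, 0.9]
violations = [
    (N, T, rho)
    for N, T, rho in itertools.product(range(1, 51), range(2, 13), rhos)
    if rho > 0 and not de_sw_no_time(N, T, rho) < de_pcrd(N, T, rho)
]
print("grid points with DE_SW >= DE_P and rho > 0:", violations)
```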

Consider a study that would require 950 participants under individual randomization, and three cluster-randomized alternatives, each with an ICC of 0.01:

  1. Under a pCRD with 100 participants per cluster, the design effect is 1.99. The minimum total sample size is then 1,891 participants, which requires 19 clusters, for a total sample size of 1,900.

  2. Under an SWD with no time effect, with 100 participants per cluster as before and, for example, five steps, there will be 20 participants per cluster per step, giving a design effect of 1.10 and a minimum total sample size of 1,045. Because the number of clusters must be a multiple of 4 when there are five steps, this design needs at least 12 clusters, for a total sample size of 1,200.

  3. Under an SWD with time effects, with five steps and 20 participants per cluster per step as before, the design effect given by the simplification of Hussey and Hughes [4] above is 2.48. The minimum total sample size will then be 2,356; rounding up to a number of clusters that is a multiple of 4 gives a minimum of 24 clusters and a total sample size of 2,400.
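The arithmetic of the three alternatives can be reproduced with the sketch below. Following the example, it rounds each design effect to two decimals before inflating the individually randomized sample size of 950, then rounds the number of 100-participant clusters up to a multiple of T − 1 = 4 for the SWDs. The helper `plan` and its rounding rules are our reading of the example, not a prescribed procedure.

```python
import math


def de_pcrd(N: int, T: int, rho: float) -> float:
    return 1 + (N * T - 1) * rho


def de_sw_no_time(N: int, T: int, rho: float) -> float:   # no time effect
    return de_pcrd(N, T, rho) / (1 + 2 * N * (T + 1) * rho / (3 * (1 - rho)))


def de_sw_time(N: int, T: int, rho: float) -> float:      # discrete time effects
    return ((T - 1) * de_pcrd(N, T, rho)
            / ((T - 2) * (2 / 3 + N * (T + 1) * rho / (3 * (1 - rho)))))


def plan(n_individual: int, de: float, cluster_size: int, multiple: int = 1):
    """Hypothetical helper: round the DE to two decimals, inflate the sample
    size, then round the cluster count up to a multiple of `multiple`."""
    de = round(de, 2)
    n_min = math.ceil(round(n_individual * de, 6))   # strip float noise before ceil
    clusters = math.ceil(n_min / cluster_size)
    clusters = math.ceil(clusters / multiple) * multiple
    return de, n_min, clusters, clusters * cluster_size


N, T, rho, n_ind = 20, 5, 0.01, 950
cluster_size = N * T
designs = {
    "pCRD": plan(n_ind, de_pcrd(N, T, rho), cluster_size),
    "SWD, no time effect": plan(n_ind, de_sw_no_time(N, T, rho), cluster_size, T - 1),
    "SWD, time effects": plan(n_ind, de_sw_time(N, T, rho), cluster_size, T - 1),
}
for name, (de, n_min, clusters, total) in designs.items():
    print(f"{name}: DE={de:.2f}, minimum n={n_min}, clusters={clusters}, total={total}")
```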

In conclusion, when there is no effect of time on the outcome, the SWD is always more efficient than the pCRD. This situation is most likely in SWDs of short duration, which are common, but caution is needed because the correlation between time and intervention is typically high in SWDs.

Supplementary Material

Appendix

Acknowledgments

Funding: This work was supported by the grants NIH/R01AI112339 and NIH/DPES025459.


Contributor Information

Xin Zhou, Departments of Epidemiology and Biostatistics, Harvard T.H. Chan School of Public Health, 677 Huntington Avenue, Boston, MA 02115, USA.

Xiaomei Liao, AbbVie Inc., Data and Statistical Sciences, 1 North Waukegan Road, North Chicago, IL 60064, USA.

Donna Spiegelman, Departments of Epidemiology and Biostatistics, Harvard T.H. Chan School of Public Health, 677 Huntington Avenue, Boston, MA 02115, USA.

References

  1. Woertman W, de Hoop E, Moerbeek M, Zuidema SU, Gerritsen DL, Teerenstra S. Stepped wedge designs could reduce the required sample size in cluster randomized trials. J Clin Epidemiol. 2013;66:752–8. doi: 10.1016/j.jclinepi.2013.01.009.
  2. Hemming K, Girling A. The efficiency of stepped wedge vs. cluster randomized trials: stepped wedge studies do not always require a smaller sample size. J Clin Epidemiol. 2013;66:1427–8. doi: 10.1016/j.jclinepi.2013.07.007.
  3. de Hoop E, Woertman W, Teerenstra S. The stepped wedge cluster randomized trial always requires fewer clusters but not always fewer measurements, that is, participants than a parallel cluster randomized trial in a cross-sectional design. In reply. J Clin Epidemiol. 2013;66:1428. doi: 10.1016/j.jclinepi.2013.07.008.
  4. Hussey MA, Hughes JP. Design and analysis of stepped wedge cluster randomized trials. Contemp Clin Trials. 2007;28:182–91. doi: 10.1016/j.cct.2006.05.007.
  5. Walsh F, Bärnighausen T, Delva W, Fleming Y, Khumalo G, Lejeune C, et al. Impact of early initiation versus national standard of care of antiretroviral therapy in Swaziland’s public sector health system: a stepped-wedge randomized implementation study. 2016. doi: 10.1186/s13063-017-2128-8. Under review.
  6. Schultz TP. School subsidies for the poor: evaluating the Mexican Progresa poverty program. J Dev Econ. 2004;74(1):199–250.
  7. Canning D, Shah I, Pearson E, Pradhan E, Karra M, Senderowicz L, et al. Institutionalizing postpartum intrauterine device (IUD) services in Sri Lanka, Tanzania, and Nepal: study protocol for a cluster-randomized stepped-wedge trial. BMC Pregnancy Childbirth. 2016;16:362. doi: 10.1186/s12884-016-1160-0.
  8. Liao XM, Zhou X, Spiegelman D. A note on “Design and analysis of stepped wedge cluster randomized trials”. Contemp Clin Trials. 2015;45:338–9. doi: 10.1016/j.cct.2015.09.011.
