Author manuscript; available in PMC: 2018 Jul 1.
Published in final edited form as: Am J Public Health. 2017 May 18;107(7):1078–1086. doi: 10.2105/AJPH.2017.303707

REVIEW OF RECENT METHODOLOGICAL DEVELOPMENTS IN GROUP-RANDOMIZED TRIALS: PART 2 - ANALYSIS

Elizabeth L Turner 1,2, John A Gallis 3,4, Fan Li 5, Melanie Prague 6,7, David M Murray 8
PMCID: PMC5463203  NIHMSID: NIHMS903077  PMID: 28520480

Abstract

In 2004, Murray et al. published a review of methodological developments in both the design and analysis of group-randomized trials (GRTs). Over the last 13 years, there have been many developments in both areas. The goal of the current paper is to review developments in analysis; a companion paper focuses on developments in design. As a pair, these papers update the 2004 review. This analysis paper covers developments in topics included in the earlier review, such as methods for parallel-arm GRTs and inference for conditional and marginal effects, as well as new topics, including methods to account for multiple levels of clustering and alternative estimation methods such as augmented GEE, targeted maximum likelihood and quadratic inference functions. We also examine developments in dealing with missing outcome data, including doubly robust approaches, software available for analysis, and analysis of alternative group designs (including stepped wedge GRTs, network-randomized trials, pseudo-cluster randomized trials and individually randomized group treatment trials). These alternative designs, like the parallel-arm GRT, require clustering to be accounted for in both their design and analysis.

INTRODUCTION

In a group-randomized trial (GRT), the unit of randomization is a group and outcome measurements are obtained on members of those groups.1 Also called a cluster-randomized trial or community trial,2–5 a GRT is the best comparative design available if the intervention operates at a group level, manipulates the physical or social environment, cannot be delivered to individual members of the group without substantial risk of contamination, or under other circumstances (e.g., a desire for herd immunity in studies of infectious disease).1–5

In GRTs, outcomes on members of the same group are likely to be more similar to each other than to outcomes on members from other groups.1 Such clustering must be accounted for in the design to avoid an under-powered study and in the analysis to avoid under-estimated standard errors and inflated type I error for the intervention effect.1–5 For analysis, regression modeling approaches are generally preferred and most commonly used because of their ease of implementation.6 Several textbooks now address these and other issues.1–5
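To make the variance inflation concrete, here is a minimal Python sketch of the standard design-effect formula for equal group sizes, DE = 1 + (m - 1) × ICC, where m is the number of members per group. The function names and example numbers are illustrative assumptions; with variable group size the formula needs adjustment.

```python
def design_effect(m: int, icc: float) -> float:
    """Variance inflation of a group-based mean when every group has m members
    and the intraclass correlation is icc (equal group sizes assumed)."""
    return 1.0 + (m - 1) * icc


def effective_sample_size(n_groups: int, m: int, icc: float) -> float:
    """Number of independent observations carrying the same information."""
    return n_groups * m / design_effect(m, icc)


# Hypothetical trial: 20 groups of 50 members with a modest ICC of 0.02
de = design_effect(50, 0.02)                  # 1 + 49 * 0.02
n_eff = effective_sample_size(20, 50, 0.02)
# An analysis that ignores clustering understates the SE by a factor of sqrt(de)
se_ratio = de ** 0.5
print(f"design effect {de:.2f}, effective n {n_eff:.0f} of {20 * 50}, "
      f"SE understated by a factor of {se_ratio:.2f}")
```

Even a small ICC roughly halves the effective sample size here, because it is multiplied by the number of members per group.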

In 2004, Murray et al.7 published a review of methodological developments in both the design and analysis of GRTs. In the 13 years since, there have been many developments in both areas. The goal of the current paper is to focus on developments in analytic methods, including those relevant to designs described in a companion paper that focuses on developments in GRT design.8 As a pair, these papers update the 2004 review. With both papers, we seek to provide a broad and comprehensive review to guide the reader to seek out appropriate materials for their own circumstances.

DEVELOPMENTS IN THE ANALYSIS OF PARALLEL GROUP-RANDOMIZED TRIALS

Methods for Superiority, Equivalence, and Non-Inferiority

In GRTs, superiority trials are more common than equivalence or non-inferiority trials: a PubMed search by one of the authors (DMM) of studies published in 2015 identified 562 superiority GRTs but only 1 equivalence GRT and 2 non-inferiority GRTs. Similarly, developments in the methods literature have focused on superiority GRTs, with developments for equivalence and non-inferiority GRTs limited to small sections in two of the more recent textbooks2,5 and a review paper on sample size methods.9 As a consequence, the current review paper focuses on superiority GRTs.

Methods for Intention-To-Treat and Alternative Intervention Effects

In GRTs, protocol violations can lead to non-compliance at either the group- or member-level.5 To minimize bias, intention-to-treat (ITT) principles are recommended at both levels rather than “on-treatment” and “per-protocol” analyses.2,4,5 While group-level protocol violations are usually easy to identify, member-level compliance may be more difficult to ascertain in practice.2 Jo et al. demonstrate that analyses that ignore compliance information could be underpowered to detect an ITT effect and propose a multilevel model combined with a mixture model.10 The implications of group-level non-compliance can be considerable in GRTs, given the small number of groups that are randomized in many GRTs.

Methods Based on the Randomization Scheme

Matching or stratification in the design has been recommended for some time as a way to ensure baseline balance on important potential confounders,1 with constrained randomization more recently developed.11 Recent reports suggest that most GRTs follow this advice.12–15 Matching and stratification in the design can be ignored in the analysis of intervention effects, without harm to the type I error rate, and often the saved degrees of freedom will improve power.16,17 Recently, Donner et al. reported that ignoring matching can adversely affect other analyses, such as analyses that examine the relationship between a risk factor and an outcome;18 for this reason, investigators considering pair-matching should consider small strata instead (e.g., strata of 4). Li et al.19 compared model-based and permutation methods in the context of constrained randomization adjusting for group-level covariates. They found that both the adjusted F-test and permutation test maintained the nominal size and had improved power under constrained randomization compared to simple randomization.
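A minimal sketch of how a constrained randomization space might be built, assuming a single group-level covariate, equal allocation, and a simple balance score (the squared difference in arm means). The score, the 10% cutoff, the covariate values and the function names are hypothetical choices for illustration, not the balancing criteria of the papers cited above.

```python
import itertools
import random


def balance_score(allocation, covariate):
    """Squared difference in covariate means between arms (smaller = better balance)."""
    treated = [c for c, a in zip(covariate, allocation) if a == 1]
    control = [c for c, a in zip(covariate, allocation) if a == 0]
    return (sum(treated) / len(treated) - sum(control) / len(control)) ** 2


def constrained_space(covariate, keep_fraction=0.1):
    """Enumerate every equal-allocation scheme and keep the best-balanced fraction."""
    g = len(covariate)
    schemes = []
    for treated_idx in itertools.combinations(range(g), g // 2):
        allocation = tuple(1 if j in treated_idx else 0 for j in range(g))
        schemes.append((balance_score(allocation, covariate), allocation))
    schemes.sort()
    n_keep = max(1, int(len(schemes) * keep_fraction))
    return [alloc for _, alloc in schemes[:n_keep]]


# Hypothetical baseline covariate (e.g. clinic size) for 10 groups
covariate = [120, 85, 230, 60, 150, 95, 310, 75, 180, 140]
space = constrained_space(covariate)      # 25 of the 252 possible allocations
rng = random.Random(2017)
chosen = rng.choice(space)                # the allocation actually used in the trial
print(len(space), chosen)
```

Keeping the list `space` matters at the analysis stage: a permutation test should draw its reference distribution from the same constrained set.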

Model-Based Methods

Model-based methods can be broadly classified according to the interpretation of the model parameters. Conditional model parameters are typically estimated using mixed-effects regression via maximum likelihood estimation (MLE) and are referred to as cluster-specific effects (or as subject-specific effects in the longitudinal analysis literature). Effects are conditional on the random effects used to account for clustering and on other covariates included in the analysis. Conditional models are often recommended for studies focused on change within members or on mediation analyses.7 Parameters of marginal models are usually estimated using generalized estimating equations (GEE).20,21 They define the marginal expectation of the dependent variable as a function of the independent variables and assume that the variance is a function of the mean; they separately specify a working correlation structure for observations made on members of the same group. Marginal models are often preferred for analyses of population-level effects because the intervention effect coefficient is interpreted as a population-averaged effect. In practice, marginal models are less frequently used than conditional models.6

Marginal and conditional intervention effects are equal for identity and log links22 and the distinction between them is only important for link functions such as the logit for binary outcomes. Although some authors have advocated for the log instead of logit link for binary outcomes,23 this approach is not widely used, possibly because of model convergence problems for some data.24,25 Alternatively, a modified Poisson approach with log-link and robust standard errors could be used in the GEE framework,26 since it does not suffer from the same convergence problems as the binomial model with log link,27 but it may be less common because of the familiarity of logistic regression among epidemiologists and biostatisticians.
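The attenuation of the logit-link effect, and its absence under the log link, can be checked numerically. The sketch below assumes a random-intercept model with a normally distributed group effect and averages the conditional mean over that distribution with a simple Riemann sum; the parameter values are hypothetical, and the log-link case is meant for a count-type outcome, since its means can exceed 1.

```python
import math


def expit(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def marginal_mean(inv_link, eta: float, sigma_b: float,
                  n_grid: int = 4001, width: float = 8.0) -> float:
    """E[inv_link(eta + b)] for a group random effect b ~ Normal(0, sigma_b^2),
    approximated by a Riemann sum over +/- width standard deviations."""
    lo = -width * sigma_b
    step = 2 * width * sigma_b / (n_grid - 1)
    total = 0.0
    for i in range(n_grid):
        b = lo + i * step
        density = math.exp(-0.5 * (b / sigma_b) ** 2) / (sigma_b * math.sqrt(2 * math.pi))
        total += inv_link(eta + b) * density * step
    return total


beta0, beta1, sigma_b = -1.0, math.log(2.0), 1.0   # cluster-specific OR (or RR) = 2

# Logit link: the population-averaged odds ratio is pulled toward 1
p0 = marginal_mean(expit, beta0, sigma_b)
p1 = marginal_mean(expit, beta0 + beta1, sigma_b)
marginal_or = (p1 / (1 - p1)) / (p0 / (1 - p0))

# Log link: both arms are rescaled by the same factor E[exp(b)], so the ratio is unchanged
mu0 = marginal_mean(math.exp, beta0, sigma_b)
mu1 = marginal_mean(math.exp, beta0 + beta1, sigma_b)
marginal_rr = mu1 / mu0

print(f"conditional OR = 2.000, marginal OR = {marginal_or:.3f}, marginal RR = {marginal_rr:.3f}")
```

With these values the population-averaged odds ratio comes out noticeably closer to 1 than the cluster-specific odds ratio of 2, while the rate ratio is unchanged.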

In practice, whether conditional or marginal effects are desired depends on the research question. It is essential to understand the underlying assumptions of each method: conditional models rely on correct specification of untestable aspects of the data distribution, while marginal models rely on a correct definition of the population of interest, which can make it difficult to generalize results to other populations.28 We address each of the two approaches in more detail below.

Conditional Approaches

If the mixed effects model used to estimate conditional effects is misspecified, the estimates are difficult to interpret and, even if regression diagnostics can help,29 standard errors (SEs) are not robust. Fortunately, Murray et al.30 and Fu31 have shown that mixed models are robust to substantial violation of the normality assumptions for member- and group-level errors, so long as balance is maintained at the group level. Parameter estimation by restricted maximum likelihood estimation (REML) is preferred to MLE when few groups are available.32–34 For binary outcomes, alternative methods for specifying the test degrees of freedom have been examined in small sample GRTs and the between-within method is recommended.32,35
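Why REML helps with few groups is easiest to see in the simplest case: estimating a variance from g independent normal values (for example, group means in a balanced design), where ML divides the sum of squares by g and REML by g - 1. The simulation below is an illustrative sketch of that downward bias under hypothetical settings, not a mixed-model fit.

```python
import random


def ml_and_reml_variance(values):
    """ML and REML variance estimates for iid normal values with an unknown mean."""
    g = len(values)
    mean = sum(values) / g
    ss = sum((v - mean) ** 2 for v in values)
    return ss / g, ss / (g - 1)


rng = random.Random(1)
true_var, g, n_sims = 4.0, 6, 20000      # hypothetical: only 6 groups
ml_draws, reml_draws = [], []
for _ in range(n_sims):
    group_means = [rng.gauss(0.0, true_var ** 0.5) for _ in range(g)]
    ml, reml = ml_and_reml_variance(group_means)
    ml_draws.append(ml)
    reml_draws.append(reml)

# ML is biased by the factor (g - 1) / g, about 0.83 with 6 groups; REML is unbiased
print(f"average ML {sum(ml_draws) / n_sims:.2f}, "
      f"average REML {sum(reml_draws) / n_sims:.2f}, truth {true_var}")
```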

Multiple Levels of Clustering in Conditional Models

GRTs may involve multiple levels of clustering due to repeated measures on individuals or groups or additional hierarchical levels in the design. Murray1 distinguished between mixed-effects models based on the number of measurements included in the analysis and recommended mixed-effects analysis of variance (ANOVA) or covariance (ANCOVA), or mixed-effects repeated measures ANOVA/ANCOVA, for analyses involving 1 or 2 measurements per person or per group; those models can account for all sources of random variation in such data if they are properly specified.36 However, that is not the case in analyses involving 3 or more measurements per person or per group, where the sources of random variation may be different; instead, such analyses require a random coefficients model in which random trends and intercepts are calculated for each member (in cohort GRT designs) and group (in cohort and cross-sectional GRT designs), average trends and intercepts are calculated for each study arm, and the intervention effect is the net difference in the average study-arm trends.36 Trends are often estimated as linear slopes, but can take another form.
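The random coefficients logic can be approximated in two stages, as sketched below under stated assumptions: fit a least-squares linear trend to each group's repeated measurements, average the trends within each study arm, and take the net difference. This hypothetical helper weights groups equally and provides no SEs; a mixed-effects random coefficients model estimates the same contrast in one step, with appropriate SEs and degrees of freedom.

```python
def ols_slope(times, outcomes):
    """Least-squares linear trend for one group's repeated measurements."""
    n = len(times)
    t_bar = sum(times) / n
    y_bar = sum(outcomes) / n
    sxy = sum((t - t_bar) * (y - y_bar) for t, y in zip(times, outcomes))
    sxx = sum((t - t_bar) ** 2 for t in times)
    return sxy / sxx


def net_difference_in_trends(groups):
    """groups: list of (arm, times, outcomes) tuples, one per group.
    Returns the mean intervention-arm slope minus the mean control-arm slope,
    with every group weighted equally."""
    slopes = {0: [], 1: []}
    for arm, times, outcomes in groups:
        slopes[arm].append(ols_slope(times, outcomes))
    return sum(slopes[1]) / len(slopes[1]) - sum(slopes[0]) / len(slopes[0])


times = [0, 1, 2, 3]   # hypothetical: four measurement occasions per group
groups = [
    (1, times, [10.0, 11.1, 12.3, 13.0]),
    (1, times, [9.5, 10.9, 11.8, 13.1]),
    (0, times, [10.2, 10.6, 11.1, 11.4]),
    (0, times, [9.8, 10.1, 10.7, 11.0]),
]
print(f"net difference in trends: {net_difference_in_trends(groups):.3f} units per occasion")
```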

Variable Group Size in Conditional Models

Johnson et al. focused on the analysis of Gaussian outcomes from GRTs with variable group size.37 They compared ten model-based approaches and found that a one-stage mixed model with Kenward-Roger32 degrees of freedom and unconstrained variance components performed well for GRTs with 14 or more groups per study arm. A two-stage model weighted by the inverse of the estimated theoretical variance of the group means and with unconstrained variance components performed well for GRTs with 6 or more groups per study arm. A number of other models resulted in an inflated type I error rate when there was substantial variability in group size.
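A minimal sketch of the weighting used in the two-stage approach, assuming the between-group and within-group variance components have already been estimated (the helper names and numbers are hypothetical): each group mean is weighted by the inverse of its theoretical variance, var_between + var_within / m_j, so that small groups count for less. Degrees of freedom for the test are not handled here.

```python
import math


def weighted_arm_mean(group_means, group_sizes, var_between, var_within):
    """Inverse-variance weighted average of one arm's group means, using the
    theoretical variance of a group mean, var_between + var_within / m_j."""
    weights = [1.0 / (var_between + var_within / m) for m in group_sizes]
    estimate = sum(w * y for w, y in zip(weights, group_means)) / sum(weights)
    return estimate, 1.0 / sum(weights)   # weighted mean and its variance


def two_stage_effect(arm1, arm0, var_between, var_within):
    """Difference in weighted arm means and its SE; arm1 and arm0 are
    (group_means, group_sizes) pairs for the intervention and control arms."""
    mean1, var1 = weighted_arm_mean(*arm1, var_between, var_within)
    mean0, var0 = weighted_arm_mean(*arm0, var_between, var_within)
    return mean1 - mean0, math.sqrt(var1 + var0)


# Hypothetical group means and sizes; variance components assumed estimated beforehand
effect, se = two_stage_effect(
    ([12.1, 10.4, 11.8, 13.0], [35, 8, 60, 15]),
    ([10.2, 9.9, 11.0, 9.1], [20, 45, 12, 30]),
    var_between=1.5, var_within=30.0,
)
print(f"effect {effect:.2f} (SE {se:.2f})")
```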

Marginal Approaches

When the GEE approach is used to estimate marginal effects, consistent estimates of the intervention effect can be obtained even if the working correlation structure is incorrect, and valid SEs can be obtained using the robust (sandwich) variance estimator; precision is increased, however, if the working correlation structure is correct. Where degrees of freedom are limited for the test of interest, as often happens in GRTs, the sandwich SE is often biased downward; several corrections have been proposed, but none corrects the bias in all cases.38–44
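The sketch below shows where the sandwich estimator gets its robustness, in the simplest setting we could construct: identity link, independence working correlation, and study arm as the only covariate, so the GEE estimate is just the difference in member-level arm means. The function name and data are hypothetical, and no small-sample correction is applied, so with few groups this is exactly the estimator that tends to be biased downward.

```python
import math
from collections import defaultdict


def gee_independence_two_arm(data):
    """data: list of (arm, group_id, y) with arm in {0, 1}.
    Returns (estimate, model-based SE ignoring clustering, robust sandwich SE)."""
    means, n, rss, robust_var = {}, {}, 0.0, 0.0
    for arm in (0, 1):
        rows = [(g, y) for a, g, y in data if a == arm]
        n[arm] = len(rows)
        means[arm] = sum(y for _, y in rows) / n[arm]
        rss += sum((y - means[arm]) ** 2 for _, y in rows)
        # Sandwich "meat": residuals are summed within each group before squaring,
        # so positively correlated residuals inflate the variance.
        group_sums = defaultdict(float)
        for g, y in rows:
            group_sums[g] += y - means[arm]
        robust_var += sum(s ** 2 for s in group_sums.values()) / n[arm] ** 2
    phi = rss / (n[0] + n[1] - 2)                 # pooled residual variance
    naive_var = phi * (1 / n[0] + 1 / n[1])
    return means[1] - means[0], math.sqrt(naive_var), math.sqrt(robust_var)


# Hypothetical data: in the intervention arm, members of group A all score high
# and members of group B all score low (strong clustering)
data = ([(1, "A", 5.0)] * 3 + [(1, "B", 1.0)] * 3
        + [(0, "C", 3.0)] * 3 + [(0, "D", 3.0)] * 3)
est, se_naive, se_robust = gee_independence_two_arm(data)
print(f"estimate {est:.2f}, naive SE {se_naive:.3f}, sandwich SE {se_robust:.3f}")
```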

Multiple Levels of Clustering in Marginal Models

While multilevel clustering is easy to account for in mixed-effects regression, there is less literature for the GEE approach. The alternating logistic regression approach45 for binary and ordinal outcomes can be used to account for correlation due to repeated measures on individuals within groups and can be implemented within a GEE framework in both R (the alr package) and SAS (PROC GEE).46 The second-order GEE approach, which, in contrast to regular GEE, models the working correlation structure as a function of covariates, can be implemented in R (the geepack package47).48 For more general working correlation matrices, the user typically needs to perform additional programming to provide the appropriate covariance matrix, and convergence may not be achieved. In addition, although the intervention effect estimate is unbiased when the working correlation structure is not correctly specified, the model-based SEs estimated using GEE may be too small. To correct this, a robust sandwich estimator of the variance can be used, but such an approach leads to a loss of power.49 Because of this accuracy-power trade-off, mixed-effects models may be a better option for GRTs involving more than two levels, although the effects estimated in such models are conditional rather than marginal effects.

Variable Group Size in Marginal Models

Although GEE analysis can accommodate variable group size, informative group size can negatively impact efficiency. In this case, Williamson et al.50 showed that GEE weighted by the inverse of group size can correct bias in the estimated intervention effect. This approach is asymptotically equivalent to, and less computationally demanding than, within-cluster resampling.51
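The contrast between ordinary and cluster-weighted estimation can be seen in a toy example, assuming an independence working correlation and weighting each member by the inverse of its group size, so that every group counts equally. The data are hypothetical and deliberately give the largest group the highest outcomes, which is the informative-size situation; which target is appropriate depends on whether the question concerns the average member or the average group.

```python
def member_weighted_mean(groups):
    """Every member counts equally (unweighted independence GEE, intercept only)."""
    values = [y for members in groups for y in members]
    return sum(values) / len(values)


def cluster_weighted_mean(groups):
    """Each member weighted by 1 / group size, so every group counts equally."""
    return sum(sum(members) / len(members) for members in groups) / len(groups)


# Hypothetical arm with informative group size: the largest group has the highest outcomes
groups = [[1.0, 1.0], [1.0, 1.0], [5.0] * 10]
print(member_weighted_mean(groups), cluster_weighted_mean(groups))
```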

Advanced GEE Approaches to Improve Efficiency

For binary outcomes, GEE is more conservative (i.e. the intervention effect will be estimated closer to the null) than mixed-effects models.28,52 Moreover, the SE of the estimated intervention effect is also typically larger when using GEE so that much recent effort has focused on efficient estimation. GEE is most efficient when the true correlation structure of the data is chosen as the working correlation structure. Hin et al. compared multiple selection criteria for the working correlation matrix.53 An alternative approach is augmented GEE (AU-GEE), a method developed for independent data using a causal inference framework,54 which has been extended to clustered data.55 AU-GEE uses covariate information to improve efficiency in a two-stage approach that specifies a model for the potential outcomes under the treatment not received. AU-GEE is unbiased and robust to misspecification of the potential outcome model, though correct specification improves efficiency. As for the analysis of all trials, only baseline covariates should be included in AU-GEE for the analysis of GRT data because adjustment for post-baseline covariates may lead to bias.56 Alternative methods are available to account for post-baseline, time-varying confounding.57–59
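To illustrate the augmentation idea only, here is a simplified group-level sketch: an augmented inverse-probability-weighted difference in means, assuming one outcome per group (e.g. the group mean), a single baseline group-level covariate, a known randomization probability `pi`, and linear working outcome models fitted within each arm. It is not the AU-GEE estimator of the cited papers or the CRTgeeDR implementation, and all names are hypothetical.

```python
def fit_line(x, y):
    """Least-squares fit of y = a + b*x; returns a prediction function."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
             / sum((xi - mx) ** 2 for xi in x))
    intercept = my - slope * mx
    return lambda v: intercept + slope * v


def augmented_effect(y, a, x, pi=0.5):
    """Augmented IPW estimate of the intervention-minus-control mean difference.
    y: group-level outcomes, a: 0/1 assignment, x: baseline group covariate,
    pi: known probability that a group is assigned to the intervention arm."""
    m1 = fit_line([xi for xi, ai in zip(x, a) if ai == 1], [yi for yi, ai in zip(y, a) if ai == 1])
    m0 = fit_line([xi for xi, ai in zip(x, a) if ai == 0], [yi for yi, ai in zip(y, a) if ai == 0])
    n = len(y)
    # Observed outcomes are inverse-probability weighted; the augmentation term adds
    # model predictions for every group, which offsets chance covariate imbalance.
    mu1 = sum(ai * yi / pi - (ai - pi) / pi * m1(xi) for yi, ai, xi in zip(y, a, x)) / n
    mu0 = sum((1 - ai) * yi / (1 - pi) + (ai - pi) / (1 - pi) * m0(xi)
              for yi, ai, xi in zip(y, a, x)) / n
    return mu1 - mu0


# Hypothetical trial of 8 groups with chance imbalance: intervention groups have larger x
x = [2.0, 3.0, 3.5, 4.0, 0.0, 1.0, 0.5, 1.5]
a = [1, 1, 1, 1, 0, 0, 0, 0]
y = [1.0 + 2.0 * xi + 3.0 * ai for xi, ai in zip(x, a)]   # true effect = 3
unadjusted = (sum(yi for yi, ai in zip(y, a) if ai) / 4
              - sum(yi for yi, ai in zip(y, a) if not ai) / 4)
print(f"unadjusted {unadjusted:.2f}, augmented {augmented_effect(y, a, x):.2f}")
```

Because the working outcome models here happen to be correct, the augmented estimate recovers the true effect of 3 despite the covariate imbalance, whereas the unadjusted difference does not; with a misspecified outcome model the estimator stays consistent, because the randomization probability is known, but loses efficiency.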

Alternatives to GEE

The quadratic inference function (QIF) method is an alternative to GEE for the estimation of marginal effects. Song et al.60 demonstrate that QIF has advantages over GEE: it is more efficient and more robust to outliers; it has a goodness-of-fit test of the marginal mean model and permits straightforward extensions to model selection. In large samples, QIF is more efficient than GEE when the working correlation structure for the data is misspecified.61 However, the SEs may be under-estimated for small and medium sample size or for variable group size.62 More recent work by Westgate63,64 provides improvements by using a bias-corrected sandwich covariance estimate and by choosing between QIF and GEE while simultaneously selecting the best working correlation structure.65 Despite the many attractive properties of QIF, at this time there are few applications in public health.66–68

A second alternative estimation method is targeted maximum likelihood estimation (tMLE).69 tMLE is a maximum likelihood-based G-computation estimator that targets the fit of the data-generating distribution to reduce bias in the parameter of interest. It is based on a machine learning approach that fluctuates an initial estimate of the conditional mean outcome and minimizes a loss function to provide an estimate of the parameter of interest.70 The approach has been used in public health71,72 and shows much promise for GRTs73,74 because it can improve efficiency by simultaneously accounting for missing data and chance baseline covariate imbalance without committing to a specific functional form.75

Permutation Methods

Permutation analysis was introduced for GRTs by Gail et al. for the COMMIT trial.76 They found that the permutation test had nominal type I and II error rates across a variety of settings common to GRTs, when the member-level errors were Gaussian or binomial, even when very few heterogeneous groups were randomized to each study arm, and even when the ICC was large, so long as there was balance at the level of the group. Murray et al.30 extended this work, showing that unadjusted permutation tests offer no more protection against confounding than unadjusted model-based tests, while the adjusted versions of both tests perform similarly. The permutation test was more powerful than the model-based test when the data were binomial and the ICC≥0.01. Fu31 extended the work to heavy tailed and very skewed distributions and reported similar results.

Li et al. compared model-based and permutation methods in the context of constrained randomization adjusting for group-level covariates. They found that both the adjusted F-test and permutation test maintained the nominal size and had similar power, but cautioned that the randomization distribution must be calculated within the constrained randomization space to prevent inflating the type I error rate.19
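A minimal sketch of an unadjusted group-level permutation test whose reference distribution is restricted to a supplied set of allowed allocations: pass every equal-allocation scheme for simple randomization, or only the constrained set when constrained randomization was used. The helper names and data are hypothetical; one common way to build a covariate-adjusted version is to apply the same steps to group-level residuals from a model without the intervention term.

```python
import itertools


def mean_difference(group_means, allocation):
    """Intervention-arm minus control-arm average of the group-level outcomes."""
    tx = [y for y, a in zip(group_means, allocation) if a == 1]
    ctl = [y for y, a in zip(group_means, allocation) if a == 0]
    return sum(tx) / len(tx) - sum(ctl) / len(ctl)


def all_equal_allocations(n_groups):
    """Every allocation of half of the groups to the intervention arm."""
    return [tuple(1 if j in treated else 0 for j in range(n_groups))
            for treated in itertools.combinations(range(n_groups), n_groups // 2)]


def permutation_p_value(group_means, observed_allocation, allowed_allocations):
    """Two-sided p-value: share of allowed allocations whose absolute statistic is
    at least as large as the one observed under the actual allocation."""
    observed = abs(mean_difference(group_means, observed_allocation))
    as_extreme = sum(1 for alloc in allowed_allocations
                     if abs(mean_difference(group_means, alloc)) >= observed - 1e-12)
    return as_extreme / len(allowed_allocations)


# Hypothetical group means for 8 groups; the first 4 received the intervention
group_means = [5.0, 6.0, 7.0, 8.0, 1.0, 2.0, 3.0, 4.0]
actual = (1, 1, 1, 1, 0, 0, 0, 0)
p = permutation_p_value(group_means, actual, all_equal_allocations(8))
print(f"p = {p:.4f}")   # with 70 possible allocations, the smallest attainable p is 2/70
```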

DEVELOPMENTS IN THE ANALYSIS OF ALTERNATIVES TO THE PARALLEL GRT

Stepped Wedge GRT

Both between- and within-group information is available to estimate the intervention effect from a stepped wedge group randomized trial (SW-GRT).77,78 However, because the control condition is typically observed earlier than the intervention condition, time is a potential confounder and should be accommodated in the analysis of SW-GRTs, typically by accounting for time as a predictor.79 As for parallel GRTs, clustering by group must be accounted for, and longitudinal measures on individuals can be accommodated within either the mixed-effects or GEE framework, though more easily using mixed-effects models (see both Multiple Levels of Clustering sections). Conditional approaches are more commonly used in practice and reported on in the methods literature.79,80 Several authors have highlighted other characteristics specific to SW-GRT including lagged intervention effects81 and fidelity loss over time.79
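The confounding with time is visible in the treatment indicator itself. The sketch below builds the exposure schedule for a basic stepped wedge design, assuming all groups start in the control condition and equal batches cross over at each step (hypothetical helper and sizes). A model such as Y_ijt = μ + β_t + θX_jt + b_j + e_ijt, with a fixed effect β_t for each period, is one common way to include time as a predictor.

```python
def stepped_wedge_schedule(n_groups, n_steps):
    """X[j][t] = 1 if group j is in the intervention condition in period t.
    Period 0 is an all-control baseline; n_groups // n_steps groups cross over
    at each of the n_steps later periods (n_groups assumed divisible by n_steps)."""
    per_step = n_groups // n_steps
    n_periods = n_steps + 1
    return [[1 if t >= 1 + j // per_step else 0 for t in range(n_periods)]
            for j in range(n_groups)]


x = stepped_wedge_schedule(6, 3)
for row in x:
    print(row)
# Proportion of groups exposed in each period rises with calendar time
exposure_by_period = [sum(row[t] for row in x) / len(x) for t in range(len(x[0]))]
print(exposure_by_period)
```

Because the share of exposed groups rises monotonically with period, an analysis that omitted β_t would attribute any secular trend to the intervention.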

Network-Randomized GRT

Because the network properties of a network-randomized GRT are primarily used at the design stage,82 and because they differ from regular GRTs only in the novel way in which groups are defined, the theory on the analysis of parallel-arm GRTs can be applied to parallel-arm network-randomized GRTs.83 For example, in a ring trial of an Ebola vaccine,83 in which a network was defined as all individuals who had regular physical contact with the incident (index) case of Ebola and in which all contacts received the vaccine (placebo or active), standard GRT methods were used. For network-randomized GRTs in which the intervention is not directly administered to all individuals and in which it is expected that the intervention spreads over the network (e.g. the snowball trials of an HIV prevention intervention for drug users84 or a microfinance intervention85), methods86,87 are available to estimate both the direct and indirect effects of the intervention. When network information is available and the outcome of interest is known to be a disseminated process, adjusting for network features such as information on the location of each individual within the network (i.e. group) can improve both the efficiency and power of the analysis.88

Pseudo-Cluster Randomized Trial

Teerenstra et al.89 compared analytic methods for continuous outcomes in pseudo-cluster randomized trials (PCRTs), and Campbell and Walters discussed principles in their recent textbook.5 Clustering by the unit of randomization at the first stage (e.g. provider) must be accounted for in both the design and analysis of PCRTs. To our knowledge, no explicit sample size or analytic methods are available for non-continuous outcomes.

Individually Randomized Group Treatment Trial

Baldwin et al. compared four analytic models for individually randomized group treatment trials (IRGTs) and three methods for calculating degrees of freedom.90 A multilevel model adapted to reflect clustering in only one study arm, combined with either Satterthwaite91 or Kenward-Roger32 degrees of freedom, provided better type I error control, better efficiency, and less bias, even with heteroscedasticity at the member level. This finding is consistent with earlier reports by Pals et al.92 and Roberts et al.93 More recently, Roberts & Walwyn94 and Andridge et al.95 considered the circumstance in which members are associated with more than one small group or change agent. Both found that ignoring membership in multiple groups further inflates the type I error rate. Roberts & Walwyn reported that multiple membership multilevel models maintained the nominal type I error rate; they also provide sample size and power formulae.94
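For intuition about why clustering in only one arm matters, the sketch below computes the SE of the difference in arm means when members are clustered in small treatment groups in the intervention arm and are independent in the control arm, assuming the variance components are known. The function name and numbers are hypothetical; the models compared by Baldwin et al. estimate these components from the data.

```python
import math


def irgt_se(n_groups_tx, members_per_group, var_between_tx, var_within_tx,
            n_control, var_control):
    """SE of the intervention-minus-control difference in means when intervention
    members are clustered in equal-sized treatment groups and control members are
    independent (variance components assumed known)."""
    var_tx_mean = (var_between_tx + var_within_tx / members_per_group) / n_groups_tx
    var_ctl_mean = var_control / n_control
    return math.sqrt(var_tx_mean + var_ctl_mean)


# Hypothetical IRGT: 10 treatment groups of 8 members versus 80 independent controls
se_heterogeneous = irgt_se(10, 8, 0.1, 0.9, 80, 1.0)
# Ignoring the treatment groups treats all 80 intervention members as independent
se_naive = math.sqrt((0.1 + 0.9) / 80 + 1.0 / 80)
print(f"SE allowing for clustering in one arm: {se_heterogeneous:.3f}; ignoring it: {se_naive:.3f}")
```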

DEVELOPMENTS TO ADDRESS DATA CHALLENGES

Missing Outcome Data

Two recent reviews6,96 indicate that missing outcome data are common in GRTs, though investigators frequently analyze only the available data without accounting for the missing data pattern. When the covariate-dependent missingness (CDM) assumption is plausible, both mixed effects and GEE models provide unbiased estimates of the intervention effect when the CDM covariates are included in an analysis of all available data.97,98 AU-GEE also can provide unbiased effects by including all CDM covariates in the augmentation component55 and has the advantage that all estimates can still be interpreted as marginal effects. Other two-stage approaches such as multiple imputation (MI) or inverse probability weighting (IPW) can provide unbiased intervention effects under certain conditions for more general missing at random (MAR) patterns and may provide increased precision compared to covariate-adjusted conditional or marginal models for CDM.97,99 Although there is less literature on how to deal with missing not-at-random (MNAR) data,100 sensitivity analyses are recommended.101 A recent review showed that very few GRTs performed any sensitivity analyses for their missing data assumptions.6

To avoid an inflated type I error rate, MI should account for the clustered data structure.102,103 Fixed group effects should not be used in the imputation model because they reduce power.104 For binary outcomes, Ma et al.105 and Caille et al.106 show that the preferred MI method depends on the number of groups and the design effect, and note that bias may arise for some approaches even under CDM. Using group-specific mean imputation may be adequate for continuous outcomes.98,102 Hossain et al.98 show that if the missing data mechanism has an interaction between a covariate predictive of the outcome and study arm, the imputation strategy must account for this interaction to be unbiased.

Whereas MI requires specifying the distribution of the missing data conditional on covariates, IPW requires specifying the probability of being missing depending on covariates. Theoretically, both approaches can be used for any type of outcome and for both CDM and more general forms of MAR mechanisms.99 While IPW requires an additional assumption of positivity (all participants have a non-zero probability of being observed), it may be viewed as easier to define, particularly in the presence of non-intermittent missingness.107 Importantly, and as for MI, if the missing data mechanism has an interaction between a covariate predictive of the outcome and study arm, the weights must be generated by accounting for this interaction in order to be unbiased.108 Prague et al.109,110 developed a doubly robust estimator in the context of IPW, which provides an unbiased estimate if either the marginal mean model or the missing data model is correctly specified. They demonstrated that a doubly-robust augmented GEE approach can simultaneously account for both CDM and baseline covariate imbalance in GRTs when the parameter of interest is a marginal effect. Combining MI and IPW is a promising new approach which may have superior performance to IPW or MI alone when there are missing covariates in addition to missing outcomes.111
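A minimal sketch of IPW for the outcome mean in each arm, assuming a single categorical baseline covariate and estimating each member's observation probability as the observed fraction within its arm × covariate cell, which automatically allows for the arm-by-covariate interaction discussed above. Names and data are hypothetical; the sketch gives point estimates only, and SEs would still need to account for clustering and for estimation of the weights.

```python
from collections import defaultdict


def ipw_arm_means(records):
    """records: list of (arm, covariate_level, y) with y = None when missing.
    Observation probabilities are estimated within each arm x covariate cell.
    A cell with no observed members would violate positivity; its members
    then silently drop out of the estimate."""
    total = defaultdict(int)
    observed = defaultdict(int)
    for arm, level, y in records:
        total[(arm, level)] += 1
        if y is not None:
            observed[(arm, level)] += 1
    sums = defaultdict(float)
    weights = defaultdict(float)
    for arm, level, y in records:
        if y is None:
            continue
        w = total[(arm, level)] / observed[(arm, level)]   # 1 / estimated P(observed)
        sums[arm] += w * y
        weights[arm] += w
    return {arm: sums[arm] / weights[arm] for arm in sums}


# Hypothetical data: high-covariate members are more often missing, more so in arm 1
records = ([(1, "low", 2.0)] * 3 + [(1, "low", None)]
           + [(1, "high", 6.0)] + [(1, "high", None)] * 3
           + [(0, "low", 1.0)] * 4
           + [(0, "high", 3.0)] * 2 + [(0, "high", None)] * 2)
complete_case = {arm: (lambda ys: sum(ys) / len(ys))(
    [y for a, _, y in records if a == arm and y is not None]) for arm in (0, 1)}
print("complete case:", complete_case, "IPW:", ipw_arm_means(records))
```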

Baseline Imbalance of Covariates

While design strategies such as restricted randomization8 can help to achieve baseline covariate balance, they may not be easy to implement (e.g. if group characteristics are unknown in advance) and chance imbalance may arise regardless. In this case, some form of model-based covariate adjustment could be used, such as standard multivariate regression for conditional models or AU-GEE for marginal models.55 The advantage of AU-GEE in this case is twofold: it is doubly robust, in that consistency of the intervention effect estimate requires correct specification of either the marginal mean structure or the treatment model, and it separates covariate adjustment from intervention effect estimation, thereby reducing the risk of choosing the adjustment models to obtain the most significant results. The standard multivariate regression adjustment approach enjoys neither of these benefits.

Alternatively, Hansen and Bowers112 proposed a balancing criterion and studied its randomization distribution in order to simultaneously test for balance of multiple covariates in both RCTs and GRTs. Leyrat et al.113 suggested using the c-statistic of the propensity score model to measure covariate balance at the individual level. Leon et al.114 recommended propensity score matching to correct for baseline imbalance; in a simulation study, they report a median 90% reduction in bias. Nevertheless, the Consolidated Standards of Reporting Trials (CONSORT)115 recommends that the adjustment covariates be specified a priori for primary analyses, so that secondary analyses can test the sensitivity of the primary findings to adjustment for covariates identified post hoc.
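The c-statistic itself is simple to compute once propensity scores are available. In the sketch below the scores are assumed to come from a propensity model fitted elsewhere (hypothetical inputs and function name); a value near 0.5 means the covariates barely discriminate between arms, i.e. good balance, while values approaching 1 signal imbalance.

```python
def c_statistic(scores, arm):
    """Probability that a randomly chosen intervention-arm member has a higher
    propensity score than a randomly chosen control-arm member (ties count 1/2)."""
    tx = [s for s, a in zip(scores, arm) if a == 1]
    ctl = [s for s, a in zip(scores, arm) if a == 0]
    concordant = 0.0
    for s1 in tx:
        for s0 in ctl:
            if s1 > s0:
                concordant += 1.0
            elif s1 == s0:
                concordant += 0.5
    return concordant / (len(tx) * len(ctl))


# Hypothetical propensity scores for 8 members, 4 per arm
scores = [0.62, 0.55, 0.48, 0.51, 0.47, 0.53, 0.45, 0.50]
arm = [1, 1, 1, 1, 0, 0, 0, 0]
c = c_statistic(scores, arm)
print(f"c-statistic {c:.3f}")
```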

Software

Table 1 identifies functions and procedures in three software packages (SAS, Stata and R) that can be used to analyze data from GRTs, organized around the topics considered in the current paper. While none of the three packages can readily implement both QIF and tMLE for GRTs, R offers the most ready-to-use functionality, given its broad applicability to the methods cited in the current paper.

Table 1.

Summary of known functions and procedures to analyze GRTs using methods described in the current review.

| Method | SAS | Stata | R |
|---|---|---|---|
| Outcomes analysis of all available data | | | |
| Mixed-effects models | PROC MIXED, PROC NLMIXED, PROC GLIMMIX | mixed, melogit, mepoisson | lme4, nlme |
| Generalized estimating equations (GEE) | PROC GENMOD¹ | xtgee | geeglm/geeM |
| Targeted maximum likelihood (tMLE) | N/A | N/A | N/A² |
| Quadratic inference function (QIF) | %qif | N/A | qif³ |
| Permutation tests | %ptest | N/A | N/A |
| Accounting for missing outcomes | | | |
| Multiple imputation for clustered data | %mmi_impute⁴, %mmi_analyze | REALCOM Impute, mi impute⁴ | pan, jomo⁵ |
| Inverse probability weighting (IPW) | PROC GENMOD⁶ | N/A⁷ | CRTgeeDR |
| Causal-inference based methods⁸ | | | |
| Augmented GEE (AU-GEE) | N/A | N/A | CRTgeeDR |
| Doubly robust AU-GEE | N/A | N/A | CRTgeeDR |

Footnotes:

1. PROC GEE is another option, but it is experimental and offers limited additional usefulness for GRTs over PROC GENMOD.
2. In R, the tmle package is available for tMLE but, at the time of writing, does not allow for clustering.
3. At the time of writing, the authors were unable to load the package, and it allows only equal cluster sizes; Westgate has modified the code for GRTs with variable cluster size in the appendix of his paper.63
4. Only useful for continuous outcomes.
5. In R, mice is available for multiple imputation but, at the time of writing, does not account for clustering.
6. Cannot account for imprecision in the weights.
7. xtgee accommodates only group-specific weights, not individual-level weights.
8. The two listed methods are related: AU-GEE accounts for baseline covariate imbalance, and doubly robust AU-GEE, an extension of AU-GEE, accounts for both baseline covariate imbalance and missing data.

N/A: not available at the time of writing.

REPORTING OF RESULTS

The CONSORT guidelines for individually randomized trials were extended to GRTs in 2004,115 and most journals now require authors to conform to these guidelines. Based on a review of 300 GRTs published between 2000 and 2008, Ivers et al. reported that 60% and 70% accounted for clustering in the sample size calculation and in the analysis, respectively, 56% used restricted randomization, and most (86%) allocated more than 4 groups per arm.14 A more recent review of 86 trials published in 2013–2014 showed that 77% and 78% accounted for clustering in the sample size calculation and in the analysis, respectively, and that 51% used some form of restricted randomization.15

Given concerns about the ethical conduct of GRTs,116,117 recent reports on conduct and reporting have focused on the ethics of GRTs. For example, Sim and Dawson discuss the challenges associated with obtaining informed consent in GRTs.118 The Ottawa Statement on the ethical design and conduct of GRTs was published in 2012,119 with a reevaluation in 2015.120

DISCUSSION

In this review, we have summarized many of the most important advances in the analysis of GRTs during the 13 years since the publication of the earlier review by Murray et al.7 Many of these developments concern marginal model parameter estimation (e.g. augmented GEE, QIF and tMLE) and missing data methods. Space limitations prevented us from reviewing recent developments in survival outcomes,2,121–125 measurement bias,126,127 validity,128,129 Bayesian methods,4,130–132 cost-effectiveness analyses4,133–136 and mediation analyses to uncover mechanisms of action.137–140

Through this review, we have sought to remind the reader of the value of well-thought-out analysis of GRTs and of keeping up to date with the many recent developments in this area. Pairing this knowledge with our companion review of developments in the design of GRTs,8 we hope that our review leads to continued improvements in the design and analysis of GRTs.

Acknowledgments

This work was partly funded by the following National Institutes of Health grants: R01 HD075875, R37 AI51164, R01 AI110478 and K01 MH104310. The authors would like to thank the two anonymous reviewers whose comments greatly helped improve the final version of this manuscript.

APPENDIX: GLOSSARY

Augmented GEE

“Augmenting the standard GEE with a function of baseline covariates.”55 These methods adapt semiparametric theory developed by Robins141 and by Robins, Rotnitzky, and Zhao142 for observational studies with time-varying exposures and for missing data problems, respectively. They augment the estimating equation with a predictor function for the counterfactual outcomes under the intervention not received by each group/cluster, which are treated as missing.55

Baseline covariate balance

The group-level and individual-level covariate distributions are similar in all study arms.11

Choice of balancing criterion

Li et al. describe several balancing criteria to assess how well a GRT is balanced across covariates. These include the “best balance” (BB) metric of de Hoop et al.,143 the balance criterion (B) of Raab and Butcher,11 and the total balance score introduced by Li et al.19

Coefficient of variation

A measure of between-group variation, defined in Table 1 of our companion paper.8

Cohort GRT design

A cohort of individuals is enrolled at baseline and those same individuals are followed up over time.

Constrained randomization

Refers “to those designs that go beyond the basic design constraints to specify classes of randomization outcomes that satisfy certain balancing criteria, while retaining validity of the design.”144

Cross-sectional GRT design

A different set of individuals is obtained at each time point.

Designed balance at the group level

When there are equal numbers of groups randomized to each study arm.

Intraclass correlation

A measure of between-group variation, defined in Table 1 of our companion paper.8
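The precise definition is deferred to the companion paper, so this sketch assumes one common estimator: the one-way ANOVA estimator ρ̂ = (MSB − MSW) / (MSB + (m0 − 1)·MSW). Here MSB and MSW are the between- and within-group mean squares, and m0 = (N − Σ m_j²/N)/(g − 1) is the usual adjusted average group size for g groups of sizes m_j with N members in total. When all groups have the same size m, m0 equals m. The function name is hypothetical.

```python
import numpy as np


def icc_anova(clusters):
    """One-way ANOVA estimate of the intraclass correlation.

    clusters: list of sequences, one per group, holding member-level outcomes.
    """
    groups = [np.asarray(c, dtype=float) for c in clusters]
    g = len(groups)
    sizes = np.array([len(c) for c in groups])
    n_total = sizes.sum()
    grand_mean = np.concatenate(groups).mean()
    # Between-group and within-group sums of squares
    ssb = sum(n * (c.mean() - grand_mean) ** 2 for n, c in zip(sizes, groups))
    ssw = sum(((c - c.mean()) ** 2).sum() for c in groups)
    msb = ssb / (g - 1)
    msw = ssw / (n_total - g)
    m0 = (n_total - (sizes ** 2).sum() / n_total) / (g - 1)  # adjusted average group size
    return float((msb - msw) / (msb + (m0 - 1) * msw))
```

The ANOVA estimate can be negative when members of different groups are no less alike than members of the same group. For example, `icc_anova([[1, 2, 3], [1, 2, 3]])` gives −0.5, whereas groups with no within-group variation, such as `[[1, 1, 1], [2, 2, 2], [3, 3, 3]]`, give 1.0.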

Covariate-dependent missingness (CDM) assumption

The assumption that “missingness in outcomes depends on covariates measured at baseline, but not on the outcome itself.”98

Doubly-robust augmented GEE approach

Combining augmented GEE with inverse probability weighting (IPW) yields a doubly-robust estimator, which provides a consistent (asymptotically unbiased) estimate if either the marginal mean model or the missing data model is correctly specified.109,110
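The doubly-robust idea shows up in its simplest form when estimating the mean of an outcome with missing values, ignoring clustering. Each observed outcome is inverse-probability weighted, and a prediction from an outcome model is added back for everyone: mu_DR = mean(R·Y/π − (R − π)/π · m(X)). Here R indicates an observed outcome, π is the modeled probability of being observed, and m(X) is the predicted outcome. This is an illustrative sketch with hypothetical names. It is not the estimator of Prague et al. or the CRTgeeDR package,109,110 which embed the same two ingredients in GEE for clustered data.

```python
import numpy as np


def dr_mean(y, observed, p_observed, y_predicted):
    """Doubly-robust (augmented IPW) estimate of a marginal mean.

    y: outcomes, with any placeholder (e.g. NaN) where missing;
    observed: True where the outcome was observed (R = 1);
    p_observed: modeled probability of being observed (missing data model);
    y_predicted: predicted outcome for every individual (outcome model).
    """
    observed = np.asarray(observed, dtype=bool)
    r = observed.astype(float)
    # Missing outcomes are set to 0; they are always multiplied by r = 0 anyway.
    y = np.where(observed, np.asarray(y, dtype=float), 0.0)
    p = np.asarray(p_observed, dtype=float)
    m = np.asarray(y_predicted, dtype=float)
    return float(np.mean(r * y / p - (r - p) / p * m))
```

If the outcome model is exactly right, each term reduces to that individual's true outcome whatever the values of π. That is one half of the double robustness. The other half, a correct π with a wrong outcome model, holds in expectation rather than exactly.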

Equivalence

When a trial is designed to assess whether the new intervention is equivalent to the comparison intervention.

G-computation estimator

A computational method to estimate the causal effect in structural nested models. These models are designed to deal with confounding by variables that are affected by the intervention.145
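At a single time point, the g-computation idea reduces to standardization: fit an outcome model, predict every individual's outcome with the intervention set to 1 and then to 0, and average the difference. The sketch below assumes a linear outcome model with an intervention-by-covariate interaction. It omits the time-varying exposures and confounders affected by earlier intervention that motivate the general method, and the function name is hypothetical.

```python
import numpy as np


def g_computation(y, a, x):
    """Single-time-point g-computation (standardization) with a linear outcome model.

    y: outcomes; a: intervention indicator (0/1); x: covariate(s).
    """
    y = np.asarray(y, dtype=float)
    a = np.asarray(a, dtype=float)
    x = np.asarray(x, dtype=float).reshape(len(y), -1)

    def design(treat):
        # Intercept, intervention, covariates, and intervention-by-covariate interaction
        return np.column_stack([np.ones(len(y)), treat, x, treat[:, None] * x])

    beta, *_ = np.linalg.lstsq(design(a), y, rcond=None)
    mean_if_treated = design(np.ones_like(a)) @ beta  # everyone set to intervention
    mean_if_control = design(np.zeros_like(a)) @ beta  # everyone set to control
    return float(mean_if_treated.mean() - mean_if_control.mean())
```

With noise-free data y = 1 + 2a + 3x + 1.5ax, x = [0, 1, 2, 3, 0, 1, 2, 3] and a = [0, 0, 0, 0, 1, 1, 1, 1], the function returns 4.25 up to floating-point rounding. That value is 2 + 1.5 × mean(x), the effect averaged over the observed covariate distribution.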

Individually Randomized Group Treatment Trials

“Studies that randomize individuals to study arms but deliver treatments in small groups or through a common change agent.”8,92

Informative cluster size

When the outcome measured is related to the size of the cluster.50

Missing at Random (MAR) assumption

Rubin’s (1976) definition is that “data are missing at random if for each possible value of the parameter φ [the parameter of the conditional distribution of the missing data indicator given the data], the conditional probability of the observed pattern of missing data, given the missing data and the value of the observed data, is the same for all possible values of the missing data.”146

Network-Randomized GRT

“The network-randomized GRT is a novel design that uses network information to address the challenge of potential contamination in GRTs of infectious diseases.”8,82,84,147

Non-inferiority

When a trial is designed to show that the new intervention is not worse than the comparison intervention.

On-treatment analyses

When groups are analyzed “according to the intervention they actually received.”2

Per-protocol analyses

When groups “not receiving the correct intervention are excluded.”2

Pseudo-cluster randomized trial

Intervention is allocated to individuals in a two-stage process. “In the first stage, providers are randomized to a patient allocation-mix…. In the second stage, patients recruited to the PCRT are individually randomized to intervention or control according to the allocation probability of their provider.”8
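The two-stage allocation can be sketched as follows. The allocation mixes (80% and 20% intervention) and the equal split of providers between them are assumptions for illustration, and the function name is hypothetical.

```python
import numpy as np


def pseudo_cluster_allocate(patients_per_provider, mixes=(0.8, 0.2), seed=None):
    """Two-stage pseudo-cluster allocation (illustrative sketch).

    Stage 1: providers are randomized, in equal numbers as far as possible, to an
    allocation mix (the probability that a patient of that provider gets the intervention).
    Stage 2: each provider's patients are individually randomized to intervention (True)
    or control (False) using that provider's probability.
    """
    rng = np.random.default_rng(seed)
    n_providers = len(patients_per_provider)
    # Stage 1: repeat the mixes to cover all providers, then shuffle which provider gets which
    provider_mix = rng.permutation(np.resize(np.asarray(mixes, dtype=float), n_providers))
    # Stage 2: individual randomization within each provider
    allocations = [rng.random(n) < p for n, p in zip(patients_per_provider, provider_mix)]
    return provider_mix, allocations
```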

Stepped Wedge GRT

“A one-directional crossover GRT in which time is divided into intervals and in which all groups eventually receive the intervention.”8,78
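The following is a minimal sketch of a stepped wedge allocation schedule. It assumes a complete design with one all-control baseline period, equal numbers of groups per sequence, and one sequence crossing over at each later period. Other layouts are possible (see Copas et al.81), and the function name is hypothetical.

```python
import numpy as np


def stepped_wedge_schedule(n_groups, n_sequences, seed=None):
    """Return an n_groups x (n_sequences + 1) matrix: 1 = intervention, 0 = control."""
    if n_groups % n_sequences != 0:
        raise ValueError("this sketch assumes equal numbers of groups per sequence")
    rng = np.random.default_rng(seed)
    per_sequence = n_groups // n_sequences
    # Randomize groups to sequences; sequence s crosses over at the start of period s
    crossover = rng.permutation(np.repeat(np.arange(1, n_sequences + 1), per_sequence))
    periods = np.arange(n_sequences + 1)  # period 0 is the all-control baseline
    return (periods[None, :] >= crossover[:, None]).astype(int)
```

For 6 groups and 3 sequences the schedule has 4 periods. Every group starts in the control condition and ends in the intervention condition, and the randomized element is the order in which groups cross over.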

Superiority

When a trial is designed to establish whether a new intervention is superior to the comparison intervention (e.g., another drug, a placebo, enhanced usual care). However, the statistical test is still two-sided, allowing for the possibility that the new intervention is actually worse than the comparison.

Within-cluster resampling

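Randomly sample one observation from each cluster (sampling is with replacement across resamples, so the same observation can be drawn in more than one resample). Because the resampled observations are independent, analyze this dataset with a standard method for independent data. Repeat this process a large number of times. “The within-cluster resampling estimator is constructed as the average” of all of the resample-based estimates (see Hoffman et al.51 pp. 1122–1123).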
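The resampling steps can be sketched for the simplest target, a marginal mean in which every cluster counts equally. The sketch assumes a fixed number of resamples and returns only the point estimate. Hoffman et al.51 also give a variance estimator, which is omitted here. The function name is hypothetical.

```python
import numpy as np


def wcr_mean(clusters, n_resamples=1000, seed=None):
    """Within-cluster resampling estimate of a marginal mean (point estimate only)."""
    rng = np.random.default_rng(seed)
    arrays = [np.asarray(c, dtype=float) for c in clusters]
    estimates = np.empty(n_resamples)
    for b in range(n_resamples):
        # Draw one observation per cluster; the draws are independent of each other
        draw = np.array([c[rng.integers(len(c))] for c in arrays])
        estimates[b] = draw.mean()  # standard estimator for independent data
    return float(estimates.mean())  # average over resamples
```

With clusters `[1, 1, 1, 1]` and `[3]`, every resample is {1, 3}, so `wcr_mean` returns 2.0. The pooled individual-level mean is 1.4. This is the kind of divergence that arises under informative cluster size.50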

Footnotes

ACCEPTANCE DATE

02/05/2017

CONTRIBUTORS

ELT and DMM initiated the project and developed the outline and the topics to be covered, which all authors agreed upon. ELT wrote much of the first draft, to which MP, JAG, FL and DMM contributed sections. All authors edited and reviewed the revised manuscript and approved it in its final version.

HUMAN PARTICIPANT PROTECTION

IRB approval was not needed.

Contributor Information

Elizabeth L. Turner, Department of Biostatistics and Bioinformatics, Duke University, Durham, North Carolina; Duke Global Health Institute, Duke University, Durham, North Carolina, USA.

John A. Gallis, Department of Biostatistics and Bioinformatics, Duke University, Durham, North Carolina; Duke Global Health Institute, Duke University, Durham, North Carolina, USA.

Fan Li, Department of Biostatistics and Bioinformatics, Duke University, Durham, North Carolina, USA.

Melanie Prague, Department of Biostatistics, Harvard T. H. Chan School of Public Health, Harvard University, Boston, Massachusetts, USA; Inria, project team SISTM, Bordeaux, France.

David M. Murray, Office of Disease Prevention, Division of Program Coordination and Strategic Planning, Office of the Director, National Institutes of Health, Rockville, Maryland, USA.

References

1. Murray DM. Design and Analysis of Group-Randomized Trials. New York, NY: Oxford University Press; 1998.
2. Hayes RJ, Moulton LH. Cluster Randomised Trials. Boca Raton: CRC Press; 2009.
3. Donner A, Klar N. Design and Analysis of Cluster Randomization Trials in Health Research. London: Arnold; 2000.
4. Eldridge S, Kerry S. A Practical Guide to Cluster Randomised Trials in Health Services Research. Vol. 120. John Wiley & Sons; 2012.
5. Campbell MJ, Walters SJ. How to Design, Analyse and Report Cluster Randomised Trials in Medicine and Health Related Research. Chichester, West Sussex: John Wiley & Sons; 2014.
6. Fiero MH, Huang S, Oren E, Bell ML. Statistical analysis and handling of missing data in cluster randomized trials: a systematic review. Trials. 2016;17(1):72. doi: 10.1186/s13063-016-1201-z.
7. Murray DM, Varnell SP, Blitstein JL. Design and analysis of group-randomized trials: a review of recent methodological developments. Am J Public Health. 2004;94(3):423–432. doi: 10.2105/ajph.94.3.423.
8. Turner EL, Li F, Gallis JA, Prague M, Murray DM. Review of Recent Methodological Developments in Group-Randomized Trials: Part 1 - Design. Am J Public Health. doi: 10.2105/AJPH.2017.303706. Submitted.
9. Rutterford C, Copas A, Eldridge S. Methods for sample size determination in cluster randomized trials. Int J Epidemiol. 2015;44(3):1051–1067. doi: 10.1093/ije/dyv113.
10. Jo B, Asparouhov T, Muthén BO. Intention-to-treat analysis in cluster randomized trials with noncompliance. Stat Med. 2008;27(27):5565. doi: 10.1002/sim.3370.
11. Raab GM, Butcher I. Balance in cluster randomized trials. Stat Med. 2001;20(3):351–365. doi: 10.1002/1097-0258(20010215)20:3<351::aid-sim797>3.0.co;2-c.
12. Varnell SP, Murray DM, Janega JB, Blitstein JL. Design and analysis of group-randomized trials: a review of recent practices. Am J Public Health. 2004;94(3):393–399. doi: 10.2105/ajph.94.3.393.
13. Murray DM, Pals SP, Blitstein JL, Alfano CM, Lehman J. Design and analysis of group-randomized trials in cancer: a review of current practices. J Natl Cancer Inst. 2008;100(7):483–491. doi: 10.1093/jnci/djn066.
14. Ivers NM, Halperin IJ, Barnsley J, et al. Allocation techniques for balance at baseline in cluster randomized trials: a methodological review. Trials. 2012;13:120. doi: 10.1186/1745-6215-13-120.
15. Fiero M, Huang S, Bell ML. Statistical analysis and handling of missing data in cluster randomised trials: protocol for a systematic review. BMJ Open. 2015;5(5):e007378. doi: 10.1136/bmjopen-2014-007378.
16. Diehr P, Martin DC, Koepsell T, Cheadle A. Breaking the matches in a paired t-test for community interventions when the number of pairs is small. Stat Med. 1995;14(13):1491–1504. doi: 10.1002/sim.4780141309.
17. Proschan MA. On the distribution of the unpaired t-statistic with paired data. Stat Med. 1996;15(10):1059–1063. doi: 10.1002/(SICI)1097-0258(19960530)15:10<1059::AID-SIM219>3.0.CO;2-2.
18. Donner A, Taljaard M, Klar N. The merits of breaking the matches: a cautionary tale. Stat Med. 2007;26(9):2036–2051. doi: 10.1002/sim.2662.
19. Li F, Lokhnygina Y, Murray DM, Heagerty PJ, DeLong ER. An evaluation of constrained randomization for the design and analysis of group-randomized trials. Stat Med. 2015;35(10):1565–1579. doi: 10.1002/sim.6813.
20. Liang KY, Zeger SL. Longitudinal data analysis using generalized linear models. Biometrika. 1986;73(1):13–22.
21. Zeger SL, Liang K-Y. Longitudinal data analysis for discrete and continuous outcomes. Biometrics. 1986;42(1):121–130.
22. Ritz J, Spiegelman D. Equivalence of conditional and marginal regression models for clustered and longitudinal data. Stat Methods Med Res. 2004;13(4):309–323.
23. Greenland S. Interpretation and choice of effect measures in epidemiologic analyses. Am J Epidemiol. 1987;125(5):761–768. doi: 10.1093/oxfordjournals.aje.a114593.
24. Blizzard L, Hosmer DW. Parameter Estimation and Goodness-of-Fit in Log Binomial Regression. Biom J. 2006;48(1):5–22. doi: 10.1002/bimj.200410165.
25. Williamson T, Eliasziw M, Fick GH. Log-binomial models: exploring failed convergence. Emerg Themes Epidemiol. 2013;10(1):1–10. doi: 10.1186/1742-7622-10-14.
26. Zou G, Donner A. Extension of the modified Poisson regression model to prospective studies with correlated binary data. Stat Methods Med Res. 2013;22(6):661–670. doi: 10.1177/0962280211427759.
27. Yelland LN, Salter AB, Ryan P. Performance of the modified Poisson regression approach for estimating relative risks from clustered prospective data. Am J Epidemiol. 2011;174(8):984–992. doi: 10.1093/aje/kwr183.
28. Hubbard AE, Ahern J, Fleischer NL, et al. To GEE or not to GEE: comparing population average and mixed models for estimating the associations between neighborhood risk factors and health. Epidemiology. 2010;21(4):467–474. doi: 10.1097/EDE.0b013e3181caeb90.
29. Huang X. Diagnosis of Random-Effect Model Misspecification in Generalized Linear Mixed Models for Binary Response. Biometrics. 2009;65(2):361–368. doi: 10.1111/j.1541-0420.2008.01103.x.
30. Murray DM, Hannan PJ, Varnell SP, McCowen RG, Baker WL, Blitstein JL. A comparison of permutation and mixed-model regression methods for the analysis of simulated data in the context of a group-randomized trial. Stat Med. 2006;25(3):375–388. doi: 10.1002/sim.2233.
31. Fu D. A comparison study of general linear mixed model and permutation tests in group-randomized trials under non-normal error distributions [Dissertation]. Memphis: Statistics, University of Memphis; 2006.
32. Kenward MG, Roger JH. Small sample inference for fixed effects from restricted maximum likelihood. Biometrics. 1997;53(3):983–997.
33. Localio AR, Berlin JA, Ten Have TR. Longitudinal and repeated cross-sectional cluster-randomization designs using mixed effects regression for binary outcomes: bias and coverage of frequentist and Bayesian methods. Stat Med. 2006;25(16):2720–2736. doi: 10.1002/sim.2428.
34. Pinheiro JC, Bates DM. Mixed-effects models in S and S-PLUS. New York: Springer; 2000.
35. Li P, Redden DT. Comparing denominator degrees of freedom approximations for the generalized linear mixed model in analyzing binary outcome in small sample cluster-randomized trials. BMC Med Res Methodol. 2015;15(1):38. doi: 10.1186/s12874-015-0026-x.
36. Murray DM, Hannan PJ, Wolfinger RD, Baker WL, Dwyer JH. Analysis of data from group-randomized trials with repeat observations on the same groups. Stat Med. 1998;17(14):1581–1600. doi: 10.1002/(sici)1097-0258(19980730)17:14<1581::aid-sim864>3.0.co;2-n.
37. Johnson JL, Kreidler SM, Catellier DJ, Murray DM, Muller KE, Glueck DH. Recommendations for choosing an analysis method that controls Type I error for unbalanced cluster sample designs with Gaussian outcomes. Stat Med. 2015;34(27):3531–3545. doi: 10.1002/sim.6565.
38. McNeish D, Stapleton LM. Modeling clustered data with very few clusters. Multivariate Behav Res. 2016;51(4):495–518. doi: 10.1080/00273171.2016.1167008.
39. Li P, Redden DT. Small sample performance of bias-corrected sandwich estimators for cluster-randomized trials with binary outcomes. Stat Med. 2015;34(2):281–296. doi: 10.1002/sim.6344.
40. Fay MP, Graubard BI. Small-Sample Adjustments for Wald-Type Tests Using Sandwich Estimators. Biometrics. 2001;57(4):1198–1206. doi: 10.1111/j.0006-341x.2001.01198.x.
41. Mancl LA, DeRouen TA. A covariance estimator for GEE with improved small-sample properties. Biometrics. 2001;57(1):126–134. doi: 10.1111/j.0006-341x.2001.00126.x.
42. Morel J, Bokossa M, Neerchal N. Small sample correction for the variance of GEE estimators. Biom J. 2003;45(4):395–409.
43. Preisser JS, Lu B, Qaqish BF. Finite sample adjustments in estimating equations and covariance estimators for intracluster correlations. Stat Med. 2008;27(27):5764–5785. doi: 10.1002/sim.3390.
44. Pan W, Wall MM. Small-sample adjustments in using the sandwich variance estimator in generalized estimating equations. Stat Med. 2002;21(10):1429–1441. doi: 10.1002/sim.1142.
45. Carey V, Zeger SL, Diggle P. Modelling multivariate binary data with alternating logistic regressions. Biometrika. 1993;80(3):517–526.
46. By K, Qaqish BF, Preisser JS, Perin J, Zink RC. ORTH: R and SAS software for regression models of correlated binary data based on orthogonalized residuals and alternating logistic regressions. Comput Methods Programs Biomed. 2014;113(2):557–568. doi: 10.1016/j.cmpb.2013.09.017.
47. Halekoh U, Højsgaard S, Yan J. The R package geepack for generalized estimating equations. J Stat Softw. 2006;15(2):1–11.
48. Crespi CM, Wong WK, Mishra SI. Using second-order generalized estimating equations to model heterogeneous intraclass correlation in cluster-randomized trials. Stat Med. 2009;28(5):814–827. doi: 10.1002/sim.3518.
49. Teerenstra S, Lu B, Preisser JS, van Achterberg T, Borm GF. Sample size considerations for GEE analyses of three-level cluster randomized trials. Biometrics. 2010;66(4):1230–1237. doi: 10.1111/j.1541-0420.2009.01374.x.
50. Williamson JM, Datta S, Satten GA. Marginal analyses of clustered data when cluster size is informative. Biometrics. 2003;59(1):36–42. doi: 10.1111/1541-0420.00005.
51. Hoffman EB, Sen PK, Weinberg CR. Within-cluster resampling. Biometrika. 2001;88(4):1121–1134.
52. Neuhaus JM, Kalbfleisch JD, Hauck WW. A comparison of cluster-specific and population-averaged approaches for analyzing correlated binary data. Int Stat Rev. 1991;59(1):25–35.
53. Hin L-Y, Carey VJ, Wang Y-G. Criteria for working–correlation–structure selection in GEE: Assessment via simulation. Am Stat. 2007;61(4):360–364.
54. Tsiatis AA, Davidian M, Zhang M, Lu X. Covariate adjustment for two-sample treatment comparisons in randomized clinical trials: A principled yet flexible approach. Stat Med. 2008;27(23):4658–4677. doi: 10.1002/sim.3113.
55. Stephens AJ, Tchetgen Tchetgen EJ, De Gruttola V. Augmented generalized estimating equations for improving efficiency and validity of estimation in cluster randomized trials by leveraging cluster-level and individual-level covariates. Stat Med. 2012;31(10):915–930. doi: 10.1002/sim.4471.
56. Richiardi L, Bellocco R, Zugna D. Mediation analysis in epidemiology: methods, interpretation and bias. Int J Epidemiol. 2013;42(5):1511–1519. doi: 10.1093/ije/dyt127.
57. Robins JM, Rotnitzky A, Zhao LP. Analysis of semiparametric regression models for repeated outcomes in the presence of missing data. J Am Stat Assoc. 1995;90(429):106–121.
58. Robins JM, Greenland S, Hu F-C. Estimation of the causal effect of a time-varying exposure on the marginal mean of a repeated binary outcome. J Am Stat Assoc. 1999;94(447):687–700.
59. Miglioretti DL, Heagerty PJ. Marginal modeling of multilevel binary data with time-varying covariates. Biostatistics. 2004;5(3):381–398. doi: 10.1093/biostatistics/5.3.381.
60. Song PXK, Jiang Z, Park E, Qu A. Quadratic inference functions in marginal models for longitudinal data. Stat Med. 2009;28(29):3683–3696. doi: 10.1002/sim.3719.
61. Khajeh-Kazemi R, Golestan B, Mohammad K, Mahmoudi M, Nedjat S, Pakravan M. Comparison of Generalized Estimating Equations and Quadratic Inference Functions in superior versus inferior Ahmed Glaucoma Valve implantation. J Res Med Sci. 2011;16(3):235–244.
62. Westgate PM, Braun TM. The effect of cluster size imbalance and covariates on the estimation performance of quadratic inference functions. Stat Med. 2012;31(20):2209–2222. doi: 10.1002/sim.5329.
63. Westgate PM. A bias-corrected covariance estimate for improved inference with quadratic inference functions. Stat Med. 2012;31(29):4003–4022. doi: 10.1002/sim.5479.
64. Westgate PM. A covariance correction that accounts for correlation estimation to improve finite-sample inference with generalized estimating equations: a study on its applicability with structured correlation matrices. J Stat Comput Simul. 2016;86(10):1891–1900. doi: 10.1080/00949655.2015.1089873.
65. Westgate PM. Criterion for the simultaneous selection of a working correlation structure and either generalized estimating equations or the quadratic inference function approach. Biom J. 2014;56(3):461–476. doi: 10.1002/bimj.201300098.
66. Asgari F, Biglarian A, Seifi B, Bakhshi A, Miri HH, Bakhshi E. Using quadratic inference functions to determine the factors associated with obesity: findings from the STEPS Survey in Iran. Ann Epidemiol. 2013;23(9):534–538. doi: 10.1016/j.annepidem.2013.07.006.
67. Bakhshi E, Etemad K, Seifi B, Mohammad K, Biglarian A, Koohpayehzadeh J. Changes in Obesity Odds Ratio among Iranian Adults, since 2000: Quadratic Inference Functions Method. Comput Math Methods Med. 2016;2016:1–7. doi: 10.1155/2016/7101343.
68. Yang K, Tao L, Mahara G, et al. An association of platelet indices with blood pressure in Beijing adults: Applying quadratic inference function for a longitudinal study. Medicine (Baltimore). 2016;95(39):e4964. doi: 10.1097/MD.0000000000004964.
69. Van der Laan MJ, Robins JM. Unified methods for censored longitudinal data and causality. Springer Science & Business Media; 2003.
70. Gruber S, van der Laan MJ. A targeted maximum likelihood estimator of a causal effect on a bounded continuous outcome. Int J Biostat. 2010;6(1):1–18. doi: 10.2202/1557-4679.1260.
71. Kotwani P, Balzer L, Kwarisiima D, et al. Evaluating linkage to care for hypertension after community-based screening in rural Uganda. Trop Med Int Health. 2014;19(4):459–468. doi: 10.1111/tmi.12273.
72. Ahern J, Karasek D, Luedtke AR, Bruckner TA, van der Laan MJ. Racial/ethnic differences in the role of childhood adversities for mental disorders among a nationally representative sample of adolescents. Epidemiology. 2016;27(5):697–704. doi: 10.1097/EDE.0000000000000507.
73. Balzer LB, Petersen ML, van der Laan MJ. Targeted estimation and inference for the sample average treatment effect in trials with and without pair-matching. Stat Med. 2016;35(21):3717–3732. doi: 10.1002/sim.6965.
74. Schnitzer ME, van der Laan MJ, Moodie EE, Platt RW. Effect of breastfeeding on gastrointestinal infection in infants: a targeted maximum likelihood approach for clustered longitudinal data. Ann Appl Stat. 2014;8(2):703–725. doi: 10.1214/14-aoas727.
75. Van der Laan MJ, Polley EC, Hubbard AE. Super learner. Stat Appl Genet Mol Biol. 2007;6(1). doi: 10.2202/1544-6115.1309.
76. Gail MH, Mark SD, Carroll RJ, Green SB, Pee D. On design considerations and randomization-based inference for community intervention trials. Stat Med. 1996;15(11):1069–1092. doi: 10.1002/(SICI)1097-0258(19960615)15:11<1069::AID-SIM220>3.0.CO;2-Q.
77. Hemming K, Haines TP, Chilton PJ, Girling AJ, Lilford RJ. The stepped wedge cluster randomised trial: rationale, design, analysis, and reporting. BMJ. 2015;350:h391. doi: 10.1136/bmj.h391.
78. Spiegelman D. Evaluating public health interventions: 2. Stepping up to routine public health evaluation with the stepped wedge design. Am J Public Health. 2016;106(3):453–457. doi: 10.2105/AJPH.2016.303068.
79. Davey C, Hargreaves J, Thompson JA, et al. Analysis and reporting of stepped wedge randomised controlled trials: synthesis and critical appraisal of published studies, 2010 to 2014. Trials. 2015;16(1):358. doi: 10.1186/s13063-015-0838-3.
80. Mdege ND, Man M-S, Taylor CA, Torgerson DJ. Systematic review of stepped wedge cluster randomized trials shows that design is particularly used to evaluate interventions during routine implementation. J Clin Epidemiol. 2011;64(9):936–948. doi: 10.1016/j.jclinepi.2010.12.003.
81. Copas AJ, Lewis JJ, Thompson JA, Davey C, Baio G, Hargreaves JR. Designing a stepped wedge trial: three main designs, carry-over effects and randomisation approaches. Trials. 2015;16(1):352. doi: 10.1186/s13063-015-0842-7.
82. Harling G, Wang R, Onnela J, De Gruttola V. Leveraging contact network structure in the design of cluster randomized trials. Clin Trials. 2016. doi: 10.1177/1740774516673355. [Epub ahead of print]
83. Ebola ça Suffit Ring Vaccination Trial Consortium. The ring vaccination trial: a novel cluster randomised controlled trial design to evaluate vaccine efficacy and effectiveness during outbreaks, with special reference to Ebola. BMJ. 2015;351:h3740. doi: 10.1136/bmj.h3740.
84. Latkin C, Donnell D, Liu TY, Davey-Rothwell M, Celentano D, Metzger D. The dynamic relationship between social norms and behaviors: the results of an HIV prevention network intervention for injection drug users. Addiction. 2013;108(5):934–943. doi: 10.1111/add.12095.
85. Banerjee A, Chandrasekhar AG, Duflo E, Jackson MO. The diffusion of microfinance. Science. 2013;341(6144). doi: 10.1126/science.1236498.
86. Ogburn EL, VanderWeele TJ. Causal diagrams for interference. Stat Sci. 2014;29(4):559–578.
87. VanderWeele TJ, Tchetgen EJT, Halloran ME. Components of the indirect effect in vaccine trials: identification of contagion and infectiousness effects. Epidemiology. 2012;23(5):751. doi: 10.1097/EDE.0b013e31825fb7a0.
88. Staples P, Prague M, De Gruttola V, Onnela J-P. Leveraging Contact Network Information in Clustered Randomized Trials of Infectious Processes. arXiv preprint arXiv:1610.00039. 2016.
89. Teerenstra S, Moerbeek M, Melis RJ, Borm GF. A comparison of methods to analyse continuous data from pseudo cluster randomized trials. Stat Med. 2007;26(22):4100–4115. doi: 10.1002/sim.2851.
90. Baldwin SA, Bauer DJ, Stice E, Rohde P. Evaluating models for partially clustered designs. Psychol Methods. 2011;16(2):149–165. doi: 10.1037/a0023464.
91. Satterthwaite FE. An approximate distribution of estimates of variance components. Biometrics. 1946;2(6):110–114.
92. Pals SP, Murray DM, Alfano CM, Shadish WR, Hannan PJ, Baker WL. Individually randomized group treatment trials: a critical appraisal of frequently used design and analytic approaches. Am J Public Health. 2008;98(8):1418–1424. doi: 10.2105/AJPH.2007.127027.
93. Roberts C, Roberts SA. Design and analysis of clinical trials with clustering effects due to treatment. Clin Trials. 2005;2(2):152–162. doi: 10.1191/1740774505cn076oa.
94. Roberts C, Walwyn R. Design and analysis of non-pharmacological treatment trials with multiple therapists per patient. Stat Med. 2013;32(1):81–98. doi: 10.1002/sim.5521.
95. Andridge RR, Shoben AB, Muller KE, Murray DM. Analytic methods for individually randomized group treatment trials and group-randomized trials when subjects belong to multiple groups. Stat Med. 2014;33(13):2178–2190. doi: 10.1002/sim.6083.
96. Díaz-Ordaz K, Kenward MG, Cohen A, Coleman CL, Eldridge S. Are missing data adequately handled in cluster randomised trials? A systematic review and guidelines. Clin Trials. 2014;11(5):590–600. doi: 10.1177/1740774514537136.
97. DeSouza CM, Legedza AT, Sankoh AJ. An overview of practical approaches for handling missing data in clinical trials. J Biopharm Stat. 2009;19(6):1055–1073. doi: 10.1080/10543400903242795.
98. Hossain A, Diaz-Ordaz K, Bartlett JW. Missing continuous outcomes under covariate dependent missingness in cluster randomised trials. Stat Methods Med Res. 2016. doi: 10.1177/0962280216648357.
99. Seaman SR, White IR. Review of inverse probability weighting for dealing with missing data. Stat Methods Med Res. 2013;22(3):278–295. doi: 10.1177/0962280210395740.
100. Vansteelandt S, Rotnitzky A, Robins J. Estimation of regression models for the mean of repeated outcomes under nonignorable nonmonotone nonresponse. Biometrika. 2007;94(4):841–860. doi: 10.1093/biomet/asm070.
101. Thabane L, Mbuagbaw L, Zhang S, et al. A tutorial on sensitivity analyses in clinical trials: the what, why, when and how. BMC Med Res Methodol. 2013;13(1):92. doi: 10.1186/1471-2288-13-92.
102. Taljaard M, Donner A, Klar N. Imputation strategies for missing continuous outcomes in cluster randomized trials. Biom J. 2008;50(3):329–345. doi: 10.1002/bimj.200710423.
103. Ma J, Akhtar-Danesh N, Dolovich L, Thabane L. Imputation strategies for missing binary outcomes in cluster randomized trials. BMC Med Res Methodol. 2011;11(1):18. doi: 10.1186/1471-2288-11-18.
104. Andridge RR. Quantifying the impact of fixed effects modeling of clusters in multiple imputation for cluster randomized trials. Biom J. 2011;53(1):57–74. doi: 10.1002/bimj.201000140.
  • 105.Ma J, Raina P, Beyene J, Thabane L. Comparing the performance of different multiple imputation strategies for missing binary outcomes in cluster randomized trials: a simulation study. J Open Access Med Stat. 2012;2:93–103. [Google Scholar]
  • 106.Caille A, Leyrat C, Giraudeau B. A comparison of imputation strategies in cluster randomized trials with missing binary outcomes. Stat Methods Med Res. 2016;25(6):2650–2669. doi: 10.1177/0962280214530030. [DOI] [PubMed] [Google Scholar]
  • 107.Seaman S, Galati J, Jackson D, Carlin J. What is meant by “missing at random”? Stat Sci. 2013;28(2):257–268. [Google Scholar]
  • 108.Belitser SV, Martens EP, Pestman WR, Groenwold RH, Boer A, Klungel OH. Measuring balance and model selection in propensity score methods. Pharmacoepidemiol Drug Saf. 2011;20(11):1115–1129. doi: 10.1002/pds.2188. [DOI] [PubMed] [Google Scholar]
  • 109.Prague M, Wang R, De Gruttola V. Harvard University Biostatistics Working Paper Series. Harvard University; 2016. CRTgeeDR: An R Package for Doubly Robust Generalized Estimating Equations Estimations in Cluster Randomized Trials with Missing Data. [Google Scholar]
  • 110.Prague M, Wang R, Stephens A, Tchetgen Tchetgen E, DeGruttola V. Accounting for interactions and complex inter-subject dependency in estimating treatment effect in cluster-randomized trials with missing outcomes. Biometrics. 2016;72(4):1066–1077. doi: 10.1111/biom.12519. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 111.Seaman SR, White IR, Copas AJ, Li L. Combining multiple imputation and inverse-probability weighting. Biometrics. 2012;68(1):129–137. doi: 10.1111/j.1541-0420.2011.01666.x. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 112.Hansen BB, Bowers J. Covariate Balance in Simple, Stratified and Clustered Comparative Studies. Stat Sci. 2008;23(2):219–236. [Google Scholar]
  • 113.Leyrat C, Caille A, Foucher Y, Giraudeau B. Propensity score to detect baseline imbalance in cluster randomized trials: the role of the c-statistic. BMC Med Res Methodol. 2016;16(1):9. doi: 10.1186/s12874-015-0100-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 114.Leon AC, Demirtas H, Li C, Hedeker D. Subject-level matching for imbalance in cluster randomized trials with a small number of clusters. Pharm Stat. 2013;12(5):268–274. doi: 10.1002/pst.1580. [DOI] [PubMed] [Google Scholar]
  • 115.Campbell MK, Elbourne DR, Altman DG. CONSORT statement: extension to cluster randomised trials. Br Med J. 2004;328(7441):702–708. doi: 10.1136/bmj.328.7441.702. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 116.Hutton JL. Are distinctive ethical principles required for cluster randomized controlled trials? Stat Med. 2001;20(3):473–488. doi: 10.1002/1097-0258(20010215)20:3<473::aid-sim805>3.0.co;2-d. [DOI] [PubMed] [Google Scholar]
  • 117.Taljaard M, Chaudhry SH, Brehaut JC, et al. Survey of consent practices in cluster randomized trials: improvements are needed in ethical conduct and reporting. Clin Trials. 2014;11(1):60–69. doi: 10.1177/1740774513513658. [DOI] [PubMed] [Google Scholar]
  • 118.Sim J, Dawson A. Informed consent and cluster-randomized trials. Am J Public Health. 2012;102(3):480–485. doi: 10.2105/AJPH.2011.300389. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 119. Weijer C, Grimshaw JM, Eccles MP, et al. The Ottawa statement on the ethical design and conduct of cluster randomized trials. PLoS Med. 2012;9(11):e1001346. doi: 10.1371/journal.pmed.1001346.
  • 120. van der Graaf R, Koffijberg H, Grobbee DE, et al. The ethics of cluster-randomized trials requires further evaluation: a refinement of the Ottawa Statement. J Clin Epidemiol. 2015;68(9):1108–1114. doi: 10.1016/j.jclinepi.2015.03.013.
  • 121. Zeng D, Lin D, Lin X. Semiparametric transformation models with random effects for clustered failure time data. Stat Sin. 2008;18(1):355–377.
  • 122. Cai T, Cheng S, Wei L. Semiparametric mixed-effects models for clustered failure time data. J Am Stat Assoc. 2002;97(458):514–522.
  • 123. Zhong Y, Cook RJ. Sample size and robust marginal methods for cluster-randomized trials with censored event times. Stat Med. 2015;34(6):901–923. doi: 10.1002/sim.6395.
  • 124. Zhan Z, de Bock GH, Wiggers T, van den Heuvel E. The analysis of terminal endpoint events in stepped wedge designs. Stat Med. 2016;35(24):4413–4426. doi: 10.1002/sim.7004.
  • 125. Xu Z. Statistical Design and Survival Analysis in Cluster Randomized Trials [Dissertation]. The University of Michigan; 2011.
  • 126. Kramer MS, Martin RM, Sterne JA, Shapiro S, Dahhou M, Platt RW. The double jeopardy of clustered measurement and cluster randomisation. BMJ. 2009;339:b2900. doi: 10.1136/bmj.b2900.
  • 127. Cho S-J, Preacher KJ. Measurement error correction formula for cluster-level group differences in cluster randomized and observational studies. Educ Psychol Meas. 2016;76(5):771–786. doi: 10.1177/0013164415612255.
  • 128. Eldridge S, Ashby D, Bennett C, Wakelin M, Feder G. Internal and external validity of cluster randomised trials: systematic review of recent trials. BMJ. 2008;336(7649):876–880. doi: 10.1136/bmj.39517.495764.25.
  • 129. Caille A, Kerry S, Tavernier E, Leyrat C, Eldridge S, Giraudeau B. Timeline cluster: a graphical tool to identify risk of bias in cluster randomised trials. BMJ. 2016;354:i4291. doi: 10.1136/bmj.i4291.
  • 130. Ma J, Thabane L, Kaczorowski J, et al. Comparison of Bayesian and classical methods in the analysis of cluster randomized controlled trials with a binary outcome: the Community Hypertension Assessment Trial (CHAT). BMC Med Res Methodol. 2009;9(1):37. doi: 10.1186/1471-2288-9-37.
  • 131. Grieve R, Nixon R, Thompson SG. Bayesian hierarchical models for cost-effectiveness analyses that use data from cluster randomized trials. Med Decis Making. 2010;30(2):163–175. doi: 10.1177/0272989X09341752.
  • 132. Clark AB, Bachmann MO. Bayesian methods of analysis for cluster randomized trials with count outcome data. Stat Med. 2010;29(2):199–209. doi: 10.1002/sim.3747.
  • 133. Gomes M, Ng ES-W, Grieve R, Nixon R, Carpenter J, Thompson SG. Developing appropriate methods for cost-effectiveness analysis of cluster randomized trials. Med Decis Making. 2012;32(2):350–361. doi: 10.1177/0272989X11418372.
  • 134. Díaz-Ordaz K, Kenward M, Gomes M, Grieve R. Multiple imputation methods for bivariate outcomes in cluster randomised trials. Stat Med. 2016;35(20):3482–3496. doi: 10.1002/sim.6935.
  • 135. Ng ES, Diaz-Ordaz K, Grieve R, Nixon RM, Thompson SG, Carpenter JR. Multilevel models for cost-effectiveness analyses that use cluster randomised trial data: an approach to model choice. Stat Methods Med Res. 2013;25(5):2036–2052. doi: 10.1177/0962280213511719.
  • 136. Díaz-Ordaz K, Kenward MG, Grieve R. Handling missing values in cost effectiveness analyses that use data from cluster randomized trials. J R Stat Soc Ser A Stat Soc. 2014;177(2):457–474.
  • 137. Hox JJ, Moerbeek M, Kluytmans A, van de Schoot R. Analyzing indirect effects in cluster randomized trials. The effect of estimation method, number of groups and group sizes on accuracy and power. Front Psychol. 2014;5:78. doi: 10.3389/fpsyg.2014.00078.
  • 138. MacKinnon DP, Fairchild AJ, Fritz MS. Mediation analysis. Annu Rev Psychol. 2007;58:593–614. doi: 10.1146/annurev.psych.58.110405.085542.
  • 139. VanderWeele TJ, Hong G, Jones SM, Brown JL. Mediation and spillover effects in group-randomized trials: a case study of the 4Rs educational intervention. J Am Stat Assoc. 2013;108(502):469–482. doi: 10.1080/01621459.2013.779832.
  • 140. VanderWeele TJ. A unification of mediation and interaction: a 4-way decomposition. Epidemiology. 2014;25(5):749–761. doi: 10.1097/EDE.0000000000000121.
  • 141. Robins JM. Marginal structural models versus structural nested models as tools for causal inference. In: Halloran ME, Berry DA, editors. Statistical models in epidemiology, the environment and clinical trials. New York: Springer; 1999. pp. 95–134.
  • 142. Robins JM, Rotnitzky A, Zhao LP. Estimation of regression coefficients when some regressors are not always observed. J Am Stat Assoc. 1994;89(427):846–866.
  • 143. de Hoop E, Teerenstra S, van Gaal BG, Moerbeek M, Borm GF. The "best balance" allocation led to optimal balance in cluster-controlled trials. J Clin Epidemiol. 2012;65(2):132–137. doi: 10.1016/j.jclinepi.2011.05.006.
  • 144. Moulton LH. Covariate-based constrained randomization of group-randomized trials. Clin Trials. 2004;1(3):297–305. doi: 10.1191/1740774504cn024oa.
  • 145. Vansteelandt S, Joffe M. Structural nested models and g-estimation: the partially realized promise. Stat Sci. 2014;29(4):707–731.
  • 146. Rubin DB. Inference and missing data. Biometrika. 1976;63(3):581–592.
  • 147. Staples PC, Ogburn EL, Onnela J-P. Incorporating contact network structure in cluster randomized trials. Sci Rep. 2015;5:17581. doi: 10.1038/srep17581.
