ABSTRACT
Stepped‐wedge cluster randomized trials (SW‐CRTs) are one‐way crossover trials that randomize clusters (i.e., groups) of individuals to the time point (period) at which an intervention is introduced into the cluster. In these designs, the intervention under evaluation is introduced into all of the clusters by the end of the study in a series of “steps.” Analysis of SW‐CRTs using marginal models provides a population‐averaged interpretation of the estimated intervention effect and flexible specification of the within‐cluster, marginal pairwise association structure; the latter has practical application in reporting intraclass (i.e., pairwise) correlations and calculating power for CRTs. Despite these features, use of marginal modeling of SW‐CRTs has been mostly limited to applications with working independence and simple exchangeable correlation structures that are suboptimal for multi‐period CRTs when correlation among responses decays over time. However, there have been many methodological developments in marginal modeling of SW‐CRTs over the past fifteen years, particularly on (i) multi‐parameter, within‐cluster correlation structures; (ii) paired generalized estimating equations (GEE) for simultaneous estimation of mean and correlation parameters with standard errors; and, when the number of clusters is small, (iii) corrections to reduce the bias of variance estimators, and that of correlation estimates using matrix‐adjusted estimating equations (MAEE). The goal of the current tutorial is to survey these newer developments and to provide case studies to enable applied researchers to implement GEE/MAEE for marginal model analysis of SW‐CRTs, with application to both cohorts and designs with repeated cross‐sectional samples. The methods are also applicable to multi‐period, parallel‐arm and cluster‐crossover CRTs.
Keywords: crossover CRT, exponential decay correlation structure, generalized estimating equations (GEE), intracluster correlation coefficient, matrix‐adjusted estimating equations (MAEE), multi‐period CRT, nested exchangeable correlation structure, small‐sample corrections
1. Introduction
Stepped‐wedge cluster randomized trials (SW‐CRTs) are one‐way crossover trials that randomize clusters (i.e., groups) of individuals to one of several time points (periods) at which an intervention is introduced into the cluster and such that the intervention under evaluation is introduced into all of the clusters by the end of the study in a series of “steps” [1, 2]. The SW‐CRT design is the design of choice when cluster randomization is necessary [3, 4, 5] (e.g., because the intervention has one or more components defined at the cluster‐level) and when there is a strong rationale to deliver the intervention to all clusters by the end of the study, and to do so in a staggered (i.e., stepped) manner [6, 7, 8]. For example, in the case that an intervention is to be rolled out to all hospitals within a health system as part of routine care, the SW‐CRT design offers a rigorous randomized evaluation of that intervention. Despite the advantages of the stepped‐wedge design, it is well‐known that unless time effects are correctly adjusted for in analysis, estimates of intervention effects can be biased, in contrast to other CRT designs such as the standard parallel‐arm CRT [7, 8].
SW‐CRT designs can be classified according to the sampling strategy adopted. This classification includes the cross‐sectional design for which different individuals provide data at each time period, and the cohort design, in which the same individuals are followed over time, contributing repeated measures at the rate of one (or none) per period [8]. Whatever the chosen sampling strategy, an additional important feature is whether or not the SW‐CRT is “complete” in the sense that all clusters are measured in all periods; an example of a complete design appears in Figure 1, with 40 clusters, 8 in each of 5 sequences, and with 5 steps [9]. In practice, many SW‐CRTs have incomplete data structures either by design or by the nature of the data available in the conduct of the trial. For example, in the latter case, even if the intention in Figure 1 was to enroll participants in every cluster‐period, it may not be possible to do so in practice or once the study is in the field. Reasons for selecting an incomplete design are varied. For example, in implementation science, logistical, resource and patient‐centered considerations may require the intentional use of incomplete SW‐CRT designs. Specific considerations often include the need for the staggered start of data collection across clusters and/or an implementation phase to roll out the intervention preceding follow‐up data collection in the intervention condition [9]. In other trials, an incomplete SW‐CRT design may be chosen when resource constraints place a cap on the total number of cluster‐periods and/or study participants, leaving possibly many potential incomplete and complete designs for consideration. Finally, a maintenance phase is sometimes defined following an active intervention phase of a pre‐specified duration to evaluate the enduring impact of the intervention condition, relative to control, after its active support has ceased. 
This part of the study design may be incomplete because there may be a fixed number of periods of observation under the maintenance condition for each sequence. For example, if the design in Figure 1 were adjusted such that all sequences had five periods of observation under active intervention, and all had two additional periods of observation under a maintenance condition, there would be periods with no data collection at the end of each of sequences 1–4.
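The hypothetical adjustment just described can be laid out programmatically. The following is a minimal Python sketch, not part of the original design work. Because Figure 1 is not reproduced here, it assumes that sequence s of the 5‐sequence design is observed under control for s periods before the intervention starts (i.e., one shared baseline period followed by 5 steps). The function name and the cell codes (C = control, A = active intervention, M = maintenance, "." = no data collection) are illustrative choices.

```python
def stepped_wedge_with_maintenance(n_seq=5, n_active=5, n_maint=2):
    """Return one string per sequence; each character is one period.

    Assumption (not taken from Figure 1 itself): sequence s has s control
    periods, then n_active active-intervention periods, then n_maint
    maintenance periods, and no data collection afterwards.
    """
    # The last sequence determines the study length.
    n_periods = n_seq + n_active + n_maint
    rows = []
    for s in range(1, n_seq + 1):
        cells = []
        for t in range(1, n_periods + 1):
            if t <= s:
                cells.append("C")
            elif t <= s + n_active:
                cells.append("A")
            elif t <= s + n_active + n_maint:
                cells.append("M")
            else:
                cells.append(".")
        rows.append("".join(cells))
    return rows


design = stepped_wedge_with_maintenance()
for s, row in enumerate(design, start=1):
    print(f"sequence {s}: {row}")
```

Under these assumptions the adjusted design spans 12 periods, and sequences 1 to 4 end with 4, 3, 2 and 1 periods without data collection, respectively, whereas sequence 5 is observed in every period.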
FIGURE 1.

Example of a complete SW‐CRT with 40 clusters and 5 sequences.
Design and analysis of specific SW‐CRTs must accommodate key features of the general design, including pairwise correlations (i.e., clustering) of individuals in the same cluster, possible changes in pairwise correlations as a function of time lag (with differences for cross‐sectional and cohort designs), and the potential confounding effect of time due to the staggered rollout of the intervention, whereby the control condition is measured, on average, earlier in calendar time than the intervention condition. To accommodate correlated participant‐level outcomes, SW‐CRT analysis can be performed within either a marginal modeling or a conditional modeling (random‐effects) framework (sometimes referred to as a cluster‐specific modeling approach), typically via generalized estimating equations (GEE) and generalized linear mixed models (GLMM), respectively. Most methodological research on SW‐CRTs has focused on development of methods for analysis of continuous outcomes within the GLMM framework, with some extensions to binary and count outcomes, primarily for complete designs [10, 11].
Despite a strong focus on conditional models in the existing SW‐CRT literature, there are several important caveats to the use of the GLMM approach. First, for most conditional models, the interpretation of the intervention effect changes according to different specifications of the GLMM's latent random‐effects structure, for example, whether a random time‐slope is included in addition to the usual random intercept. Second, while GLMMs are flexible in terms of accounting for the correlation between observations within clusters via random effects, with few exceptions, they do not directly describe the pattern and magnitude of the intracluster correlation coefficient (ICC), on the natural measurement scale of the outcomes. This is partially because exact expressions for the marginal mean and the marginal within‐cluster correlation structure for CRTs and SW‐CRTs are generally lacking—that is, are not of a closed analytic form—for non‐identity link GLMMs [12, 13, 14]. Yet, providing estimates and confidence intervals for ICC parameters is valuable, as highlighted by the CONSORT statement on reporting of SW‐CRTs, so as to facilitate uniform interpretation and the design of future studies [15].
In contrast, there are several benefits to the use of the marginal modeling framework for both design and analysis of SW‐CRTs. First, marginal models carry a population‐average interpretation such that the intervention effect contrasts the average response across the subsets of the population defined by the treated cluster‐periods with the corresponding average across the control cluster‐periods [14, 16]. Moreover, because models for the mean and correlation structures are separately specified, the interpretation of the marginal mean regression parameters remains the same regardless of the working correlation model specification. The link function is chosen to obtain inference on the target parameter of choice; for binary outcomes, for example, this could be the odds ratio via the logit link or the risk ratio via the log link. A second advantage in using GEE for SW‐CRTs is that the estimation of mean model parameters is robust to misspecification of the ICC structure in large samples [17] and, when implemented with bias‐corrected variance estimators [18, 19], in smaller samples too, including SW‐CRTs with a small number of clusters, as shown in a simulation study evaluating empirical power for SW‐CRTs [20]. A third advantage is that when GEE is paired with a second set of estimating equations for the correlation parameters using an approach known as matrix‐adjusted estimating equations (MAEE) [19], the marginal modeling approach aligns with the CONSORT recommendation [15] to provide bias‐corrected estimates and confidence intervals for ICC parameters on the natural scale of the outcome, as these can be obtained directly from the estimation procedure.
Despite the benefits of marginal modeling and GEE, there are several caveats of note. First, GEE procedures may fail to converge to a solution for binary responses when natural bounds on pairwise within‐cluster correlations in non‐independence working correlation matrices are violated [21, 22]. Second, although both the GLMM and GEE approaches are valid when data follow a covariate‐dependent missing data pattern and the covariates are included in the model, unlike the GLMM approach, the GEE approach is not valid under general patterns of data that are missing at random [23]. (We note that this is particularly relevant to cohort SW‐CRTs for which general MAR patterns are more likely to arise and for which methods such as weighted GEE can help to remove bias from estimated intervention effects [24, 25]). Third, as noted above, although the GEE approach is robust to misspecification of the ICC structure, use of a working correlation matrix that differs from the true correlation matrix results in loss of efficiency.
The advantages of marginal models have motivated recent developments in GEE methods for SW‐CRTs. For example, recent methodological developments for planning stepped‐wedge trials include a non‐simulation, fast power calculation method for continuous, binary and count outcomes in complete and incomplete designs [26] based on a general power procedure for marginal models [27]. These methods, based on Wald tests for GEE analysis, encompass a variety of link functions and within‐cluster correlation structures and reduce to closed‐form sample size formula in some balanced designs [20, 28]. Fast GEE power has been implemented for stepped‐wedge designs in multiple software platforms [29, 30] with the SAS macro CRTFASTGEEPWR [31] being the most flexible in its inclusion of incomplete designs; a comprehensive software review is provided elsewhere [32]. Developments for marginal model‐based study design also include a generalized information criterion for use at the design phase to identify periods and cluster‐sequences that most contribute to intervention effect estimation [33, 34], and locally optimal and maximin optimal stepped‐wedge designs given a budget constraint [35]. To complement this recent work on study design tools, recent methodological developments for marginal model analysis of complete and incomplete SW‐CRTs include computationally‐efficient marginal modeling via cluster‐period summary statistics [36], the study of implications of unequal cluster size [37], and regression diagnostics to identify clusters and cluster‐periods that are most influential in analysis [38]. Moreover, to facilitate implementation of the marginal modeling approach for the analysis of SW‐CRTs, a series of software packages and code is available (see details provided with the Case Studies in Section 4 below).
Despite the recent developments in marginal models for SW‐CRTs, only a limited number of published SW‐CRTs have considered GEE methods for analysis, likely due to the inaccessibility of more advanced GEE methods. For instance, Barker et al. [39] reviewed 102 SW‐CRTs published no later than 2015 and found only 13 trials considered GEE in analysis of the primary outcomes. Nevins et al. [40] provided a more recent review of 160 published SW‐CRTs from 2016 to 2022, and they found that only 12 studies (7.5%) adopted GEE for their primary analyses. In addition, few studies have considered a more complex correlation structure beyond simple exchangeable, and few studies have adopted small‐sample adjustments appropriate for a limited number of clusters. In light of the gaps between methodology development and practice, the current tutorial focuses on, and provides examples and tools to analyze SW‐CRTs in the situation where the researcher has chosen to use the marginal modeling framework. In particular, we emphasize and illustrate analysis via the paired GEE/MAEE (matrix‐adjusted estimating equations) approach, an extension of “traditional” GEE. The paired GEE/MAEE approach specifies two sets of estimating equations (the mean model and the correlation model) and provides several additional benefits over “traditional” GEE, including correlation parameter estimates and their standard errors with reduced bias.
The current tutorial is comprehensive in its coverage of issues in the analysis of SW‐CRTs as it includes diversity across outcome types, mean model specification—including both time and intervention effects encoded in the mean model—and correlation model specification for both cross‐sectional and cohort designs. In particular, we include two examples of cross‐sectional designs as these are most common in practice [39, 41], together with an example of a cohort design [42]. We also touch on modes of design “incompleteness,” including periods (phases) of no enrollment, of intervention implementation, and of maintenance and cover small‐sample corrections to standard errors (SEs) of both mean and correlation parameters. Given the space needed to provide the appropriate level of detail for analysis, readers are referred to other sources for study design [20, 26, 31, 35]. Our paper is targeted to applied (bio)statisticians working in this area, and we have included references to support or expand on needed theory.
The manuscript is organized as follows: Section 2 describes three motivating data sets that we will use to illustrate the analysis methods and choices, Section 3 introduces the paired GEE/MAEE approach for analysis, Section 4 provides case studies of complete and incomplete SW‐CRTs, and Section 5 provides reflections on the examples presented in the current tutorial together with a summary of key notions not covered in the current tutorial. Throughout, our goal is to highlight the many benefits of the marginal modeling framework, how the approach can facilitate valid and meaningful estimates about intervention effects on participant‐level outcomes and, when the researcher chooses the marginal framework, how to implement the approach.
2. Three Motivating SW‐CRTs
We illustrate marginal modeling methods for SW‐CRTs via data from two cross‐sectional SW‐CRTs in rehabilitation care and primary care, respectively, and a cohort SW‐CRT in a public health setting: The Connect‐Home, Heart Health Now and Crowdsourced HIV Testing studies.
2.1. Connect‐Home
The Connect‐Home study, conducted from 2019 to 2021, used an incomplete SW‐CRT design of a form that is also referred to as a staircase design [43]. The goal of the study was to test an intervention to improve outcomes for rehabilitation patients transitioning to home from six skilled nursing facilities (SNFs) [44]. Patients were followed for primary endpoints for sixty days following discharge from the SNF. An incomplete SW‐CRT design was chosen to balance considerations of internal validity and power under the restrictions imposed by available resources and logistical considerations (Figure 2). In particular, although all clusters were enrolled and randomized at the same time at the start of the study, a staggered start of study activities across SNFs was used due to limited research staff, and no data were collected during the implementation phase of the intervention condition (green boxes, with planned duration of two periods, i.e., two months) in order to allow for training of SNF staff on the intervention. The black and green boxes represent cluster‐periods where no patients were enrolled, and thus no data collected, yielding an incomplete design. Data were collected during (grey) control periods and (blue) intervention periods. In reality, the COVID‐19 pandemic forced additional design changes (summarized elsewhere) [45]. Given that these forced changes considerably altered the planned study design and that participants did not provide consent for data sharing, the current tutorial uses a simulated data set based on some key features of the real data set (i.e., mean baseline outcome levels, cluster‐period sizes and other related features) but assuming that the data were collected according to the originally planned schedule. Specific details of the simulation scheme and code to generate the simulated data set can be found in the Supporting Information and via https://github.com/XueqiWang/SW‐CRT_tutorial.
FIGURE 2.

Connect‐Home cross‐sectional SW‐CRT designed to be conducted over 22 periods (months) in 6 skilled nursing facilities (SNFs). Cell counts represent the cluster‐period sizes of the simulated data set used for analysis in Section 4.1. Cell counts were randomly generated to emulate the variation that would arise in a planned study in the course of real‐world implementation.
2.2. Heart Health Now
The Heart Health Now (HHN) study, conducted from 2015 to 2018, was designed as a complete, stratified SW‐CRT. The goal of the study was to evaluate the effect of primary care practice support on evidence‐based cardiovascular disease (CVD) prevention, organizational change, and patient outcomes obtained via electronic health records (EHR) in 219 practices [46]. Practices were randomized to receive the intervention within two strata defined by high (the first three sequences, Figure 3) or low (last three sequences) “readiness for change.” Because of the logistical challenges of initiating EHR‐based data collection in a large number of practices at the same time, the HHN study was designed so that baseline control EHR patient visit data were collected retrospectively for one to five periods (grey boxes) for each practice prior to intervention roll‐out. Note, however, that staggered initiation of retrospective data collection would not necessarily result in incomplete data, since retrospective EHR data collection can potentially capture all data for a practice back to a fixed date. Although this study technically involves an underlying open cohort of patients, no patient identifiers were available due to data regulations, and therefore data cannot be linked over time. Instead, the data were analyzed as aggregated cross‐sectional data in the published manuscript [46], and this is the approach we must adopt here.
FIGURE 3.

Heart Health Now SW‐CRT conducted over 11 periods (quarters) in 219 practices, of which 217 practices provided data. Data in each sequence‐period cell are number of practices that provided “screened for smoking” data and total number of patients included in the denominator, that is, the sequence‐period size for the HHN analysis in Section 4.2.
In contrast to the Connect‐Home study, the HHN study was designed to be complete but, as indicated in Figure 3, the data structure was incomplete because data were not available in all periods for all clusters, owing to challenges with technical integration with EHRs. This can be seen in sequence 2, for example, in which data were recorded from fewer than 27 clusters in periods 6–11. After 12 months (four three‐month periods) in the active intervention phase (light blue boxes), each practice entered a more passive maintenance phase (blue boxes). Several patient‐centered outcomes were obtained, including a binomial (numerator/denominator) measure of the proportion of patients who were screened for smoking in the past twelve months including the given quarter. This definition implies approximately 75% overlap of patients from adjacent quarters (i.e., a time lag of 1), 50% overlap for quarters with a time lag of 2, and 25% overlap for quarters with a time lag of 3 (e.g., the second and fifth quarters). Additionally, there could be some overlap among patients from quarters over a year apart arising from different clinic visits that could not be linked. Therefore, consistent with other SW‐CRTs that use repeated cross‐sectional samples that may have overlap, we consider this example as a cross‐sectional design in the current tutorial. To illustrate analysis of a real data set from a SW‐CRT, the aggregated HHN study data are made available via https://github.com/XueqiWang/SW‐CRT_tutorial [47]. Furthermore, we note that, in reality, delays were experienced in the timing of the transition from control to intervention and from intervention to maintenance periods in some practices. As such, some practices transitioned according to a schedule that differed from the intended one shown in Figure 3. For the purposes of the current tutorial, we analyze the data under an intention‐to‐treat framework, according to the intended schedule, recognizing that this may lead to attenuation of estimated intervention effects.
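The approximate overlap percentages quoted for the smoking‐screening measure follow from its rolling twelve‐month (four‐quarter) look‐back window. A minimal sketch of that arithmetic is given below. It assumes that the window is counted in whole quarters and that patients are spread evenly across the window, which is a simplification of the real data, and the function name is hypothetical.

```python
def window_overlap(lag, window=4):
    """Fraction of a rolling `window`-quarter look-back shared by two
    quarters that are `lag` quarters apart (0 when they do not overlap)."""
    return max(0, window - abs(lag)) / window


for lag in range(5):
    print(f"lag {lag}: overlap fraction {window_overlap(lag):.2f}")
```

Under this simplification, the shared fraction decreases linearly with the time lag and reaches zero once two quarters are a full year apart, which is one reason why correlation structures that decay with time lag are of interest for these data.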
2.3. Crowdsourced HIV Testing
The Crowdsourced HIV Testing Study, conducted from 2016 to 2017 in China, sought to evaluate the effectiveness of a crowdsourced HIV testing intervention to reduce transmission between men who have sex with men (MSM) [42]. The study followed a closed‐cohort stepped‐wedge design of 1381 participants across eight clusters (cities). In closed‐cohort designs, participants are enrolled only at the beginning of the trial whereas enrollment in open‐cohort designs extends beyond the first period and may continue throughout the trial [48]. Randomization of clusters to treatment sequences was stratified by province with four clusters in each of two provinces of China (Guangdong and Shandong). Unlike in Figures 2 and 3 where all clusters are initially in the control condition, some clusters (Guangzhou and Yantai) are in the intervention condition in the first period as shown in Figure 4. The “active” intervention was delivered in one period followed by zero to three post‐intervention or “maintenance” periods. The primary outcome of interest was the participant‐level binary outcome of HIV testing (either self‐testing or from a clinic) versus no testing in the prior three months, with four follow‐up periods each of three‐months' duration. Given that MSM participants were eligible only if they had not undertaken HIV testing in the prior three months, there is no participant‐level outcome “baseline” measure (i.e., prior to Period 1). A complete design was used whereby all clusters (and all participants within those clusters) were invited to participate at all four follow‐up timepoints. In reality, as is common with cohort designs, there was participant‐level missing data at each follow‐up time point. Using the data published as an open‐access supplement to the main manuscript, and in consultation with a study co‐author (Lu Haidong), we derived the primary outcome for each of the four follow‐up time points. 
As in the published manuscript, we focus on analysis of 1219 participants who have the primary outcome available in at least one of the four follow‐up time periods.
FIGURE 4.

Crowdsourced HIV Testing SW‐CRT conducted over 4 periods (quarters) in 8 cities in China. Data in each cluster‐period cell are number of participants that provided “screened for HIV” data and the percentage of enrolled participants out of the total cluster enrollment numbers provided in the first column of data.
2.4. Data Set Summary
The Connect‐Home data set, specifically the simulated data set, will serve as the leading demonstration platform for three key reasons: (1) it enables us to illustrate the marginal modeling approach for individual‐level patient data in a SW‐CRT with a continuous outcome in a repeated cross‐sectional study; (2) with just six clusters, it clearly demonstrates the need, as is quite common, for small‐sample corrections to standard errors (SEs); and (3) it is modeled on the data set used to illustrate the simulation‐free, fast GEE power methodology developed by Zhang et al. [26], and therefore facilitates a link between the current tutorial on methods for analysis and methods for study design. The Heart Health Now study will be used to complement the marginal modeling features demonstrated via the Connect‐Home study. In particular, it will be used to illustrate (1) analysis of a binary outcome; (2) the analysis of aggregated outcome data through use of a cluster‐period mean (i.e., cluster‐period proportion) marginal modeling approach, because the data set is so large that analysis of individual‐level data poses computational challenges; and (3) the modeling of a maintenance intervention phase. Finally, the HIV Testing Study complements the two previous cross‐sectional designs with a cohort design for a binary outcome. Whereas the marginal mean model specification, including that for the treatment effect, is similar to that for the cross‐sectional designs, considerations for the correlation model are different, and this example serves to elaborate on and illustrate those issues.
3. Marginal Models and Statistical Inference
Marginal modeling is typically undertaken via the generalized estimating equations (GEE) approach of Liang and Zeger, whereby parameters of a mean model are estimated using the GEE, and correlation parameters are estimated using the method of moments [12, 17]. Here we consider an extension of this approach in two key ways. The first extension we consider is the addition of a second set of matrix‐adjusted estimating equations used to estimate parameters of a correlation model (see details below) [19]. The benefit of pairing MAEE with GEE is that the pairwise‐correlation structure is itself directly estimated via a set of estimating equations, usually with a non‐identity working covariance structure, rather than via the method of moments as in traditional GEE. This approach also provides estimates of standard errors of the pairwise correlation parameters and, as such, supports the reporting of ICC estimates with quantification of uncertainty [28]. ICC reporting facilitates the planning of future SW‐CRTs and is recommended by the CONSORT statement on reporting of SW‐CRTs [15]. The paired GEE/MAEE approach is inspired by methodology introduced by Prentice [49], whose approach was extended by Sharples and Breslow to apply a scalar‐adjustment correction within the estimating equations to correct for the small‐sample bias expected in the correlation parameter estimates [50], and then further extended by Preisser et al. in GEE/MAEE to provide more effective bias correction to the correlation estimates, with application to parallel‐arm CRTs [19]. The second extension we consider is to employ small‐sample corrections to the empirical (so‐called robust, or sandwich) variance estimators for both the mean and the correlation model parameters. Prior work has shown that the combination of MAEE estimation for the correlation parameters, along with small‐sample corrections to the empirical standard errors, yields better statistical performance than conventional GEE methods when the goal is testing and estimating both intervention and correlation parameters in parallel‐arm [19, 51], multi‐level [52, 53], crossover [54], and stepped‐wedge CRTs [20, 26, 36, 55].
3.1. Model Specification
3.1.1. Marginal Mean Model
We first consider a SW‐CRT (either cross‐sectional or closed‐cohort) with $I$ clusters, $S$ sequences, and a total of $T$ periods in which data are collected from at least one of the clusters, and such that $I_s$ clusters are allocated to the $s$th sequence, so that $\sum_{s=1}^{S} I_s = I$. To flexibly accommodate complete and incomplete designs, as well as those with implementation periods such as the Connect‐Home study (Figure 2) or maintenance periods such as the HHN study (Figure 3), we use $J_i$ to denote the number of periods in which the $i$th cluster has data collected, such that $J_i \le T$, and use $n_{ij}$ to denote the cluster‐period sample size, specifically the number of observations in the $j$th period of observation of the $i$th cluster, so that $n_i = \sum_{j=1}^{J_i} n_{ij}$ is the total number of observations in the $i$th cluster, and we use $N = \sum_{i=1}^{I} n_i$ to denote the total number of observations across all clusters. In a cross‐sectional design, $n_i$ will also represent the number of participants in the $i$th cluster and $N$ the total number of participants (equivalently observations) across all clusters. In a closed‐cohort design, if there is no drop‐out then every participant is observed in every period in which the $i$th cluster is represented, so that the cluster‐period size $n_{ij}$ is the same for all $j$ and equal to the number of participants in the $i$th cluster, $n_i / J_i$, and the total number of participants is $\sum_{i=1}^{I} n_i / J_i$. We note that the software described below in Section 3.4 allows for unequal cluster‐period sizes for cohort designs in general.
Putting the notation in context using the planned Connect‐Home study as an example, we note that this cross‐sectional SW‐CRT was designed (Figure 2) such that data would be collected from 15 periods (out of a planned total of $T = 22$) for each of the 6 clusters (SNFs), and that those periods differed for each SNF. Table 1 provides details of the notation for each cluster (and therefore, for each sequence, given that there is a single cluster in each sequence). As a specific example, the intention for the 3rd SNF was for data collection in study periods $\{3, \ldots, 9, 12, \ldots, 19\}$, corresponding to timepoints that would be indexed by $j \in \{1, \ldots, 7, 8, \ldots, 15\}$ (where we highlight the point at which there was a planned gap in data collection by explicitly denoting the set with the point at which the split occurs, namely between $j = 7$ and $j = 8$). Importantly, notice that because the intention was to collect data from all 6 clusters in exactly 15 out of the total of 22 periods (Figure 2), the timepoints of observation for each of the 6 clusters have the same set of indices, namely $j \in \{1, \ldots, 15\}$, but the intended study periods of observation were different for each cluster.
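TABLE 1.
Model notation for the Connect‐Home study.
| Sequence | Timepoint indices, $j$ | Calendar periods | First control period | Last control period | Number of control periods | First intervention period | Last intervention period | Number of intervention periods | Total periods |
|---|---|---|---|---|---|---|---|---|---|
| 1 | $\{1,\ldots,5,\ 6,\ldots,15\}$ | $\{1,\ldots,5,\ 8,\ldots,17\}$ | 1 | 5 | 5 | 8 | 17 | 10 | 15 |
| 2 | $\{1,\ldots,6,\ 7,\ldots,15\}$ | $\{2,\ldots,7,\ 10,\ldots,18\}$ | 2 | 7 | 6 | 10 | 18 | 9 | 15 |
| 3 | $\{1,\ldots,7,\ 8,\ldots,15\}$ | $\{3,\ldots,9,\ 12,\ldots,19\}$ | 3 | 9 | 7 | 12 | 19 | 8 | 15 |
| 4 | $\{1,\ldots,8,\ 9,\ldots,15\}$ | $\{4,\ldots,11,\ 14,\ldots,20\}$ | 4 | 11 | 8 | 14 | 20 | 7 | 15 |
| 5 | $\{1,\ldots,9,\ 10,\ldots,15\}$ | $\{5,\ldots,13,\ 16,\ldots,21\}$ | 5 | 13 | 9 | 16 | 21 | 6 | 15 |
| 6 | $\{1,\ldots,10,\ 11,\ldots,15\}$ | $\{6,\ldots,15,\ 18,\ldots,22\}$ | 6 | 15 | 10 | 18 | 22 | 5 | 15 |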
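To make the distinction between timepoint indices and calendar periods concrete, the following minimal Python sketch maps each timepoint index of a sequence to its calendar period and treatment condition. The first and last control and intervention periods are transcribed from the numeric rows of Table 1 (the planned Connect‐Home schedule). The dictionary and function names are hypothetical and are not taken from the tutorial's software.

```python
# (first control, last control, first intervention, last intervention)
# calendar periods per sequence, transcribed from Table 1.
SCHEDULE = {
    1: (1, 5, 8, 17),
    2: (2, 7, 10, 18),
    3: (3, 9, 12, 19),
    4: (4, 11, 14, 20),
    5: (5, 13, 16, 21),
    6: (6, 15, 18, 22),
}


def timepoint_map(seq):
    """Map timepoint index (1, 2, ...) to (calendar period, treated flag).

    Periods in the planned gap (the implementation phase) are skipped,
    so the indices are consecutive even though calendar periods are not.
    """
    c_first, c_last, x_first, x_last = SCHEDULE[seq]
    cells = [(t, 0) for t in range(c_first, c_last + 1)]
    cells += [(t, 1) for t in range(x_first, x_last + 1)]
    return {j: cell for j, cell in enumerate(cells, start=1)}


m3 = timepoint_map(3)
print(len(m3), m3[7], m3[8])  # the gap falls between indices 7 and 8
```

For the 3rd SNF, index 7 is the last control period (calendar period 9) and index 8 is the first intervention period (calendar period 12), and every sequence has exactly 15 timepoint indices.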
For a cross‐sectional design, let $Y_{ijk}$ and $\mu_{ijk}$ denote the outcome and corresponding marginal mean response, respectively, for individual $k$ in cluster $i$ at the $j$th timepoint ($i = 1, \dots, I$; $j = 1, \dots, J_i$; $k = 1, \dots, n_{ij}$). Similarly, in a closed‐cohort design, let $Y_{ijl}$ and $\mu_{ijl}$ denote the outcome and mean for cohort member (participant) $l$ in cluster $i$ at the $j$th timepoint ($l = 1, \dots, n_i$). The marginal mean model is specified via a generalized linear model

$$g(\mu_{ijk}) = \mathbf{x}_{ij}^{\top}\boldsymbol{\theta} \tag{1}$$

where $g(\cdot)$ is a link function, $\mathbf{x}_{ij}$ is a vector of covariates and $\boldsymbol{\theta}$ is a vector of regression parameters. We note that the “systematic” component of the marginal model (i.e., the vector of covariates) generally takes the form of (time effect) + (intervention effect). We do not consider individual‐level covariates in this article, but if we did, an additional subscript $k$ (for cross‐sectional designs) or $l$ (for closed‐cohort) could be added to the covariate vector.
For example, the categorical time, marginal mean model with intervention effect can be specified as follows:
$$g(\mu_{ijk}) = \beta_{t_{ij}} + X_{ij}\,\delta \tag{2}$$

(or similarly for $\mu_{ijl}$), where $t_{ij}$ represents the calendar period corresponding to the $j$th design timepoint for the $i$th cluster (e.g., $t_{31} = 3$ and $t_{38} = 12$ for the 3rd SNF in the Connect‐Home study), $\beta_t$ is the $t$th period effect, $X_{ij}$ is the treatment condition (e.g., an indicator variable for intervention), and $\delta$ is the time‐adjusted marginal intervention effect on the link function scale. We note that in this model, each time period has its own distinct intercept parameter. In this example, $\boldsymbol{\theta} = (\beta_1, \dots, \beta_T, \delta)^{\top}$ is a $(T+1) \times 1$ vector and the covariate vector $\mathbf{x}_{ij}$ of Equation (1) is a $(T+1) \times 1$ vector of zeros except for two terms that can be non‐zero, specifically, 1 as the $t_{ij}$th element, corresponding to the $t_{ij}$th period intercept, and the $(T+1)$st element $X_{ij}$, the intervention indicator. In addition to the marginal mean model, the marginal variance of $Y_{ijk}$ ($Y_{ijl}$) is a function of the marginal mean and is denoted for cross‐sectional designs by $\phi\, v(\mu_{ijk})$, where $\phi$ is the dispersion parameter and $v(\cdot)$ is the variance function, with analogous specification for closed‐cohort designs. Common forms of link function of marginal mean model (1) include identity for continuous outcomes and logit for binary outcomes. Common variance functions and dispersion parameters are $v(\mu) = 1$ and the constant variance $\phi = \sigma^2$, respectively, for continuous outcomes, and $v(\mu) = \mu(1 - \mu)$ and $\phi = 1$, respectively, for binary outcomes. Risk differences and risk ratios can be modeled directly via identity and log link functions, respectively; however, use of the logit link function for effects with odds ratio interpretations is generally less prone to computational problems when modeled with the aforementioned Bernoulli variance function.
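To make the categorical‐time mean model concrete, the following sketch builds the covariate vector for one cluster‐period and evaluates the marginal mean and Bernoulli variance under a chosen link. The function names and the parameter values are hypothetical and chosen only for illustration; they are not estimates from any of the case studies.

```python
import math

def design_vector(t_ij, x_ij, T):
    """Vector of length T + 1: 1 in position t_ij (period intercept), X_ij in the last slot."""
    x = [0.0] * (T + 1)
    x[t_ij - 1] = 1.0          # calendar periods are numbered from 1
    x[T] = float(x_ij)         # intervention indicator
    return x

def marginal_mean(x, theta, link="logit"):
    """Invert the link function: mu = g^{-1}(x' theta)."""
    eta = sum(a * b for a, b in zip(x, theta))
    if link == "identity":
        return eta
    if link == "log":
        return math.exp(eta)
    if link == "logit":
        return 1.0 / (1.0 + math.exp(-eta))
    raise ValueError(f"unknown link: {link}")

def bernoulli_variance(mu, phi=1.0):
    """Marginal variance phi * mu * (1 - mu) for a binary outcome."""
    return phi * mu * (1.0 - mu)

T = 22
theta = [0.0] * T + [math.log(2.0)]   # hypothetical: flat period effects, odds ratio of 2
x = design_vector(t_ij=12, x_ij=1, T=T)
mu = marginal_mean(x, theta)
print(round(mu, 4), round(bernoulli_variance(mu), 4))
```

With flat period effects and an odds ratio of 2, the marginal probability under intervention is 2/3, which shows how the intervention effect is read on the link scale and then mapped back to the mean.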
A common specification of the marginal mean model (Equation (2)) specifies a constant intervention effect, in which the full treatment effect is assumed to occur in the first period after treatment initiation and thereafter remains constant over time, that is, $X_{ij} = 1$ if the $i$th cluster receives intervention in the $j$th period, and 0 otherwise. Alternative terminology for “constant intervention effect” includes “immediate intervention effect” [56] and, sometimes, “average intervention effect.” We note that, based on work by Kenny et al. [56], the “average intervention effect” terminology can be misleading when the underlying intervention effect varies with time and, as such, it is preferable to avoid the use of “average intervention effect” in this context.
Multiple alternative forms of the marginal mean model (2) can be specified, in particular with adjustments to the specification of the time and/or the intervention effects. Table 2 provides several common specifications, where we note that some specifications may not be appropriate for a given design. For example, when there are few clusters and/or many time points (e.g., in the Connect‐Home study), the data may not be able to suitably accommodate categorical time effects; in such cases, a linear or some other parametric (often smooth) form for the time effect is required. For example, analysis of a SW‐CRT of a decision‐making intervention for breast cancer controlled for time using linear and quadratic terms for wave, in addition to an indicator for a cluster‐period being in the post‐COVID‐19 era [58]. Whereas the extended incremental intervention effect model (Table 2) is applicable when there is a maintenance period, as in the Heart Health Now study (Figure 3), it may also be applicable in other trials with at least three intervention periods, as illustrated below for Connect‐Home (see Section 4.1). Even so, given that, for most trials, the analysis plan, including the model form for the intervention effect, should be pre‐specified, we caution against overly elaborate forms for this model component, erring instead on the side of simplicity for reasons of both interpretation and model fitting. Importantly, we note that the examples of different forms of the mean model given in Table 2 are not specific to the marginal modeling framework per se. Rather, these forms could also be specified within a mixed effects modeling framework with the inclusion of random effects to account for within‐cluster association. Indeed, the constant intervention effect model form with categorical time (Equation (2)) has been described extensively in the methodological literature on mixed effects models [7, 8, 10] and is one of the most commonly used approaches in practice [40].
TABLE 2.
Mean model specifications of the intervention effect and the time effect components.
| Intervention effect | | |
|---|---|---|
| Type | Specification of $X_{ij}$ | Notes |
| Constant | $X_{ij} = 1$ if the $i$th cluster in period $j$ is in intervention; $X_{ij} = 0$ in control periods | Assumes intervention effect attained within first period it is implemented |
| Incremental | $X_{ij} = (j - b_i)/E$ in intervention periods, for $b_i$ control periods before intervention periods in the $i$th cluster and where $E = \max_i (J_i - b_i)$; $X_{ij} = 0$ for control periods | Allows for “linear” increase in intervention effect. Requires at least one cluster with at least two intervention periods |
| Extended incremental | Three piecewise conjoined lines with full treatment effect after $E$ periods: $X_{ij} = 0$ for control periods; $X_{ij} = (j - b_i)/E$, for $0 < j - b_i < E$, for intervention periods; and for maintenance periods, $X_{ij} = 1$ | Allows for multiple periods where the full intervention effect is maintained (having zero slope) after it is attained (by the $E$th intervention period) |
| Other | Context‐dependent | E.g., curvilinear form [57], or general time‐on‐treatment [56] |

| Time effect | | |
|---|---|---|
| Type | Form | Notes |
| Categorical | $\beta_{t_{ij}}$ | Note that $t_{ij}$ corresponds to the $j$th time period in the $i$th cluster |
| Linear | $\beta_0 + \beta_1 t_{ij}$ | $t_{ij}$ as for categorical time |
| Other | Context‐dependent, for example, two disjoint lines | See published Connect‐Home results [45] |
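The three intervention codings in Table 2 can be compared side by side with a short sketch. The scaling used here is an assumption: each coding is normalized to reach 1 when the full effect is attained, in line with the Figure 5 description for Connect‐Home, where the incremental coding reaches the full effect at the 10th intervention period and the extended incremental coding at the 5th. Function names are hypothetical.

```python
# (number of control periods, number of intervention periods) per sequence, from Table 1.
PERIODS = {1: (5, 10), 2: (6, 9), 3: (7, 8), 4: (8, 7), 5: (9, 6), 6: (10, 5)}

def constant(n_ctrl, n_int):
    """Full effect from the first intervention period onward."""
    return [0.0] * n_ctrl + [1.0] * n_int

def incremental(n_ctrl, n_int, full_at):
    """Linear rise that reaches 1 at the `full_at`-th intervention period."""
    return [0.0] * n_ctrl + [e / full_at for e in range(1, n_int + 1)]

def extended_incremental(n_ctrl, n_int, full_at):
    """Linear rise up to `full_at` intervention periods, then held at 1 (maintenance)."""
    return [0.0] * n_ctrl + [min(e / full_at, 1.0) for e in range(1, n_int + 1)]

max_int = max(n_int for _, n_int in PERIODS.values())   # 10 intervention periods
min_int = min(n_int for _, n_int in PERIODS.values())   # 5 intervention periods

for seq, (n_ctrl, n_int) in PERIODS.items():
    inc = incremental(n_ctrl, n_int, max_int)
    ext = extended_incremental(n_ctrl, n_int, min_int)
    print(seq, inc[-1], ext[-1])
```

Under this scaling only the first sequence reaches a coding of 1 under the incremental model, whereas every sequence does under the extended incremental model, which is one reason the choice of form should be pre‐specified.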
3.1.2. Marginal Correlation Model
The paired estimating equations GEE/MAEE methodology accommodates an expansive class of within‐cluster correlation structures for the analysis of clustered outcomes including SW‐CRTs. This flexible class of structures is specified through a generalized linear model for pairwise correlations, where we note that there are many more pairwise correlations than there are outcomes to be modeled in the marginal mean model. In particular, for cluster $i$ there are $N_i(N_i - 1)/2$ pairs to be modeled, where $N_i$ is the total number of observations in the $i$th cluster, as defined above. For cross‐sectional designs, letting $\rho_{ijk,ij'k'} = \mathrm{corr}(Y_{ijk}, Y_{ij'k'})$, where either $j \neq j'$ or $k \neq k'$, and for closed‐cohort designs, letting $\rho_{ijl,ij'l'} = \mathrm{corr}(Y_{ijl}, Y_{ij'l'})$, where $j \neq j'$ if $l = l'$, generalized linear models can be specified as

$$h(\rho_{ijk,ij'k'}) = \mathbf{z}_{ijk,ij'k'}^{\top}\boldsymbol{\alpha} \tag{3}$$

where $h(\cdot)$ is a user‐specified link function, $\mathbf{z}_{ijk,ij'k'}$ (cross‐sectional designs) or $\mathbf{z}_{ijl,ij'l'}$ (cohort) are vectors of covariates related to the pairs of observations, and $\boldsymbol{\alpha}$ is the corresponding set of correlation parameters. For example, in the nested exchangeable cross‐sectional correlation structure, one element of $\mathbf{z}_{ijk,ij'k'}$ would be an indicator (i.e., equal to one) for two observations for which $j = j'$, while another element would indicate $j \neq j'$. For multi‐period CRTs including stepped‐wedge designs, working correlation structures usually consider only cluster‐period level features and, although not in this article, cluster‐level features.
We consider two correlation structures for cross‐sectional SW‐CRTs as extensions of the common exchangeable correlation model: the nested exchangeable (NE) and the exponential decay (ED) correlation structures. These structures specify the form of all pairwise correlations of outcomes within the same cluster, for pairs in the same period ($j = j'$) and for pairs from different periods ($j \neq j'$), as shown in Table 3 (two rows labeled “Cross‐sectional”). Both of these structures are expressed in our formulation as generalized linear models, with the choice of link function affecting correlation structure and interpretation and/or sampling variance estimation. Sometimes, as in the case of exchangeable or nested exchangeable, different link functions correspond to the same correlation structure, but provide different sampling variance estimators according to the link function scale. Specifically, the choice of Fisher's $z$‐transformation link over the identity link for the nested exchangeable structure in Table 3 is motivated by the desire for confidence intervals for ICCs with improved small sample properties. For other correlation structures, a non‐identity link function may be required in order to express the ICC structure in the general linear form of Equation (3). In particular, a log link function linearizes the exponential decay structure [59], $\rho_{ijk,ij'k'} = \exp(\alpha_1)\{\exp(\alpha_2)\}^{d}$ for two observations $d$ periods apart, to allow estimation of its two parameters by GEE/MAEE, where $\exp(\alpha_2)$ is the decay in correlation for measurements one period apart. For any non‐identity link function, back‐transformation of the components of $\boldsymbol{\alpha}$ is required to obtain estimates of the ICC parameters on their natural scale (and by extension, lower and upper confidence bounds for them), in this case, $\exp(\alpha_1)$ and $\exp(\alpha_2)$ (Table 3).
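As a rough illustration of the two cross‐sectional structures, the sketch below builds nested exchangeable and exponential decay correlation matrices for a single cluster. It assumes equally spaced periods with no gaps, so that the lag between two observations is the difference in their period indices, and the correlation values are hypothetical. The exponential decay matrix is deliberately built on the log‐link scale and then back‐transformed.

```python
import numpy as np

def period_labels(n_per_period):
    """Period index of each observation, e.g. [2, 3] -> [0, 0, 1, 1, 1]."""
    return np.repeat(np.arange(len(n_per_period)), n_per_period)

def nested_exchangeable(n_per_period, alpha_within, alpha_between):
    """alpha_within for same-period pairs, alpha_between for different-period pairs."""
    p = period_labels(n_per_period)
    same = p[:, None] == p[None, :]
    R = np.where(same, alpha_within, alpha_between).astype(float)
    np.fill_diagonal(R, 1.0)
    return R

def exponential_decay(n_per_period, icc, decay):
    """Correlation icc * decay**lag, computed as exp(linear predictor) under a log link."""
    p = period_labels(n_per_period)
    lag = np.abs(p[:, None] - p[None, :])
    log_rho = np.log(icc) + np.log(decay) * lag   # intercept + slope * lag
    R = np.exp(log_rho)
    np.fill_diagonal(R, 1.0)
    return R

R_ne = nested_exchangeable([2, 2, 2], alpha_within=0.05, alpha_between=0.02)
R_ed = exponential_decay([2, 2, 2], icc=0.05, decay=0.8)
print(R_ed[0, [1, 2, 4]])   # correlations at lags 0, 1 and 2
```

The log of the within‐period ICC plays the role of the intercept and the log of the decay plays the role of the slope on lag, which is why exponentiating the fitted coefficients recovers the natural‐scale parameters.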
TABLE 3.
Marginal correlation models for pairs of outcomes in the same cluster suitable for multi‐period cluster randomized trials including those with stepped‐wedge designs.
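| Structure | Model forms for pairwise correlations ᵃ | | Intracluster correlations |
|---|---|---|---|
| | Same period ($j = j'$) | Different period ($j \neq j'$) | |
| Exchangeable (EX) | $\rho = \alpha_1$ | $\rho = \alpha_1$ | $\alpha_1$ |
| Cross‐sectional correlation structures | | | |
| Nested exchangeable (NE) ᵃ, identity link | $\rho = \alpha_1$ | $\rho = \alpha_2$ | Within‐period $\alpha_1$; between‐period $\alpha_2$ |
| Nested exchangeable (NE) ᵃ, Fisher's $z$ link | $h(\rho) = \alpha_1$ | $h(\rho) = \alpha_2$ | Within‐period $h^{-1}(\alpha_1)$; between‐period $h^{-1}(\alpha_2)$ |
| Exponential decay (ED) ᵇ | $\log \rho = \alpha_1$ | $\log \rho = \alpha_1 + \alpha_2 d$ | Within‐period $\exp(\alpha_1)$; decay per period $\exp(\alpha_2)$, where $d$ is the number of periods separating the two observations |
| Closed‐cohort correlation structures | | | |
| Block exchangeable (BE) ᶜ, identity link | $\rho = \alpha_1$ ($l \neq l'$) | $\rho = \alpha_2$ ($l \neq l'$); $\rho = \alpha_3$ ($l = l'$) | Within‐period $\alpha_1$; between‐period $\alpha_2$; within‐participant $\alpha_3$ |
| Block exchangeable (BE) ᶜ, Fisher's $z$ link | $h(\rho) = \alpha_1$ ($l \neq l'$) | $h(\rho) = \alpha_2$ ($l \neq l'$); $h(\rho) = \alpha_3$ ($l = l'$) | $h^{-1}(\alpha_1)$; $h^{-1}(\alpha_2)$; $h^{-1}(\alpha_3)$ |
| General proportional decay (GPD) ᵇ ᵈ | $\log \rho = \alpha_1$ ($l \neq l'$) | $\log \rho = \alpha_1 + \alpha_2 d$ for $l \neq l'$; $\log \rho = \alpha_3 d$ for $l = l'$ | Within‐period $\exp(\alpha_1)$; decay per period $\exp(\alpha_2)$ between participants and $\exp(\alpha_3)$ within participants, where $d$ is the number of periods separating the two observations |

Here $\rho$ is shorthand for the pairwise correlation $\rho_{ijk,ij'k'}$ (cross‐sectional) or $\rho_{ijl,ij'l'}$ (closed‐cohort) of Equation (3).

ᵃ In the NE model, $z_1 = I(j = j')$ and $z_2 = I(j \neq j')$; inclusion of both terms provides an intercept‐free model whose parameters directly define two ICCs. Two alternate link function choices are given; the first entry is the identity link function whereas the second entry is the Fisher's $z$‐transformation link [38], $h(\rho) = \log\{(1 + \rho)/(1 - \rho)\}$, where we note, for the latter, this function is sometimes scaled by $1/2$.

ᵇ For the multiplicative ED and PD (see footnote d) correlation structures, linearization is achieved via use of the log link.

ᶜ In the BE model, $z_1 = I(j = j', l \neq l')$, $z_2 = I(j \neq j', l \neq l')$, and $z_3 = I(j \neq j', l = l')$; inclusion of all three terms provides an intercept‐free model whose parameters directly define three ICCs. See also note a.

ᵈ In the GPD model [30], $z_1 = I(l \neq l')$, $z_2 = d\, I(l \neq l')$, and $z_3 = d\, I(l = l')$; a special case is the two‐parameter proportional decay (PD) model with $\alpha_2 = \alpha_3$ (implying $\exp(\alpha_2) = \exp(\alpha_3)$, that is, the same rate of correlation decay for within‐ and between‐participant pairs).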
By analogy and extension, we also consider two correlation structures for closed‐cohort SW‐CRTs. The block exchangeable (BE) structure extends the nested exchangeable model by providing for one correlation value for two observations in the same cluster and period ($j = j'$ and $l \neq l'$), which perforce requires them to be on different participants, and two correlations for pairs of observations in different periods ($j \neq j'$): one each for pairs on different ($l \neq l'$) and on the same ($l = l'$) participant, this last parameter being a new addition in the block exchangeable over the nested exchangeable model. The proportional decay (PD) correlation structure is the same as the exponential decay structure for pairs of observations on different participants ($l \neq l'$) [55]. In addition, the PD model borrows the exponential decay parameter from the longitudinal sequence on different participants, so that the correlation at lag $d$ is $\{\exp(\alpha_2)\}^{d}$ within participants. A general proportional decay (GPD) structure [30] separates the lag parameter into $\alpha_3$ and $\alpha_2$ when participants are the same ($l = l'$) or different ($l \neq l'$); see Table 3. Other details are analogous to those for the foregoing cross‐sectional models.
Importantly, as for the example specifications of the marginal mean model given in Table 2, the five specifications of the pairwise correlations of outcomes provided in Table 3 have all been described previously for linear mixed models, with those correlation formulae induced by the random effects specification of the conditional mean model. More generally, given there is a single link function for both fixed and random effects in mixed models, the discussion of link functions uniquely specified for modeling correlations does not apply to mixed models (e.g., the identity and Fisher's $z$ link functions for the nested exchangeable and the block exchangeable correlation structures in marginal models). For mixed models with non‐identity link functions, exact expressions for pairwise correlations are only available in special cases [13, 14]. In summary, while the specifications of functional forms of intervention and period fixed effects in Table 2 are broadly applicable to marginal and conditional models, the marginal correlation formulae in Table 3 tend to be limited to linear, that is, identity link, mixed models. We note too that ICCs are obtained directly under models adopting the identity link function, with transformation needed for models with a non‐identity link function (e.g., the Fisher's $z$ transformation).
3.2. Estimation via Paired GEE/MAEE
While our primary focus is inference on the intervention effect parameter $\delta$ and the pairwise within‐cluster correlations, estimation pertains to the entirety of the vector $\boldsymbol{\theta}$ in Equation (2) and the parameters $\boldsymbol{\alpha}$ of the correlation model in Equation (3). In this section, we describe estimation of $\boldsymbol{\theta}$ and $\boldsymbol{\alpha}$ by iteratively reweighted least squares. This is achieved via a set of paired GEE/MAEE estimating equations with technical details presented below [19, 20, 26, 38].
3.2.1. Mean Model Estimation
Overview. Focusing first on the mean model (1), the main idea of GEE is as follows: for each cluster $i$, construct a linear combination over $(j, k)$ or $(j, l)$ of the mean‐centered responses $Y_{ijk} - \mu_{ijk}$ or $Y_{ijl} - \mu_{ijl}$, sum over $i$, and solve for $\boldsymbol{\theta}$ after setting the result to zero. Because each component of the GEE is of the “observed‐minus‐expected” form, the resulting estimator, $\hat{\boldsymbol{\theta}}$, is generally consistent (i.e., unbiased in large samples) so long as the model for the mean, namely, $\mu_{ijk}$ or $\mu_{ijl}$, is correct. The weights in the sum in Equation (4) arise from a matrix product comprised of functions of the parameters $\boldsymbol{\theta}$, the link function $g$, the variance function $v$ and the specified working correlation structure.
Additional notation. For ease of exposition, we will just present details in terms of cross‐sectional designs and therefore for indices $(i, j, k)$; analogous results for the closed‐cohort design involve nothing more than replacing $k$ with $l$ and $n_{ij}$ with $n_i$. Taking the marginal mean model in Equation (2) as an example (i.e., intervention effect with categorical time), we denote the vector of the time period parameters together with the intervention effect parameter by $\boldsymbol{\theta} = (\beta_1, \dots, \beta_T, \delta)^{\top}$. For each cluster $i$, the vectors of outcomes and of mean responses are $\mathbf{Y}_i = (Y_{i11}, \dots, Y_{iJ_i n_{iJ_i}})^{\top}$ and $\boldsymbol{\mu}_i = (\mu_{i11}, \dots, \mu_{iJ_i n_{iJ_i}})^{\top}$, respectively. Furthermore, $\mathbf{X}_{ij}$ is a $(T+1) \times n_{ij}$ matrix consisting of identical columns, as in the model in Equation (2), without individual‐level covariates. Also, $\mathbf{X}_{ij}$ has $T - 1$ row vectors of zeros and one row vector of 1's among the first $T$ rows. Specifically, the latter is the $t_{ij}$th row, corresponding to the $t_{ij}$th period intercept. The $(T+1)$st row of $\mathbf{X}_{ij}$ has $X_{ij}$, the intervention indicator; it equals 1 for all vector elements if cluster $i$ receives intervention during period $j$ and 0 for all elements otherwise. As such, the $N_i \times (T+1)$ matrix of cluster‐period mean model predictors (including intervention indicators) for cluster $i$ is given by $\mathbf{X}_i = (\mathbf{X}_{i1}, \dots, \mathbf{X}_{iJ_i})^{\top}$.
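$\boldsymbol{\theta}$ estimating equations. The mean model GEE estimator $\hat{\boldsymbol{\theta}}$ can be obtained by solving the following $\boldsymbol{\theta}$‐estimating equations

$$\sum_{i=1}^{I} \mathbf{D}_i^{\top} \mathbf{V}_i^{-1} (\mathbf{Y}_i - \boldsymbol{\mu}_i) = \mathbf{0} \tag{4}$$

where $\mathbf{D}_i = \partial \boldsymbol{\mu}_i / \partial \boldsymbol{\theta}^{\top}$ is the derivative matrix, $\mathbf{V}_i = \mathbf{A}_i^{1/2} \tilde{\mathbf{R}}_i \mathbf{A}_i^{1/2}$ is the working covariance matrix, where $\mathbf{A}_i$ is an $N_i$‐dimensional diagonal matrix with diagonal elements equal to the marginal variances, $\phi\, v(\mu_{ijk})$, and $\tilde{\mathbf{R}}_i$ is the working correlation matrix for $\mathbf{Y}_i$ [12, 17]. Note that we have used the “tilde” notation here, as the working correlation (or working covariance) structure does not necessarily equal the true underlying correlation (or covariance) structure, $\mathbf{R}_i$. As described in Li et al. [20] and considering that $\mathbf{V}_i^{-1} = \mathbf{A}_i^{-1/2} \tilde{\mathbf{R}}_i^{-1} \mathbf{A}_i^{-1/2}$, computational savings can be obtained in solving these $\boldsymbol{\theta}$‐estimating equations, especially with large cluster sizes, whenever we can obtain an analytic inverse of the correlation matrix, $\tilde{\mathbf{R}}_i$, from Table 3. For example, the NE correlation matrix can be expressed as a linear combination of simple basis matrices and has a closed‐form analytic inverse [54], in which case there is no need for numeric inversion.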
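The following sketch shows one way to solve mean‐model estimating equations of this “derivative matrix, inverse working covariance, residual” form by Fisher scoring. It is a deliberately simplified illustration, not the algorithm used by GEEMAEE or geeCRT: it assumes a logit link, a Bernoulli variance function, and an exchangeable working correlation whose value `rho` is held fixed rather than estimated. The toy data and all function names are hypothetical.

```python
import numpy as np

def exchangeable(n, rho):
    """Working correlation matrix with a common off-diagonal correlation rho."""
    return (1.0 - rho) * np.eye(n) + rho * np.ones((n, n))

def score_and_information(theta, X_list, y_list, rho):
    """Return sum_i D_i' V_i^{-1} (Y_i - mu_i) and sum_i D_i' V_i^{-1} D_i (logit link)."""
    p = theta.size
    score, info = np.zeros(p), np.zeros((p, p))
    for X, y in zip(X_list, y_list):
        mu = 1.0 / (1.0 + np.exp(-X @ theta))
        a = mu * (1.0 - mu)                      # Bernoulli variance function
        D = a[:, None] * X                       # d mu / d theta'
        s = np.sqrt(a)
        V = s[:, None] * exchangeable(len(y), rho) * s[None, :]
        Vinv_D = np.linalg.solve(V, D)           # V^{-1} D, with V symmetric
        score += Vinv_D.T @ (y - mu)
        info += D.T @ Vinv_D
    return score, info

def solve_theta(X_list, y_list, rho, n_iter=50):
    """Fisher scoring: theta <- theta + information^{-1} score."""
    theta = np.zeros(X_list[0].shape[1])
    for _ in range(n_iter):
        score, info = score_and_information(theta, X_list, y_list, rho)
        theta = theta + np.linalg.solve(info, score)
    return theta

# Toy data: 4 clusters, each with 2 control then 2 intervention observations.
X = np.array([[1, 0], [1, 0], [1, 1], [1, 1]], dtype=float)
ys = [np.array(v, dtype=float) for v in
      ([0, 1, 1, 1], [0, 0, 0, 1], [1, 0, 1, 1], [0, 1, 0, 1])]
theta_hat = solve_theta([X] * 4, ys, rho=0.1)
print(np.exp(theta_hat[1]))   # estimated odds ratio for the intervention
```

Because every toy cluster shares the same design, the solution reproduces the observed proportions (3/8 under control, 6/8 under intervention), giving an odds ratio of 5 whatever value of `rho` is used; in unbalanced stepped‐wedge data the working correlation does change the weights and hence the estimate.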
3.2.2. Correlation Model Estimation
Overview. Paired with GEE for the mean, estimation for the within‐cluster correlation parameters is based on the pairwise products of standardized residuals. In the standard implementation of Prentice [49], these residuals take the form $e_{ijk} = (Y_{ijk} - \hat{\mu}_{ijk}) / \sqrt{\phi\, v(\hat{\mu}_{ijk})}$, so that the centered pairwise products, namely

$$e_{ijk}\, e_{ij'k'} - \rho_{ijk,ij'k'} \tag{5}$$

have mean zero. As with the mean model, the estimating equations for the parameters of the correlation model are constructed by taking, within each cluster $i$, linear combinations of (5) over all pairs of observations, summing over $i$, and, for given $\boldsymbol{\theta}$, solving for $\boldsymbol{\alpha}$ [19]. Generally, if both the mean model and the correlation model are correctly specified, then the resulting estimator $\hat{\boldsymbol{\alpha}}$ will be consistent. This consistency is, however, based on having a large number of clusters; as we have stated, a major challenge in cluster randomized trials is that very often the number of clusters is rather modest or even small. Whereas $\boldsymbol{\theta}$‐estimation turns out to be fairly robust to a small cluster count, $\boldsymbol{\alpha}$ estimation is less so; the reason for the sensitivity to the number of clusters in $\boldsymbol{\alpha}$ estimation is that replacing the true $\mu_{ijk}$ with the estimated $\hat{\mu}_{ijk}$ introduces bias at the level of each single standardized residual, $e_{ijk}$. We describe this in more detail below.
Finite‐sample bias in estimation of $\boldsymbol{\alpha}$. When considering estimation of the correlation parameters $\boldsymbol{\alpha}$, it is important to note that the consistency of $\hat{\boldsymbol{\theta}}$ is maintained as long as a $\sqrt{I}$‐consistent estimator $\hat{\boldsymbol{\alpha}}$ is used, even if that estimator is not consistent for the true value of $\boldsymbol{\alpha}$ [12, 17]. In traditional GEE [12, 17], as noted above, such an estimator is obtained via the method of moments. However, such an approach is limited in that it does not directly provide confidence intervals for the correlation parameters and exhibits poor statistical performance, and, in practice, there is limited flexibility within existing software as to which working correlation structures can be specified. Moreover, by correctly modeling the working correlation structure, more efficient estimation of the $\boldsymbol{\theta}$ parameters can be achieved [60]. As such, estimation of the parameters $\boldsymbol{\alpha}$ of the correlation model (3) is performed via a second independent set of estimating equations.
Approaches to account for finite‐sample bias in estimation of $\boldsymbol{\alpha}$. To address the problem of finite‐sample bias, small sample bias‐corrected correlation estimators have been proposed and investigated. For the scalar‐adjusted estimating equations approach (SAEE, in contrast to the standard unadjusted approach, which we refer to as the uncorrected estimating equations [UEE] approach), the estimated residuals $Y_{ijk} - \hat{\mu}_{ijk}$ are replaced with a Studentized version computed from $h_{ijk}$, where $h_{ijk}$ is the diagonal component, for the observation in question, of $\mathbf{H}_i = \mathbf{D}_i \big(\sum_{i'=1}^{I} \mathbf{D}_{i'}^{\top} \mathbf{V}_{i'}^{-1} \mathbf{D}_{i'}\big)^{-1} \mathbf{D}_i^{\top} \mathbf{V}_i^{-1}$, the “hat”‐matrix for the $i$th cluster. As in ordinary least squares regression, the Studentized residual is less biased than the naively standardized version. Improving further on this, matrix‐adjusted estimating equations (MAEE) have been proposed that jointly Studentize the entire vector $\mathbf{Y}_i - \hat{\boldsymbol{\mu}}_i$ using, at a heuristic level, the cluster‐level inverse of $\mathbf{I} - \mathbf{H}_i$. It has been shown that the MAEE estimators for $\boldsymbol{\alpha}$ have less bias than do the SAEE estimators, which in turn do better than UEE estimators that formalize the standard method‐of‐moments approach [19].
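Additional notation. Before presenting the $\boldsymbol{\alpha}$‐estimating equations, we must define some additional terms. We introduce a single index $q$ to represent the double index $(j, k)$ for the observation indexed by $(i, j, k)$. Thus, we consider outcome $Y_{iq}$ and corresponding marginal mean response $\mu_{iq}$ for the $q$‐th observation in the $i$th cluster. Let $\mathbf{D}_i$ be defined as above, let $e_{iq}$ be the Pearson residual for the $q$‐th observation, and let $P_{iqq'} = e_{iq} e_{iq'}$ ($q < q'$) be the residual cross‐product for pairs of observations in the same cluster (where we note that, in practice, the pair may be either in the same period, $j = j'$, or in different periods, $j \neq j'$) with mean $\rho_{iqq'}$ and variance $w_{iqq'}$ (see Equation 6), which is defined elsewhere [49, 61]. We further define $m_i = N_i(N_i - 1)/2$, the number of unique pairs of observations in cluster $i$, the $m_i$‐vector $\mathbf{P}_i = (P_{i12}, P_{i13}, \dots, P_{i,N_i-1,N_i})^{\top}$ and $\boldsymbol{\rho}_i = (\rho_{i12}, \rho_{i13}, \dots, \rho_{i,N_i-1,N_i})^{\top}$, where $\boldsymbol{\rho}_i = E(\mathbf{P}_i)$.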
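$\boldsymbol{\alpha}$ estimating equations. The correlation model estimator $\hat{\boldsymbol{\alpha}}$ can be obtained by solving the following $\boldsymbol{\alpha}$‐estimating equations

$$\sum_{i=1}^{I} \mathbf{E}_i^{\top} \mathbf{W}_i^{-1} (\mathbf{P}_i - \boldsymbol{\rho}_i) = \mathbf{0} \tag{6}$$

where $\mathbf{E}_i = \partial \boldsymbol{\rho}_i / \partial \boldsymbol{\alpha}^{\top}$ is the derivative matrix, $\mathbf{W}_i$ is a diagonal working covariance matrix for the vector $\mathbf{P}_i$, and $\mathbf{P}_i$ is a (potentially bias‐corrected) function of the cross‐products of Pearson residuals [19]. When $P_{iqq'} = e_{iq} e_{iq'}$, referred to as “uncorrected estimating equations” (UEE), it has been shown that $\hat{\boldsymbol{\alpha}}$ may be biased in small samples (i.e., with few clusters) [19]. As a consequence, the “matrix‐adjusted estimating equations” (MAEE) approach sets $P_{iqq'} = \mathbf{c}_{iq}^{\top} \mathbf{g}_{iq'} / \sqrt{\hat{\phi}^2\, v(\hat{\mu}_{iq})\, v(\hat{\mu}_{iq'})}$, where $\mathbf{c}_{iq}^{\top}$ corresponds to the $q$‐th row of $(\mathbf{I} - \mathbf{H}_i)^{-1}$ and $\mathbf{g}_{iq'}$ is the $q'$‐th column vector of the outer product of the uncorrected residuals, $(\mathbf{Y}_i - \hat{\boldsymbol{\mu}}_i)(\mathbf{Y}_i - \hat{\boldsymbol{\mu}}_i)^{\top}$. In this way, the residual vector is Studentized by the “hat” matrix through $(\mathbf{I} - \mathbf{H}_i)^{-1}$. We do not provide further detail on SAEE because, in our experience, MAEE almost always outperforms SAEE, and software implementing SAEE also implements MAEE.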
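The difference between uncorrected and matrix‐adjusted residual cross‐products can be sketched as follows. To keep the example short it makes strong simplifying assumptions that are not those of the paper's case studies: an identity link with working independence for the mean model (so the derivative matrix is the design matrix), a single common correlation parameter with identity link, and equal weights, in which case the correlation estimate reduces to the average of the standardized cross‐products. The simulated data and function names are hypothetical.

```python
import numpy as np

def hat_matrices(D_list, V_list):
    """H_i = D_i (sum_k D_k' V_k^{-1} D_k)^{-1} D_i' V_i^{-1} for each cluster."""
    info = sum(D.T @ np.linalg.solve(V, D) for D, V in zip(D_list, V_list))
    info_inv = np.linalg.inv(info)
    return [D @ info_inv @ np.linalg.solve(V, D).T for D, V in zip(D_list, V_list)]

def residual_cross_products(resid, sd, H=None):
    """Standardized cross-products for all pairs q < q'; matrix-adjusted if H is given."""
    G = np.outer(resid, resid)
    if H is not None:
        G = np.linalg.solve(np.eye(len(resid)) - H, G)   # (I - H)^{-1} G
    G = G / np.outer(sd, sd)
    return G[np.triu_indices(len(resid), k=1)]

# Toy stepped-wedge-like data: 4 clusters, 6 observations each, staggered start.
rng = np.random.default_rng(0)
X_list = [np.column_stack([np.ones(6), np.arange(6) >= step]).astype(float)
          for step in (1, 2, 3, 4)]
y_list = [X @ np.array([1.0, 0.5]) + rng.normal(size=6) + rng.normal() for X in X_list]
V_list = [np.eye(6)] * 4

X_all, y_all = np.vstack(X_list), np.concatenate(y_list)
theta_hat = np.linalg.lstsq(X_all, y_all, rcond=None)[0]
H_list = hat_matrices(X_list, V_list)
resid = [y - X @ theta_hat for X, y in zip(X_list, y_list)]
phi_hat = sum(r @ r for r in resid) / (len(y_all) - X_all.shape[1])
sd = np.full(6, np.sqrt(phi_hat))

uee = np.concatenate([residual_cross_products(r, sd) for r in resid]).mean()
maee = np.concatenate([residual_cross_products(r, sd, H)
                       for r, H in zip(resid, H_list)]).mean()
print(uee, maee)
```

With only four clusters the hat matrices are far from zero, so the adjustment can change the estimate noticeably; as the number of clusters grows, each hat matrix shrinks and the two versions converge.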
3.3. Inference via Paired GEE/MAEE
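Inference for $\boldsymbol{\theta}$ and $\boldsymbol{\alpha}$ builds from the fact that, as the number of clusters increases, $\sqrt{I}(\hat{\boldsymbol{\theta}} - \boldsymbol{\theta})$ and $\sqrt{I}(\hat{\boldsymbol{\alpha}} - \boldsymbol{\alpha})$ each converge to a multivariate normal distribution with mean vector $\mathbf{0}$ and covariance given either by the model‐based variance or the sandwich variance. In order to provide more details, we focus here on $\hat{\boldsymbol{\theta}}$, with an easy parallel in the details for $\hat{\boldsymbol{\alpha}}$ [19].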
3.3.1. Variance Estimation and Small‐Sample Corrections
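The model‐based variance for $\hat{\boldsymbol{\theta}}$ is given by $\boldsymbol{\Omega}^{-1}$, with $\boldsymbol{\Omega} = \sum_{i=1}^{I} \mathbf{D}_i^{\top} \mathbf{V}_i^{-1} \mathbf{D}_i$, which appears in the “hat” matrix $\mathbf{H}_i = \mathbf{D}_i \boldsymbol{\Omega}^{-1} \mathbf{D}_i^{\top} \mathbf{V}_i^{-1}$ of Section 3.2.2. We further define a class of empirical sandwich variance estimators for $\hat{\boldsymbol{\theta}}$ given by $\boldsymbol{\Omega}^{-1} \hat{\boldsymbol{\Lambda}} \boldsymbol{\Omega}^{-1}$ where

$$\hat{\boldsymbol{\Lambda}} = \sum_{i=1}^{I} \mathbf{D}_i^{\top} \mathbf{V}_i^{-1} \mathbf{B}_i (\mathbf{Y}_i - \hat{\boldsymbol{\mu}}_i)(\mathbf{Y}_i - \hat{\boldsymbol{\mu}}_i)^{\top} \mathbf{B}_i^{\top} \mathbf{V}_i^{-1} \mathbf{D}_i \tag{7}$$

where $\mathbf{D}_i$ and $\mathbf{V}_i$ are defined in Section 3.2.1. Matrices $\mathbf{B}_i$ for $i = 1, \dots, I$ represent multiplicative factors for small‐sample bias correction in variance estimation. The “hat” matrix $\mathbf{H}_i$ plays a key role such that we consider the small‐sample bias corrections:
(BC0) $\mathbf{B}_i = \mathbf{I}$ corresponds to the usual sandwich variance estimator, for example, Liang and Zeger [17];
(BC1) $\mathbf{B}_i = (\mathbf{I} - \mathbf{H}_i)^{-1/2}$ corresponds to the Kauermann‐Carroll (KC) variance estimator [62];
(BC2) $\mathbf{B}_i = (\mathbf{I} - \mathbf{H}_i)^{-1}$ corresponds to the Mancl‐DeRouen (MD) variance estimator [63].
We note that the usual sandwich variance estimators (using BC0) are known to underestimate the variance of the intervention effect when the number of clusters is small [19, 20]. A third small‐sample bias corrected variance estimator introduced by Fay and Graubard [65] has a different structure than in Equation (7) and is available in the SAS macro GEEMAEE as “BC3”; also see Wang et al. [53] for an alternative expression of $\hat{\boldsymbol{\Lambda}}$ that also unifies the Fay and Graubard bias correction. In simulations, the BC3 estimator was found to provide similar power as BC1 under complete SW designs [18, 64]. However, BC1 had higher power than BC3 in simulations while controlling the Type I error rate under the incomplete SW design in Figure 5 with 6 or 12 clusters [26]. The lower empirical power and overly conservative Type I error of BC3, as compared to BC1, was also shown in simulations for multi‐level designs [53].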
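A compact sketch of the BC0, BC1 and BC2 sandwich estimators is given below. It assumes, purely for brevity, an identity link with working independence and unit variance, so that the derivative matrix equals the design matrix, the working covariance is the identity, and each cluster's hat matrix is symmetric (which allows a simple eigendecomposition for the inverse square root). The Fay‐Graubard correction is not covered. Data and function names are hypothetical.

```python
import numpy as np

def inv_sqrt_sym(M):
    """Inverse square root of a symmetric positive-definite matrix."""
    w, U = np.linalg.eigh(M)
    return (U / np.sqrt(w)) @ U.T

def sandwich(X_list, y_list, correction="BC0"):
    """Independence-GEE estimate and its (bias-corrected) sandwich variance."""
    omega = sum(X.T @ X for X in X_list)
    omega_inv = np.linalg.inv(omega)
    theta = omega_inv @ sum(X.T @ y for X, y in zip(X_list, y_list))
    meat = np.zeros_like(omega)
    for X, y in zip(X_list, y_list):
        n = len(y)
        H = X @ omega_inv @ X.T                 # cluster hat matrix (V_i = I here)
        if correction == "BC0":
            B = np.eye(n)
        elif correction == "BC1":               # Kauermann-Carroll
            B = inv_sqrt_sym(np.eye(n) - H)
        elif correction == "BC2":               # Mancl-DeRouen
            B = np.linalg.inv(np.eye(n) - H)
        else:
            raise ValueError(correction)
        u = X.T @ B @ (y - X @ theta)
        meat += np.outer(u, u)
    return theta, omega_inv @ meat @ omega_inv

# Toy data: 6 clusters, 4 periods with 2 observations each, staggered intervention start.
rng = np.random.default_rng(1)
X_list = []
for i in range(6):
    period = np.repeat(np.arange(4), 2)
    trt = (period >= i % 3 + 1).astype(float)
    X_list.append(np.column_stack([np.ones(8), period, trt]))
y_list = [X @ np.array([0.0, 0.1, 0.5]) + rng.normal(size=8) + rng.normal()
          for X in X_list]

for c in ("BC0", "BC1", "BC2"):
    theta_hat, V = sandwich(X_list, y_list, c)
    print(c, np.sqrt(np.diag(V)))
```

The point estimate is identical across corrections; only the residuals entering the middle of the sandwich are rescaled, which is why the choice matters for standard errors and tests but not for the estimated intervention effect.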
FIGURE 5.

Coding of the marginal mean model for analysis of the simulated data set of the Connect‐Home cross‐sectional SW‐CRT shown by cluster‐period. Panel A: Coefficient (i.e., the time covariate $t_{ij}$) of the time slope $\beta_1$ based on the linear time model given by $\beta_0 + \beta_1 t_{ij}$; Panel B: Coefficients (i.e., $X_{ij}$) of $\delta$ under incremental (the selected coding), extended incremental and constant intervention effect models, respectively. Note that, for this data set, the extended incremental effects model assumed that the full intervention effect was attained at the 5th period of intervention (i.e., the minimal number of intervention periods across all 6 clusters) and was sustained beyond, whereas the incremental effects model assumed that the full intervention effect was only attained at the 10th period of intervention (i.e., the maximal number of intervention periods across all 6 clusters).
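Empirical bias‐corrected variance estimators for $\hat{\boldsymbol{\alpha}}$ are also available [19, 38]. Here the bias‐correction factor involves a second “hat” matrix, this one defined from matrix components of the $\boldsymbol{\alpha}$‐estimating equations in Equation (6) as $\mathbf{H}_{\alpha i} = \mathbf{E}_i \boldsymbol{\Omega}_{\alpha}^{-1} \mathbf{E}_i^{\top} \mathbf{W}_i^{-1}$, where $\boldsymbol{\Omega}_{\alpha} = \sum_{i=1}^{I} \mathbf{E}_i^{\top} \mathbf{W}_i^{-1} \mathbf{E}_i$. Analogous to the bias‐correction methods of the sandwich variance estimator for $\hat{\boldsymbol{\theta}}$, the bias‐correction terms $\mathbf{B}_{\alpha i}$ are:
(BC0) $\mathbf{B}_{\alpha i} = \mathbf{I}$ provides the uncorrected sandwich variance estimator, for example, as in Prentice [49];
(BC1) $\mathbf{B}_{\alpha i} = (\mathbf{I} - \mathbf{H}_{\alpha i})^{-1/2}$ is based on the Kauermann‐Carroll (KC) variance estimator [62];
(BC2) $\mathbf{B}_{\alpha i} = (\mathbf{I} - \mathbf{H}_{\alpha i})^{-1}$ is based on the Mancl‐DeRouen (MD) variance estimator [63].
The factor $\mathbf{B}_{\alpha i}$ pre‐multiplies $\mathbf{P}_i - \hat{\boldsymbol{\rho}}_i$ in the bias‐corrected sandwich variance formulae for $\hat{\boldsymbol{\alpha}}$. However, the variance formulae for $\hat{\boldsymbol{\alpha}}$ are more complicated than those for $\hat{\boldsymbol{\theta}}$, as they involve information from both the $\boldsymbol{\theta}$‐ and $\boldsymbol{\alpha}$‐estimating equations, Equations (4) and (6), respectively. Thus, we refer the reader to the references cited above [19, 38]. Importantly, for the data analyst, published empirical work supports use of the BC1 estimator for $\boldsymbol{\theta}$ in the marginal mean model and BC2 for $\boldsymbol{\alpha}$ in the pairwise correlation models [19, 36].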
3.3.2. Inference
Inference for a single component of either parameter vector $\boldsymbol{\theta}$ or $\boldsymbol{\alpha}$, or for a linear contrast of components, can be performed using a Wald $t$‐statistic with $I - p$ or $I - q$ degrees of freedom, respectively, where $p = \dim(\boldsymbol{\theta})$ and $q = \dim(\boldsymbol{\alpha})$, in the case considered in the present tutorial in which we include only variables that vary between clusters or cluster‐periods (but not across participants within a cluster) [20, 26]. This approach has been shown to have good performance in the finite‐sample settings common in many SW‐CRTs (i.e., with a small number of clusters $I$), so long as an appropriate small‐sample adjusted estimator of the SE, discussed above, is used. For example, inference for $\delta$ in the categorical‐time, constant intervention effect marginal mean model (Equation (2)) is performed using the test statistic $\hat{\delta} / \widehat{\mathrm{SE}}(\hat{\delta})$ with $I - (T + 1)$ degrees of freedom in the “small‐sample” setting.
In a similar fashion, hypothesis tests and confidence intervals on the components of $\boldsymbol{\alpha}$ would also be on the scale of the link function for the $\boldsymbol{\alpha}$‐estimating equation. For example, with either the identity link or the Fisher's $z$‐transformation link function for the NE correlation model, a $100(1 - \gamma)\%$ confidence interval for the degree to which the within‐period correlation exceeds the between‐period correlation would be constructed as

$$(\hat{\alpha}_1 - \hat{\alpha}_2) \pm t_{I - q,\, 1 - \gamma/2}\; \widehat{\mathrm{SE}}(\hat{\alpha}_1 - \hat{\alpha}_2).$$

Whereas hypothesis tests could be constructed, in general, for the SW‐CRT design, we do not recommend a specification of the working correlation structure solely based on testing for $\alpha_1 = \alpha_2$, as power may generally be limited for small $I$. Rather, we recommend that the choice of correlation structure, at least in the primary analysis of the primary outcome, be pre‐specified, although sensitivity analyses can be performed in secondary analysis with alternative correlation structure specifications.
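When a correlation parameter is estimated on the Fisher's $z$ scale, its confidence limits must be back‐transformed to the correlation scale. The sketch below illustrates this for a single ICC. It uses the standard (one‐half scaled) Fisher transformation, for which the inverse is the hyperbolic tangent, and takes the $t$ critical value as an input rather than computing it. The estimate, standard error and critical value are hypothetical numbers chosen for illustration.

```python
import math

def fisher_z(rho):
    """Standard Fisher transformation: 0.5 * log((1 + rho) / (1 - rho))."""
    return math.atanh(rho)

def icc_interval(z_hat, se_z, t_crit):
    """Wald interval on the Fisher-z scale, back-transformed to the correlation scale."""
    lower = z_hat - t_crit * se_z
    upper = z_hat + t_crit * se_z
    return math.tanh(lower), math.tanh(z_hat), math.tanh(upper)

# Hypothetical inputs: estimate 0.05 on the correlation scale, SE 0.03 on the z scale,
# and 2.776, the 97.5th percentile of a t distribution with 4 degrees of freedom.
lo, est, hi = icc_interval(z_hat=fisher_z(0.05), se_z=0.03, t_crit=2.776)
print(round(lo, 4), round(est, 4), round(hi, 4))
```

Unlike an interval built directly on the identity scale, the back‐transformed limits are asymmetric around the estimate and always lie strictly between -1 and 1.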
Regarding the appropriate small‐sample adjusted estimator of the SE to use when combined with $I - p$ or $I - q$ degrees of freedom for inference for elements of the vector $\boldsymbol{\theta}$ or $\boldsymbol{\alpha}$, respectively, previous simulation findings have suggested that the preferred small‐sample correction to elements of $\boldsymbol{\theta}$, including $\delta$, is Kauermann‐Carroll (BC1) in most settings [20, 26, 64], with this adjustment being selected among the following possibilities: regular unadjusted robust SEs (BC0), Kauermann‐Carroll (BC1) [62], Mancl‐DeRouen (BC2) [63], and Fay‐Graubard (BC3) [65]. (See details of the form of these corrections under Equation (7) above.) Relatedly, the Mancl‐DeRouen (BC2) correction is typically preferred to correct for small‐sample bias of sandwich variance estimators of parameters in elements of the correlation vector $\boldsymbol{\alpha}$ [19, 36]. In contrast, for “large‐sample” settings, such as the Heart Health Now study example (i.e., with $I$ > 200), $z$‐based inference is acceptable using standard, robust standard errors (i.e., BC0). As such, analysis may be performed via the standard GEE/UEE approach, even though there is no harm asymptotically in adopting GEE/MAEE for large‐sample analysis. With regard to the degrees of freedom for the $t$ test in the small‐sample setting, we note that some simulations have found that the $I - p$ df specification for testing parameters of the vector $\boldsymbol{\theta}$ can be conservative with an extremely small number of clusters and have instead opted for borrowing the default $I - 2$ from parallel‐arm CRTs [36, 55, 66, 67]. Similar considerations support use of $I - 2$ instead of $I - q$ for the degrees of freedom of approximate $t$‐distributions for correlation model parameter estimates for small $I$. How to consistently and correctly specify the appropriate df for all SW‐CRTs remains an open question.
3.4. Implementation in Software
In practice, unless the user wishes to program the estimating equations and optimization approach themselves, the paired GEE/MAEE approach can be implemented using the GEEMAEE macro available in SAS/IML software or through functions in the geeCRT package available in R. In their current versions (2.04 for GEEMAEE and 1.1.3 for geeCRT), the two have some overlapping and some distinct capabilities. In particular, analysis of individual‐level data for both cross‐sectional and cohort data for three types of outcome variables (continuous, binary and count) is available in both GEEMAEE and geeCRT, whereas analysis of cluster‐period mean data is currently available only in geeCRT and only for cross‐sectional binomial outcomes. Both software tools provide correlation variance estimation through paired estimating equations, and options for finite‐sample bias‐corrected correlation estimation (with MAEE) and bias‐corrected variance estimation for marginal mean and correlation model parameter estimates [38].
Both the SAS macro GEEMAEE and the R package geeCRT allow for general specification of the marginal mean model in Equation (1) through user provision of the explanatory variables for all observations in the dataset, in other words, the cluster design matrices ($\mathbf{X}_i$). The user should prepare the data such that all values are numeric without missing values. Distinctions between the capabilities of the two software tools pertain to (i) whether the marginal mean model is specified at the individual observation versus cluster‐period mean level and (ii) their respective capacities for correlation model specification.
3.4.1. Analysis of Individual‐Level Data With Standard Correlation Structures
The GEEMAEE SAS macro has readily accessible built‐in options for choice among the five marginal correlation structures in Table 3, where we note that the basic exchangeable structure is rarely recommended for SW‐CRTs. We also note that the proportional decay structure that is a special case of the general proportional decay (GPD) structure is available as a built‐in option in the GEEMAEE macro but GPD is not. Rather additional programming is needed for the GPD or other within‐cluster pairwise correlation structures to use the SAS macro as described in Section 3.4.3 (where it is also noted that additional programming is always required for individual‐level analysis in the geemaee function within the geeCRT package for any correlation structure). The first case study in Section 4.1 illustrates an analysis of the Connect‐Home trial using the SAS macro GEEMAEE with built‐in options for standard correlation structures. Corresponding R code using functions from the geeCRT package, together with the additional code required to specify standard correlation structures, is provided in Supporting Information.
3.4.2. Analysis of Cluster‐Period Mean Data
At the time of writing, the cluster‐period mean analysis approach of Li et al. [36] is available only with the R geeCRT package via the cpgeeSWD function. Moreover, this approach is currently only available for cross‐sectional binomial‐like data. In this approach, the aggregated cluster‐period outcome means (i.e., proportions) have overdispersed (scaled) binomial variances and within‐cluster covariances that are induced by the specified underlying individual‐level data correlation model. The R function cpgeeSWD estimates the model under either the NE or the ED correlation structure from Table 3, selected with argument corstr = ne or corstr = exp_decay, respectively. In an adaptation of Equations (4) and (6), a paired estimating equations method for cluster‐period means is employed with closed‐form updates for the estimates of the correlation parameters computed at each iteration. The second case study in Section 4.2 presents examples of cluster‐period analysis with geeCRT for the HHN data analysis.
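The idea of working with cluster‐period means can be illustrated with a short sketch that aggregates individual binary outcomes to proportions and evaluates the variance and covariance that a simple individual‐level correlation model induces on those proportions. This is not the cpgeeSWD implementation; the formulas are the standard results for the mean of equicorrelated observations under a nested exchangeable structure in a cross‐sectional design, and the function names and example records are hypothetical.

```python
from collections import defaultdict

def cluster_period_means(records):
    """records: iterable of (cluster, period, y), binary y -> {(cluster, period): (n, mean)}."""
    totals = defaultdict(lambda: [0, 0])
    for cluster, period, y in records:
        totals[(cluster, period)][0] += 1
        totals[(cluster, period)][1] += y
    return {key: (n, s / n) for key, (n, s) in totals.items()}

def var_cluster_period_mean(mu, n, alpha_within, phi=1.0):
    """Variance of a cluster-period proportion with common within-period correlation."""
    return phi * mu * (1 - mu) / n * (1 + (n - 1) * alpha_within)

def cov_cluster_period_means(mu_j, mu_k, alpha_between, phi=1.0):
    """Covariance of two cluster-period proportions from different periods."""
    return alpha_between * phi * (mu_j * (1 - mu_j) * mu_k * (1 - mu_k)) ** 0.5

records = [(1, 1, 1), (1, 1, 0), (1, 2, 1), (1, 2, 1)]
means = cluster_period_means(records)
print(means)
print(var_cluster_period_mean(mu=0.5, n=2, alpha_within=0.05))
```

The factor multiplying the binomial variance is the familiar design effect for a cluster‐period of size n, which is how the individual‐level within‐period ICC shows up as overdispersion in the aggregated proportions.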
3.4.3. Analysis of Individual‐Level Data With Flexibly Specified Correlation Models
In the analysis of individual‐level outcomes from cross‐sectional and cohort SW‐CRTs, there are occasions when the data analyst would like to specify a within‐cluster correlation structure other than one of the five standard structures in Table 3. The user is cautioned that this analysis approach requires additional effort and an understanding of the data structure needed for the correlation model in Section 3.1.2. The general approach is to specify a generalized linear model for the correlation structure along with a dataset containing values of the covariates for all within‐cluster pairs of observations (i.e., the correlation model covariates in Equation (3)). Users of the SAS macro GEEMAEE should use options ZDATA, ZVAR and ZPAIR to, respectively, pass the dataset containing the correlation model covariates, list the covariates by SAS variable name, and provide indices for the identifiers of all observation pairs in each cluster. Additionally, use of GEEMAEE requires specification of the link function for the correlation model via the option CORRLINK. The approach using SAS is detailed in Section 4.3 for the third case study, which analyzes the crowdsourced HIV testing data. With regard to R programming, capabilities within the geemaee function are similar to those of GEEMAEE in that some programming is required to define the correlation matrix. The user's effort is rewarded with much greater scope for correlation model specification, for both cross‐sectional and cohort multi‐period CRT individual‐level data, than is available for cluster‐period analysis. Examples are provided in R vignettes that accompany the geeCRT package.
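To make the required data structure more concrete, the following minimal Python sketch enumerates all within‐cluster pairs of observations and attaches two indicator covariates of the kind a nested exchangeable type correlation model would use. It is not the GEEMAEE macro or the geeCRT package, and it does not reproduce the layout expected by ZDATA, ZVAR and ZPAIR; the field names (cluster, period, obs, z_within, z_between) and the toy records are assumptions made purely for illustration.

```python
from itertools import combinations


def pairwise_correlation_covariates(records):
    """Build one row per within-cluster pair of observations.

    `records` is a list of dicts with hypothetical keys:
      'cluster' (cluster ID), 'period' (period ID), 'obs' (index within cluster).
    Each output row carries two indicator covariates (also hypothetical names):
      z_within  = 1 if both observations fall in the same period, else 0
      z_between = 1 if they fall in different periods, else 0
    """
    by_cluster = {}
    for rec in records:
        by_cluster.setdefault(rec["cluster"], []).append(rec)

    rows = []
    for cluster, recs in by_cluster.items():
        recs = sorted(recs, key=lambda r: r["obs"])
        # combinations() yields each unordered pair exactly once
        for a, b in combinations(recs, 2):
            same_period = int(a["period"] == b["period"])
            rows.append({
                "cluster": cluster,
                "obs_j": a["obs"],
                "obs_k": b["obs"],
                "z_within": same_period,
                "z_between": 1 - same_period,
            })
    return rows


# Toy data (assumed): cluster 1 has three observations, cluster 2 has two
toy_records = [
    {"cluster": 1, "period": 1, "obs": 1},
    {"cluster": 1, "period": 1, "obs": 2},
    {"cluster": 1, "period": 2, "obs": 3},
    {"cluster": 2, "period": 1, "obs": 1},
    {"cluster": 2, "period": 2, "obs": 2},
]
pair_rows = pairwise_correlation_covariates(toy_records)
for row in pair_rows:
    print(row)
```

A cluster with n observations contributes n(n − 1)/2 rows, which is why the pairwise dataset grows quickly with cluster size and why this individual‐level approach is best suited to smaller studies.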
4. Using Marginal Models to Analyze SW‐CRTs: Case Studies
The primary goal of the current section is to present, via illustrative examples of three SW‐CRTs, the types of design features one will encounter, the ensuing analytic decisions one will face, and the types of analyses one can perform via the paired estimating equations approach. For each study, the presentation considers whether a complete versus incomplete design was used, the presence or absence of an implementation phase, and the presence or absence of a maintenance phase. We illustrate both mean and correlation model specification, and the rationale behind various choices. In terms of approach, we use individual‐level GEE/MAEE for the two “small” studies, namely the Connect‐Home study (see Figure 2) and the Crowdsourced HIV Testing study (see Figure 4) and use cluster‐period mean (summary) GEE/UEE analysis for the “large” Heart Health Now study (see Figure 3). An additional focus is on comparison of the estimation procedure (that is, GEE/MAEE versus GEE/UEE, with the former required for analysis of the “small” Connect‐Home and Crowdsourced HIV Testing data set but with minimal difference for the “large” HHN data set). As for the Connect‐Home example, individual‐level GEE/MAEE is used for the Crowdsourced HIV Testing study in Figure 4 to highlight additional correlation models that can be considered for closed‐cohort data and for an individual‐level binary outcome, in contrast to the cluster‐period aggregate binomial data available in the HHN example. Data and source code are available to reproduce the results of all three Case Studies via GitHub at https://github.com/XueqiWang/SW‐CRT_tutorial.
4.1. Connect‐Home Analysis
4.1.1. Analysis Overview
We focus here on analysis of a continuous outcome in the simulated Connect‐Home data set, with analysis conducted using the SAS GEEMAEE macro version 2.04 [38] in SAS version 9.4. We also provide code in Supporting Information to implement these analyses in R. As noted in Section 2.1, because we are unable to share the real data set with this tutorial, we analyze a simulated data set so that the reader may reproduce the findings presented here. Moreover, COVID‐19 caused many disruptions to the actual study, resulting in highly irregular timing of data collection contrary to the planned study design (summarized elsewhere) [45]. Focusing on the actual study data would therefore have made it difficult to meet our overarching aim of conveying some fundamental messages about analysis of SW‐CRTs using the paired GEE/MAEE approach. The simulated outcome data were based on outcomes of the real data set, which were collected under the modified schedule that resulted from the COVID‐19 disruptions (as described in Toles et al. [45]); we mapped features of those outcomes (e.g., mean and variance) to the planned schedule of measurements like that shown in Figure 2. More specifically, for this simulated data set (henceforth referred to as “the Connect‐Home data set” and discussed as if it were a real data set), the outcome of interest is the Preparedness for Caregiving Scale (PCS) measured at 7 days post‐discharge for the primary caregiver of each patient participant [68]. The PCS is an 8‐item, Likert‐scaled measure with scores ranging from 0 to 32, with higher scores indicating greater preparedness.
4.1.2. Model Specification
We first take stock of features of the design. This SW‐CRT takes repeated cross‐sectional samples of patients within SNFs, so there is no need to account for correlated or repeated observations within participant. Second, the design is incomplete and has an implementation phase of two periods' (months') duration. These features typically add to the total number of periods and, correspondingly, reduce the number of clusters observed in any given period. The net effect on model specification is that smoother, less flexible time effect models are recommended, lest the analysis model be under‐identified. Finally, there is no maintenance phase in this design, so we choose to capture the impact of the intervention with a single degree‐of‐freedom intervention effect.
Using our assessment of the design, we consider marginal model specification along three dimensions: (i) the model for time effect(s); (ii) the model for the intervention effect(s) (where we note that these two parameterizations are jointly specified within the marginal mean model; e.g., see the general form in Equation (1)); and (iii) the correlation model (e.g., see Equation (3)). For the time effect, as suggested in the foregoing, there are many more periods than clusters in this study (i.e., 22 periods in which data were observed in at least one of the six clusters, Figure 2). As such, treating time period as a categorical variable (as in model (1)) would pose challenges to statistical inference for parameters of the marginal mean model, because the degrees of freedom used for inference are given by the number of clusters minus the number of mean model parameters (see Section 3.3). Instead, we assumed a linear time model such as that shown in Table 2, with details of the specification by cluster‐period shown in Figure 5A [45].
To model intervention effects, we consider as primary analysis a ten‐period incremental effects model. The rationale for this choice is to mirror a scenario where, a priori, the study team expects that the intervention would have a gradual effect over time and that the maximal effect would be attained after the largest number of periods for which any cluster in the study is observed under the intervention condition. The ten‐period incremental model was therefore selected as the primary analysis, as it corresponds to the maximum number of periods of intervention across the six clusters (i.e., see SNF 1, Figure 2). (Moreover, in this simulated data example, the data were actually generated according to this specific ten‐period incremental intervention effect model and therefore, for illustrative purposes, it is valuable to consider such a marginal mean model as the key model of focus.) As alternative models, we consider a constant intervention effect model and an extended incremental intervention effect model. For the latter, we assume that the maximum intervention effect is reached at period five of intervention implementation and sustained thereafter, where five corresponds to the minimum number of periods of implementation across all clusters (i.e., see SNF 6, Figure 2). As summarized in Figure 5B, the intervention variable takes the value 0 if a cluster is allocated to the control condition during a given period and, depending on the form of the model, a model‐specific value if the cluster is allocated to the intervention. Specifically, under the incremental effects model the value grows with each additional period that the cluster has been in intervention, under the constant intervention effect model it equals 1 in every intervention period, and under the extended incremental intervention effect model it grows until the fifth intervention period and is held at its maximum thereafter. A general form of the marginal mean model is therefore specified as
| (8) |
where the link function is the identity for the PCS outcome.
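The three intervention effect codings can be sketched in a few lines of Python. This is an illustration only: it assumes that the intervention variable rises in equal steps to 1 (i.e., s/10 for the incremental model and min(s, 5)/5 for the extended incremental model, where s is the number of periods a cluster has been under intervention), which is consistent with the interpretation of the estimated effects but should be checked against Panel B of Figure 5. The function name and the convention for counting s are hypothetical.

```python
def intervention_value(periods_on_intervention, model, max_periods=10, plateau=5):
    """Return the intervention variable for one cluster-period.

    periods_on_intervention: s = 0 under control; s = 1, 2, ... counts periods
        under intervention (counting convention is an assumption).
    model: 'incremental', 'extended_incremental', or 'constant'.
    max_periods / plateau: period at which the full effect is assumed reached.
    """
    s = periods_on_intervention
    if s <= 0:
        return 0.0  # control cluster-periods are coded 0 in every model
    if model == "constant":
        return 1.0  # full effect immediately and in every intervention period
    if model == "incremental":
        return min(s, max_periods) / max_periods  # assumed equal steps to 1
    if model == "extended_incremental":
        return min(s, plateau) / plateau  # reaches 1 at the plateau, then held
    raise ValueError(f"unknown model: {model}")


# Side-by-side coding for s = 0, ..., 10 (illustrative)
coding_table = [
    (
        s,
        intervention_value(s, "incremental"),
        intervention_value(s, "extended_incremental"),
        intervention_value(s, "constant"),
    )
    for s in range(0, 11)
]
for row in coding_table:
    print(row)
```

Under this coding, the intervention coefficient is interpreted as the effect at the point where the variable equals 1: the tenth intervention period for the incremental model, the fifth or later period for the extended incremental model, and any intervention period for the constant model.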
The final component of the model specification is the correlation model. Our view is that this would be pre‐specified based on a priori expectations about likely patterns and would be corroborated by observed empirical data. We note that, for the present illustration, the simulated data set was generated according to the NE correlation structure in accordance with the pre‐specified plan for analysis of the real CH data set. Importantly, in this “small” SW‐CRT, it is valuable to specify the NE correlation model with the Fisher's z‐transformation link function (see Table 3). This provides inference (e.g., p‐values) for ICC parameters with improved finite‐sample properties, together with point estimates and confidence intervals for the within‐period and between‐period ICCs that are assured to lie in the range (−1, 1) and that are obtained by transforming the correlation model parameters back to their natural ICC scale.
4.1.3. Estimation and Inference
As described above, inference for the intervention effect (as well as, if of interest, the other mean model parameters) was based on the t‐distribution with 3 (i.e., 6 − 3, the number of clusters minus the number of mean model parameters) degrees of freedom using bias‐corrected (BC) standard errors (because of the limited number of clusters), specifically using the BC1 (i.e., Kauermann‐Carroll [62]) correction. Given that the Fisher z‐transformation link function was adopted for the correlation model in this “small” sample setting with nested exchangeable correlation structure, inference for correlation model parameters was performed on the Fisher's z scale using the t‐distribution with 4 (i.e., 6 − 2, the number of clusters minus the number of correlation parameters) degrees of freedom, again using bias‐corrected standard errors, but here using BC2 for parameters of the correlation model. Importantly, motivated by their usefulness for planning future studies, we present 95% CIs for the correlation parameters rather than p‐values.
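The inference steps just described can be sketched in Python as follows. This is a minimal illustration rather than the macro's implementation: it assumes the Fisher's z link has the form z = log((1 + ICC)/(1 − ICC)), i.e., without the 1/2 scaling factor, so that the back‐transformation is tanh(z/2); the function names are hypothetical; and the t critical values are hard‐coded for the two degrees of freedom used here. Because the inputs below are rounded values, the printed intervals will not exactly match results computed from unrounded estimates.

```python
import math

# 0.975 quantiles of the t-distribution for the two df used in this analysis
T_CRIT_975 = {3: 3.182446, 4: 2.776445}


def t_interval(estimate, se, df):
    """Two-sided 95% CI: estimate +/- t_{0.975, df} * SE."""
    crit = T_CRIT_975[df]
    return estimate - crit * se, estimate + crit * se


def z_to_icc(z):
    """Invert z = log((1 + icc) / (1 - icc)); assumed link, no 1/2 factor."""
    return math.tanh(z / 2.0)


def icc_interval_from_z(z_hat, se_z, df):
    """Build the CI on the z scale, then back-transform both endpoints."""
    lower_z, upper_z = t_interval(z_hat, se_z, df)
    return z_to_icc(lower_z), z_to_icc(upper_z)


def icc_se_delta(z_hat, se_z):
    """Delta-method SE on the ICC scale: d icc / d z = (1 - icc^2) / 2."""
    icc = z_to_icc(z_hat)
    return se_z * (1.0 - icc ** 2) / 2.0


# Illustrative (rounded) inputs: a mean-model estimate with df = 3 and a
# correlation parameter on the z scale with df = 4
print(t_interval(2.9, 0.3, df=3))
print(z_to_icc(0.21), icc_interval_from_z(0.21, 0.10, df=4))
print(icc_se_delta(0.21, 0.10))
```

Back‐transforming the endpoints, rather than building a symmetric interval on the ICC scale, is what keeps the resulting interval inside the admissible range for a correlation.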
4.1.4. Data Extract and Example Code
The SAS analysis dataset was created from the individual‐level simulated data with SAS code shown in the Supporting Information section 1.2. An extract of the simulated data created in R is shown in Web Figure 2 with the full simulated data set provided in the supporting Github website. The following SAS code creates the design matrix as a dataset with 90 observations, where all control and intervention data collection cluster‐periods have a sample size of 1. Notably, it shows the creation of all three intervention variables with values shown in Panel B of Figure 5.

For the analysis dataset, the PCS outcomes of interest in three periods (1, 5, and 8) for cluster‐periods of sizes 6, 5 and 4, respectively, are shown with other analysis variables in Figure 6. Data are arranged with one row per individual in each cluster and in each time period. Taking the first cluster as an example, this cluster is under the control condition through period 5, then has an implementation phase of 2 periods (i.e., no data collection, shown as no rows of data) and finally switches to the intervention condition in period 8, as shown in Figure 6. For use in analysis via the GEEMAEE SAS macro, the variables clusters and periods denote the cluster (i.e., SNF) number and the period number, respectively. Parameters of the mean model are coded as: int for an intercept (equal to 1 for all records); period=periods to model linear time; intervention_binary for the intervention variable of model (8) under the constant intervention effect model; incre_trt for the intervention variable under the incremental effect model; and ex_incre_trt for the intervention variable under the extended incremental effects model. Note that the intervention effect variable for the incremental and extended incremental intervention effects models is defined for each cluster‐period, with the values shown in Panel B of Figure 5. Before we fit the analytic model, we need to sort the analysis SAS dataset shown in Figure 6, first by site ID and second by period, and load the SAS macro using the %include statement, as the macro was stored in an external file.
FIGURE 6.

Extract from the simulated Connect‐Home cross‐sectional SW‐CRT data set with 22 periods (months) in 6 skilled nursing facilities (SNFs).

The analysis dataset is identified by the option XYDATA in the GEEMAEE SAS macro. Of the 26 possible arguments of the GEEMAEE macro, 8 are required and these together with 6 others (for a total of 14) are specified for this analysis and are summarized in Table 4. The SAS macro GEEMAEE code for the model with linear period effects and incremental intervention effects under nested exchangeable correlation structure is:
TABLE 4.
Arguments specified for analysis of the Connect‐Home Preparedness for Caregiving Scale with linear period effect and constant intervention effect under nested exchangeable correlation structure using the SAS GEEMAEE macro.
| Argument | Example | Description |
|---|---|---|
| xydata | pcg_comp_CH_2 | Data set name |
| yvar | pcgs_integer | Outcome variable name |
| ytype | Normal | Outcome type |
| link | Identity | Link function for marginal mean model |
| xvar | Int period intervention_binary | Predictor variables for marginal mean model a |
| clusterid | Clusters | Cluster ID variable |
| periodid | Period | Period ID variable |
| corr | NE | In‐built structure for correlation model |
| corrlink | Fishersz | Link function for correlation model |
| makevone | No | Set W matrix as identity |
| makephione | No | Set dispersion parameter equal to 1 |
| alpadj | MAEE | Type of adjustment for correlation model (no adjustment [UEE], scalar [SEE], or matrix‐based [MAEE]) |
| maxiter | 100 | Maximum number of iterations; default is 50 |
| epsilon | 0.00001 | Tolerance level for convergence; default is 0.0001 |
a Intercept (1 for all records), linear time period (with period 1 as reference) and binary intervention arm indicator.
title "The GEE/MAEE analysis of caregivers' preparedness with linear period effects and incremental intervention effects under NE correlation structure";
%GEEMAEE(xydata = pcg_comp_CH_2, yvar = pcgs_integer, ytype = normal, link = identity,
         xvar = int period incre_trt, clusterID = clusters, periodID = period,
         corr = NE, corrlink = Fishersz, makevone = NO, makephione = NO,
         alpadj = MAEE, maxiter = 100, epsilon = 0.00001);
Holding fixed the macro specifications for linear period effects and nested exchangeable correlation structure, the extended incremental intervention effects and constant intervention effects models are fitted with the SAS GEEMAEE macro by substituting ex_incre_trt and intervention_binary, respectively, for incre_trt in the XVAR argument of the GEEMAEE macro call as shown in the Supporting Information section 1.3.
Before proceeding to examine results of the fitted models including estimated intervention effects, it can be informative to visualize and examine the summary outcome data. We remark that there is an overall increasing trend of the mean PCS by cluster‐period under the intervention condition (Figure 7). With those remarks as prologue, we turn now to the summarization and interpretation of the formal analysis results.
FIGURE 7.

Mean outcomes for Preparedness for Caregiving Scale (PCS) for the simulated Connect‐Home data example. Each circle in the plots represents a cluster‐period with its size proportional to its cluster size.
4.1.5. Results
Output from the analysis of the incremental intervention effects model is provided as an example of the format of output from the GEEMAEE macro in Figure 8. From this output, as noted in Section 4.1.3, we extract Kauermann‐Carroll (BC1) and Mancl‐DeRouen (BC2) SEs for the mean and correlation model parameters, respectively. Table 5 summarizes the results of the GEE/MAEE analysis for the PCS outcome for the three different intervention effect specifications, under the NE correlation structure. Based on the specified primary analysis using the incremental intervention effect model, caregivers of patients receiving the intervention in the 10th period of implementation in a given SNF were estimated to have a preparedness score that was on average 2.9 (95% CI: 1.9, 4.0) points higher than caregivers of patients in the control condition (where we reiterate that only a single SNF was considered to have implemented the intervention for a full ten periods, that is, SNF 1, see Figure 2). For the purpose of comparison, the maximal intervention effect under the extended incremental intervention effect model was estimated to be attained by caregivers in an SNF that had implemented the intervention for five or more periods, at a magnitude of 1.8 (95% CI: 0.8, 2.9) points higher than in caregivers of patients in the control condition.
FIGURE 8.

Example of SAS GEEMAEE output for analysis of simulated Connect‐Home data set under incremental effect mean model with linear time and nested exchangeable correlation model.
TABLE 5.
Analysis of Preparedness for Caregiving Scale (PCS) for the simulated Connect‐Home data set using nested exchangeable correlation structure: Comparison of three different mean models with estimation via GEE/MAEE. a
Marginal mean model (identity link)

| Parameter | Incremental: Estimate (SE) | Incremental: p value | Incremental: 95% CI | Extended incremental: Estimate (SE) | Extended incremental: p value | Extended incremental: 95% CI | Constant: Estimate (SE) | Constant: p value | Constant: 95% CI |
|---|---|---|---|---|---|---|---|---|---|
| Period intercept | 18.6 (0.12) | | (18.2, 19.0) | 18.6 (0.11) | | (18.2, 18.9) | 17.7 (0.30) | | (16.7, 18.6) |
| Period slope | 0.03 (0.01) | 0.09 | () | 0.03 (0.02) | 0.16 | () | 0.20 (0.04) | 0.02 | (0.08, 0.33) |
| Intervention b | 2.9 (0.3) | 0.003 | (1.9, 4.0) | 1.8 (0.3) | 0.011 | (0.8, 2.9) | | 0.23 | () |
| Dispersion | 2.22 | | | 2.27 | | | 2.62 | | |

Correlation model (Fisher's z scale)

| Parameter c | Incremental: Estimate (SE) | Incremental: p value | Incremental: 95% CI d | Extended incremental: Estimate (SE) | Extended incremental: p value | Extended incremental: 95% CI d | Constant: Estimate (SE) | Constant: p value | Constant: 95% CI d |
|---|---|---|---|---|---|---|---|---|---|
| Within‐period | 0.21 (0.10) | — | () | 0.24 (0.10) | — | () | 0.45 (0.19) | — | () |
| Between‐period | 0.11 (0.11) | — | () | 0.10 (0.05) | — | () | 0.29 (0.14) | — | () |

ICC parameters

| Parameter c | Incremental: Estimate (SE) e | Incremental: p value | Incremental: 95% CI | Extended incremental: Estimate (SE) e | Extended incremental: p value | Extended incremental: 95% CI | Constant: Estimate (SE) e | Constant: p value | Constant: 95% CI |
|---|---|---|---|---|---|---|---|---|---|
| Within‐period | 0.10 (0.05) | — | () | 0.12 (0.06) | — | () | 0.22 (0.09) | — | () |
| Between‐period | 0.05 (0.05) | — | () | 0.05 (0.03) | — | () | 0.14 (0.07) | — | () |

a With BC1 (i.e., Kauermann‐Carroll [62]) and BC2 (i.e., Mancl‐DeRouen [63]) SEs for the mean and correlation model parameters, respectively, and with inference (confidence intervals and p‐values) for mean model parameters via the t statistic using 3 (i.e., 6 − 3) degrees of freedom.
b Note the different interpretation of the intervention parameter for the extended incremental and constant intervention effect models. See Table 2.
c p‐values are not reported as the focus is on estimation.
d Confidence intervals for the correlation model parameters were calculated from the model results on the Fisher z‐transformation scale with 4 (i.e., 6 − 2) degrees of freedom and were then back‐transformed to the original correlation scale to obtain CIs for the within‐period and between‐period ICCs (see Table 3), where we note that implementation of the Fisher's z‐transformation link in the SAS GEEMAEE macro [38] is as outlined in Table 3, without the (1/2) scaling parameter that is sometimes used.
e SEs for the within‐period and between‐period ICCs were obtained via the delta method using the SEs of the corresponding correlation model parameters on the Fisher's z scale (details not shown).
Finally, under the constant intervention effect specification, caregivers of patients receiving the intervention in an SNF implementing the intervention for any number of periods had a preparedness score that was on average 0.5 points lower than caregivers of patients in the control condition in that period (Table 5). Although not statistically significant, this negative effect contradicts the beneficial effects estimated under both of the other two marginal mean models and contradicts the visual patterns observed in the empirical data (Figure 7), which we expected to see given that the underlying data generating mechanism for the simulated data set was based on the incremental intervention effect model with a positive, beneficial effect of the Connect‐Home intervention. This example serves to confirm the findings of Kenny et al. [57] for stepped wedge designs that, when the true underlying mean model is time‐varying (i.e., the incremental intervention effect model, in this case), the misspecified constant intervention effect model may estimate an effect in the opposite direction to the true effect.
It is perhaps not surprising that the estimated intervention effect is greatest under the incremental intervention effect model, because that model estimates the maximum intervention effect attained only after 10 intervention periods, rather than an effect averaged over all intervention periods (as in the constant effect model) or the effect under the extended incremental intervention effect model, which strikes a balance between the incremental and constant effects specifications. Regarding time effects, we note that the constant intervention effect model has the greatest estimated time effect among the three models. This, combined with the negative estimated intervention effect (which, as noted above, deviates from the assumed positive treatment effect), further highlights the problems of misspecifying the treatment effect, in that the linear time effect may then effectively confound some of the true intervention effect.
Estimated ICCs of the NE correlation model under the incremental intervention effect model are close to the true ICCs used in the data‐generating mechanism of the simulated data set (i.e., within‐period and between‐period ICCs of 0.1 and 0.05, respectively), with estimated values of 0.10 and 0.05, respectively (95% CIs in Table 5). These compare with values of 0.12 and 0.05, respectively, under the extended incremental intervention effect mean model. The ICCs under the constant intervention effect mean model are the greatest among the three intervention effect models, with an estimated within‐period ICC of 0.22 and between‐period ICC of 0.14; however, all confidence intervals are wide due to the small sample size and mostly span zero, indicating a range that includes negative pairwise correlations.
When reflecting on the modeling strategy used in the current analysis, it is important to note that the form of the mean model should be pre‐specified and based on a carefully hypothesized mechanism of action of the intervention and that, as noted above, intervention effect estimates could be considerably biased if the form is misspecified [57, 69, 70]. Given that we used a simulated data set in the current example, we are in the unusual position of knowing the correct form of the marginal mean model, and we were able to show that our selected primary analysis (i.e., the one that matched the true underlying form) did indeed estimate an incremental intervention effect that was very close to the truth. In practice, although we advocate for model pre‐specification in SW‐CRTs (e.g., the incremental intervention effect model, in this case), it is still important to undertake sensitivity analyses with other model forms in order to try to identify situations in which model misspecification may have occurred. This recommendation aligns with those of Kenny et al. [57] and Wang et al. [70]. Other useful tools, even when model forms are pre‐specified, are comparison of model selection criteria (e.g., the CIC criterion [71], which is implemented in the SAS GEEMAEE macro [38]) and comparison of model estimates to visualizations of the cluster‐specific data (e.g., see Figure 7).
For comparison, we provide the results from estimation of all three intervention effects for the PCS outcome using uncorrected estimating equations for the correlation parameters (Web Table 1). Findings show estimated intervention effects generally smaller than those from the GEE/MAEE and no strong patterns in their SEs. The main point here is that UEE is expected to underestimate the correlations. Thus, as expected, the correlation values in Table 5 are greater than the correlation values in Web Table 1. For small sample‐sized studies (in terms of number of clusters) such as Connect‐Home, it is important to mitigate the bias by MAEE, particularly if there is interest in using these ICC estimates for planning future trials.
4.2. Heart Health Now Analysis
4.2.1. Analysis Overview
In this data example, we analyze an individual‐level binary outcome (whether the patient was screened for smoking [47]) aggregated into a binomial‐like outcome using the cluster‐period mean approach. Analysis is implemented in R version 4.4.2 using the geeCRT package [36]. We note that the cluster‐period mean analysis approach [36] is not yet available for implementation in SAS. We again take account of some key features of the design. First, while HHN had a complete design, it ended up having an incomplete data structure as some clusters (medical practices) had no data recorded in some periods (Figure 3). Second, there is a maintenance window after a four‐period (one year) active intervention phase of onsite practice facilitation, designed to determine whether the benefits of practice facilitation continue after active intervention is withdrawn. Third, there are six unique treatment sequences across two strata but, in contrast to Connect‐Home, each with many clusters per sequence, ranging from 27 to 58. Finally, there are very large cluster‐period sizes (ranging from 1 to 10 948, with median [25th, 75th percentiles] of 1217 [622, 2497], see Figure 3). Clusters of this size render analysis of individual‐level data prohibitively burdensome even with access to a high‐performance computer. Observed outcome proportions by sequence‐period indicate relatively large proportions (mostly 60%) across the three phases of the study (control, active and maintenance) (Figure 9). The observed trends or curves in Figure 9 suggest that the proportion screened for smoking increases in the intervention relative to the control period in four of six sequences, whereas the opposite may be true in Sequences 2 and 3, although these two sequences appear to have smaller sample sizes judging by the relatively smaller size of the circles in the plot. The trends may seem surprisingly smooth for observed data, but recall that adjacent periods have 75% overlapping data within practices.
FIGURE 9.

Observed outcome proportions for “screened for smoking” for each sequence‐period. Each circle in the plots represents a sequence‐period with its size proportional to its sequence‐period size. Light grey lines show outcome proportions by cluster‐periods.
4.2.2. Model Specification
In the statistical analysis of the HHN data, our modeling choices pertain to the underlying individual‐level data with binary responses, which induces the cluster‐period mean model for our aggregated data structure. We specify the canonical logit link as is common for binary outcomes. With many more clusters than periods, the flexible categorical time effects structure is appealing, so we adopt that specification here; see Table 2. We then consider four models that arise from combining two different marginal mean intervention effect models (extended incremental effect and constant intervention effect) with two different correlation models (ED and NE, see Table 3, again formulated for individual‐level data). We focus on the first of these four specifications, and provide the other three as points of comparison. Note that our choice to focus on the extended incremental effects model arises specifically from the presence of a maintenance period in the design and the focus on the ED correlation structure is supported by the empirical correlation structure of the cluster‐period means (i.e., proportions) as described below in relation to the pairwise correlation at the individual observation level. Moreover, given the use of stratified randomization in this study, all marginal mean models include an indicator variable for “high readiness” (with low readiness as the reference level).
To elaborate further, we specify the individual‐level outcome model on the logit scale with categorical period effects, an intervention variable that quantifies the intervention for cluster‐periods in the active intervention or maintenance phases, its coefficient (the intervention effect), and an indicator variable for strata that equals 1 for clusters in the high readiness stratum and 0 for those in the low readiness stratum. As in Equation (2), the period index refers to the calendar period corresponding to each design timepoint for each cluster, because the HHN study uses an incomplete design and not all clusters are observed in all periods.
The corresponding cluster‐period marginal mean model is then derived from the individual‐level model as:
| (9) |
where the modeled quantity is the cluster‐period mean (in this case, the proportion “screened for smoking”) for each cluster in each period.
In the extended incremental effects model, accounting for the maintenance phase, the full intervention effect is assumed to be attained at the 4th period of active implementation and to continue throughout the maintenance period. Accordingly, the intervention variable equals 0 during all control cluster‐periods, takes increasing values over the 1st, 2nd, 3rd and 4th active intervention periods, and equals 1 during each cluster‐period in the maintenance phase. In contrast, in the constant intervention effect model, the intervention variable equals 1 in all cluster‐periods in intervention, whether active or maintenance. A comparison of the coding of the intervention variable for the constant and extended incremental intervention effect models for this analysis is shown in Figure 10.
FIGURE 10.

Coding of intervention effects for marginal model for Heart Health Now SW‐CRT; numbers in each cell correspond to coding for the extended incremental intervention effect model and constant intervention effect model, respectively.
As noted above, both the ED and NE correlation structures were adopted (Table 3), with the model specified as a function of pairwise correlations of individual observations (not cluster‐period means) from the same cluster and with a focus on the ED structure, using the NE structure for comparative purposes. Under the individual‐level ED correlation structure (Table 3), the induced correlation model for the cluster‐period mean outcomes from two different periods of the same cluster (practice) can be derived as [36]
| (10) |
where the first correlation parameter is the pairwise correlation for individuals in the same cluster during the same period (i.e., the within‐period ICC) and the second is the decay rate for measurements one period apart; that is, it is the decay in between‐period correlation over time. If the cluster‐period sizes are the same for all periods (which is not the case with the HHN study), the correlation of the cluster‐period means in Equation (10) has a scaled first‐order autoregressive (AR1) structure. Moreover, we note that, in contrast to the specification and estimation of the ED correlation structure for individual‐level data through a generalized linear model with log‐link (see Equation (3)), the within‐period ICC and the decay rate are estimated directly in the analysis of cluster‐period means. In particular, prior work by our team has shown that these two parameters can be estimated using the GEE/UEE approach directly by solving a system of equations (see (2.9) and (2.10) of Li et al. [36]); analogous equations are available for the ICCs of the NE correlation structure. Furthermore, the cluster covariance matrices of the pairwise cluster‐period means that are constructed from the pairwise correlations in Equation (10) are building blocks of the UEE or MAEE estimating equations, where the latter incorporate finite‐sample adjustments in the estimation of the correlation parameters (e.g., the within‐period ICC and decay rate for ED; see Supporting Information of Li et al. [36] for methodology of the GEE/MAEE approach). Example R code for fitting this model to cluster‐period mean data is given in the Supporting Information of the current tutorial, specifically using the geeCRT package.
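To illustrate how an individual‐level ED structure induces a correlation between cluster‐period means, the following Python sketch computes that correlation for a cross‐sectional design from first principles. It assumes that individuals in the same cluster‐period have pairwise correlation equal to the within‐period ICC and that individuals a given number of periods apart have correlation equal to the within‐period ICC multiplied by the decay rate raised to that lag; the marginal variance terms then cancel. The function name, argument names and numerical values are hypothetical, and the expression should be checked against Equation (10) and Li et al. [36] before being relied upon.

```python
import math


def induced_mean_correlation(n_j, n_l, lag, within_icc, decay):
    """Correlation between two cluster-period means of the same cluster.

    n_j, n_l   : cluster-period sizes in the two periods
    lag        : number of periods separating them (0 = same cluster-period)
    within_icc : individual-level within-period ICC (assumed symbol-free name)
    decay      : individual-level decay rate per period
    """
    if lag == 0:
        return 1.0  # a cluster-period mean is perfectly correlated with itself
    # Variance inflation of each mean relative to a single observation
    scale_j = (1 + (n_j - 1) * within_icc) / n_j
    scale_l = (1 + (n_l - 1) * within_icc) / n_l
    return within_icc * decay ** lag / math.sqrt(scale_j * scale_l)


# Hypothetical values (not estimates from the HHN analysis)
size = 1217                 # a large cluster-period size
icc, rate = 0.10, 0.90
lag1 = induced_mean_correlation(size, size, 1, icc, rate)
lag2 = induced_mean_correlation(size, size, 2, icc, rate)
print(lag1, lag2, lag2 / lag1)
```

With equal cluster‐period sizes the ratio of successive lags equals the decay rate, which is the scaled AR1 pattern noted above, and with large cluster‐period sizes the induced correlation at lag 1 approaches the decay rate itself.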
To justify preferring the ED cluster‐period correlation model over the cluster‐period NE structure, we examine the 11 × 11 empirical correlation matrix of cluster‐period means (i.e., proportions), which is given by:
Here we observe some decay in pairwise correlation with increasing distance between periods, which is more consistent with an ED structure than an NE structure. Furthermore, because the cluster‐period sizes (i.e., values of $n_{ij}$) are large, the induced pairwise correlation of cluster‐period means (i.e., proportions) in Equation (10) can be approximated by $\rho^{|j-l|}$, that is, the between‐period decay parameter $\rho$ raised to the power of the lag between periods $j$ and $l$. Linking this to the empirical data, if we average the estimated correlations for each of the different lags, we observe values comparable to those expected based on the estimated value of $\rho$ under the extended incremental intervention effect model fitted under the ED correlation structure (see Table 6).
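The large‐sample approximation can be checked numerically. The following is a minimal sketch in base R, assuming the induced correlation in Equation (10) takes the form of the within‐period ICC times the decay parameter raised to the lag, divided by the square root of the product of the two cluster‐period design‐effect terms. The function name `induced_corr`, the argument names and the cluster‐period sizes of 5 and 500 are hypothetical; the parameter values are the rounded estimates from Table 6.

```r
# Induced correlation between two cluster-period means under an
# individual-level exponential decay structure (assumed form of Equation (10)).
induced_corr <- function(alpha0, rho, lag, n_j, n_l) {
  denom <- sqrt(((1 + (n_j - 1) * alpha0) / n_j) *
                ((1 + (n_l - 1) * alpha0) / n_l))
  alpha0 * rho^lag / denom
}

alpha0 <- 0.49  # within-period ICC estimate (Table 6)
rho    <- 0.94  # between-period decay estimate (Table 6)
lags   <- 1:10  # lags available with 11 periods

# Hypothetical cluster-period sizes: small versus large
small_n <- induced_corr(alpha0, rho, lags, n_j = 5,   n_l = 5)
large_n <- induced_corr(alpha0, rho, lags, n_j = 500, n_l = 500)

# Compare both with the limiting value rho^lag
round(cbind(lag = lags, small_n, large_n, limit = rho^lags), 3)
```

With large cluster‐period sizes the induced correlation is nearly indistinguishable from the limiting value, whereas with small sizes it is noticeably scaled down, which is the scaled AR1 behavior described above.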
TABLE 6.
Heart Health Now analysis of “screened for smoking”: Logistic model parameter estimates from cluster‐period analysis via GEE/UEE under the exponential decay correlation structure, for the extended incremental and constant intervention effect models. a

| | Extended incremental intervention effect | | | Constant intervention effect | | |
|---|---|---|---|---|---|---|
| Marginal mean model (logit link) | | | | | | |
| Parameter | Estimate (SE) | p value | 95% CI | Estimate (SE) | p value | 95% CI |
| Period 1 | 0.38 (0.12) | 0.002 | (0.14, 0.62) | 0.37 (0.12) | 0.002 | (0.14, 0.61) |
| Period 2 | 0.40 (0.12) | 0.001 | (0.16, 0.63) | 0.39 (0.12) | 0.001 | (0.16, 0.63) |
| Period 3 | 0.43 (0.12) | < 0.001 | (0.19, 0.67) | 0.44 (0.12) | < 0.001 | (0.20, 0.68) |
| Period 4 | 0.46 (0.12) | < 0.001 | (0.22, 0.70) | 0.48 (0.12) | < 0.001 | (0.24, 0.72) |
| Period 5 | 0.43 (0.13) | 0.001 | (0.18, 0.67) | 0.48 (0.12) | < 0.001 | (0.24, 0.72) |
| Period 6 | 0.37 (0.13) | 0.004 | (0.12, 0.63) | 0.46 (0.12) | < 0.001 | (0.23, 0.70) |
| Period 7 | 0.32 (0.13) | 0.016 | (0.06, 0.58) | 0.46 (0.12) | < 0.001 | (0.22, 0.69) |
| Period 8 | 0.22 (0.14) | 0.109 | () | 0.38 (0.12) | 0.001 | (0.15, 0.61) |
| Period 9 | 0.16 (0.14) | 0.246 | () | 0.34 (0.12) | 0.004 | (0.11, 0.56) |
| Period 10 | 0.12 (0.14) | 0.361 | () | 0.30 (0.12) | 0.009 | (0.08, 0.53) |
| Period 11 | 0.10 (0.14) | 0.446 | () | 0.28 (0.12) | 0.015 | (0.05, 0.51) |
| High readiness | 0.08 (0.17) | 0.662 | () | 0.08 (0.17) | 0.635 | () |
| Intervention b | 0.24 (0.07) | 0.001 | (0.10, 0.39) | 0.06 (0.03) | 0.037 | (0.004, 0.13) |
| ICC parameters c | | | | | | |
| Parameter | Estimate (SE) | p value d | 95% CI | Estimate (SE) | p value d | 95% CI |
| Within‐period | 0.49 (0.02) | — | (0.44, 0.54) | 0.49 (0.02) | — | (0.44, 0.54) |
| Between‐period decay | 0.94 (0.01) | — | (0.92, 0.96) | 0.94 (0.01) | — | (0.92, 0.96) |

a With inference (confidence intervals and p values) calculated using z statistics with robust SEs.
b Note the different interpretation of the intervention parameter for the extended incremental and constant intervention effect models. See Table 2.
c Via correlation model with identity link based on cluster‐period mean analysis. See details in [36].
d Not reported as the focus is on estimation.
4.2.3. Estimation and Inference
Given the large number of clusters and observations per cluster‐period, estimation was via GEE/UEE (i.e., with no small‐sample correction to estimation of the correlation parameters), and inference for the marginal mean parameters and for the correlation parameters was based on the $z$‐distribution with robust SEs with no small‐sample correction. For comparative purposes, estimation via GEE/MAEE with bias‐corrected SEs (BC1 for parameters from the marginal mean model and BC2 for parameters of the correlation model) is provided in Supporting Information (Web Tables 2, 3, 4). All analyses of HHN were performed with the R package geeCRT.
4.2.4. Data Extract and Example Code
Data are aggregated at the cluster‐period level and are therefore arranged with one row per cluster per period (Figure 11). The variables site_id and cohort are the cluster (clinic) and sequence identifiers, respectively, while quarter and phase denote the period and intervention status, respectively. Intervention status is coded with levels of 0, 1 and 2 corresponding to control, active intervention and maintenance, respectively. We take the example of the first cluster, which is in sequence 4 (labeled as “cohort” in the data set). This cluster is under the control condition through period 3 (i.e., 2016Q2), then switches to intervention in period 4 (i.e., 2016Q3) and switches to the maintenance phase in period 8 (i.e., 2017Q3) through the final period, period 11 (i.e., 2018Q2). In this example, the aggregated data for the “screened for smoking” outcome are analyzed using the number of people in the cluster‐period for whom data are recorded (the denominator, smoking_screened_denom) and the number who were actually screened (the numerator, smoking_screened_num). For example, for clinic 1 in the first period (i.e., quarter 2015Q4) in sequence 4, most people (97.8%, i.e., smoking_screened_num divided by smoking_screened_denom) were screened during this control period.
FIGURE 11.

Extract from the Heart Health Now cross‐sectional SW‐CRT data set.
Using these data, R code is shown below to fit the primary analysis model, namely the extended incremental intervention effect model with categorical time, paired with the exponential decay correlation model, as well as the constant intervention effect model. This is implemented using the cpgeeSWD function from the geeCRT package within both the GEE/UEE and GEE/MAEE frameworks.
To implement the analytic model, we first define key arguments, namely the outcome y, design matrix X and cluster identifier, amongst other variables.
We first load relevant libraries, including geeCRT:
> library(geeCRT)
Next, read in the data set as a data frame named HHN_smoking_screened_c, with one row per cluster‐period.
Next define the cluster identifier for 217 clusters with data:
> id <- (HHN_smoking_screened_c$site_id)
the cluster‐period sizes:
> m <- HHN_smoking_screened_c$smoking_screened_denom
the outcome, namely the fraction screened per cluster‐period:
> y <- HHN_smoking_screened_c$smoking_screened_num/HHN_smoking_screened_c$smoking_screened_denom
the period identifier:
> period <- as.numeric(factor(HHN_smoking_screened_c$quarter))
and the number of observed periods (i.e., rows) per cluster:
> n <- as.numeric(tapply(period,id,length))
Then the design matrix for the constant intervention effect model is defined as follows:

In contrast, the design matrix for the incremental intervention effect model first needs a variable indicating the level of the intervention “dose” given by:

with the corresponding design matrix for the incremental intervention effect model given by:

Code for fitting either the extended incremental or the constant intervention effect model, each with categorical time and paired with the exponential decay correlation model under GEE/UEE, is given below, where X is the relevant design matrix for the marginal mean model from the two structures shown above:
> fit_eiie_ed_uee <- cpgeeSWD(y=y, X=X, id=id, m=m, corstr="exp_decay", family="binomial", epsilon=1e-8, alpadj=FALSE)
With, as an example, output given by:

Code for the extended incremental intervention effect model with categorical time, paired with the exponential decay correlation model under GEE/MAEE, is given by:
> fit_eiie_ed_maee <- cpgeeSWD(y=y, X=X, id=id, m=m, corstr="exp_decay", family="binomial", epsilon=1e-8, alpadj=TRUE)
Note that there is a single difference between the two sets of code to perform GEE/UEE and GEE/MAEE, namely the alpadj option, which is set to FALSE to implement GEE/UEE and to TRUE to implement GEE/MAEE. This is because the marginal mean and correlation models are the same, and only the estimation approach changes.
4.2.5. Results
Table 6 presents results for the GEE/UEE analysis of the cluster‐period proportions of “screened for smoking” under the ED individual level correlation model for both the extended incremental and the constant intervention effects model with comparative results under NE in Web Table 2. Note that R model output for the extended incremental intervention effects model is shown above and maps to the point estimates and standard errors presented in the first results column of Table 6.
Under the primary analysis model (extended incremental effects with ED correlation), the estimated odds of being “screened for smoking” are 1.27 (i.e., $\exp(0.24)$) times as high in the maintenance phase of the intervention (or at 4 periods of active intervention) as under the control condition, with 95% CI given by (1.11, 1.48), that is, ($\exp(0.10)$, $\exp(0.39)$). In contrast, estimated correlation parameters of the same assumed correlation model do not change (to two decimal places) with a change in the marginal mean model. For example, the within‐period correlation is estimated to be large at 0.49 (95% CI: 0.44, 0.54) and is the same whether the extended incremental intervention effects model or the constant intervention effects model is selected, with the same decay parameter estimate (0.94) under both marginal mean models. As expected, the estimated intervention effect under the constant intervention effect model is attenuated relative to the extended incremental intervention effect model. Moreover, the intervention effect estimates under NE correlation are larger than those under ED, suggesting that the ED results may be conservative. We note that estimates and inferences under MAEE are nearly identical to those under UEE, owing to the large sample sizes (Web Tables 3 and 4).
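The conversion from the logit scale to the odds ratio scale is a simple exponentiation. As a minimal sketch in base R, using the rounded intervention estimate and confidence limits for the extended incremental intervention effect model in Table 6 (the object names are illustrative):

```r
est <- 0.24           # intervention estimate on the log-odds scale (Table 6)
ci  <- c(0.10, 0.39)  # 95% confidence limits on the log-odds scale (Table 6)

or    <- exp(est)     # odds ratio
or_ci <- exp(ci)      # confidence limits for the odds ratio

round(c(OR = or, lower = or_ci[1], upper = or_ci[2]), 2)
# OR 1.27, lower 1.11, upper 1.48
```

Because the limits are exponentiated rather than recomputed on the odds ratio scale, the interval is asymmetric about the point estimate.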
4.3. Crowdsourced HIV Testing Analysis
4.3.1. Analysis Overview
In this closed‐cohort data example, we analyze the primary outcome of the trial, an individual‐level binary outcome indicating whether the participant reported having undergone HIV testing in the previous three months. In contrast to the HHN analysis, we do not aggregate data by cluster‐period but instead analyze the individual‐level binary outcome data directly with analysis performed in SAS version 9.4 using the SAS GEEMAEE macro version 2 [38]. We demonstrate use of flexible within‐cluster correlation models (Section 3.4.3), and provide code in the Supporting Information to implement the analyses in R.
We again take account of some key features of the design. First, the HIV study was designed to have complete data on the study cohort such that the outcome was measured in all individuals every three months over a one‐year period [42]. However, as expected in a cohort study, some individuals are missing outcome data in some periods even though the study design itself is considered complete since at least some data were available in all clusters in all periods. More specifically, data completeness by cluster‐period for the primary outcome of HIV testing ranges from 66.7% to 87.4% (Figure 4). Second, like the HHN study, there is a maintenance window but with the difference that, unlike the HHN study with four active intervention periods in each cluster, each sequence of the HIV study has only one 'active' intervention period (in this case of three months' duration) before switching to the maintenance condition for the remainder of the one‐year study. As such, by design, the final sequence in each stratum (sequence 4) is not observed under the maintenance period. Third, unlike the Connect‐Home and HHN studies in which all clusters were observed under the control condition, we note that a “baseline” measure is not available in the HIV study as there is no “period 0” in which all clusters and all individuals were observed under the control condition (because the two clusters in sequence 1 were first observed under the intervention condition). Fourth, there are four unique sequences, each with two clusters (cities), one each from two province strata. As such, like HHN, there are more clusters than time periods and a flexible categorical indicator model form can be used for time. Before getting into the details of model specification, we note that, in contrast to the HHN example in which we aggregated data to the cluster‐period mean, in the analyses described below we will directly analyze the individual‐level outcomes.
4.3.2. Model Specification
In alignment with the fixed components of the published GLMM analysis [42], we specify a marginal mean model with a constant intervention effect and (calendar) time period as categorical. More specifically, like the published analysis, we do not distinguish between active‐intervention and post‐intervention cluster‐periods and instead consider a two‐level indicator for treatment condition, namely control (indicator equal to 0) vs. intervention (indicator equal to 1). In other words, our statistical models treat the light blue and dark blue cluster‐periods of Figure 4 the same. Moreover, a variable for strata (a randomization variable) is included in the model to reflect the study design, as was the case for the HHN data example (see Section 4.2). Our approach to handling missing outcomes is also the same as the primary published analysis [42] in that we analyze all available data (see data completeness in Figure 4) and perform no additional adjustment for incompleteness. As such, we assume that the data are missing completely at random (MCAR). In the GEE framework utilized in the current analysis, this assumption is more restrictive than the assumptions made by the authors of the primary likelihood‐based GLMM analysis; the Discussion provides further elaboration on this point.
We initially considered two correlation models, namely the block exchangeable (BE) and general proportional decay (GPD) correlation models (see Table 3). However, the preliminary models using these correlation structures, in addition to a model with the two‐parameter proportional decay correlation (a special case of GPD), did not converge with GEE/MAEE. We then conducted a set of exploratory analyses, described in the Supporting Information, that suggested the between‐period correlation for pairs of measurements on different people ($\alpha_1$) was zero and, as such, the standard correlation models could not be estimated. Instead, we considered two alternative correlation models, each with $\alpha_1$ constrained to be 0. The first was the block exchangeable structure with the constraint $\alpha_1 = 0$, and the second is a variation on the standard correlation decay models that we refer to as the extended 3‐dependence correlation model, again with $\alpha_1 = 0$. These two alternative models flexibly specify pairwise correlation decay over time and demonstrate the use of the GEEMAEE macro with a user‐defined correlation structure. We also utilize the identity link for both correlation models, in contrast to the Connect‐Home example in which the nested exchangeable model (for the cross‐sectional counterpart) was specified with Fisher's Z‐transformation. In light of the considerations outlined above, we therefore present results from two correlation models (constrained block exchangeable and extended 3‐dependence), each paired with the marginal mean model specified as the constant intervention effect model with categorical time and adjustment for the stratification variable.
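The marginal mean model for cohort member (participant) $k$ in cluster $i$ at the $j$th timepoint ($j = 1, \ldots, 4$) is given by:

$$\mathrm{logit}(\mu_{ijk}) = \beta_j + \gamma S_i + \delta X_{ij} \tag{11}$$

where $\mu_{ijk}$ is the marginal probability of the outcome, $S_i$ is an indicator variable for strata, specifically equal to 1 for Shandong Province and 0 for Guangdong Province, $X_{ij}$ is the treatment condition indicator, $\delta$ is the intervention effect and $\beta_j$ is the period effect, that is, $\beta_1$ corresponds to period 1, $\beta_2$ corresponds to period 2, $\beta_3$ corresponds to period 3 and $\beta_4$ corresponds to period 4.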
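The constrained block exchangeable model with identity link is like that in Table 3 with $\alpha_1$ fixed equal to 0. In particular, for two distinct observations from cluster $i$, the constrained block exchangeable form is given by:

$$\mathrm{corr}(Y_{ijk}, Y_{ij'k'}) = \alpha_0 z_0 + \alpha_2 z_2 \tag{12}$$

where $z_0 = 1$ if $j = j'$ and $k \neq k'$ (0 otherwise) and $z_2 = 1$ if $k = k'$ and $j \neq j'$ (0 otherwise).
In turn, this translates to the following pairwise correlations: $\alpha_0$ for two different people at the same time point; 0 for two different people at two different time points; and $\alpha_2$ for the same person at two different time points. Next, the “user‐defined” extended 3‐dependence correlation model, again with identity link and $\alpha_1 = 0$, is given by:

$$\mathrm{corr}(Y_{ijk}, Y_{ij'k'}) = \alpha_0 z_0 + \alpha_{21} z_{21} + \alpha_{22} z_{22} + \alpha_{23} z_{23} \tag{13}$$

where $\alpha_0$ and $z_0$ are defined as in Equation (12), and indicators for time lags between an individual's observations are $z_{2l} = 1$ if $k = k'$ and $|j - j'| = l$ (0 otherwise), for $l = 1, 2, 3$,
corresponding to lags of one, two or three periods, respectively. While $\alpha_0$ is interpreted as for the constrained BE model, the intra‐person decay correlation parameters in the extended 3‐dependence correlation model are: $\alpha_{21}$ for the same person in adjacent time periods; $\alpha_{22}$ for the same person two time periods apart; and $\alpha_{23}$ for the same person three time periods apart.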
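To make the two correlation models concrete, the following base R sketch enumerates all within‐cluster observation pairs for a hypothetical cluster of two participants, each observed in four periods. It derives indicators for “same period, different people” and for “same person” at any lag and at lags of one, two and three periods, and then evaluates the implied pairwise correlations under the identity link. The object and column names are illustrative and are not those used by the GEEMAEE macro; the parameter values are the rounded estimates reported in Table 7.

```r
# Hypothetical cluster: 2 participants x 4 periods, sorted by ID then time
obs <- expand.grid(time = 1:4, ID = 1:2)
obs$obsnum <- seq_len(nrow(obs))

# All unordered pairs of observations within the cluster (8 choose 2 = 28)
idx <- t(combn(obs$obsnum, 2))
pairs <- data.frame(
  id1 = obs$ID[idx[, 1]],   id2 = obs$ID[idx[, 2]],
  t1  = obs$time[idx[, 1]], t2  = obs$time[idx[, 2]]
)
same_person <- pairs$id1 == pairs$id2
lag <- abs(pairs$t1 - pairs$t2)

pairs$z0   <- as.numeric(!same_person & lag == 0)  # same period, different people
pairs$z2   <- as.numeric(same_person)              # same person, any lag
pairs$lag1 <- as.numeric(same_person & lag == 1)   # same person, adjacent periods
pairs$lag2 <- as.numeric(same_person & lag == 2)
pairs$lag3 <- as.numeric(same_person & lag == 3)

# Implied correlations (identity link); different people in different
# periods get 0 under both models because none of the indicators is 1
pairs$corr_cbe <- 0.02 * pairs$z0 + 0.22 * pairs$z2
pairs$corr_e3d <- 0.02 * pairs$z0 +
  0.26 * pairs$lag1 + 0.18 * pairs$lag2 + 0.15 * pairs$lag3

colSums(pairs[, c("z0", "z2", "lag1", "lag2", "lag3")])
```

The three lag indicators partition the single “same person” indicator, which is why the extended 3‐dependence model can be viewed as relaxing the constrained block exchangeable model for within‐person pairs only.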
4.3.3. Estimation and Inference
Given the limited number of clusters (i.e., 8), estimation was via GEE/MAEE and inference for the marginal mean parameters and for the correlation parameters was based on the $t$‐distribution with bias‐corrected SEs (BC1 for parameters from the marginal mean model and BC2 for parameters of the correlation model). Degrees of freedom were $I - 2$, where $I$ is the number of clusters, both for parameters of the marginal mean model and for parameters of both correlation models, namely the constrained block exchangeable model and extended 3‐dependence model. As noted in Section 3.3.2, there is some debate as to the appropriate degrees of freedom and thus we use the HIV Crowdsourcing study as an example of how the GEEMAEE macro has an option for degrees of freedom equal to $I - 2$. This choice of $I - 2$ is in contrast to the Connect‐Home example, where we used $I - p$ and $I - q$ for the mean and correlation models, respectively, writing $p$ and $q$ for the numbers of mean and correlation model parameters. Because the linear period effect keeps $p$ small in the Connect‐Home analysis, the penalty for using $I - p$ rather than $I - 2$ is not as marked there as it would be for the HIV Crowdsourcing example, in which the analysis assuming categorical time leads to $p = 6$ and hence $I - p = 2$. As such, the use of $I - p$ for the HIV Crowdsourcing study may have been overly conservative.
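The practical consequence of the degrees‐of‐freedom choice can be seen from the critical values of the $t$‐distribution. The base R sketch below compares the 97.5th percentiles for the HIV testing study with 8 clusters, assuming that the alternative to $I - 2$ is $I - p$ with $p = 6$ mean model parameters (four periods, stratum and intervention, as listed in Table 7). The object names are illustrative, and the SE of 0.14 is the rounded value for the intervention effect in Table 7.

```r
I <- 8  # number of clusters in the HIV testing study
p <- 6  # mean model parameters: four periods, stratum, intervention

crit <- c(
  z           = qnorm(0.975),            # no small-sample adjustment
  t_I_minus_2 = qt(0.975, df = I - 2),   # df used in this analysis
  t_I_minus_p = qt(0.975, df = I - p)    # assumed alternative df
)
round(crit, 3)
# z 1.960, t_I_minus_2 2.447, t_I_minus_p 4.303

# Half-width of a 95% CI for the intervention log-odds ratio (SE = 0.14)
round(crit * 0.14, 2)
```

With only two degrees of freedom, the interval half‐width is roughly 1.8 times that obtained with six, which illustrates why $I - p$ may be overly conservative when time is modeled categorically.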
4.3.4. Data Extract, Preliminary Analysis and Example Code
The analysis dataset, with extract shown in Figure 12, has a similar structure to the source data in Web Figure 4, with the notable exception that the analysis data are sorted (in order) by clusternum, ID and time. In particular, Figure 12 shows that there are 607 observations from 203 participants in cluster 1 (from the SAS data set named hivtest2). Cohort studies must have an individual participant identifier, and in the HIV testing study individuals identified by ID have from one to four observations according to the number of periods in which they are observed. To handle time as a categorical variable with nominal scale as in Equation (2), period indicator variables period1, …, period4 have been created for inclusion in the marginal mean model along with the treatment condition indicator for intervention and the stratum indicator variable for Shandong province (Figure 12). A derived variable obsnum assigns consecutive positive integers, starting from 1, to the observations in each cluster, and its use will be discussed below for models with user‐defined correlation structures.
FIGURE 12.

Extract of the Crowdsourced HIV Testing closed‐cohort SW‐CRT data set.
Initially, we attempted to fit a model with the BE correlation structure using the option Corr=BE built into the GEEMAEE macro. Specifically, the variables clusternum, ID and time are identifiers required in the dataset (Figure 12) specified with the xydata= argument for the macro to internally construct the correlation model dataset for within‐cluster observation pairs with variables z0, z1 and z2. For illustration, the data extract shown in Figure 13 lists the observation pairs with explanatory variables that can be constructed from the 10 observations of the first three individuals in the first cluster. The pairwise correlation model dataset provides the design matrix of explanatory variables for all possible within‐cluster pairs of observations. The macro code is:

FIGURE 13.

Extract of design factors for the BE pairwise correlation model for the Crowdsourced HIV Testing closed‐cohort.
Note that the required argument makevone selects the form of the weighting matrix in the estimating equations in Equation (6). Usually, one specifies makevone=NO to define a non‐identity weighting matrix for the Pearson residual cross‐products as described in Section 3.2.2. A simpler estimating procedure is specified by makevone=YES, which sets the diagonal elements of this weighting matrix to 1. Finally, while the default number of iterations is 50, the user can choose their own value with the option maxiter. Unfortunately, the GEE/MAEE estimating procedure for the model with BE correlation failed to converge, even with 100 attempted iterations. Nor could we achieve convergence when we set makevone=YES.
Given that the specified model did not converge for the BE (or GPD) correlation structures (Table 3), we conducted exploratory analyses to help identify alternative user‐defined correlation structures that we could consider. These analyses (described in Supporting Information section 3.2) suggested that $\alpha_1 = 0$ in the BE correlation structure. Thus, we proceeded to create the dataset of within‐cluster observation pairs whose extract is shown in Figure 13, while understanding that z1 would be excluded from the correlation model specification in order to invoke the constrained BE correlation structure. Importantly, although the variables obsnum, time and ID are not provided to the GEEMAEE macro for user‐defined correlation structures, they are used to create a second dataset of explanatory variables for Equation (3), the model for the correlation parameters.
The SAS code below produces dataset zd for fitting the BE correlation structure with a user‐defined Z‐matrix; alternatively, this structure can be specified with the macro option corr, as BE is a built‐in structure in GEEMAEE. Here we specify the constrained block exchangeable correlation structure with a user‐defined matrix that omits the column with explanatory variable z1 from the Z‐matrix for the BE correlation model. The following data step creates a dataset named a and produces the within‐cluster observation identifier obsnum. Optionally, it also creates the within‐cluster participant indicator personcnt.

Next, we use SAS PROC SQL to create from dataset a a new dataset zd of within‐cluster observation pairs whose extract is shown in Figure 13. The outcomes for the observation pairs, hivt1 and hivt2, are created in the following code to help identify the pairs, but these variables are not passed to the macro, nor are they shown in Figure 13:

Note that the first cluster (Guangzhou) has 607 participant‐time observations (from 203 participants) in the marginal mean design matrix and contributes 607 × 606/2 = 183,921 observation pairs to the Z‐matrix. Across all clusters, the mean model dataset named a contains the observations from 1219 participants, and the full correlation model dataset named zd contains all within‐cluster observation pairs. While this case study is used to illustrate the fitting of marginal models with user‐defined correlation structures with the SAS macro GEEMAEE, one can see that larger cluster sizes cause zd to be large. In such scenarios, data aggregation and cluster‐period GEE/MAEE are an attractive estimation option, subject to the limitations of software with respect to correlation model specification.
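The size of the pairwise dataset grows quadratically in the number of observations per cluster, since a cluster with a given number of observations contributes every unordered pair of them. A short base R sketch of this arithmetic follows; the cluster sizes other than 607 are hypothetical and chosen only to show the growth.

```r
# Number of within-cluster observation pairs for the first cluster
n_obs   <- 607
n_pairs <- choose(n_obs, 2)  # n_obs * (n_obs - 1) / 2
n_pairs
# 183921

# Growth in the number of pairs for some hypothetical cluster sizes
sizes <- c(100, 607, 1000, 5000)
data.frame(n_obs = sizes, n_pairs = choose(sizes, 2))
```

A cluster with 5000 observations would already contribute more than 12 million pairs, which motivates aggregation to cluster‐period means when cluster sizes are large.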
In the macro call for the constant intervention effects marginal mean model with categorical period effects combined with the constrained block exchangeable correlation structure, the two datasets are passed to the macro by xydata=a and Zdata=zd. The arguments xvar= and zvar= specify the respective explanatory variables for the mean and correlation models. Arguments link and corrlink specify their corresponding link functions, while yvar= and ytype= specify the response variable and its type (i.e., binary), respectively. Option zpair= is required for user‐specified Z‐matrices to identify the observation pair (e.g., zpair1 and zpair2) in Figure 13 with respect to the original observations in Figure 12. The option df_choice=2 specifies $I - 2$ as df for both mean and correlation model parameters. Because the number of clusters $I$ is small, option alphadj=MAEE is specified for matrix‐adjusted estimating equations bias corrections to correlation model parameter estimates.

The macro option ESTOUT produces an output dataset for the estimated mean and correlation model parameters, whereas options VAR_BETA and VAR_ALPHA produce output datasets for their respective estimated covariance matrices, including BC0, BC1 and BC2 estimators (Section 3.3.1). As illustrated in the Connect‐Home case study, the default output of macro GEEMAEE includes standard errors, test statistics and corresponding p values for tests of the mean and correlation model regression coefficients, while their confidence intervals could be calculated manually from the output. Alternatively, Section 3.3 of the Supporting Information provides SAS code to calculate CIs based on $t$ distributions with the desired degrees of freedom using the output datasets from the GEEMAEE macro. The next SAS GEEMAEE macro call is similar to the last except that it specifies the extended 3‐dependence correlation model. Supporting Information section 3.4 provides SAS code to create dataset zd for this model.

4.3.5. Results
Table 7 presents results for the GEE/MAEE analysis of the individual‐level outcome of “tested for HIV in the past three months” under the two assumed individual‐level correlation models described above. We see no sensitivity of the estimated intervention effect to the choice of correlation model, with the estimated odds ratio for intervention vs. control conditions for the outcome of being “tested for HIV” given by 1.31 (95% CI: 0.94, 1.82) (i.e., $\exp(0.27)$; p value = 0.09) for both the constrained block exchangeable and extended 3‐dependence correlation models. Interestingly, in both models, the coefficients corresponding to time increase over the periods, which suggests an underlying increase in HIV testing rates over time. The two correlation models provide estimated within‐period correlations of the same magnitude (0.02). For the components that differ between the two forms we see, as expected, that the single “any lag” within‐person correlation from the constrained block exchangeable model (0.22) lies in the middle of the three distinct estimates obtained under the extended 3‐dependence correlation model (0.26, 0.18 and 0.15 for lag‐1, lag‐2 and lag‐3, respectively).
TABLE 7.
Crowdsourced HIV Testing analysis with outcome of “HIV testing”: Logistic model parameter estimates from individual‐level analysis via GEE/MAEE with constant intervention effect model with categorical time under constrained block exchangeable and user‐defined extended 3‐dependence correlation structures. a

| | Constrained block exchangeable correlation | | | Extended 3‐dependence correlation | | |
|---|---|---|---|---|---|---|
| Marginal mean model (logit link) | | | | | | |
| Variable | Estimate (SE) | 95% CI | p value | Estimate (SE) | 95% CI | p value |
| Period 1 | | () | | | () | |
| Period 2 | | () | | | () | |
| Period 3 | | () | | | () | |
| Period 4 | | () | 0.005 | | () | 0.005 |
| Shandong | | () | 0.991 | | () | 0.997 |
| Intervention | 0.27 (0.14) | () | 0.089 | 0.27 (0.14) | () | 0.094 |
| Within‐cluster association model | | | | | | |
| Within‐cluster pair | Estimate (SE) | 95% CI | p value b | Estimate (SE) | 95% CI | p value b |
| Within‐period | 0.02 (0.01) | () | — | 0.02 (0.01) | () | — |
| Different periods & persons | Assumed 0 | — | — | Assumed 0 | — | — |
| Same person, any lag | 0.22 (0.03) | (0.16, 0.28) | — | — | — | — |
| Same person, lag‐1 | — | — | — | 0.26 (0.04) | (0.17, 0.35) | — |
| Same person, lag‐2 | — | — | — | 0.18 (0.02) | (0.13, 0.23) | — |
| Same person, lag‐3 | — | — | — | 0.15 (0.02) | (0.11, 0.20) | — |

a With inference (confidence intervals and p values) calculated using the t statistic with bias‐corrected robust SEs (BC1 and BC2 for mean and correlation parameters, respectively) and I − 2 degrees of freedom for both.
b p values for correlation parameters not reported as the focus is on estimation.
5. Discussion
This article brings the benefits of the GEE/MAEE [19] population‐average modeling approach to the SW‐CRT community in an applications‐oriented, tutorial‐style format. In particular, the tutorial has illustrated features of this approach for the analysis of both complete and incomplete SW‐CRTs with cross‐sectional or cohort data. These features included diversity across: (1) Response types (continuous and binary); (2) mean model specification, including the time (period) component and the treatment component (constant, incremental and extended incremental effects intervention model); (3) correlation structures (nested exchangeable, exponential decay and block exchangeable); and (4) small‐sample corrections to SEs of both mean and correlation parameters to account for small numbers of clusters. Moreover, the application to the Connect‐Home SW‐CRT sought to illustrate two key benefits of the paired GEE/MAEE methodology over “standard” GEE, that is, UEE. First, that more valid estimates of correlation parameters can be obtained via directly modeling them and addressing potential small‐sample bias in the parameter estimators themselves (i.e., using MAEE), and, second, that small‐sample corrections can be used to improve sampling variance estimators for parameters of both the mean and the correlation models. Our team's work has shown that the combination of MAEE estimation for the correlation parameters with small‐sample corrections to the empirical standard errors yields superior statistical performance when the goal is testing and estimating both intervention and correlation parameters in SW‐CRTs [19, 26, 36]. The Connect‐Home analysis illustrated that, as expected, MAEE estimation of correlation model parameters is different to that obtained from UEE and that, in a situation with many more time periods than clusters (22 vs. 6, in this example), the marginal mean model can be fitted within the GEE/MAEE framework.
The Heart Health Now analysis illustrated that use of a cluster‐period GEE/UEE analysis approach based on cluster‐period summary statistics can offer considerable computational savings compared to analysis of individual‐level data when cluster sizes are large. The analysis demonstrated that different inferences may be reached using different mean model specifications. Because of the large sample size, results based on GEE/MAEE (i.e., with finite‐sample bias corrections to correlation parameters) were similar to GEE/UEE results. Li et al. [36] provide an example of cluster‐period GEE/MAEE analysis for a SW‐CRT with 22 clusters where bias‐correction is necessary. Moreover, in the analysis of binary outcomes from a parallel‐arm CRT of 10 hospitals (clusters) conducted in Malawi, Pence et al. [72] used the geeCRT package for estimating differences in sub‐cluster (medical provider) proportions calculated from three‐level binary data using an identity link and underlying individual level NE correlation structure; the authors used the GEE/MAEE methods implemented in geeCRT to account for large cluster sizes of the individual (patient) level data and to make finite‐sample adjustments given the small number of clusters. Their work illustrates the application of GEE/MAEE methodology for cluster‐period means directly to the cluster‐subcluster means in a CRT having a single timepoint where, in the NE correlation structure, subclusters take the place of periods.
The forms used for the mean model in this tutorial were pre‐selected from a class of six forms that result from all combinations of three forms for the intervention effect (i.e., constant, incremental and extended incremental) and two forms for the time effect (i.e., categorical and linear time); see Table 2. Although these six forms could be considered too restrictive, they are offered as illustrations. A larger class would accommodate more flexible forms for time (e.g., spline‐based forms or other piecewise forms with, say, a shift in mean control condition outcomes either upwards or downwards as a result of COVID‐19 disruptions, as was implemented in the actual Connect‐Home trial [45]). To this end, we have offered pointers to guide investigators in both the general narrative and the case studies. In fact, it is of key importance to consider a larger class of mean model specifications, given that recent work by Kenny et al. [57], Maleyeff et al. [69] and Wang et al. [70] has demonstrated that biased intervention effect estimation may arise if a constant intervention effect is specified when the true effect is time‐varying (e.g., if the intervention requires some time to achieve its full effect), although this issue is not unique to marginal models or GEE estimation procedures. To accommodate a broader range of models, our GEEMAEE macro [38] allows general specification of the design matrix, as with any other GEE implementation, along with flexible, user‐defined specification of the correlation model.
Relatedly, the correlation structure was also pre‐selected from a class of structures, specifically from one of five forms (i.e., exchangeable, nested exchangeable, exponential decay, block exchangeable, and proportional decay). We did not fit the exchangeable structure to any of the three key data sets, as such a form is expected to be too constrained in contrast to the other four forms, which allow for smaller pairwise correlations between observations taken further apart in time. Moreover, the exchangeable correlation structure could lead to under‐powered studies when used in the design phase [73] if, for example, a correlation structure with decay is the true structure [59]. Indeed, correlation decay was illustrated with use of nested exchangeable correlation for both cross‐sectional studies, that is, Connect‐Home and Heart Health Now, with exponential decay correlation also illustrated for the latter. In contrast, flexible correlation modeling with specification of the Z‐matrix was demonstrated for the HIV testing cohort example, after attempts at standard use of the block exchangeable and proportional decay structures resulted in non‐convergence of the GEE/MAEE estimation algorithm. Estimation problems persisted with the three‐parameter GPD correlation structure. Considering that simulation studies have found greater convergence issues with ED and PD than with NE and BE [26], we reconsidered a simplified version of the three‐parameter BE structure for the HIV testing cohort data. This followed use of descriptive statistics for the within‐period ICC and the within‐person/different‐periods ICC, including one‐way ANOVA‐based ICC estimates that were non‐negligibly positive. This led to the successful application (a solution with convergence) of GEE/MAEE with a constrained BE correlation structure that assumed a null different‐persons/different‐periods ICC, that is, one fixed at zero.
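To make the differences among these structures concrete, the following Python sketch builds the within-cluster correlation matrix for one cluster under the nested exchangeable, exponential decay, block exchangeable and proportional decay forms, using their standard definitions. The parameter names `a0`, `a1`, `a2` and `rho` are illustrative labels chosen here and are not taken from the tutorial's notation. The code is not part of GEEMAEE or geeCRT.

```python
def correlation_matrix(n_periods, n_people, pair_corr):
    """Within-cluster correlation matrix, observations ordered by period.

    `pair_corr(same_person, lag)` returns the correlation for a pair of
    distinct observations, where `lag` is the absolute period difference.
    In cross-sectional designs the person index is just a position within
    a period, so structures for those designs ignore `same_person`.
    """
    index = [(j, i) for j in range(n_periods) for i in range(n_people)]
    return [
        [1.0 if (j, i) == (l, k) else pair_corr(i == k, abs(j - l))
         for (l, k) in index]
        for (j, i) in index
    ]


def nested_exchangeable(a0, a1):
    # a0: within-period correlation; a1: between-period correlation.
    return lambda same_person, lag: a0 if lag == 0 else a1


def exponential_decay(a0, rho):
    # Within-period correlation a0, decaying by a factor rho per period of lag.
    return lambda same_person, lag: a0 * rho ** lag


def block_exchangeable(a0, a1, a2):
    # Cohort design: a2 applies to repeated measures on the same person.
    def corr(same_person, lag):
        if same_person:
            return a2
        return a0 if lag == 0 else a1
    return corr


def proportional_decay(a0, rho):
    # Cohort design: same-person correlation rho**lag, others a0 * rho**lag.
    return lambda same_person, lag: (1.0 if same_person else a0) * rho ** lag


# Two periods, two people per period; parameter values are arbitrary.
for row in correlation_matrix(2, 2, block_exchangeable(0.10, 0.05, 0.40)):
    print(row)
```

Setting the second argument of `block_exchangeable` to zero gives a constrained form of the kind described above, in which observations on different persons in different periods are uncorrelated.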
Whilst specification of the Z‐matrix to define a generalized linear model in Equation (3) for the correlation structure offers increased flexibility in correlation modeling, it also requires greater effort and knowledge in data preparation on the part of the analyst.
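The data preparation involved can be sketched as follows. The Python example enumerates all observation pairs within one cluster of a cohort design and attaches indicator covariates for a block exchangeable structure. It assumes an identity link, so that each pairwise correlation equals the inner product of its row with the correlation parameter vector. The column order, field names and the `constrained` option are hypothetical, and the layout does not reproduce the input format expected by the GEEMAEE macro.

```python
from itertools import combinations


def pairwise_z_rows(observations, constrained=False):
    """Enumerate within-cluster observation pairs with correlation covariates.

    `observations` is a list of (person_id, period) tuples for one cluster.
    Each row carries indicators, in this order:
      [same period / different person,
       different period / different person,
       same person / different period].
    With `constrained=True` the middle column is dropped, which fixes the
    different-person/different-period correlation at zero.
    """
    rows = []
    for (j, (person_j, period_j)), (k, (person_k, period_k)) in combinations(
        enumerate(observations), 2
    ):
        same_person = person_j == person_k
        same_period = period_j == period_k
        z = [
            int(same_period and not same_person),
            int(not same_period and not same_person),
            int(same_person and not same_period),
        ]
        if constrained:
            z = [z[0], z[2]]
        rows.append({"obs_j": j, "obs_k": k, "z": z})
    return rows


# Two people followed over two periods (toy cluster).
cluster = [(1, 0), (2, 0), (1, 1), (2, 1)]
for row in pairwise_z_rows(cluster, constrained=True):
    print(row)
```

A cluster with m observations contributes m(m − 1)/2 rows, which is why this second data set can become large and why its construction deserves care.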
Many extensions to the class of correlation structures could be considered, including those which account for different correlations for treatment conditions [74] or which accommodate additional layers of nesting (e.g., repeated measures in the case of a cohort SW‐CRT, or additional sub‐clustering [66] in the case of, say, patients nested in providers nested in health facilities nested in regions [53]). Additional questions arise as to what extent both the mean and correlation structures should be prespecified, and what room there is for model selection. While we strongly adhere to the recommendation to pre‐specify the form of the intervention effect based on the hypothesized mechanism by which the intervention would achieve its full effect and whether or not a maintenance period is present, it is valuable to consider whether some form of model selection could be adopted to achieve the best model form for time and for the correlation structure through use of data that does not include the intervention condition. The correlation information criterion (CIC) is a good choice for selection amongst correlation structures [36, 71, 75], with implementation directly available in the GEEMAEE macro in SAS [38]. In particular, see the SAS model output in Figure 8 with CIC value of 4.8754. For an example of a direct comparison amongst different correlation structures using the CIC, see Table 4 of Zhang et al. [38]. When selecting amongst mean models, alternatives to the CIC are needed. For nested forms of the mean model with the same working correlation structure, hypothesis testing can be used to compare such nested models (just as it can be for nested GLMM models). In contrast, for non‐nested mean models, absolute goodness‐of‐fit tests are not available and the information‐based QIC criterion of Pan is a good choice, with a simulation study supporting its use in marginal logistic regression [76].
Alternatively, a generalized version of Mallows's Cp was shown to perform well relative to variable selection based on Wald and score tests [77]. Nevertheless, if some form of model selection is used, more research is needed to determine the most appropriate criterion to use.
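For readers who wish to see how the CIC is computed, the sketch below follows the definition of Hin and Wang [71]: the trace of the product of the model-based information matrix under working independence and the robust (sandwich) covariance matrix obtained under the candidate working correlation, with smaller values preferred. The Python function names and the toy matrices are hypothetical, and the code is not taken from the GEEMAEE macro.

```python
def trace_of_product(a, b):
    """Trace of the matrix product a @ b for square matrices given as lists."""
    size = len(a)
    return sum(a[i][k] * b[k][i] for i in range(size) for k in range(size))


def cic(omega_independence, robust_cov):
    """Correlation information criterion: trace(Omega_I @ V_R).

    `omega_independence` is the model-based information matrix (inverse
    model-based covariance) computed under working independence, evaluated
    at the estimates from the candidate structure; `robust_cov` is the
    sandwich covariance matrix under that candidate structure.
    """
    return trace_of_product(omega_independence, robust_cov)


def select_by_cic(omega_independence, robust_cov_by_structure):
    """Return the candidate structure with the smallest CIC and all values."""
    values = {
        name: cic(omega_independence, cov)
        for name, cov in robust_cov_by_structure.items()
    }
    return min(values, key=values.get), values


# Made-up 2 x 2 matrices, for illustration only. For simplicity a single
# information matrix is shared by both candidates here.
omega = [[2.0, 0.0], [0.0, 4.0]]
candidates = {
    "NE": [[0.50, 0.10], [0.10, 0.25]],
    "ED": [[0.40, 0.05], [0.05, 0.25]],
}
best, values = select_by_cic(omega, candidates)
print(best, values)
```

Pan's QIC adds a quasi-likelihood term to twice this trace, which is why the CIC alone is used when only the correlation structure differs between candidates.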
Importantly, the methodology and many of the analytic considerations described in this tutorial are also applicable to other multi‐period designs, including multi‐period parallel‐arm CRTs and crossover CRTs. Whilst, unlike SW‐CRTs, there should be no confounding effect of time in either of those designs, the need to consider multi‐parameter correlation models and the challenges of handling small‐sample bias apply to many such multi‐period CRTs and, as such, the methods presented in the current tutorial are much more broadly applicable than the setting of SW‐CRTs.
There are limitations to this tutorial and the marginal model analysis approaches it presented, in addition to the aforementioned potential for convergence issues. We note that implementation of the cluster‐period methodology described in Li et al. [36] in the geeCRT package is currently limited to binary outcomes and three correlation structures (EX, NE and ED). Future research could extend the applications to allow for non‐binary outcomes and additional correlation structures to accommodate within‐individual correlation decay in cohort designs, for example. In contrast, the individual‐level GEE/MAEE methodology as implemented in the SAS GEEMAEE macro (see Section 4.1) is very general, as the user can define any pair of generalized linear models for marginal mean and correlation structures of the form given in Equations (1) and (3), including functional forms that, for example, depend on individual‐level or cluster‐level covariates. Importantly, we note that, if the user does not use a pre‐packaged correlation structure in the SAS macro GEEMAEE, such as NE or ED, they would have to define their own Z‐matrix through provision of a second data set for the correlation model consisting of all observation pairs within clusters. Finally, this tutorial did not cover model diagnostics for isolated departures from model assumptions. However, the SAS GEEMAEE macro includes observation‐, cluster‐period‐, and cluster‐deletion diagnostics for assessing the influence of these types of data elements on the estimation of regression coefficients in both the marginal mean and correlation models, which have been illustrated elsewhere [38] using a marginal model for a binary outcome from the real Connect‐Home data.
In summary, marginal modeling of SW‐CRT data provides intervention effect estimates with a population‐averaged interpretation that is often of interest in public health. Adopting the paired GEE/MAEE approach, along with improved SE estimation (BC1 and BC2), provides superior statistical performance for testing and estimation of intervention effect and correlation parameters in SW‐CRTs. These more advanced procedures, which are tailored to the challenges of SW‐CRTs, are readily available in statistical software, and we provide case studies, together with sample code, to illustrate their use in practice.
Funding
This work was supported by the Patient‐Centered Outcomes Research Institute (Grant No. ME‐2019C1‐16196) and the National Institutes of Health (Grant Nos. R01DC020026, U01DC021719, KL2TR002490).
Conflicts of Interest
The authors declare no conflicts of interest.
Supporting information
Data S1: Supporting Information.
Acknowledgments
Research in this article was partially funded through a Patient‐Centered Outcomes Research Institute (PCORI Award ME‐2019C1‐16196) and through two awards from the National Institutes of Health (R01DC020026 and U01DC021719). The statements presented in this article are solely the responsibility of the authors and do not necessarily represent the views of the National Institutes of Health, or PCORI, or its Board of Governors or Methodology Committee. Dr. Preisser has received stipends for service on a data and safety monitoring board and as a merit reviewer from PCORI. Dr. Preisser did not serve on the Merit Review panel that reviewed his project. Dr. Preisser also received funding support for this project from the National Center for Advancing Translational Sciences, National Institutes of Health, through Grant KL2TR002490. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH. Dr. Turner has received stipends for service as a member of the Clinical Trials Advisory Panel of PCORI. The Connect‐Home study was funded by the National Institute of Nursing Research, grant number: 1R01NR017636‐01. The HHN study was sponsored by the Agency for Healthcare Research and Quality (AHRQ) as part of the EvidenceNow Program, grant number: 5R18HS023912, with research administration provided by the Cecil G. Sheps Center for Health Services Research at The University of North Carolina at Chapel Hill. The authors wish to thank Stephanie Pierson and Brian Cass of the Sheps Center for their support in creating the HHN data set provided with the current tutorial. The authors wish to thank Dr. Haidong Lu of Yale University for support in deriving the primary outcome for the HIV crowdsourcing data example, which was performed using the publicly available data set provided with the published outcomes paper [42].
Data Availability Statement
All data and source code are available via GitHub at https://github.com/XueqiWang/SW‐CRT_tutorial. Specifically, the repository provides source code to reproduce the simulated data and to generate the results in the Connect‐Home Case Study 4.1 using the SAS GEEMAEE macro, source code and the aggregated data set to generate the results in the Heart Health Now Case Study 4.2 using the geeCRT R package, and source code and the derived data set to generate the results for the HIV Crowdsourcing Case Study 4.3, 5 using the SAS GEEMAEE macro with user‐supplied pairwise correlation values.
References
- 1. Turner E. L., Li F., Gallis J. A., Prague M., and Murray D. M., “Review of Recent Methodological Developments in Group‐Randomized Trials: Part 1—Design,” American Journal of Public Health 107, no. 6 (2017): 907–915.
- 2. Turner E. L., Prague M., Gallis J. A., Li F., and Murray D. M., “Review of Recent Methodological Developments in Group‐Randomized Trials: Part 2—Analysis,” American Journal of Public Health 107, no. 7 (2017): 1078–1086.
- 3. Donner A. and Klar N., Design and Analysis of Cluster Randomization Trials in Health Research, vol. 27 (Arnold London, 2000).
- 4. Murray D. M., Design and Analysis of Group‐Randomized Trials, vol. 29 (Oxford University Press, USA, 1998).
- 5. Hayes R. and Moulton L., Cluster Randomised Trials (CRC Press, 2009).
- 6. Hemming K. and Taljaard M., “Reflection on Modern Methods: When Is a Stepped‐Wedge Cluster Randomized Trial a Good Study Design Choice?,” International Journal of Epidemiology 49, no. 3 (2020): 1043–1052, 10.1093/ije/dyaa077.
- 7. Hussey M. and Hughes J., “Design and Analysis of Stepped Wedge Cluster Randomized Trials,” Contemporary Clinical Trials 28, no. 2 (2007): 182–191, 10.1016/j.cct.2006.05.007.
- 8. Hemming K., Haines T., Chilton P., Girling A., and Lilford R., “The Stepped Wedge Cluster Randomised Trial: Rationale, Design, Analysis, and Reporting,” British Medical Journal 350 (2015): h391, 10.1136/bmj.h391.
- 9. Hemming K., Lilford R., and Girling A., “Stepped‐Wedge Cluster Randomised Controlled Trials: A Generic Framework Including Parallel and Multiple‐Level Designs,” Statistics in Medicine 34, no. 2 (2015): 181–196.
- 10. Li F., Hughes J. P., Hemming K., Taljaard M., Melnick E. R., and Heagerty P. J., “Mixed‐Effects Models for the Design and Analysis of Stepped Wedge Cluster Randomized Trials: An Overview,” Statistical Methods in Medical Research 30, no. 2 (2021): 612–639.
- 11. Li F. and Wang R., “Stepped Wedge Cluster Randomized Trials: A Methodological Overview,” World Neurosurgery 161 (2022): 323–330.
- 12. Zeger S. L., Liang K. Y., and Albert P. S., “Models for Longitudinal Data: A Generalized Estimating Equation Approach,” Biometrics 44 (1988): 1049–1060.
- 13. Ritz J. and Spiegelman D., “Equivalence of Conditional and Marginal Regression Models for Clustered and Longitudinal Data,” Statistical Methods in Medical Research 13 (2004): 309–323.
- 14. Young M., Preisser J., Qaqish B., and Wolfson M., “Comparison of Subject‐Specific and Population Averaged Models for Count Data From Cluster‐Unit Intervention Trials,” Statistical Methods in Medical Research 16 (2007): 167–184.
- 15. Hemming K., Carroll K., Thompson J., et al., “Quality of Stepped‐Wedge Trial Reporting Can Be Reliably Assessed Using an Updated CONSORT: Crowd‐Sourcing Systematic Review,” Journal of Clinical Epidemiology 107 (2019): 77–88.
- 16. Preisser J. S., Young M. L., Zaccaro D. J., and Wolfson M., “An Integrated Population‐Averaged Approach to the Design, Analysis and Sample Size Determination of Cluster‐Unit Trials,” Statistics in Medicine 22, no. 8 (2003): 1235–1254.
- 17. Liang K. and Zeger S., “Longitudinal Data Analysis Using Generalized Linear Models,” Biometrika 73, no. 1 (1986): 13–22.
- 18. Scott J. M., deCamp A., Juraska M., Fay M. P., and Gilbert P. B., “Finite‐Sample Corrected Generalized Estimating Equation of Population Average Treatment Effects in Stepped Wedge Cluster Randomized Trials,” Statistical Methods in Medical Research 26, no. 2 (2017): 583–597.
- 19. Preisser J., Lu B., and Qaqish B., “Finite Sample Adjustments in Estimating Equations and Covariance Estimators for Intracluster Correlations,” Statistics in Medicine 27, no. 27 (2008): 5764–5785.
- 20. Li F., Turner E. L., and Preisser J. S., “Sample Size Determination for GEE Analyses of Stepped Wedge Cluster Randomized Trials,” Biometrics 74, no. 4 (2018): 1450–1458.
- 21. Shults J., Sun W., Tu X., and Amsterdam J., “On the Violation of Bounds for the Correlation in Generalized Estimating Equation Analyses of Binary Data From Longitudinal Trials,” 2006. The Berkeley Electronic Press, Working Paper 46, http://biostatsresearch.com/upennbiostat/papers/art8.
- 22. Preisser J. and Qaqish B., “A Comparison of Methods for Simulating Correlated Binary Variables With Specified Marginal Means and Correlations,” Journal of Statistical Computation and Simulation 84, no. 11 (2014): 2441–2452.
- 23. Rathouz P. and Preisser J., “Missing Data: Weighting and Imputation,” in Encyclopedia of Health Economics, Volume 2: Third Edition, Revised and Expanded, ed. Michael Jones A. (Elsevier, Inc., 2014).
- 24. Preisser J., Lohman K., and Rathouz P., “Performance of Weighted Estimating Equations in Longitudinal Studies With Dropouts Missing at Random,” Statistics in Medicine 21, no. 20 (2002): 3035–3054.
- 25. Turner E. L., Yao L., Li F., and Prague M., “Properties and Pitfalls of Weighting as an Alternative to Multilevel Multiple Imputation in Cluster Randomized Trials With Missing Binary Outcomes Under Covariate‐Dependent Missingness,” Statistical Methods in Medical Research 29, no. 5 (2020): 1338–1353.
- 26. Zhang Y., Preisser J. S., Turner E. L., Rathouz P. J., Toles M., and Li F., “A General Method for Calculating Power for GEE Analysis of Complete and Incomplete Stepped Wedge Cluster Randomized Trials,” Statistical Methods in Medical Research 32 (2023): 71–84.
- 27. Rochon J., “Application of GEE Procedures for Sample Size Calculations in Repeated Measures Experiments,” Statistics in Medicine 17, no. 14 (1998): 1643–1658.
- 28. Preisser J. S., Reboussin B. A., Song E. Y., and Wolfson M., “The Importance and Role of Intracluster Correlations in Planning Cluster Trials,” Epidemiology 18, no. 5 (2007): 552–560.
- 29. Chen J., Zhou X., Li F., and Spiegelman D., “Swdpwr: A SAS Macro and an R Package for Power Calculations in Stepped Wedge Cluster Randomized Trials,” Computer Methods and Programs in Biomedicine 213 (2022): 106522.
- 30. Gallis J. A., Wang X., Rathouz P. J., Preisser J. S., Li F., and Turner E. L., “Power Swgee: GEE‐Based Power Calculations in Stepped Wedge Cluster Randomized Trials,” Stata Journal 22, no. 4 (2022): 811–841.
- 31. Zhang Y., Preisser J. S., Li F., Turner E. L., and Rathouz P. J., “CRTFASTGEEPWR: A SAS Macro for Power of the Generalized Estimating Equations of Multi‐Period Cluster Randomized Trials With Application to Stepped Wedge Designs,” Journal of Statistical Software, Code Snippets 108 (2024): 1–27.
- 32. Ouyang Y., Li F., Preisser J. S., and Taljaard M., “Sample Size Calculators for Planning Stepped‐Wedge Cluster Randomized Trials: A Review and Comparison,” International Journal of Epidemiology 51, no. 6 (2022): 2000–2013.
- 33. Li F., Kasza J., Turner E. L., Rathouz P. J., Forbes A. B., and Preisser J. S., “Generalizing the Information Content for Stepped Wedge Designs: A Marginal Modeling Approach,” Scandinavian Journal of Statistics 50 (2022): 1048–1067.
- 34. Tian Z. and Li F., “Information Content of Stepped Wedge Designs Under the Working Independence Assumption,” Journal of Statistical Planning and Inference 229 (2024): 106097.
- 35. Liu J. and Li F., “Optimal Designs Using Generalized Estimating Equations in Cluster Randomized Crossover and Stepped Wedge Trials,” Statistical Methods in Medical Research 33, no. 8 (2024): 1299–1330.
- 36. Li F., Yu H., Rathouz P. J., Turner E. L., and Preisser J. S., “Marginal Modeling of Cluster‐Period Means and Intraclass Correlations in Stepped Wedge Designs With Binary Outcomes,” Biostatistics 23 (2022): 772–788, 10.1093/biostatistics/kxaa056.
- 37. Tian Z., Preisser J. S., Esserman D., Turner E. L., Rathouz P. J., and Li F., “Impact of Unequal Cluster Sizes for GEE Analyses of Stepped Wedge Cluster Randomized Trials With Binary Outcomes,” Biometrical Journal 64, no. 3 (2021): 419–439, 10.1002/bimj.202100112.
- 38. Zhang Y., Preisser J. S., Li F., Turner E. L., Toles M., and Rathouz P. J., “GEEMAEE: A SAS Macro for the Analysis of Correlated Outcomes Based on GEE and Finite‐Sample Adjustments With Application to Cluster Randomized Trials,” Computer Methods and Programs in Biomedicine 230 (2023): 107362, 10.1016/j.cmpb.2023.107362.
- 39. Barker D., McElduff P., D'Este C., and Campbell M. J., “Stepped Wedge Cluster Randomised Trials: A Review of the Statistical Methodology Used and Available,” BMC Medical Research Methodology 16 (2016): 69.
- 40. Nevins P., Ryan M., Davis‐Plourde K., et al., “Adherence to Key Recommendations for Design and Analysis of Stepped‐Wedge Cluster Randomized Trials: A Review of Trials Published 2016‐2022,” Clinical Trials 21, no. 2 (2024): 199–210.
- 41. Nevins P., Davis‐Plourde K., Macedo J. P., et al., “A Scoping Review Described Diversity in Methods of Randomization and Reporting of Baseline Balance in Stepped‐Wedge Cluster Randomized Trials,” Journal of Clinical Epidemiology 157 (2023): 134–145.
- 42. Tang W., Wei C., Cao B., et al., “Crowdsourcing to Expand HIV Testing Among Men Who Have Sex With Men in China: A Closed Cohort Stepped Wedge Cluster Randomized Controlled Trial,” PLoS Medicine 15, no. 8 (2018): e1002645.
- 43. Grantham K., Forbes A., Hooper R., and Kasza J., “The Staircase Cluster Randomised Trial Design: A Pragmatic Alternative to the Stepped Wedge,” Statistical Methods in Medical Research 33, no. 1 (2024): 24–41.
- 44. Toles M., Colón‐Emeric C., Hanson L., et al., “Transitional Care From Skilled Nursing Facilities to Home: Study Protocol for a Stepped Wedge Cluster Randomized Trial,” Trials 22 (2021): 120.
- 45. Toles M., Preisser J., Colón‐Emeric C., et al., “Connect‐Home Transitional Care From Skilled Nursing Facilities to Home: A Stepped Wedge, Cluster Randomized Trial,” Journal of the American Geriatrics Society 71 (2023): 1068–1080.
- 46. Weiner B., Pignone M., DuBard C., Lefebvre A., Suttie J., and Freburger S., “Advancing Heart Health in North Carolina Primary Care: The Heart Health NOW Study Protocol,” Implementation Science 10 (2015): 160.
- 47. Kowitt S. D., Goldstein A. O., and Cykert S., “A Heart Healthy Intervention Improved Tobacco Screening Rates and Cessation Support in Primary Care Practices,” Journal of Prevention 43, no. 3 (2022): 375–386.
- 48. Kasza J., Hooper R., Copas A., and Forbes A., “Sample Size and Power Calculations for Open Cohort Longitudinal Cluster Randomized Trials,” Statistics in Medicine 39, no. 13 (2020): 1871–1883.
- 49. Prentice R., “Correlated Binary Regression With Covariates Specific to Each Binary Observation,” Biometrics 44 (1988): 1033–1048.
- 50. Sharples K. and Breslow N., “Regression Analysis of Correlated Binary Data: Some Small Sample Results for the Estimating Equation Approach,” Journal of Statistical Computation and Simulation 42 (1992): 1–20, 10.1080/00949659208811406.
- 51. Wang X., Turner E. L., and Li F., “Designing Individually Randomized Group Treatment Trials With Repeated Outcome Measurements Using Generalized Estimating Equations,” Statistics in Medicine 43, no. 2 (2024): 358–378.
- 52. Teerenstra S., Lu B., Preisser J. S., Van Achterberg T., and Borm G. F., “Sample Size Considerations for GEE Analyses of Three‐Level Cluster Randomized Trials,” Biometrics 66, no. 4 (2010): 1230–1237.
- 53. Wang X., Turner E. L., Preisser J. S., and Li F., “Power Considerations for Generalized Estimating Equations Analyses of Four‐Level Cluster Randomized Trials,” Biometrical Journal 64 (2021): 663–680, 10.1002/bimj.202100081.
- 54. Li F., Forbes A. B., Turner E. L., and Preisser J. S., “Power and Sample Size Requirements for GEE Analyses of Cluster Randomized Crossover Trials,” Statistics in Medicine 38, no. 4 (2019): 636–649.
- 55. Li F., “Design and Analysis Considerations for Cohort Stepped Wedge Cluster Randomized Trials With a Decay Correlation Structure,” Statistics in Medicine 39, no. 4 (2020): 438–455.
- 56. Kenny A., Voldal E. C., Xia F., Heagerty P. J., and Hughes J. P., “Analysis of Stepped Wedge Cluster Randomized Trials in the Presence of a Time‐Varying Treatment Effect,” Statistics in Medicine 41 (2022): 4311–4339, 10.1002/sim.9511.
- 57. Hughes J. P., Granston T. S., and Heagerty P. J., “Current Issues in the Design and Analysis of Stepped Wedge Trials,” Contemporary Clinical Trials 45 (2015): 55–60.
- 58. Schumacher J. R., Zahrieh D., Chow S., et al., “Increasing Socioeconomically Disadvantaged Patients' Engagement in Breast Cancer Surgery Decision‐Making Through a Shared Decision‐Making Intervention (A231701CD): Protocol for a Cluster Randomised Clinical Trial,” BMJ Open 12, no. 11 (2022): e063895.
- 59. Kasza J., Hemming K., Hooper R., Matthews J., and Forbes A., “Impact of Non‐Uniform Correlation Structure on Sample Size and Power in Multiple‐Period Cluster Randomised Trials,” Statistical Methods in Medical Research 28, no. 3 (2019): 703–716.
- 60. Wang Y. and Carey V., “Working Correlation Structure Misspecification, Estimation and Covariate Design: Implications for Generalized Estimating Equations Performance,” Biometrika 90 (2002): 29–41.
- 61. Lu B., Preisser J. S., Qaqish B. F., Suchindran C., Bangdiwala S. I., and Wolfson M., “A Comparison of Two Bias‐Corrected Covariance Estimators for Generalized Estimating Equations,” Biometrics 63, no. 3 (2007): 935–941.
- 62. Kauermann G. and Carroll R. J., “A Note on the Efficiency of Sandwich Covariance Matrix Estimation,” Journal of the American Statistical Association 96, no. 456 (2001): 1387–1396.
- 63. Mancl L. A. and DeRouen T. A., “A Covariance Estimator for GEE With Improved Small‐Sample Properties,” Biometrics 57, no. 1 (2001): 126–134.
- 64. Thompson J., Hemming K., Forbes A., Fielding K., and Hayes R., “Comparison of Small‐Sample Standard‐Error Corrections for Generalised Estimating Equations in Stepped Wedge Cluster Randomised Trials With a Binary Outcome: A Simulation Study,” Statistical Methods in Medical Research 30, no. 2 (2021): 425–439.
- 65. Fay M. P. and Graubard B. I., “Small‐Sample Adjustments for Wald‐Type Tests Using Sandwich Estimators,” Biometrics 57, no. 4 (2001): 1198–1206.
- 66. Davis‐Plourde K., Taljaard M., and Li F., “Sample Size Considerations for Stepped Wedge Designs With Subclusters,” Biometrics 79 (2023): 98–112.
- 67. Ford W. P. and Westgate P. M., “Maintaining the Validity of Inference in Small‐Sample Stepped Wedge Cluster Randomized Trials With Binary Outcomes When Using Generalized Estimating Equations,” Statistics in Medicine 39, no. 21 (2020): 2779–2792.
- 68. Archbold P., Stewart B., Greenlick M., and Harvath T., “Mutuality and Preparedness as Predictors of Caregiver Role Strain,” Research in Nursing and Health 13 (1990): 375–384.
- 69. Maleyeff L., Li F., Haneuse S., and Wang R., “Assessing Exposure‐Time Treatment Effect Heterogeneity in Stepped‐Wedge Cluster Randomized Trials,” Biometrics 79, no. 3 (2022): 2551–2564.
- 70. Wang B., Wang X., and Li F., “How to Achieve Model‐Robust Inference in Stepped Wedge Trials With Model‐Based Methods?,” Biometrics 80, no. 4 (2024): ujae123.
- 71. Hin L. Y. and Wang Y. G., “Working‐Correlation‐Structure Identification in Generalized Estimating Equations,” Statistics in Medicine 28, no. 4 (2009): 642–658.
- 72. Pence B., Gaynes B., Udedi M., et al., “Two Implementation Strategies to Support Integration of Depression Screening and Treatment Into Hypertension and Diabetes Medical Care in Malawi (SHARP): Parallel, Cluster‐Randomized, Controlled, Implementation Trial,” Lancet Global Health 12 (2024): e652–e661.
- 73. Taljaard M., Teerenstra S., Ivers N. M., and Fergusson D. A., “Substantial Risks Associated With Few Clusters in Cluster Randomized and Stepped Wedge Designs,” Clinical Trials 13, no. 4 (2016): 459–463.
- 74. Hemming K., Taljaard M., and Forbes A., “Modeling Clustering and Treatment Effect Heterogeneity in Parallel and Stepped‐Wedge Cluster Randomized Trials,” Statistics in Medicine 37, no. 6 (2018): 883–898.
- 75. Rezaei‐Darzi E., Kasza J., Forbes A., and Bowden R., “Use of Information Criteria for Selecting a Correlation Structure for Longitudinal Cluster Randomised Trials,” Clinical Trials 19, no. 3 (2022): 316–325.
- 76. Pan W., “Akaike's Information Criterion in Generalized Estimating Equations,” Biometrics 57 (2001): 120–125.
- 77. Cantoni E., Flemming J. M., and Ronchetti E., “Variable Selection for Marginal Longitudinal Generalized Linear Models,” Biometrics 61 (2005): 507–514, 10.1111/j.1541-0420.2005.00331.x.
