Summary
Estimating parameters corresponding to mean outcomes and their intricate association structures in cluster randomized trials (CRTs) can pose significant methodological challenges. This paper introduces a novel framework that leverages network concepts to represent complex dependency structures and estimate these parameters using generalized estimating equations (GEE). We focus on modeling complex correlation structures by partitioning observations into potentially overlapping groups of interrelated data, where observations are assumed locally exchangeable within each group. This network GEE framework is inherently flexible, and we demonstrate its application to multiple exchangeable structures (simple, nested, block), moving average structures, and exponential decay structures. Furthermore, to address computational challenges arising in GEEs with large cluster sizes, we present the networkGEE R package, enabling the fitting of models beyond the capabilities of existing statistical software. The proposed methods are evaluated through extensive simulation studies. To illustrate their practical application, we analyze data from the Washington State Expedited Partner Therapy trial, a stepped-wedge CRT designed to assess the impact of a public health intervention aimed at reducing sexually transmitted infections through free patient-delivered partner therapy.
Keywords: clustered data, intracluster correlation, stepped wedge design, generalized equicorrelated marginal means assumptions (GEMMA), stochastic optimization
1. Introduction
The complex nature of data arising from cluster randomized trials (CRTs) necessitates advanced analytical methods capable of accurately capturing and analyzing hierarchical correlation structures. CRTs arise naturally in contexts where individual randomization is impractical or undesirable, and address challenges such as treatment contamination and administrative efficiency. The evolution of CRTs, including stepped wedge CRTs (SW-CRTs) (Hussey and Hughes, 2007) and 3-level hierarchical CRTs (Liu and Colditz, 2020), has introduced increasingly complex designs that require expanded analytical approaches. SW-CRTs are characterized by their staggered, systematic transition of clusters from a control to an intervention condition, ensuring that all receive the intervention by the study’s conclusion, while 3-level hierarchical CRTs outline a structure where interventions are allocated at a primary level (e.g., medical practices), which subsequently influences secondary (e.g., healthcare providers) and tertiary (e.g., patients) levels. This layered approach to randomization and intervention implementation underscores the intricate dependency structures inherent in modern CRTs, where outcomes are measured at the individual level but are influenced by group-level dynamics.
In CRTs, the focus often rests on estimating population-level effects that are marginal with respect to the clustering of data. To align with this objective, our analytical strategy adopts marginal models, fitted primarily through generalized estimating equations (GEE) (Liang and Zeger, 1986). When the intricacies of correlation parameters from a 2-level hierarchical structure become critical to inference, second-order GEE (GEE2) (Liang and Zeger, 1992; Ziegler, 1995) has been used, which establishes further estimating equations for residual pairs. The Washington State Expedited Partner Therapy (EPT) trial serves as a compelling case study highlighting how complex correlation structures can complicate inference (Harrison and Wang, 2021; Tian et al., 2022; Ouyang et al., 2023). This trial, aimed at evaluating the effectiveness of free patient-delivered partner therapy in the prevention of sexually transmitted infections, implemented its intervention across cluster–periods, which are defined as combinations of local health jurisdictions and time periods within the stepped-wedge design (Golden et al., 2015). Another defining feature of the Washington State EPT trial is its large sample size within each cluster-period cell, which presents computational challenges for fitting traditional GEE. For instance, the largest local health jurisdiction included 6410 individuals between the ages of 14 and 25 across five periods, necessitating the computation of millions of residual pairs for accurate estimation of the intraclass correlations. This computational complexity is illustrated in Appendix A, which delineates the trial’s design and the voluminous data involved. Li et al. (2021) proposed a cluster-period-level analysis to address the computational challenges associated with large SW-CRT datasets. However, their approach can neither accommodate individual-level data analysis nor adjust for individual-level covariates.
With estimation already challenging for mean parameters in cluster-correlated models, it is even more difficult for correlation parameters. In practice, analyses often default to convenient working correlation structures, such as exchangeable or exponential decay, because they simplify computation and interpretation, but at the cost of potential misspecification. Misspecified correlation structures can lead to inefficient or biased standard errors, undermining inference (Kasza et al., 2019; Hemming and Taljaard, 2016; Li, 2020). These challenges are even more notable in stepped-wedge CRTs and three-level hierarchical CRTs, where dependencies span both cluster–period combinations and inter-cluster relationships, respectively (Liu and Colditz, 2020; Li et al., 2023; Kerkhoff and Nussbeck, 2023). Despite advances in variance estimation and design formulas, accurately capturing correlation patterns in such settings remains difficult, particularly with limited numbers of clusters or heterogeneous cluster sizes. Building on these challenges, our research introduces a novel analytical framework based on network theory. Network theory examines the relationships and dependencies between units (e.g., individuals, clusters) as inter-connected nodes within a broader system, allowing for a more comprehensive understanding of how associations within and across clusters influence overall outcomes. We introduce the concept of network GEE, an advancement over GEE2 that provides a detailed inference mechanism for correlation parameters shaped by network relationships. Network GEE presents a holistic approach, formulating estimating equations that effectively capture more intricate dependency structures. In this work, we demonstrate this representation for multiple exchangeable structures (simple, nested, block), moving average structures, and exponential decay structures.
Alongside specification issues mentioned above, computational challenges also arise in GEE applications with large clusters (Carey et al., 1993; Ziegler et al., 1998). These challenges are even more pronounced when employing GEE2 (Chen et al., 2020), particularly within the nuanced context of SW-CRTs (Li et al., 2021). In addition to theoretical advances, this work introduces the eponymous networkGEE R package, designed to facilitate the application of our methods for the aforementioned exchangeable structures. Our package directly confronts these obstacles by incorporating advanced programming strategies, numerical optimization, and stochastic approximation methods, thereby greatly improving the efficiency and practicality of GEE-based analyses. Among its contributions is the implementation of a nested exchangeable three-level hierarchical correlation structure with unequal nested cluster sizes, leveraging a derived analytical inverse. To the best of our knowledge, this represents the first such derivation and packaged implementation. While similar ideas have been discussed in the context of sample size determination, they often rely on simplifying assumptions such as uniform cluster sizes (Liu and Colditz, 2020).
The remainder of the paper is organized as follows. Section 2 introduces relevant notation and examples, linking mathematical networks to existing complex correlation structures reflected in data from existing, commonly used CRT designs. Section 3 delves into the core of our contribution, the network GEE estimation procedure, covering the methodology, estimation strategies, inference, and computational techniques. In Section 4, we assess the performance of network GEE through simulation studies, showcasing its robustness and efficiency compared to traditional methods. Section 5 applies our methods to the Washington State EPT trial, illustrating how network GEE adeptly manages the trial’s complex multilevel structure and large cluster sizes. Finally, we conclude the paper with a Discussion in Section 6.
2. Network Notation and Examples
Example 1 (Simple Exchangeable). In building up the intuition to network GEE, we begin with the 2-level hierarchical structure consisting of clusters, each containing individuals. The outcome for individual in cluster is denoted by , which can be continuous, dichotomous, or count, and the covariates are given by . Let be the appropriate inverse-link function (e.g. identity, expit, exp) and . The equicorrelated marginal means assumption (EMMA) is defined as:
| (2.1) |
for , where are the marginal mean coefficients, dispersion parameter, and equicorrelation parameter, respectively. The final condition in (2.1) establishes the simple exchangeable correlation structure. A linear random intercept model , with randomized intervention , intervention effect , random effects , and subject-specific errors , induces the simple exchangeable structure and is a potential modeling approach for a standard CRT. Hussey and Hughes (2007) extended this simple exchangeable structure to cross-sectional SW-CRTs, switching index to represent time periods and index to represent individuals in their model formulation (henceforth called the H&H model), where are period-specific fixed effects. The defining feature of the SW design is that clusters are randomly selected to cross over to treatment, meaning for and for . The defining feature of a cross-sectional design is that a unique set of individuals is recruited for each period within a cluster. Although we use three indices , the index represents fixed effects, so the correlation structure remains simple exchangeable. The marginal coefficients are given by with corresponding covariates , where is the indicator function.
Definition 1 (Hypergraph). In any design with nested observations, indices are important in designating the correlation effects between grouped outcomes; their use could have been omitted in the mean and dispersion components in lieu of the more general form and . For more complicated correlation structures, such as those that can be induced by -level hierarchical models or SW designs, index tracking becomes cumbersome due to having more levels or non-nested structures. Recognizing that indices are only relevant insofar as they represent a labeling, we can instead replace indices with a general labeling and define how different might relate (specifically, are correlated with each other) through the use of graphs. A hypergraph is a pair where is a set of vertices and is a set of non-empty subsets of called edges; is called a graph if all sets within have cardinality 2 (i.e. each edge connects exactly two vertices). Figure 1a represents the correlation structure implied by the H&H model for a single cluster from a cross-sectional SW-CRT over 2 time periods; the first period has a cluster size of 4, while the second has a cluster size of 3. Here, each observation is indexed from with correlations induced by each element of ; here only has one source of correlative effects (due to the cluster itself).
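As a minimal sketch of Definition 1, the single-cluster example of Figure 1a can be encoded in Python as follows. The vertex labels 1 to 7 and the helper names are illustrative assumptions rather than notation from the paper; the only facts taken from the text are the cluster sizes (4 and 3) and the single source of correlation.

```python
from itertools import combinations

# Assumed labeling: 7 observations from one cluster
# (4 recruited in period 1, 3 recruited in period 2).
vertices = set(range(1, 8))

# Under the H&H model the cluster is the only source of correlation,
# so the edge set holds a single hyperedge containing every vertex.
edges = [frozenset(vertices)]


def is_hypergraph(vertices, edges):
    """Every edge must be a non-empty subset of the vertex set."""
    return all(len(e) > 0 and e <= vertices for e in edges)


def is_graph(edges):
    """A hypergraph is an ordinary graph when every edge has exactly two vertices."""
    return all(len(e) == 2 for e in edges)


def correlated_pairs(edges):
    """All unordered pairs of vertices that share at least one edge."""
    pairs = set()
    for e in edges:
        pairs.update(combinations(sorted(e), 2))
    return pairs


pairs = correlated_pairs(edges)
print(is_hypergraph(vertices, edges), is_graph(edges), len(pairs))
```

Because the single hyperedge contains all seven observations, every one of the 7 choose 2 = 21 pairs is correlated, which is what a simple exchangeable structure implies.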
Fig. 1: Examples of representations of common correlation structures via hypergraphs: (a) simple exchangeable, (b) nested exchangeable, (c) block exchangeable, (d) first-order moving average.
Example 2 (Nested Exchangeable). The H&H model assumes a homogeneous secular trend across all clusters. A natural extension is to introduce a nested structure by incorporating additional random effects , which capture within-period effects nested within clusters (Hemming et al., 2017): . Figure 1b represents the resulting correlation structure from this extension. The set of vertices remains as defined in Figure 1a, but the edge set is augmented to , where captures the correlative effects introduced by the heterogeneous secular trends. This correlation structure is also one of many potential structures that could arise from a 3-level hierarchical design. Specifically, denotes practices (1st level clustering), represents providers within each practice (2nd level clustering), and corresponds to participants for each provider. Assuming exchangeable correlation induced from each level of clustering, the dependency structure is equivalent to that of the H&H model with heterogeneous secular trends.
Definition 2 (Network). In the nested exchangeable structure, while hypergraph notation does display the dependency structures, it does not classify it; specifically, how do we associate which elements of are due to and which are due to ? Instead, the set does clarify the distinction. In general, we denote as the dependency classes of our model, and collectively will be called the dependency structure. Then, we shall define a network , with replacing the role of edge set from the hypergraph formulation. While are called classes, the elements within each , are called edges.
Example 3 (Block Exchangeable). Consider the cohort SW-CRT design (Li et al., 2018) with outcomes for clusters followed over time periods with the individuals per cluster . Unlike the cross-sectional SW-CRT, which enrolls a different group of individuals at each time step in each cluster, a cohort SW-CRT identifies a fixed cohort of individuals at the outset and follows them for repeated outcome assessments over time. This design introduces three distinct correlation blocks: within-cluster, within-period, and within-individual correlations. A linear random intercepts model that captures equicorrelation within each block can be expressed as , with the additional random effect capturing the within-individual correlation block. Figure 1c illustrates the corresponding network structure. The network representation for this structure is given by vertices , dependency structure , and dependency classes , and , where dependency classes represent the three aforementioned correlation blocks.
Example 4 (Moving average correlation). Departing from the exchangeable structures in the first three examples, we revisit the block exchangeable structure but now assume that within-period correlations exhibit an abrupt drop-off after one time period. Specifically, we define the correlation structure as follows:
which corresponds to a first-order moving average (MA(1)) structure. Here, the within-individual correlation is assumed to be nonzero only for adjacent time periods, with correlations for two or more time periods apart being zero. This structure captures a localized temporal dependency, providing a contrast to the long-range dependencies in exchangeable structures. Figure 1d illustrates a particular cluster of size 3 observed over 4 time periods showing the correlations consistent with this MA(1) structure. This network can be represented with vertices , dependency structure , and dependency classes , , and . A further example representing the MA(2) correlation structure is presented in Appendix B.
Building on the network notation introduced earlier, we generalize the equicorrelated marginal means assumption (EMMA) from Equation 2.1 to a broader class of correlation structures. Examples 1–3 fall within this generalized framework, while the MA(1) structure in Example 4 illustrates how our network definitions can accommodate non-exchangeable settings. For the remainder of the paper, however, we focus on designs that align directly with the EMMA generalization.
A network is said to be identifiable if for all and , ensuring that any edge within a class is uniquely associated with a single source of dependency. To formalize the relationships between correlation classes, we define as the set of all pairwise combinations of elements within each edge from . A class is said to be nested within another class (denoted ) if . For instance, in Example 3, and , as visually represented by the fact that (blue dashed) and (red dotted) are contained within (black solid). Finally, for , let
be the set of all pairwise elements from which are not already found elsewhere in for classes nested within ; this notion is required to avoid attributing pairwise combinations to multiple classes and instead attributing them to only a single class which is not already explained by a “smaller” class. Back to Example 3, since both elements are contained within the black solid box, but are not contained in any blue dashed or red dotted boxes.
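The pairwise sets and the nesting relation described above can be sketched in a few lines of Python. This is an illustration under stated assumptions: it uses a toy block exchangeable network with two individuals followed over two periods in one cluster, it interprets nesting as set inclusion of pairwise sets, and the class names and vertex labels are hypothetical.

```python
from itertools import combinations


def pairs(edge_class):
    """All unordered vertex pairs that lie inside some edge of the class."""
    out = set()
    for edge in edge_class:
        out.update(combinations(sorted(edge), 2))
    return out


def is_nested(inner, outer):
    """Assumed definition: inner is nested in outer when its pairs are a subset."""
    return pairs(inner) <= pairs(outer)


def residual_pairs(name, classes):
    """Pairs of classes[name] not already explained by a class strictly nested in it."""
    own = pairs(classes[name])
    for other, edge_class in classes.items():
        if other == name:
            continue
        strictly_nested = is_nested(edge_class, classes[name]) and not is_nested(
            classes[name], edge_class
        )
        if strictly_nested:
            own -= pairs(edge_class)
    return own


# Toy cohort: vertices 1, 2 are individuals A, B in period 1;
# vertices 3, 4 are the same individuals A, B in period 2.
classes = {
    "cluster": [{1, 2, 3, 4}],
    "period": [{1, 2}, {3, 4}],
    "individual": [{1, 3}, {2, 4}],
}

cluster_only = residual_pairs("cluster", classes)
print(sorted(cluster_only))
```

In this toy network the pairs left to the cluster class are those linking different individuals in different periods, so each pair of observations is attributed to exactly one class.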
The examples above worked with just one cluster; in standard multivariate modeling, we are in possession of multiple independent clusters. Hence, assume , where denotes the disjoint union. Similarly, we partition and all previous notations will include an additional index to denote the cluster, e.g. . We say are in correspondence if are parametrized by the same correlation structure for each ; this ensures that multiple clusters can contribute information to estimate the same correlation parameter. Finally, assuming identifiability and correspondence of structures, we propose a generalization of Equation (2.1) to networks which we refer to as the generalized equicorrelated marginal means assumptions (GEMMA):
| (2.2) |
for and . GEMMA effectively states that outcomes are equicorrelated within each class of our network; this is a generalization of the nested exchangeable structure or block exchangeable correlation structures commonly assumed for cross-sectional or closed-cohort designs, respectively (Teerenstra et al., 2010; Li et al., 2019) in that the classes are not necessarily nested in a chain-like fashion (e.g. ) and can handle much more flexible correlation structures. Under GEMMA, the length of the correlation parameter vector equals the common cardinality .
3. Network GEE: Estimation, Inference, and Computation
3.1. Estimating equations
Let network with clusters and structures . Assuming GEMMA, we define Network GEE (NGEE) as the following estimating equations (EE):
| (3.3) |
where
are standardized residuals, and is a vector of ones of length . The components are referred to as the design matrices, working covariance matrices, and error vectors, respectively, where and . A strength of GEE is that correctly specifying and ensures semiparametric efficiency, but if not, provided , we are still guaranteed consistent and asymptotically normal (CAN) estimators for under modest regularity conditions, a result which immediately follows from the theory of M-estimators (Van der Vaart, 2000). As noted earlier, network GEE provides a general framework for estimating correlation parameters under complex dependency structures. While we focus on models satisfying GEMMA, the framework can be extended to hybrid settings, for example, equicorrelation within clusters and periods combined with exponential decay within individuals. Such extensions are handled by partitioning the dependency structure into its components and parameterizing the corresponding correlations within the network representation. Specifically, structures can be divided into , where represents equicorrelated components and represents exponential decay components. Observations in are assumed to follow , where defines the distance between vertices in the network. The estimating equation portions corresponding to the decay parameter are then:
3.2. Covariance estimation
The covariance matrix is typically estimated using the model-based estimator or through sandwich estimators of the form , where
| (3.4) |
with both and evaluated at . Setting and , where denotes the identity matrix, results in the uncorrected sandwich estimator, as proposed by Liang and Zeger (1986). To enhance the robustness of inference, particularly for small , adjustments to and have been proposed (Kauermann and Carroll, 2001; Mancl and DeRouen, 2001; Fay and Graubard, 2001). For the simulation studies presented in Section 4, we implement the correction technique from Fay and Graubard (2001), known as the FG-corrected sandwich variance: and , where denotes the th element of matrix and is the matrix direct sum defined as . The constant limits extreme adjustments, especially when the th element of approaches 1. Following Fay and Graubard (2001) and Li et al. (2018), we set at 0.75 for our simulations. This approach, validated by Scott et al. (2017) through simulation studies, provides accurate estimation of standard errors even with a small number of clusters, addressing the challenges posed in the analysis of the Washington State EPT trial data with .
3.3. Computation: deterministic and stochastic fitting, R package
The conventional approach for solving GEEs utilizes the deterministic Fisher-Scoring or quasi-Newton-Raphson iterative algorithm, represented as:
where denotes the parameter estimate at iteration , starting from an initial estimate . However, this method is recognized to be computationally demanding, particularly for analyses involving large cluster sizes or a significant number of clusters, and may face convergence issues (Chen et al., 2020; Li et al., 2021). To address these challenges, we propose a stochastic variant of the Newton-Raphson method, formulated as:
where the sequence satisfies and (Chen et al., 2020; Byrd et al., 2016), and and represent computationally efficient, unbiased approximations of and , respectively. An effective strategy for deriving these unbiased approximations involves selecting a subset of clusters and observations within each cluster, with the subset size tuned for algorithm performance, and incorporating additional weighting adjustments. Based on our experience, selecting between 30 and 50 clusters and between 5 and 20 observations per cluster generally yields satisfactory results. We give the expressions for and tailored to simple exchangeable (Example 1), nested exchangeable (Example 2), and block exchangeable (Example 3) structures in Appendix D.2–D.4. Drawing upon the work of Byrd et al. (2016), we note that converges almost surely to the same solution as under mild regularity conditions. For a more comprehensive discussion on the principles and reasoning behind this stochastic optimization approach to GEEs, referred to as stochastic GEE, see Chen et al. (2020). In this manuscript, we adopt the term stochastic NGEE to describe this fitting procedure.
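To make the idea of a stochastic Newton-Raphson update concrete, the following toy sketch fits an intercept-only logistic model by repeatedly subsampling observations. It is not the networkGEE algorithm: the step sizes 1/t (which satisfy the usual Robbins-Monro conditions, assumed here), the batch size, the simulated data, and all function names are illustrative assumptions.

```python
import math
import random


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def stochastic_newton(y, n_iter=2000, batch=50, seed=1):
    """Stochastic Newton-Raphson for an intercept-only logistic model."""
    rng = random.Random(seed)
    theta = 0.0
    for t in range(1, n_iter + 1):
        gamma = 1.0 / t  # sum of gamma diverges, sum of gamma^2 converges
        sub = rng.sample(y, batch)  # subsample without replacement
        mu = expit(theta)
        # Subsample average of the score: unbiased for the full-data average.
        score = sum(yi - mu for yi in sub) / batch
        # Average information for this model (does not depend on the subsample).
        info = mu * (1.0 - mu)
        theta += gamma * score / info
    return theta


# Simulated binary outcomes with success probability 0.3 (illustrative only).
data_rng = random.Random(0)
y = [1 if data_rng.random() < 0.3 else 0 for _ in range(5000)]

theta_stochastic = stochastic_newton(y)

# Deterministic solution of the full-data estimating equation: logit of the mean.
ybar = sum(y) / len(y)
theta_full = math.log(ybar / (1.0 - ybar))

print(round(theta_stochastic, 3), round(theta_full, 3))
```

Each iteration touches only 50 observations regardless of the total sample size, which is the source of the computational savings; the decreasing step sizes average out the subsampling noise so that the iterates settle near the full-data solution.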
These numerical methods have been implemented in our R package networkGEE, with much of the underlying computation done in C++ to ensure further computational efficiency. Within this package, we assume the following simplified forms of to allow separate estimations of . Given that the link functions for are the identity, we have and . Furthermore, we take and , an assumption also utilized by popular R packages such as geepack (Yan and Fine, 2004) and geeM (McDaniel et al., 2013). Even with these simplifying assumptions, inverting is of time complexity . Exploiting analytical inverses for further reduces computational bottlenecks to the matrix multiplications of within and within , which has complexity . Analytical inverses for under the simple and block exchangeable have been derived (Li et al., 2018). Decomposing , where and , we can represent the correlation of a nested exchangeable as
where and . The inverse of this matrix is equal to (Appendix C, and also Li et al. (2019); Tian et al. (2022))
where and
Note that setting results in the analytical inverse derived in Teerenstra et al. (2010) assuming equal sub-cluster sizes. Finally, networkGEE takes it one step further by exploiting further analytical simplifications and vectorizations which reduce the complexities down to for select correlation structures, and stochastic NGEE can further reduce this time down to (see Appendix E for proofs of these).
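The benefit of an analytical inverse is easiest to see in the simple exchangeable case, which is simpler than the nested structure treated above. For a correlation matrix R = (1 - rho) I + rho J of dimension n, with J the matrix of ones, the well-known closed form is R^{-1} = [I - rho / (1 + (n - 1) rho) J] / (1 - rho), valid for -1/(n - 1) < rho < 1. The Python sketch below (function names are illustrative assumptions) applies R^{-1} to a residual vector in linear time without ever forming the matrix.

```python
def exch_solve(e, rho):
    """Return R^{-1} e in O(n) time for R = (1 - rho) * I + rho * J."""
    n = len(e)
    c = rho / (1.0 + (n - 1) * rho)
    total = sum(e)
    return [(ei - c * total) / (1.0 - rho) for ei in e]


def exch_matvec(x, rho):
    """Return R x in O(n) time, used here only to check the inverse."""
    total = sum(x)
    return [(1.0 - rho) * xi + rho * total for xi in x]


# Illustrative residual vector and correlation value.
e = [0.5, -1.2, 0.3, 2.0, -0.7]
rho = 0.2

x = exch_solve(e, rho)
# Multiplying back by R should recover the original vector.
max_err = max(abs(a - b) for a, b in zip(exch_matvec(x, rho), e))
print(max_err < 1e-12)
```

A generic solver would need cubic time in the cluster size for the same product, so the saving grows quickly for cluster sizes in the thousands.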
Currently, networkGEE implements NGEE for simple, nested, and block exchangeable correlation structures with binary and continuous outcomes. The available sandwich standard error estimators are the uncorrected estimator of Liang and Zeger (1986) and the corrected estimator of Fay and Graubard (2001), and both deterministic and stochastic fitting are supported.
4. Simulation Study
We conducted two complementary sets of simulation studies to evaluate our proposed methodology. The first set, which we refer to as the “statistical simulations”, focused on the finite-sample statistical properties of the proposed estimators, including mean-squared error (MSE), coverage probability, and confidence interval (CI) lengths. The second set, termed the “computational simulations”, assessed the computational times and capabilities of our method relative to existing packages under increasing data sizes. Both sets were based on separately generated datasets from the same underlying data-generating mechanisms (DGMs) to avoid confounding statistical accuracy with computation-related factors such as convergence or memory allocation.
Our two sets of simulations consisted of binary outcomes covering three distinct design scenarios, corresponding to the correlation structures previously introduced: (1) simple exchangeable, mimicking a cross-sectional SW-CRT; (2) nested exchangeable, mimicking a three-level hierarchical trial; and (3) block exchangeable, mimicking a closed-cohort SW-CRT. These designs span various realistic scenarios by varying the number of clusters and cluster size characteristics, while fixing the dispersion parameter to . The structures align with Figures 1a–1c, and the corresponding DGMs are described below. For the simple exchangeable, we generate outcomes as:
| (4.5) |
with , and . In this SW-CRT setup, each cluster began in the control condition and was randomized to the intervention ( for , where was randomly assigned). For the nested exchangeable scenario, outcomes were generated as:
| (4.6) |
with , and . Here, we can conceptualize an individually randomized trial where outcomes reflect a 3-level hierarchical structure. For example, randomized individuals are nested within providers, who are further nested within clinics (Li et al., 2023). Here, is a treatment indicator, and denotes a baseline covariate. For the block exchangeable scenario, we consider a closed-cohort SW-CRT setup, where the outcomes were generated as:
| (4.7) |
with , , and . This scenario includes within-period (due to ) and within-individual (due to ) correlations. All models are simulated from random effects (conditional) models; when outcomes are continuous, the conditional mean parameters equal the marginal mean parameters, and the correlation structures follow GEMMA. For binary outcomes, however, neither is true. Therefore, when comparing model fit estimates with “true” parameter values, we define the truths as the resulting numerical marginalizations. To aid clarity, Appendix F summarizes the design and parameter settings for both sets of simulations under each scenario.
For the statistical simulations, all models were fit using deterministic networkGEE. For each parameter in the parameter vector , we computed empirical coverage using:
where is the number of simulation replicates. Coverage was evaluated against the nominal 95% level. Our primary interest was in estimation of the treatment effect ; results for nuisance parameters are omitted for brevity.
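As a small illustration, empirical coverage can be computed as the share of replicates whose confidence interval contains the true value. The sketch below assumes Wald-type intervals with a normal critical value; the function name and the toy numbers are hypothetical and are not simulation results from this paper.

```python
def empirical_coverage(estimates, std_errors, truth, z=1.959964):
    """Share of replicates whose interval estimate +/- z * se contains the truth."""
    hits = 0
    for est, se in zip(estimates, std_errors):
        lower, upper = est - z * se, est + z * se
        if lower <= truth <= upper:
            hits += 1
    return hits / len(estimates)


# Four made-up replicates of a parameter whose true value is 0.5.
estimates = [0.48, 0.55, 0.71, 0.42]
std_errors = [0.05, 0.04, 0.05, 0.03]

coverage = empirical_coverage(estimates, std_errors, truth=0.5)
print(coverage)
```

In practice the same function would be applied to the estimates and sandwich standard errors from every simulation replicate, and the result compared with the nominal 95% level.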
For the computational simulations, we benchmarked the runtime of networkGEE (deterministic and stochastic versions) against the standard geese() function from the geepack R package and geem() from the geeM R package. In all cases, we compared performance under increasing cluster sizes while holding the number of clusters fixed at . For nested and block exchangeable designs, geese() and geem() were run with a misspecified simple exchangeable structure, while networkGEE implemented the true structure. Additional benchmarking was performed using geemaee() from the geeCRT package for the block exchangeable scenario. All benchmarking was conducted on dual-core nodes from the Orchestra cluster at Harvard Medical School.
Table 1 displays MSE, coverage, and confidence-interval length results for the statistical simulations. FG-corrected sandwich variance attains nominal coverage for across all cluster size scenarios. However, confidence intervals for exhibit moderate undercoverage, particularly when cluster sizes are small .
Table 1:
Statistical simulation results across scenarios. Entries show MSE, coverage (%), and CI length for treatment and correlation parameters (’s) under ngee+FG, by second-order cluster size .
| Scenario / metric / parameter | K = 20 | K = 40 | K = 60 | K = 80 |
|---|---|---|---|---|
| Simple exchangeable | | | | |
| MSE | | | | |
| Treatment effect | 4.3 × 10⁻² | 2.2 × 10⁻² | 1.5 × 10⁻² | 1.1 × 10⁻² |
| Correlation parameter | 4.7 × 10⁻⁴ | 2.5 × 10⁻⁴ | 1.7 × 10⁻⁴ | 1.3 × 10⁻⁴ |
| Coverage (%) | | | | |
| Treatment effect | 93.5 | 94.3 | 94.8 | 95.0 |
| Correlation parameter | 84.6 | 89.9 | 91.8 | 92.3 |
| CI length | | | | |
| Treatment effect | 8.1 × 10⁻¹ | 5.7 × 10⁻¹ | 4.7 × 10⁻¹ | 4.1 × 10⁻¹ |
| Correlation parameter | 7.8 × 10⁻² | 6.0 × 10⁻² | 4.9 × 10⁻² | 4.3 × 10⁻² |
| Nested exchangeable | | | | |
| MSE | | | | |
| Treatment effect | 1.04 × 10⁻² | 0.56 × 10⁻² | 0.38 × 10⁻² | 0.28 × 10⁻² |
| Correlation parameter 1 | 1.72 × 10⁻⁴ | 0.81 × 10⁻⁴ | 0.62 × 10⁻⁴ | 0.43 × 10⁻⁴ |
| Correlation parameter 2 | 1.39 × 10⁻⁴ | 0.64 × 10⁻⁴ | 0.48 × 10⁻⁴ | 0.34 × 10⁻⁴ |
| Coverage (%) | | | | |
| Treatment effect | 94.1 | 94.4 | 94.7 | 94.6 |
| Correlation parameter 1 | 90.4 | 93.3 | 93.0 | 94.4 |
| Correlation parameter 2 | 83.9 | 91.6 | 91.5 | 91.5 |
| CI length | | | | |
| Treatment effect | 0.40 | 0.29 | 0.24 | 0.21 |
| Correlation parameter 1 | 4.8 × 10⁻² | 3.6 × 10⁻² | 2.9 × 10⁻² | 2.6 × 10⁻² |
| Correlation parameter 2 | 4.1 × 10⁻² | 3.1 × 10⁻² | 2.6 × 10⁻² | 2.3 × 10⁻² |
| Block exchangeable | | | | |
| MSE | | | | |
| Treatment effect | 7.1 × 10⁻² | 3.0 × 10⁻² | 2.0 × 10⁻² | 1.5 × 10⁻² |
| Correlation parameter 1 | 9.1 × 10⁻⁴ | 4.4 × 10⁻⁴ | 3.0 × 10⁻⁴ | 2.2 × 10⁻⁴ |
| Correlation parameter 2 | 6.5 × 10⁻⁴ | 3.2 × 10⁻⁴ | 2.2 × 10⁻⁴ | 1.7 × 10⁻⁴ |
| Correlation parameter 3 | 1.16 × 10⁻³ | 0.52 × 10⁻³ | 0.37 × 10⁻³ | 0.30 × 10⁻³ |
| Coverage (%) | | | | |
| Treatment effect | 90.9 | 94.0 | 95.1 | 94.7 |
| Correlation parameter 1 | 83.9 | 88.6 | 89.8 | 90.4 |
| Correlation parameter 2 | 86.0 | 88.1 | 91.1 | 91.7 |
| Correlation parameter 3 | 91.5 | 91.7 | 93.6 | 94.5 |
| CI length | | | | |
| Treatment effect | 9.5 × 10⁻¹ | 6.8 × 10⁻¹ | 5.5 × 10⁻¹ | 4.7 × 10⁻¹ |
| Correlation parameter 1 | 9.9 × 10⁻² | 7.3 × 10⁻² | 6.1 × 10⁻² | 5.3 × 10⁻² |
| Correlation parameter 2 | 8.8 × 10⁻² | 6.6 × 10⁻² | 5.5 × 10⁻² | 4.8 × 10⁻² |
| Correlation parameter 3 | 1.2 × 10⁻¹ | 9.0 × 10⁻² | 7.3 × 10⁻² | 6.4 × 10⁻² |
Table 2 reports results for the computational simulations. For stochastic network GEE (S-NGEE), we ran 100 stochastic iterations, with the initial 50 used as burn-in and the latter 50 averaged to reduce parameter variability (Bottou, 2010). Subsampling sizes for each iteration were 30 clusters and 5 individuals per cluster for scenario 1, (30, 5, 2) from 1st, 2nd, and 3rd levels for scenario 2, and 30 clusters and 5 individuals per cluster for scenario 3. A final deterministic iteration ensured stability of results and produced components needed to compute the sandwich standard errors. The geepack package implements most of the computing within C++, although it does not utilize analytical inverses for working covariance matrices and computes per cluster, while geeM is a pure R implementation (slower than C++) but utilizes analytical inverses, computing per cluster. Both geese() and geem() support only built-in simple exchangeable structures; therefore, for the nested and block exchangeable scenarios, we applied simple exchangeable correlation structures with these functions, whereas networkGEE utilized the true correlation structures.
Table 2:
Computation times (milliseconds) of NGEE, S-NGEE, geese, geem.
| K | 20 | 40 | 60 | 80 |
|---|---|---|---|---|
| H&H cross-sectional SW-CRT | | | | |
| NGEE | 1.4 × 10^1 | 2.9 × 10^1 | 4.4 × 10^1 | 5.7 × 10^1 |
| S-NGEE | 2.1 × 10^1 | 2.2 × 10^1 | 2.2 × 10^1 | 2.3 × 10^1 |
| geese (exchangeable) | 1.0 × 10^3 | 6.7 × 10^3 | 1.4 × 10^4 | 2.3 × 10^4 |
| geem (exchangeable) | 2.4 × 10^3 | 1.2 × 10^4 | 2.8 × 10^4 | 5.2 × 10^4 |
| 3-level hierarchical | | | | |
| NGEE | 8.6 × 10^1 | 1.6 × 10^2 | 2.5 × 10^2 | 3.5 × 10^2 |
| S-NGEE | 1.3 × 10^2 | 1.3 × 10^2 | 1.3 × 10^2 | 1.3 × 10^2 |
| geese (exchangeable) | 8.4 × 10^4 | 1.7 × 10^5 | 4.0 × 10^5 | 5.2 × 10^5 |
| geem (exchangeable) | 9.6 × 10^4 | 1.2 × 10^5 | 1.9 × 10^5 | 3.8 × 10^5 |
| Cohort SW-CRT | | | | |
| NGEE | 2.1 × 10^1 | 3.2 × 10^1 | 4.7 × 10^1 | 6.2 × 10^1 |
| S-NGEE | 2.4 × 10^1 | 2.4 × 10^1 | 2.4 × 10^1 | 2.5 × 10^1 |
| geese (exchangeable) | 1.8 × 10^3 | 8.8 × 10^3 | 2.2 × 10^4 | 4.2 × 10^4 |
| geem (exchangeable) | 2.0 × 10^3 | 9.9 × 10^3 | 2.7 × 10^4 | 5.1 × 10^4 |
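To make the magnitudes in Table 2 easier to read, the short script below computes speedup ratios and growth factors for the H&H cross-sectional SW-CRT scenario. The timings are transcribed from the table; the helper names (`speedup`, `growth`) are illustrative only.

```python
# Computation times in milliseconds, transcribed from Table 2
# (H&H cross-sectional SW-CRT scenario).
K = [20, 40, 60, 80]
times_ms = {
    "NGEE": [1.4e1, 2.9e1, 4.4e1, 5.7e1],
    "S-NGEE": [2.1e1, 2.2e1, 2.2e1, 2.3e1],
    "geese": [1.0e3, 6.7e3, 1.4e4, 2.3e4],
    "geem": [2.4e3, 1.2e4, 2.8e4, 5.2e4],
}


def speedup(slow, fast):
    """Ratio of computation times (slow / fast) at each value of K."""
    return [s / f for s, f in zip(times_ms[slow], times_ms[fast])]


def growth(method):
    """Factor by which time grows from the smallest to the largest K."""
    t = times_ms[method]
    return t[-1] / t[0]


for k, s in zip(K, speedup("geese", "NGEE")):
    print(f"K={k}: geese / NGEE = {s:.0f}x")
for method in times_ms:
    print(f"{method}: time grows {growth(method):.1f}x as K goes from 20 to 80")
```

In this scenario, as K quadruples the NGEE time grows roughly fourfold and the S-NGEE time stays nearly constant, while the geese and geem times grow more than twentyfold.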
To enable a more direct comparison, we attempted to fit the true nested exchangeable structure using the zcor custom correlation structure in geese() from the geepack package. However, this approach encountered significant computational challenges, including frequent memory allocation issues and convergence failures for realistic cluster sizes. The only scenario that could be implemented successfully was for a small setting with , which falls outside the primary scope of our simulation study. Supplementary code is included to document this attempt, though no results are presented in the main manuscript due to these limitations.
For the block exchangeable scenario, we utilized the geemaee() function from R package geeCRT, which is based on the work from Li et al. (2018, 2019), and compared computation times between networkGEE, S-NGEE, and geemaee(). The results, presented in Appendix G, demonstrate that deterministic and stochastic networkGEE significantly outperformed geemaee(), with computation times considerably shorter across various cluster sizes. For larger cluster sizes , geemaee() became infeasible, with computation times exceeding 10 hours. Supplementary code has been provided to allow readers to reproduce these findings and confirm the computational performance of the methods.
5. Application to the Washington State EPT Trial
The Washington State EPT trial aimed to assess the efficacy of expedited patient-delivered partner therapy compared to standard partner notification methods for the treatment of chlamydia and gonorrhea infections. The intervention was strategically rolled out to promote patient-delivered partner therapy through both commercial pharmacies and targeted public health partner services, aiming to enhance treatment adoption among the sexual partners of heterosexual patients diagnosed with these infections. Central to the trial’s design was its randomization at the level of local health jurisdictions (LHJs), corresponding to county-level administrative units. In total, data were collected from 22 LHJs, encompassing 141 clinical sites, with randomization occurring in four successive waves from 2007 to 2010 to ensure systematic rollout of the intervention. While patients served as the point of entry into the study and were triaged to receive either EPT or standard partner services, the outcomes of interest were measured on their sexual partners. Each diagnosed patient identified one or more partners, and these partners formed the analytic units for assessing chlamydia outcomes. Henceforth, when we refer to “individuals” in the analysis, we mean the partners of index patients enrolled under the LHJs; all covariates (e.g., sex, age) are defined at the partner level.
Unlike previous analyses by Golden et al. (2015) and Li et al. (2021), which restricted attention to female individuals and used cluster-level summaries, our analysis incorporates both sexes and adjusts for individual-level covariates. Let denote the chlamydia outcome and the vector of covariates for the th individual in time period , at site , within LHJ . Covariates include the intervention status , sex of the individual (0 for female, 1 for male), and age in years, location-shifted so that the mean individual age is zero. The analysis explores six combinations of marginal mean and correlation structures. We fit two mean models:
Covariate-unadjusted: , where accounts for time period effects and represents the average treatment effect
Covariate-adjusted: , adjusting for sex and age, and represents the conditional treatment effect
combined with the following correlation structures:
Independence: .
Simple exchangeable: for
Nested exchangeable:
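For concreteness, one way to write the two mean models and three working correlation structures is sketched below. The notation is an assumption made for illustration and is not necessarily the notation used elsewhere in the paper: $Y_{ijkl}$ and $\mu_{ijkl}$ denote the outcome and its marginal mean for individual $l$ at site $k$ in period $j$ within LHJ $i$, $Z_{ij}$ is the intervention status, $\beta_j$ is the period effect, and $\delta$ is the treatment effect. A logit link is assumed because results are reported as odds ratios, and the nested structure is assumed to distinguish same-period from different-period pairs within an LHJ, in the spirit of Li et al. (2021).

```latex
\begin{align*}
\text{Covariate-unadjusted:}\quad
  & \operatorname{logit}(\mu_{ijkl}) = \beta_j + \delta Z_{ij}, \\
\text{Covariate-adjusted:}\quad
  & \operatorname{logit}(\mu_{ijkl}) = \beta_j + \delta Z_{ij}
    + \gamma_1\,\mathrm{sex}_{ijkl} + \gamma_2\,\mathrm{age}_{ijkl}, \\[6pt]
\text{Independence:}\quad
  & \operatorname{corr}(Y_{ijkl}, Y_{ij'k'l'}) = 0, \\
\text{Simple exchangeable:}\quad
  & \operatorname{corr}(Y_{ijkl}, Y_{ij'k'l'}) = \alpha_0
    \quad \text{for } (j,k,l) \neq (j',k',l'), \\
\text{Nested exchangeable:}\quad
  & \operatorname{corr}(Y_{ijkl}, Y_{ij'k'l'}) =
    \begin{cases}
      \alpha_0, & j = j',\ (k,l) \neq (k',l'), \\
      \alpha_1, & j \neq j'.
    \end{cases}
\end{align*}
```

Under these assumptions, outcomes from different LHJs are uncorrelated in all three structures, and the simple exchangeable structure is the special case of the nested structure with $\alpha_1 = \alpha_0$.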
The covariate-unadjusted + simple exchangeable model reflects the approach used by Golden et al. (2015), while the covariate-unadjusted + nested exchangeable model represents the enhanced methodology proposed by Li et al. (2021), which leveraged cluster-level summaries for computational efficiency. We extended these approaches by incorporating individual-level covariates and expanding the dataset to include both sexes. For completeness, we also included independence correlation as a baseline comparison. Our methodological framework enables the efficient computation of a substantial number of cross-terms (approximately 21 × 10^6 for the largest LHJ), addressing significant computational challenges faced in earlier analyses.
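To give a sense of why the number of cross-terms matters, the sketch below counts within-cluster pairs and shows one standard identity that collapses a sum over all pairs into a linear-time computation. It assumes that cross-terms are pairwise products of residuals within a cluster. The cluster size of 6,500 is a hypothetical value chosen only to match the stated order of magnitude, and the identity is a textbook one, not necessarily the computation that networkGEE performs.

```python
import random
from itertools import combinations


def n_pairs(n):
    """Number of distinct within-cluster pairs among n observations."""
    return n * (n - 1) // 2


def pair_sum_bruteforce(r):
    """Sum of r_i * r_j over all pairs i < j, visiting every pair: O(n^2)."""
    return sum(a * b for a, b in combinations(r, 2))


def pair_sum_fast(r):
    """Same sum via ((sum r)^2 - sum r^2) / 2: O(n)."""
    total = sum(r)
    return (total * total - sum(x * x for x in r)) / 2.0


# Hypothetical cluster of 6,500 individuals: about 21 million pairs.
print(n_pairs(6500))

# The two computations agree on simulated residuals.
rng = random.Random(0)
residuals = [rng.gauss(0.0, 1.0) for _ in range(200)]
print(abs(pair_sum_bruteforce(residuals) - pair_sum_fast(residuals)) < 1e-8)
```

When all pairs in a group share one correlation parameter, sums of this kind depend on the data only through group-level totals, so the cost scales with the number of observations rather than the number of pairs.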
Table 3 summarizes parameter estimates, standard errors, and computation times for the six models considered. The estimated intervention effects varied across correlation structures, with odds ratios ranging from 0.89 to 0.93 (Models 2, 3, 5, and 6). None of these estimates was statistically significant at the 0.05 level. Moving from simple exchangeable to nested exchangeable correlation reduced the standard error of the intervention effect: from 0.066 (Model 2) to 0.055 (Model 3) without covariate adjustment, and from 0.060 (Model 5) to 0.055 (Model 6) with it. The variation in treatment effect estimates across models reflects two main phenomena. First, covariate adjustment (Models 4–6) attenuates the treatment effect relative to the unadjusted models (Models 1–3). This attenuation likely reflects residual imbalance: if a covariate is positively associated with the outcome but negatively associated with treatment (or vice versa), the unadjusted estimate will be pulled upward, exaggerating the apparent treatment effect. Adjustment mitigates this bias, leading to estimates closer to the null. Second, differences across correlation structures (e.g., Models 2, 3, 5, 6) arise from how the working covariance matrix reweights clusters of varying sizes. With only 22 LHJs and substantial heterogeneity in cluster sizes, such reweighting can meaningfully shift the estimated effect, even though asymptotic consistency is guaranteed.
Table 3:
Estimates of the marginal mean, dispersion, and correlation parameters (standard errors in parentheses) from the overall analysis of the Washington State EPT Trial using networkGEE. The Period rows are the period fixed effects.
| | 1. Ind | 2. SimpExch | 3. NestExch | 4. Ind + Covariates | 5. SimpExch + Covariates | 6. NestExch + Covariates |
|---|---|---|---|---|---|---|
| Marginal mean | | | | | | |
| Intervention | −0.109 (0.076) | −0.128 (0.066) | −0.106 (0.055) | −0.079 (0.074) | −0.118 (0.060) | −0.102 (0.055) |
| Period 1 | −2.182 (0.083) | −2.205 (0.086) | −2.142 (0.082) | −2.398 (0.082) | −2.444 (0.086) | −2.410 (0.088) |
| Period 2 | −2.307 (0.058) | −2.317 (0.064) | −2.230 (0.067) | −2.498 (0.062) | −2.523 (0.070) | −2.480 (0.078) |
| Period 3 | −2.411 (0.058) | −2.420 (0.072) | −2.308 (0.069) | −2.553 (0.058) | −2.573 (0.076) | −2.529 (0.084) |
| Period 4 | −2.504 (0.079) | −2.513 (0.078) | −2.403 (0.073) | −2.660 (0.079) | −2.678 (0.083) | −2.634 (0.083) |
| Period 5 | −2.504 (0.095) | −2.503 (0.105) | −2.374 (0.099) | −2.632 (0.095) | −2.632 (0.109) | −2.577 (0.112) |
| Sex (male) | — | — | — | 0.699 (0.100) | 0.733 (0.090) | 0.695 (0.084) |
| Age | — | — | — | −0.028 (0.010) | −0.032 (0.010) | −0.037 (0.009) |
| Correlation | | | | | | |
| Correlation parameter 1 | — | 0.0016 (0.0007) | 0.0054 (0.0016) | — | 0.0020 (0.0008) | 0.0037 (0.0011) |
| Correlation parameter 2 | — | — | −0.0003 (0.0008) | — | — | 0.0005 (0.0009) |
| Fitting time (seconds) | 0.08 | 0.23 | 0.29 | 0.09 | 0.30 | 0.45 |
Covariate effects were consistently strong and statistically significant in Models 4–6. Under Model 6, male individuals had double the odds of infection compared to female individuals (OR = 2.00, 95% CI: 1.70–2.36), while age was inversely associated with infection risk (OR = 0.96, 95% CI: 0.95–0.98 per year increase). The age effect remains substantial and aligns with expectations, as the age range of 14 to 25 years encompasses a critical period of psychosocial and sexual development, during which behavioral and biological risk factors may evolve rapidly.
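The odds ratios and intervals for sex and age can be recovered from the Model 6 coefficients and standard errors in Table 3. The sketch below assumes a normal-approximation (Wald) interval with critical value 1.96; the paper does not state which interval was used, so this is an assumption, and the function name `odds_ratio_ci` is hypothetical.

```python
import math


def odds_ratio_ci(log_or, se, z=1.96):
    """Odds ratio and Wald interval from a log-odds estimate and its SE."""
    return (math.exp(log_or),
            math.exp(log_or - z * se),
            math.exp(log_or + z * se))


# Model 6 estimates from Table 3: coefficient and standard error.
sex = odds_ratio_ci(0.695, 0.084)
age = odds_ratio_ci(-0.037, 0.009)
print("sex: OR %.2f, 95%% CI %.2f-%.2f" % sex)
print("age: OR %.2f, 95%% CI %.2f-%.2f" % age)
```

Under this assumption the computed intervals round to 1.70–2.36 for sex and 0.95–0.98 for age, matching the values quoted in the text.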
6. Discussion
This paper establishes network GEE, a comprehensive framework for modeling complex correlation structures using network concepts. By integrating the GEE methodology with network concepts, the approach enables flexible and efficient estimation and inference for intricate correlation patterns in settings such as CRTs. In addition, we provide the networkGEE R package to facilitate practical implementation of these methods and demonstrate its utility in handling large-scale, computationally demanding CRTs. For instance, our application to the Washington State EPT trial highlights the computational advantages of our framework, particularly when addressing large and variable cluster sizes. Although our numerical studies focus on complete stepped wedge designs, the networkGEE R package can also accommodate incomplete stepped wedge designs with empty cluster-periods, for instance the class of staircase designs introduced by Grantham et al. (2024).
While we focus on generalized equicorrelated structures, the flexibility of the network GEE framework allows for extensions to a broader class of correlation structures, including exponential decay, moving average, or other Toeplitz structures. These more complex structures offer additional modeling capabilities but may require deriving analytical inverses or other computationally intensive adaptations, which are not yet implemented in the current networkGEE package. Extending the implementation to more general correlation structures represents a promising direction for future work.
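As an illustration of what an analytical inverse buys, the simple exchangeable correlation matrix R = (1 − α)I + αJ, with J the all-ones matrix, has the well-known closed-form inverse R⁻¹ = [I − α/(1 + (n − 1)α) J] / (1 − α). The sketch below applies it without ever forming an n × n matrix. The function names are hypothetical and this is not the package's code; comparable closed forms would need to be derived for decay or Toeplitz structures.

```python
def exchangeable_solve(alpha, y):
    """Return R^{-1} y for R = (1 - alpha) I + alpha J without forming R: O(n)."""
    n = len(y)
    shrink = alpha / (1.0 + (n - 1) * alpha)
    total = sum(y)
    return [(v - shrink * total) / (1.0 - alpha) for v in y]


def exchangeable_matvec(alpha, x):
    """Return R x for the same R, again in O(n)."""
    total = sum(x)
    return [(1.0 - alpha) * v + alpha * total for v in x]


# Round trip: R (R^{-1} y) should reproduce y up to floating-point error.
y = [1.0, -2.0, 0.5, 3.0]
x = exchangeable_solve(0.05, y)
y_back = exchangeable_matvec(0.05, x)
print(max(abs(a - b) for a, b in zip(y, y_back)) < 1e-12)
```

Both operations cost time linear in the cluster size, whereas a generic numerical inverse of an n × n working correlation matrix costs on the order of n³ operations per cluster.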
Finally, addressing missing outcome data remains a critical concern in clinical research. Previous studies have incorporated inverse probability weighting (IPW) into GEE frameworks to account for informative missingness (Chen et al., 2020). Extending such approaches within network GEE could enhance robustness not only for treatment effect estimation but also for the estimation of correlation parameters, which are particularly sensitive to missing data.
Supplementary Material
An R package, networkGEE, is available for download from GitHub: https://github.com/tomchen00/networkGEE
Appendices can be found in Network_GEE_Supplementary.pdf. The Code & Data .zip file contains code implementing our simulation scenarios as well as the Washington EPT dataset. README.txt includes descriptions of its contents and instructions on usage.
Acknowledgments
The authors thank Professor James P. Hughes for sharing the Washington State EPT study data for our illustrative data analysis in Section 5.
Funding
Research in this article was in part supported by the National Institute of Allergy and Infectious Diseases of the National Institutes of Health (NIH) R01 AI136947, and a Patient-Centered Outcomes Research Institute Award® (PCORI® Award ME-2022C2-27676). The statements presented in this article are solely the responsibility of the authors and do not necessarily represent the views of the NIH, PCORI® or its Board of Governors or Methodology Committee.
References
- Bottou L (2010) Large-scale machine learning with stochastic gradient descent. In Proceedings of COMPSTAT’2010, 177–186. Springer.
- Byrd RH, Hansen SL, Nocedal J and Singer Y (2016) A stochastic quasi-Newton method for large-scale optimization. SIAM Journal on Optimization, 26, 1008–1031.
- Carey V, Zeger SL and Diggle P (1993) Modelling multivariate binary data with alternating logistic regressions. Biometrika, 80, 517–526.
- Chen T, Tchetgen Tchetgen EJ and Wang R (2020) A stochastic second-order generalized estimating equations approach for estimating association parameters. Journal of Computational and Graphical Statistics, 29, 547–561.
- Fay MP and Graubard BI (2001) Small-sample adjustments for Wald-type tests using sandwich estimators. Biometrics, 57, 1198–1206.
- Golden MR, Kerani RP, Stenger M, Hughes JP, Aubin M, Malinski C and Holmes KK (2015) Uptake and population-level impact of expedited partner therapy (EPT) on Chlamydia trachomatis and Neisseria gonorrhoeae: the Washington State community-level randomized trial of EPT. PLoS Med, 12, e1001777.
- Grantham KL, Forbes AB, Hooper R and Kasza J (2024) The staircase cluster randomised trial design: a pragmatic alternative to the stepped wedge. Statistical Methods in Medical Research, 33, 24–41.
- Harrison LJ and Wang R (2021) Power calculation for analyses of cross-sectional stepped-wedge cluster randomized trials with binary outcomes via generalized estimating equations. Statistics in Medicine, 40, 6674–6688.
- Hemming K and Taljaard M (2016) Sample size calculations for stepped wedge and cluster randomised trials: a unified approach. Journal of Clinical Epidemiology, 69, 137–146.
- Hemming K, Taljaard M and Forbes A (2017) Analysis of cluster randomised stepped wedge trials with repeated cross-sectional samples. Trials, 18, 1–11.
- Hussey MA and Hughes JP (2007) Design and analysis of stepped wedge cluster randomized trials. Contemporary Clinical Trials, 28, 182–191.
- Kasza J, Hemming K, Hooper R, Matthews J and Forbes A (2019) Impact of non-uniform correlation structure on sample size and power in multiple-period cluster randomised trials. Statistical Methods in Medical Research, 28, 703–716.
- Kauermann G and Carroll RJ (2001) A note on the efficiency of sandwich covariance matrix estimation. Journal of the American Statistical Association, 96, 1387–1396.
- Kerkhoff D and Nussbeck FW (2023) Estimation quality and required sample sizes in three-level contextual analysis models. Methodology, 19, 133–151.
- Li F (2020) Design and analysis considerations for cohort stepped wedge cluster randomized trials with a decay correlation structure. Statistics in Medicine, 39, 438–455.
- Li F, Chen X, Tian Z, Esserman D, Heagerty PJ and Wang R (2023) Designing three-level cluster randomized trials to assess treatment effect heterogeneity. Biostatistics, 24, 833–849.
- Li F, Forbes AB, Turner EL and Preisser JS (2019) Power and sample size requirements for GEE analyses of cluster randomized crossover trials. Statistics in Medicine, 38, 636–649.
- Li F, Turner EL and Preisser JS (2018) Sample size determination for GEE analyses of stepped wedge cluster randomized trials. Biometrics, 74, 1450–1458.
- Li F, Yu H, Rathouz PJ, Turner EL and Preisser JS (2021) Marginal modeling of cluster-period means and intraclass correlations in stepped wedge designs with binary outcomes. Biostatistics.
- Liang KY and Zeger SL (1986) Longitudinal data analysis using generalized linear models. Biometrika, 73.1, 13–22.
- Liang KY, Zeger SL and Qaqish B (1992) Multivariate regression analyses for categorical data. Journal of the Royal Statistical Society. Series B (Methodological), 3–40.
- Liu J and Colditz GA (2020) Sample size calculation in three-level cluster randomized trials using generalized estimating equation models. Statistics in Medicine, 39, 3347–3372.
- Mancl LA and DeRouen TA (2001) A covariance estimator for GEE with improved small-sample properties. Biometrics, 57, 126–134.
- McDaniel LS, Henderson NC and Rathouz PJ (2013) Fast pure R implementation of GEE: application of the Matrix package. The R Journal, 5, 181.
- Ouyang Y, Hemming K, Li F and Taljaard M (2023) Estimating intra-cluster correlation coefficients for planning longitudinal cluster randomized trials: a tutorial. International Journal of Epidemiology, dyad062.
- Scott JM, deCamp A, Juraska M, Fay MP and Gilbert PB (2017) Finite-sample corrected generalized estimating equation of population average treatment effects in stepped wedge cluster randomized trials. Statistical Methods in Medical Research, 26, 583–597.
- Teerenstra S, Lu B, Preisser JS, Van Achterberg T and Borm GF (2010) Sample size considerations for GEE analyses of three-level cluster randomized trials. Biometrics, 66, 1230–1237.
- Tian Z, Preisser JS, Esserman D, Turner EL, Rathouz PJ and Li F (2022) Impact of unequal cluster sizes for GEE analyses of stepped wedge cluster randomized trials with binary outcomes. Biometrical Journal, 64, 419–439.
- Van der Vaart AW (2000) Asymptotic statistics, vol. 3. Cambridge University Press.
- Yan J and Fine J (2004) Estimating equations for association structures. Statistics in Medicine, 23, 859–874.
- Ziegler A (1995) The different parameterizations of the GEE1 and the GEE2. Springer, New York, 315–324.
- Ziegler A, Kastner C and Blettner M (1998) The generalised estimating equations: An annotated bibliography. Biometrical Journal, 40.2, 115–139.
