Author manuscript; available in PMC: 2022 May 25.
Published in final edited form as: Biom J. 2021 Mar 10;63(5):1052–1071. doi: 10.1002/bimj.202000230

Sample size and power considerations for cluster randomized trials with count outcomes subject to right truncation

Fan Li 1,2,3, Guangyu Tong 1,3
PMCID: PMC9132617  NIHMSID: NIHMS1804828  PMID: 33751620

Abstract

Cluster randomized trials (CRTs) are widely used in epidemiological and public health studies assessing population-level effect of group-based interventions. One important application of CRTs is the control of vector-borne disease, such as malaria. However, a particular challenge for designing these trials is that the primary outcome involves counts of episodes that are subject to right truncation. While sample size formulas have been developed for CRTs with clustered counts, they are not directly applicable when the counts are right truncated. To address this limitation, we discuss two marginal modeling approaches for the analysis of CRTs with truncated counts and develop two corresponding closed-form sample size formulas to facilitate the design of such trials. The proposed sample size formulas allow investigators to explore the power under a large number of scenarios without computationally intensive simulations. The proposed formulas are validated in extensive simulations. We further explore the implication of right truncation on power and apply the proposed formulas to illustrate the power calculation for a malaria control CRT where the primary outcome is subject to right truncation.

Keywords: arm-specific exchangeable correlation, coefficient of variation, generalized estimating equations, group-randomized trials, Poisson distribution, unequal cluster sizes

1 |. INTRODUCTION

Cluster randomized trials (CRTs) are widely used in epidemiological and public health studies to evaluate the effect of interventions delivered at the group level (Donner & Klar, 2000; Murray, 1998). A notable feature of CRTs is that members within the same cluster share physical, geographical, or social connections, and therefore the outcomes measured within the same cluster tend to be correlated, as reflected by a positive intraclass correlation coefficient (ICC) (Eldridge et al., 2009). The ICC plays a central role in the design and analysis of CRTs. Particularly, it has been well established that sample size and power calculation should adequately reflect the variance inflation due to ICC, and related methods for designing CRTs have been developed for the past several decades and summarized in a recent review (Turner et al., 2017a).

One important application of CRTs is the control of vector-borne disease, such as malaria (Foy et al., 2019; Halliday et al., 2014). In assessing the efficacy of new vector control tools, a cohort of participants without infection may be recruited from each village, which is the unit of randomization. Participants are followed up for active case detection, defined by disease onset or presence of infection, at regular 1-month time intervals. The primary endpoint is the count of malaria episodes within the study duration. However, a frequent issue is that such a primary outcome is right truncated by the maximum number of detections allowable by the study design (Mwandigha et al., 2020). For example, one can in theory assume no more than 12 episodes will be detected for each participant during a 1-year study. However, because malaria is seasonal and treatment could provide a short period of protection against reinfection (Bijker & Sauerwein, 2012; Cairns et al., 2012), it is also reasonable in practice to assume that a maximum of six episodes will be detected within a 1-year trial (White et al., 2015). Mwandigha et al. (2020) conducted a series of simulations and demonstrated that right truncation could attenuate statistical power in random-effects model analysis of CRTs with count outcomes. This attenuation is more pronounced when the number of clusters is smaller than 30, which happens to be the range of sample sizes in most CRTs (Ivers et al., 2011). In a recent commentary, Li and Harhay (2020) found in simulations that the similar power attenuation due to right truncation applies to the marginal analysis of CRTs using the generalized estimating equations (GEE) (Liang & Zeger, 1986). These observations motivate us to develop appropriate sample size procedures to account for outcome truncation in CRTs.

There are several existing sample size formulas developed for analyzing clustered count outcomes, all of which assume the absence of truncation (Amatya et al., 2013; Hayes & Moulton, 2009; Wang et al., 2020). Ogungbenro and Aarons (2010) obtained a closed-form power formula for longitudinal pharmacodynamic studies with repeated count measurements based on the Poisson random-effects model. A linearization approach (Breslow & Clayton, 1993) is taken to approximate the variance expression of the treatment effect. Assuming known variance components, Amatya et al. (2013) proposed a refined sample size formula by deriving the exact form of the information matrix. As indicated by Mwandigha et al. (2020), these closed-form sample size expressions will underestimate the required sample size when the outcomes are subject to right truncation. Furthermore, it is challenging to derive the sample size formula suitable for random-effects models in closed-form under truncation because the corresponding probability mass is a complicated function of the mean. For this reason, Mwandigha et al. (2020) proposed a simulation-based sample size procedure in the spirit of Landau and Stahl (2012).

On the other hand, Li and Harhay (2020) discussed a valid marginal model approach for analyzing CRTs when the outcomes are subject to truncation. The marginal model approach has long been recognized as an alternative to the random-effects model approach and is sometimes preferred in CRT applications due to its population-averaged interpretation (Preisser et al., 2003). Without truncation, Amatya et al. (2013) developed a sample size formula in CRTs based on the marginal log-linear model with the Poisson variance and exchangeable working correlation. Li et al. (2019) developed a sample size formula for longitudinal count data based on the marginal model with the negative binomial variance and independence working correlation. Because outcome truncation attenuates the power of marginal analysis (Li & Harhay, 2020), these existing sample size procedures will overestimate the power and so are not directly applicable to the malaria vector control CRTs. To address this limitation, we develop two new sample size formulas based on marginal models in CRTs with truncated counts. We consider both the independence and arm-specific exchangeable working correlation structures and provide approximations under unequal cluster sizes. The new sample size formulas provide additional insights under outcome truncation and complement the simulation-based procedures of Mwandigha et al. (2020) and Li and Harhay (2020), both of which are computationally intensive when investigators wish to quickly explore a large number of scenarios in the design stage.

The remainder of this article is organized as follows. Section 2 reviews the conditional and marginal model approaches for the analysis of CRTs with truncated counts. Section 3 develops sample size formulas based on the marginal model. A simulation study is carried out in Section 4 to investigate the accuracy of the new sample size formulas. In Section 5, we explore the implication of right truncation on power and illustrate the new formulas using the context of a malaria vector control CRT. Section 6 concludes.

2 |. STATISTICAL METHODS

2.1 |. Conditional model

We consider a parallel CRT with N clusters (e.g., villages), where Nπ (0 < π < 1) clusters are randomized to the intervention arm and N(1 − π) clusters to the control arm. We assume a total of mi participants are recruited in cluster i for follow-up. Let Yij denote the count outcome (e.g., the number of malaria episodes) recorded for participant j (j = 1, …, mi) in cluster i (i = 1, …, N) for a maximum of follow-up time Lij (e.g., Lij can be expressed in units of years). For the sake of sample size calculation, we assume that a common follow-up time is prespecified and that there is no dropout, so that Lij = L (discussions on dropout and missing data can be found in Section 6 and Web Appendixes E and F). The following log-linear random-effects model can be used to represent the conditional mean of the count outcome, λij:

$$\log(\lambda_{ij}) = \log(L) + \beta_0^* + \beta_1 X_i + \phi_i X_i + \psi_i (1 - X_i) = \beta_0 + \beta_1 X_i + \phi_i X_i + \psi_i (1 - X_i), \tag{1}$$

where β0 = log(L) + β0* is the intercept accounting for the common follow-up time, Xi is the binary cluster-level intervention indicator (Xi = 1 if assigned to the intervention arm and Xi = 0 if assigned to the control arm), $\phi_i \sim N(0, \sigma_1^2)$ is the random intercept inducing the positive ICC within the intervention clusters, and $\psi_i \sim N(0, \sigma_0^2)$ is the random intercept inducing the positive ICC within the control clusters. Notice that this model extends the model in Mwandigha et al. (2020), who assumed $\sigma_0^2 = \sigma_1^2$. In the absence of right truncation, one typically assumes Yij follows a Poisson distribution with cluster-specific mean λij, leading to the Poisson random-effects model. When the outcome is subject to right truncation by T ≥ 1, we assume Yij follows a right-truncated distribution. A truncated distribution is defined as a conditional distribution derived from restricting the upper bound of the underlying untruncated distribution. Right-truncated distributions frequently arise when the ability to record the occurrences of outcomes is limited to values not exceeding some threshold, which is the case in malaria vector control CRTs, due to the maximum number of episode detections allowable by the study design and other practical considerations. To account for truncation, Mwandigha et al. (2020) assume that the outcome follows a truncated Poisson distribution with

$$P(Y_{ij} = y \mid \lambda_{ij}, 0 \le Y_{ij} \le T) = \frac{\lambda_{ij}^{y}}{y!\, Q_T(\lambda_{ij})}, \quad y = 0, 1, \ldots, T, \tag{2}$$

where $Q_t(\lambda) = \sum_{k=0}^{t} \lambda^k / k!$ for t ≥ 0 and zero for t < 0. Conditional on the random intercept ϕi = ϕ or ψi = ψ, we show in Web Appendix A that the means of Yij in the intervention and control arms are

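$$\lambda^{(1)} = \exp(\beta_0 + \beta_1 + \phi)\,\frac{Q_{T-1}(\exp(\beta_0 + \beta_1 + \phi))}{Q_T(\exp(\beta_0 + \beta_1 + \phi))}, \tag{3}$$
$$\lambda^{(0)} = \exp(\beta_0 + \psi)\,\frac{Q_{T-1}(\exp(\beta_0 + \psi))}{Q_T(\exp(\beta_0 + \psi))}, \tag{4}$$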

both of which are no larger than their respective expectations in the absence of truncation.
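As a numerical illustration of Equations (2) to (4), the following Python sketch evaluates Q_t(λ), the truncated Poisson probability mass, and the resulting conditional mean for a fixed rate. The function names (`q_sum`, `trunc_pois_pmf`, `trunc_pois_mean`) are ours and purely illustrative, and the rate 1.25 is only an example value.

```python
import math

def q_sum(t, lam):
    """Q_t(lam) = sum_{k=0}^{t} lam^k / k!, defined as 0 for t < 0."""
    if t < 0:
        return 0.0
    return sum(lam ** k / math.factorial(k) for k in range(t + 1))

def trunc_pois_pmf(y, lam, T):
    """Probability mass of a Poisson(lam) count right truncated at T, Equation (2)."""
    if y < 0 or y > T:
        return 0.0
    return lam ** y / (math.factorial(y) * q_sum(T, lam))

def trunc_pois_mean(lam, T):
    """Mean of the right-truncated Poisson: lam * Q_{T-1}(lam) / Q_T(lam)."""
    return lam * q_sum(T - 1, lam) / q_sum(T, lam)

if __name__ == "__main__":
    lam = 1.25  # illustrative untruncated rate (an assumption for this example)
    for T in (1, 2, 4, 30):
        # The truncated mean never exceeds lam and approaches it as T grows.
        print(T, round(trunc_pois_mean(lam, T), 4))
```

Summing `y * trunc_pois_pmf(y, lam, T)` over y = 0, …, T gives the same value as `trunc_pois_mean(lam, T)`, which is a quick way to check the closed form.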

When there is no truncation and T = ∞, the probability mass function (2) reduces to the standard Poisson probability mass function because $Q_\infty(\lambda) = \exp(\lambda)$. In this case, the ith cluster-specific baseline event rate is exp(β0 + ψi), which differs from the overall marginal baseline event rate $\exp(\beta_0 + \sigma_0^2/2)$ (Ritz & Spiegelman, 2004; Young et al., 2007). In addition, the cluster-specific or conditional intervention effect is exp(β1) on the rate ratio (RR) scale. As pointed out in Li and Harhay (2020), when there is right truncation, exp(β1) should be interpreted as the RR that corresponds to the unobserved Poisson distribution had there been no truncation. Although the conditional model (1) is a valid approach for the analysis of CRTs with truncated counts, a convenient sample size formula cannot be obtained in closed form because the likelihood function involves a complicated integral. Therefore, Mwandigha et al. (2020) suggested a computationally intensive simulation-based approach for power calculation.

2.2 |. Marginal model

Li and Harhay (2020) proposed to use the marginal model for the analysis of CRTs with right-truncated counts. Specifically, the observed event rate is modeled by the marginal log-linear model

$$\log(\mu_{ij}) = \log(L) + \gamma_0^* + \gamma_1 X_i = \gamma_0 + \gamma_1 X_i, \tag{5}$$

where exp(γ0) is the marginal event rate and exp(γ1) is the marginal RR that corresponds to the observed distribution. In the absence of truncation, the results in Ritz and Spiegelman (2004) and Young et al. (2007) imply $\gamma_0 = \beta_0 + \sigma_0^2/2$ and $\gamma_1 = \beta_1 + (\sigma_1^2 - \sigma_0^2)/2$. In the special case of $\sigma_1^2 = \sigma_0^2$, the conditional RR and the marginal RR coincide, a result commonly referred to as collapsibility. When there is right truncation, however, the conditional model is no longer collapsible. In fact, we can marginalize over the random effect in the conditional model and obtain the induced marginal expectations in the intervention and control arms as (details are provided in Web Appendix B)

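$$\mu^{(1)} = \exp(\beta_0 + \beta_1)\, E_\phi\left\{\exp(\phi)\,\frac{Q_{T-1}(\exp(\beta_0 + \beta_1 + \phi))}{Q_T(\exp(\beta_0 + \beta_1 + \phi))}\right\}, \tag{6}$$
$$\mu^{(0)} = \exp(\beta_0)\, E_\psi\left\{\exp(\psi)\,\frac{Q_{T-1}(\exp(\beta_0 + \psi))}{Q_T(\exp(\beta_0 + \psi))}\right\}. \tag{7}$$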

When $\sigma_0^2 = \sigma_1^2$, it follows that exp(γ1) = μ(1)/μ(0) ≠ exp(β1) in general. In this case, as the right truncation point T moves towards one, exp(γ1) will move away from exp(β1), suggesting that right truncation could reduce the effect size corresponding to the observed marginal distribution. Nevertheless, right truncation would also reduce the variance of the intervention effect estimator in the marginal model, and its effect on power depends on the relative change of the effect size and the variance.
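To make the marginalization in Equations (6) and (7) concrete, the sketch below integrates the truncated conditional mean over a normal random intercept with a simple trapezoid rule and reports the induced marginal RR for several truncation points. This is only an illustration under stated assumptions: the function names are ours, the quadrature grid (801 points over ±8 standard deviations) is an arbitrary choice, T = 60 is used as a stand-in for no truncation at these rates, and the parameter values are example inputs.

```python
import math

def q_sum(t, lam):
    """Q_t(lam) = sum_{k=0}^{t} lam^k / k!, and 0 for t < 0."""
    if t < 0:
        return 0.0
    return sum(lam ** k / math.factorial(k) for k in range(t + 1))

def trunc_mean(lam, T):
    """Conditional mean under right truncation at T, as in Equations (3) and (4)."""
    return lam * q_sum(T - 1, lam) / q_sum(T, lam)

def marginal_mean(log_rate, sigma2, T, n_grid=801, width=8.0):
    """Average trunc_mean(exp(log_rate + b)) over b ~ N(0, sigma2), trapezoid rule."""
    sd = math.sqrt(sigma2)
    if sd == 0.0:
        return trunc_mean(math.exp(log_rate), T)
    h = 2 * width * sd / (n_grid - 1)
    total = 0.0
    for k in range(n_grid):
        b = -width * sd + k * h
        weight = 0.5 if k in (0, n_grid - 1) else 1.0
        density = math.exp(-0.5 * (b / sd) ** 2) / (sd * math.sqrt(2 * math.pi))
        total += weight * density * trunc_mean(math.exp(log_rate + b), T)
    return total * h

def marginal_rr(beta0, beta1, sigma2_0, sigma2_1, T):
    """Induced marginal rate ratio mu(1)/mu(0) from Equations (6) and (7)."""
    mu1 = marginal_mean(beta0 + beta1, sigma2_1, T)
    mu0 = marginal_mean(beta0, sigma2_0, T)
    return mu1 / mu0

if __name__ == "__main__":
    beta0, beta1 = math.log(1.25), math.log(0.55)  # example inputs
    for T in (60, 4, 2, 1):  # T = 60 approximates "no truncation" here
        print(T, round(marginal_rr(beta0, beta1, 0.10, 0.10, T), 4))
```

With equal variance components, the value at T = 60 should agree with exp(β1) up to numerical error, and the values at smaller T show how far the marginal RR moves away from exp(β1).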

Estimating the intervention effect in the marginal model can proceed with GEE (Liang & Zeger, 1986). Define θ = (γ0, γ1)′ and $A_i = \mathrm{diag}(v_{i1}, \ldots, v_{im_i})$ for i = 1, …, N, where vij is the working variance function. For a given working correlation matrix Ri(ρ) indexed by a common parameter vector ρ, one can define the working variance matrix as $V_{1i} = A_i^{1/2} R_i(\rho) A_i^{1/2}$ (we include a subscript 1 in V1i to help distinguish the working variances in the presence of multiple estimating equations as in Section 3.2) and solve the following GEE for θ:

$$\sum_{i=1}^{N} D_{1i}' V_{1i}^{-1} (Y_i - \mu_i) = 0,$$

where $Y_i = (Y_{i1}, \ldots, Y_{im_i})'$ is the vector of observed counts in cluster i, $\mu_i = (\mu_{i1}, \ldots, \mu_{im_i})'$ is the vector of marginal means, and $D_{1i} = \partial\mu_i/\partial\theta' = (1, X_i) \otimes \mu_i$ for i = 1, …, N. An advantage of the marginal model approach is that the GEE estimator $\hat\theta$ is consistent even when the working variance matrix V1i is incorrectly specified. This is especially important because the mean–variance relationship may be challenging to capture parametrically when the outcomes are subject to right truncation. Further, the robust sandwich variance consistently quantifies the uncertainty of $\hat\theta$ even when both the variance function and correlation matrix are incorrectly specified (Liang & Zeger, 1986). The sandwich variance estimator of $\hat\theta$ can be written as $\hat\Omega_1^{-1}\hat\Omega_0\hat\Omega_1^{-1}$, where $\hat\Omega_1^{-1} = (\sum_{i=1}^{N} \hat D_{1i}' \hat V_{1i}^{-1} \hat D_{1i})^{-1}$ is the model-based variance,

$$\hat\Omega_0 = \sum_{i=1}^{N} \hat D_{1i}' \hat V_{1i}^{-1}\, \widehat{\mathrm{Cov}}(Y_i)\, \hat V_{1i}^{-1} \hat D_{1i}, \tag{8}$$

and $\widehat{\mathrm{Cov}}(Y_i) = (Y_i - \hat\mu_i)(Y_i - \hat\mu_i)'$ is the empirical covariance estimator of Yi. Although the sandwich variance is accurate with a sufficiently large number of clusters (usually no fewer than 40), it may be subject to negative bias and the resulting test may carry an inflated type I error rate if only a small number of clusters are recruited (Turner et al., 2017b). Li and Harhay (2020) considered using the bias-corrected sandwich variance of Kauermann and Carroll (2001) and demonstrated adequate test size in their limited simulations. In Section 4, we will compare several finite-sample corrections to the sandwich variance to make more robust recommendations. Finally, although Li and Harhay (2020) indicated that the marginal model is a valid procedure to analyze truncated counts in CRTs, they have not developed convenient closed-form sample size formulas and instead followed Mwandigha et al. (2020) by proposing a simulation-based procedure. In Section 3, we propose two closed-form sample size procedures based on the marginal analysis of CRTs with truncated counts, which allow investigators to analytically explore a large number of scenarios in the design stage without resorting to intensive simulation-based computations.

3 |. SAMPLE SIZE AND POWER CONSIDERATIONS

Based on the marginal model (5), we are interested in testing the null H0 : γ1 = 0 versus the alternative H1 : γ1 ≠ 0. A two-sided t-test proceeds with the statistic $\hat\gamma_1/\sqrt{\widehat{\mathrm{Var}}(\hat\gamma_1)}$, which approximately follows a t-distribution with N − 2 degrees of freedom under the null. In this test statistic, $\hat\gamma_1$ is the GEE estimate of the log RR, and $\widehat{\mathrm{Var}}(\hat\gamma_1)$ is the associated variance estimate. We have chosen the t-test here due to its robust control of type I error rate in CRTs when the number of clusters is limited; for example, some empirical evidence of the robust small-sample behavior of the t-test in CRTs can be found in Li et al. (2018, 2019), Li (2020), and Teerenstra et al. (2010). Further, the degrees of freedom N − 2 is referred to as the between-within degrees of freedom and has been recommended for both random-effects model and GEE analyses of CRTs (Ford & Westgate, 2017; Li & Redden, 2015a, 2015b; Li et al., 2017). For a prespecified type I error rate ϵ1 and type II error rate ϵ2, the required number of clusters based on the t-test is the smallest N such that

$$N \ge \frac{(t_{N-2,\epsilon_1/2} + t_{N-2,\epsilon_2})^2 \sigma^2}{\Delta^2}, \tag{9}$$

where tN−2,q is the qth quantile of the t distribution with N − 2 degrees of freedom, $\sigma^2 = N\,\mathrm{Var}(\hat\gamma_1)$ is the asymptotic variance of the GEE estimator $\hat\gamma_1$, and Δ = log{μ(1)/μ(0)} is the effect size on the log RR scale. Operationally, a closed-form sample size formula depends on the derivation of the variance σ2. In what follows, we provide the form of the variance under two modeling strategies, distinguished by different choices of the working variance and working correlation structures.
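Because N appears on both sides of Equation (9), the smallest N has to be found by a search. The sketch below does this with the standard library only: the t quantiles are obtained from a hand-rolled CDF (Simpson's rule plus bisection), which is a convenience assumption of this example and not part of the proposed method. All function names are ours, and the power function reflects our reading of what inverting Equation (9) means. The inputs σ² = 0.5 and Δ = log(0.6) are arbitrary example values.

```python
import math

def t_cdf(x, df):
    """Student t CDF: Simpson's rule on [0, |x|] plus symmetry about zero."""
    a = abs(x)
    if a == 0.0:
        return 0.5
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    pdf = lambda u: math.exp(log_c - (df + 1) / 2 * math.log1p(u * u / df))
    steps = 2 * max(100, int(10 * a))  # even number of intervals, width about 0.05 or less
    h = a / steps
    total = pdf(0.0) + pdf(a)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * pdf(k * h)
    area = total * h / 3
    return 0.5 + area if x > 0 else 0.5 - area

def t_upper_quantile(p, df):
    """Quantile of the t distribution for 0.5 <= p < 1, by bracketing and bisection."""
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < p:
        hi *= 2.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def rhs_of_eq9(n, sigma2, delta, eps1=0.05, eps2=0.20):
    """Right-hand side of Equation (9) for a candidate number of clusters n."""
    df = n - 2
    # Upper quantiles are used; by symmetry the squared sum equals the one built
    # from the lower quantiles t_{N-2, eps1/2} and t_{N-2, eps2}.
    t_sum = t_upper_quantile(1 - eps1 / 2, df) + t_upper_quantile(1 - eps2, df)
    return t_sum ** 2 * sigma2 / delta ** 2

def clusters_needed(sigma2, delta, eps1=0.05, eps2=0.20, n_max=2000):
    """Smallest N >= 3 satisfying Equation (9)."""
    for n in range(3, n_max + 1):
        if n >= rhs_of_eq9(n, sigma2, delta, eps1, eps2):
            return n
    raise ValueError("no N <= n_max satisfies Equation (9)")

def predicted_power(n, sigma2, delta, eps1=0.05):
    """Power implied by Equation (9) at a given N (our reading of the inversion)."""
    crit = t_upper_quantile(1 - eps1 / 2, n - 2)
    return t_cdf(abs(delta) * math.sqrt(n / sigma2) - crit, n - 2)

if __name__ == "__main__":
    n = clusters_needed(sigma2=0.5, delta=math.log(0.6))
    print(n, round(predicted_power(n, 0.5, math.log(0.6)), 3))
```

In practice, a statistical library's t quantile function could replace `t_upper_quantile`; the search over N itself would be unchanged.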

3.1 |. Independence working correlation

The first modeling strategy is to simply specify the variance function to be the Poisson variance, namely vij = μij, and use the working independence correlation, $R_i(\rho) = R_i = I_{m_i}$ for i = 1, …, N, where Is is the s × s identity matrix. In this case, the GEE simplifies to

$$\sum_{i=1}^{N} \begin{pmatrix} 1 \\ X_i \end{pmatrix} \sum_{j=1}^{m_i} (Y_{ij} - \mu_{ij}) = 0,$$

which we show in Web Appendix C to be equivalent to a moment-based estimator of the log marginal RR

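$$\hat\gamma_1 = \log\left(\frac{\sum_{i=1}^{N}\sum_{j=1}^{m_i} X_i Y_{ij}}{\sum_{i=1}^{N} X_i m_i}\right) - \log\left(\frac{\sum_{i=1}^{N}\sum_{j=1}^{m_i} (1 - X_i) Y_{ij}}{\sum_{i=1}^{N} (1 - X_i) m_i}\right). \tag{10}$$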
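The moment-based estimator in Equation (10) is simply the difference between the log mean counts in the two arms, pooled over participants. A minimal Python sketch follows; the function name and the data layout (a list of `(arm, counts)` pairs, one per cluster) are illustrative assumptions, and the toy data are made up.

```python
import math

def log_marginal_rr(clusters):
    """Equation (10): clusters is a list of (x, counts), with x in {0, 1} the arm
    indicator and counts the list of outcomes Y_ij in that cluster."""
    total = {0: 0.0, 1: 0.0}   # sum of counts per arm
    size = {0: 0, 1: 0}        # number of participants per arm
    for x, counts in clusters:
        total[x] += sum(counts)
        size[x] += len(counts)
    return math.log(total[1] / size[1]) - math.log(total[0] / size[0])

if __name__ == "__main__":
    toy = [(1, [0, 1, 1]), (1, [0, 0]), (0, [2, 1]), (0, [1, 0, 2, 0])]
    print(round(log_marginal_rr(toy), 4))
```

Note that larger clusters contribute more participants to each arm-level mean, so the estimator weights individuals, not clusters, equally.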

Assuming that the true correlation is exchangeable, and the ICC depends on the intervention arm (Crespi et al., 2009), we can write the true arm-specific exchangeable correlation model as

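$$\tilde R_i(\rho) = \begin{cases} \tilde R_i(\rho^{(0)}) = (1 - \rho^{(0)}) I_{m_i} + \rho^{(0)} J_{m_i} & \text{for } X_i = 0, \\ \tilde R_i(\rho^{(1)}) = (1 - \rho^{(1)}) I_{m_i} + \rho^{(1)} J_{m_i} & \text{for } X_i = 1, \end{cases} \tag{11}$$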

where ρ = (ρ(0), ρ(1))′ includes the ICCs for the control and intervention clusters and i = 1, …, N. Notice that the exchangeable correlation is a common assumption used in CRTs (Turner et al., 2017a), and in the special case where ρ(0) = ρ(1), $\tilde R_i(\rho)$ becomes the usual exchangeable correlation matrix parameterized by a common ICC.

With these assumptions, we show in Web Appendix C that the model-based variance converges

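$$N\Omega_1^{-1} \xrightarrow{p} \frac{\{E(1_{m_i}' R_i^{-1} 1_{m_i})\}^{-1}}{(1 - \pi)\mu^{(0)}} \begin{pmatrix} 1 & -1 \\ -1 & 1 + \{(1 - \pi)/\pi\}(\mu^{(0)}/\mu^{(1)}) \end{pmatrix},$$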

where 1s is the s × 1 vector of one’s, μ(0) = exp(γ0), μ(1) = exp(γ0 + γ1), and the expectation sign is over the cluster size distribution. Under right truncation, the marginal variance is no longer equal to the mean and hence the Poisson working variance is misspecified. We generically denote the true marginal variance in each arm by τ(0) and τ(1), and in Web Appendix C, we show that

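$$N^{-1}\Omega_0 \xrightarrow{p} \{E(1_{m_i}' R_i^{-1} \tilde R_i(\rho^{(0)}) R_i^{-1} 1_{m_i})\}(1 - \pi)\tau^{(0)} G_0 + \{E(1_{m_i}' R_i^{-1} \tilde R_i(\rho^{(1)}) R_i^{-1} 1_{m_i})\}\pi\tau^{(1)} G_1,$$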

where the 2 × 2 matrices are defined as

$$G_0 = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}, \quad G_1 = \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}. \tag{12}$$

Multiplying out the sandwich variance leads to an explicit variance expression. Specifically, we obtain the probability limit of the (2,2)-th element of $N\Omega_1^{-1}\Omega_0\Omega_1^{-1}$ as

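$$[N\Omega_1^{-1}\Omega_0\Omega_1^{-1}]_{(2,2)} \xrightarrow{p} \sigma_{\mathrm{ind}}^2 = \frac{\{\kappa^{(0)}\}^2}{1 - \pi}\,\frac{E[m_i\{1 + (m_i - 1)\rho^{(0)}\}]}{\bar m^2} + \frac{\{\kappa^{(1)}\}^2}{\pi}\,\frac{E[m_i\{1 + (m_i - 1)\rho^{(1)}\}]}{\bar m^2}, \tag{13}$$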

where $\kappa^{(0)} = \sqrt{\tau^{(0)}}/\mu^{(0)}$ and $\kappa^{(1)} = \sqrt{\tau^{(1)}}/\mu^{(1)}$ are the coefficients of variation (CV) of the truncated counts in the control and intervention arms, respectively, and $\bar m = E(m_i)$ is the mean cluster size. Notice that we have used $\sigma_{\mathrm{ind}}^2$ to specifically refer to the expression of σ2 obtained under the working independence assumption. In the absence of right truncation, and if the marginal distribution of Yij is Poisson, we naturally have $\{\kappa^{(0)}\}^2 = 1/\mu^{(0)}$, $\{\kappa^{(1)}\}^2 = 1/\mu^{(1)}$ and Equation (13) resembles the variance expression derived in Wang et al. (2020) (up to a delta-method transformation). Further, when the cluster sizes are all equal and mi = m, the variance (13) simplifies to

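$$\sigma_{\mathrm{ind}}^2 = \frac{\{\kappa^{(0)}\}^2\{1 + (m - 1)\rho^{(0)}\}}{(1 - \pi)m} + \frac{\{\kappa^{(1)}\}^2\{1 + (m - 1)\rho^{(1)}\}}{\pi m}, \tag{14}$$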

which includes the variance derived in Amatya et al. (2013) as a special case when π = 1/2, ρ(0) = ρ(1) and $\{\kappa^{(0)}\}^2 = 1/\mu^{(0)}$, $\{\kappa^{(1)}\}^2 = 1/\mu^{(1)}$. However, expressions (13) and (14) apply more generally to clustered right-truncated counts by allowing for arbitrary mean–variance relationships (e.g., with $\tau^{(0)} \ne \mu^{(0)}$, $\tau^{(1)} \ne \mu^{(1)}$) and arm-specific ICCs. For example, if the count outcome exhibits overdispersion and follows a negative binomial distribution with a quadratic variance function $\tau^{(l)} = \mu^{(l)} + \chi\{\mu^{(l)}\}^2$ for l = 0, 1 (χ is the overdispersion parameter), Equations (13) and (14) characterize a valid asymptotic variance of the independence Poisson GEE fitted to the clustered overdispersed counts.

Variance (13) can be further simplified. Frequently, the moments of the cluster size distribution are more accessible (Rutterford et al., 2015), which we use to reexpress $\sigma_{\mathrm{ind}}^2$ when the cluster sizes are variable. Particularly, for the control arm we write

$$E[m_i\{1 + (m_i - 1)\rho^{(0)}\}] = \bar m\left[1 + \{(1 + \eta^2)\bar m - 1\}\rho^{(0)}\right],$$

where η is the CV of the cluster sizes. A similar expression can also be obtained for the intervention arm, and we can obtain the final expression of the variance by

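$$\sigma_{\mathrm{ind}}^2 = \frac{\{\kappa^{(0)}\}^2}{(1 - \pi)\bar m}\left[1 + \{(1 + \eta^2)\bar m - 1\}\rho^{(0)}\right] + \frac{\{\kappa^{(1)}\}^2}{\pi\bar m}\left[1 + \{(1 + \eta^2)\bar m - 1\}\rho^{(1)}\right]. \tag{15}$$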

Importantly, $1 + \{(1 + \eta^2)\bar m - 1\}\rho^{(0)}$ and $1 + \{(1 + \eta^2)\bar m - 1\}\rho^{(1)}$ carry the same form as the variance inflation factor (VIF) previously derived for the weighted cluster-level analysis of continuous, binary and count outcomes (Eldridge et al., 2006; Kang et al., 2003; Manatunga et al., 2001; Rutterford et al., 2015; Wang et al., 2020). The above analytical expression indicates that the same design effect applies to marginal analysis of clustered truncated counts, under the independence working correlation. The required sample size can then be computed by combining (9) and (15). It is evident that the required sample size increases when either the arm-specific ICC, the CV of the outcome, or the CV of cluster sizes increases.
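The variance in Equation (15) is straightforward to code. The sketch below implements it, together with a version of Equation (13) in which the expectation over cluster sizes is replaced by an average over a supplied list of sizes. The function names and all numeric inputs are illustrative assumptions; the Poisson relation κ² = 1/μ is used only to generate example CVs for the untruncated case.

```python
import math

def var_ind(kappa0, kappa1, rho0, rho1, m_bar, eta=0.0, pi=0.5):
    """Equation (15): variance under working independence with cluster size CV eta."""
    vif0 = 1 + ((1 + eta ** 2) * m_bar - 1) * rho0
    vif1 = 1 + ((1 + eta ** 2) * m_bar - 1) * rho1
    return (kappa0 ** 2 * vif0 / ((1 - pi) * m_bar)
            + kappa1 ** 2 * vif1 / (pi * m_bar))

def var_ind_from_sizes(sizes, kappa0, kappa1, rho0, rho1, pi=0.5):
    """Equation (13), with E[.] replaced by an average over the listed sizes."""
    n = len(sizes)
    m_bar = sum(sizes) / n

    def arm(kappa, rho, share):
        expect = sum(m * (1 + (m - 1) * rho) for m in sizes) / n
        return kappa ** 2 * expect / (share * m_bar ** 2)

    return arm(kappa0, rho0, 1 - pi) + arm(kappa1, rho1, pi)

if __name__ == "__main__":
    # Example: untruncated Poisson margins, so kappa^2 = 1 / mu (illustrative values).
    k0, k1 = 1 / math.sqrt(1.25), 1 / math.sqrt(0.6875)
    for eta in (0.0, 0.3, 0.6, 0.9):
        print(eta, round(var_ind(k0, k1, 0.05, 0.05, m_bar=25, eta=eta), 4))
```

When η is computed from the same list of sizes (using the population variance), the two functions agree exactly, because Equation (15) is an algebraic rewriting of Equation (13) and not an approximation.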

3.2 |. Arm-specific exchangeable working correlation

Although the within-cluster correlation parameters have traditionally been regarded as nuisance parameters in longitudinal studies (Liang & Zeger, 1986), they are of great interest in CRTs because these values are critical input parameters for sample size determination (Eldridge et al., 2009; Murray, 1998). Reporting ICC parameters, such as ρ(0) and ρ(1), and their standard errors has also been recommended in the CONSORT extensions to CRTs (Campbell et al., 2012) because they could elicit a range of plausible design parameters to improve planning of future trials. For this reason, we provide an alternative strategy to explicitly model the arm-specific ICCs with clustered counts subject to right truncation and develop a corresponding sample size procedure.

Define $\xi_i = (\xi_{i1}, \ldots, \xi_{im_i})' = (1 - X_i)\tau^{(0)} 1_{m_i} + X_i\tau^{(1)} 1_{m_i}$ and $\omega_i = (1 - X_i)\rho^{(0)} 1_{m_i(m_i-1)/2} + X_i\rho^{(1)} 1_{m_i(m_i-1)/2}$ as the arm-specific variance and ICC models, where mi is the ith cluster size, i = 1, …, N. Further write $S_i = (S_{i1}, \ldots, S_{im_i})'$ as the mi × 1 vector of squared residuals, with $S_{ij} = (Y_{ij} - \mu_{ij})^2$, and $Z_i = (Z_{i12}, Z_{i13}, \ldots, Z_{i,m_i-1,m_i})'$ as the mi(mi − 1)/2 × 1 vector of pairwise standardized residual products, with $Z_{ijk} = (Y_{ij} - \mu_{ij})(Y_{ik} - \mu_{ik})/\sqrt{\xi_{ij}\xi_{ik}}$. We use the stacked estimating equations (SEE) (Yan & Fine, 2004) to simultaneously estimate the marginal mean, variance, and correlations

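$$\sum_{i=1}^{N} \begin{pmatrix} D_{1i} & 0 & 0 \\ 0 & D_{2i} & 0 \\ 0 & 0 & D_{3i} \end{pmatrix}' \begin{pmatrix} V_{1i} & 0 & 0 \\ 0 & V_{2i} & 0 \\ 0 & 0 & V_{3i} \end{pmatrix}^{-1} \begin{pmatrix} Y_i - \mu_i \\ S_i - \xi_i \\ Z_i - \omega_i \end{pmatrix} = 0, \tag{16}$$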

where D1i is defined in Section 2.2, $D_{2i} = \partial\xi_i/\partial\tau'$, $D_{3i} = \partial\omega_i/\partial\rho'$, τ = (τ(0), τ(1))′, and ρ = (ρ(0), ρ(1))′. Under right truncation, because the marginal variances τ(0) and τ(1) will differ from the marginal means μ(0) and μ(1) in a complex fashion, the central idea of the SEE approach is to model τ(0) and τ(1) nonparametrically through a saturated variance estimating equation, $\sum_{i=1}^{N} D_{2i}' V_{2i}^{-1}(S_i - \xi_i) = 0$. With a consistent estimator of τ(0) and τ(1), we could then estimate the arm-specific ICCs, ρ(0) and ρ(1), through the correlation-estimating equation, $\sum_{i=1}^{N} D_{3i}' V_{3i}^{-1}(Z_i - \omega_i) = 0$. Finally, define $\tilde A_i(\tau) = (1 - X_i)\tau^{(0)} I_{m_i} + X_i\tau^{(1)} I_{m_i}$ for i = 1, …, N; then the working variance matrix $V_{1i} = \tilde A_i(\tau)^{1/2}\tilde R_i(\rho)\tilde A_i(\tau)^{1/2}$ contains the arm-specific variances and ICCs and is considered correctly specified because it respects our assumptions on the truncated distributions. To avoid complexities in modeling higher order moments of Yij and to improve convergence, we further specify V2i and V3i as identity matrices, and therefore the estimators for τ and ρ reduce to the moment-based estimators extending those studied in Liang and Zeger (1986).

It is important to notice that the SEE approach differs from the usual Poisson GEE with an arm-specific exchangeable working correlation. While the SEE assumes a correct working variance, the usual Poisson GEE uses a misspecified working variance vij = μij, leading to biased estimates for ρ(0) and ρ(1). One primary drawback of biased estimates of the ICCs is that they provide misleading information for sample size calculation in future trials. Therefore, we do not consider the Poisson GEE with an arm-specific exchangeable working correlation and focus on the SEE approach instead.

When the true correlation structure is arm-specific exchangeable and follows (11), the asymptotic variance of the SEE estimator $\hat\theta$ becomes the model-based variance $\Omega_1^{-1}$. Explicitly writing out the model-based variance allows us to derive the following expression for the asymptotic variance of the SEE estimator $\hat\gamma_1$. To be specific, in Web Appendix D, we show that

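$$N^{-1}\Omega_1 \xrightarrow{p} \{E(1_{m_i}' \tilde R_i^{-1}(\rho^{(0)}) 1_{m_i})\}(1 - \pi)\frac{\{\mu^{(0)}\}^2}{\tau^{(0)}} G_0 + \{E(1_{m_i}' \tilde R_i^{-1}(\rho^{(1)}) 1_{m_i})\}\pi\frac{\{\mu^{(1)}\}^2}{\tau^{(1)}} G_1,$$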

where the expectation sign is over the cluster size distribution, G0 and G1 are two constant matrices defined in Equation (12). By matrix inversion, we obtain

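$$[N\Omega_1^{-1}]_{(2,2)} \xrightarrow{p} \sigma_{\mathrm{aexch}}^2 = \frac{\{\kappa^{(0)}\}^2}{1 - \pi}\left\{E\left(\frac{m_i}{1 + (m_i - 1)\rho^{(0)}}\right)\right\}^{-1} + \frac{\{\kappa^{(1)}\}^2}{\pi}\left\{E\left(\frac{m_i}{1 + (m_i - 1)\rho^{(1)}}\right)\right\}^{-1}.$$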

Here we have used $\sigma_{\mathrm{aexch}}^2$ to specifically refer to the expression of the asymptotic variance σ2 obtained under the true arm-specific exchangeable working correlation structure. In the special case of equal cluster sizes, mi = m for i = 1, …, N, the above expression reduces to

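$$\sigma_{\mathrm{aexch}}^2 = \frac{\{\kappa^{(0)}\}^2\{1 + (m - 1)\rho^{(0)}\}}{(1 - \pi)m} + \frac{\{\kappa^{(1)}\}^2\{1 + (m - 1)\rho^{(1)}\}}{\pi m} = \sigma_{\mathrm{ind}}^2, \tag{17}$$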

which is identical to the variance (14) derived under the assumption of misspecified variance and correlation structure. This interesting equivalence indicates that there is no asymptotic efficiency loss if the variance is misspecified and the ICC is ignored for estimating γ1, when the cluster sizes are all equal. In fact, this result extends the earlier result of Pan (2001) under the logistic regression for clustered binary outcomes. However, when the cluster sizes are variable, the asymptotic variance $\sigma_{\mathrm{aexch}}^2$ would be smaller than $\sigma_{\mathrm{ind}}^2$, suggesting the need for modeling ICCs in the estimation of the marginal mean parameters.
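The comparison between the two asymptotic variances can be checked numerically. In the sketch below, the expectations over the cluster size distribution in Equation (13) and in the expression for the arm-specific exchangeable variance are replaced by averages over a supplied list of sizes. The function names, the lists of sizes, and the values of κ and ρ are all hypothetical example inputs.

```python
def var_ind_exact(sizes, kappa0, kappa1, rho0, rho1, pi=0.5):
    """Working-independence variance, Equation (13), averaging over the listed sizes."""
    n = len(sizes)
    m_bar = sum(sizes) / n

    def arm(kappa, rho, share):
        expect = sum(m * (1 + (m - 1) * rho) for m in sizes) / n
        return kappa ** 2 * expect / (share * m_bar ** 2)

    return arm(kappa0, rho0, 1 - pi) + arm(kappa1, rho1, pi)

def var_aexch_exact(sizes, kappa0, kappa1, rho0, rho1, pi=0.5):
    """Arm-specific exchangeable variance, averaging m / {1 + (m - 1) rho} over sizes."""
    n = len(sizes)

    def arm(kappa, rho, share):
        expect = sum(m / (1 + (m - 1) * rho) for m in sizes) / n
        return kappa ** 2 / (share * expect)

    return arm(kappa0, rho0, 1 - pi) + arm(kappa1, rho1, pi)

if __name__ == "__main__":
    args = dict(kappa0=0.9, kappa1=1.2, rho0=0.05, rho1=0.10)  # illustrative values
    for sizes in ([25, 25, 25, 25], [5, 15, 30, 50]):
        print(sizes,
              round(var_ind_exact(sizes, **args), 5),
              round(var_aexch_exact(sizes, **args), 5))
```

For the equal-size list the two values coincide, as in Equation (17); for the unequal list the arm-specific exchangeable value comes out smaller, in line with the statement above.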

Similar to Section 3.1, we further approximate the variance expression $\sigma_{\mathrm{aexch}}^2$ using the first two moments of the cluster size distribution. Recall that

$$E\left(\frac{m_i}{1 + (m_i - 1)\rho^{(0)}}\right) = \frac{1}{\rho^{(0)}}\, E\left\{\frac{m_i}{m_i + (1 - \rho^{(0)})/\rho^{(0)}}\right\}.$$

Using the approximation techniques developed in the seminal work of van Breukelen et al. (2007) and Candel and van Breukelen (2010), we can find the second-order approximation of the above expression as

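$$E\left\{\frac{m_i}{m_i + (1 - \rho^{(0)})/\rho^{(0)}}\right\} \approx \left(\frac{\bar m}{\bar m + (1 - \rho^{(0)})/\rho^{(0)}}\right)\left[1 - \frac{\eta^2\bar m(1 - \rho^{(0)})/\rho^{(0)}}{\{\bar m + (1 - \rho^{(0)})/\rho^{(0)}\}^2}\right].$$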

Replicating the same approximation for the expression involving ρ(1), we obtain

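$$\sigma_{\mathrm{aexch}}^2 \approx \frac{\{\kappa^{(0)}\}^2\{1 + (\bar m - 1)\rho^{(0)}\}}{(1 - \pi)\bar m}\left[1 - \frac{\eta^2\bar m\rho^{(0)}(1 - \rho^{(0)})}{\{1 + (\bar m - 1)\rho^{(0)}\}^2}\right]^{-1} + \frac{\{\kappa^{(1)}\}^2\{1 + (\bar m - 1)\rho^{(1)}\}}{\pi\bar m}\left[1 - \frac{\eta^2\bar m\rho^{(1)}(1 - \rho^{(1)})}{\{1 + (\bar m - 1)\rho^{(1)}\}^2}\right]^{-1}, \tag{18}$$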

where η is the CV of cluster sizes. While variance expression (15) suggests that $\sigma_{\mathrm{ind}}^2$ is monotonically increasing in the ICC and CV of cluster sizes, the variance expression (18) suggests a more subtle relationship between these parameters. First, compared to the variance expression under equal cluster sizes (17), the additional arm-specific VIF due to unequal cluster sizes is

$$f(\eta, \rho^{(l)}) = \left[1 - \frac{\eta^2\bar m\rho^{(l)}(1 - \rho^{(l)})}{\{1 + (\bar m - 1)\rho^{(l)}\}^2}\right]^{-1}, \quad l = 0, 1,$$

whose maximum is reached at $\rho^{(l)} = (\bar m + 1)^{-1}$ (van Breukelen et al., 2007). This means that the variance inflation in each arm due to unequal cluster sizes is bounded above by $f(\eta, (\bar m + 1)^{-1})$ and does not further increase even when the ICC exceeds $(\bar m + 1)^{-1}$. Second, the VIF f(η, ρ(l)) heavily depends on the mean cluster size. As the mean cluster size increases, f(η, ρ(l)) approaches one, indicating that unequal cluster sizes should have minimal impact on $\sigma_{\mathrm{aexch}}^2$, and hence the power of the analysis becomes robust to cluster size variation. These observations highlight the potential efficiency advantage in correctly modeling the variance and correlations using the SEE approach. Finally, based on the assumed arm-specific ICCs, the CV of the outcome, mean cluster size, and the CV of cluster sizes, the required sample size can be computed by combining Equations (9) and (18).
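The approximation in Equation (18) and the arm-specific factor f(η, ρ) can be sketched in a few lines of Python. The function names are ours and the inputs are arbitrary example values; the loop at the end simply scans a grid of ICC values to show where f peaks for a mean cluster size of 25.

```python
def vif_unequal(eta, rho, m_bar):
    """f(eta, rho): extra variance inflation due to unequal cluster sizes."""
    shrink = eta ** 2 * m_bar * rho * (1 - rho) / (1 + (m_bar - 1) * rho) ** 2
    return 1.0 / (1.0 - shrink)

def var_aexch_approx(kappa0, kappa1, rho0, rho1, m_bar, eta=0.0, pi=0.5):
    """Equation (18): equal-size variance per arm times the unequal-size factor."""

    def arm(kappa, rho, share):
        equal_size = kappa ** 2 * (1 + (m_bar - 1) * rho) / (share * m_bar)
        return equal_size * vif_unequal(eta, rho, m_bar)

    return arm(kappa0, rho0, 1 - pi) + arm(kappa1, rho1, pi)

if __name__ == "__main__":
    m_bar, eta = 25, 0.6  # illustrative design inputs
    grid = [k / 1000 for k in range(1, 1000)]
    best = max(grid, key=lambda r: vif_unequal(eta, r, m_bar))
    print("grid maximizer:", best, "compare 1/(m_bar + 1) =", round(1 / (m_bar + 1), 4))
    print(round(var_aexch_approx(0.9, 1.2, 0.05, 0.10, m_bar, eta), 5))
```

At ρ = 1/(m̄ + 1) the ratio inside the bracket equals η²/4 whatever the mean cluster size, so the upper bound on f works out to 1/(1 − η²/4); this follows directly from substituting the maximizer into the formula.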

4 |. SIMULATION STUDIES

4.1 |. Simulation design

We carry out a simulation study to investigate the accuracy of the proposed sample size formulas for marginal analysis of CRTs subject to outcome truncation. We assume a balanced allocation and π = 1/2. We simulate clustered count outcomes from the conditional model (1), with two sets of conditional baseline event rate exp(β0) ∈ {1.25, 2.70}. These two choices follow Mwandigha et al. (2020) and Li and Harhay (2020) and represent low to moderate event rates in malaria CRTs. We also consider six levels of variance components parameters,

$$(\sigma_0^2, \sigma_1^2) \in \{(0.05, 0.05), (0.05, 0.10), (0.05, 0.20), (0.10, 0.10), (0.10, 0.20), (0.20, 0.20)\},$$

implying different degrees of clustering within the two arms. A larger value of $\sigma_0^2$ or $\sigma_1^2$ corresponds to a higher value of ICC within the control or treatment arm, and we provide exact conversion formulas to connect the variance components with arm-specific ICCs in Web Appendix B. These values of variance components are chosen such that induced ICCs are contained between 0.01 and 0.2, a range commonly reported in CRTs (Murray & Blitstein, 2003; Murray et al., 2004). Under the null of no marginal intervention effect, we numerically determine the conditional RR exp(β1) used for data generation based on the values of the variance components. Under the alternative, we choose exp(β1) = 0.55 when exp(β0) = 1.25 and exp(β1) = 0.60 when exp(β0) = 2.70, and numerically compute the corresponding marginal RR (the effect size for the marginal model) exp(Δ) = μ(1)/μ(0) by Equations (6) and (7). The effect sizes are chosen following Mwandigha et al. (2020). We assume the mean cluster size $\bar m = 25$ when exp(β0) = 1.25 and $\bar m = 50$ when exp(β0) = 2.70 and consider four levels of cluster size variability with CV η ∈ {0, 0.3, 0.6, 0.9}; these CV values are within the range commonly considered for CRTs, except for CV = 0.9 which is very extreme (Li & Redden, 2015b; van Breukelen et al., 2007). We also consider four levels of right truncation, with truncation points T ∈ {∞, 4, 2, 1} when exp(β0) = 1.25, and T ∈ {∞, 6, 3, 1} when exp(β0) = 2.70. These truncation thresholds reflect the anticipated practice in malaria CRTs (Mwandigha et al., 2020). Collectively, we design a factorial experiment with two levels of baseline event rates, six levels of variance components, four levels of cluster size variability, and four levels of truncation points, totaling 192 scenarios. Importantly, combinations of all simulation parameters ensure the estimated number of clusters to be contained between 10 and 52, a range that closely tracks the interquartile range of sample sizes reported by Ivers et al. (2011) based on a review of about 300 published CRTs.

Throughout, we fix the nominal type I error rate at 5%. For each scenario, we calculate the required number of clusters to achieve at least 80% power using the formulas (9) and (15) (assuming working independence) and formulas (9) and (18) (assuming the true working correlation). The estimated required number of clusters N is then rounded to the nearest even integer above to ensure a balanced allocation. For using these formulas, the true values of marginal event rate μ(0), marginal RR, arm-specific marginal variance (τ(0), τ(1)), arm-specific CV of the outcome (κ(0), κ(1)), and arm-specific ICC (ρ(0), ρ(1)) are computed through the formulas presented in Web Appendix B. These values also critically depend on the right truncation point T and are summarized in Web Tables 1 and 8. The actual predicted power from the formula based on N can be calculated by inverting Equation (9). We then simulate 10,000 datasets using the conditional model (1) for each set of N, $\bar m$, η, (exp(β0), exp(β1)), $(\sigma_0^2, \sigma_1^2)$. With a given CV value η, the variable cluster sizes are simulated using $m_i \sim \mathrm{Gamma}(\eta^{-2}, \bar m\eta^2)$ (shape $\eta^{-2}$ and scale $\bar m\eta^2$, which gives mean $\bar m$ and CV η), and the minimum mi is set to 2 to ensure computational stability. For each simulated dataset, we fit the corresponding independence GEE or SEE approaches introduced in Section 3.1 or 3.2, depending on which formula has been used to calculate N. Under the null hypothesis, we report the empirical type I error rate of the t-test as the proportion of incorrect rejections over the 10,000 simulations. Under the alternative, we report the empirical power of the t-test as the proportion of correct rejections over the 10,000 simulations. The sample size formula is considered accurate if the empirical power agrees well with the predicted power.
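The data-generating steps described above can be sketched as follows. This is an illustration only, not the simulation code used for the study: the function names are ours, rounding the Gamma draws to integers is our assumption (the text only states the Gamma distribution and the minimum size of 2), the truncated Poisson draw uses a simple inverse-CDF lookup that requires a finite T, and the seed and parameter values are arbitrary.

```python
import math
import random

def simulate_cluster_sizes(n_clusters, m_bar, eta, rng, min_size=2):
    """Cluster sizes from Gamma(shape = eta^-2, scale = m_bar * eta^2), floored at min_size."""
    if eta == 0:
        return [int(round(m_bar))] * n_clusters
    shape, scale = eta ** -2, m_bar * eta ** 2
    # Rounding to the nearest integer is an assumption of this sketch.
    return [max(min_size, int(round(rng.gammavariate(shape, scale))))
            for _ in range(n_clusters)]

def draw_trunc_poisson(lam, T, rng):
    """One draw from a Poisson(lam) right truncated at a finite T, by inverse CDF."""
    weights = [lam ** y / math.factorial(y) for y in range(T + 1)]
    u = rng.random() * sum(weights)
    acc = 0.0
    for y, w in enumerate(weights):
        acc += w
        if u <= acc:
            return y
    return T

def simulate_cluster(x, m, beta0, beta1, sigma2_0, sigma2_1, T, rng):
    """Counts for one cluster under the conditional model (1) with truncation at T."""
    sd = math.sqrt(sigma2_1 if x == 1 else sigma2_0)
    b = rng.gauss(0.0, sd)                   # arm-specific random intercept
    lam = math.exp(beta0 + beta1 * x + b)    # cluster-specific untruncated rate
    return [draw_trunc_poisson(lam, T, rng) for _ in range(m)]

if __name__ == "__main__":
    rng = random.Random(2021)  # arbitrary seed
    sizes = simulate_cluster_sizes(20, m_bar=25, eta=0.6, rng=rng)
    arms = [1] * 10 + [0] * 10
    data = [(x, simulate_cluster(x, m, math.log(1.25), math.log(0.55), 0.10, 0.10, 4, rng))
            for x, m in zip(arms, sizes)]
    print(sizes)
    print(data[0])
```

Each simulated dataset has the `(arm, counts)` layout, so it can be passed directly to an estimator of the log marginal RR such as the moment-based one in Equation (10).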

Although both the model-based variance and the sandwich variance can be used to construct the t-test under the SEE approach with the correct working variance and correlation models, only the sandwich variance can be used to construct the t-test under the independence GEE. As we explained earlier, this is because only the sandwich variance is robust to misspecification of the variance and correlation models. Nevertheless, the default sandwich variance may have negative bias when there are only a limited number of clusters (Kauermann & Carroll, 2001), a scenario frequently seen in CRTs. To reduce this negative bias, we consider three popular bias-corrected sandwich variances when constructing the t-test under either working correlation model. The Mancl and DeRouen (MD) bias-corrected variance (Mancl & DeRouen, 2001) and the Kauermann and Carroll (KC) bias-corrected variance (Kauermann & Carroll, 2001) modify Equation (8) with

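$$\widehat{\mathrm{Cov}}_{\mathrm{MD}}(Y_i) = (I - \hat{H}_i)^{-1}(Y_i - \hat{\mu}_i)(Y_i - \hat{\mu}_i)^T(I - \hat{H}_i^T)^{-1},$$
$$\widehat{\mathrm{Cov}}_{\mathrm{KC}}(Y_i) = (I - \hat{H}_i)^{-1/2}(Y_i - \hat{\mu}_i)(Y_i - \hat{\mu}_i)^T(I - \hat{H}_i^T)^{-1/2},$$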

where $\hat{H}_i = \hat{D}_{1i}\hat{\Omega}_1^{-1}\hat{D}_{1i}^T\hat{V}_{1i}^{-1}$ is the cluster leverage matrix. The Fay and Graubard (FG) bias-corrected variance (Fay & Graubard, 2001) modifies Equation (8) with

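$$\hat{\Omega}_0 = \sum_{i=1}^{N} \mathrm{diag}\left\{(1 - \min\{r, [\hat{Q}_i]_{jj}\})^{-1/2}\right\} \hat{D}_{1i}^T \hat{V}_{1i}^{-1}\, \widehat{\mathrm{Cov}}(Y_i)\, \hat{V}_{1i}^{-1} \hat{D}_{1i}\, \mathrm{diag}\left\{(1 - \min\{r, [\hat{Q}_i]_{jj}\})^{-1/2}\right\},$$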

where $\hat{Q}_i = \hat{D}_{1i}^T\hat{V}_{1i}^{-1}\hat{D}_{1i}\hat{\Omega}_1^{-1}$ and r = 0.75 is the default choice to avoid overcorrection. All three sandwich variances utilize a multiplicative bias-correction term to inflate the original sandwich variance and reduce the negative bias. A number of previous reports indicate that the t-test with the KC bias-corrected variance has adequate control of the type I error rate in CRTs with equal cluster sizes (Li et al., 2017, 2018, 2019; Li, 2020; Teerenstra et al., 2010). In CRTs with binary outcomes, Li and Redden (2015b) compared these bias-corrected variances; they recommended the KC bias-corrected variance estimator when the CV of cluster sizes is small and the FG bias-corrected variance estimator otherwise. Ford and Westgate (2017) further considered a hybrid bias-correction that averages the MD and KC standard errors and found that the t-test with this new standard error maintains a valid type I error rate under variable cluster sizes with both continuous and binary outcomes. The adequate type I error performance of this hybrid bias-correction has also been observed in simulations with more complex stepped wedge designs (Ford & Westgate, 2020). We will also examine whether the average MD/KC standard error (abbreviated as AVG) has adequate performance in our simulation scenarios with clustered counts. In general, because the existing empirical evaluations of bias-corrected variances are mainly confined to continuous and binary outcomes, our simulations could help validate those findings with count outcomes and in the presence of right truncation.
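Two of the ingredients above are simple enough to sketch directly: the capped FG multiplier applied to each diagonal element, and the AVG standard error. The snippet below is a minimal illustration only; the function names and the example values of [Q_i]_jj are hypothetical and are not taken from the authors' R program.

```python
def fg_correction_factor(q_jj, r=0.75):
    """Diagonal FG multiplier (1 - min(r, q_jj))^(-1/2) for one parameter."""
    return (1.0 - min(r, q_jj)) ** -0.5

def avg_standard_error(se_md, se_kc):
    """Hybrid (AVG) standard error: simple average of the MD and KC standard errors."""
    return 0.5 * (se_md + se_kc)

# Hypothetical diagonal elements [Q_i]_jj, chosen only for illustration.
for q in (0.05, 0.30, 0.75, 0.95):
    print(q, round(fg_correction_factor(q), 3))

# Once q_jj exceeds r, the multiplier is capped at (1 - 0.75)^(-1/2) = 2.
max_factor = fg_correction_factor(0.999)
```

With r = 0.75, the multiplier never exceeds 2, which is the sense in which this default guards against overcorrection when a cluster has very high leverage.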

4.2 |. Simulation results

Table 1 summarizes the empirical type I error rates for independence GEE analysis and SEE analysis, when the outcomes are simulated from the conditional model with exp(β0) = 1.25, six levels of variance components, mean cluster size m̄ = 25, and equal cluster sizes (η = 0). We consider the empirical type I error rates between 4.5% and 5.5% to be close to nominal according to the margin of error under a binomial model with 10,000 replications. As expected, the results confirm that the t-test with the model-based standard error leads to a grossly inflated type I error rate under working independence. For both working correlation models, the t-test with the Liang and Zeger (LZ) sandwich standard error is often liberal, whereas the t-test with the MD standard error is often conservative. By contrast, t-tests with KC, FG, and AVG standard errors have close to nominal test size when the cluster sizes are all equal. Web Tables 2–4 present the corresponding results when the cluster size CV increases to 0.3, 0.6, and 0.9. Echoing the findings in Li and Redden (2015b) and Ford and Westgate (2017), the t-test with the KC standard error tends to exhibit an inflated type I error rate when the CV becomes larger (η = 0.6 and 0.9). With variable cluster sizes, the t-test with the MD, FG, or AVG standard error generally has better control of the type I error rate. Finally, the FG and AVG standard errors can lead to a slightly liberal test when the CV of cluster sizes becomes extreme, that is, η = 0.9.
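The "close to nominal" range can be checked with a short calculation of the Monte Carlo margin of error for a simulated proportion. This sketch assumes the usual normal approximation to the binomial with a 95% level (z = 1.96); the function name is illustrative.

```python
import math

def monte_carlo_margin(p, n_reps, z=1.96):
    """Half-width of the normal-approximation interval for a simulated proportion."""
    return z * math.sqrt(p * (1.0 - p) / n_reps)

# Margins in percentage points for 10,000 replications.
type1_margin = 100 * monte_carlo_margin(0.05, 10_000)  # nominal 5% test size
power_margin = 100 * monte_carlo_margin(0.80, 10_000)  # nominal 80% power
print(round(type1_margin, 2), round(power_margin, 2))
```

Under these assumptions the margin is about 0.43 percentage points around a 5% error rate, roughly matching the 4.5% to 5.5% range, and about 0.78 percentage points around 80% power, in line with the 0.8% threshold used for the power comparison.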

TABLE 1.

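Empirical type I error rates (%) for GEE and SEE analyses when the mean cluster size m̄ = 25, cluster size CV η = 0, (exp(β0), exp(β1)) = (1.25, 0.55). The empirical type I error rate between 4.5% and 5.5% is considered close to nominal according to the margin of error under a binomial model with 10,000 replications and is highlighted in bold. The first four numeric columns correspond to the independence working correlation (GEE) and the last four to the arm-specific exchangeable working correlation (SEE). N: number of clusters; MB: model-based; LZ: Liang and Zeger; MD: Mancl and DeRouen; KC: Kauermann and Carroll; FG: Fay and Graubard; AVG: average of MD and KC

(σ₀², σ₁²) Variance Independence (GEE) Arm-specific exchangeable (SEE)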
T = ∞ T = 4 T = 2 T = 1 T = ∞ T = 4 T = 2 T = 1
(0.05,0.05) N 12 12 14 22 12 12 14 22
MB 16.4 14.4 9.2 6.2 7.0 6.9 6.5 5.9
LZ 7.0 6.9 6.5 5.9 7.0 6.9 6.5 5.9
MD 3.5 3.3 3.5 3.7 3.5 3.3 3.5 3.7
KC 5.0 4.8 4.8 4.7 5.0 4.8 4.8 4.7
FG 4.3 4.2 4.0 4.2 4.2 4.1 4.1 4.2
AVG 4.2 4.1 4.0 4.1 4.2 4.1 4.0 4.1
(0.05,0.10) N 14 14 18 26 14 14 18 26
MB 22.5 19.4 13.4 8.0 6.8 6.6 6.2 5.8
LZ 6.8 6.6 6.2 5.8 6.8 6.6 6.2 5.8
MD 4.0 3.8 3.8 4.4 4.0 3.8 3.8 4.4
KC 5.1 5.0 5.0 5.0 5.1 5.0 5.0 5.0
FG 4.4 4.3 4.2 4.6 4.7 4.4 4.3 4.6
AVG 4.5 4.3 4.2 4.6 4.5 4.3 4.2 4.6
(0.05,0.20) N 24 22 24 32 24 22 24 32
MB 33.3 29.0 19.3 11.6 7.0 6.4 5.9 5.4
LZ 7.0 6.4 5.9 5.4 7.0 6.4 5.9 5.4
MD 4.9 4.3 4.1 4.1 4.9 4.3 4.1 4.1
KC 5.9 5.3 4.9 4.8 5.9 5.3 4.9 4.8
FG 5.2 4.6 4.3 4.4 5.6 5.0 4.6 4.5
AVG 5.4 4.7 4.4 4.4 5.4 4.7 4.4 4.4
(0.10,0.10) N 16 16 18 28 16 16 18 28
MB 27.8 24.6 16.2 10.1 6.7 6.4 6.2 5.6
LZ 6.7 6.4 6.2 5.6 6.7 6.4 6.2 5.6
MD 4.2 4.0 3.7 4.2 4.2 4.0 3.7 4.2
KC 5.2 5.0 4.8 4.7 5.2 5.0 4.8 4.7
FG 4.6 4.6 4.2 4.4 4.7 4.5 4.2 4.3
AVG 4.7 4.5 4.2 4.3 4.7 4.5 4.2 4.3
(0.10,0.20) N 24 24 26 32 24 24 26 32
MB 36.4 32.5 21.1 13.0 6.6 6.2 5.9 5.6
LZ 6.6 6.2 5.9 5.6 6.6 6.2 5.9 5.6
MD 4.5 4.4 4.3 4.0 4.5 4.4 4.3 4.0
KC 5.6 5.2 5.0 4.9 5.6 5.2 5.0 4.9
FG 4.9 4.7 4.6 4.4 5.2 4.8 4.7 4.5
AVG 5.1 4.7 4.7 4.5 5.1 4.7 4.7 4.5
(0.20,0.20) N 26 26 28 36 26 26 28 36
MB 41.4 36.4 26.4 16.2 6.4 6.0 5.7 5.3
LZ 6.4 6.0 5.7 5.3 6.4 6.0 5.7 5.3
MD 4.6 4.3 4.1 4.0 4.6 4.3 4.1 4.0
KC 5.5 5.0 4.9 4.7 5.5 5.0 4.9 4.7
FG 5.1 4.7 4.6 4.4 5.0 4.6 4.5 4.4
AVG 5.1 4.6 4.6 4.4 5.1 4.6 4.6 4.4

Table 2 summarizes the difference between the empirical power and formula-predicted power for independence GEE analysis and SEE analysis, when the outcomes are simulated from the conditional model with (exp(β0), exp(β1)) = (1.25, 0.55), six levels of variance components, mean cluster size m̄ = 25, and equal cluster sizes. Given the nominal power is at least 80%, we consider the difference between the empirical and predicted power within 0.8% to be accurate. Above all, even though the t-test with the MB or LZ standard errors has higher empirical power than predicted, one should be cautious in interpreting their empirical power as these two tests are liberal under the null. On the other hand, the t-test with the MD standard error is frequently underpowered, which is consistent with previous findings with continuous and binary outcomes (Li et al., 2018; Teerenstra et al., 2010). When the cluster sizes are equal, the t-test with the KC standard error carries empirical power that corresponds best to the formula prediction. In this case, while the t-test with the FG or AVG standard error generally has power close to prediction, occasionally it may be slightly underpowered, especially when the number of clusters is limited. Web Tables 5–7 summarize the corresponding results on power when the cluster size CV increases. When the cluster size CV equals 0.3, the results on power are similar to Table 2. However, when the cluster size CV increases to 0.6 and 0.9, the t-test with the KC standard error can become liberal, while the t-test with the MD standard error remains underpowered; we instead focus on the t-test with the FG or AVG standard error. When the number of clusters is at least 15, these two tests generally have similar empirical power that is no less than the nominal value (with FG performing better on several occasions with a small number of clusters). However, these two tests can become slightly underpowered when the number of clusters is smaller than 15, which arises when σ₀² = σ₁² = 0.05. This observation highlights the inferential challenge with a limited sample size and a high degree of cluster size variability. We have also replicated the simulation study with the conditional baseline event rate exp(β0) = 2.7, conditional RR exp(β1) = 0.6, and mean cluster size m̄ = 50 and found similar conclusions on the empirical performance of the t-tests. Those simulation results are summarized in Web Tables 9–16. Overall, the results on type I error and power suggest the use of the t-test with the KC standard error when the cluster size CV is η = 0 or 0.3. With a larger cluster size variability (η ≥ 0.6), our results favor the use of the t-test with the FG or AVG standard error, as the t-test with the KC standard error can become too liberal.

TABLE 2.

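Difference between empirical power and predicted power (in percentage points) for GEE and SEE analyses when the mean cluster size m̄ = 25, cluster size CV η = 0, (exp(β0), exp(β1)) = (1.25, 0.55). Based on 80% nominal power, difference within 0.8% is considered close to nominal according to the margin of error under a binomial model with 10,000 replications and is highlighted in bold. The first four numeric columns correspond to the independence working correlation (GEE) and the last four to the arm-specific exchangeable working correlation (SEE). N: number of clusters; MB: model-based; LZ: Liang and Zeger; MD: Mancl and DeRouen; KC: Kauermann and Carroll; FG: Fay and Graubard; AVG: average of MD and KC

(σ₀², σ₁²) Variance Independence (GEE) Arm-specific exchangeable (SEE)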
T = ∞ T = 4 T = 2 T = 1 T = ∞ T = 4 T = 2 T = 1
(0.05,0.05) N 12 12 14 22 12 12 14 22
MB 11.2 11.4 11.4 7.1 4.3 4.1 4.2 2.6
LZ 4.3 4.1 4.2 2.6 4.3 4.1 4.2 2.6
MD −4.5 −5.5 −4.7 −2.9 −4.5 −5.5 −4.7 −2.9
KC 0.7 0.2 0.1 0.0 0.7 0.2 0.1 0.0
FG −0.9 −1.5 −1.7 −1.3 −1.2 −1.9 −1.6 −1.0
AVG −1.8 −2.3 −2.1 −1.4 −1.8 −2.3 −2.1 −1.4
(0.05,0.10) N 14 14 18 26 14 14 18 26
MB 15.9 16.2 12.0 9.2 5.0 4.7 3.5 3.2
LZ 5.0 4.7 3.5 3.2 5.0 4.7 3.5 3.2
MD −3.5 −4.0 −2.5 −1.2 −3.5 −4.0 −2.5 −1.2
KC 1.2 0.8 0.8 1.2 1.2 0.8 0.8 1.2
FG 0.6 −1.2 −1.0 0.1 0.3 0.6 0.4 0.5
AVG −1.2 −1.6 −1.0 0.0 −1.2 −1.6 −1.0 0.0
(0.05,0.20) N 24 22 24 32 24 22 24 32
MB 15.4 16.8 14.8 12.1 3.3 3.0 4.0 4.3
LZ 3.3 3.0 4.0 4.3 3.3 3.0 4.0 4.3
MD −1.0 −2.0 0.7 0.8 −1.0 −2.0 0.7 0.8
KC 1.3 0.7 1.7 2.6 1.3 0.7 1.7 2.6
FG 0.2 0.8 0.5 1.6 0.9 0.1 1.2 2.1
AVG 0.2 0.7 0.6 1.7 0.2 0.7 0.6 1.7
(0.10,0.10) N 16 16 18 28 16 16 18 28
MB 14.9 14.7 14.6 9.1 4.2 3.7 3.2 2.4
LZ 4.2 3.7 3.2 2.4 4.2 3.7 3.2 2.4
MD −2.9 −3.9 −3.8 −1.1 −2.9 −3.9 −3.8 −1.1
KC 1.1 0.4 0.0 0.9 1.1 0.4 0.0 0.9
FG 0.2 −1.1 −1.6 0.1 0.6 −1.3 −1.3 0.2
AVG −0.9 −1.8 −1.8 0.2 −0.9 −1.8 −1.8 0.2
(0.10,0.20) N 24 24 26 32 24 24 26 32
MB 16.9 15.7 14.6 13.7 3.5 3.0 3.0 3.6
LZ 3.5 3.0 3.0 3.6 3.5 3.0 3.0 3.6
MD 0.8 −1.7 −0.9 0.2 0.8 −1.7 −0.9 0.2
KC 1.4 0.9 1.0 1.8 1.4 0.9 1.0 1.8
FG 0.3 0.3 0.0 0.8 0.7 0.2 0.5 1.1
AVG 0.2 0.3 0.0 0.8 0.2 0.3 0.0 0.8
(0.20,0.20) N 26 26 28 36 26 26 28 36
MB 16.9 15.6 15.1 13.4 3.1 1.9 2.5 2.0
LZ 3.1 1.9 2.5 2.0 3.1 1.9 2.5 2.0
MD 0.8 −2.3 −1.6 −0.9 0.8 −2.3 −1.6 −0.9
KC 1.4 0.0 0.7 0.5 1.4 0.0 0.7 0.5
FG 0.6 −0.9 0.4 0.2 0.4 −1.0 0.2 0.1
AVG 0.2 −1.3 0.5 0.3 0.2 −1.3 0.5 0.3

Our simulation results also shed light on the impact of the working correlation structure, cluster size variability, and right truncation on the sample size estimate. With equal cluster sizes, Table 1 and Web Table 9 confirm that the sample size estimates are identical regardless of the working correlation structure, across all values of the variance components and right truncation point T, matching our analytical insights in Sections 3.1 and 3.2. However, as the cluster sizes become more variable, modeling the arm-specific ICCs via SEE can provide notable gains in power, leading to a smaller sample size N compared to using the independence working correlation. This efficiency gain becomes most pronounced when the variance components (hence the ICCs) are large and the CV of cluster sizes becomes large. For example, even in the absence of right truncation, the independence GEE can require as many as 20 more clusters compared to the SEE approach in Web Table 16, in the extreme scenario when σ₀² = σ₁² = 0.20. These observations indicate the necessity of modeling the ICCs for estimating the treatment effect parameter when cluster sizes vary. Finally, right truncation affects the required sample size N under both working correlation models. Generally, when the truncation point moves towards 1, the required number of clusters for the marginal analysis first decreases and then increases. With few exceptions, right truncation at T = 1 has the largest attenuating effect on the power and can require 10 or more additional clusters to achieve the same level of power compared to no right truncation, when either correlation model is considered in the analysis.

5 |. NUMERICAL ILLUSTRATION

5.1 |. Impact of right truncation on power

The closed-form sample size formulas provide an efficient approach to explore the impact of right truncation on power in CRTs with count outcomes. For illustration, we consider expanding the results in Li and Harhay (2020), where the empirical power of the GEE t-test is studied using the simulation-based approach. While the simulation-based approach could be time consuming, our formulas allow fast exploration of the power under a wide range of scenarios.

Table 3 presents the predicted power of GEE analyses of CRTs with count outcomes with and without truncation, under 10 combinations of conditional model parameters, (exp(β0), exp(β1)) and (σ₀², σ₁²), and sample size, N and m̄, assuming equal cluster sizes (η = 0). In this case, the power does not depend on the choice of the working correlation. When the ICC values are small (e.g., σ₀² = σ₁² = 0.05) and the number of clusters is small, the power monotonically decreases as the right truncation point moves towards 1. However, with a large degree of clustering (e.g., σ₀², σ₁² ≥ 0.10), the power first increases and then decreases as the right truncation point moves towards 1. Such a nonmonotone relationship has also been found via simulations in Li and Harhay (2020). In Web Table 17, we further obtain the marginal RR and the asymptotic variance of the GEE estimator under each scenario. As T changes from ∞ to 6, we observe that the asymptotic variance decreases at a faster rate than the marginal RR moves towards the null. But when T moves further towards 1, the marginal RR approaches the null much faster than the variance shrinks. Such interesting dynamics explain why the power first slightly increases before it decreases. In addition, the power of the GEE analysis is most sensitive to right truncation when the number of clusters is relatively limited, say N ≤ 30, in which case the power under T = 1 (truncating the count outcome as a binary outcome) could be less than half of that under T = ∞ (no truncation). By contrast, with 90 or more clusters, right truncation leads to no more than a 10% decrease in power. These observations indicate the necessity to account for right truncation in trial planning.
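The drift of the RR towards the null under truncation can be illustrated with a deliberately simplified calculation. The sketch below assumes the truncated outcome is min(Y, T) with Y following a plain Poisson distribution and no cluster random effects, so it will not reproduce the marginal RRs in Web Table 17, which come from the conversion formulas in Web Appendix B. The rates and function names are hypothetical.

```python
import math

def poisson_pmf(k, lam):
    """P(Y = k) for Y ~ Poisson(lam)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def truncated_mean(lam, T=None):
    """E[min(Y, T)] for Y ~ Poisson(lam); T=None means no truncation."""
    if T is None:
        return lam
    # Tail-sum identity: E[min(Y, T)] = sum_{k=0}^{T-1} P(Y > k)
    cdf, total = 0.0, 0.0
    for k in range(T):
        cdf += poisson_pmf(k, lam)
        total += 1.0 - cdf
    return total

control_rate, rate_ratio = 0.9, 0.7  # hypothetical values, no random effects
ratios = {}
for T in (None, 6, 4, 2, 1):
    treated_mean = truncated_mean(control_rate * rate_ratio, T)
    ratios[T] = treated_mean / truncated_mean(control_rate, T)
    print(T, round(ratios[T], 3))
```

In this toy setting the ratio of truncated means rises steadily from 0.70 towards 1 as T decreases, because truncation removes more events from the arm with the higher rate. Power also depends on how the variance changes with T, which this sketch does not model.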

TABLE 3.

Predicted power (%) of the GEE analyses with the independence or arm-specific exchangeable working correlation (SEE) when the cluster sizes are all equal (CV η = 0)

(exp(β0), exp(β1)) (σ₀², σ₁²) N m̄ T = ∞ T = 6 T = 5 T = 4 T = 3 T = 2 T = 1
(1.25, 0.70) (0.05, 0.05) 30 15 79.7 79.6 79.1 77.7 73.3 61.9 37.1
(0.10, 0.10) 30 45 75.9 76.1 76.2 76.0 74.7 70.5 56.9
(0.20, 0.20) 60 35 78.3 79.2 79.5 79.7 79.2 76.5 66.6
(0.30, 0.30) 90 40 78.7 80.7 81.3 81.8 81.8 80.5 74.3
(0.40, 0.40) 110 40 73.8 77.4 78.3 79.0 79.3 78.4 73.5
(2.70, 0.70) (0.05, 0.05) 25 10 79.1 76.8 73.6 67.4 56.6 40.0 20.4
(0.10, 0.10) 30 30 78.0 78.4 77.5 75.6 71.6 63.3 44.4
(0.20, 0.20) 55 20 75.3 77.2 76.6 75.1 71.9 65.2 48.7
(0.30, 0.30) 80 30 74.6 78.7 78.6 78.0 76.4 72.9 62.8
(0.40, 0.40) 110 25 74.3 80.1 80.1 79.6 78.2 75.1 65.8

Analogous to Table 3, Tables 4 and 5 present the predicted power of the GEE analyses with the independence working correlation and SEE analyses with the arm-specific exchangeable working correlation when the CV of cluster sizes is η = 0.6. When the cluster sizes are variable, our analytical derivations and simulations suggest that the power under the arm-specific exchangeable working correlation is usually no smaller than that under working independence. Tables 4 and 5 further imply that their difference in power could be over 10% when the number of clusters and the ICCs become larger, both with and without right truncation. Finally, the same nonmonotone relationship between power and truncation point is observed when the cluster sizes are variable. Web Table 18 presents the marginal RR and the asymptotic variance of the GEE and SEE estimators under each scenario. We still observe different rates of change for the marginal RR and asymptotic variance under either correlation structure as T moves towards 1, which underlies the nonmonotone relationship between T and power.

TABLE 4.

Predicted power (%) of the GEE analyses with the independence working correlation when the CV of cluster sizes is η = 0.6

(exp(β0), exp(β1)) (σ₀², σ₁²) N m̄ T = ∞ T = 6 T = 5 T = 4 T = 3 T = 2 T = 1
(1.25, 0.70) (0.05, 0.05) 30 15 73.4 73.4 73.0 71.7 67.7 57.5 35.4
(0.10, 0.10) 30 45 64.3 64.6 64.8 64.8 63.8 60.5 49.7
(0.20, 0.20) 60 35 66.3 67.3 67.8 68.2 67.9 65.8 57.8
(0.30, 0.30) 90 40 66.2 68.5 69.3 70.0 70.2 69.2 64.1
(0.40, 0.40) 110 40 60.9 64.8 65.7 66.6 67.1 66.6 62.6
(2.70, 0.70) (0.05, 0.05) 25 10 71.6 69.7 66.8 61.3 51.9 37.5 19.8
(0.10, 0.10) 30 30 66.1 66.9 66.2 64.5 61.2 54.6 39.7
(0.20, 0.20) 55 20 62.9 65.3 64.8 63.6 61.1 55.8 43.0
(0.30, 0.30) 80 30 61.6 66.3 66.3 65.8 64.5 61.6 53.8
(0.40, 0.40) 110 25 61.3 67.8 67.8 67.5 66.3 63.7 56.4

TABLE 5.

Predicted power (%) of the SEE analyses with the arm-specific exchangeable working correlation when the CV of cluster sizes is η = 0.6

(exp(β0), exp(β1)) (σ₀², σ₁²) N m̄ T = ∞ T = 6 T = 5 T = 4 T = 3 T = 2 T = 1
(1.25, 0.70) (0.05, 0.05) 30 15 75.9 75.8 75.3 73.8 69.4 58.5 35.6
(0.10, 0.10) 30 45 73.7 73.9 74.0 73.6 72.0 67.1 52.9
(0.20, 0.20) 60 35 76.9 77.7 78.1 78.1 77.2 73.9 62.7
(0.30, 0.30) 90 40 77.9 79.8 80.4 80.8 80.6 78.8 71.2
(0.40, 0.40) 110 40 73.3 76.8 77.5 78.2 78.3 77.0 70.8
(2.70, 0.70) (0.05, 0.05) 25 10 75.3 72.8 69.5 63.3 53.1 37.9 19.9
(0.10, 0.10) 30 30 76.4 76.5 75.4 73.0 68.4 59.4 41.1
(0.20, 0.20) 55 20 74.2 75.6 74.8 72.9 69.1 61.6 45.2
(0.30, 0.30) 80 30 74.1 77.9 77.7 76.8 74.9 70.5 59.1
(0.40, 0.40) 110 25 73.8 79.4 79.2 78.5 76.8 72.9 62.2

5.2 |. Illustrative sample size calculation

We demonstrate how the required sample size could be efficiently computed in the context of the malaria control CRT discussed in Mwandigha et al. (2020). The CRT aims to investigate the efficacy of a new vector control tool, the Attractive Targeted Sugar Baits (ATSB), for reducing malaria prevalence and clinical incidence. A nested cohort of children is recruited within each participating village, and their malaria infection is cleared at baseline. These children will be followed using active case detection at 1-month intervals. To avoid cohort fatigue and prevent dropout, we consider the multiple-cohort design in Mwandigha et al. (2020), where three cohorts of children are recruited over three equally spaced waves and each cohort will be followed for 4 months. The total study duration is assumed to be 12 months. We further assume the study could afford to recruit around 10 children for each cohort in each village, and so the average cluster size becomes m̄ = 10 × 3 = 30.

We first consider that the study can recruit exactly 30 children over three cohorts in each village (η = 0). In this case, the required number of villages for GEE analyses is identical regardless of the working correlation specification. Assuming the number of episodes for each child follows the conditional model (1), with a common offset log(L) = log(4/12), conditional baseline event rate exp(β0) = 2.70, conditional RR in the absence of truncation exp(β1) = 0.70, variance components σ₀² = σ₁² = 0.1 (corresponding to marginal RR exp(Δ) = μ(1)/μ(0) = 0.70 and ICCs ρ(0) = 0.09 and ρ(1) = 0.06 in the absence of truncation, based on the conversion formulas in Web Appendix B), the marginal analysis requires at least N = 39 villages to achieve 80% power in the absence of right truncation. However, each recruited child participant is followed for a maximum of 4 months, and it is reasonable in practice to assume that a maximum of two episodes per child can be detected during his/her follow-up time (White et al., 2015). Accounting for right truncation at T = 2, the marginal RR becomes exp(Δ) = 0.76 and five additional villages are required (N = 44) to compensate for the power loss due to truncation and maintain the 80% power.

Figure 1 further explores the sensitivity of sample size estimates N to the truncation point T and cluster size variability η. The effect size in marginal RR moves towards the null and becomes exp(Δ) = {0.71, 0.73, 0.76, 0.82} when T = {4, 3, 2, 1}. When the outcomes are right truncated at T = 2, larger cluster size variability inflates the required number of villages. For example, using the independence working correlation, the study requires 47, 53, and 64 villages to achieve 80% power, when the CV of cluster sizes is 0.3, 0.6, and 0.9, respectively. In contrast, using the true arm-specific exchangeable working correlation, the study requires 45, 48, and 54 villages to achieve 80% power, when the CV of cluster sizes is 0.3, 0.6, and 0.9, respectively. With the largest degree of cluster size variability (η = 0.9), modeling the true correlation structure could reduce the required number of villages by 10 and lead to a notable saving of financial and logistical resources. Finally, right truncation at T = 4 (corresponding to assuming a maximum of 12 episode detections per year) appears to have a negligible impact on the required number of villages, regardless of the working correlation and cluster size variability. In contrast, the required number of villages will be inflated most dramatically if the right truncation engenders binary outcomes (T = 1; the outcome becomes a dichotomous incidence or presence of disease). For example, more than 20 additional villages are required to maintain 80% power under T = 1 compared to that without truncation, when the arm-specific exchangeable working correlation is used in the SEE analysis.

FIGURE 1.


Estimated required number of villages to achieve 80% power in the illustrative ATSB CRT, under truncation points T ∈ {∞, 4, 3, 2, 1} and CV of cluster sizes η ∈ {0, 0.3, 0.6, 0.9}. T = ∞ indicates the absence of right truncation. Panel (a) corresponds to GEE analysis assuming the independence working correlation; Panel (b) corresponds to SEE analysis assuming the arm-specific exchangeable working correlation

6 |. DISCUSSION

In this article, we have developed new sample size formulas for marginal analyses of CRTs with count outcomes subject to right truncation. The two sample size formulas correspond to two choices of the working correlation model: the independence working correlation structure and the arm-specific exchangeable working correlation structure; the latter separately specifies an ICC for each treatment arm and is considered to be the true correlation model. Because right truncation affects the mean–variance relationship of the outcome, we introduced the CV of the outcome, κ(0) and κ(1), as two key parameters to account for the impact of right truncation. Specifically, in the absence of right truncation, the mean–variance relationship under the Poisson model holds, that is, κ(0) = μ(0)^(−1/2), κ(1) = μ(1)^(−1/2), and therefore the sample size formula could reduce to the usual formula developed for clustered counts without truncation (by further assuming the same ICC in two treatment arms); see, for example, Amatya et al. (2013), Liu and Colditz (2018), and Wang et al. (2020). In the presence of right truncation, the values of κ(0) and κ(1) become a function of the truncation point T and are required input for sample size calculation. Assuming a log-linear conditional model with arm-specific variance components, we provide a set of conversion formulas in Web Appendix B to compute the values for κ(0), κ(1), as well as ρ(0), ρ(1). In the design stage, one could first elicit variance component parameters based on the conditional model and then use the conversion formulas to characterize κ(0), κ(1), and ρ(0), ρ(1) for prespecified T, thus determining the required sample size for marginal analysis. Overall, our approach enables fast explorations of a large number of parameter combinations in the design stage without resorting to the computationally intensive simulation approach.
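To make the role of the outcome CV concrete, the following worked equations show why the CV of an untruncated Poisson count equals μ^(−1/2), and how the moments change once the count is truncated. This is a sketch for a single Poisson variable Y with mean μ, taking the truncated outcome to be Y* = min(Y, T) and ignoring the cluster random effects; the marginal quantities used in the sample size formulas come from Web Appendix B and are not derived here.

```latex
% Untruncated Poisson: variance equals the mean
\kappa = \frac{\sqrt{\operatorname{Var}(Y)}}{E(Y)} = \frac{\sqrt{\mu}}{\mu} = \mu^{-1/2},
\qquad Y \sim \operatorname{Poisson}(\mu).

% Truncated outcome Y^* = \min(Y, T): moments via tail sums
E(Y^*) = \sum_{k=0}^{T-1} P(Y > k), \qquad
E(Y^{*2}) = \sum_{k=0}^{T-1} (2k+1)\, P(Y > k), \qquad
\kappa_T = \frac{\sqrt{E(Y^{*2}) - E(Y^*)^2}}{E(Y^*)}.

% Special case T = 1: Y^* is Bernoulli with p = 1 - e^{-\mu}
\kappa_1 = \sqrt{\frac{1-p}{p}} = \sqrt{\frac{e^{-\mu}}{1 - e^{-\mu}}}.
```

The second-moment identity uses the fact that the sum of the first n odd numbers is n². Once T is finite, the variance no longer equals the mean, which is why the CV must be supplied as a separate design input.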

While we have shown that the required sample size is identical regardless of the choice of working correlation when the cluster sizes are equal, the required sample size could be different when the cluster sizes are unequal. To account for variable cluster sizes, we provide approximate sample size formulas as a function of the CV of cluster sizes. Under the independence working correlation, the arm-specific VIF extends the VIF studied for cluster-level analysis of CRTs (Eldridge et al., 2006; Kang et al., 2003; Manatunga et al., 2001; Wang et al., 2020). Under the arm-specific exchangeable working correlation, the arm-specific VIF instead extends the VIF previously studied for random-effects analysis of CRTs (Candel & van Breukelen, 2010; van Breukelen et al., 2007). In this latter case, our derivation also extends those in Liu and Colditz (2018), who have assumed the same correlation between arms and the absence of outcome truncation. We have specifically connected these previous results on VIF under the same marginal modeling framework but allowing for different specifications of the working correlation model. In particular, our numerical studies found that the sample size estimates under the arm-specific exchangeable working correlation are generally less sensitive to variable cluster sizes than those under the independence working correlation. This observation highlights the importance of exploiting the within-cluster correlations in estimating the treatment effect in CRTs. On the other hand, estimating within-cluster correlations also respects the recommendation in the CONSORT extension to CRTs (Campbell et al., 2012), as these values are helpful in informing the design of future studies.

We have carried out an extensive simulation study to examine the accuracy of the proposed sample size formulas. We confirm that the t-test with the usual sandwich variance often carries a liberal test size, even when the number of clusters is over 50 (Web Table 12). The t-tests with several bias-corrected variances were found to have improved performance in terms of both type I error rate and power. Specifically, the t-test with the KC standard error may be favored when the CV of cluster sizes is 0 or 0.3. This recommendation is consistent with Li and Redden (2015b). On the other hand, our results favor the t-test with the FG or the average MD/KC standard error when the CV of cluster sizes is at least 0.6, a finding consistent with both Li and Redden (2015b) and Ford and Westgate (2017). In addition, the type I error rate and power performance of the bias-corrected t-tests are generally similar under both the independence and arm-specific exchangeable working correlations. Nevertheless, our results indicate that the required sample size could be much smaller under the arm-specific exchangeable working correlation. Of note, these bias-corrected GEE t-tests are now standard in SAS GLIMMIX, the R package geesmv, and a recent Stata module xtgeebcv (Gallis et al., 2020), at least for commonly used working correlation models. With the arm-specific exchangeable working correlation model, we developed our own R program to implement the SEE approach and the associated bias-corrected variances. Source code to implement the estimating equations procedures and to reproduce the simulation results is available as Supporting Information on the journal's web page http://onlinelibrary.wiley.com/doi/10.1002/bimj.202000230/suppinfo.

While our main derivation assumes the absence of missing outcome, missing outcome (e.g., due to dropout) is common in both individually randomized trials and CRTs. In Web Appendixes E and F, we provide a set of new variance expressions under the independence and arm-specific exchangeable working correlation structures to accommodate missing outcomes. For the purpose of design calculations, we assume the outcome is missing completely at random. This simple condition ensures the validity of complete-case GEE and SEE analyses (a complete case is contributed by a participant who has continued through the end of the study) and allows us to present closed-form variance expressions for sample size estimation. In Web Appendixes E and F, we identified two messages. First, the new sample size formula can capture the variance inflation due to missing outcomes with assumptions on the marginal missingness proportion and the ICC of the missingness indicator (Turner et al., 2020). When the missingness is independent across individuals, assuming only the marginal missingness proportion is sufficient to apply those formulas. Second, we show that inflating the sample size by the inverse of one minus missingness rate provides a conservative sample size estimate under the independence working correlation. However, if the arm-specific exchangeable correlation structure is adopted, this finding holds asymptotically when the mean cluster size approaches infinity. These insights could potentially provide guidance on the number of additional participants required to ensure adequate power. On the other hand, we acknowledge that a complete-case analysis may not be the most efficient as it ignores participants with partial follow-up information before dropping out. A more efficient approach could include the participant-specific follow-up time Lij in model (5), which in general does not permit the analytical derivation of the treatment effect variance. 
Therefore, simulation-based power calculation may be considered assuming a specific distribution of Lij as a result of dropout. In any case, trial implementation methodologies to prevent dropout should also be emphasized in the design. For example, in Section 5.2, a multiple-cohort design was considered to prevent dropout as in Mwandigha et al. (2020).
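The simple inflation rule discussed above (dividing by one minus the missingness rate) can be sketched as follows. The 10% missingness rate is hypothetical, the function names are illustrative, and applying the rule to the number of villages rather than to the number of participants per village is an assumption of this sketch. The starting value of 44 villages is the T = 2 requirement from Section 5.2.

```python
import math

def inflate_for_missingness(n_required, missing_rate):
    """Inflate a sample size by 1 / (1 - missing_rate), rounding up."""
    if not 0.0 <= missing_rate < 1.0:
        raise ValueError("missing_rate must be in [0, 1)")
    return math.ceil(n_required / (1.0 - missing_rate))

def round_up_to_even(n):
    """Round up to the next even integer for a balanced two-arm allocation."""
    return n if n % 2 == 0 else n + 1

n_complete = 44      # villages required at T = 2 with equal cluster sizes (Section 5.2)
missing_rate = 0.10  # hypothetical marginal missingness proportion
n_inflated = round_up_to_even(inflate_for_missingness(n_complete, missing_rate))
print(n_inflated)
```

Rounding up to an even number mirrors the balanced allocation used in the simulation study. As noted in the text, this rule is conservative under working independence and holds only asymptotically in the mean cluster size under the arm-specific exchangeable structure.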

The current investigation has several limitations. First, the sample size methodology corresponds to an intervention effect in terms of the RR. In the presence of right truncation, one may alternatively prespecify a marginal logistic regression for repeated binary outcomes to estimate the OR, which necessitates different sample size considerations. Repeated measurements in CRTs pose additional modeling challenges because a more complex correlation structure may be needed to accommodate both the conventional ICC (such as ρ(0) and ρ(1) in our setting) and the serial correlation between repeated binary outcomes. Similar multilevel correlation structures have been introduced for stepped wedge designs (Li, 2020; Li et al., 2018, 2020) and will be used to obtain a sample size formula for logistic regression in our future work. In any case, the choice of effect measure (RR vs. OR) depends on the scientific question and should be prespecified at the design stage. Second, while our sample size formulas are quite general in accommodating arbitrary mean–variance relationships, we have, for simplicity, focused on a Poisson random-effects model in our data-generating process for the numerical evaluations in Sections 4 and 5. An alternative data-generating process based on a negative binomial random-effects model can also be considered, but it requires a different set of conversion formulas (e.g., for μ(1) and μ(0)) from those derived in Web Appendix B. This extension will be considered in our future work. Finally, the proposed sample size formulas are only applicable to cross-sectional parallel CRTs. As there is increasing interest in alternative cluster-randomized designs, such as the cluster-randomized crossover design (Li et al., 2019) and the stepped wedge design (Grayling et al., 2018; Li et al., 2018), it would be valuable to develop corresponding sample size formulas that account for right truncation in these designs. For example, one may extend the sample size formulas in Li et al. (2018) to Poisson outcomes without right truncation and then similarly introduce the CV of the outcome to generalize those formulas in the presence of right truncation. However, a particular complication is that the variance expression may not be available in closed form because the treatment indicator now varies within a cluster, and therefore sample size estimation would proceed by numerically inverting the covariance or correlation matrix. In our future work, we plan to investigate the impact of right truncation and the corresponding sample size requirements in these more complex designs.
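To illustrate why numerical inversion may be required, recall the generic model-based variance of a GEE estimator (Liang & Zeger, 1986). The notation below is assumed for illustration and is not taken from the derivations in this article: I is the number of clusters, θ is the vector of marginal mean parameters with treatment effect component β, D_i is the derivative of the cluster mean vector μ_i with respect to θ, A_i is the diagonal matrix of marginal variance functions, R_i is the working correlation matrix, and φ is a dispersion parameter.

```latex
\operatorname{var}(\hat{\theta}) \approx
  \left( \sum_{i=1}^{I} D_i^{\top} V_i^{-1} D_i \right)^{-1},
\qquad
V_i = \phi \, A_i^{1/2} R_i A_i^{1/2},
\qquad
\text{power} \approx
  \Phi\!\left( \frac{|\beta|}{\sqrt{\operatorname{var}(\hat{\beta})}} - z_{1-\alpha/2} \right).
```

When the treatment indicator is constant within a cluster and R_i is exchangeable, the inverse of V_i has a simple explicit form, which is what typically makes a closed-form variance attainable. When treatment varies within a cluster, the entries of D_i and A_i differ across observations in the same cluster, so the sum would generally be evaluated numerically for each candidate design.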

Supplementary Material

Supplement

ACKNOWLEDGEMENTS

This work is partially supported by the Clinical and Translational Science Awards Grant Number UL1 TR000142 from the National Center for Advancing Translational Science, a component of the National Institutes of Health. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. We thank the editor, associate editor, and three anonymous reviewers for their constructive comments and suggestions, which greatly improved the exposition of this work.

Funding information

National Center for Advancing Translational Sciences, Grant/Award Number: UL1TR000142

Footnotes

CONFLICT OF INTEREST

The authors have declared no conflict of interest.

OPEN RESEARCH BADGES

This article has earned an Open Data badge for making publicly available the digitally-shareable data necessary to reproduce the reported results. The data is available in the Supporting Information section.

This article has earned an open data badge “Reproducible Research” for making publicly available the code necessary to reproduce the reported results. The results reported in this article could fully be reproduced.

SUPPORTING INFORMATION

Additional supporting information may be found online in the Supporting Information section at the end of the article.

REFERENCES

  1. Amatya A, Bhaumik D, & Gibbons RD (2013). Sample size determination for clustered count data. Statistics in Medicine, 32, 4162–4179.
  2. Bijker EM, & Sauerwein RW (2012). Enhancement of naturally acquired immunity against malaria by drug use. Journal of Medical Microbiology, 61, 904–910.
  3. Breslow NE, & Clayton DG (1993). Approximate inference in generalized linear mixed models. Journal of the American Statistical Association, 88, 9.
  4. Cairns M, Roca-Feltrer A, Garske T, Wilson AL, Diallo D, Milligan PJ, Ghani AC, & Greenwood BM (2012). Estimating the potential public health impact of seasonal malaria chemoprevention in African children. Nature Communications, 3, 1–9.
  5. Campbell MK, Piaggio G, Elbourne DR, & Altman DG (2012). CONSORT 2010 statement: Extension to cluster randomised trials. BMJ, 345, 1–21.
  6. Candel MJ, & van Breukelen GJ (2010). Sample size adjustments for varying cluster sizes in cluster randomized trials with binary outcomes analyzed with second-order PQL mixed logistic regression. Statistics in Medicine, 29, 1488–1501.
  7. Crespi CM, Wong WK, & Mishra SI (2009). Using second-order generalized estimating equations to model heterogeneous intraclass correlation in cluster-randomized trials. Statistics in Medicine, 28, 814–827.
  8. Donner A, & Klar N (2000). Design and analysis of group-randomized trials in health research. Oxford University Press.
  9. Eldridge SM, Ashby D, & Kerry S (2006). Sample size for cluster randomized trials: Effect of coefficient of variation of cluster size and analysis method. International Journal of Epidemiology, 35, 1292–1300.
  10. Eldridge SM, Ukoumunne OC, & Carlin JB (2009). The intra-cluster correlation coefficient in cluster randomized trials: A review of definitions. International Statistical Review, 77, 378–394.
  11. Fay MP, & Graubard BI (2001). Small-sample adjustments for Wald-type tests using sandwich estimators. Biometrics, 57, 1198–1206.
  12. Ford WP, & Westgate PM (2017). Improved standard error estimator for maintaining the validity of inference in cluster randomized trials with a small number of clusters. Biometrical Journal, 59, 478–495.
  13. Ford WP, & Westgate PM (2020). Maintaining the validity of inference in small-sample stepped wedge cluster randomized trials with binary outcomes when using generalized estimating equations. Statistics in Medicine, 39, 2779–2792.
  14. Foy BD, Alout H, Seaman JA, Rao S, Magalhaes T, Wade M, Parikh S, Soma DD, Sagna AB, Fournet F, Slater HC, Bougma R, Drabo F, Diabaté A, Coulidiaty AGV, Rouamba N, & Dabiré RK (2019). Efficacy and risk of harms of repeat ivermectin mass drug administrations for control of malaria (RIMDAMAL): A cluster-randomised trial. The Lancet, 393, 1517–1526.
  15. Gallis JA, Li F, & Turner EL (2020). xtgeebcv: A command for bias-corrected sandwich variance estimation for GEE analyses of cluster randomized trials. The Stata Journal, 20, 363–381.
  16. Grayling MJ, Mander AP, & Wason JM (2018). Blinded and unblinded sample size reestimation procedures for stepped-wedge cluster randomized trials. Biometrical Journal, 60, 903–916.
  17. Halliday KE, Okello G, Turner EL, Njagi K, Mcharo C, Kengo J, Allen E, Dubeck MM, Jukes MC, & Brooker SJ (2014). Impact of intermittent screening and treatment for malaria among school children in Kenya: A cluster randomised trial. PLoS Medicine, 11, 1–16.
  18. Hayes RJ, & Moulton LH (2009). Cluster randomised trials. Taylor & Francis Group, LLC.
  19. Ivers NM, Taljaard M, Dixon S, Bennett C, McRae A, Taleban J, Skea Z, Brehaut JC, Boruch RF, Eccles MP, Grimshaw JM, Weijer C, Zwarenstein M, & Donner A (2011). Impact of CONSORT extension for cluster randomised trials on quality of reporting and study methodology: Review of random sample of 300 trials, 2000–8. BMJ, 343, 1–14.
  20. Kang S-H, Ahn C, & Jung S-H (2003). Sample size calculation for dichotomous outcomes in cluster randomization trials with varying cluster size. Drug Information Journal, 37, 109–114.
  21. Kauermann G, & Carroll R (2001). A note on the efficiency of sandwich covariance matrix estimation. Journal of the American Statistical Association, 96, 1387–1396.
  22. Landau S, & Stahl D (2012). Sample size and power calculations for medical studies by simulation when closed form expressions are not available. Statistical Methods in Medical Research, 22, 324–345.
  23. Li D, Zhang S, & Cao J (2019). Sample size calculation for clinical trials with correlated count measurements based on the negative binomial distribution. Statistics in Medicine, 38, 5413–5427.
  24. Li F (2020). Design and analysis considerations for cohort stepped wedge cluster randomized trials with a decay correlation structure. Statistics in Medicine, 39, 438–455.
  25. Li F, Forbes AB, Turner EL, & Preisser JS (2019). Power and sample size requirements for GEE analyses of cluster randomized crossover trials. Statistics in Medicine, 38, 636–649.
  26. Li F, & Harhay MO (2020). Commentary: Right truncation in cluster randomized trials can attenuate the power of a marginal analysis. International Journal of Epidemiology, 49, 964–967.
  27. Li F, Hughes JP, Hemming K, Taljaard M, Melnick ER, & Heagerty PJ (2020). Mixed-effects models for the design and analysis of stepped wedge cluster randomized trials: An overview. Statistical Methods in Medical Research, Early View. doi: 10.1177/0962280220932962.
  28. Li F, Turner EL, Heagerty PJ, Murray DM, Vollmer WM, & Delong ER (2017). An evaluation of constrained randomization for the design and analysis of group-randomized trials with binary outcomes. Statistics in Medicine, 36, 3791–3806.
  29. Li F, Turner EL, & Preisser JS (2018). Sample size determination for GEE analyses of stepped wedge cluster randomized trials. Biometrics, 74, 1450–1458.
  30. Li P, & Redden DT (2015a). Comparing denominator degrees of freedom approximations for the generalized linear mixed model in analyzing binary outcome in small sample cluster-randomized trials. BMC Medical Research Methodology, 15, 1–12.
  31. Li P, & Redden DT (2015b). Small sample performance of bias-corrected sandwich estimators for cluster-randomized trials with binary outcomes. Statistics in Medicine, 34, 281–296.
  32. Liang K-Y, & Zeger SL (1986). Longitudinal data analysis using generalized linear models. Biometrika, 73, 13–22.
  33. Liu J, & Colditz GA (2018). Relative efficiency of unequal versus equal cluster sizes in cluster randomized trials using generalized estimating equation models. Biometrical Journal, 60, 616–638.
  34. Manatunga AK, Hudgens MG, & Chen S (2001). Sample size estimation in cluster randomized studies with varying cluster size. Biometrical Journal, 43, 75–86.
  35. Mancl LA, & DeRouen TA (2001). A covariance estimator for GEE with improved small-sample properties. Biometrics, 57, 126–134.
  36. Murray DM (1998). Design and analysis of group-randomized trials. Oxford University Press.
  37. Murray DM, & Blitstein JL (2003). Methods to reduce the impact of intraclass correlation in group-randomized trials. Evaluation Review, 27, 79–103.
  38. Murray DM, Varnell SP, & Blitstein JL (2004). Design and analysis of group-randomized trials: A review of recent methodological developments. American Journal of Public Health, 94, 423–432.
  39. Mwandigha LM, Fraser KJ, Racine-Poon A, Mouksassi M-S, & Ghani AC (2020). Power calculations for cluster randomized trials (CRTs) with right-truncated Poisson-distributed outcomes: A motivating example from a malaria vector control trial. International Journal of Epidemiology, 49, 954–962.
  40. Ogungbenro K, & Aarons L (2010). Sample size/power calculations for population pharmacodynamic experiments involving repeated-count measurements. Journal of Biopharmaceutical Statistics, 20, 1026–1042.
  41. Pan W (2001). Sample size and power calculations with correlated binary data. Controlled Clinical Trials, 22, 211–227.
  42. Preisser JS, Young ML, Zaccaro DJ, & Wolfson M (2003). An integrated population-averaged approach to the design, analysis and sample size determination of cluster-unit trials. Statistics in Medicine, 22, 1235–1254.
  43. Ritz J, & Spiegelman D (2004). Equivalence of conditional and marginal regression models for clustered and longitudinal data. Statistical Methods in Medical Research, 13, 309–323.
  44. Rutterford C, Copas A, & Eldridge S (2015). Methods for sample size determination in cluster randomized trials. International Journal of Epidemiology, 44, 1051–1067.
  45. Teerenstra S, Lu B, Preisser JS, Van Achterberg T, & Borm GF (2010). Sample size considerations for GEE analyses of three-level cluster randomized trials. Biometrics, 66, 1230–1237.
  46. Turner EL, Li F, Gallis JA, Prague M, & Murray DM (2017a). Review of recent methodological developments in group-randomized trials: Part 1–design. American Journal of Public Health, 107, 907–915.
  47. Turner EL, Prague M, Gallis JA, Li F, & Murray DM (2017b). Review of recent methodological developments in group-randomized trials: Part 2–analysis. American Journal of Public Health, 107, 1078–1086.
  48. Turner EL, Yao L, Li F, & Prague M (2020). Properties and pitfalls of weighting as an alternative to multilevel multiple imputation in cluster randomized trials with missing binary outcomes under covariate-dependent missingness. Statistical Methods in Medical Research, 29, 1338–1353.
  49. van Breukelen GJP, Candel MJJM, & Berger MPFB (2007). Relative efficiency of unequal versus equal cluster sizes in cluster randomized and multicentre trials. Statistics in Medicine, 26, 2589–2603.
  50. Wang J, Zhang S, & Ahn C (2020). Sample size calculation for count outcomes in cluster randomization trials with varying cluster sizes. Communications in Statistics – Theory and Methods, 49, 116–124.
  51. White MT, Verity R, Griffin JT, Asante KP, Owusu-Agyei S, Greenwood B, Drakeley C, Gesase S, Lusingu J, Ansong D, Adjei S, Agbenyega T, Ogutu B, Otieno L, Otieno W, Agnandji ST, Lell B, Kremsner P, Hoffman I, ... Ghani AC (2015). Immunogenicity of the RTS,S/AS01 malaria vaccine and implications for duration of vaccine efficacy: Secondary analysis of data from a phase 3 randomised controlled trial. The Lancet Infectious Diseases, 15, 1450–1458.
  52. Yan J, & Fine J (2004). Estimating equations for association structures. Statistics in Medicine, 23, 859–874.
  53. Young ML, Preisser JS, Qaqish BF, & Wolfson M (2007). Comparison of subject-specific and population averaged models for count data from cluster-unit intervention trials. Statistical Methods in Medical Research, 16, 167–184.
