Author manuscript; available in PMC: 2019 Sep 25.
Published in final edited form as: Biom J. 2018 Mar 25;60(3):616–638. doi: 10.1002/bimj.201600262

Relative efficiency of unequal versus equal cluster sizes in cluster randomized trials using generalized estimating equation models

Jingxia Liu 1, Graham A Colditz 2
PMCID: PMC6760674  NIHMSID: NIHMS1050069  PMID: 29577363

Abstract

There is growing interest in conducting cluster randomized trials (CRTs). For simplicity in sample size calculation, the cluster sizes are assumed to be identical across all clusters. However, equal cluster sizes are not guaranteed in practice. Therefore, the relative efficiency (RE) of unequal versus equal cluster sizes has been investigated when testing the treatment effect. One of the most important approaches to analyze a set of correlated data is the generalized estimating equation (GEE) proposed by Liang and Zeger, in which the “working correlation structure” is introduced and the association pattern depends on a vector of association parameters denoted by ρ. In this paper, we utilize GEE models to test the treatment effect in a two-group comparison for continuous, binary, or count data in CRTs. The variances of the estimator of the treatment effect are derived for the different types of outcome. RE is defined as the ratio of variance of the estimator of the treatment effect for equal to unequal cluster sizes. We discuss a commonly used structure in CRTs- exchangeable, and derive the simpler formula of RE with continuous, binary, and count outcomes. Finally, REs are investigated for several scenarios of cluster size distributions through simulation studies. We propose an adjusted sample size due to efficiency loss. Additionally, we also propose an optimal sample size estimation based on the GEE models under a fixed budget for known and unknown association parameter (ρ) in the working correlation structure within the cluster.

Keywords: Cluster randomized trial (CRT), generalized estimating equation (GEE), relative efficiency (RE), working correlation structure, intraclass correlation coefficient (ICC)

1. Introduction

In recent years, there has been growing interest in conducting cluster randomized trials (CRTs) (Campbell et al., 2000; Gulliford et al., 2014; Gravenstein et al., 2016; Kalfon et al., 2016; Mehring et al., 2016; Nagayama et al., 2016; Yamagata et al., 2016). CRTs are performed when randomization at the individual-level is practically infeasible or may lead to severe estimation bias of treatment effect. It is reasonable to expect correlation among individuals in the same cluster, since they may share similar characteristics or be exposed to common factors. The degree of such similarity is commonly quantified by the intracluster correlation coefficient (ICC). Even if the ICCs are pretty small for most CRTs (Murray et al., 2004; Turner et al., 2004), they must be considered in study design and data analysis to avoid underestimation of variance of the treatment effect and inflated type I error rates (Murray, Varnell, et al., 2004).

Methods of analyzing correlated data have been extensively developed. One of the most important approaches is the generalized estimating equation (GEE) proposed by Liang and Zeger (Liang and Zeger, 1986), which fits a marginal model in the context of longitudinal studies. In order to describe the pattern of measures within the individual, the idea of a “working correlation structure” is introduced, and the pattern depends on a vector of association parameters denoted by ρ. The advantage of this approach is that the estimates are consistent provided that the marginal model is correctly specified, even if the working correlation matrix is incorrectly assumed. Therefore, GEEs have been commonly applied (Sanda et al., 2008; Lin et al., 2015; Park et al., 2015; Toriola et al., 2015; Jeffe et al., 2016).

Sample size calculation or power estimation is an important topic in study design for investigators. The equations of sample size for CRTs have been published as a function of ICC or the coefficient of variation (CV) of cluster sizes (Donner et al., 1981; Liu et al., 1997; Shih, 1997; Rosner et al., 2003; Eldridge et al., 2006; Austin, 2007; Candel et al., 2008; Van Breukelen et al., 2008; Teerenstra et al., 2010; Rosner et al., 2011; Van Breukelen et al., 2012; Amatya et al., 2013). Shih (Shih, 1997) provides a sample size formula based on the GEE methods for CRTs designed to test the treatment effect. The general formula for a two-group comparison is given. Specifically, the formulas for continuous and binary outcomes are demonstrated with the assumptions of equal cluster sizes n and the exchangeable working correlation R(ρ), i.e., 1 for diagonal entries and ρ otherwise. For count data, Bhaumik and Gibbons (Amatya, Bhaumik, et al., 2013) consider population-average estimators using GEE models for cluster-level randomized design. They assume half of the clusters are randomized to the treatment and the other half are randomized to the control. The asymptotic variance of the treatment effect estimator is derived and the required number of clusters is calculated with the same assumptions: equal cluster sizes n and the exchangeable working correlation R(ρ).

In the design of CRTs, the required sample sizes include the number of clusters m and cluster sizes ni, i = 1, ⋯, m. For simplicity, the cluster sizes are assumed to be identical across clusters, i.e., ni = n for all i, resulting in total sample size of N = nm. In general, the cluster size n is a pre-determined number, therefore the sample size calculation is for the number of clusters only. However, equal cluster sizes are not guaranteed in practice. The relative efficiency (RE) of unequal versus equal cluster sizes has been investigated when testing the treatment effect. For continuous outcomes in cluster randomized studies, Manatunga et al (Manatunga et al., 2001) derive sample size estimation while accounting for the variability due to cluster size and a lower bound for RE is given. For both CRTs and person randomization within centers, Breukelen et al (Van Breukelen et al., 2007) address the RE of unequal versus equal cluster sizes with continuous outcomes. They investigate RE based on maximum likelihood parameter estimation for a range of cluster size distributions. Additionally, they present an approximate formula for computing the RE as a function of the mean and variance of cluster size and the intraclass correlation (ρ). They conclude that the loss of efficiency due to variation of cluster sizes rarely exceeds 10 percent and can be compensated by sampling 11 percent more clusters. Candel and Breukelen (Candel et al., 2010) give the adjusted sample size formula for varying cluster sizes with a binary outcome in CRTs. They derive the asymptotic RE of unequal versus equal cluster sizes from first-order marginal quasi-likelihood (MQL) estimation with mixed effect logistic regression model. A simpler formula of sample size calculation is presented to estimate the efficiency loss due to variation in cluster sizes. They find that 14% more clusters are needed to cover the efficiency loss in many cases.

In this paper, we will utilize GEE models to test the treatment effect in a two-group comparison for continuous, binary, or count data in CRTs. The variances of the estimator of the treatment effect will be derived for different types of outcome. They depend on which working correlation structure is chosen. As shown (Shih, 1997; Amatya, Bhaumik, et al., 2013), researchers often consider the exchangeable structure, since the independent working correlation structure is a special case of the exchangeable structure and the first-order autoregressive (AR(1)) structure fits time-structured data more appropriately. Here, we will also discuss this commonly used structure. RE is defined as the ratio of the variance of the estimator of the treatment effect for equal cluster sizes to that for unequal cluster sizes. We will derive a simpler formula of RE with continuous, binary, and count outcomes. Finally, REs are investigated for several scenarios of cluster size distribution through simulation studies. We will propose an adjusted sample size that accounts for the efficiency loss. Additionally, cost is an important factor in the study design of CRTs, since it is incurred by recruiting an additional cluster rather than an additional subject, as in an individual-level randomized trial. Under a fixed budget, we also propose an optimal sample size, which maximizes the power, based on the GEE models for known and unknown association parameter (ρ) in the working correlation structure within the cluster.

The outline of this paper is as follows. In section 2, we briefly summarize the GEE methods proposed by Liang and Zeger (Liang and Zeger, 1986) and derive the variance of the estimator of the treatment effect for three kinds of outcomes (continuous, binary, and count) in a two-group comparison. Section 3 introduces the REs of unequal versus equal cluster sizes for the treatment effect. The formula for RE is derived for the exchangeable working correlation structure. Section 4 presents the possible patterns of cluster size distribution and the results of simulation studies that investigate the REs. In section 5, we show the sample size formulas under two different situations: no cost constraints, and a fixed budget. The practical uses are discussed when planning a CRT, followed by discussion of the limitations and directions for future research.

2. Generalized estimating equation models

Let $y_i = (y_{i1}, y_{i2}, \dots, y_{in_i})$ be a vector of responses from the $i$th cluster, $i = 1, \dots, m$. The responses are assumed to be independent across clusters but correlated within each cluster. The marginal model is

$$g(\mu_{ij}) = C_{ij}'\boldsymbol{\beta}, \quad i = 1, \dots, m;\ j = 1, \dots, n_i, \tag{1}$$

where $\mu_{ij} = E(y_{ij})$ and $g(\cdot)$ is a link function. $C_{ij}$ is a vector of covariates, and $\boldsymbol{\beta}$ is an unknown $l \times 1$ vector of regression coefficients. This model specifies a relationship between $\mu_{ij}$ and the covariates $C_{ij}$. The conditional variance of $y_{ij}$ given $C_{ij}$ is defined as $\mathrm{var}(y_{ij} \mid C_{ij}) = \gamma(\mu_{ij})\theta$, where $\gamma$ is a known variance function of $\mu_{ij}$ and $\theta$ is a scale parameter. The mean of $y_i$ is denoted by $\mu_i = E(y_i)$ and the variance-covariance matrix for $y_i$ is denoted by $V_i = \theta A_i^{1/2} R_i(\rho) A_i^{1/2}$, where $A_i = \mathrm{diag}\{\gamma(\mu_{i1}), \dots, \gamma(\mu_{in_i})\}$ and the working correlation structure $R_i(\rho)$ describes the pattern of measures within the $i$th cluster. $R_i(\rho)$ is an $n_i \times n_i$ matrix and depends on a vector of association parameters denoted by $\rho$. Both $\gamma$ and $\theta$ depend on the distribution of the responses. For instance, if $y_{ij}$ is continuous, $\gamma(\mu_{ij}) = 1$ and $\theta$ is the random error variance; if $y_{ij}$ is binary, $\gamma(\mu_{ij}) = \mu_{ij}(1 - \mu_{ij})$ and $\theta = 1$; if $y_{ij}$ is a count, $\gamma(\mu_{ij}) = \mu_{ij}$ and $\theta = 1$.

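The estimate of $\boldsymbol{\beta}$ is obtained by solving the following estimating equation

$$U(\boldsymbol{\beta}) = \sum_{i=1}^{m} D_i' V_i^{-1} (y_i - \mu_i) = 0, \tag{2}$$

where $D_i = \partial\mu_i/\partial\boldsymbol{\beta}'$. Let $\hat{\boldsymbol{\beta}}$ be the solution of Eq. (2). Liang and Zeger (Liang and Zeger, 1986) show that $\sqrt{m}(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta})$ is asymptotically multivariate normal with covariance matrix

$$V_R = \lim_{m\to\infty} m\left(\Sigma_1^{-1}\Sigma_0\Sigma_1^{-1}\right), \tag{3}$$

where $\Sigma_1 = \sum_{i=1}^{m} D_i' V_i^{-1} D_i$ and $\Sigma_0 = \sum_{i=1}^{m} D_i' V_i^{-1}\,\mathrm{cov}(y_i \mid C_i)\,V_i^{-1} D_i$. As noted, GEE gives an asymptotically consistent estimate $\hat{\boldsymbol{\beta}}$ even when the working correlation matrix is incorrectly assumed (Liang and Zeger, 1986).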

For the purpose of sample size calculation at the design stage, we need to assume values for the variance and covariance of $y_i$. When $\mathrm{cov}(y_i \mid C_i)$ is assumed to be the same as $V_i$, we have $\Sigma_0 = \Sigma_1$ and therefore $V_R = \lim_{m\to\infty} m\Sigma_1^{-1}$. Suppose we are interested in testing the treatment effect for a two-group comparison: the treated vs. control group. For simplicity, let us consider a special case with $l = 2$, where $\boldsymbol{\beta} = (\beta_0, \beta_1)'$. Specifically, the coefficient $\beta_0$ is the intercept, $\beta_1$ is the treatment effect, the design matrix is $C_i^t = (1_{n_i}, 1_{n_i})$ for the $i$th cluster assigned to the treated group and $C_i^c = (1_{n_i}, 0_{n_i})$ for the $i$th cluster assigned to the control group, and $1_n$ and $0_n$ are $n \times 1$ vectors of 1’s and 0’s, respectively. Let $V_\beta$ denote the (2, 2)th element of $V_R$. Thus, $\sqrt{m}(\hat\beta_1 - \beta_1)$ has an asymptotically normal distribution $N(0, V_\beta)$; equivalently, $\mathrm{Var}(\hat\beta_1) = V_\beta/m$. The numbers of clusters allocated to the treated and control groups are $m_t = m\pi$ and $m_c = m(1 - \pi)$, respectively. The hypotheses of interest are $H_0: \beta_1 = 0$ versus $H_1: \beta_1 = \beta$.

2.1. Continuous outcome

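This section shows details of using an identity link function on the continuous outcome, that is, $\mu_i = C_i\boldsymbol{\beta}$ and $V_i = \sigma^2 R_i(\rho)$. With this setting, $D_i = \partial\mu_i/\partial\boldsymbol{\beta}' = C_i$. Thus,

$$\begin{aligned}
\Sigma_1 &= \frac{1}{\sigma^2}\sum_{i=1}^{m} C_i' R_i^{-1} C_i
= \frac{1}{\sigma^2}\sum_{i\in trt} C_i^{t\prime} R_i^{-1} C_i^{t} + \frac{1}{\sigma^2}\sum_{i\in cont} C_i^{c\prime} R_i^{-1} C_i^{c} \\
&= \frac{1}{\sigma^2}\sum_{i\in trt} 1_{n_i}' R_i^{-1} 1_{n_i}\begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}
+ \frac{1}{\sigma^2}\sum_{i\in cont} 1_{n_i}' R_i^{-1} 1_{n_i}\begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}
= \frac{1}{\sigma^2}\begin{pmatrix} t + c & t \\ t & t \end{pmatrix},
\end{aligned}$$

where $i \in trt$ denotes all the clusters assigned to the treated group and $i \in cont$ denotes all the clusters assigned to the control group; $t = \sum_{i\in trt} 1_{n_i}' R_i^{-1} 1_{n_i}$ and $c = \sum_{i\in cont} 1_{n_i}' R_i^{-1} 1_{n_i}$. Then

$$V_R = \lim_{m\to\infty} m\Sigma_1^{-1} = \sigma^2 \lim_{m\to\infty} m\begin{pmatrix} \frac{1}{c} & -\frac{1}{c} \\ -\frac{1}{c} & \frac{1}{t} + \frac{1}{c} \end{pmatrix}, \quad\text{and}$$

$$\begin{aligned}
V_\beta &= \sigma^2 \lim_{m\to\infty} m\left[\frac{1}{t} + \frac{1}{c}\right]
= \sigma^2 \lim_{m\to\infty} m\left[\Big(\sum_{i\in trt} 1_{n_i}' R_i^{-1} 1_{n_i}\Big)^{-1} + \Big(\sum_{i\in cont} 1_{n_i}' R_i^{-1} 1_{n_i}\Big)^{-1}\right] \\
&= \sigma^2 \lim_{m\to\infty}\left[\frac{m}{\pi\sum_{i=1}^{m} 1_{n_i}' R_i^{-1} 1_{n_i}} + \frac{m}{(1-\pi)\sum_{i=1}^{m} 1_{n_i}' R_i^{-1} 1_{n_i}}\right]
= \frac{\sigma^2 m}{\pi(1-\pi)\left(\sum_{i=1}^{m} 1_{n_i}' R_i^{-1} 1_{n_i}\right)}.
\end{aligned} \tag{4}$$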
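The matrix inversion behind $V_R$ is stated without intermediate steps. As a short worked check, using only the symbols $t$ and $c$ defined above, the 2 × 2 inverse can be written out as follows; the same algebra applies to the binary and count cases, where $t$ and $c$ are redefined.

```latex
\det\begin{pmatrix} t + c & t \\ t & t \end{pmatrix} = (t + c)\,t - t^2 = c\,t,
\qquad
\begin{pmatrix} t + c & t \\ t & t \end{pmatrix}^{-1}
= \frac{1}{c\,t}\begin{pmatrix} t & -t \\ -t & t + c \end{pmatrix}
= \begin{pmatrix} \dfrac{1}{c} & -\dfrac{1}{c} \\[6pt] -\dfrac{1}{c} & \dfrac{1}{t} + \dfrac{1}{c} \end{pmatrix}.
```

The (2, 2) element, $1/t + 1/c$, is the term that carries through to $V_\beta$.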

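If we assume equal cluster size $n_i = n$ for all $i$, then

$$V_\beta = \frac{\sigma^2}{\pi(1-\pi)\,1_n' R_i^{-1} 1_n}. \tag{5}$$

For the exchangeable working correlation structure, $1_{n_i}' R_i^{-1} 1_{n_i} = \dfrac{n_i}{1 + (n_i - 1)\rho}$. Thus,

$$V_\beta = \frac{\sigma^2 m}{\pi(1-\pi)\left(\sum_{i=1}^{m} \dfrac{n_i}{1 + (n_i - 1)\rho}\right)}$$

and

$$V_\beta = \frac{\sigma^2\,[1 + (n - 1)\rho]}{\pi(1-\pi)\,n}, \tag{6}$$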

for unequal and equal cluster sizes, respectively. Note that Eq. (6) is the same as Eq. (6) in Shih’s paper (Shih, 1997).
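The closed form $1_n' R^{-1} 1_n = n/[1 + (n-1)\rho]$ follows because the exchangeable matrix can be written as $R = (1-\rho)I + \rho\,1_n 1_n'$, so that $R\,1_n = [1 + (n-1)\rho]\,1_n$. The sketch below checks this identity numerically and evaluates Eq. (4) under the exchangeable structure. It is only an illustration: the function names and the default values of `sigma2` and `pi` are assumptions, not part of the paper.

```python
import numpy as np

def exchangeable(n, rho):
    """n x n exchangeable correlation matrix: 1 on the diagonal, rho elsewhere."""
    return (1.0 - rho) * np.eye(n) + rho * np.ones((n, n))

def quad_form_numeric(n, rho):
    """Compute 1' R^{-1} 1 by solving the linear system R x = 1."""
    ones = np.ones(n)
    return float(ones @ np.linalg.solve(exchangeable(n, rho), ones))

def quad_form_closed(n, rho):
    """Closed form n / (1 + (n - 1) rho) used in the text."""
    return n / (1.0 + (n - 1) * rho)

def v_beta_continuous(sizes, rho, sigma2=1.0, pi=0.5):
    """Eq. (4) with the exchangeable closed form plugged in (illustrative defaults)."""
    sizes = np.asarray(sizes, dtype=float)
    total = np.sum(sizes / (1.0 + (sizes - 1.0) * rho))
    return sigma2 * sizes.size / (pi * (1.0 - pi) * total)

if __name__ == "__main__":
    print(quad_form_numeric(5, 0.3), quad_form_closed(5, 0.3))
    # Equal sizes reduce to Eq. (6): sigma2 * (1 + (n - 1) rho) / (pi (1 - pi) n)
    print(v_beta_continuous([20] * 10, rho=0.05))
    # Same total sample size, unequal clusters: the variance is larger
    print(v_beta_continuous([10, 30] * 5, rho=0.05))
```

With ten clusters of size 20 and ρ = 0.05, the function reproduces the value given by Eq. (6), and splitting the same 200 subjects into clusters of 10 and 30 gives a larger variance.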

2.2. Binary outcome

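For a binary outcome, we consider the logit model: $\mu_i = p_i = \exp(C_i\boldsymbol{\beta})/(1 + \exp(C_i\boldsymbol{\beta}))$. Here, $p_{ij} = p_0 = \exp(\beta_0)/(1 + \exp(\beta_0))$ for all the subjects in the control group, $j = 1, \dots, n_i$; and $p_{ij} = p_1 = \exp(\beta_0 + \beta_1)/(1 + \exp(\beta_0 + \beta_1))$ for all the subjects in the treated group, $j = 1, \dots, n_i$. We can show $\beta_1 = \log\left(\dfrac{p_1(1-p_0)}{(1-p_1)p_0}\right)$, which is the logarithm of the odds ratio for the response. Therefore, the null hypothesis $H_0: \beta_1 = 0$ is equivalent to $p_0 = p_1$ and the alternative hypothesis is $p_0 \ne p_1$.

Under this scenario, $D_i = \partial\mu_i/\partial\boldsymbol{\beta}' = p_0(1-p_0)(1_{n_i}, 0_{n_i})$ for the subjects in the control group, denoted by $D_i^c$, and $D_i = p_1(1-p_1)(1_{n_i}, 1_{n_i})$ for the subjects in the treated group, denoted by $D_i^t$, and $V_i = p_i(1-p_i)R_i(\rho)$. Thus,

$$\begin{aligned}
\Sigma_1 &= \sum_{i=1}^{m} \frac{1}{p_i(1-p_i)} D_i' R_i^{-1} D_i
= \sum_{i\in trt} \frac{1}{p_1(1-p_1)} D_i^{t\prime} R_i^{-1} D_i^{t} + \sum_{i\in cont} \frac{1}{p_0(1-p_0)} D_i^{c\prime} R_i^{-1} D_i^{c} \\
&= p_1(1-p_1)\Big(\sum_{i\in trt} 1_{n_i}' R_i^{-1} 1_{n_i}\Big)\begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}
+ p_0(1-p_0)\Big(\sum_{i\in cont} 1_{n_i}' R_i^{-1} 1_{n_i}\Big)\begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}
= \begin{pmatrix} t + c & t \\ t & t \end{pmatrix},
\end{aligned}$$

where $c = p_0(1-p_0)\left(\sum_{i\in cont} 1_{n_i}' R_i^{-1} 1_{n_i}\right)$ and $t = p_1(1-p_1)\left(\sum_{i\in trt} 1_{n_i}' R_i^{-1} 1_{n_i}\right)$. Then

$$V_R = \lim_{m\to\infty} m\begin{pmatrix} \frac{1}{c} & -\frac{1}{c} \\ -\frac{1}{c} & \frac{1}{t} + \frac{1}{c} \end{pmatrix} \tag{7}$$

and

$$\begin{aligned}
V_\beta &= \lim_{m\to\infty}\left[\frac{m}{p_1(1-p_1)\left(\sum_{i\in trt} 1_{n_i}' R_i^{-1} 1_{n_i}\right)} + \frac{m}{p_0(1-p_0)\left(\sum_{i\in cont} 1_{n_i}' R_i^{-1} 1_{n_i}\right)}\right] \\
&= \lim_{m\to\infty}\left[\frac{m}{p_1(1-p_1)\,\pi\left(\sum_{i=1}^{m} 1_{n_i}' R_i^{-1} 1_{n_i}\right)} + \frac{m}{p_0(1-p_0)(1-\pi)\left(\sum_{i=1}^{m} 1_{n_i}' R_i^{-1} 1_{n_i}\right)}\right] \\
&= \frac{m}{\sum_{i=1}^{m} 1_{n_i}' R_i^{-1} 1_{n_i}}\left(\frac{1}{\pi p_1(1-p_1)} + \frac{1}{(1-\pi)p_0(1-p_0)}\right).
\end{aligned} \tag{8}$$

Assuming equal cluster size $n_i = n$ for all $i$, we have

$$V_\beta = \frac{1}{1_n' R_i^{-1} 1_n}\left(\frac{1}{\pi p_1(1-p_1)} + \frac{1}{(1-\pi)p_0(1-p_0)}\right). \tag{9}$$

When the working correlation structure is exchangeable, Eq. (8) is equivalent to the one substituting Eq. (7) into (5) in Pan’s paper (Pan, 2001) and Eq. (8) derived by Shih (Shih, 1997). Additionally, Eq. (9) becomes $\dfrac{1 + (n-1)\rho}{n}\left(\dfrac{1}{\pi p_1(1-p_1)} + \dfrac{1}{(1-\pi)p_0(1-p_0)}\right)$.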

2.3. Count data

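The log model is used to analyze count data, $\mu_i = \exp(C_i\boldsymbol{\beta})$. Then $D_i = \partial\mu_i/\partial\boldsymbol{\beta}' = C_i\exp(C_i\boldsymbol{\beta})$ and $V_i = A_i^{1/2} R_i(\rho) A_i^{1/2}$, where $A_i = \mathrm{diag}\{\exp(C_{i1}'\boldsymbol{\beta}), \dots, \exp(C_{in_i}'\boldsymbol{\beta})\}$. Specifically, $D_i^c = C_i^c\exp(C_i^c\boldsymbol{\beta}) = e^{\beta_0}(1_{n_i}, 0_{n_i})$ for the subjects in the control group, and $D_i^t = e^{\beta_0+\beta_1}(1_{n_i}, 1_{n_i})$ for the subjects in the treated group. Thus,

$$\begin{aligned}
\Sigma_1 &= \sum_{i\in trt} D_i^{t\prime} V_i^{-1} D_i^{t} + \sum_{i\in cont} D_i^{c\prime} V_i^{-1} D_i^{c} \\
&= e^{\beta_0+\beta_1}\Big(\sum_{i\in trt} 1_{n_i}' R_i^{-1} 1_{n_i}\Big)\begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}
+ e^{\beta_0}\Big(\sum_{i\in cont} 1_{n_i}' R_i^{-1} 1_{n_i}\Big)\begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}
= \begin{pmatrix} t + c & t \\ t & t \end{pmatrix},
\end{aligned}$$

where $c = e^{\beta_0}\left(\sum_{i\in cont} 1_{n_i}' R_i^{-1} 1_{n_i}\right)$ and $t = e^{\beta_0+\beta_1}\left(\sum_{i\in trt} 1_{n_i}' R_i^{-1} 1_{n_i}\right)$. Following Eq. (7),

$$\begin{aligned}
V_\beta &= \lim_{m\to\infty}\left[\frac{m}{e^{\beta_0+\beta_1}\left(\sum_{i\in trt} 1_{n_i}' R_i^{-1} 1_{n_i}\right)} + \frac{m}{e^{\beta_0}\left(\sum_{i\in cont} 1_{n_i}' R_i^{-1} 1_{n_i}\right)}\right] \\
&= \lim_{m\to\infty}\left[\frac{m}{e^{\beta_0+\beta_1}\,\pi\left(\sum_{i=1}^{m} 1_{n_i}' R_i^{-1} 1_{n_i}\right)} + \frac{m}{e^{\beta_0}(1-\pi)\left(\sum_{i=1}^{m} 1_{n_i}' R_i^{-1} 1_{n_i}\right)}\right] \\
&= \frac{m}{e^{\beta_0}\left(\sum_{i=1}^{m} 1_{n_i}' R_i^{-1} 1_{n_i}\right)}\left(\frac{1}{\pi e^{\beta_1}} + \frac{1}{1-\pi}\right).
\end{aligned} \tag{10}$$

If we assume equal cluster size $n_i = n$ for all $i$, then

$$V_\beta = \frac{1}{e^{\beta_0}\,1_n' R_i^{-1} 1_n}\left(\frac{1}{\pi e^{\beta_1}} + \frac{1}{1-\pi}\right). \tag{11}$$

With the assumptions of an exchangeable working correlation structure, equal cluster size $n$ and equal allocation $\pi = 0.5$, we have $V_\beta = \dfrac{2[1 + (n-1)\rho]}{n e^{\beta_0}}\left(\dfrac{1}{e^{\beta_1}} + 1\right)$. This is Eq. (19) developed by Bhaumik and Gibbons (Amatya, Bhaumik, et al., 2013).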
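The three variance expressions in Eqs. (4), (8) and (10) share the factor $m / \sum_i 1_{n_i}' R_i^{-1} 1_{n_i}$ and differ only in an outcome-specific multiplier. A minimal sketch under the exchangeable structure is given below; the function name `v_beta`, the `outcome` labels and the keyword names are hypothetical and chosen only for illustration.

```python
import numpy as np

def sum_quad_exchangeable(sizes, rho):
    """Sum over clusters of 1' R_i^{-1} 1 = n_i / (1 + (n_i - 1) rho)."""
    sizes = np.asarray(sizes, dtype=float)
    return float(np.sum(sizes / (1.0 + (sizes - 1.0) * rho)))

def v_beta(sizes, rho, pi, outcome, **par):
    """V_beta from Eq. (4), (8) or (10) with an exchangeable working correlation.

    outcome: "continuous" (needs sigma2), "binary" (needs p0, p1),
             or "count" (needs beta0, beta1). These labels are illustrative.
    """
    base = len(sizes) / sum_quad_exchangeable(sizes, rho)  # shared factor m / sum
    if outcome == "continuous":
        return par["sigma2"] * base / (pi * (1.0 - pi))
    if outcome == "binary":
        p0, p1 = par["p0"], par["p1"]
        return base * (1.0 / (pi * p1 * (1.0 - p1)) + 1.0 / ((1.0 - pi) * p0 * (1.0 - p0)))
    if outcome == "count":
        b0, b1 = par["beta0"], par["beta1"]
        return base / np.exp(b0) * (1.0 / (pi * np.exp(b1)) + 1.0 / (1.0 - pi))
    raise ValueError("outcome must be 'continuous', 'binary' or 'count'")

if __name__ == "__main__":
    sizes = [20] * 10
    print(v_beta(sizes, 0.05, 0.5, "continuous", sigma2=1.0))
    print(v_beta(sizes, 0.05, 0.5, "binary", p0=0.3, p1=0.5))
    print(v_beta(sizes, 0.05, 0.5, "count", beta0=np.log(2.0), beta1=np.log(1.5)))
```

Since $\mathrm{Var}(\hat\beta_1) = V_\beta/m$, dividing the returned value by the number of clusters gives the design-stage variance of the treatment effect estimator.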

3. Relative efficiency of unequal versus equal cluster sizes

We will use the relative variance of the treatment effect to define RE when comparing unequal to equal cluster sizes. Let $\Omega_{equal}$ denote a design with equal cluster sizes $n$, and let $\Omega_{unequal}$ denote a design with unequal cluster sizes $n_i$, $i = 1, \dots, m$. Obviously $mn = \sum_{i=1}^{m} n_i$. The RE of unequal versus equal cluster sizes for the treatment effect, $RE(\hat\beta_1)$, is defined as

$$RE(\hat\beta_1) = \frac{\mathrm{Var}(\hat\beta_1 \mid \Omega_{equal})}{\mathrm{Var}(\hat\beta_1 \mid \Omega_{unequal})}. \tag{12}$$

From equations (4) and (5) for a continuous outcome, (8) and (9) for a binary outcome, and (10) and (11) for count data, we have the same

$$RE(\hat\beta_1) = \frac{\sum_{i=1}^{m} 1_{n_i}' R_i^{-1} 1_{n_i}}{m\,1_n' R_i^{-1} 1_n}. \tag{13}$$

Notably, the RE is the same for all three types of outcomes. In other words, REs are independent of the outcome type provided that GEE models are used.

With the assumption of exchangeable working correlation structure,

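$$1_{n_i}' R_i^{-1} 1_{n_i} = \frac{n_i}{1 + (n_i - 1)\rho},$$

then Eq. (13) equals

$$RE(\hat\beta_1) = \frac{1 + (n-1)\rho}{n}\,\frac{1}{m}\sum_{i=1}^{m} \frac{n_i}{1 + (n_i - 1)\rho}. \tag{14}$$

Let $\tau = (1 - \rho)/\rho$; then Eq. (14) becomes

$$RE(\hat\beta_1) = \frac{n + \tau}{n}\,\frac{1}{m}\sum_{i=1}^{m} \frac{n_i}{n_i + \tau}. \tag{15}$$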

It is identical to Eq. (9) developed by Breukelen et al.(Van Breukelen, Candel, et al., 2007). For binary outcomes, the same finding is shown in equation (A8) (Candel and Van Breukelen, 2010) and section 7 (Van Breukelen et al., 2015). They show that

  1. There is a trade-off between mean cluster size n and τ. Multiplying all $n_i$’s and τ by a factor >0 does not change $RE(\hat\beta_1)$.

  2. As ρ → 0 or as ρ → 1, $RE(\hat\beta_1) \to 1$ for any cluster size distribution. For 0<ρ<1, $RE(\hat\beta_1)$ is smaller than 1 and so the equal cluster size design is optimal.
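The two properties above can be checked numerically from Eqs. (14) and (15). The sketch below is illustrative only: the function names are assumptions, and the cluster sizes in the example are arbitrary values rather than data from the paper.

```python
import numpy as np

def relative_efficiency(sizes, rho):
    """Eq. (14): RE of unequal vs. equal cluster sizes, exchangeable structure."""
    sizes = np.asarray(sizes, dtype=float)
    n_bar = sizes.mean()  # equal-size design uses the mean cluster size n
    equal_term = n_bar / (1.0 + (n_bar - 1.0) * rho)
    unequal_term = np.mean(sizes / (1.0 + (sizes - 1.0) * rho))
    return float(unequal_term / equal_term)

def relative_efficiency_tau(sizes, tau):
    """Eq. (15): the same quantity written with tau = (1 - rho) / rho."""
    sizes = np.asarray(sizes, dtype=float)
    n_bar = sizes.mean()
    return float((n_bar + tau) / n_bar * np.mean(sizes / (sizes + tau)))

if __name__ == "__main__":
    sizes = np.array([10.0, 20.0, 30.0])  # arbitrary example sizes
    rho = 0.05
    tau = (1.0 - rho) / rho
    print(relative_efficiency(sizes, rho), relative_efficiency_tau(sizes, tau))
    # Property 1: scaling all n_i and tau by the same factor leaves RE unchanged
    print(relative_efficiency_tau(3 * sizes, 3 * tau))
    # Property 2: RE equals 1 at rho = 0 and rho = 1, and is below 1 in between
    print(relative_efficiency(sizes, 0.0), relative_efficiency(sizes, 1.0))
```

The inequality RE ≤ 1 for 0 < ρ < 1 follows from Jensen's inequality, because $x \mapsto x/[1 + (x-1)\rho]$ is concave in $x$.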

4. Simulation studies

In order to compute $RE(\hat\beta_1)$ in Eq. (14), Breukelen et al (Van Breukelen, Candel, et al., 2007) proposed a uniform, positively skewed, negatively skewed, bimodal and unimodal distribution of cluster size, respectively. Let $X_i$ be the random variable for the cluster size $n_i$. Here, $n_i$, $i = 1, \dots, m$, are dependent since they must sum to $N = mn$. We assume that the cluster sizes $(X_1, \dots, X_m)$ follow a multinomial distribution with $N$ and probabilities $(p_1, \dots, p_m)$, where $\sum_{i=1}^{m} p_i = 1$. Under this distribution, the mean of $X_i$ is $Np_i$, the variance is $Np_i(1 - p_i)$ and the covariance of $(X_i, X_j)$ is $-Np_ip_j$ for $i \ne j$. Given the number of clusters $m$ and equal cluster size $n$, the total sample size is $N = mn = \sum_{i=1}^{m} n_i$ under the designs $\Omega_{equal}$ and $\Omega_{unequal}$. For convenience of the following discussion, we sort the clusters by their probabilities $p_i$, $i = 1, \dots, m$, in non-decreasing order such that $p_1 \le p_2 \le \dots \le p_m$. Obviously, the cluster sizes $n_i$, $i = 1, \dots, m$, may be different even if the $p_i$’s are equal. Therefore, six patterns of $(p_1, \dots, p_m)$ are discussed under the design $\Omega_{unequal}$.

  1. Constant: p1 = p2 ⋯= pm

  2. Monotonically increasing: p1 < p2 < ⋯ < pm;

  3. Constant followed by monotonically increasing: p1 = p2⋯ = pk < ⋯ <pm;

  4. Monotonically increasing followed by constant:p1 < p2 ⋯ < pk = ⋯ = pm;

  5. Constant, monotonically increasing followed by constant: $p_1 = p_2 = \dots = p_{k_1} < p_{k_1+1} < \dots < p_{k_2} = \dots = p_m$;

  6. Monotonically increasing, constant followed by monotonically increasing: $p_1 < p_2 < \dots < p_{k_1} = p_{k_1+1} = \dots = p_{k_2} < \dots < p_m$;

These six patterns are shown in Figure 1, which demonstrates the probabilities pi, i = 1, ⋯, 100. We sort these 100 probabilities in non-decreasing order, such that the 1st cluster has the minimum probability and the 100th cluster has the maximum probability. For example, in pattern 2, the probabilities of the 1st cluster and the 100th cluster are 0.006 and 0.013, respectively; they are 0.004 and 0.0275 for pattern 3. There could be more complicated patterns but they should be combinations of the above six patterns. Therefore, $RE(\hat\beta_1)$ in Eq. (14) will be computed through simulation studies only for these six patterns.

Figure 1. Six basic patterns of probabilities (p1, ⋯, pm).

4.1. Designs

As shown in Eq. (14), $RE(\hat\beta_1)$ is independent of the cluster allocation π and the parameters β0 and β1. To investigate the RE based on GEE models, the following factors are considered: (1) number of clusters m and equal cluster sizes n (or N); (2) the values of p1, ⋯, pm, equivalently, the pattern of p1, ⋯, pm; (3) association parameter (ρ) in the exchangeable working correlation structure.

The number of clusters m and the equal cluster size n result in a total sample size of N = nm. We will consider three scenarios: small, medium, and large CRTs, e.g., N = 200, 600, 2000, respectively. Within each scenario we investigate three situations: small, medium, and large number of clusters. For example, m = 20 and n = 100; m = 50 and n = 40; m = 100 and n = 20 in a large CRT with a total sample size of N = 2000. All six patterns in Figure 1 are used for each study design.

Table 1 shows the values of (p1, ⋯, pm) for the simulation design with m = 100 and n = 20. Within each monotonically increasing interval of a pattern, we let the pi’s form an arithmetic sequence for code efficiency. All the values of pi, i = 1, ⋯, m are shown in the 3rd column for each pattern. For example, in pattern 2, pi = p1 + (i − 1)d, i = 2, ⋯, m. Once p1 is given, d is calculated from p1 and m using the formula shown in the last column. Consequently, pi, i = 2, ⋯, m are computed from the formula in the 3rd column. Pattern 3 has probabilities pi that are constant and then monotonically increasing: p1 = ⋯ = pk < pk+1 < ⋯ < pm. In this pattern, k and Pk are pre-determined; here k = 50 and Pk = 0.2 are used. Then pi, i = 1, ⋯, k and d are computed from the formulas in the 3rd column and the last column, respectively. Finally, pi, i = k + 1, ⋯, m are calculated from the formula in the 3rd column. The logic for pattern 4 is the same as the one for pattern 2. For pattern 5, the pre-determined values are k1, k2, and Pk1. Then pi, i = 1, ⋯, k1 and d are computed from the formulas in the 3rd column and the last column, respectively. Finally, pi, i = k1 + 1, ⋯, k2 are calculated from the formula in the 3rd column. Note that pk2 = ⋯ = pm; thus, all the values of pi, i = 1, ⋯, m are obtained. Pattern 6 is the most complicated case. Four parameters, k1, k2, Pk1, and Pk2, must be pre-determined. Then pi, i = k1, ⋯, k2 and di, i = 1, 2 are computed from the formulas in the 3rd column and the last column, respectively. Finally, pi, i = 1, ⋯, k1 – 1, k2 + 1, ⋯, m are calculated from the formulas in the 3rd column. The required parameters are marked with # in Table 1. Once the values of (p1, ⋯, pm) are available for each pattern, we simulate the cluster sizes ni, i = 1, ⋯, m from a multinomial distribution with N and probabilities (p1, ⋯, pm).
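As an illustration of how the probability vectors can be built, the sketch below constructs patterns 2 and 3 from their required parameters, using the arithmetic-sequence step d that makes the probabilities sum to 1. The function names are hypothetical, and the remaining patterns would follow the same logic with their own parameters.

```python
import numpy as np

def pattern2(m, p1):
    """Monotonically increasing: p_i = p_1 + (i - 1) d with d = 2 (1/m - p_1) / (m - 1)."""
    d = 2.0 * (1.0 / m - p1) / (m - 1)
    return p1 + d * np.arange(m)

def pattern3(m, k, P_k):
    """Constant for i <= k (each P_k / k), then increasing by d for i > k.

    P_k is the total probability of the first k clusters.
    """
    p_k = P_k / k
    d = 2.0 / (m - k) * ((1.0 - P_k + p_k) / (m - k + 1) - p_k)
    p = np.full(m, p_k)
    p[k:] = p_k + d * np.arange(1, m - k + 1)
    return p

if __name__ == "__main__":
    p2 = pattern2(m=100, p1=0.006)
    p3 = pattern3(m=100, k=50, P_k=0.2)
    print(p2.sum(), p3.sum())   # both should be 1 up to rounding
    print(p3[0], p3[-1])        # about 0.004 and 0.0275, as quoted for pattern 3
```

For m = 100, k = 50 and Pk = 0.2, the first and last probabilities come out at about 0.004 and 0.0275, in line with the values quoted for pattern 3.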

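Table 1. Multinomial distribution with probabilities (p1, ⋯, pm) with m = 100 and n = 20

| Pattern | Description | $(p_1, \dots, p_m)$ [1] | Other parameters |
|---|---|---|---|
| 1 | Constant: $p_1 = p_2 = \dots = p_m$ | $p_i = 1/m$, $i = 1, \dots, m$ | |
| 2 | Monotonically increasing: $p_1 < p_2 < \dots < p_m$ | $p_1^{\#} = 0.006$; $p_i = p_1 + (i-1)d$, $i = 2, \dots, m$ | $d = \dfrac{2(1/m - p_1)}{m - 1}$ |
| 3 | Constant followed by monotonically increasing: $p_1 = \dots = p_k < p_{k+1} < \dots < p_m$ | $k^{\#} = 50$; $p_i = \dfrac{P_k}{k}$, $i = 1, \dots, k$; $p_i = p_k + (i-k)d$, $i = k+1, \dots, m$ | $P_k^{\#} = \sum_{i=1}^{k} p_i = 0.2$; $d = \dfrac{2}{m-k}\left(\dfrac{1 - P_k + P_k/k}{m-k+1} - \dfrac{P_k}{k}\right)$ |
| 4 | Monotonically increasing followed by constant: $p_1 < \dots < p_k = p_{k+1} = \dots = p_m$ | $k^{\#} = 50$; $p_i = p_k - (k-i)d$, $i = 1, \dots, k-1$ | $P_k^{\#} = \sum_{i=k+1}^{m} p_i = 0.65$; $d = \dfrac{2}{k-1}\left(\dfrac{P_k}{m-k} - \dfrac{1 - P_k}{k}\right)$ |
| 5 | Constant, monotonically increasing followed by constant: $p_1 = p_2 = \dots = p_{k_1} < p_{k_1+1} < \dots < p_{k_2} = \dots = p_m$ | $k_1^{\#} = 20$, $k_2^{\#} = 70$; $p_i = \dfrac{P_{k_1}}{k_1}$, $i = 1, \dots, k_1$; $p_i = p_{k_1} + (i-k_1)d$, $i = k_1+1, \dots, k_2$ | $P_{k_1}^{\#} = \sum_{i=1}^{k_1} p_i = 0.05$; $d = \dfrac{2(1 - mP_{k_1}/k_1)}{(k_2-k_1)\,[2(m-k_2) + (k_2-k_1+1)]}$ |
| 6 | Monotonically increasing, constant followed by monotonically increasing: $p_1 < p_2 < \dots < p_{k_1} = p_{k_1+1} = \dots = p_{k_2} < \dots < p_m$ | $k_1^{\#} = 25$, $k_2^{\#} = 75$; $p_i = \dfrac{P_{k_2}}{k_2-k_1}$, $i = k_1, \dots, k_2$; $p_i = p_{k_1} - (k_1-i)d_1$, $i = 1, \dots, k_1-1$; $p_i = p_{k_2} + (i-k_2)d_2$, $i = k_2+1, \dots, m$. Note: $\dfrac{k_1}{2}\dfrac{P_{k_2}}{k_2-k_1} < P_{k_1} < k_1\dfrac{P_{k_2}}{k_2-k_1}$ | $P_{k_1}^{\#} = \sum_{i=1}^{k_1} p_i = 0.15$, $P_{k_2}^{\#} = \sum_{i=k_1+1}^{k_2} p_i = 0.5$; $d_1 = \dfrac{2}{k_1-1}\left(\dfrac{P_{k_2}}{k_2-k_1} - \dfrac{P_{k_1}}{k_1}\right)$; $d_2 = \dfrac{2}{m-k_2}\left(\dfrac{1 - P_{k_1} - P_{k_2} + P_{k_1}/k_1}{m-k_2+1} - \dfrac{P_{k_2}}{k_2-k_1}\right)$ |

[1] $(p_1, \dots, p_m)$ are probabilities and $\sum_{i=1}^{m} p_i = 1$.

#: required parameters.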

For all the scenarios, the required parameters for the calculation of (p1, ⋯, pm) are shown in Table 2. Even though the ICCs are quite small for most CRTs (Murray, Varnell, et al., 2004; Turner, Prevost, et al., 2004), we consider association parameters (ρ) ranging from 0 to 0.95 in steps of 0.01 for illustration purposes. For each design, 1000 simulation samples are generated. REs are calculated using Eq. (14) for all the samples, and the mean, standard deviation, minimum and maximum of the REs are obtained at each ρ. Source code to reproduce the results is available as Supporting Information on the journal’s web page (http://onlinelibrary.wiley.com/doi/xxx/suppinfo).
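The simulation procedure described above can be sketched as follows. This is not the authors' supporting code; it is a minimal illustration in which the function name, the seed, and the handling of empty clusters (a cluster drawn with size 0 simply contributes 0 to the sum in Eq. (14)) are assumptions.

```python
import numpy as np

def simulate_re(p, n_bar, rhos, n_sim=1000, seed=12345):
    """Draw cluster sizes from a multinomial(N = m * n_bar, p) and evaluate Eq. (14).

    Returns (table, cv): table has one row per rho with columns
    (rho, mean RE, SD of RE, min RE, max RE); cv holds the coefficient of
    variation of the cluster sizes in each simulated sample.
    """
    rng = np.random.default_rng(seed)
    p = np.asarray(p, dtype=float)
    m = p.size
    sizes = rng.multinomial(m * n_bar, p, size=n_sim).astype(float)  # shape (n_sim, m)
    cv = sizes.std(axis=1, ddof=1) / sizes.mean(axis=1)
    rows = []
    for rho in rhos:
        unequal = np.mean(sizes / (1.0 + (sizes - 1.0) * rho), axis=1)
        equal = n_bar / (1.0 + (n_bar - 1.0) * rho)
        re = unequal / equal
        rows.append((rho, re.mean(), re.std(ddof=1), re.min(), re.max()))
    return np.array(rows), cv

if __name__ == "__main__":
    m, n_bar = 100, 20
    p_constant = np.full(m, 1.0 / m)      # pattern 1
    rhos = np.linspace(0.0, 0.95, 96)     # 0, 0.01, ..., 0.95
    table, cv = simulate_re(p_constant, n_bar, rhos)
    worst = table[np.argmin(table[:, 1])]
    print("rho at minimum mean RE:", worst[0], "minimum mean RE:", worst[1])
    print("mean CV of cluster sizes:", cv.mean())
```

Because the random number generator and seed differ from those used in the paper, the output will agree with the reported values only approximately.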

Table 2. Required parameters in simulation designs

| Pattern | Parameter | N = 200: m = 5, n = 40 | N = 200: m = 20, n = 10 | N = 200: m = 40, n = 5 | N = 600: m = 6, n = 100 | N = 600: m = 30, n = 20 | N = 600: m = 60, n = 10 | N = 2000: m = 20, n = 100 | N = 2000: m = 50, n = 40 | N = 2000: m = 100, n = 20 |
|---|---|---|---|---|---|---|---|---|---|---|
| 2 | $p_1$ | 0.1 | 0.006 | 0.006 | 0.006 | 0.006 | 0.006 | 0.006 | 0.006 | 0.006 |
| 3 | $k$ | 3 | 10 | 20 | 3 | 15 | 30 | 10 | 25 | 50 |
| 3 | $P_k$ | 0.5 | 0.4 | 0.4 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 |
| 4 | $k$ | 2 | 10 | 20 | 3 | 15 | 30 | 10 | 25 | 50 |
| 4 | $P_k$ | 0.65 | 0.65 | 0.65 | 0.65 | 0.65 | 0.65 | 0.65 | 0.65 | 0.65 |
| 5 | $k_1$ | 2 | 4 | 8 | 2 | 6 | 12 | 4 | 10 | 20 |
| 5 | $k_2$ | 3 | 14 | 28 | 4 | 21 | 42 | 14 | 35 | 70 |
| 5 | $P_{k_1}$ | 0.05 | 0.05 | 0.05 | 0.05 | 0.05 | 0.05 | 0.05 | 0.05 | 0.05 |
| 6 | $k_1$ | 2 | 5 | 10 | 2 | 7 | 15 | 5 | 12 | 25 |
| 6 | $k_2$ | 3 | 15 | 30 | 5 | 22 | 45 | 15 | 37 | 75 |
| 6 | $P_{k_1}$ | 0.3 | 0.15 | 0.15 | 0.2 | 0.15 | 0.15 | 0.15 | 0.15 | 0.15 |
| 6 | $P_{k_2}$ | 0.2 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 |

4.2. Results

Figure 2 shows the mean, minimum and maximum of RE as a function of the association parameter (ρ) based on the simulation data with m = 100 and n = 20 for patterns 1–6. The parameters used in this simulation study are shown in Table 1. As can be seen, the mean RE starts from 1, decreases to a minimum, and then increases back toward 1 as ρ approaches 1. The minimums of the mean RE for the six patterns are 0.9871 at ρ=0.05, 0.9734 at ρ=0.05, 0.8589 at ρ=0.05, 0.9343 at ρ=0.08, 0.8962 at ρ=0.07, and 0.9570 at ρ=0.06, respectively. The standard deviations are small (<0.008) for all values of ρ and all six patterns, so we do not include 95% CIs in the plots. When comparing these six patterns, we find that the first pattern shows the highest RE. This is expected, since the setting p1 = p2 = ⋯ = pm has the highest chance of producing equal cluster sizes. The second pattern has the next highest RE, while the third pattern shows the lowest one. Pattern 6 has a better RE than patterns 4 and 5. After the minimum is reached, the curve for the first pattern is the least steep, the one for the third pattern is very steep, and the one for pattern 5 is a little steeper than that for pattern 6. The range of RE is widest near the value of ρ at which the minimum of the mean RE is reached. The same findings are observed across all six patterns. The plots of the mean and range of RE based on the simulation data with m = 20, n = 100 and m = 50, n = 40 for the six patterns are shown in Appendix Figures 1 and 2, respectively. Among the three situations (small, medium, and large number of clusters), we notice that the range becomes narrower as the cluster size increases, at any value of ρ and for any pattern. For a small cluster size (n = 20), the range of RE stays wide as ρ increases beyond the point where the minimum of the mean RE is reached for patterns 2–6. There are not many differences otherwise.

Figure 2. Relative efficiency of the treatment effect for six patterns, with m = 100 and n = 20.

Table 3 presents the mean (min, max) of CV, and the minimum and median of the mean RE with the corresponding ρ’s, among 1000 repeated samples for all three total sample sizes (200, 600, and 2000), the three situations of cluster sizes and number of clusters, and all six patterns. The first pattern indeed shows the highest RE for any scenario. The discussion below focuses on the remaining five patterns.

Table 3. Minimum and median of mean RE for the different simulation study designs

| Total sample size (N) | Number of clusters (m) | Cluster size (n) | Pattern | CV: mean (min, max) | Mean RE [1]: ρ, minimum [2] | Mean RE [1]: ρ, median [2] |
|---|---|---|---|---|---|---|
| 200 | 5 | 40 | 1 | 0.15 (0.02, 0.34) | 0.03, 0.9948 | 0.48–0.49, 0.9994 |
| | | | 2 | 0.42 (0.17, 0.69) | 0.03, 0.9617 | 0.48–0.49, 0.9952 |
| | | | 3 | 0.29 (0.06, 0.59) | 0.02, 0.9833 | 0.48–0.49, 0.9983 |
| | | | 4 | 0.24 (0.06, 0.45) | 0.03, 0.9868 | 0.48–0.49, 0.9984 |
| | | | 5 | 0.81 (0.69, 0.92) | 0.06, 0.7974 | 0.48–0.49, 0.9311 |
| | | | 6 | 0.35 (0.13, 0.59) | 0.03, 0.9717 | 0.48–0.49, 0.9962 |
| | 20 | 10 | 1 | 0.31 (0.18, 0.47) | 0.10, 0.9754 | 0.49–0.50, 0.9903 |
| | | | 2 | 0.63 (0.43, 0.84) | 0.14, 0.8918 | 0.51–0.52, 0.9306 |
| | | | 3 | 0.40 (0.19, 0.61) | 0.10, 0.9612 | 0.49–0.50, 0.9843 |
| | | | 4 | 0.52 (0.37, 0.68) | 0.16, 0.9170 | 0.52–0.53, 0.9427 |
| | | | 5 | 0.62 (0.47, 0.81) | 0.14, 0.8908 | 0.50–0.51, 0.9324 |
| | | | 6 | 0.46 (0.26, 0.66) | 0.13, 0.9425 | 0.50–0.51, 0.9669 |
| | 40 | 5 | 1 | 0.44 (0.29, 0.62) | 0.21, 0.9482 | 0.53–0.54, 0.9649 |
| | | | 2 | 0.63 (0.43, 0.81) | 0.24, 0.8924 | 0.56–0.57, 0.9146 |
| | | | 3 | 0.51 (0.34, 0.71) | 0.21, 0.9338 | 0.54–0.55, 0.9544 |
| | | | 4 | 0.60 (0.43, 0.75) | 0.28, 0.8945 | 0.59–0.60, 0.9112 |
| | | | 5 | 0.70 (0.50, 0.88) | 0.28, 0.8600 | 0.59–0.60, 0.8816 |
| | | | 6 | 0.55 (0.36, 0.76) | 0.24, 0.9175 | 0.56–0.57, 0.9365 |
| 600 | 6 | 100 | 1 | 0.09 (0.01, 0.19) | 0.01, 0.9979 | 0.48–0.49, 0.9999 |
| | | | 2 | 0.73 (0.62, 0.83) | 0.03, 0.8485 | 0.48–0.49, 0.9531 |
| | | | 3 | 0.76 (0.60, 0.91) | 0.01, 0.8848 | 0.48–0.49, 0.9936 |
| | | | 4 | 0.51 (0.44, 0.60) | 0.03, 0.9071 | 0.48–0.49, 0.9848 |
| | | | 5 | 0.72 (0.62, 0.84) | 0.02, 0.8450 | 0.48–0.49, 0.9817 |
| | | | 6 | 0.51 (0.40, 0.64) | 0.02, 0.9315 | 0.48–0.49, 0.9936 |
| | 30 | 20 | 1 | 0.22 (0.13, 0.31) | 0.05, 0.9876 | 0.48–0.49, 0.9974 |
| | | | 2 | 0.54 (0.44, 0.65) | 0.07, 0.9195 | 0.48–0.49, 0.9725 |
| | | | 3 | 0.80 (0.64, 0.92) | 0.05, 0.8620 | 0.48–0.49, 0.9625 |
| | | | 4 | 0.46 (0.37, 0.55) | 0.08, 0.9314 | 0.49–0.50, 0.9693 |
| | | | 5 | 0.59 (0.48, 0.69) | 0.07, 0.9002 | 0.48–0.49, 0.9636 |
| | | | 6 | 0.36 (0.26, 0.46) | 0.06, 0.9654 | 0.48–0.49, 0.9899 |
| | 60 | 10 | 1 | 0.31 (0.22, 0.44) | 0.10, 0.9742 | 0.49–0.50, 0.9892 |
| | | | 2 | 0.49 (0.35, 0.61) | 0.11, 0.9366 | 0.49–0.50, 0.9690 |
| | | | 3 | 0.83 (0.71, 0.96) | 0.11, 0.8478 | 0.49–0.50, 0.9217 |
| | | | 4 | 0.51 (0.41, 0.58) | 0.15, 0.9206 | 0.50–0.51, 0.9481 |
| | | | 5 | 0.63 (0.54, 0.72) | 0.14, 0.8839 | 0.51–0.52, 0.9282 |
| | | | 6 | 0.46 (0.36, 0.56) | 0.12, 0.9436 | 0.50–0.51, 0.9699 |
| 2000 | 20 | 100 | 1 | 0.09 (0.06, 0.15) | 0.01, 0.9976 | 0.48–0.49, 0.9999 |
| | | | 2 | 0.56 (0.49, 0.61) | 0.02, 0.9146 | 0.48–0.49, 0.9922 |
| | | | 3 | 0.77 (0.69, 0.84) | 0.01, 0.8738 | 0.48–0.49, 0.9932 |
| | | | 4 | 0.43 (0.38, 0.48) | 0.02, 0.9380 | 0.48–0.49, 0.9929 |
| | | | 5 | 0.55 (0.47, 0.60) | 0.02, 0.9140 | 0.48–0.49, 0.9933 |
| | | | 6 | 0.35 (0.29, 0.41) | 0.02, 0.9651 | 0.48–0.49, 0.9972 |
| | 50 | 40 | 1 | 0.16 (0.11, 0.22) | 0.03, 0.9938 | 0.48–0.49, 0.9993 |
| | | | 2 | 0.44 (0.38, 0.51) | 0.03, 0.9475 | 0.48–0.49, 0.9925 |
| | | | 3 | 0.79 (0.72, 0.86) | 0.03, 0.8667 | 0.48–0.49, 0.9819 |
| | | | 4 | 0.43 (0.37, 0.48) | 0.04, 0.9393 | 0.48–0.49, 0.9855 |
| | | | 5 | 0.57 (0.51, 0.62) | 0.04, 0.9044 | 0.48–0.49, 0.9820 |
| | | | 6 | 0.34 (0.28, 0.39) | 0.03, 0.9686 | 0.48–0.49, 0.9952 |
| | 100 | 20 | 1 | 0.22 (0.18, 0.29) | 0.05, 0.9871 | 0.48–0.49, 0.9972 |
| | | | 2 | 0.32 (0.27, 0.38) | 0.05, 0.9734 | 0.48–0.49, 0.9939 |
| | | | 3 | 0.80 (0.74, 0.88) | 0.05, 0.8589 | 0.48–0.49, 0.9612 |
| | | | 4 | 0.45 (0.39, 0.49) | 0.08, 0.9343 | 0.48–0.49, 0.9723 |
| | | | 5 | 0.59 (0.53, 0.64) | 0.07, 0.8962 | 0.48–0.49, 0.9619 |
| | | | 6 | 0.40 (0.34, 0.45) | 0.06, 0.9570 | 0.48–0.49, 0.9867 |

[1] The mean RE among 1000 simulations is calculated for each ρ.

[2] The minimum and median of mean RE including the corresponding ρ’s are identified across all the values of ρ.

  1. Total sample size is small (N=200); the minimums of mean REs are less than 90% for one-third of cases. When the number of clusters is small (m = 5), the minimums of mean RE for pattern 4 and pattern 5 are 0.9868 and 0.7974, respectively, the largest and smallest estimates among the five patterns. Additionally, the minimums are reached at ρ ∈ [0.02, 0.06]. The remaining two situations (m = 20, 40). obtain the largest estimate of minimum of mean RE for pattern 3 and the smallest estimate for pattern 5. We also find that the minimum is reached at the larger ρ when cluster size decreases. For example, ρ is around 0.05 for n = 40, around 0.15 for n = 20, and around 0.25 for n = 5, respectively. The median of mean REs are at least 93% for m = 5 and n = 40, 93% for m = 20 and n =10, 88% for m = 40 and n = 5. The corresponding values of ρ are between 0.48 and 0.49, between 0.49 and 0.53, and between 0.53 and 0.60, respectively.

  2. Total sample size is medium (N=600); All the minimums of mean REs are at least 84% and more than half are above 90%. When the number of clusters is small (m = 6), the minimums of mean RE for pattern 5 and pattern 6 are 0.8450 and 0.9315, respectively. They are the largest and smallest estimates among patterns 2–6. The remaining two situations (m = 30, 60) obtain the smallest estimate of minimum of mean RE for pattern 3. We have the same findings about the trend of ρ, where the minimum is reached, for different cluster sizes: the smaller the cluster size, the larger the ρ. Additionally, all the medians of mean REs are at least 95% for m = 6 and n = 100, 96% for m = 30 and n = 20, and 92% for m = 60 and n = 10, respectively. The corresponding values of ρ are between 0.48 and 0.49, between 0.48 and 0.50, and between 0.49 and 0.52, respectively.

  3. Total sample size is large (N = 2000). Studies of this size are expected to yield sufficient power. They have the best estimates of minimal RE, with at least 85% for all the different settings (m, n, and pattern), and approximately 75% of cases are above 90%. The situations (m = 20, 50) obtain the largest estimate of the minimum of mean RE for pattern 6. Pattern 3 demonstrates the smallest estimate of the minimum of mean RE for all three situations (87.38%, 86.67%, 85.89%). The ρ at which the minimum is reached ranges from 0.01 to 0.08. All the medians of mean RE are larger than 99% for m = 20 and n = 100, above 98% for m = 50 and n = 40, and at least 96% for m = 100 and n = 20.

We find that the minimum of mean RE decreases when CV increases for all simulation studies. Pattern 1 has the lowest CV, as expected, while pattern 5 has the largest CV in most cases. In summary, the worst scenarios for small CRTs give 21%, 11%, and 14% efficiency loss when the number of clusters is small, medium, and large, respectively. The corresponding losses are 16%, 14%, and 16% for medium CRTs. For large CRTs all three are about the same, 14%.

Additionally, Table 3 shows that, for a given pattern, the ρ at which the RE is minimized decreases with increasing average cluster size n. Van Breukelen et al. (Van Breukelen, Candel, et al., 2007) show that the RE minimum is reached at ρ = 1/(n + 1) based on their Taylor approximation (Eq. (10)). We obtain similar results for large CRTs with mn = 2000, especially for n = 40 and 60. Furthermore, Van Breukelen et al. (Van Breukelen, Candel, et al., 2007) show that the number of clusters does not affect RE (Eq. (15)), given the cluster size distribution. To examine this, we double m and keep n the same in the simulation designs. The parameters for (p1, ⋯, pm) and the simulation results are shown in Appendix Tables 1 and 2. Comparing Table 3 with Appendix Table 2 (e.g., m = 5, n = 40 versus m = 10, n = 40), we notice different findings within the same pattern. For example, for pattern 2, the minimums of mean RE are 0.9617 and 0.8802, respectively. However, the two designs also have different CVs, 0.42 (0.17, 0.69) and 0.65 (0.52, 0.78). That is, the same pattern in our simulation setting, which strongly depends on (p1, ⋯, pm), may yield different cluster size distributions when the number of clusters doubles.
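The simulation procedure summarized above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code. It assumes that, under the exchangeable structure, the variance of the treatment effect estimator is inversely proportional to the sum of n_i / (1 + (n_i − 1)ρ) over clusters, which is the form used by Van Breukelen et al. (2007). The function names, the probability vector, and the seed are hypothetical.

```python
import random


def relative_efficiency(sizes, rho):
    """RE of unequal versus equal cluster sizes for one set of cluster sizes.

    Assumption: Var(beta_hat) is proportional to
    1 / sum_i n_i / (1 + (n_i - 1) * rho) (exchangeable correlation).
    """
    m = len(sizes)
    n_bar = sum(sizes) / m
    info_unequal = sum(n / (1 + (n - 1) * rho) for n in sizes)
    info_equal = m * n_bar / (1 + (n_bar - 1) * rho)
    return info_unequal / info_equal


def mean_re(probs, total, rho, n_sim=1000, seed=0):
    """Mean RE over n_sim multinomial draws of cluster sizes summing to total."""
    rng = random.Random(seed)
    m = len(probs)
    acc = 0.0
    for _ in range(n_sim):
        # One multinomial draw: assign each of `total` subjects to a cluster.
        sizes = [0] * m
        for cluster in rng.choices(range(m), weights=probs, k=total):
            sizes[cluster] += 1
        acc += relative_efficiency(sizes, rho)
    return acc / n_sim


if __name__ == "__main__":
    # Hypothetical pattern for m = 5 clusters and N = 200 subjects.
    probs = [0.10, 0.15, 0.20, 0.25, 0.30]
    for rho in (0.01, 0.05, 0.25, 0.50):
        print(rho, round(mean_re(probs, 200, rho), 4))
```

Because n / (1 + (n − 1)ρ) is concave in n for 0 < ρ < 1, the RE computed this way never exceeds 1 and equals 1 when all cluster sizes are equal.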

5. Sample size calculations

In sections 5.1 and 5.2, we show the sample size formulas for the different types of outcome based on GEE models with the assumption of equal cluster sizes with no cost constraints and under a given cost, respectively. Section 5.3 demonstrates how the sample sizes are adjusted due to efficiency loss from unequal cluster sizes.

5.1. No cost constraints

For continuous and binary outcomes, a two-sided test based on β with type I error of α and nominal power η requires the sample size

$$m = \frac{(z_{1-\alpha/2} + z_{\eta})^2\, V_{\beta}}{\beta^2}, \tag{16}$$

where β is the treatment effect. Substituting Equations (6) and (9) for a two-group study with the assumption of exchangeable correlation matrix, we have the sample size formulas

$$m = \frac{(z_{1-\alpha/2} + z_{\eta})^2\, \sigma^2\, [1 + (n-1)\rho]}{\pi(1-\pi)\, n\, \beta^2}, \tag{17}$$

and

$$m = \frac{(z_{1-\alpha/2} + z_{\eta})^2 \left(\dfrac{1}{\pi p_1(1-p_1)} + \dfrac{1}{(1-\pi) p_0(1-p_0)}\right) [1 + (n-1)\rho]}{n \left[\log\!\left(\dfrac{p_1(1-p_0)}{p_0(1-p_1)}\right)\right]^2},$$

respectively, where zα is the 100 × α percentile of a standard normal distribution. Note that Eq. (17) reduces to Eq. (3) of Van Breukelen and Candel (2012) under equal allocation.

For count outcomes, we use formula (3) (Amatya, Bhaumik, et al., 2013) to obtain

$$m = \left(z_{1-\alpha/2}\sqrt{\frac{1}{\pi} + \frac{1}{1-\pi}} + z_{\eta}\sqrt{\frac{1}{\pi e^{\beta}} + \frac{1}{1-\pi}}\right)^{2} \frac{1 + (n-1)\rho}{n\, e^{\beta_0}\, \beta^2}$$

through Eq. (11).
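As an illustration, the three sample size formulas of this section can be coded directly. The sketch below uses only the Python standard library. The function names and the example inputs are hypothetical, and the count-outcome function assumes the square-root form of the null and alternative variance terms, which is the form that agrees with the power expression in Eq. (20). Each function returns the unrounded number of clusters m, which in practice would be rounded up.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def _z(p):
    """100*p percentile of the standard normal distribution."""
    return NormalDist().inv_cdf(p)


def clusters_continuous(beta, sigma, n, rho, pi=0.5, alpha=0.05, power=0.80):
    """Number of clusters for a continuous outcome, Eq. (17)."""
    deff = 1 + (n - 1) * rho  # design effect under exchangeable correlation
    z_sum = _z(1 - alpha / 2) + _z(power)
    return z_sum ** 2 * sigma ** 2 * deff / (pi * (1 - pi) * n * beta ** 2)


def clusters_binary(p1, p0, n, rho, pi=0.5, alpha=0.05, power=0.80):
    """Number of clusters for a binary outcome (log odds ratio scale)."""
    deff = 1 + (n - 1) * rho
    z_sum = _z(1 - alpha / 2) + _z(power)
    var_term = 1 / (pi * p1 * (1 - p1)) + 1 / ((1 - pi) * p0 * (1 - p0))
    log_or = log(p1 * (1 - p0) / (p0 * (1 - p1)))
    return z_sum ** 2 * var_term * deff / (n * log_or ** 2)


def clusters_count(beta0, beta, n, rho, pi=0.5, alpha=0.05, power=0.80):
    """Number of clusters for a count outcome (log rate ratio beta)."""
    deff = 1 + (n - 1) * rho
    sd_null = sqrt(1 / pi + 1 / (1 - pi))
    sd_alt = sqrt(1 / (pi * exp(beta)) + 1 / (1 - pi))
    numerator = (_z(1 - alpha / 2) * sd_null + _z(power) * sd_alt) ** 2
    return numerator * deff / (n * exp(beta0) * beta ** 2)


if __name__ == "__main__":
    # Hypothetical inputs, chosen only to exercise the functions.
    print(clusters_continuous(beta=0.5, sigma=1.0, n=20, rho=0.05))
    print(clusters_binary(p1=0.4, p0=0.3, n=20, rho=0.05))
    print(clusters_count(beta0=0.0, beta=0.3, n=20, rho=0.05))
```

All three functions share the design effect 1 + (n − 1)ρ, so for fixed n the required number of clusters grows linearly in ρ.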

5.2. Optimal design

Van Breukelen et al (Van Breukelen and Candel, 2015) define the total budget B in CRTs. Assume the study cost per cluster in the enrollment stage is c currency units (e.g. USD), and each subject costs s currency units. Total budget B includes m clusters and n subjects within a cluster,

B=m(c+sn). (18)

They define “optimal” as minimizing the variance of the treatment effect estimator in the linear regression model under the cost constraint. In this paper, “optimal” under GEE models refers to maximum power for a given sampling budget. The two definitions are equivalent given an unbiased point estimator of the treatment effect β. Here, the optimal design is the one that maximizes the power subject to the constraint in Eq. (18).

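Define

$$\Delta_1 = \frac{\sigma^2}{\pi(1-\pi)\beta^2}$$

for continuous outcome, and

$$\Delta_1 = \frac{\dfrac{1}{\pi p_1(1-p_1)} + \dfrac{1}{(1-\pi) p_0(1-p_0)}}{\left[\log\!\left(\dfrac{p_1(1-p_0)}{p_0(1-p_1)}\right)\right]^2}$$

for binary outcome, the sample sizes based on GEE models (Shih, 1997; Pan, 2001) can be re-written as

$$m = (z_{1-\alpha/2} + z_{\eta})^2\, \frac{1 + (n-1)\rho}{n}\, \Delta_1. \tag{19}$$

Alternatively,

$$z_{\eta} = \sqrt{\frac{n}{1 + (n-1)\rho}}\, \sqrt{\frac{m}{\Delta_1}} - z_{1-\alpha/2}.$$

For count data,

$$z_{\eta} = \left(\sqrt{\frac{n}{1 + (n-1)\rho}}\, \sqrt{m\, e^{\beta_0} \beta^2} - z_{1-\alpha/2}\sqrt{\frac{1}{\pi} + \frac{1}{1-\pi}}\right) \Bigg/ \sqrt{\frac{1}{\pi e^{\beta}} + \frac{1}{1-\pi}}. \tag{20}$$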

Consequently, the power is calculated by Φ(zη), where Φ is the cumulative distribution function of the standard normal distribution.
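A short sketch of the power calculation for continuous and binary outcomes follows. It assumes the power equals the standard normal cumulative distribution function evaluated at zη, with zη obtained by solving Eq. (19) for zη. The function names and inputs are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def delta1_continuous(beta, sigma, pi=0.5):
    """Delta_1 for a continuous outcome: sigma^2 / (pi (1 - pi) beta^2)."""
    return sigma ** 2 / (pi * (1 - pi) * beta ** 2)


def power_from_design(m, n, rho, delta1, alpha=0.05):
    """Power of the two-sided test for m clusters of size n, from Eq. (19)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_eta = sqrt(n / (1 + (n - 1) * rho)) * sqrt(m / delta1) - z_alpha
    return NormalDist().cdf(z_eta)


if __name__ == "__main__":
    # Hypothetical design: 13 clusters of 20 subjects, rho = 0.05.
    d1 = delta1_continuous(beta=0.5, sigma=1.0)
    print(power_from_design(m=13, n=20, rho=0.05, delta1=d1))
```

Plugging the m returned by Eq. (19) for a nominal power η back into this function recovers η, which is a convenient self-check.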

Because π, α, and Δ1 in Eq. (19) and (β0, β) in Eq. (20) are values pre-determined under the null and alternative hypotheses for all three types of outcome, maximizing the power is equivalent to maximizing

$$K = \frac{nm}{1 + (n-1)\rho}.$$

For a known value of ρ, the locally optimal design (LOD) is reached at

$$n_{LOD} = \vartheta\sqrt{\frac{c}{s}}, \qquad m_{LOD} = \frac{B}{\vartheta\sqrt{sc} + c}, \tag{21}$$

where ϑ = √((1 − ρ)/ρ). It is identical to the optimal designs proposed by other researchers (Raudenbush, 1997; Moerbeek et al., 2000; Van Breukelen and Candel, 2015), even though they use a different definition of “optimal.”
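The locally optimal design can be computed as in the sketch below, which assumes Eq. (21) in the form n_LOD = ϑ√(c/s) and m_LOD = B/(ϑ√(sc) + c) with ϑ = √((1 − ρ)/ρ). This form follows from substituting m = B/(c + sn) into K and setting the derivative with respect to n to zero, which gives ρ s n² = c(1 − ρ). The function name is hypothetical, and the inputs are the budget figures used in the example of Section 5.3.

```python
from math import sqrt


def locally_optimal_design(budget, cost_cluster, cost_subject, rho):
    """Return (n_LOD, m_LOD) maximizing K = n m / (1 + (n - 1) rho)
    subject to budget = m * (cost_cluster + cost_subject * n)."""
    theta = sqrt((1 - rho) / rho)
    n_lod = theta * sqrt(cost_cluster / cost_subject)
    m_lod = budget / (theta * sqrt(cost_subject * cost_cluster) + cost_cluster)
    return n_lod, m_lod


if __name__ == "__main__":
    n_lod, m_lod = locally_optimal_design(55_000, 1_000, 100, rho=0.135)
    print(round(n_lod, 2), round(m_lod, 2))
```

The returned values are not integers. Rounding n and m changes the total cost, so the rounded design should be checked against the budget in Eq. (18).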

When the value of ρ is unknown, Van Breukelen and Candel (Van Breukelen and Candel, 2015) proposed the maximin designs (MMDs) using the linear regression model and the variance of the maximum likelihood estimator. However, there is no closed form for binary outcomes, and the variance for binary outcomes needs to be transformed to that for continuous outcomes. When using GEE models, Liu and Colditz (Liu and Colditz, 2017) proposed the algorithm shown below to find the optimal design (mOD, nOD) based on the parameter space (ρmin, ρmax) and the design space (mmin, mmax).

  • Step 1, Define the parameter and design space, (ρmin, ρmax) and (mmin, mmax), respectively.

  • Step 2, Consider ρ = ρmax, calculate mLOD using equation (21).

  1. If it is within the range (mmin, mmax), then set mOD = mLOD.

  2. If it is outside of (mmin, mmax), then set mOD = mmax. In either case, the cluster size nOD is calculated by (B/mOD − c)/s.
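The steps above can be sketched as follows. This is an illustrative reading of the algorithm, not the published implementation: the function name is hypothetical, the range check treats the design space as a closed interval, and no rounding to integers is applied. Only ρmax enters the computation, since Step 2 evaluates the LOD at ρ = ρmax.

```python
from math import sqrt


def optimal_design_unknown_rho(budget, cost_cluster, cost_subject,
                               rho_max, m_min, m_max):
    """Return (m_OD, n_OD) for an unknown rho in (rho_min, rho_max)."""
    # Step 2: locally optimal number of clusters at rho = rho_max, Eq. (21).
    theta = sqrt((1 - rho_max) / rho_max)
    m_lod = budget / (theta * sqrt(cost_subject * cost_cluster) + cost_cluster)

    # Keep m_LOD if it lies in the design space, otherwise use m_max.
    m_od = m_lod if m_min <= m_lod <= m_max else m_max

    # Cluster size implied by the budget constraint B = m (c + s n).
    n_od = (budget / m_od - cost_cluster) / cost_subject
    return m_od, n_od


if __name__ == "__main__":
    print(optimal_design_unknown_rho(55_000, 1_000, 100,
                                     rho_max=0.135, m_min=10, m_max=40))
    print(optimal_design_unknown_rho(55_000, 1_000, 100,
                                     rho_max=0.135, m_min=10, m_max=20))
```

In the second call the LOD value of about 30.5 clusters falls outside the design space, so the sketch returns m_max = 20 and spends the remaining budget on larger clusters.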

Liu and Colditz (Liu and Colditz, 2017) give the proof details of LODs and ODs when GEE models are considered in the sample size calculation. Compared with the MMD algorithms based on RE and efficiency (Van Breukelen and Candel, 2015), the revised algorithm is the same as one of their maximin designs, namely the one that maximizes the minimum efficiency. However, it works for any type of outcome, including continuous, binary, and count.

Given a fixed budget, we can find the optimal design that maximizes the power for the proposed study, but note that the maximized power may differ considerably from the nominal power η, e.g., 80%. When the known value of ρ is the same, or the parameter and design spaces are the same, for the three types of outcome, the optimal sample sizes, including the number of clusters and the cluster size, are the same for all three as well. However, the maximized powers are different and depend on the outcome type.

5.3. Application

Equal cluster sizes are assumed in sections 5.1 and 5.2. However, equal cluster sizes are not guaranteed in practice. When designing a CRT to examine the treatment effect through GEE models, we first assume equal cluster sizes with n given. The required number of clusters can be obtained from these two sections. We then increase the number of clusters to compensate for the efficiency loss due to unequal cluster sizes. Below are examples that demonstrate the calculation of the number of clusters for unequal cluster sizes.

If we consider GEE models to design a CRT without financial constraints, the sample size formulas in section 5.1 are used for calculation. For instance, suppose we want to compare two physical therapy treatments designed to increase muscle flexibility in a CRT. We must determine how many clusters are required to achieve a power of at least 80% to detect a group mean difference with n = 40 at a type I error of 5%. The mean difference in muscle flexibility between the standard treatment and the new treatment is 1, the common standard deviation of muscle flexibility is 3.1, and equal allocation (π = 0.5) and ρ = 0.01 are considered. Using equation (17), m = 20 clusters are needed. The total sample size is mn = 800. We consider this study a medium CRT with a medium number of clusters. As shown in section 4.2, REs depend on the pattern of p1, ⋯, pm, and the worst scenarios for medium CRTs give 16%, 14%, and 16% efficiency loss when the number of clusters is small, medium, and large, respectively. With a 14% loss, the adjusted number of clusters is 20/0.86 ≈ 23.3, which is rounded up to 24.

If a study is designed under a given budget, then the sample size formulas in section 5.2 are used. For example, suppose the total budget of the proposed study is $55,000, the study cost per cluster in the enrollment stage is $1,000, and each subject costs $100. Using equation (21), the locally optimal design with nLOD ≈ 8.00 and mLOD ≈ 30.55 is obtained given ρ = 0.135. It is considered a small CRT with a large number of clusters. The worst scenario for small CRTs gives 14% efficiency loss when the cluster size is small. Therefore, the adjusted number of clusters is 30.55/0.86 ≈ 35.5, and we need to enroll 36 clusters with a cluster size of 8. The originally proposed cost is 30 × (1000 + 100 × 8) = $54,000, which is within the budget, while the cost after the adjustment is 36 × 1000 + 100 × 36 × 8 = $64,800, which exceeds the budget. The cost increases because the number of clusters rises. Therefore, in the optimal design there is a trade-off between increasing the number of clusters to offset the efficiency loss and keeping the actual cost within the budget.
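The arithmetic of both examples can be checked with a few lines of Python. The sketch below takes the unadjusted number of clusters and the worst-case RE (1 minus the efficiency loss) as inputs and divides one by the other, rounding up. The function names are hypothetical.

```python
from math import ceil


def adjusted_clusters(m_equal, worst_case_re):
    """Inflate the number of clusters to offset the efficiency loss."""
    return ceil(m_equal / worst_case_re)


def total_cost(m, n, cost_cluster, cost_subject):
    """Total cost B = m (c + s n) from Eq. (18)."""
    return m * (cost_cluster + cost_subject * n)


if __name__ == "__main__":
    # Example without cost constraints: 20 clusters, 14% efficiency loss.
    print(adjusted_clusters(20, 0.86))          # 24

    # Example under a $55,000 budget: m_LOD = 30.55, n = 8, 14% loss.
    m_adj = adjusted_clusters(30.55, 0.86)
    print(m_adj)                                # 36
    print(total_cost(30, 8, 1_000, 100))        # 54000
    print(total_cost(m_adj, 8, 1_000, 100))     # 64800
```

The last two lines make the trade-off explicit: the adjusted design costs $64,800 against a budget of $55,000.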

6. Discussion

The sample size formulas based on GEE methods have been derived in recent years (Liu and Liang, 1997; Shih, 1997; Pan, 2001; Amatya, Bhaumik, et al., 2013). These sample size formulas assume the exchangeable working correlation structure and equal cluster sizes. However, the assumption of equal cluster sizes is not realistic. Therefore, researchers have investigated the RE of unequal versus equal cluster sizes when testing the treatment effect. Van Breukelen et al. (Van Breukelen, Candel, et al., 2007) consider linear regression models to address the RE of unequal versus equal cluster sizes with a continuous outcome in CRTs. For a binary outcome in CRTs, Candel and Van Breukelen (Candel and Van Breukelen, 2010) utilize a mixed effects logistic regression model and present the adjusted sample size formula for varying cluster sizes. In this paper, we investigate the REs based on GEE models to test the treatment effect in a two-group comparison in CRTs. Continuous, binary, and count outcomes are discussed simultaneously. The variances of the estimator of the treatment effect are derived for the three types of outcome given the exchangeable working correlation structure. We define RE as the ratio of the variance of the estimator of the treatment effect for equal to that for unequal cluster sizes. The simpler formulas of RE with continuous, binary, and count outcomes are given.

First, we find that the formulas of RE are the same for continuous, binary, and count data using GEE models. That is, RE does not depend on the type of outcome at the stage of study design. Second, RE(β̂1) is independent of the cluster allocation π and the parameters β0 and β1. Third, the performance of RE for the exchangeable working correlation structure is investigated through simulations since no closed form is available. We draw conclusions about efficiency loss for the different sizes of CRTs: small, medium, and large. Fourth, the adjusted number of clusters accounting for efficiency loss is illustrated for CRTs without financial constraints and under a budget. Fifth, the expressions (Eq. (14) and (15)) are the same as those (Eq. (8) and (9)) in Van Breukelen et al. (Van Breukelen, Candel, et al., 2007). They are an extension of published results on optimal sample sizes and on the relative efficiency of unequal versus equal cluster sizes in cluster randomized trials using GEE models. Last, Van Breukelen et al. (Van Breukelen, Candel, et al., 2007) approximate the RE through a Taylor series by assuming that ni, i = 1, ⋯, m, are independent realizations of a random variable, while we conduct simulation studies to investigate RE performance given that unequal cluster sizes follow a multinomial distribution under mn = ∑_{i=1}^{m} n_i. The rationale is that the number of clusters is obtained under the assumption of equal cluster sizes in the sample size calculation, resulting in a fixed sample size (N = mn). We then investigate the performance of RE for unequal cluster sizes ni. Van Breukelen et al. (Van Breukelen, Candel, et al., 2007) find that the loss of efficiency could be larger than 10% when CV exceeds 0.63. However, such a large CV is not expected to occur in real cluster randomized trials. As such, they conclude that the loss of efficiency due to variation of cluster sizes rarely exceeds 10 per cent and can be compensated by sampling 11 per cent more clusters.
Candel and Van Breukelen (Candel and Van Breukelen, 2010) find that 14% more clusters are needed to cover the efficiency loss in many cases with a binary outcome in CRTs. Our simulation results show that small CRTs give 21%, 11%, and 14% efficiency loss when the number of clusters is small, medium, and large, respectively. The corresponding losses are 16%, 14%, and 16% for medium CRTs. The efficiency loss could be approximately 14%, even in large CRTs, under the constraint mn = ∑_{i=1}^{m} n_i. Note that the corresponding CVs are larger than 0.63; thus, we reach a conclusion similar to that of Van Breukelen et al. (Van Breukelen, Candel, et al., 2007).

There are several limitations to this approach. The first limitation is that covariates are not considered. If covariates are included in the GEE model, the sample size formulas are not as simple as those in sections 5.1 and 5.2. Moreover, the performance of the sample size formula is sensitive to the distribution of the covariates (Liu and Liang, 1997). However, it is generally reasonable to calculate the sample size without considering covariates at the design stage. The second limitation is that our proposed RE is investigated for the exchangeable working correlation matrix only. The assumption of the exchangeable working correlation matrix may not hold in a real-world application. However, the sample size formulas commonly use this assumption, and it is acceptable in practice. The third limitation is that the performance of RE is determined by the pattern of p1, ⋯, pm as well as the number of clusters, the cluster sizes, and the association parameter (ρ). We have no prior knowledge of p1, ⋯, pm. This paper shows the different scenarios of efficiency loss from six basic patterns. Sample size adjustments are made based on the worst scenario in order to be conservative. The last limitation is that GEE models may perform poorly when the number of clusters is not large enough. The RE defined in section 3 and the sample size formulas in section 5 must therefore be used cautiously.

In conclusion, this paper discusses the efficiency loss based on GEE models when unequal cluster sizes occur in practice. We believe that the proposed method is useful and practical, especially for designing CRTs with any type of outcome.

Supplementary Material

Supp Fig 1

Figure 1 Mean cluster size for six patterns, with m = 100 and n = 20.

Supp Fig 2

Figure 2 Mean cluster size for six patterns, with m = 20 and n = 100.

Supp Fig 3

Figure 3 Mean cluster size for six patterns, with m = 50 and n = 40.

Supp Tables 1-2

Table 1 Multinomial distribution with probabilities (p1, ⋯, pm) with m = 100 and n = 20

Table 2 Required parameters in simulation designs

Figure 3. Relative efficiency of the treatment effect for six patterns, with m = 20 and n = 100.

Figure 4. Relative efficiency of the treatment effect for six patterns, with m = 50 and n = 40.

Acknowledgements

The authors wish to express sincere thanks to associate editor and two reviewers for their constructive and valuable comments and suggestions which considerably improved the manuscript. We thank the Alvin J. Siteman Cancer Center at Washington University School of Medicine and Barnes-Jewish Hospital in St. Louis, MO., for supporting this research (P30 CA91842).

Footnotes

Conflict of Interest

The authors have declared no conflict of interest.

References

  1. Amatya A, Bhaumik D and Gibbons R (2013). Sample size determination for clustered count data. Statistics in Medicine 32:4162–4179.
  2. Austin P (2007). A comparison of the statistical power of different methods for the analysis of cluster randomization trials with binary outcomes. Statistics in Medicine 26:3550–3565.
  3. Campbell MK, Mollison J, Steen N, Grimshaw JM and Eccles M (2000). Analysis of cluster randomized trials in primary care: a practical approach. Family practice 17(2):192–196.
  4. Candel MJ and Van Breukelen GJ (2010). Sample size adjustments for varying cluster sizes in cluster randomized trials with binary outcomes analyzed with second-order PQL mixed logistic regression. Stat Med 29(14):1488–1501.
  5. Candel MJJM, Van Breukelen GJP, Kotova L and Berger MPF (2008). Optimality of Equal vs. Unequal Cluster Sizes in Multilevel Intervention Studies: A Monte Carlo Study for Small Sample Sizes. Communications in Statistics - Simulation and Computation 37(1):222–239.
  6. Donner A, Birkett N and Buck C (1981). Randomization by cluster-sample size requirements and analysis. Am J Epidemiol 114:906–914.
  7. Eldridge S, Ashby D and Kerry S (2006). Sample size for cluster randomized trials: effect of coefficient of variation of cluster size and analysis method. Int J Epidemiol 35:1292–1300.
  8. Gravenstein S, Dahal R, Gozalo PL, Davidson HE, Han LF, Taljaard M and Mor V (2016). A cluster randomized controlled trial comparing relative effectiveness of two licensed influenza vaccines in US nursing homes: Design and rationale. Clinical trials (London, England).
  9. Gulliford MC, van Staa TP, McDermott L, McCann G, Charlton J and Dregan A (2014). Cluster randomized trials utilizing primary care electronic health records: methodological issues in design, conduct, and analysis (eCRT Study). Trials 15:220.
  10. Jeffe DB, Perez M, Cole EF, Liu Y and Schootman M (2016). The Effects of Surgery Type and Chemotherapy on Early-Stage Breast Cancer Patients’ Quality of Life Over 2-Year Follow-up. Annals of surgical oncology 23(3):735–743.
  11. Kalfon P, Mimoz O, Loundou A, Geantot MA, Revel N, Villard I, Amour J, Azoulay E, Garrouste-Orgeas M, Martin C, Sharshar T, Baumstarck K and Auquier P (2016). Reduction of self-perceived discomforts in critically ill patients in French intensive care units: study protocol for a cluster-randomized controlled trial. Trials 17(1):87.
  12. Liang K-Y and Zeger SL (1986). Longitudinal Data Analysis Using Generalized Linear Models. Biometrika 73(1):13–22.
  13. Lin CC, Bruinooge SS, Kirkwood MK, Olsen C, Jemal A, Bajorin D, Giordano SH, Goldstein M, Guadagnolo BA, Kosty M, Hopkins S, Yu JB, Arnone A, Hanley A, Stevens S and Hershman DL (2015). Association Between Geographic Access to Cancer Care, Insurance, and Receipt of Chemotherapy: Geographic Distribution of Oncologists and Travel Distance. Journal of clinical oncology : official journal of the American Society of Clinical Oncology 33(28):3177–3185.
  14. Liu G and Liang K-Y (1997). Sample Size Calculations for Studies with Correlated Observations. Biometrics 53(3):937–947.
  15. Liu J and Colditz GA (2017). Optimal design of longitudinal data analysis using generalized estimating equation models. Biometrical journal. Biometrische Zeitschrift 59(2):315–330.
  16. Manatunga AK, Hudgens MG and Chen S (2001). Sample Size Estimation in Cluster Randomized Studies with Varying Cluster Size. Biometrical Journal 43(1):75–86.
  17. Mehring M, Haag M, Linde K, Wagenpfeil S and Schneider A (2016). Effects of a Web-Based Intervention for Stress Reduction in Primary Care: A Cluster Randomized Controlled Trial. Journal of medical Internet research 18(2):e27.
  18. Moerbeek M, Van Breukelen G and Berger M (2000). Design Issues for Experiments in Multilevel Populations. Journal of Educational and Behavioral Statistics 25(3):271–284.
  19. Murray DM, Varnell SP and Blitstein JL (2004). Design and analysis of group-randomized trials: a review of recent methodological developments. American journal of public health 94(3):423–432.
  20. Nagayama H, Tomori K, Ohno K, Takahashi K, Ogahara K, Sawada T, Uezu S, Nagatani R and Yamauchi K (2016). Effectiveness and Cost-Effectiveness of Occupation-Based Occupational Therapy Using the Aid for Decision Making in Occupation Choice (ADOC) for Older Residents: Pilot Cluster Randomized Controlled Trial. PloS one 11(3):e0150374.
  21. Pan W (2001). Sample size and power calculations with correlated binary data. Control Clin Trials 22(3):211–227.
  22. Park YH, Jung KH, Im SA, Sohn JH, Ro J, Ahn JH, Kim SB, Nam BH, Oh do Y, Han SW, Lee S, Park IH, Lee KS, Kim JH, Kang SY, Lee MH, Park HS, Woo SY, Jung SH, Ahn JS and Im YH (2015). Quality of life (QoL) in metastatic breast cancer patients with maintenance paclitaxel plus gemcitabine (PG) chemotherapy: results from phase III, multicenter, randomized trial of maintenance chemotherapy versus observation (KCSG-BR07–02). Breast cancer research and treatment 152(1):77–85.
  23. Raudenbush S (1997). Statistical analysis and optimal design for cluster randomized trials. Psychol Methods 2:173–185.
  24. Rosner B and Glynn R (2011). Power and Sample size estimation for the clustered Wilcoxon test. Biometrics 67:646–653.
  25. Rosner B, Glynn R and Lee M (2003). Incorporation of clustering effects for the Wilcoxon rank sum test: a large-sample approach. Biometrics 59:1089–1098.
  26. Sanda MG, Dunn RL, Michalski J, Sandler HM, Northouse L, Hembroff L, Lin X, Greenfield TK, Litwin MS, Saigal CS, Mahadevan A, Klein E, Kibel A, Pisters LL, Kuban D, Kaplan I, Wood D, Ciezki J, Shah N and Wei JT (2008). Quality of life and satisfaction with outcome among prostate-cancer survivors. The New England journal of medicine 358(12):1250–1261.
  27. Shih WJ (1997). Sample Size and Power Calculations for Periodontal and Other Studies with Clustered Samples Using the Method of Generalized Estimating Equations. Biometrical Journal 39(8):899–908.
  28. Teerenstra S, Lu B, Preisser JS, van Achterberg T and Borm GF (2010). Sample size considerations for GEE analyses of three-level cluster randomized trials. Biometrics 66(4):1230–1237.
  29. Toriola AT, Liu J, Ganz PA, Colditz GA, Yang L, Izadi S, Naughton MJ, Schwartz AL and Wolin KY (2015). Effect of weight loss on bone health in overweight/obese postmenopausal breast cancer survivors. Breast cancer research and treatment 152(3):637–643.
  30. Turner RM, Prevost AT and Thompson SG (2004). Allowing for imprecision of the intracluster correlation coefficient in the design of cluster randomized trials. Statistics in medicine 23(8):1195–1214.
  31. Van Breukelen G and Candel M (2015). Efficient design of cluster randomized and multicentre trials with unknown intraclass correlation. Stat Methods Med Res 24(5):540–556.
  32. Van Breukelen GJ and Candel MJ (2012). Calculating sample sizes for cluster randomized trials: we can keep it simple and efficient! Journal of clinical epidemiology 65(11):1212–1218.
  33. Van Breukelen GJ, Candel MJ and Berger MP (2008). Relative efficiency of unequal cluster sizes for variance component estimation in cluster randomized and multicentre trials. Stat Methods Med Res 17(4):439–458.
  34. Van Breukelen GJP, Candel MJJM and Berger MPF (2007). Relative efficiency of unequal versus equal cluster sizes in cluster randomized and multicentre trials. Statistics in Medicine 26(13):2589–2603.
  35. Yamagata K, Makino H, Iseki K, Ito S, Kimura K, Kusano E, Shibata T, Tomita K, Narita I, Nishino T, Fujigaki Y, Mitarai T, Watanabe T, Wada T, Nakamura T and Matsuo S (2016). Effect of Behavior Modification on Outcome in Early- to Moderate-Stage Chronic Kidney Disease: A Cluster-Randomized Trial. PloS one 11(3):e0151422.
