Sample size estimation for stratified individual and cluster randomized trials with binary outcomes

Lee Kennedy-Shaffer; Michael D Hughes

doi:10.1002/sim.8492

Author manuscript; available in PMC: 2021 May 15.

Published in final edited form as: Stat Med. 2020 Jan 31;39(10):1489–1513. doi: 10.1002/sim.8492

PMCID: PMC7247053 NIHMSID: NIHMS1584367 PMID: 32003492

Abstract

Individual randomized trials (IRTs) and cluster randomized trials (CRTs) with binary outcomes arise in a variety of settings and are often analyzed by logistic regression (fitted using generalized estimating equations for CRTs). The effect of stratification on the required sample size is less well understood for trials with binary outcomes than for continuous outcomes. We propose easy-to-use methods for sample size estimation for stratified IRTs and CRTs and demonstrate the use of these methods for a tuberculosis prevention CRT currently being planned. For both IRTs and CRTs, we also identify the ratio of the sample size for a stratified trial versus a comparably-powered unstratified trial, allowing investigators to evaluate how stratification will affect the required sample size when planning a trial. For CRTs, these can be used when the investigator has estimates of the within-stratum intra-cluster correlation coefficients (ICCs) or by assuming a common within-stratum ICC. Using these methods, we describe scenarios where stratification may have a practically important impact on the required sample size. We find that in the two-stratum case, for both IRTs and for CRTs with very small cluster sizes, there are unlikely to be plausible scenarios in which an important sample size reduction is achieved when the overall probability of a subject experiencing the event of interest is low. When the probability of events is not small, or when cluster sizes are large, however, there are scenarios where practically important reductions in sample size result from stratification.

Keywords: sample size, stratification, cluster randomized trials, generalized estimating equations, intracluster correlation coefficient, design effect

1 |. INTRODUCTION

Clinical trials often have binary outcomes, such as death or disease acquisition within a specified time period, as primary endpoints. While methods to analyze binary data are well-known, the effects of stratification on power, efficiency, and sample size are more complicated for binary outcomes than for normally-distributed continuous outcomes. ^1,2 Perhaps because of this, use of stratification or covariate adjustment remains inconsistent for trials with binary outcomes.^3,4 While some studies have shown an increase in power and consequent decrease in required sample size by stratifying on a covariate predictive of the outcome, easily-used methods to quantify this reduction and properly size a stratified trial are hard to find. ^4,5 Simulation studies have shown that the specific parameters matter a great deal in this reduction, with one study finding anywhere from negligible (3%) to large (46%) reductions in sample size for a study that adjusts for covariates compared to a comparably-powered study without adjustment.³

For cluster randomized trials (CRTs), stratification is often proposed to improve balance in covariates between treatment arms or to ameliorate practical challenges of study implementation.^6,7 The effects of stratification on sample size and power, however, are less frequently discussed. Methods for determining the required sample size for a CRT often ignore stratification or consider it only with continuous outcomes or in special cases with simple design effects or assumptions about cluster sizes.^7–10

In recent years, use of generalized estimating equations (GEEs) for the analysis of CRTs with large numbers of clusters has become more common.^6,7,11,12 Robust variance estimators can be used such that the variance is consistently estimated even if the working covariance matrix is misspecified.¹³ Since they rely on asymptotic properties, GEE methods are not appropriate for trials with a small number of clusters.¹³ In particular, for binary outcomes, GEE methods with few clusters lead to anti-conservative results, inflating the Type I error.¹² This has led to suggestions that GEEs are most preferable for CRTs with a large number of relatively small clusters.^7,9,14,15 Various rules of thumb have been suggested for the minimum number of clusters required to use standard GEE methods, ranging from as few as 10 clusters⁹ to at least 40.^16–19 In this paper, we consider sample size methods suitable for CRTs with binary outcomes analyzed using GEE methods; throughout, we assume that there are enough clusters for the asymptotic properties of GEE estimators to hold approximately. While many CRTs are not large enough to be analyzed by stratified GEEs, there have recently been many examples of CRTs with reasonably large numbers of clusters, studying out-of-hospital medical interventions,^20,21 health policies,^22,23 and especially infectious disease interventions.^24–28 This work was originally motivated by the design of a tuberculosis prevention trial, described in the example in Section 4, which plans to enroll about 1600 households as clusters.²⁹

Stratification in both individual randomized trials (IRTs) and CRTs can lead to reductions in the required sample size when the stratification variable is predictive of the outcome.^1,10,30,31 In order to properly plan these trials, however, investigators need to be able to determine the sample size for the stratified trial. For CRTs, these sample size calculations must be flexible enough to incorporate varying cluster sizes and design effects suitable for the trial at hand.^7,10,32 In considering stratification, investigators must weigh any logistical challenges associated with stratification against the potential benefits of a reduced sample size or increased power. The ratio of the required sample size for a stratified trial versus that for a comparably-powered unstratified trial is useful in determining whether the benefits of stratification outweigh the potential costs. Similar metrics have been discussed in the context of IRTs,^1,3 for unequal versus equal cluster sizes in CRTs,^33,34 and in simulation studies of CRT analysis methods.³⁵ Here, we present analytic formulae for the sample size required for a stratified trial and the ratio of the sample sizes required for stratified versus unstratified IRTs and CRTs with binary outcomes in the context of stratification by a cluster-level covariate.

In Section 2, we first review a method proposed by Gail for determining the sample size required for stratified IRTs with binary outcomes.³⁶ We then present a novel expression for the ratio of the sample size required for a stratified IRT to that for a comparably-powered trial without stratification. We illustrate use of this expression to explore whether practically important reductions in sample size might be achieved by stratifying IRTs for the case of two strata. In Section 3, we develop a new approach, similar to that used for IRTs, for sample size estimation for stratified CRTs with binary outcomes by using a weighted average of within-stratum GEE estimators. We then present an expression, derived from this sizing method, for the ratio of the sample size required for a stratified CRT to that for a comparable CRT without stratification. By considering illustrative examples with commonly-used design effects, we determine settings where stratification may lead to practically important sample size reductions for CRTs and settings where stratification is unlikely to lead to such reductions and trial planning can proceed based on unstratified methods. In Section 4, we illustrate the use of these methods for a planned CRT of a prophylactic tuberculosis drug and describe the implications of these results. Finally, in Section 5, we discuss the utility of these methods as well as limitations and potential areas for further research.

2 |. STRATIFIED INDIVIDUAL RANDOMIZED TRIALS

2.1 |. Notation for IRTs

Consider first an unstratified IRT with N subjects, labeled ℓ = 1, …, N. Assume that the subjects are randomly assigned in equal numbers to either the experimental treatment arm (x_ℓ = 1) or the control arm (x_ℓ = 0). Let Y_ℓ denote the binary outcome of subject ℓ and let π₁ = E[Y_ℓ|x_ℓ = 1] and π₀ = E[Y_ℓ|x_ℓ = 0] denote the probability of subject ℓ experiencing the event of interest under treatment and control, respectively. Then the overall treatment effect might be evaluated through the log-odds ratio comparing treatment to control, given by $β = \log (\frac{π_{1} / (1 - π_{1})}{π_{0} / (1 - π_{0})}) .$

Now consider this trial with the subjects categorized into S mutually exclusive strata, labeled s = 1, …, S. Stratum s has n_s subjects, labeled i = 1, …, n_s, where $\sum_{s = 1}^{S} n_{s} = N .$ Let $f_{s} = \frac{n_{s}}{N}$ denote the proportion of subjects in stratum s. Within each stratum, half of the subjects are randomly assigned to each arm. Let π_1s = E[Y_si|x_si = 1, s] and π_0s = E[Y_si|x_si = 0, s] be the probability of a subject in stratum s experiencing the event under treatment and control, respectively. By the law of total expectation, the probability of the event for an individual, ignoring stratification, is the weighted sum of the within-stratum probabilities, with weights equal to the proportion of subjects in each stratum. So $π_{0} = \sum_{s = 1}^{S} f_{s} π_{0 s}$ and $π_{1} = \sum_{s = 1}^{S} f_{s} π_{1 s}$ . The within-stratum log-odds ratio of treatment for stratum s is given by $β_{s} = \log (\frac{π_{1 s} / (1 - π_{1 s})}{π_{0 s} / (1 - π_{0 s})}) .$

2.2 |. Sample Size Estimation for Unstratified IRTs

In an IRT without stratification, the log-odds ratio of treatment can be estimated by $\hat{β} = \log (\frac{{\hat{π}}_{1} / (1 - {\hat{π}}_{1})}{{\hat{π}}_{0} / (1 - {\hat{π}}_{0})}),$ where ${\hat{π}}_{1}$ and ${\hat{π}}_{0}$ are the observed proportions of events under treatment and control, respectively. The approximate variance of this estimator in large samples is $Var (\hat{β}) = \frac{1}{N_{I R T}} (\frac{2}{π_{0} (1 - π_{0})} + \frac{2}{π_{1} (1 - π_{1})})$ .³⁶ Hence, the sample size for an IRT without stratification, which we denote by N_IRT, required to detect β = b in a two-sided test of H₀ : β = 0 versus H _A : β = b ≠ 0, with significance level α and power 1 − γ is ³⁶:

N_{I R T} = \frac{{(Z_{α / 2} + Z_{γ})}^{2}}{b^{2}} (\frac{2}{π_{0} (1 - π_{0})} + \frac{2}{π_{1} (1 - π_{1})}),

(1)

where Z_α/2 and Z_γ are the standard normal distribution critical values for upper tail probabilities of α/2 and γ, respectively.
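As a minimal sketch of equation (1), the calculation can be written in a few lines of Python. The function name `n_irt` and the default values of `alpha` and `power` are assumptions made for illustration; they are not taken from the authors' R code. The log-odds ratio b is computed from the two event probabilities rather than supplied separately, so that the inputs remain mutually consistent.

```python
from math import log
from statistics import NormalDist


def n_irt(pi0, pi1, alpha=0.05, power=0.80):
    """Total subjects for an unstratified IRT, following equation (1)."""
    # Hypothesized log-odds ratio b implied by the two event probabilities.
    b = log(pi1 / (1 - pi1)) - log(pi0 / (1 - pi0))
    # Z_{alpha/2} and Z_{gamma} are upper-tail critical values (gamma = 1 - power).
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    variance_factor = 2 / (pi0 * (1 - pi0)) + 2 / (pi1 * (1 - pi1))
    return z ** 2 / b ** 2 * variance_factor


# Illustrative inputs: control probability 0.50 and an odds ratio of 0.5.
print(n_irt(pi0=0.5, pi1=1 / 3))
```

In practice the result would be rounded up to an even integer so that the two arms are of equal size.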

2.3 |. Sample Size Estimation for Stratified IRTs

For stratified IRTs with binary outcomes, sample size estimation can be approached using an inverse-variance weighted estimator, as shown by Gail.³⁶ The log-odds ratio of treatment within stratum s can be estimated by ${\hat{β}}_{s} = \log (\frac{{\hat{π}}_{1 s} / (1 - {\hat{π}}_{1 s})}{{\hat{π}}_{0 s} / (1 - {\hat{π}}_{0 s})}),$ where ${\hat{π}}_{1 s}$ and ${\hat{π}}_{0 s}$ are the observed proportions of events within stratum s under treatment and control, respectively. The approximate variance of this estimator for large n_s is $Var ({\hat{β}}_{s}) = \frac{1}{n_{s}} (\frac{2}{π_{0 s} (1 - π_{0 s})} + \frac{2}{π_{1 s} (1 - π_{1 s})})$ .³⁶

We assume that the within-stratum log-odds ratio of treatment is constant across all strata, so that β_s = β* for all s. Note that when β* ≠ 0 and the stratifying variable is predictive of the outcome, β* will not equal the overall log-odds ratio, β, in the total population due to the non-collapsibility of the odds ratio.³⁷ This, along with the mean-variance relationship, distinguishes the binary outcome setting from the continuous outcome setting.^1,4

Now, the minimum variance linear unbiased estimate of β* is given by the inverse-variance weighted estimator³⁶:

{\hat{β}}^{*} = \frac{\sum_{s = 1}^{S} \frac{1}{Var ({\hat{β}}_{s})} {\hat{β}}_{s}}{\sum_{s = 1}^{S} \frac{1}{Var ({\hat{β}}_{s})}}, with large-sample variance Var ({\hat{β}}^{*}) = {(\sum_{s = 1}^{S} \frac{1}{Var ({\hat{β}}_{s})})}^{- 1} .

(2)

Hence, the sample size in a stratified IRT, which we denote by N_{I RT(S)}, required to detect β* = b* in a two-sided test of H₀ : β* = 0 versus H _A : β* = b* ≠ 0, with significance level α and power 1 − γ is³⁶:

N_{I RT (S)} = \frac{{(Z_{α / 2} + Z_{γ})}^{2}}{{(b^{*})}^{2}} {[\sum_{s = 1}^{S} f_{s} {(\frac{2}{π_{0 s} (1 - π_{0 s})} + \frac{2}{π_{1 s} (1 - π_{1 s})})}^{- 1}]}^{- 1} .

(3)
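Equation (3) can be sketched in the same way. In the Python fragment below, the names `shift_log_odds` and `n_irt_stratified` are hypothetical, and the stratum proportions and control-arm probabilities in the usage line are illustrative values only. Each π_1s is derived from π_0s and the common within-stratum log-odds ratio b*, in line with the assumption that β_s = β* for all s.

```python
from math import exp, log
from statistics import NormalDist


def shift_log_odds(p, b):
    """Probability obtained by applying a log-odds ratio b to probability p."""
    odds = exp(b) * p / (1 - p)
    return odds / (1 + odds)


def n_irt_stratified(f, pi0_s, b_star, alpha=0.05, power=0.80):
    """Total subjects for a stratified IRT, following equation (3)."""
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    weighted_precision = 0.0
    for f_s, p0 in zip(f, pi0_s):
        p1 = shift_log_odds(p0, b_star)  # pi_1s implied by pi_0s and b*
        v_s = 2 / (p0 * (1 - p0)) + 2 / (p1 * (1 - p1))
        weighted_precision += f_s / v_s  # f_s times the inverse variance factor
    return z ** 2 / b_star ** 2 / weighted_precision


# Illustrative two-stratum design with equal stratum proportions.
print(n_irt_stratified(f=[0.5, 0.5], pi0_s=[0.3, 0.7], b_star=log(0.5)))
```

With a single stratum the function reduces to equation (1), which gives a quick consistency check.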

2.4 |. Ratio of Sample Size for Comparably-Powered Stratified and Unstratified IRTs

In this section, we develop a novel expression for the ratio of the sample size required for a stratified IRT to the sample size required for a comparably-powered IRT without stratification. There is a within-stratum treatment effect if and only if there is an overall treatment effect, so β = 0 ⇔ β* = 0. Thus, the hypothesis test of H₀ : β = 0 is equivalent to a test of H₀ : β* = 0 and vice versa.¹ A test of the overall log-odds ratio powered for the alternative hypothesis H _A : β = b ≠ 0 corresponds to a test of the conditional log-odds ratio powered for the alternative hypothesis H _A : β* = b* ≠ 0 where b and b* solve:

\frac{\exp (b) π_{0}}{1 - π_{0} + \exp (b) π_{0}} = π_{1} = \sum_{s = 1}^{S} f_{s} π_{1 s} = \sum_{s = 1}^{S} \frac{f_{s} \exp (b^{*}) π_{0 s}}{1 - π_{0 s} + \exp (b^{*}) π_{0 s}} .

(4)

There is no closed-form formula for b* as a function of b.
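Because the right-hand side of equation (4) is strictly increasing in b*, the equation can be solved numerically. The following Python sketch uses plain bisection; the function names, the starting bracket, and the tolerance are assumptions chosen for illustration, and any standard one-dimensional root finder would serve equally well.

```python
from math import exp, log


def shift_log_odds(p, b):
    """Probability obtained by applying a log-odds ratio b to probability p."""
    odds = exp(b) * p / (1 - p)
    return odds / (1 + odds)


def solve_b_star(f, pi0_s, b, tol=1e-12):
    """Solve equation (4) for the within-stratum log-odds ratio b*."""
    pi0 = sum(f_s * p for f_s, p in zip(f, pi0_s))
    target = shift_log_odds(pi0, b)  # overall pi_1 implied by pi_0 and b

    def marginal_pi1(b_star):
        return sum(f_s * shift_log_odds(p, b_star) for f_s, p in zip(f, pi0_s))

    # Widen the bracket until it contains the root, then bisect.
    lo, hi = -1.0, 1.0
    while marginal_pi1(lo) > target:
        lo *= 2
    while marginal_pi1(hi) < target:
        hi *= 2
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if marginal_pi1(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Illustrative values: two equally sized strata and an overall odds ratio of 1.4.
b = log(1.4)
b_star = solve_b_star(f=[0.5, 0.5], pi0_s=[0.31, 0.69], b=b)
print(b, b_star)  # b* lies further from zero than b
```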

A stratified trial with sample size given by N_{I RT(S)} in equation (3) corresponds to an unstratified trial with sample size given by N_{I RT} in equation (1) with the same α and γ, $π_{0} = \sum_{s = 1}^{S} f_{s} π_{0 s}$ , $π_{1} = \sum_{s = 1}^{S} f_{s} π_{1 s}$ , and b and b* related by equation (4). Thus, the ratio of the sample size required for a stratified IRT to the sample size required for the comparably powered IRT without stratification, is given by:

R_{I RT} = \frac{N_{I RT (S)}}{N_{I RT}} = {(\frac{b}{b^{*}})}^{2} \frac{{(\frac{1}{π_{0} (1 - π_{0})} + \frac{1}{π_{1} (1 - π_{1})})}^{- 1}}{\sum_{s = 1}^{S} f_{s} {(\frac{1}{π_{0 s} (1 - π_{0 s})} + \frac{1}{π_{1 s} (1 - π_{1 s})})}^{- 1}} = {(\frac{b}{b^{*}})}^{2} \frac{1}{\sum_{s = 1}^{S} f_{s} \frac{V}{V_{s}}},

(5)

where $V_{s} = \frac{1}{π_{0 s} (1 - π_{0 s})} + \frac{1}{π_{1 s} (1 - π_{1 s})}$ and $V = \frac{1}{π_{0} (1 - π_{0})} + \frac{1}{π_{1} (1 - π_{1})}$ .
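A minimal Python sketch of equation (5) follows. It first solves equation (4) for b* by bisection and then forms the two factors of the ratio. All function names are hypothetical and the inputs in the usage line are illustrative; this is not the authors' Supporting Information code.

```python
from math import exp, log


def shift_log_odds(p, b):
    """Probability obtained by applying a log-odds ratio b to probability p."""
    odds = exp(b) * p / (1 - p)
    return odds / (1 + odds)


def solve_b_star(f, pi0_s, b, tol=1e-12):
    """Solve equation (4) for b* by bisection."""
    target = shift_log_odds(sum(f_s * p for f_s, p in zip(f, pi0_s)), b)

    def marginal_pi1(b_star):
        return sum(f_s * shift_log_odds(p, b_star) for f_s, p in zip(f, pi0_s))

    lo, hi = -1.0, 1.0
    while marginal_pi1(lo) > target:
        lo *= 2
    while marginal_pi1(hi) < target:
        hi *= 2
    while hi - lo > tol:
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if marginal_pi1(mid) < target else (lo, mid)
    return (lo + hi) / 2


def variance_factor(p0, p1):
    """V (or V_s) as defined after equation (5)."""
    return 1 / (p0 * (1 - p0)) + 1 / (p1 * (1 - p1))


def r_irt(f, pi0_s, b):
    """Ratio of stratified to unstratified IRT sample sizes, equation (5)."""
    b_star = solve_b_star(f, pi0_s, b)
    pi0 = sum(f_s * p for f_s, p in zip(f, pi0_s))
    v = variance_factor(pi0, shift_log_odds(pi0, b))
    denom = sum(
        f_s * v / variance_factor(p0, shift_log_odds(p0, b_star))
        for f_s, p0 in zip(f, pi0_s)
    )
    return (b / b_star) ** 2 / denom


# Illustrative two-stratum design; values below 1 favor stratification.
print(r_irt(f=[0.5, 0.5], pi0_s=[0.31, 0.69], b=log(1.4)))
```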

Although Robinson and Jewell did not formalize a comparison of sample sizes through a ratio such as R_{I RT}, they did establish results that indicate that R_{I RT} < 1 when the stratifying variable is predictive of outcome.¹ Specifically, they showed that under stratified randomization, for β ≠ 0, β will lie between 0 and β*; that is, the overall log-odds ratio β will be closer to zero than the common within-stratum log-odds ratio β*. Without loss of generality, assuming that b < 0, then b* < b < 0 and so the first term in the expression for R_{I RT} in equation (5), (b/b*)², is less than one. They also showed, however, that the variance of the stratified estimator is higher than the variance of the unstratified estimator; that is, the second term satisfies $\frac{1}{\sum_{s = 1}^{S} f_{s} \frac{V}{V_{s}}} > 1$. Overall, they showed that the test of no treatment effect is more powerful for the stratified IRT than the unstratified IRT for a fixed sample size N, indicating that R_{I RT} < 1. This is because β* is sufficiently further from zero than β so as to overcome the increased variance.

2.5 |. Ratio of Sample Size in an IRT with Two Strata Versus an Unstratified IRT

To illustrate the effect of stratification on sample size estimation, we consider an IRT with two strata, a “high-risk” stratum and a “low-risk” stratum, versus a comparably-powered IRT without stratification, in the situation when b = log(0.5). We consider a setting with a low overall probability of events, specifically π₀ = 0.05, and a setting with a moderate probability of events, specifically π₀ = 0.50. For a hypothesized overall log-odds ratio of b = log(0.5) in the IRT without stratification, panels a and b in Figure 1 show the ratio of sample sizes, R_{I RT}, by f₁ for selected values of π₀₁ for π₀ = 0.05 and π₀ = 0.5, respectively. When the stratifying variable becomes more predictive of the outcome (i.e., when π₀₁ decreases and/or f₁ increases, and so π₀₂ must be further from π₀₁ to maintain the same overall π₀), R_{I RT} decreases, indicating a greater benefit, in terms of sample size, due to stratification. As a measure of what might be considered practically important reductions in sample size achieved by stratification, combinations of the parameters π₀₁ and f₁ with R_{I RT} ≤ 0.90 (i.e., ≥ 10% reduction) are indicated with a solid line in these two panels; combinations achieving smaller reductions are indicated with a dotted line. Panels c and d in Figure 1 then use the same solid and dotted lines to show the values of π₀₂ that correspond to combinations of π₀₁ and f₁ (to achieve the specified π₀) that do and do not give reductions in sample size of ≥ 10%. R code for all key formulae and for the results and figures presented throughout is available in the online Supporting Information. Additionally, a user-friendly RStudio Shiny web application that implements these formulae for up to three strata can be accessed at https://leekshaffer.shinyapps.io/stratcrt/.

Figure 1. — Plots of the Ratio of Sample Sizes for Stratified (Two Strata) and Unstratified Individual Randomized Trials, *R_{I RT}* (Panels a and b), and the Probability of an Event in the Control Arm of the High-Risk Stratum, π₀₂ (Panels c and d), versus the Proportion of Individuals in the Low-Risk Stratum (f₁) for Low and Moderate Overall Probability of Events (π₀ = 0.05 and π₀ = 0.50) Where $π_{0} = \sum_{s = 1}^{2} f_{s} π_{0 s}$ . π₀₁ is the Probability of an Event in the Control Arm of the Low-Risk Stratum. Note: Solid lines indicate combinations of parameters such that *R_{I RT}* ≤ 0.90; dotted lines indicate that *R_{I RT}* > 0.90.

For IRTs with π₀ = 0.05 (panels a and c), a two-category stratifying variable must be highly predictive of the outcome and f₁ reasonably large in order to yield reductions of ≥ 10% in the required sample size. For example, with π₀₁ = 0.01, R_{I RT} is approximately 0.90 when f₁ = 0.8 and thus π₀₂ = 0.21, which is likely an unrealistic scenario for the difference between π₀₁ and π₀₂. If π₀₁ = 0.03, then R_{I RT} = 0.90 requires f₁ = 0.94 and π₀₂ = 0.36, an even larger risk difference between the two strata. When π₀ = 0.5 (panels b and d), there are scenarios which might perhaps be realistic in practice for which stratification would achieve reductions in sample size of ≥ 10%. For example, with π₀₁ = 0.40 and f₁ = 0.72, R_{I RT} is approximately 0.90. This requires π₀₂ = 0.76, which may or may not be reasonable depending on the trial context. For a lower π₀₁ = 0.35, the ratio of R_{I RT} = 0.90 is achieved with f₁ = 0.55 and π₀₂ = 0.68. In a high-probability setting of π₀ = 0.90 (displayed in Figure S1), the upper bound on the high-stratum probability limits the number of settings that can be considered. A ratio of 0.90 can be achieved with f₁ = 0.53, π₀₁ = 0.825, and π₀₂ = 0.985, requiring a very high-risk stratum.
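The values of π₀₂ quoted above follow from the two-stratum constraint π₀ = f₁π₀₁ + (1 − f₁)π₀₂. The short Python sketch below rearranges that constraint for π₀₂ and evaluates it at the parameter combinations mentioned in the text; the function name is an assumption introduced for illustration.

```python
def high_risk_control_probability(pi0, pi0_1, f1):
    """Solve pi0 = f1 * pi0_1 + (1 - f1) * pi0_2 for pi0_2."""
    return (pi0 - f1 * pi0_1) / (1 - f1)


# (overall pi0, low-risk pi0_1, low-risk proportion f1) as quoted in the text.
scenarios = [
    (0.05, 0.01, 0.80),
    (0.05, 0.03, 0.94),
    (0.50, 0.40, 0.72),
    (0.50, 0.35, 0.55),
    (0.90, 0.825, 0.53),
]
for pi0, pi0_1, f1 in scenarios:
    pi0_2 = high_risk_control_probability(pi0, pi0_1, f1)
    print(f"pi0={pi0}, pi0_1={pi0_1}, f1={f1} -> pi0_2={pi0_2:.3f}")
```

A computed π₀₂ outside the interval (0, 1) would indicate that the chosen combination of π₀, π₀₁, and f₁ is not attainable.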

For a treatment that reduces the probability of events versus control (so b is negative), the quantity (b/b*) decreases further from one as the treatment effect increases (i.e., b becomes more negative), holding all other parameters constant.³⁷ Thus, R_{I RT} also decreases further from one as the treatment effect increases. That is, as the hypothesized treatment effect becomes stronger, stratification will lead to a greater reduction in required sample size. Conversely, for the case of two strata, our results for b = log(0.5) suggest that stratification when odds ratios are between 0.5 and 1.0 may not, in practice, lead to meaningful reductions in sample size.

The trends in our results accord well with the results from a simulation study conducted by Hernández, Steyerberg, and Habbema.³ As an example, for π₀ = 0.50 and b = log(1.4) = 0.34, their simulations indicated that a reduction in sample size of 13.7% (corresponding to R_{I RT} = 0.863) required f₁ = 0.50 and an odds ratio for the event with respect to the stratifying variable of 5, which corresponds to π₀₁ = 0.31 and π₀₂ = 0.69.³ Using equation (5) with the same parameters yields R_{I RT} = 0.861, indicating agreement with their simulated results. Our results agree in the other settings discussed in their paper as well, including those presented in their Tables 3 and 4.³ More generally, as we found based on our analytical expression for R_{I RT}, they found that highly predictive stratification variables were necessary for substantial reductions in sample size, and that lowering the probability of events diminished the achievable reduction in sample size.³

3 |. STRATIFIED CLUSTER RANDOMIZED TRIALS

3.1 |. Notation for CRTs

Consider now an unstratified CRT with N clusters, labeled ℓ = 1, … , N. Cluster ℓ has m_ℓ subjects labeled j = 1, … , m_ℓ. Assume that the clusters are randomly assigned in equal numbers to either the experimental treatment arm (x_ℓ = 1) or the control arm (x_ℓ = 0) and that every subject within a cluster receives the same randomized intervention. Let Y_ℓj denote the binary outcome of subject j in cluster ℓ. Let π₁ = E[Y_ℓj|x_ℓ = 1] and π₀ = E[Y_ℓj|x_ℓ = 0] denote the marginal probability of a subject experiencing the event of interest under treatment and control, respectively.

In this paper, we focus on a subject-level analysis of outcomes comparing treatment to control using a marginal logistic regression model, fitted using GEEs, to estimate the marginal log-odds ratio $β = \log (\frac{π_{1} / (1 - π_{1})}{π_{0} / (1 - π_{0})}) .$ ¹¹ We use the GEE approach because of its flexibility in specifying correlation structures, its ability to handle cases where cluster sizes vary, and its popularity as an approach to analysis in the CRT literature.^12,35

Now consider a CRT with S strata, labeled s = 1, … , S. Stratum s has n_s clusters, labeled i = 1, … , n_s, such that the total number of clusters is $N = \sum_{s = 1}^{S} n_{s}$ . Cluster i in stratum s has m_si subjects, labeled j = 1, … , m_si. Denote the mean cluster size in stratum s by ${\bar{m}}_{s} = \frac{1}{n_{s}} \sum_{i = 1}^{n_{s}} m_{s i},$ the overall mean cluster size by $\bar{m} = \frac{1}{N} \sum_{s = 1}^{S} \sum_{i = 1}^{n_{s}} m_{s i},$ and the proportion of individuals in stratum s by $f_{s} = \frac{\sum_{i = 1}^{n_{s}} m_{s i}}{\sum_{s = 1}^{S} \sum_{i = 1}^{n_{s}} m_{s i}} = \frac{n_{s} {\bar{m}}_{s}}{N \bar{m}}$ . The proportion of clusters in stratum s is given by $g_{s} = \frac{n_{s}}{N} = f_{s} \frac{\bar{m}}{{\bar{m}}_{s}}$ . In the special case where the mean cluster size does not vary across the strata, f_s = g_s for all s. As in the unstratified case, the same treatment is applied to every subject in a given cluster (this treatment is denoted x_si for cluster i in stratum s) and the binary outcome for individual j in cluster i in stratum s is denoted Y_sij. Within each stratum, half of the clusters are randomly assigned to each arm. Let π_1s = E[Y_sij|x_si = 1, s] and π_0s = E[Y_sij|x_si = 0, s] denote the within-stratum probability of a subject in stratum s experiencing the event of interest under treatment and control, respectively.

As in the IRT setting, we denote the marginal log-odds ratio comparing treatment to control within stratum s by $β_{s} = \log (\frac{π_{1 s} / (1 - π_{1 s})}{π_{0 s} / (1 - π_{0 s})})$ and assume that this is the same across all strata, i.e., β_s = β* for all s. Since these overall and stratum-specific parameters are marginal with respect to cluster membership and rely only on the mean specification and not the correlation structure, equation (4) holds as in the IRT case. Thus, when the stratifying variable is associated with the outcome and there is an effect of treatment, we will have 0 < |β| < |β*|. As before, without loss of generality, we take β* < β < 0 for our illustrative examples. Since the following formulae are symmetric with respect to swapping the treatment and control designations, and the hypothesized treatment effect appears only in a squared term, these formulae all apply equally well to the case of a treatment that increases the probability of the event, where 0 < β < β*.

3.2 |. Sample Size Estimation for Unstratified CRTs

For a general unstratified CRT, the usual approach to sample size estimation is first to determine the required sample size, N_{I RT}, for a similar IRT with the desired significance level and power, and then to obtain the sample size for the CRT, $N_{C RT} = N \bar{m}$ , by multiplying N_{I RT} by a design effect or variance inflation factor.^7,9 This factor, which we denote by $F = \frac{N_{C RT}}{N_{I RT}}$ , is generally a function of the cluster sizes and the intracluster correlation coefficient (ICC), denoted by ρ.^8,10,11 Then the sample size, in terms of total subjects, required for an unstratified CRT with design effect F, to be analyzed using GEEs, testing H₀ : β = 0 vs. H _A : β = b ≠ 0 with power 1 − γ and significance level α, is given by:

N_{C RT} = F \frac{{(Z_{α / 2} + Z_{γ})}^{2}}{b^{2}} (\frac{2}{π_{0} (1 - π_{0})} + \frac{2}{π_{1} (1 - π_{1})}) .

(6)

Pan provides formulae for the design effect F for a variety of cases for trials to be analyzed using GEEs, depending on the true and working correlation structures.¹¹ When the cluster sizes are constant (i.e., $m_{ℓ} = \bar{m}$ for all ℓ) and the true correlation structure is exchangeable, $F = 1 + (\bar{m} - 1) ρ$ , which we denote by F_A, irrespective of whether an independent or exchangeable working correlation structure is used. For the case when cluster sizes are not constant, this choice of F underestimates the required sample size in a CRT. When the true correlation structure is exchangeable and the cluster sizes are known, Pan proposes the use of the following formulae:

F = \frac{N \bar{m}}{\sum_{ℓ = 1}^{N} \frac{m_{ℓ}}{1 + (m_{ℓ} - 1) ρ}}, if an exchangeable working correlation structure is used, and

(7)

F = \frac{N \bar{m} \sum_{ℓ = 1}^{N} m_{ℓ} [1 + (m_{ℓ} - 1) ρ]}{{(\sum_{ℓ = 1}^{N} m_{ℓ})}^{2}}, if an independent working correlation structure is used .

(8)

These formulae require the full specification of the distribution of cluster sizes. Alternative estimates of the design effects for CRTs with unequal cluster size can also be used, for example, using the harmonic mean of the cluster sizes,³⁸ or finding the design effect for the corresponding trial with equal cluster sizes and multiplying by the relative efficiency of the trial with equal cluster sizes compared to that with unequal cluster sizes.^33,34 An upper bound for the sample size required can be obtained by using $F = 1 + [({CV}_{m}^{2} + 1) \bar{m} - 1] ρ$ , which we denote by F_B, where ${CV}_{m} = \frac{σ_{m}}{\bar{m}}$ and σ_m is the standard deviation of the cluster sizes.^8,10 The methods presented in Sections 3.3 and 3.4 can be used with any of these design effect or relative efficiency measures. In the examples that follow, we will use F_A and F_B to simplify the results and ease interpretability and because these are commonly used by investigators when the exact distribution of cluster sizes is unknown a priori.¹⁰
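The design effects discussed in this section can be compared side by side with a short Python sketch. The function names and the cluster sizes in the usage lines are hypothetical. One assumption to flag: the coefficient of variation below uses the population (divide-by-N) standard deviation, since the text does not state which convention is intended; with that convention F_B coincides algebraically with equation (8).

```python
from statistics import mean, pstdev


def f_a(m_bar, rho):
    """F_A: constant cluster size, exchangeable true correlation."""
    return 1 + (m_bar - 1) * rho


def f_b(m_bar, cv_m, rho):
    """F_B: design effect based on the coefficient of variation of cluster size."""
    return 1 + ((cv_m ** 2 + 1) * m_bar - 1) * rho


def f_exchangeable(sizes, rho):
    """Equation (7): exchangeable working correlation, known cluster sizes."""
    # N * m_bar equals the total number of subjects, sum(sizes).
    return sum(sizes) / sum(m / (1 + (m - 1) * rho) for m in sizes)


def f_independent(sizes, rho):
    """Equation (8): independent working correlation, known cluster sizes."""
    # N * m_bar / (sum m)^2 simplifies to 1 / sum(sizes).
    return sum(m * (1 + (m - 1) * rho) for m in sizes) / sum(sizes)


# Hypothetical cluster sizes and ICC, chosen only for illustration.
sizes, rho = [2, 4, 6], 0.1
m_bar = mean(sizes)
cv_m = pstdev(sizes) / m_bar  # population SD: an assumption, see lead-in
print(f_a(m_bar, rho), f_exchangeable(sizes, rho),
      f_independent(sizes, rho), f_b(m_bar, cv_m, rho))
```

With equal cluster sizes, equations (7) and (8) both reduce to F_A; with unequal sizes, equation (7) never exceeds equation (8).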

3.3 |. Sample Size Estimation for Stratified CRTs

By way of background, we note that both Donner and Klar,⁹ and Hayes and Moulton⁷ have considered approaches for sample size estimation in stratified CRTs but not in the context of analysis using GEEs and under somewhat more restrictive assumptions than we consider. Donner and Klar provide a formula for sample size estimation for stratified analyses assuming that within each stratum there is a constant cluster size (i.e., for each s, m_si = m_s for i = 1, … , n_s), and that the ICC is constant across treatment arms and strata (i.e., ρ_1s = ρ_0s = ρ*).⁹ Hayes and Moulton provide an alternative formula that uses the coefficient of variation, $k_{s} = \frac{σ_{B 0 s}}{π_{0 s}}$ , where $σ_{B 0 s} = \sqrt{Var [π_{0 ℓ} \mid ℓ \text{ in stratum } s]}$ is the standard deviation of the cluster-specific event probabilities within stratum s and π_0s is the within-stratum probability of a subject experiencing the event under control in stratum s as defined above. They allow k_s to vary across strata but require a constant cluster size across and within strata.⁷

We propose here a more general approach for sample size determination that allows for both within-stratum ICCs and within-stratum cluster size distributions that may vary across strata. This approach is based on using a weighted average of within-stratum GEE estimators to estimate the treatment effect. It is very flexible when estimates of the within-stratum parameters are available and can be used with relatively few assumptions when such parameters are not available.

A within-stratum estimate of the log-odds ratio for treatment compared to control, ${\hat{β}}_{s}$ , is calculated for each stratum s using the observed proportions ${\hat{π}}_{1 s}$ and ${\hat{π}}_{0 s}$ . Denote by F_s the value of the design effect for stratum s. This can be given by within-stratum versions of equations (7) and (8) when exchangeable and independent working correlation structures, respectively, are used in the GEE analysis within each stratum, or by one of the simplified design effects (F_A or F_B) commonly used. The variance of ${\hat{β}}_{s}$ in large samples is then given by:

Var ({\hat{β}}_{s}) = \frac{F_{s}}{n_{s} {\bar{m}}_{s}} (\frac{2}{π_{0 s} (1 - π_{0 s})} + \frac{2}{π_{1 s} (1 - π_{1 s})}) .

(9)

Similar to the approach used by Gail, we compute an overall estimator of the common within-stratum treatment effect, ${\hat{β}}^{*}$ , as a weighted average of the individual within-stratum treatment effect estimators, ${\hat{β}}_{s}$ .³⁶ The estimator of β* with minimal variance uses weights equal to the inverse variance of the within-stratum estimators³⁹:

{\hat{β}}^{*} = \frac{\sum_{s = 1}^{S} \frac{1}{Var ({\hat{β}}_{s})} {\hat{β}}_{s}}{\sum_{s = 1}^{S} \frac{1}{Var ({\hat{β}}_{s})}}, with large-sample variance Var ({\hat{β}}^{*}) = {(\sum_{s = 1}^{S} \frac{1}{Var ({\hat{β}}_{s})})}^{- 1} .

(10)

To test the hypothesis H₀ : β* = 0 versus any alternative H _A : β* = b* ≠ 0, we perform a two-sided z-test. An α-level test with power 1 − γ to detect a hypothesized effect size b* on the log-odds ratio scale requires a sample size of:

\begin{array}{l} N_{C RT (S)} = \bar{m} \frac{{(Z_{α / 2} + Z_{γ})}^{2}}{{(b^{*})}^{2}} {(\frac{1}{N} \sum_{s = 1}^{S} \frac{1}{Var ({\hat{β}}_{s})})}^{- 1} = \bar{m} \frac{{(Z_{α / 2} + Z_{γ})}^{2}}{{(b^{*})}^{2}} {[\sum_{s = 1}^{S} \frac{n_{s} {\bar{m}}_{s}}{N F_{s}} {(\frac{2}{π_{0 s} (1 - π_{0 s})} + \frac{2}{π_{1 s} (1 - π_{1 s})})}^{- 1}]}^{- 1} \\ = \bar{m} \frac{{(Z_{α / 2} + Z_{γ})}^{2}}{{(b^{*})}^{2}} {[\sum_{s = 1}^{S} \frac{f_{s} \bar{m}}{F_{s}} {(\frac{2}{π_{0 s} (1 - π_{0 s})} + \frac{2}{π_{1 s} (1 - π_{1 s})})}^{- 1}]}^{- 1} \\ = \frac{{(Z_{α / 2} + Z_{γ})}^{2}}{{(b^{*})}^{2}} {[\sum_{s = 1}^{S} \frac{f_{s}}{F_{s}} {(\frac{2}{π_{0 s} (1 - π_{0 s})} + \frac{2}{π_{1 s} (1 - π_{1 s})})}^{- 1}]}^{- 1} . \end{array}

(11)

Note that, like N_{C RT}, N_{C RT(S)} is the total number of subjects, not clusters, required. For each s = 1, …, S, π_1s and π_0s are related by the hypothesized log-odds ratio b* via the formula $π_{1 s} = [\exp (b^{*}) π_{0 s}] / [1 - π_{0 s} + \exp (b^{*}) π_{0 s}]$ .

If both the within-stratum ICCs and the within-stratum cluster size distributions are constant over strata, then the within-stratum design effects do not vary across strata, so that F_s = F* for every stratum s, and equation (11) simplifies to:

N_{C RT (S)} = F^{*} \frac{{(Z_{α / 2} + Z_{γ})}^{2}}{{(b^{*})}^{2}} {[\sum_{s = 1}^{S} f_{s} {(\frac{2}{π_{0 s} (1 - π_{0 s})} + \frac{2}{π_{1 s} (1 - π_{1 s})})}^{- 1}]}^{- 1} = F^{*} N_{I RT (S)} .

(12)

In this special case, therefore, the sample size requirement for the stratified CRT is the common within-stratum design effect, F*, times the sample size for the stratified IRT. When cluster size distributions or within-stratum ICC values vary among the strata, however, this is no longer the case and there is no simple overall design effect for the stratified CRT compared to the stratified IRT.

3.4 |. Ratio of Sample Size for Comparably-Powered Stratified and Unstratified CRTs

3.4.1 |. General Formulae for Ratio of Sample Sizes

Let R_{C RT} = N_{C RT(S)}/N_{C RT} be the ratio of sample sizes required in comparable stratified and unstratified CRTs. To be comparable, the trials will need to detect equivalent stratified and unstratified alternative hypotheses with the same power and Type I error rate, as in the IRT case, and so they must be designed to detect the stratified log-odds ratio b* and overall log-odds ratio b that solve equation (4). As described in Sections 2.4 and 3.1, b* will thus be greater in magnitude than b. In addition, in the CRT case, {F_s : s = 1, … , S} and F must be the within-stratum and overall design effects for corresponding trials, respectively. For design effects parameterized by the ICC, this relationship is explored in Sections 3.4.2 and 3.4.3. Then using equations (11) and (6):

R_{C RT} = {(\frac{b}{b^{*}})}^{2} {\frac{{[F (\frac{1}{π_{0} (1 - π_{0})} + \frac{1}{π_{1} (1 - π_{1})})]}^{- 1}}{\sum_{s = 1}^{S} \frac{f_{s}}{F_{s}} {(\frac{1}{π_{0 s} (1 - π_{0 s})} + \frac{1}{π_{1 s} (1 - π_{1 s})})}^{- 1}}} = {(\frac{b}{b^{*}})}^{2} \frac{{(F V)}^{- 1}}{\sum_{s = 1}^{S} [f_{s} {(F_{s} V_{s})}^{- 1}]} = {(\frac{b}{b^{*}})}^{2} \frac{1}{\sum_{s = 1}^{S} f_{s} (\frac{F}{F_{s}}) (\frac{V}{V_{s}})},

(13)

where V and V_s are as defined in section 2.4.

In the special case where the within-stratum ICCs and the within-stratum cluster size distributions are constant over strata and so the within-stratum design effects are constant over strata, F_s = F* for s = 1, … , S, by using equation (5), this simplifies to:

R_{C RT} = {{(\frac{b}{b^{*}})}^{2} \frac{1}{\sum_{s = 1}^{S} f_{s} (\frac{V}{V_{s}})}} {\frac{F^{*}}{F}} = {R_{I RT}} {Q_{D E}},

(14)

where Q_DE = F*/F. Thus, for the special case when the within-stratum design effects are constant across strata, the ratio R_{C RT} is the product of two terms: R_{I RT}, the ratio of sample size with stratification to sample size without stratification for an IRT with the same effect size and event probabilities; and Q_DE, the ratio of the within-stratum design effect to the design effect without stratification, which reflects the difference between the common within-stratum ICC and the overall ICC. Note that when m_si = 1 for all clusters i in all strata s and so the CRT is in effect an IRT, then F* = F = 1 regardless of the choice of design effect. In this case, Q_DE = 1 and hence, as would be expected, R_{C RT} equals R_{I RT}.
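A short Python sketch of equation (13) may help make the factorization in equation (14) concrete. The function names are hypothetical; V and V_s are computed in the form they take in equation (13); and b and b* are supplied by the user. In practice b* should be obtained from equation (4), which is not reproduced here, so the values below are placeholders.

```python
import math


def v_term(pi0, log_or):
    """1/[pi0(1-pi0)] + 1/[pi1(1-pi1)], with pi1 implied by the log-odds ratio."""
    odds_ratio = math.exp(log_or)
    pi1 = odds_ratio * pi0 / (1.0 - pi0 + odds_ratio * pi0)
    return 1.0 / (pi0 * (1.0 - pi0)) + 1.0 / (pi1 * (1.0 - pi1))


def ratio_crt(b, b_star, f, pi0s, F, Fs):
    """R_CRT from equation (13): stratified over unstratified sample size."""
    pi0 = sum(fs * p for fs, p in zip(f, pi0s))  # marginal control probability
    V = v_term(pi0, b)
    denom = sum(
        fs * (F / Fk) * (V / v_term(p, b_star))
        for fs, Fk, p in zip(f, Fs, pi0s)
    )
    return (b / b_star) ** 2 / denom


# Placeholder inputs; b_star is illustrative, not solved from equation (4).
b, b_star = math.log(0.5), math.log(0.498)
f, pi0s = [0.5, 0.5], [0.085, 0.044]
r_irt = ratio_crt(b, b_star, f, pi0s, 1.0, [1.0, 1.0])   # all design effects 1
r_crt = ratio_crt(b, b_star, f, pi0s, 1.7, [1.4, 1.4])   # common F* = 1.4, F = 1.7
print(r_irt, r_crt, r_irt * 1.4 / 1.7)
```

When the within-stratum design effects are equal, the last two printed values agree, which is the statement R_CRT = R_IRT Q_DE.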

3.4.2 |. Relationship Between Within-Stratum ICCs and the ICC in the Overall Population

Exploring the sample size requirements of stratified versus unstratified CRTs using equations (13) and (14) requires an understanding of the relationship between the within-stratum F_s values and the overall F value. For many design effects, these will be parameterized by the within-stratum ICCs, ρ_0s for s = 1, … , S, and the overall ICC, ρ₀, respectively, and the within-stratum and overall cluster size distributions. In the ideal setting, investigators will have reasonable estimates of stratum-specific ICCs available from prior studies or feasibility studies. When this is not the case, however, under some additional assumptions about the stratum-specific ICCs, investigators can use estimates derived analytically from the overall parameters. Here we present one novel approach to finding stratum-specific ICCs that correspond to a known overall ICC.

To determine the relationship between these ICC values, we make the usual assumption that each subject within a given cluster ℓ has the same probability of experiencing the event, denoted by π_1ℓ = E[Y_ℓj|x_ℓ = 1, ℓ] and π_0ℓ = E[Y_ℓj|x_ℓ = 0, ℓ] under treatment and control, respectively. We further assume that the π_0ℓ and π_1ℓ are independent and identically distributed within each randomized arm (including that they do not vary with cluster size), with mean π₀ = E[π_0ℓ] and (between-cluster) variance $σ_{B 0}^{2} = Var [π_{0 ℓ}]$ for the control arm. Similarly, let π₁ and $σ_{B 1}^{2}$ be the mean and variance, respectively, for the treatment arm. Under these assumptions, the marginal probability of outcome among subjects in clusters assigned control is E[Y_ℓj|x_ℓ = 0] = π₀ with variance Var[Y_ℓj|x_ℓ = 0] = π₀(1 − π₀) and among those in clusters assigned treatment is E[Y_ℓj|x_ℓ = 1] = π₁ with variance Var[Y_ℓj|x_ℓ = 1] = π₁(1 − π₁). Furthermore, under these assumptions, the ICC for the control arm, ρ₀ = Cov[Y_ℓj, Y_ℓj′|x_ℓ = 0]/[π₀ (1 − π₀)] for j ≠ j′, will be non-negative and equal to $\frac{σ_{B 0}^{2}}{π_{0} (1 - π_{0})}$ .^40,41 Similarly, $ρ_{1} = \frac{σ_{B 1}^{2}}{π_{1} (1 - π_{1})}$ .

For deriving the sample size of a CRT, this model is often used assuming a true exchangeable correlation structure. A common additional assumption is that the ICC, ρ, is the same in the treatment and control arms; that is, ρ₁ = ρ₀.^10,11,42,43 We also make this assumption, but note that this implies that if π₁ differs from π₀ (i.e., if there is a treatment effect), then the between-cluster variances $σ_{B 1}^{2}$ and $σ_{B 0}^{2}$ must differ in a corresponding way to achieve ρ₁ = ρ₀.

For the stratified CRT, we define the cluster-specific probability of the event under treatment for cluster i in stratum s as π_1si = E[Y_sij|x_si = 1, s, i] and under control as π_0si = E[Y_sij|x_si = 0, s, i]. By adding a subscript s to the notation defined above for the unstratified CRT, within stratum s, the cluster-specific probabilities of outcome in the control arm are distributed with mean π_0s and variance $σ_{B 0 s}^{2}$ . The within-stratum ICC in the control arm for stratum s is then given by $ρ_{0 s} = \frac{σ_{B 0 s}^{2}}{π_{0 s} (1 - π_{0 s})}$ . Since π₀ is the marginal probability of experiencing the event, $π_{0} = \sum_{s = 1}^{S} f_{s} π_{0 s}$ by definition. We still assume that the outcome probability for each cluster is independent of cluster size. Importantly, this implies that the cluster size distribution does not vary among the strata if the stratifying variable is predictive of the outcome. Then, the between-cluster variance in the control arm ignoring strata, $σ_{B 0}^{2}$ , can be partitioned as:

\begin{array}{l} σ_{B 0}^{2} = Var [π_{0 ℓ}] = E \{Var [π_{0 ℓ} | ℓ \text{ in stratum } s]\} + Var \{E [π_{0 ℓ} | ℓ \text{ in stratum } s]\} \\ = E [σ_{B 0 s}^{2}] + Var [π_{0 s}] \\ = \sum_{s = 1}^{S} f_{s} [σ_{B 0 s}^{2} + {(π_{0 s} - π_{0})}^{2}] . \end{array}

Therefore:

ρ_{0} = \frac{σ_{B 0}^{2}}{π_{0} (1 - π_{0})} = \frac{\sum_{s = 1}^{S} f_{s} [σ_{B 0 s}^{2} + {(π_{0 s} - π_{0})}^{2}]}{π_{0} (1 - π_{0})} .

Using the fact that for each s = 1, … , S, $σ_{B 0 s}^{2} = ρ_{0 s} π_{0 s} (1 - π_{0 s})$ , the overall ICC is then given by:

ρ_{0} = \frac{\sum_{s = 1}^{S} f_{s} [ρ_{0 s} π_{0 s} (1 - π_{0 s}) + {(π_{0 s} - π_{0})}^{2}]}{π_{0} (1 - π_{0})} .

(15)

It is useful to rewrite equation (15) as:

ρ_{0} = \frac{\sum_{s = 1}^{S} ρ_{0 s} f_{s} π_{0 s} (1 - π_{0 s})}{π_{0} (1 - π_{0})} + \frac{\sum_{s = 1}^{S} f_{s} {(π_{0 s} - π_{0})}^{2}}{π_{0} (1 - π_{0})} .

(16)

The two terms on the right side of this equation show more clearly the contributions to ρ₀ of within-stratum (related to ρ_0s) and between-strata components. Note that the overall ICC, ρ₀, can be zero only when both the within-stratum ICC, ρ_0s, is zero for each stratum s and there is no variability in the within-stratum proportions, π_0s, so that π_0s = π₀ for all s = 1, … , S. Although it may not be of interest in practical settings, it is also useful to note that when ρ_0s = 1 for all s = 1, … , S, then ρ₀ = 1. This can be shown using the fact that $π_{0} = \sum_{s = 1}^{S} f_{s} π_{0 s}$ and $\sum_{s = 1}^{S} f_{s} = 1$ in equation (15).
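The decomposition in equation (16) can be checked numerically with a small Python sketch. The function name `overall_icc` and the two-stratum inputs are hypothetical and serve only to illustrate the within-stratum and between-strata contributions.

```python
def overall_icc(f, pi0s, rho0s):
    """Overall control-arm ICC from stratum-level inputs (equation 16).

    Returns (rho0, within_component, between_component).
    """
    pi0 = sum(fs * p for fs, p in zip(f, pi0s))  # marginal event probability
    total_var = pi0 * (1.0 - pi0)
    within = sum(r * fs * p * (1.0 - p) for r, fs, p in zip(rho0s, f, pi0s)) / total_var
    between = sum(fs * (p - pi0) ** 2 for fs, p in zip(f, pi0s)) / total_var
    return within + between, within, between


# Hypothetical two-stratum setting with equal stratum sizes.
f, pi0s = [0.5, 0.5], [0.02, 0.08]
print(overall_icc(f, pi0s, [0.0, 0.0]))   # only the between-strata term remains
print(overall_icc(f, pi0s, [1.0, 1.0]))   # overall ICC equals one
```

The first call shows that a predictive stratifying variable gives a positive overall ICC even when every within-stratum ICC is zero; the second reproduces the limiting case ρ₀ = 1 noted above.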

If the overall ICC, ρ₀, is given, and equal within-stratum ICCs are assumed, so that we can let $ρ_{0 s} = ρ_{0}^{*}$ for s = 1, … , S, then equation (15) can be re-arranged to give:

ρ_{0}^{*} = \frac{ρ_{0} π_{0} (1 - π_{0}) - \sum_{s = 1}^{S} f_{s} {(π_{0 s} - π_{0})}^{2}}{\sum_{s = 1}^{S} f_{s} π_{0 s} (1 - π_{0 s})} .

(17)

Based on this expression, some useful observations can be made. First, and somewhat obviously, if there is no variation in the within-stratum event probabilities (i.e., π_0s = π₀ for all s = 1, … , S), then $ρ_{0}^{*} = ρ_{0}$ . That is, when the stratifying variable is not predictive of the outcome, the overall and within-stratum ICCs are identical. Second, for given f_s and π_0s for s = 1, … , S, and hence given π₀, it can be seen that $ρ_{0}^{*}$ is an increasing function of ρ₀. Third, for $ρ_{0}^{*} \in [0, 1]$ , which is true for the definition of the ICC we are using, it can be shown that $ρ_{0}^{*} \leq ρ_{0}$ (see Lemma 1 in Appendix A). Fourth, since $ρ_{0}^{*}$ must be non-negative, for a given ρ₀ and π₀, there is a constraint on possible values for f_s and π_0s, for s = 1, … , S, such that the numerator of this expression is non-negative. Fifth, as shown in Lemma 2 in Appendix A, the assumption that $ρ_{0 s} = ρ_{0}^{*}$ for all s results in sample size estimates that are approximately conservative for small treatment effects (b* ≈ 0) and/or low-probability events (π_0s ≈ 0), under both of the simplified design effects F_A and F_B.

3.4.3 |. Bounds on the Ratio of Sample Sizes Under Simplifying Assumptions

When, as is common in practice, simplified design effects are used for sample size determination for CRTs,^8,10 those design effects can be used to calculate the ratio of sample sizes for a stratified CRT and corresponding unstratified CRT, R_{C RT}. For the design effect denoted F_A (see section 3.2), assuming that the design effect is constant across strata, i.e., $F_{s} = F_{A}^{*} = 1 + (\bar{m} - 1) ρ_{0}^{*}$ for all s, equation (14) gives:

R_{C RT} = {R_{I RT}} {Q_{D E}} = {R_{I RT}} \frac{1 + (\bar{m} - 1) ρ_{0}^{*}}{1 + (\bar{m} - 1) ρ_{0}} .

(18)

Using $F_{s} = F_{B}^{*} = 1 + [({CV}_{m}^{2} + 1) \bar{m} - 1] ρ_{0}^{*}$ for all s instead gives the same result but with $\bar{m}$ replaced by $({CV}_{m}^{2} + 1) \bar{m}$ .

This equation yields a key finding: upper and lower bounds for R_{C RT}. We begin with the upper bound. Because the relationship between $ρ_{0}^{*}$ and ρ₀ is fixed by equation (17), the ICC definition we are using and Lemma 1 in Appendix A together give $0 \leq ρ_{0}^{*} \leq ρ_{0} \leq 1$ . Thus, 0 ≤ Q_DE ≤ 1 since $\bar{m} \geq 1$ . So, under the assumption of a common within-stratum design effect given by $F_{A}^{*}$ or $F_{B}^{*}$ , 0 ≤ R_{C RT} ≤ R_{I RT} < 1.

Now turning to a lower bound on R_{C RT}, we note that $ρ_{0}^{*}$ is an increasing function of ρ₀ and, by definition, $ρ_{0}^{*} \geq 0$ . Thus, for any combination of other parameter values, there is a lower bound on ρ₀, denoted ρ_0,LB, at which $ρ_{0}^{*} = 0$ . This value, which satisfies ρ_0,LB > 0 when the stratifying variable is predictive of the outcome, can be derived for any setting using equation (16) with ρ_0s = 0 for all s. Since Q_DE is an increasing function of ρ₀ (see Lemma 3 in Appendix A), this gives a lower bound for Q_DE. That is, $1 \geq Q_{DE} \geq Q_{DE, LB} = \frac{1}{F_{LB}}$ , where F_LB is the value of F_A or F_B obtained using ρ₀ = ρ_0,LB. This lower bound depends only on the usual design effect comparing an unstratified CRT to an unstratified IRT. Overall, then, using F_A or F_B and assuming a common within-stratum ICC and within-stratum cluster size distribution, 0 < R_{I RT}/F_LB ≤ R_{C RT} ≤ R_{I RT}.
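The lower bound can be illustrated with a brief Python sketch under the simplified design effect F_A. The function names and the example inputs (two strata, mean cluster size 8) are hypothetical; ρ_0,LB is obtained by setting every within-stratum ICC to zero in equation (16).

```python
def rho0_lower_bound(f, pi0s):
    """Smallest admissible overall ICC: equation (16) with all rho_0s = 0."""
    pi0 = sum(fs * p for fs, p in zip(f, pi0s))
    between = sum(fs * (p - pi0) ** 2 for fs, p in zip(f, pi0s))
    return between / (pi0 * (1.0 - pi0))


def q_de_lower_bound(f, pi0s, m_bar):
    """Q_DE,LB = 1 / F_LB, using F_A = 1 + (m_bar - 1) * rho_0,LB."""
    f_lb = 1.0 + (m_bar - 1.0) * rho0_lower_bound(f, pi0s)
    return 1.0 / f_lb


# Hypothetical setting: 70% low-risk (0.02) and 30% high-risk (0.12) subjects.
f, pi0s = [0.70, 0.30], [0.02, 0.12]
rho_lb = rho0_lower_bound(f, pi0s)
q_lb = q_de_lower_bound(f, pi0s, m_bar=8)
print(round(rho_lb, 4), round(q_lb, 3))
```

Any overall ICC below `rho_lb` is incompatible with these stratum probabilities, and R_CRT for this setting cannot fall below `q_lb` times R_IRT under the stated assumptions.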

3.5 |. Ratio of Sample Size in a CRT with Two Strata Versus an Unstratified CRT

3.5.1 |. Scenario with a Common Within-Stratum Design Effect

To illustrate the effects of stratification on sample size estimation for CRTs, we extend the illustrative examples of section 2.5 to CRTs, again using two strata and considering three settings with different overall probabilities of events in the control arm (π₀). As before, we present results for a hypothesized overall log-odds ratio of b = log(0.5) and assume a constant within-stratum treatment effect b* such that equation (4) holds. We assume first that the two strata have the same distribution of cluster sizes and that $ρ_{01} = ρ_{02} = ρ_{0}^{*}$ , so that they also have a common design effect given by F₁ = F₂ = F*. Given values of ρ₀, f₁ and π₀₁, $ρ_{0}^{*}$ is determined through equation (17), subject to the constraint that $ρ_{0}^{*} \geq 0$ .

Table B1 provides an example of how $ρ_{0}^{*}$ varies for ρ₀ = 0.05, ρ₀ = 0.10, and ρ₀ = 0.15 as f₁ (and hence also π₀₂) changes in the setting of a low overall probability of events, π₀ = 0.05, when π₀₁ = 0.02. Dashes are used on the right side of the table to indicate combinations of parameter values for which equation (17) would return a negative value for $ρ_{0}^{*}$ , which is inadmissible for the definition of the ICC being used. Clearly, the reduction in $ρ_{0}^{*}$ is not proportional to ρ₀; in fact, the relative reduction is smaller in magnitude for larger values of ρ₀. These results illustrate that in the setting of a constant within-stratum ICC, stratifying by a variable highly predictive of the outcome can greatly reduce the within-stratum ICC compared to the overall ICC.

TABLE B1.

Common Within-Stratum ICC, $ρ_{0}^{*}$ , for π₀ = 0.05, π₀₁ = 0.02, by f₁ and ρ₀

f₁:	0.1	0.2	0.3	0.4	0.5	0.6	0.7	0.8	0.9

π₀₂:	0.053	0.058	0.063	0.070	0.080	0.095	0.120	0.170	0.320

$ρ_{0}^{*}$ when ρ₀ = 0.05:	0.048	0.045	0.042	0.038	0.032	0.022	0.006	-	-
$ρ_{0}^{*}$ when ρ₀ = 0.10:	0.098	0.096	0.093	0.088	0.083	0.074	0.058	0.026	-
$ρ_{0}^{*}$ when ρ₀ = 0.15:	0.148	0.146	0.143	0.139	0.134	0.125	0.111	0.080	-
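The entries of Table B1 can be reproduced by solving equation (15) for a common within-stratum ICC. The following Python sketch does this for the ρ₀ = 0.10 row; the function names are hypothetical, and inadmissible (negative) solutions are returned as `None`, mirroring the dashes in the table.

```python
def common_within_icc(rho0, f, pi0s):
    """Common within-stratum ICC solving equation (15); None if inadmissible."""
    pi0 = sum(fs * p for fs, p in zip(f, pi0s))
    between = sum(fs * (p - pi0) ** 2 for fs, p in zip(f, pi0s))
    within_scale = sum(fs * p * (1.0 - p) for fs, p in zip(f, pi0s))
    rho_star = (rho0 * pi0 * (1.0 - pi0) - between) / within_scale
    return rho_star if rho_star >= 0.0 else None


def table_row(rho0, pi0=0.05, pi01=0.02):
    """Common within-stratum ICC for f1 = 0.1, ..., 0.9 in the two-stratum case."""
    row = []
    for k in range(1, 10):
        f1 = k / 10
        pi02 = (pi0 - f1 * pi01) / (1.0 - f1)  # keeps the marginal probability at pi0
        row.append(common_within_icc(rho0, [f1, 1.0 - f1], [pi01, pi02]))
    return row


row = table_row(0.10)
print([None if r is None else round(r, 3) for r in row])
```

Changing the first argument of `table_row` to 0.05 or 0.15 gives the other two rows.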


We now turn to the ratio of sample sizes for a stratified and unstratified CRT. We use here the simplified design effect $F^{*} = F_{A}^{*} = 1 + (\bar{m} - 1) ρ_{0}^{*}$ , so the assumption of a common design effect is equivalent to assuming a common mean cluster size $({\bar{m}}_{1} = {\bar{m}}_{2} = \bar{m})$ and a common within-stratum ICC $(ρ_{01} = ρ_{02} = ρ_{0}^{*})$ . For the main results of this section, we take ρ₀ = 0.10 (this choice is motivated by results from a feasibility study for a tuberculosis prevention CRT where the clusters are households, described in more detail in Section 4).

Figure 2 displays R_{C RT}, the ratio of the sample size required for a stratified CRT to the sample size required for the comparable unstratified CRT, calculated via equations (14) and (17) for various values of f₁ and $\bar{m}$ (for reference, the case of $\bar{m} = 1$ , equivalent to an IRT, is shown by the black line) for ρ₀ = 0.10 and π₀₁ = 0.02 (panel a) or π₀₁ = 0.04 (panel b). This figure uses the low-probability setting of π₀ = 0.05. To give some practical context, for each value of π₀₁, panels c and d show the corresponding values of π₀₂ by f₁ and panels e and f show the corresponding values of $ρ_{0}^{*}$ by f₁. Looking at panels a and b, it is clear that R_{C RT} decreases, and hence the relative reduction in sample size achieved with stratification increases, as f₁ increases, and that this reduction also increases as the mean cluster size increases. Comparing panels a and b, it is also clear that as π₀₁ decreases away from 0.05, R_{C RT} decreases away from a ratio of one, indicating a bigger reduction in sample size due to stratification. The difference between the line for a CRT with fixed $\bar{m}$ versus the line for $\bar{m} = 1$ (i.e., an IRT) at any given f₁ reflects the additional relative design effect due to stratification in a CRT versus an IRT, Q_DE, which is determined by the difference between the within-stratum ICC, $ρ_{0}^{*}$ , and the overall ICC, ρ₀, per equation (17). Since $ρ_{0}^{*}$ decreases as f₁ increases (and hence the separation of π₀₂ from π₀₁ also increases), Q_DE decreases and thus R_{C RT} diverges further from the ratio for an IRT, with a greater effect for larger values of $\bar{m}$ . This effect can lead to practically important reductions in sample size, particularly in CRTs with larger mean cluster sizes. 
For example, for $\bar{m} = 8$ , π₀ = 0.05, and ρ₀ = 0.10, a stratified CRT with a low-risk stratum of π₀₁ = 0.02 with 70% of the subjects and a high-risk stratum of π₀₂ = 0.12 with the remaining 30% of the subjects can have a sample size 20% lower than the comparably-powered unstratified CRT. For $\bar{m} = 32$ , a stratified CRT with a low-risk stratum of π₀₁ = 0.02 with 58% of the subjects and a high-risk stratum of π₀₂ = 0.09 with the remaining 42% of the subjects can achieve the 20% reduction in sample size as well. Q_DE depends on ρ₀ and the distribution of cluster sizes, so it is the interplay of these factors that affects R_{C RT} as a whole when the design effect is common across strata.
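To see how much of the reduction in the first example is attributable to the design-effect term, Q_DE can be computed directly. The Python sketch below assumes the simplified design effect F_A and a common within-stratum ICC obtained by solving equation (15); the function name is hypothetical.

```python
def q_de_common_icc(rho0, m_bar, f, pi0s):
    """Common within-stratum ICC and Q_DE = F_A* / F_A for given stratum inputs."""
    pi0 = sum(fs * p for fs, p in zip(f, pi0s))
    between = sum(fs * (p - pi0) ** 2 for fs, p in zip(f, pi0s))
    within_scale = sum(fs * p * (1.0 - p) for fs, p in zip(f, pi0s))
    rho_star = (rho0 * pi0 * (1.0 - pi0) - between) / within_scale
    q_de = (1.0 + (m_bar - 1.0) * rho_star) / (1.0 + (m_bar - 1.0) * rho0)
    return rho_star, q_de


# Mean cluster size 8, overall ICC 0.10, 70% of subjects at 0.02 and 30% at 0.12.
rho_star, q_de = q_de_common_icc(0.10, 8, [0.70, 0.30], [0.02, 0.12])
print(round(rho_star, 3), round(q_de, 3))
```

Here Q_DE is roughly 0.83, so if R_CRT is about 0.80 then R_IRT is near 0.97: in this setting most of the reduction comes from the lower within-stratum ICC rather than from the IRT-type gain.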

Figure 3 shows parallel results to those in Figure 2 but for the moderate probability setting of π₀ = 0.5. The left column of panels shows results for π₀₁ = 0.40 and the right column shows results for π₀₁ = 0.45. This figure shows the same patterns as in Figure 2, with R_{C RT} decreasing as $ρ_{0}^{*}$ decreases and as the difference between π₀₁ and π₀ (and hence also π₀₂) increases, indicating greater between-strata variability in the outcome. Practically important reductions in sample size from using a stratified design (e.g. a reduction of at least 10%, or R ≤ 0.9) may be more readily achieved for potentially plausible combinations of π₀₁ and f₁ (and hence π₀₂) when π₀ is greater, including at smaller mean cluster sizes (e.g. comparing findings in Figure 3 to those in Figure 2). For example, with π₀ = 0.50, $\bar{m} = 4$ , and ρ₀ = 0.10, a stratified CRT with a low-risk stratum with π₀₁ = 0.40 containing 63% of the subjects and a high-risk stratum with π₀₂ = 0.67 containing the remaining 37% of the subjects leads to a 20% reduction in sample size (i.e., R_{C RT} = 0.80) compared to the comparably-powered unstratified CRT. A similar reduction in sample size can be achieved when $\bar{m} = 8$ by a low-risk stratum of π₀₁ = 0.40 with 52% of the subjects and a high-risk stratum of π₀₂ = 0.61 with the remaining 48% of the subjects.

A similar figure for the high-proportion setting of π₀ = 0.90 is shown in Figure S2. To examine the sensitivity to ρ₀, similar figures are shown for the low-proportion setting of π₀ = 0.05 for ρ₀ = 0.05 and ρ₀ = 0.15; see Figures S3 and S4. For a lower ρ₀, holding all else fixed, the reduction in sample size due to stratification is greater. However, a low ρ₀ limits how predictive the stratifying variable can be of the outcome. As in the IRT case, for all of these settings, the effect of changing b on R_{I RT} is modest in comparison to changing other parameters, except when b is very negative (i.e., the treatment odds ratio is close to zero).

Overall, these results demonstrate that, for CRTs with low cluster sizes in the low-proportion setting, like for IRTs, stratification by a binary variable has a relatively modest effect on required sample size, unless that variable is highly predictive of the outcome. For larger cluster sizes, however, the additional effect of stratification for a CRT can lead to a substantial reduction in sample size due to stratification where an IRT would not see a substantial reduction. In moderate- or high-probability settings, even with small cluster sizes, there can be a substantial effect in sample size due to stratification when the stratifying variable is associated with the outcome. This reduction is greater than the reduction in a similar IRT.

3.5.2 |. Two Strata with Varying Within-Stratum ICC and a Common Cluster Size Distribution

We now relax the assumption of a common design effect across strata, specifically by allowing the within-stratum ICC to vary across strata but retaining the assumption that the cluster size distribution is the same in each stratum. We consider two different values of f₁: 0.50 and 0.75 (for smaller values of f₁ than 0.50, the same patterns hold as for f₁ = 0.50 but with smaller effects on R_{C RT}). We varied ρ₀₁ and calculated ρ₀₂ to ensure that ρ₀ = 0.10, using equation (15); inadmissible combinations of parameters that resulted in ρ₀₂ < 0 are not considered. We continue to assume that within each stratum s, the ICCs in the treatment and control arms are equal; that is, ρ_0s = ρ_1s. We also continue to use the simplified design effect F_A, overall and within each stratum.
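The calculation of ρ₀₂ from a chosen ρ₀₁ can be sketched by solving equation (15) for the second stratum. This is an illustrative Python function with a hypothetical name; combinations giving ρ₀₂ < 0 are returned as `None` to reflect the inadmissible cases.

```python
def rho02_given_rho01(rho0, rho01, f1, pi01, pi0):
    """Solve equation (15) for the stratum-2 ICC in the two-stratum case."""
    f2 = 1.0 - f1
    pi02 = (pi0 - f1 * pi01) / f2                      # fixes the marginal probability
    between = f1 * (pi01 - pi0) ** 2 + f2 * (pi02 - pi0) ** 2
    remaining = rho0 * pi0 * (1.0 - pi0) - between - f1 * rho01 * pi01 * (1.0 - pi01)
    rho02 = remaining / (f2 * pi02 * (1.0 - pi02))
    return rho02 if rho02 >= 0.0 else None


# Low-probability setting with an overall ICC of 0.10 and equal stratum sizes.
for rho01 in (0.0, 0.05, 0.10):
    print(rho01, rho02_given_rho01(0.10, rho01, f1=0.50, pi01=0.02, pi0=0.05))
```

As ρ₀₁ increases, the implied ρ₀₂ decreases, and the two coincide at the common within-stratum ICC.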

Focusing first on the low proportion setting, π₀ = 0.05, panels a and b of Figure 4 show the association between R_{C RT} and ρ₀₁ obtained using equation (13) for f₁ = 0.5 and f₁ = 0.75, respectively. These are plotted for π₀₁ = 0.02 (dotted lines) and π₀₁ = 0.04 (dashed lines) when $\bar{m} = 2$ (orange lines) and $\bar{m} = 8$ (blue lines); note that $\bar{m} = {\bar{m}}_{1} = {\bar{m}}_{2}$ . Panels c and d show how ρ₀₂ varies as ρ₀₁ changes for f₁ = 0.50 and f₁ = 0.75, respectively, with different lines within each panel for π₀₁ = 0.02 or 0.04. In each of the four panels, the square indicates the situation considered in previous sections in which $ρ_{01} = ρ_{02} = ρ_{0}^{*}$ , giving a constant within-stratum design effect.

In panels a and b, when $\bar{m} = 2$ (orange lines), the change in R_{C RT} compared to assuming a common within-stratum ICC across strata, $ρ_{0}^{*}$ , is minimal. For example, for π₀₁ = 0.02 and either of the f₁ values considered, the difference between the highest and lowest values of R_{C RT} among all of the possible ρ₀₁ values is 0.006. When $\bar{m} = 8$ (blue lines in panels a and b), however, the varying within-stratum ICC values can have a big effect on R_{C RT}. For example, for π₀₁ = 0.04, R_{C RT} varies by more than 0.10 between the minimum and maximum values as ρ₀₁ varies when f₁ = 0.50 and by almost 0.20 when f₁ = 0.75. Review of panels a and d shows that when $ρ_{01} < ρ_{0}^{*}$ , the lowest value of R_{C RT} occurs when ρ₀₁ = 0. Conversely, when $ρ_{01} > ρ_{0}^{*}$ , the lowest value of R_{C RT} occurs when ρ₀₂ = 0. The fact that R_{C RT} varies more for values of π₀₁ closer to π₀ (and hence also π₀₂ closer to π₀) reflects the larger possible difference between ρ₀₁ and ρ₀₂ when one of these within-stratum ICCs equals zero. This in turn affects the relative magnitude between the two strata of the within-stratum design effects. Moreover, when $\bar{m}$ is larger, this magnitude is increased and so the effect on R_{C RT} is larger; this is true for larger $\bar{m}$ values than those shown in this plot as well. A general conclusion in the two-strata situation is that stratification will reduce sample size requirements more the greater the difference between the within-stratum ICCs and the effect will be larger for greater mean cluster sizes. Of course, the practical relevance of this variation in R_{C RT} will depend on how different ρ₀₁ and ρ₀₂ might be in practice.

Figure 5 displays the same relationships as Figure 4 but for π₀ = 0.50 and considering π₀₁ = 0.40 (dotted lines) and π₀₁ = 0.45 (dashed lines). The general pattern of effects on R_{C RT} is similar to that for π₀ = 0.05, but with lower values of R_{C RT} and greater effects of varying within-stratum ICC values on the required sample sizes. Note that for π₀ = 0.50, the combination of π₀₁ = 0.40 and f₁ = 0.75 is inadmissible, so it is not shown on the plot.

While varying within-stratum ICC values can have a substantial effect on sample size, it may be difficult for trial planners to obtain reliable estimates of within-stratum ICCs. In the contexts considered here, R_{C RT} is closest to 1 at or near the point where $ρ_{01} = ρ_{02} = ρ_{0}^{*}$ , so assuming a common within-stratum ICC across strata is approximately conservative.

3.5.3 |. More than Two Strata

With more than two strata, further reductions in sample size may be possible over and above that achieved with two strata. Take as an example one of the settings discussed in Section 3.5.1. For $\bar{m} = 8$ , π₀ = 0.05, and ρ₀ = 0.10, assuming a common within-stratum design effect, a hypothesized treatment effect of b = log(0.5) and a common within-stratum treatment effect of b* such that equation (4) holds, stratification into a low-risk stratum with π₀₁ = 0.02 containing 70% of the subjects and a high-risk stratum with π₀₂ = 0.12 containing 30% of the subjects leads to a 20% reduction in sample size. Now assume that the high probability stratum can be further divided into two strata, giving three strata overall with π₀₁ = 0.02 containing 70% of subjects, π₀₂ = 0.08 containing 15% of subjects, and π₀₃ = 0.16 containing 15% of subjects. Then the reduction is 25%, so there is a noticeable additional reduction in sample size beyond the two-strata case. In contrast, if it was the low probability stratum of the two strata that could be divided, then using three strata rather than two strata may not achieve much further reduction in sample size. For example, with π₀₁ = 0.01 containing 35% of subjects, π₀₂ = 0.03 containing 35% of subjects, and π₀₃ = 0.12 containing 30% of subjects, the reduction in sample size is 21%, so only marginally better than that achieved with two strata.

As there are more potential parameters for more than two strata, the effect of stratification will be very trial-dependent. Investigators can use the formulae presented here to determine sample size requirements for trials with an arbitrary number of strata, but future work may be useful in providing general guidance on the effect of stratification on sample size for common scenarios with more than two strata. Note also that these methods assume that there are enough clusters for the asymptotic distribution of the GEE estimator to hold within each stratum, which may be less likely to be true with many strata.

3.5.4 |. Using Other Design Effects

Using a design effect different from the one we have been using for illustration (i.e. $F_{A, s} = 1 + [{\bar{m}}_{s} - 1] ρ_{0 s}$ ) changes the specific value of R_{C RT} obtained and of the sample sizes more generally, but likely does not change the general pattern of associations between the ratio and the various input parameters. For example, using the conservative design effect that requires specification of the mean cluster size and coefficient of variation of cluster size within each stratum, $F_{B, s} = 1 + [({CV}_{m s}^{2} + 1) {\bar{m}}_{s} - 1] ρ_{0 s}$ , essentially inflates the ${\bar{m}}_{s}$ value by a factor of $({CV}_{m s}^{2} + 1)$ in each stratum s compared to the simplified design effect. This will increase the impact on R_{C RT} of mean cluster size and of differences between strata in ρ_0s relative to our illustrative results. For more complex design effects, like those in equations (7) and (8), which require the specification of the full cluster size distribution in each stratum, estimating whether R_{C RT} will be larger or smaller than that seen in these examples is more complicated. Again, however, the associations between the parameter values (i.e., ρ₀, π₀₁, f₁) and R_{C RT} can still be evaluated through equation (13).

4 |. EXAMPLE: EVALUATING THE SIZE OF A TUBERCULOSIS PREVENTION TRIAL

Here we illustrate the use of these methods to determine the size of a CRT and the potential sample size reduction if stratification is used. This illustrative example is based on an evaluation performed in the context of designing the PHOENIx trial, a CRT being undertaken by the AIDS Clinical Trial Group and the International Maternal Pediatric Adolescent AIDS Clinical Trials Network. The PHOENIx trial will compare two interventions for preventing the development of tuberculosis (TB) disease among household contacts of index patients starting treatment for multidrug-resistant TB disease. Clusters are the contacts in the same household as the index patient who are considered at higher risk of themselves developing TB disease, potentially because of the exposure to the index patient. A feasibility study was conducted to inform the design of the trial.²⁹

The results of the feasibility study suggest estimated parameters for the unstratified trial of π₀ = 0.0645, $\bar{m} = 3.01$ , CV_m = 0.75, and ρ₀ = 0.0675. We use the conservative design effect $F = F_{B} = 1 + [({CV}_{m}^{2} + 1) \bar{m} - 1] ρ_{0}$ and size the trial to have significance level α = 0.05 and 90% power (γ = 0.1) to detect a treatment effect of b = log(0.5) on the log-odds ratio scale. Using equation (6) gives a required sample size (ignoring issues such as loss to follow-up) for the unstratified trial of N_{C RT} = 2604 individuals (rounding up).
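The unstratified calculation can be sketched in Python as follows. Equation (6) is not reproduced in this section, so the function below assumes it has the same form as equation (11) with a single stratum; the function name is hypothetical and this is not the authors' Supporting Information code.

```python
import math
from statistics import NormalDist


def n_crt_unstratified(pi0, b, m_bar, cv_m, rho0, alpha=0.05, gamma=0.10):
    """Total subjects for an unstratified CRT using the design effect F_B.

    Assumed form: F * (Z_{alpha/2} + Z_gamma)^2 / b^2
                  * (2/[pi0(1-pi0)] + 2/[pi1(1-pi1)]).
    """
    normal = NormalDist()
    z = normal.inv_cdf(1.0 - alpha / 2.0) + normal.inv_cdf(1.0 - gamma)
    odds_ratio = math.exp(b)
    pi1 = odds_ratio * pi0 / (1.0 - pi0 + odds_ratio * pi0)
    design_effect = 1.0 + ((cv_m ** 2 + 1.0) * m_bar - 1.0) * rho0
    variance_term = 2.0 / (pi0 * (1.0 - pi0)) + 2.0 / (pi1 * (1.0 - pi1))
    return design_effect * (z / b) ** 2 * variance_term


# Feasibility-study parameters quoted in the text.
n = n_crt_unstratified(pi0=0.0645, b=math.log(0.5), m_bar=3.01, cv_m=0.75, rho0=0.0675)
print(math.ceil(n))
```

Under these assumptions the rounded-up result should agree with the 2604 individuals reported above.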

As an example of the potential reduction in sample size due to stratification, we considered strata defined by whether a household was enrolled at a site in South Africa (where half of contacts in the feasibility study were enrolled), versus elsewhere. From the feasibility study, the estimated parameters for the South African stratum, s = 1, were: f₁ = 0.50, π₀₁ = 0.085, ${\bar{m}}_{1} = 3.4$ , CV_m1 = 0.76, and ρ₀₁ = 0.044. For the non-South African stratum, s = 2, the estimated parameters were: f₂ = 0.50, π₀₂ = 0.044 (so an event probability approximately half of that among participants in South Africa), ${\bar{m}}_{2} = 2.7$ , CV_m2 = 0.71, and ρ₀₂ = 0.109. Since the design effect parameters are different, we use equation (11), with the same α and γ as before, to size the trial. Using equation (4), we find that the corresponding stratified treatment effect is b* = log(0.498). This then gives a required sample size (again ignoring issues such as loss to follow-up) for the stratified trial of N_{C RT(S)} = 2563 individuals. Thus, in this example, stratification by whether a cluster was enrolled in South Africa or elsewhere had minimal impact on sample size requirement (an estimated 1.6% reduction).
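The stratified calculation can be sketched in the same way, applying the last line of equation (11) with stratum-specific F_B design effects. The function names are hypothetical, and b* is entered as the rounded value log(0.498) rather than being solved from equation (4). Because the inputs are quoted to limited precision, the result should be close to, but need not exactly match, the reported 2563.

```python
import math
from statistics import NormalDist


def design_effect_b(m_bar, cv_m, rho):
    """Conservative design effect F_B for one stratum."""
    return 1.0 + ((cv_m ** 2 + 1.0) * m_bar - 1.0) * rho


def variance_term(pi0, log_or):
    """2/[pi0(1-pi0)] + 2/[pi1(1-pi1)] with pi1 implied by the log-odds ratio."""
    odds_ratio = math.exp(log_or)
    pi1 = odds_ratio * pi0 / (1.0 - pi0 + odds_ratio * pi0)
    return 2.0 / (pi0 * (1.0 - pi0)) + 2.0 / (pi1 * (1.0 - pi1))


def n_crt_stratified(b_star, strata, alpha=0.05, gamma=0.10):
    """Total subjects from equation (11); strata is a list of parameter dicts."""
    normal = NormalDist()
    z = normal.inv_cdf(1.0 - alpha / 2.0) + normal.inv_cdf(1.0 - gamma)
    information = sum(
        s["f"] / design_effect_b(s["m_bar"], s["cv_m"], s["rho"])
        / variance_term(s["pi0"], b_star)
        for s in strata
    )
    return (z / b_star) ** 2 / information


# Stratum parameters quoted in the text (South Africa, elsewhere).
strata = [
    {"f": 0.50, "pi0": 0.085, "m_bar": 3.4, "cv_m": 0.76, "rho": 0.044},
    {"f": 0.50, "pi0": 0.044, "m_bar": 2.7, "cv_m": 0.71, "rho": 0.109},
]
n_strat = n_crt_stratified(math.log(0.498), strata)
print(math.ceil(n_strat), round(1.0 - n_strat / 2604, 3))
```

The second printed value is the approximate proportional reduction relative to the unstratified size of 2604.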

This example illustrates, as expected from our previous results, the limited impact of stratification on the required sample size when the event probability is low and the mean cluster size is small, even for a stratifying covariate that is a reasonably strong predictor of risk (the risk ratio for the South African stratum compared to the other stratum is slightly less than 2). Other settings will yield different results, as shown in the figures.

Investigators can use the R code provided in the online Supporting Information or, for up to three strata, the user-friendly web application at https://leekshaffer.shinyapps.io/stratcrt/ to implement these equations. These methods work well for trials with more strata, as long as the necessary assumptions are met, and for treatments that increase rather than decrease the probability of events. When stratum-specific parameters are not known a priori, a range of estimates can be used to evaluate what sample sizes might be required.

5 |. DISCUSSION

As demonstrated in Section 4, investigators can use the methods presented here, and available in R code in the online Supporting Information, to determine the appropriate size for stratified IRTs and CRTs analyzed using logistic regression (fitted using GEEs for CRTs). These are most useful when stratum-specific parameters can be estimated in advance, but can also be used when these are not well known by making certain assumptions. They are also useful as sensitivity checks to determine the required sample size under a range of different values of these parameters. These developments will enable more precise trial planning, especially for CRTs, where the effect of stratification on sample size has often been ignored.

We have also described situations where stratification will and will not have a practically important effect on the required sample size for a trial. When there is a low overall probability of events, stratification with two strata in IRTs and CRTs with small cluster sizes is unlikely to achieve substantial reductions in the required sample size. When the overall probability of events is moderate or high and the two strata are highly predictive of the outcome, stratification can have a substantial impact even for an IRT or a CRT with small cluster sizes. As the mean cluster size increases, substantial reductions in required sample size can be achieved with stratification, including in situations where there is a low overall probability of events. While the results are shown for a treatment that reduces the probability of event, the same sample size ratios hold when the control and treatment probabilities are swapped and so the alternative hypothesis is that the treatment increases the probability of the event. When the within-stratum ICCs are not assumed to be constant across strata, further reductions in sample size can arise compared to the case of a constant within-stratum ICC, but these additional reductions may be sensitive to the exact ICCs specified.

The methods and results presented here depend on the validity of assumptions needed for use of GEE methods and for sample size estimation when GEE methods are used for analysis.^10,11,13 In particular, they rely on the asymptotic properties of the GEE estimator and further, they rely on these properties holding for the within-stratum GEE estimator for each stratum. Specifically, this means that a minimum number of clusters in each stratum must be reached for the methods presented here to properly size the trial. As noted in Section 1, there are many trials for which these assumptions are reasonable, reaching the proposed number of clusters that make GEE methods reasonable for use. When the assumptions are not reasonable, alternative approaches should be considered.

More work is needed to determine how to incorporate bias-corrected variance estimation into these sample size methods and to determine the impact of applying these methods in the analysis of stratified CRTs, especially when there are more than two strata. One potential approach is a modified degrees-of-freedom adjustment, similar to that considered by Mancl and DeRouen, that adjusts the variance by N/(N − q), where q is the number of parameters in the mean model.¹⁶ Another option might be to find the approximate t-distribution for the estimator within each stratum, as proposed by Pan and Wall, and find the critical values from the linear combination of these t-distributions.^44,45 Future work may also incorporate other variance adjustments proposed for GEEs in light of their relative small-sample properties.^16,46–49

Our determination of the relationship between the within-stratum ICCs and the ICC in the overall unstratified population given by equations (15) and (17), and hence also of the ratio, R_{CRT}, of sample sizes needed with and without stratification, rests on the definition of the ICC used. This definition requires that the outcome probability not depend on the cluster size and hence that the within-stratum distribution of cluster sizes be the same across strata. Further work is needed if this is not the case. However, equation (11) can still be used to calculate the sample size needed for a stratified CRT when the within-stratum distribution of cluster sizes varies among strata if the within-stratum design effects are specified. If corresponding within-stratum and overall design effects can be determined, equation (13) can still be used to calculate the ratio of sample sizes for stratified and unstratified CRTs as well.

For stratified CRTs, the reliability of these sample size calculations is limited by the quality of the estimated parameters, including the within-stratum ICCs. These methods will therefore perform better when within-stratum ICCs and within-stratum event probabilities are available. Currently, the CONSORT statement encourages investigators to report the ICC (or another measure of the effect of clustering).³² More detailed guidelines on how to report ICCs have been suggested, including that any covariate adjustments used in the ICC estimation should be discussed.⁵⁰ However, at this time, it is still difficult to find within-stratum estimates of the effect of clustering.⁷ Investigators conducting stratified CRTs should report estimated within-stratum ICCs when possible for use in the planning of future trials. The same is true of the estimated cluster size distributions used in the design effect calculations. Frequently, the exact distribution of cluster sizes will be unavailable, and so the conservative design effect F_B should be considered as a good option for stratum-specific design effects.

Stratification in randomized trials, whether IRTs or CRTs, can serve a variety of purposes. It can be done out of logistical necessity or to ensure balance of key covariates. In some cases, a stratified trial may better answer the scientific question of interest. Thus, the sample size reductions discussed here should be only one among many factors considered in deciding whether to stratify a trial. These methods will allow sample size to inform that decision-making and allow investigators to better determine the sample size when a decision to stratify has been made.

Supplementary Material

Figure S1. Plots of the Ratio of Sample Sizes for Stratified (Two Strata) and Unstratified Individual Randomized Trials, R_{IRT} (Panel a), and the Probability of an Event in the Control Arm of the High-Risk Stratum, π₀₂ (Panel b), versus the Proportion of Individuals in the Low-Risk Stratum (f₁) for High Overall Probability of Events (π₀ = 0.90). π₀₁ is the Probability of an Event in the Control Arm of the Low-Risk Stratum.

NIHMS1584367-supplement-Figure_S1.eps (168.3KB, eps)

Figure S2. Plots of the Ratio of Sample Sizes for Stratified (Two Strata) and Unstratified Cluster Randomized Trials, R_{CRT} (Panels a and b), the Probability of an Event in the Control Arm of the High-Risk Stratum, π₀₂ (Panels c and d), and the Common Within-Stratum Intra-Cluster Correlation, $ρ_{0}^{*}$ (Panels e and f), versus the Proportion of Individuals in the Low-Risk Stratum (f₁) for Two Choices of the Probability of an Event in the Control Arm of the Low-Risk Stratum (π₀₁ = 0.80 and π₀₁ = 0.85). Plots are for an ICC in the Unstratified Analysis, ρ₀, of 0.10 and an Overall Probability of an Event in the Control Arm, π₀, of 0.90. The Design Effect Used is $F_{A} = 1 + (\bar{m} - 1) ρ_{0}$ where $\bar{m}$ is the Mean Cluster Size, Assumed to be Constant Over Strata.

NIHMS1584367-supplement-Figure_S2.eps (249.3KB, eps)

Figure S3. Plots of the Ratio of Sample Sizes for Stratified (Two Strata) and Unstratified Cluster Randomized Trials, R_{CRT} (Panels a and b), the Probability of an Event in the Control Arm of the High-Risk Stratum, π₀₂ (Panels c and d), and the Common Within-Stratum Intra-Cluster Correlation, $ρ_{0}^{*}$ (Panels e and f), versus the Proportion of Individuals in the Low-Risk Stratum (f₁) for Two Choices of the Probability of an Event in the Control Arm of the Low-Risk Stratum (π₀₁ = 0.02 and π₀₁ = 0.04). Plots are for an ICC in the Unstratified Analysis, ρ₀, of 0.05 and an Overall Probability of an Event in the Control Arm, π₀, of 0.05. The Design Effect Used is $F_{A} = 1 + (\bar{m} - 1) ρ_{0}$ where $\bar{m}$ is the Mean Cluster Size, Assumed to be Constant Over Strata.

NIHMS1584367-supplement-Figure_S3.eps (334.5KB, eps)

Figure S4. Plots of the Ratio of Sample Sizes for Stratified (Two Strata) and Unstratified Cluster Randomized Trials, R_{CRT} (Panels a and b), the Probability of an Event in the Control Arm of the High-Risk Stratum, π₀₂ (Panels c and d), and the Common Within-Stratum Intra-Cluster Correlation, $ρ_{0}^{*}$ (Panels e and f), versus the Proportion of Individuals in the Low-Risk Stratum (f₁) for Two Choices of the Probability of an Event in the Control Arm of the Low-Risk Stratum (π₀₁ = 0.02 and π₀₁ = 0.04). Plots are for an ICC in the Unstratified Analysis, ρ₀, of 0.15 and an Overall Probability of an Event in the Control Arm, π₀, of 0.05. The Design Effect Used is $F_{A} = 1 + (\bar{m} - 1) ρ_{0}$ where $\bar{m}$ is the Mean Cluster Size, Assumed to be Constant Over Strata.

NIHMS1584367-supplement-Figure_S4.eps (359.3KB, eps)

R Code. Program to reproduce figures in this article and to determine sample size for trials using these methods.

NIHMS1584367-supplement-R_Code.r (70.7KB, r)

ACKNOWLEDGEMENTS

Research reported in this publication was supported by the National Institute of Allergy and Infectious Diseases under Award Numbers 5T32AI007358-28 and 1F31AI147745 (for L.K.S.) and Grant Number UM1AI068634 (for M.D.H.). We are grateful to the PHOENIx feasibility study for providing the information needed for our illustrative example. We are grateful to the editor, associate editor, and reviewers for their helpful comments.

APPENDIX

A PROOFS OF LEMMAS

Lemma 1. The common within-stratum ICC ( $ρ_{0}^{*}$ ) is less than or equal to the overall ICC (ρ₀).

Proof. Given that $\sum_{s = 1}^{S} f_{s} = 1$ and $\sum_{s = 1}^{S} f_{s} π_{0 s} = π_{0}$ :

\begin{array}{l} \sum_{s = 1}^{S} f_{s} π_{0 s} (π_{0 s} - π_{0}) = \sum_{s = 1}^{S} f_{s} π_{0 s} (π_{0 s} - \sum_{r = 1}^{S} f_{r} π_{0 r}) \\ = \sum_{s = 1}^{S} f_{s} π_{0 s}^{2} - \sum_{s = 1}^{S} f_{s} π_{0 s} \sum_{r = 1}^{S} f_{r} π_{0 r} = \sum_{s = 1}^{S} f_{s} (1 - f_{s}) π_{0 s}^{2} - \sum_{s = 1}^{S} \sum_{\begin{array}{l} r = 1 \\ r \neq s \end{array}}^{S} f_{s} f_{r} π_{0 s} π_{0 r} \\ = \sum_{s = 1}^{S} [f_{s} (\sum_{\begin{array}{l} r = 1 \\ r \neq s \end{array}}^{S} f_{r}) π_{0 s}^{2}] - \sum_{s = 1}^{S} \sum_{\begin{array}{l} r = 1 \\ r \neq s \end{array}}^{S} f_{s} f_{r} π_{0 s} π_{0 r} = \sum_{s = 1}^{S} \sum_{\begin{array}{l} r = 1 \\ r \neq s \end{array}}^{S} (f_{s} f_{r} π_{0 s}^{2} - f_{s} f_{r} π_{0 s} π_{0 r}) \\ = \sum_{(r, s) : 1 \leq r \leq s \leq S} f_{s} f_{r} (π_{0 s}^{2} - 2 π_{0 s} π_{0 r} + π_{0 r}^{2}) = \sum_{(r, s) : 1 \leq r \leq s \leq S} f_{s} f_{r} {(π_{0 s} - π_{0 r})}^{2} \geq 0. \end{array}

Hence:

\sum_{s = 1}^{S} f_{s} π_{0 s}^{2} \geq \sum_{s = 1}^{S} f_{s} π_{0 s} π_{0} = π_{0}^{2} .

(A1)

And so:

\frac{π_{0} (1 - π_{0})}{\sum_{s = 1}^{S} f_{s} π_{0 s} (1 - π_{0 s})} = \frac{π_{0} - π_{0}^{2}}{\sum_{s = 1}^{S} f_{s} π_{0 s} - \sum_{s = 1}^{S} f_{s} π_{0 s}^{2}} = \frac{π_{0} - π_{0}^{2}}{π_{0} - \sum_{s = 1}^{S} f_{s} π_{0 s}^{2}} \geq 1.

(A2)

Now, turning to $ρ_{0}^{*}$ , using equation (16) with $ρ_{0 s} = ρ_{0}^{*}$ for s = 1, … , S:

\begin{array}{l} ρ_{0} = ρ_{0}^{*} \frac{\sum_{s = 1}^{S} f_{s} π_{0 s} (1 - π_{0 s})}{π_{0} (1 - π_{0})} + \frac{\sum_{s = 1}^{S} f_{s} {(π_{0 s} - π_{0})}^{2}}{π_{0} (1 - π_{0})} \\ = ρ_{0}^{*} \frac{\sum_{s = 1}^{S} f_{s} π_{0 s} (1 - π_{0 s})}{π_{0} (1 - π_{0})} + \frac{\sum_{s = 1}^{S} (f_{s} π_{0 s}^{2}) - 2 \sum_{s = 1}^{S} f_{s} π_{0 s} π_{0} + \sum_{s = 1}^{S} f_{s} π_{0}^{2}}{π_{0} (1 - π_{0})} \\ = ρ_{0}^{*} \frac{\sum_{s = 1}^{S} f_{s} π_{0 s} (1 - π_{0 s})}{π_{0} (1 - π_{0})} + \frac{\sum_{s = 1}^{S} (f_{s} π_{0 s}^{2}) - π_{0}^{2}}{π_{0} (1 - π_{0})} \\ = ρ_{0}^{*} \frac{\sum_{s = 1}^{S} f_{s} π_{0 s} (1 - π_{0 s})}{π_{0} (1 - π_{0})} + \frac{\sum_{s = 1}^{S} (f_{s} π_{0 s}^{2}) - π_{0}}{π_{0} (1 - π_{0})} + \frac{π_{0} - π_{0}^{2}}{π_{0} (1 - π_{0})} \\ = ρ_{0}^{*} \frac{\sum_{s = 1}^{S} f_{s} π_{0 s} (1 - π_{0 s})}{π_{0} (1 - π_{0})} + \frac{\sum_{s = 1}^{S} (f_{s} π_{0 s}^{2} - f_{s} π_{0 s})}{π_{0} (1 - π_{0})} + 1 \\ = (ρ_{0}^{*} - 1) \frac{\sum_{s = 1}^{S} f_{s} π_{0 s} (1 - π_{0 s})}{π_{0} (1 - π_{0})} + 1 \\ \geq (ρ_{0}^{*} - 1) + 1, \text{ since } ρ_{0}^{*} \in [0, 1] \text{ and } \frac{\sum_{s = 1}^{S} f_{s} π_{0 s} (1 - π_{0 s})}{π_{0} (1 - π_{0})} \leq 1 \text{ by (A2)}, \\ = ρ_{0}^{*} . \end{array}

Thus $ρ_{0}^{*} \leq ρ_{0}$ . □
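As a numerical illustration of Lemma 1, the sketch below solves the first line of the display above for $ρ_{0}^{*}$ and checks that it does not exceed ρ₀ over a few values. The function name and the example inputs are hypothetical and chosen only for illustration; this is a spot check, not a substitute for the proof.

```python
def common_within_stratum_icc(f, pi0s, rho0):
    """Solve rho0 = rho* * W / T + B / T for rho*, where
    W = sum_s f_s pi0s (1 - pi0s), B = sum_s f_s (pi0s - pi0)^2,
    and T = pi0 (1 - pi0)."""
    pi0 = sum(fs * p for fs, p in zip(f, pi0s))
    within = sum(fs * p * (1 - p) for fs, p in zip(f, pi0s))
    between = sum(fs * (p - pi0) ** 2 for fs, p in zip(f, pi0s))
    return (rho0 * pi0 * (1 - pi0) - between) / within


# Hypothetical two-stratum setting.
f = [0.5, 0.5]
pi0s = [0.085, 0.044]
checks = []
for rho0 in [0.01, 0.05, 0.10, 0.50, 1.00]:
    rho_star = common_within_stratum_icc(f, pi0s, rho0)
    checks.append((rho0, rho_star))
    print(f"rho0 = {rho0:.2f}  rho* = {rho_star:.4f}")
```

When the strata have identical event probabilities the between-stratum term vanishes and the two ICCs coincide, which is the equality case of the lemma.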

Lemma 2. Assume that F_s = c₁ + c₂ρ_0s, where c₁, c₂ are constants that do not vary with s, c₁ > 0, and c₂ > 0. Then the assumption that $ρ_{0 s} = ρ_{0}^{*}$ for all s = 1, … , S results in sample size estimates that are approximately conservative for small treatment effects and/or events with low probability, when $ρ_{0}^{*} > 0$ exists.

Proof. From the definition of V_s, and taking all parameters other than ρ₀₁, …, ρ_0S fixed, we have that:

\begin{array}{l} V_{s} = \frac{1}{π_{0 s} (1 - π_{0 s})} + {[\frac{\exp (b^{*}) π_{0 s}}{1 - π_{0 s} + \exp (b^{*}) π_{0 s}} \frac{1 - π_{0 s}}{1 - π_{0 s} + \exp (b^{*}) π_{0 s}}]}^{- 1} \\ = \frac{1}{π_{0 s} (1 - π_{0 s})} + \frac{{(1 - π_{0 s} + \exp (b^{*}) π_{0 s})}^{2}}{\exp (b^{*}) π_{0 s} (1 - π_{0 s})} \\ = \frac{1 + \exp (- b^{*}) {(1 - π_{0 s} + \exp (b^{*}) π_{0 s})}^{2}}{π_{0 s} (1 - π_{0 s})} \\ \Rightarrow V_{s} π_{0 s} (1 - π_{0 s}) = 1 + \exp (- b^{*}) {(1 - π_{0 s} + \exp (b^{*}) π_{0 s})}^{2} . \end{array}

(A3)

We know the values of ρ₀₁, … , ρ_0S must satisfy the constraint that ensures the overall ICC is ρ₀ given in equations (15) and (16). Now, we seek the values that satisfy this constraint that also maximize the required sample size N_{CRT(S)} given by equation (11):

N_{C RT (S)} = \frac{{(Z_{α / 2} + Z_{γ})}^{2}}{{(b^{*})}^{2}} {[\sum_{s = 1}^{S} \frac{f_{s}}{F_{s}} {(\frac{2}{π_{0 s} (1 - π_{0 s})} + \frac{2}{π_{1 s} (1 - π_{1 s})})}^{- 1}]}^{- 1},

where F_s denotes the design effect of stratum s, which depends on ρ_0s. Maximizing N_{CRT(S)} with respect to ρ₀₁, … , ρ_0S is equivalent to minimizing:

h (ρ_{01}, …, ρ_{0 S}) \equiv \sum_{s = 1}^{S} \frac{f_{s}}{F_{s}} {(\frac{1}{π_{0 s} (1 - π_{0 s})} + \frac{1}{π_{1 s} (1 - π_{1 s})})}^{- 1} = \sum_{s = 1}^{S} \frac{f_{s}}{F_{s}} V_{s}^{- 1} .

So we wish to minimize h(ρ₀₁, … , ρ_0S) subject to the constraint that:

0 = g (ρ_{01}, …, ρ_{0 S}) = \sum_{s = 1}^{S} ρ_{0 s} f_{s} π_{0 s} (1 - π_{0 s}) + \sum_{s = 1}^{S} f_{s} {(π_{0 s} - π_{0})}^{2} - ρ_{0} π_{0} (1 - π_{0}) .

Under the assumption that F_s = c₁ + c₂ρ_0s—which occurs with the simplified design effects F_A and F_B when the cluster size distribution does not vary with strata—for any s = 1, … , S:

\begin{array}{l} \frac{\partial}{\partial ρ_{0 s}} h = \frac{f_{s}}{V_{s}} \frac{\partial}{\partial ρ_{0 s}} F_{s}^{- 1} = - \frac{f_{s}}{V_{s} F_{s}^{2}} \frac{\partial}{\partial ρ_{0 s}} F_{s} = - \frac{c_{2} f_{s}}{V_{s} F_{s}^{2}} \text{ under the simplified design effect} . \\ \frac{\partial}{\partial ρ_{0 s}} g = f_{s} π_{0 s} (1 - π_{0 s}) . \end{array}

We now seek a Lagrange multiplier constant λ s.t. $\frac{\partial h}{\partial ρ_{0 s}} = λ \frac{\partial g}{\partial ρ_{0 s}}$ for all s = 1, … , S. So we consider the ratio for any s:

\frac{\partial h}{\partial ρ_{0 s}} / \frac{\partial g}{\partial ρ_{0 s}} = \frac{- c_{2} f_{s}}{V_{s} F_{s}^{2} f_{s} π_{0 s} (1 - π_{0 s})} = - \frac{c_{2}}{F_{s}^{2} (V_{s} π_{0 s} (1 - π_{0 s}))} = - \frac{c_{2}}{F_{s}^{2} [1 + \exp (- b^{*}) {(1 - π_{0 s} + \exp (b^{*}) π_{0 s})}^{2}]} .

(A4)

For this ratio to be constant for all s, then, requires:

F_{s}^{- 2} = c_{3} [1 + \exp (- b^{*}) {(1 - π_{0 s} + \exp (b^{*}) π_{0 s})}^{2}],

(A5)

for some constant c₃ that does not vary with s. Hence the ρ_0s values that are critical points satisfy:

\frac{1}{c_{1}^{2} + 2 c_{1} c_{2} ρ_{0 s} + c_{2}^{2} ρ_{0 s}^{2}} = F_{s}^{- 2} = c_{3} [1 + \exp (- b^{*}) {(1 - π_{0 s} + \exp (b^{*}) π_{0 s})}^{2}] .

(A6)

We can then solve this quadratic equation for ρ_0s, letting $c_{4} = c_{3}^{- 1}$ , to get:

\begin{array}{l} ρ_{0 s} = \frac{- 2 c_{1} c_{2} \pm \sqrt{4 c_{1}^{2} c_{2}^{2} - 4 c_{2}^{2} (c_{1}^{2} - c_{4} {[1 + \exp (- b^{*}) {(1 - π_{0 s} + \exp (b^{*}) π_{0 s})}^{2}]}^{- 1})}}{2 c_{2}^{2}} \\ = - \frac{c_{1}}{c_{2}} \pm \sqrt{{(\frac{c_{1}}{c_{2}})}^{2} - \frac{1}{c_{2}^{2}} (c_{1}^{2} - c_{4} {[1 + \exp (- b^{*}) {(1 - π_{0 s} + \exp (b^{*}) π_{0 s})}^{2}]}^{- 1})} \\ = \frac{1}{c_{2}} (- c_{1} \pm \sqrt{c_{4} {[1 + \exp (- b^{*}) {(1 - π_{0 s} + \exp (b^{*}) π_{0 s})}^{2}]}^{- 1}}) \end{array}

(A7)

for all s = 1, … , S. To satisfy the constraint that ρ_0s ≥ 0 for all s, we will use the greater of the two solutions, which will occur when ± is replaced by +, since the other is always negative. We discuss this constraint further below.

Now, we show that these critical point values of ρ_0s minimize h (and thus maximize N_{CRT(S)}) by performing the second-derivative test for constrained optimization using the bordered Hessian H (λ, ρ₀₁, … , ρ_0S):

H (λ, ρ_{01}, …, ρ_{0 S}) = (\begin{matrix} 0 & - f_{1} π_{01} (1 - π_{01}) & - f_{2} π_{02} (1 - π_{02}) & \dots & - f_{S} π_{0 S} (1 - π_{0 S}) \\ - f_{1} π_{01} (1 - π_{01}) & \frac{2 c_{2}^{2} f_{1}}{V_{1} F_{1}^{3}} & 0 & \dots & 0 \\ - f_{2} π_{02} (1 - π_{02}) & 0 & \frac{2 c_{2}^{2} f_{2}}{V_{2} F_{2}^{3}} & \dots & 0 \\ ⋮ & ⋮ & ⋮ & ⋱ & ⋮ \\ - f_{S} π_{0 S} (1 - π_{0 S}) & 0 & 0 & \dots & \frac{2 c_{2}^{2} f_{S}}{V_{S} F_{S}^{3}} \end{matrix}) = (\begin{matrix} 0 & B \\ B^{T} & D \end{matrix}),

(A8)

for $B \equiv - (f_{1} π_{01} (1 - π_{01}), …, f_{S} π_{0 S} (1 - π_{0 S}))$ and $D \equiv 2 c_{2}^{2} \, diag (\frac{f_{1}}{V_{1} F_{1}^{3}}, …, \frac{f_{S}}{V_{S} F_{S}^{3}})$.

For 3 ≤ j ≤ S + 1, we consider H_j, the j × j principal submatrix of H, and write it as $H_{j} = (\begin{matrix} 0 & B_{j} \\ B_{j}^{T} & D_{j} \end{matrix})$ , where B_j is the first (j − 1) elements of B and D_j is the (j − 1) × (j − 1) principal submatrix of D. Then $\det (H_{j}) = \det (D_{j}) \det (- B_{j} D_{j}^{- 1} B_{j}^{T})$ since D_j, a diagonal matrix of nonzero real numbers, is invertible.

\det (D_{j}) = 2^{j - 1} c_{2}^{2 (j - 1)} \prod_{s = 1}^{j - 1} \frac{f_{s}}{V_{s} F_{s}^{3}}, and

(A9)

\det (- B_{j} D_{j}^{- 1} B_{j}^{T}) = - B_{j} D_{j}^{- 1} B_{j}^{T} = - \sum_{s = 1}^{j - 1} \frac{f_{s}^{2} π_{0 s}^{2} {(1 - π_{0 s})}^{2} V_{s} F_{s}^{3}}{2 c_{2}^{2} f_{s}} = - \sum_{s = 1}^{j - 1} \frac{f_{s} π_{0 s}^{2} {(1 - π_{0 s})}^{2} V_{s} F_{s}^{3}}{2 c_{2}^{2}}, \text{ and thus}

(A10)

\det (H_{j}) = - 2^{j - 1} c_{2}^{2 (j - 1)} [\prod_{s = 1}^{j - 1} \frac{f_{s}}{V_{s} F_{s}^{3}}] [\sum_{s = 1}^{j - 1} \frac{f_{s} π_{0 s}^{2} {(1 - π_{0 s})}^{2} V_{s} F_{s}^{3}}{2 c_{2}^{2}}] .

(A11)

Since c₂, f_s, V_s are all positive and F_s is positive for positive values of ρ_0s, the product and the sum are both positive, so det(H_j) < 0 for all j = 3, … , S + 1. So the signs of the determinants of the principal submatrices are all equal to (−1)¹. Since there is one constraint equation g, the second derivative test gives that the set of critical values of ρ_0s derived above is a constrained local minimum of h, and thus a constrained local maximum of N_{CRT(S)}. As no other critical values satisfy the constraints, these are the constrained maximizers of N_{CRT(S)}.

Finally, we consider the limiting behavior of these ρ_0s values for small values of b* and/or π_0s. For any s = 1, … , S:

\lim_{b^{*} \to 0} ρ_{0 s} = \frac{1}{c_{2}} (- c_{1} + \sqrt{c_{4} {[1 + {(1 - π_{0 s} + π_{0 s})}^{2}]}^{- 1}}) = \frac{1}{c_{2}} (- c_{1} + \sqrt{\frac{c_{4}}{2}}) .

(A12)

\lim_{π_{0 s} \to 0} ρ_{0 s} = \frac{1}{c_{2}} (- c_{1} + \sqrt{c_{4} {[1 + \exp (- b^{*}) {(1)}^{2}]}^{- 1}}) = \frac{1}{c_{2}} (- c_{1} + \sqrt{\frac{c_{4}}{1 + \exp (- b^{*})}}) .

(A13)

Most importantly, these values do not depend on s. Thus, when the treatment effect is small and/or the stratum-specific event probabilities are all low, the conservative (N_{CRT(S)}-maximizing) values of ρ_0s, holding all other parameters constant, can be approximated by using the common within-stratum ICC, $ρ_{0}^{*}$, for all strata. This value of $ρ_{0}^{*}$ must yield the overall ICC ρ₀, as given by equation (17) in Section 3.4.2. When this $ρ_{0}^{*}$ exists and is positive, then there is a neighborhood of b* and/or π_0s around 0 where the critical ρ_0s values are positive for all s, since the critical values are continuous functions of b* and π_0s, for 0 < π_0s < 1. Hence, under these conditions, the solutions satisfy the additional constraint that ρ_0s > 0 for all s and we can in fact treat $ρ_{0}^{*}$ as an approximation of these N_{CRT(S)}-maximizing critical values. □
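A small numerical sketch may help make Lemma 2 concrete. It computes the critical within-stratum ICCs implied by (A5) together with the constraint g = 0, and compares the resulting value of h with the value obtained under a common within-stratum ICC. All function names and parameter values are hypothetical; the design effect is taken to be F_A with an assumed mean cluster size of 5, so that c₁ = 1 and c₂ = 4.

```python
from math import exp, log, sqrt


def g_term(pi0, b_star):
    """V_s * pi0 * (1 - pi0), the right-hand side of (A3)."""
    return 1 + exp(-b_star) * (1 - pi0 + exp(b_star) * pi0) ** 2


def h_value(f, pi0s, iccs, b_star, c1, c2):
    """h = sum_s f_s / (F_s * V_s) with F_s = c1 + c2 * rho_0s."""
    total = 0.0
    for fs, p, rho in zip(f, pi0s, iccs):
        v_s = g_term(p, b_star) / (p * (1 - p))
        total += fs / ((c1 + c2 * rho) * v_s)
    return total


def constraint_pieces(f, pi0s, rho0):
    """Weights w_s = f_s pi0s (1 - pi0s) and the value that
    sum_s w_s * rho_0s must take for the overall ICC to equal rho0."""
    pi0 = sum(fs * p for fs, p in zip(f, pi0s))
    weights = [fs * p * (1 - p) for fs, p in zip(f, pi0s)]
    between = sum(fs * (p - pi0) ** 2 for fs, p in zip(f, pi0s))
    return weights, rho0 * pi0 * (1 - pi0) - between


def critical_iccs(f, pi0s, rho0, b_star, c1, c2):
    """ICCs with F_s proportional to g_term^(-1/2), scaled to meet g = 0."""
    weights, target = constraint_pieces(f, pi0s, rho0)
    inv_root = [1 / sqrt(g_term(p, b_star)) for p in pi0s]
    k = (c2 * target + c1 * sum(weights)) / sum(
        w * r for w, r in zip(weights, inv_root)
    )
    return [(k * r - c1) / c2 for r in inv_root]


# Hypothetical low-event-probability setting with two equal strata.
f, pi0s = [0.5, 0.5], [0.03, 0.07]
rho0, b_star = 0.05, log(0.5)
c1, c2 = 1.0, 4.0  # F_A = 1 + (m_bar - 1) * rho with assumed m_bar = 5

weights, target = constraint_pieces(f, pi0s, rho0)
rho_common = target / sum(weights)
rho_crit = critical_iccs(f, pi0s, rho0, b_star, c1, c2)
h_common = h_value(f, pi0s, [rho_common] * len(f), b_star, c1, c2)
h_crit = h_value(f, pi0s, rho_crit, b_star, c1, c2)
print(rho_common, rho_crit, h_common / h_crit)
```

Because the required sample size is proportional to 1/h, the printed ratio h_common / h_crit is the factor by which the maximizing ICCs inflate the sample size relative to the common-ICC assumption; in low-probability settings such as this one it is expected to be very close to 1.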

Lemma 3. The design effect ratio (Q_DE) is an increasing function of the overall ICC (ρ₀) for design effects F_A and F_B.

Proof. From equation (17) and the definition of Q_DE as used in equation (18):

\begin{array}{l} Q_{DE} = \frac{1 + (\bar{m} - 1) ρ_{0}^{*}}{1 + (\bar{m} - 1) ρ_{0}} = \frac{1 + \frac{\bar{m} - 1}{\sum_{s = 1}^{S} f_{s} π_{0 s} (1 - π_{0 s})} [ρ_{0} π_{0} (1 - π_{0}) - \sum_{s = 1}^{S} f_{s} {(π_{0 s} - π_{0})}^{2}]}{1 + (\bar{m} - 1) ρ_{0}} \\ = \frac{1}{[1 + (\bar{m} - 1) ρ_{0}] [\sum_{s = 1}^{S} f_{s} π_{0 s} (1 - π_{0 s})]} \left\{ \sum_{s = 1}^{S} f_{s} π_{0 s} (1 - π_{0 s}) + (\bar{m} - 1) [ρ_{0} π_{0} (1 - π_{0}) - (\sum_{s = 1}^{S} f_{s} π_{0 s}^{2} - π_{0}^{2})] \right\} \\ \equiv A^{- 1} \left\{ \sum_{s = 1}^{S} f_{s} π_{0 s} (1 - π_{0 s}) + (\bar{m} - 1) [ρ_{0} π_{0} (1 - π_{0}) + (π_{0}^{2} - \sum_{s = 1}^{S} f_{s} π_{0 s}^{2})] \right\}, \text{ where } A = [1 + (\bar{m} - 1) ρ_{0}] [\sum_{s = 1}^{S} f_{s} π_{0 s} (1 - π_{0 s})], \\ = A^{- 1} \left\{ π_{0} - \sum_{s = 1}^{S} f_{s} π_{0 s}^{2} + ρ_{0} (\bar{m} - 1) π_{0} (1 - π_{0}) + (\bar{m} - 1) (π_{0}^{2} - \sum_{s = 1}^{S} f_{s} π_{0 s}^{2}) \right\} \\ = A^{- 1} \left\{ π_{0} - π_{0}^{2} + \bar{m} π_{0}^{2} - \bar{m} \sum_{s = 1}^{S} f_{s} π_{0 s}^{2} + ρ_{0} (\bar{m} - 1) π_{0} (1 - π_{0}) \right\} \\ = \frac{[1 + ρ_{0} (\bar{m} - 1)] π_{0} (1 - π_{0}) + \bar{m} (π_{0}^{2} - \sum_{s = 1}^{S} f_{s} π_{0 s}^{2})}{[1 + ρ_{0} (\bar{m} - 1)] [\sum_{s = 1}^{S} f_{s} π_{0 s} (1 - π_{0 s})]} \\ = \frac{π_{0} (1 - π_{0})}{\sum_{s = 1}^{S} f_{s} π_{0 s} (1 - π_{0 s})} + \frac{\bar{m}}{1 + ρ_{0} (\bar{m} - 1)} \frac{π_{0}^{2} - \sum_{s = 1}^{S} f_{s} π_{0 s}^{2}}{\sum_{s = 1}^{S} f_{s} π_{0 s} (1 - π_{0 s})} \\ \equiv K + C \frac{\bar{m}}{1 + ρ_{0} (\bar{m} - 1)}, \text{ where } K \text{ and } C \text{ do not depend on } ρ_{0} \text{ and } C \leq 0 \text{ by equation (A1) from Lemma 1}, \\ = K + C h (ρ_{0}), \text{ where } h (ρ_{0}) \text{ is a decreasing function of } ρ_{0} \text{ for } ρ_{0} \in [0, 1] . \end{array}

Thus, as ρ₀ increases on [0,1], Ch(ρ₀) increases and so Q_DE increases.

For $F_{B} = 1 + [({CV}_{m}^{2} + 1) \bar{m} - 1] ρ_{0}$, the same property holds, as can be seen by substituting $({CV}_{m}^{2} + 1) \bar{m}$ for $\bar{m}$ in the proof above.
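The monotonicity stated in Lemma 3 can be spot-checked numerically. The sketch below evaluates Q_DE for the design effect F_A over an increasing grid of ρ₀ values, using the expression for $ρ_{0}^{*}$ that appears in the first line of the proof. The function name and all parameter values are hypothetical and chosen only for illustration.

```python
def design_effect_ratio(f, pi0s, rho0, m_bar):
    """Q_DE = [1 + (m_bar - 1) rho*] / [1 + (m_bar - 1) rho0] for F_A,
    with rho* the common within-stratum ICC implied by rho0."""
    pi0 = sum(fs * p for fs, p in zip(f, pi0s))
    within = sum(fs * p * (1 - p) for fs, p in zip(f, pi0s))
    between = sum(fs * (p - pi0) ** 2 for fs, p in zip(f, pi0s))
    rho_star = (rho0 * pi0 * (1 - pi0) - between) / within
    return (1 + (m_bar - 1) * rho_star) / (1 + (m_bar - 1) * rho0)


# Hypothetical two-stratum setting with an assumed mean cluster size of 5.
f, pi0s, m_bar = [0.5, 0.5], [0.02, 0.08], 5.0
grid = [0.02, 0.05, 0.10, 0.20, 0.40, 0.80]
ratios = [design_effect_ratio(f, pi0s, rho0, m_bar) for rho0 in grid]
for rho0, q in zip(grid, ratios):
    print(f"rho0 = {rho0:.2f}  Q_DE = {q:.4f}")
```

To examine F_B instead, the same function can be called with $({CV}_{m}^{2} + 1) \bar{m}$ supplied in place of the mean cluster size, following the substitution described above.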

Footnotes

CONFLICT OF INTEREST

The authors declare no potential conflict of interests.

DATA AVAILABILITY

R code that implements the key formulae presented in this article for any given parameters and R code that generates the results and figures presented throughout the article are available in the online Supporting Information. Additionally, a user-friendly RStudio Shiny web application that implements these formulae for IRTs and CRTs for up to three strata can be accessed at https://leekshaffer.shinyapps.io/stratcrt/.

References

1. Robinson LD, Jewell NP. Some surprising results about covariate adjustment in logistic regression models. Int. Stat. Rev 1991; 59(2): 227–240.
2. Demidenko E. Sample size determination for logistic regression revisited. Stat. Med 2007; 26(18): 3385–3397.
3. Hernández AV, Steyerberg EW, Habbema J. Covariate adjustment in randomized controlled trials with dichotomous outcomes increases statistical power and reduces sample size requirements. J. Clin. Epidemiol 2004; 57(5): 454–460.
4. Kahan BC, Jairath V, Doré CJ, Morris TP. The risks and rewards of covariate adjustment in randomized trials: an assessment of 12 outcomes from 8 studies. Trials 2014; 15: 139.
5. Roozenbeek B, Maas A, Lingsma HF, et al. Baseline characteristics and statistical power in randomized controlled trials: selection, prognostic targeting, or covariate adjustment? Crit. Care Med 2009; 37(10): 2683–2690.
6. Eldridge S, Kerry S. A Practical Guide to Cluster Randomised Trials in Health Services Research. London, UK: Wiley; 2012.
7. Hayes RJ, Moulton LH. Cluster Randomised Trials. 2nd ed. Boca Raton, FL: CRC Press; 2017.
8. Eldridge SM, Ashby D, Kerry S. Sample size for cluster randomized trials: effect of coefficient of variation of cluster size and analysis method. Int. J. Epidemiol 2006; 35(5): 1292–1300.
9. Donner A, Klar N. Design and Analysis of Cluster Randomization Trials in Health Research. London, UK: Wiley; 2000.
10. Rutterford C, Copas A, Eldridge S. Methods for sample size determination in cluster randomized trials. Int. J. Epidemiol 2015; 44(3): 1051–1067.
11. Pan W. Sample size and power calculations with correlated binary data. Control. Clin. Trials 2001; 22(3): 211–227.
12. Bellamy SL, Gibberd R, Hancock L, et al. Analysis of dichotomous outcome data for community intervention studies. Stat. Methods Med. Res 2000; 9(2): 135–159.
13. Liang KY, Zeger SL. Longitudinal data analysis using generalized linear models. Biometrika 1986; 73(1): 13–22.
14. Donner A, Klar N. Statistical considerations in the design and analysis of community intervention trials. J. Clin. Epidemiol 1996; 49(4): 435–439.
15. Murray DM. Design and Analysis of Group-Randomized Trials. Oxford, UK: Oxford University Press; 1998.
16. Mancl LA, DeRouen TA. A covariance estimator for GEE with improved small-sample properties. Biometrics 2001; 57(1): 126–134.
17. Murray DM, Varnell SP, Blitstein JL. Design and analysis of group-randomized trials: a review of recent methodological developments. Am. J. Public Health 2004; 94(3): 423–432.
18. Li P, Redden DT. Small sample performance of bias-corrected sandwich estimators for cluster-randomized trials with binary outcomes. Stat. Med 2015; 34(2): 281–296.
19. Huang S, Fiero MH, Bell ML. Generalized estimating equations in cluster randomized trials with a small number of clusters: Review of practice and simulation study. Clin. Trials 2016; 13(4): 445–449.
20. Benger JR, Kirby K, Black S, et al. Effect of a strategy of a supraglottic airway device vs tracheal intubation during out-of-hospital cardiac arrest on functional outcome: the AIRWAYS-2 randomized clinical trial. JAMA 2018; 320(8): 779–791.
21. Perkins GD, Lall R, Quinn T, et al. Mechanical versus manual chest compression for out-of-hospital cardiac arrest (PARAMEDIC): a pragmatic, cluster randomised controlled trial. Lancet 2015; 385(9972): 947–955.
22. Choudhry NK, Avorn J, Glynn RJ, et al. Full coverage for preventive medications after myocardial infarction. N. Engl. J. Med 2011; 365(22): 2088–2097.
23. Engineer CY, Dale E, Agarwal A, et al. Effectiveness of a pay-for-performance intervention to improve maternal and child health services in Afghanistan: a cluster-randomized trial. Int. J. Epidemiol 2016; 45(2): 451–459.
24. Cowling BJ, Chan KH, Fang VJ, et al. Facemasks and hand hygiene to prevent influenza transmission in households: a cluster randomized trial. Ann. Intern. Med 2009; 151(7): 437–446.
25. George CM, Monira S, Sack DA, et al. Randomized controlled trial of hospital-based hygiene and water treatment intervention (CHoBI7) to reduce cholera. Emerg. Infect. Dis 2016; 22(2): 233–241.
26. Guiteras R, Levinsohn J, Mobarak AM. Encouraging sanitation investment in the developing world: a cluster-randomized trial. Science 2015; 348(6237): 903–906.
27. Lin A, Ercumen A, Benjamin-Chung J, et al. Effects of water, sanitation, handwashing, and nutritional interventions on child enteric protozoan infections in rural Bangladesh: a cluster-randomized controlled trial. Clin. Infect. Dis 2018; 67(10): 1515–1522.
28. Theiss-Nyland K, Qadri F, Colin-Jones R, et al. Assessing the impact of a vi-polysaccharide conjugate vaccine in preventing typhoid infection among Bangladeshi children: a protocol for a phase IIIb trial. Clin. Infect. Dis 2019; 68(Supplement 2): S74–S82.
29. Gupta A, Swindells S, Kim S, et al. Feasibility of Identifying Household Contacts of Rifampin- and Multidrug-Resistant Tuberculosis Cases at High Risk of Progression to Tuberculosis Disease. Clin. Infect. Dis 2019. 10.1093/cid/ciz235. Accessed Nov. 23, 2019.
30. Donner A. Sample size requirements for stratified cluster randomization designs. Stat. Med 1992; 11(6): 743–750.
31. Pocock SJ, Assmann SE, Enos LE, Kasten LE. Subgroup analysis, covariate adjustment and baseline comparisons in clinical trial reporting: current practice and problems. Stat. Med 2002; 21(19): 2917–2930.
32. Campbell MK, Piaggio G, Elbourne DR, Altman DG, CONSORT Group. Consort 2010 statement: extension to cluster randomised trials. BMJ 2012; 345: e5661.
33. van Breukelen GJP, Candel MJJM, Berger MPF. Relative efficiency of unequal versus equal cluster sizes in cluster randomized and multicentre trials. Stat. Med 2007; 26(13): 2589–2603.
34. Liu J, Colditz GA. Relative efficiency of unequal versus equal cluster sizes in cluster randomized trials using generalized estimating equation models. Biom. J 2018; 60(3): 616–638.
35. Austin PC. A comparison of the statistical power of different methods for the analysis of cluster randomization trials with binary outcomes. Stat. Med 2007; 26(19): 3550–3565.
36. Gail MH. The determination of sample sizes for trials involving several independent 2×2 tables. J. Chronic Dis 1973; 26(10): 669–673.
37. Gail MH. Adjusting for covariates that have the same distribution in exposed and unexposed cohorts. In: Moolgavkar SH, Prentice RL, eds. Modern Statistical Methods in Chronic Disease Epidemiology. New York, NY: Wiley; 1986: 3–18.
38. Hayes RJ, Bennett S. Simple sample size calculation for cluster-randomized trials. Int. J. Epidemiol 1999; 28(2): 319–326.
39. Cochran WG. The combination of estimates from different experiments. Biometrics 1954; 10(1): 101–129.
40. Commenges D, Jacqmin H. The intraclass correlation coefficient: distribution-free definition and test. Biometrics 1994; 50(2): 517–526.
41. Eldridge SM, Ukoumunne OC, Carlin JB. The intra-cluster correlation coefficient in cluster randomized trials: a review of definitions. Int. Stat. Rev 2009; 77(3): 378–394.
42. Shih WJ. Sample size and power calculations for periodontal and other studies with clustered samples using the method of generalized estimating equations. Biom. J 1997; 39(8): 899–908.
43. Thomson A, Hayes R, Cousens S. Measures of between-cluster variability in cluster randomized trials with binary outcomes. Stat. Med 2009; 28(12): 1739–1751.
44. Pan W, Wall MM. Small-sample adjustments in using the sandwich variance estimator in generalized estimating equations. Stat. Med 2002; 21(10): 1429–1441.
45. Walker GA, Saw JG. The distribution of linear combinations of t-variables. J. Am. Stat. Assoc 1978; 73(364): 876–878.
46. Kauermann G, Carroll RJ. A note on the efficiency of sandwich covariance matrix estimation. J. Am. Stat. Assoc 2001; 96(456): 1387–1396.
47. Fay MP, Graubard BI. Small-sample adjustments for Wald-type tests using sandwich estimators. Biometrics 2001; 57(4): 1198–1206.
48. Morel JG, Bokossa MC, Neerchal NK. Small sample correction for the variance of GEE estimators. Biom. J 2003; 45(4): 395–409.
49. Leyrat C, Morgan KE, Leurent B, Kahan BC. Cluster randomized trials with a small number of clusters: which analyses should be used? Int. J. Epidemiol 2018; 47(1): 321–331.
50. Campbell MK, Grimshaw JM, Elbourne DR. Intracluster correlation coefficients in cluster randomized trials: empirical insights into how should they be reported. BMC Med. Res. Methodol 2004; 4: 9.

Associated Data

Supplementary Materials

Figure S1

NIHMS1584367-supplement-Figure_S1.eps (168.3KB, eps)

Figure S2

NIHMS1584367-supplement-Figure_S2.eps (249.3KB, eps)

Figure S3

NIHMS1584367-supplement-Figure_S3.eps (334.5KB, eps)

Figure S4

NIHMS1584367-supplement-Figure_S4.eps (359.3KB, eps)

R Code

Program to reproduce figures in this article and to determine sample size for trials using these methods.

NIHMS1584367-supplement-R_Code.r (70.7KB, r)

