Abstract
Individual randomized trials (IRTs) and cluster randomized trials (CRTs) with binary outcomes arise in a variety of settings and are often analyzed by logistic regression (fitted using generalized estimating equations for CRTs). The effect of stratification on the required sample size is less well understood for trials with binary outcomes than for continuous outcomes. We propose easy-to-use methods for sample size estimation for stratified IRTs and CRTs and demonstrate the use of these methods for a tuberculosis prevention CRT currently being planned. For both IRTs and CRTs, we also identify the ratio of the sample size for a stratified trial versus a comparably-powered unstratified trial, allowing investigators to evaluate how stratification will affect the required sample size when planning a trial. For CRTs, these can be used when the investigator has estimates of the within-stratum intra-cluster correlation coefficients (ICCs) or by assuming a common within-stratum ICC. Using these methods, we describe scenarios where stratification may have a practically important impact on the required sample size. We find that in the two-stratum case, for both IRTs and for CRTs with very small cluster sizes, there are unlikely to be plausible scenarios in which an important sample size reduction is achieved when the overall probability of a subject experiencing the event of interest is low. When the probability of events is not small, or when cluster sizes are large, however, there are scenarios where practically important reductions in sample size result from stratification.
Keywords: sample size, stratification, cluster randomized trials, generalized estimating equations, intracluster correlation coefficient, design effect
1 |. INTRODUCTION
Clinical trials often have binary outcomes, such as death or disease acquisition within a specified time period, as primary endpoints. While methods to analyze binary data are well-known, the effects of stratification on power, efficiency, and sample size are more complicated for binary outcomes than for normally-distributed continuous outcomes. 1,2 Perhaps because of this, use of stratification or covariate adjustment remains inconsistent for trials with binary outcomes.3,4 While some studies have shown an increase in power and consequent decrease in required sample size by stratifying on a covariate predictive of the outcome, easily-used methods to quantify this reduction and properly size a stratified trial are hard to find. 4,5 Simulation studies have shown that the specific parameters matter a great deal in this reduction, with one study finding anywhere from negligible (3%) to large (46%) reductions in sample size for a study that adjusts for covariates compared to a comparably-powered study without adjustment.3
For cluster randomized trials (CRTs), stratification is often proposed to improve balance in covariates between treatment arms or to ameliorate practical challenges of study implementation.6,7 The effects of stratification on sample size and power, however, are less frequently discussed. Methods for determining the required sample size for a CRT often ignore stratification or consider it only with continuous outcomes or in special cases with simple design effects or assumptions about cluster sizes.7–10
In recent years, use of generalized estimating equations (GEEs) for the analysis of CRTs with large numbers of clusters has become more common.6,7,11,12 Robust variance estimators can be used such that the variance is consistently estimated even if the working covariance matrix is misspecified.13 Since they rely on asymptotic properties, GEE methods are not appropriate for trials with a small number of clusters.13 In particular, for binary outcomes, GEE methods with few clusters lead to anti-conservative results, inflating the Type I error.12 This has led to suggestions that GEEs are most preferable for CRTs with a large number of relatively small clusters.7,9,14,15 Various rules of thumb have been suggested for the minimum number of clusters required to use standard GEE methods, ranging from as few as 10 clusters9 to at least 40.16–19 In this paper, we consider sample size methods suitable for CRTs with binary outcomes analyzed using GEE methods; throughout, we assume that there are enough clusters for the asymptotic properties of GEE estimators to hold approximately. While many CRTs are not large enough to be analyzed by stratified GEEs, there have recently been many examples of CRTs with reasonably large numbers of clusters, studying out-of-hospital medical interventions,20,21 health policies,22,23 and especially infectious disease interventions.24–28 This work was originally motivated by the design of a tuberculosis prevention trial, described in the example in Section 4, which plans to enroll about 1600 households as clusters.29
Stratification in both individual randomized trials (IRTs) and CRTs can lead to reductions in the required sample size when the stratification variable is predictive of the outcome.1,10,30,31 In order to properly plan these trials, however, investigators need to be able to determine the sample size for the stratified trial. For CRTs, these sample size calculations must be flexible enough to incorporate varying cluster sizes and design effects suitable for the trial at hand.7,10,32 In considering stratification, investigators must weigh any logistical challenges associated with stratification against the potential benefits of a reduced sample size or increased power. The ratio of the required sample size for a stratified trial to that for a comparably-powered unstratified trial is useful in determining whether the benefits of stratification outweigh the potential costs. Similar metrics have been discussed in the context of IRTs,1,3 for unequal versus equal cluster sizes in CRTs,33,34 and in simulation studies of CRT analysis methods.35 Here, we present analytic formulae for the sample size required for a stratified trial and the ratio of the sample sizes required for stratified versus unstratified IRTs and CRTs with binary outcomes in the context of stratification by a cluster-level covariate.
In Section 2, we first review a method proposed by Gail for determining the sample size required for stratified IRTs with binary outcomes.36 We then present a novel expression for the ratio of the sample size required for a stratified IRT to that for a comparably powered trial without stratification. We illustrate use of this expression to explore whether practically important reductions in sample size might be achieved by stratifying IRTs for the case of two strata. In Section 3, we develop a new approach, similar to that used for IRTs, for sample size estimation for stratified CRTs with binary outcomes by using a weighted average of within-stratum GEE estimators. We then present an expression, derived from this sizing method, for the ratio of the sample size required for a stratified CRT to that for a comparable CRT without stratification. By considering illustrative examples with commonly-used design effects, we determine settings where stratification may lead to practically important sample size reductions for CRTs and settings where stratification is unlikely to lead to such reductions and trial planning can proceed based on unstratified methods. In Section 4, we illustrate the use of these methods for a planned CRT of a prophylactic tuberculosis drug and describe the implications of these results. Finally, in Section 5, we discuss the utility of these methods as well as limitations and potential areas for further research.
2 |. STRATIFIED INDIVIDUAL RANDOMIZED TRIALS
2.1 |. Notation for IRTs
Consider first an unstratified IRT with N subjects, labeled ℓ = 1, …, N. Assume that the subjects are randomly assigned in equal numbers to either the experimental treatment arm (xℓ = 1) or the control arm (xℓ = 0). Let Yℓ denote the binary outcome of subject ℓ and let π1 = E[Yℓ|xℓ = 1] and π0 = E[Yℓ|xℓ = 0] denote the probability of subject ℓ experiencing the event of interest under treatment and control, respectively. Then the overall treatment effect might be evaluated through the log-odds ratio comparing treatment to control, given by
$$\beta = \log\left(\frac{\pi_1}{1-\pi_1}\right) - \log\left(\frac{\pi_0}{1-\pi_0}\right).$$
Now consider this trial with the subjects categorized into S mutually exclusive strata, labeled s = 1, …, S. Stratum s has ns subjects, labeled i = 1, …, ns, where $\sum_{s=1}^{S} n_s = N$. Let $f_s = n_s/N$ denote the proportion of subjects in stratum s. Within each stratum, half of the subjects are randomly assigned to each arm. Let π1s = E[Ysi|xsi = 1, s] and π0s = E[Ysi|xsi = 0, s] be the probability of a subject in stratum s experiencing the event under treatment and control, respectively. By the law of total expectation, the probability of the event for an individual, ignoring stratification, is the weighted sum of the within-stratum probabilities, with weights equal to the proportion of subjects in each stratum. So $\pi_1 = \sum_{s=1}^{S} f_s \pi_{1s}$ and $\pi_0 = \sum_{s=1}^{S} f_s \pi_{0s}$. The within-stratum log-odds ratio of treatment for stratum s is given by
$$\beta_s = \log\left(\frac{\pi_{1s}}{1-\pi_{1s}}\right) - \log\left(\frac{\pi_{0s}}{1-\pi_{0s}}\right).$$
2.2 |. Sample Size Estimation for Unstratified IRTs
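In an IRT without stratification, the log-odds ratio of treatment can be estimated by $\hat{\beta} = \log[\hat{\pi}_1/(1-\hat{\pi}_1)] - \log[\hat{\pi}_0/(1-\hat{\pi}_0)]$, where $\hat{\pi}_1$ and $\hat{\pi}_0$ are the observed proportions of events under treatment and control, respectively. The approximate variance of this estimator in large samples is $\frac{2}{N}\left[\frac{1}{\pi_1(1-\pi_1)} + \frac{1}{\pi_0(1-\pi_0)}\right]$.36 Hence, the sample size for an IRT without stratification, which we denote by NIRT, required to detect β = b in a two-sided test of H0 : β = 0 versus HA : β = b ≠ 0, with significance level α and power 1 − γ is36:
$$N_{IRT} = \frac{2\,(Z_{\alpha/2} + Z_{\gamma})^2}{b^2}\left[\frac{1}{\pi_1(1-\pi_1)} + \frac{1}{\pi_0(1-\pi_0)}\right] \tag{1}$$
where Zα/2 and Zγ are the standard normal distribution critical values for upper tail probabilities of α/2 and γ, respectively.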
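As a minimal sketch of this calculation, assuming equation (1) takes the form N = 2(Zα/2 + Zγ)²[1/(π1(1 − π1)) + 1/(π0(1 − π0))]/b², the following Python function computes the total sample size from π0 and b. The function name `n_irt` and the default α = 0.05 and power 0.80 are illustrative choices, not taken from the paper or its supporting R code.

```python
from math import exp
from statistics import NormalDist


def n_irt(pi0, b, alpha=0.05, power=0.80):
    """Total subjects for an unstratified 1:1 IRT (sketch of equation (1)).

    pi0 is the control-arm event probability and b the hypothesized
    log-odds ratio (must be non-zero).
    """
    # Critical values Z_{alpha/2} and Z_gamma, where gamma = 1 - power.
    z_sum = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    # Treatment-arm probability implied by pi0 and the log-odds ratio b.
    odds1 = pi0 / (1 - pi0) * exp(b)
    pi1 = odds1 / (1 + odds1)
    v = 1 / (pi1 * (1 - pi1)) + 1 / (pi0 * (1 - pi0))
    return 2 * z_sum ** 2 * v / b ** 2
```

In practice the result would be rounded up to an even integer so that the two arms are of equal size.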
2.3 |. Sample Size Estimation for Stratified IRTs
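For stratified IRTs with binary outcomes, sample size estimation can be approached using an inverse-variance weighted estimator, as shown by Gail.36 The log-odds ratio of treatment within stratum s can be estimated by $\hat{\beta}_s = \log[\hat{\pi}_{1s}/(1-\hat{\pi}_{1s})] - \log[\hat{\pi}_{0s}/(1-\hat{\pi}_{0s})]$, where $\hat{\pi}_{1s}$ and $\hat{\pi}_{0s}$ are the observed proportions of events within stratum s under treatment and control, respectively. The approximate variance of this estimator for large ns is $\frac{2}{n_s}\left[\frac{1}{\pi_{1s}(1-\pi_{1s})} + \frac{1}{\pi_{0s}(1-\pi_{0s})}\right]$.36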
We assume that the within-stratum log-odds ratio of treatment is constant across all strata, so that βs = β* for all s. Note that when β* ≠ 0 and the stratifying variable is predictive of the outcome, β* will not equal the overall log-odds ratio, β, in the total population due to the non-collapsibility of the odds ratio.37 This, along with the mean-variance relationship, distinguishes the binary outcome setting from the continuous outcome setting.1,4
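Now, the minimum variance linear unbiased estimate of β* is given by the inverse-variance weighted estimator36:
$$\hat{\beta}^* = \frac{\sum_{s=1}^{S} w_s \hat{\beta}_s}{\sum_{s=1}^{S} w_s}, \qquad w_s = f_s\left[\frac{1}{\pi_{1s}(1-\pi_{1s})} + \frac{1}{\pi_{0s}(1-\pi_{0s})}\right]^{-1} \tag{2}$$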
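Hence, the sample size in a stratified IRT, which we denote by NIRT(S), required to detect β* = b* in a two-sided test of H0 : β* = 0 versus HA : β* = b* ≠ 0, with significance level α and power 1 − γ is36:
$$N_{IRT(S)} = \frac{2\,(Z_{\alpha/2} + Z_{\gamma})^2}{(b^*)^2}\left[\sum_{s=1}^{S} f_s\left(\frac{1}{\pi_{1s}(1-\pi_{1s})} + \frac{1}{\pi_{0s}(1-\pi_{0s})}\right)^{-1}\right]^{-1} \tag{3}$$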
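The stratified calculation can be sketched in the same way. The function below assumes that equation (3) has the inverse-variance form N = 2(Zα/2 + Zγ)²/[(b*)² Σs fs/Vs], with Vs = 1/(π1s(1 − π1s)) + 1/(π0s(1 − π0s)); the name `n_irt_stratified` and its defaults are hypothetical and are not the authors' implementation.

```python
from math import exp
from statistics import NormalDist


def n_irt_stratified(f, pi0s, b_star, alpha=0.05, power=0.80):
    """Total subjects for a stratified 1:1 IRT (sketch of equation (3)).

    f      -- proportions of subjects in each stratum (should sum to 1)
    pi0s   -- control-arm event probabilities in each stratum
    b_star -- common within-stratum log-odds ratio (non-zero)
    """
    z_sum = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    weight_sum = 0.0
    for f_s, p0 in zip(f, pi0s):
        # Treatment-arm probability in this stratum implied by b_star.
        odds1 = p0 / (1 - p0) * exp(b_star)
        p1 = odds1 / (1 + odds1)
        v_s = 1 / (p1 * (1 - p1)) + 1 / (p0 * (1 - p0))
        weight_sum += f_s / v_s
    return 2 * z_sum ** 2 / (b_star ** 2 * weight_sum)
```

When every stratum has the same control-arm probability, the sum collapses and the result matches the unstratified formula, which is a convenient sanity check.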
2.4 |. Ratio of Sample Size for Comparably-Powered Stratified and Unstratified IRTs
In this section, we develop a novel expression for the ratio of the sample size required for a stratified IRT to the sample size required for a comparably-powered IRT without stratification. There is a within-stratum treatment effect if and only if there is an overall treatment effect, so β = 0 ⇔ β* = 0. Thus, the hypothesis test of H0 : β = 0 is equivalent to a test of H0 : β* = 0 and vice versa.1 A test of the overall log-odds ratio powered for the alternative hypothesis HA : β = b ≠ 0 corresponds to a test of the conditional log-odds ratio powered for the alternative hypothesis HA : β* = b* ≠ 0 where b and b* solve:
$$b = \log\left(\frac{\pi_1}{1-\pi_1}\right) - \log\left(\frac{\pi_0}{1-\pi_0}\right), \qquad \pi_0 = \sum_{s=1}^{S} f_s \pi_{0s}, \qquad \pi_1 = \sum_{s=1}^{S} f_s\,\frac{\pi_{0s}e^{b^*}}{1 - \pi_{0s} + \pi_{0s}e^{b^*}} \tag{4}$$
There is no closed form formula for b* as a function of b.
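Because no closed form exists, b* has to be found numerically. The sketch below uses plain bisection, relying on two facts stated in the text: b and b* share a sign, and |b*| ≥ |b|. It assumes equation (4) relates b and b* through π0 = Σs fs π0s and π1 = Σs fs expit(logit(π0s) + b*). The helper names (`overall_log_or`, `solve_b_star`) are illustrative, and the search is intended for moderate effect sizes only.

```python
from math import exp, log


def logit(p):
    return log(p / (1 - p))


def expit(x):
    return 1 / (1 + exp(-x))


def overall_log_or(f, pi0s, b_star):
    """Overall (marginal) log-odds ratio implied by a common within-stratum b_star."""
    pi0 = sum(f_s * p for f_s, p in zip(f, pi0s))
    pi1 = sum(f_s * expit(logit(p) + b_star) for f_s, p in zip(f, pi0s))
    return logit(pi1) - logit(pi0)


def solve_b_star(f, pi0s, b):
    """Find b_star such that overall_log_or(f, pi0s, b_star) equals b, by bisection."""
    if b == 0:
        return 0.0
    sign = 1.0 if b > 0 else -1.0
    inner, outer = abs(b), 2 * abs(b)
    # Expand the outer bracket until the implied overall effect is at least |b|.
    while abs(overall_log_or(f, pi0s, sign * outer)) < abs(b):
        outer *= 2
    # A fixed number of halvings guarantees termination.
    for _ in range(200):
        mid = (inner + outer) / 2
        if abs(overall_log_or(f, pi0s, sign * mid)) < abs(b):
            inner = mid
        else:
            outer = mid
    return sign * (inner + outer) / 2
```

A root-finder from a numerical library would serve equally well; bisection is used here only to keep the example dependency-free.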
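A stratified trial with sample size given by NIRT(S) in equation (3) corresponds to an unstratified trial with sample size given by NIRT in equation (1) with the same α and γ, $\pi_0 = \sum_s f_s \pi_{0s}$, $\pi_1 = \sum_s f_s \pi_{1s}$, and b and b* related by equation (4). Thus, the ratio of the sample size required for a stratified IRT to the sample size required for the comparably powered IRT without stratification, $R_{IRT}$, is given by:
$$R_{IRT} = \frac{N_{IRT(S)}}{N_{IRT}} = \left(\frac{b}{b^*}\right)^2 \left[V \sum_{s=1}^{S} \frac{f_s}{V_s}\right]^{-1} \tag{5}$$
where $V = \frac{1}{\pi_1(1-\pi_1)} + \frac{1}{\pi_0(1-\pi_0)}$ and $V_s = \frac{1}{\pi_{1s}(1-\pi_{1s})} + \frac{1}{\pi_{0s}(1-\pi_{0s})}$.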
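The ratio can be evaluated numerically by combining the pieces above. The following self-contained sketch assumes that equation (5) has the form RIRT = (b/b*)² / (V Σs fs/Vs), with V and Vs the sums of reciprocal binomial variances in the overall population and in stratum s, and it solves equation (4) for b* by bisection. All function names are hypothetical.

```python
from math import exp, log


def logit(p):
    return log(p / (1 - p))


def expit(x):
    return 1 / (1 + exp(-x))


def overall_log_or(f, pi0s, b_star):
    pi0 = sum(f_s * p for f_s, p in zip(f, pi0s))
    pi1 = sum(f_s * expit(logit(p) + b_star) for f_s, p in zip(f, pi0s))
    return logit(pi1) - logit(pi0)


def solve_b_star(f, pi0s, b):
    """Bisection for the within-stratum log-odds ratio matching overall b (b != 0)."""
    sign = 1.0 if b > 0 else -1.0
    inner, outer = abs(b), 2 * abs(b)
    while abs(overall_log_or(f, pi0s, sign * outer)) < abs(b):
        outer *= 2
    for _ in range(200):
        mid = (inner + outer) / 2
        if abs(overall_log_or(f, pi0s, sign * mid)) < abs(b):
            inner = mid
        else:
            outer = mid
    return sign * (inner + outer) / 2


def recip_var_sum(p1, p0):
    """1/(p1(1-p1)) + 1/(p0(1-p0)): the V (or V_s) term."""
    return 1 / (p1 * (1 - p1)) + 1 / (p0 * (1 - p0))


def r_irt(f, pi0s, b):
    """Ratio of stratified to unstratified IRT sample sizes (sketch of equation (5))."""
    b_star = solve_b_star(f, pi0s, b)
    pi0 = sum(f_s * p for f_s, p in zip(f, pi0s))
    v = recip_var_sum(expit(logit(pi0) + b), pi0)
    weighted = sum(
        f_s / recip_var_sum(expit(logit(p) + b_star), p) for f_s, p in zip(f, pi0s)
    )
    return (b / b_star) ** 2 / (v * weighted)
```

For example, `r_irt([0.5, 0.5], [0.31, 0.69], log(1.4))` should come out at roughly 0.86 under these assumptions, which is in line with the comparison to simulation results discussed in Section 2.5.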
Although Robinson and Jewell did not formalize a comparison of sample sizes through a ratio such as RIRT, they did establish results that indicate that RIRT < 1 when the stratifying variable is predictive of outcome.1 Specifically, they showed that under stratified randomization, for β ≠ 0, β will lie between 0 and β*; that is, the overall log-odds ratio β will be closer to zero than the common within-stratum log-odds ratio β*. Without loss of generality, assuming that b < 0, then b* < b < 0 and so the first term in the expression for RIRT in equation (5), (b/b*)², is less than one. They also showed, however, that the variance of the stratified estimator is higher than the variance of the unstratified estimator; that is, the second term, $\left[V\sum_s f_s/V_s\right]^{-1}$, is greater than one. Overall, they showed that the test of no treatment effect is more powerful for the stratified IRT than the unstratified IRT for a fixed sample size N, indicating that RIRT < 1. This is because β* is sufficiently further from zero than β so as to overcome the increased variance.
2.5 |. Ratio of Sample Size in an IRT with Two Strata Versus an Unstratified IRT
To illustrate the effect of stratification on sample size estimation, we consider an IRT with two strata, a “high-risk” stratum and a “low-risk” stratum, versus a comparably-powered IRT without stratification, in the situation when b = log(0.5). We consider a setting with a low overall probability of events, specifically π0 = 0.05, and a setting with a moderate probability of events, specifically π0 = 0.50. For a hypothesized overall log-odds ratio of b = log(0.5) in the IRT without stratification, panels a and b in Figure 1 show the ratio of sample sizes, RIRT, by f1 for selected values of π01 for π0 = 0.05 and π0 = 0.5, respectively. When the stratifying variable becomes more predictive of the outcome (i.e., when π01 decreases and/or f1 increases, and so π02 must be further from π01 to maintain the same overall π0), RIRT decreases, indicating a greater benefit, in terms of sample size, due to stratification. As a measure of what might be considered practically important reductions in sample size achieved by stratification, combinations of the parameters π01 and f1 with RIRT ≤ 0.90 (i.e., ≥ 10% reduction) are indicated with a solid line in these two panels; combinations achieving smaller reductions are indicated with a dotted line. Panels c and d in Figure 1 then use the same solid and dotted lines to show the values of π02 that correspond to combinations of π01 and f1 (to achieve the specified π0) that do and do not give reductions in sample size of ≥ 10%. R code for all key formulae and for the results and figures presented throughout is available in the online Supporting Information. Additionally, a user-friendly RStudio Shiny web application that implements these formulae for up to three strata can be accessed at https://leekshaffer.shinyapps.io/stratcrt/.
Figure 1.
Plots of the Ratio of Sample Sizes for Stratified (Two Strata) and Unstratified Individual Randomized Trials, RIRT (Panels a and b), and the Probability of an Event in the Control Arm of the High-Risk Stratum, π02 (Panels c and d), versus the Proportion of Individuals in the Low-Risk Stratum (f1) for Low and Moderate Overall Probability of Events (π0 = 0.05 and π0 = 0.50) Where b = log(0.5). π01 is the Probability of an Event in the Control Arm of the Low-Risk Stratum. Note: Solid lines indicate combinations of parameters such that RIRT ≤ 0.90; dotted lines indicate that RIRT > 0.90.
For IRTs with π0 = 0.05 (panels a and c), a two-category stratifying variable must be highly predictive of the outcome and f1 reasonably large in order to yield reductions of ≥ 10% in the required sample size. For example, with π01 = 0.01, RIRT is approximately 0.90 when f1 = 0.8 and thus π02 = 0.21, which is likely an unrealistic scenario for the difference between π01 and π02. If π01 = 0.03, then RIRT = 0.90 requires f1 = 0.94 and π02 = 0.36, an even larger risk difference between the two strata. When π0 = 0.5 (panels b and d), there are scenarios which might perhaps be realistic in practice for which stratification would achieve reductions in sample size of ≥ 10%. For example, with π01 = 0.40 and f1 = 0.72, RIRT is approximately 0.90. This requires π02 = 0.76, which may or may not be reasonable depending on the trial context. For a lower π01 = 0.35, the ratio of RIRT = 0.90 is achieved with f1 = 0.55 and π02 = 0.68. In a high-probability setting of π0 = 0.90 (displayed in Figure S1), the upper bound on the high-stratum probability limits the number of settings that can be considered. A ratio of 0.90 can be achieved with f1 = 0.53, π01 = 0.825, and π02 = 0.985, requiring a very high-risk stratum.
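The values of π02 quoted above follow directly from the constraint π0 = f1π01 + (1 − f1)π02. A small helper, with a name (`high_risk_prob`) that is purely illustrative, makes the arithmetic explicit and flags infeasible combinations:

```python
def high_risk_prob(pi0, pi01, f1):
    """Control-arm event probability in the high-risk stratum of a two-stratum trial.

    Solves pi0 = f1 * pi01 + (1 - f1) * pi02 for pi02.
    """
    pi02 = (pi0 - f1 * pi01) / (1 - f1)
    if not 0 <= pi02 <= 1:
        raise ValueError("no valid pi02 for this combination of pi0, pi01 and f1")
    return pi02
```

For instance, π0 = 0.05, π01 = 0.01 and f1 = 0.8 give π02 = 0.042/0.2 = 0.21, matching the first scenario in the text.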
For a treatment that reduces the probability of events versus control (so b is negative), the quantity (b/b*) decreases further from one as the treatment effect increases (i.e., b becomes more negative), holding all other parameters constant.37 Thus, RIRT also decreases further from one as the treatment effect increases. That is, as the hypothesized treatment effect becomes stronger, stratification will lead to a greater reduction in required sample size. Conversely, for the case of two strata, our results for b = log(0.5) suggest that stratification when odds ratios are between 0.5 and 1.0 may not, in practice, lead to meaningful reductions in sample size.
The trends in our results accord well with the results from a simulation study conducted by Hernández, Steyerberg, and Habbema.3 As an example, for π0 = 0.50 and b = log(1.4) = 0.34, their simulations indicated that a reduction in sample size of 13.7% (corresponding to RIRT = 0.863) required f1 = 0.50 and an odds ratio for the event with respect to the stratifying variable of 5, which corresponds to π01 = 0.31 and π02 = 0.69.3 Using equation (5) with the same parameters yields RIRT = 0.861, indicating agreement with their simulated results. Our results agree in the other settings discussed in their paper as well, including those presented in their Tables 3 and 4.3 More generally, consistent with what we found using our analytical expression for RIRT, they found that highly predictive stratification variables were necessary for substantial reductions in sample size, and that lowering the probability of events diminished the reduction in sample size.3
3 |. STRATIFIED CLUSTER RANDOMIZED TRIALS
3.1 |. Notation for CRTs
Consider now an unstratified CRT with N clusters, labeled ℓ = 1, … , N. Cluster ℓ has mℓ subjects labeled j = 1, … , mℓ. Assume that the clusters are randomly assigned in equal numbers to either the experimental treatment arm (xℓ = 1) or the control arm (xℓ = 0) and that every subject within a cluster receives the same randomized intervention. Let Yℓj denote the binary outcome of subject j in cluster ℓ. Let π1 = E[Yℓj|xℓ = 1] and π0 = E[Yℓj|xℓ = 0] denote the marginal probability of a subject experiencing the event of interest under treatment and control, respectively.
In this paper, we focus on a subject-level analysis of outcomes comparing treatment to control using a marginal logistic regression model, fitted using GEEs, to estimate the marginal log-odds ratio $\beta = \log[\pi_1/(1-\pi_1)] - \log[\pi_0/(1-\pi_0)]$.11 We use the GEE approach because of its flexibility in specifying correlation structures, its ability to handle cases where cluster sizes vary, and its popularity as an approach to analysis in the CRT literature.12,35
Now consider a CRT with S strata, labeled s = 1, … , S. Stratum s has ns clusters, labeled i = 1, … , ns, such that the total number of clusters is $N = \sum_{s=1}^{S} n_s$. Cluster i in stratum s has msi subjects, labeled j = 1, … , msi. Denote the mean cluster size in stratum s by $\bar{m}_s = \frac{1}{n_s}\sum_{i=1}^{n_s} m_{si}$, the overall mean cluster size by $\bar{m} = \frac{1}{N}\sum_{s=1}^{S}\sum_{i=1}^{n_s} m_{si}$, and the proportion of individuals in stratum s by $f_s = n_s\bar{m}_s/(N\bar{m})$. The proportion of clusters in stratum s is given by $g_s = n_s/N$. In the special case where the mean cluster size does not vary across the strata, fs = gs for all s. As in the unstratified case, the same treatment is applied to every subject in a given cluster (this treatment is denoted xsi for cluster i in stratum s) and the binary outcome for individual j in cluster i in stratum s is denoted Ysij. Within each stratum, half of the clusters are randomly assigned to each arm. Let π1s = E[Ysij|xsi = 1, s] and π0s = E[Ysij|xsi = 0, s] denote the within-stratum probability of a subject in stratum s experiencing the event of interest under treatment and control, respectively.
As in the IRT setting, we denote the marginal log-odds ratio comparing treatment to control within stratum s by $\beta_s = \log[\pi_{1s}/(1-\pi_{1s})] - \log[\pi_{0s}/(1-\pi_{0s})]$ and assume that this is the same across all strata, i.e., βs = β* for all s. Since these overall and stratum-specific parameters are marginal with respect to cluster membership and rely only on the mean specification and not the correlation structure, equation (4) holds as in the IRT case. Thus, when the stratifying variable is associated with the outcome and there is an effect of treatment, we will have 0 < |β| < |β*|. As before, without loss of generality, we take β* < β < 0 for our illustrative examples. Since the following formulae are symmetric with respect to switching the treatment and control designations, and the hypothesized treatment effect appears only in a squared term, these formulae all apply equally well to the case of a treatment that increases the probability of the event, where 0 < β < β*.
3.2 |. Sample Size Estimation for Unstratified CRTs
For a general unstratified CRT, the usual approach to sample size estimation is first to determine the required sample size, NIRT, for a similar IRT with the desired significance level and power, and then to obtain the sample size for the CRT, NCRT, by multiplying NIRT by a design effect or variance inflation factor.7,9 This factor, which we denote by F, is generally a function of the cluster sizes and the intracluster correlation coefficient (ICC), denoted by ρ.8,10,11 Then the sample size, in terms of total subjects, required for an unstratified CRT with design effect F, to be analyzed using GEEs, testing H0 : β = 0 vs. HA : β = b ≠ 0 with power 1 − γ and significance level α, is given by:
$$N_{CRT} = F \times N_{IRT} = \frac{2F\,(Z_{\alpha/2} + Z_{\gamma})^2}{b^2}\left[\frac{1}{\pi_1(1-\pi_1)} + \frac{1}{\pi_0(1-\pi_0)}\right] \tag{6}$$
Pan provides formulae for the design effect F for a variety of cases for trials to be analyzed using GEEs, depending on the true and working correlation structures.11 When the cluster sizes are constant (i.e., $m_\ell = m$ for all ℓ) and the true correlation structure is exchangeable, $F = 1 + (m-1)\rho$, which we denote by FA, irrespective of whether an independent or exchangeable working correlation structure is used. For the case when cluster sizes are not constant, this choice of F underestimates the required sample size in a CRT. When the true correlation structure is exchangeable and the cluster sizes are known, Pan proposes the use of the following formulae:
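$$F = \frac{\sum_{\ell=1}^{N} m_\ell}{\sum_{\ell=1}^{N} m_\ell / \left[1 + (m_\ell - 1)\rho\right]} \tag{7}$$
$$F = \frac{\sum_{\ell=1}^{N} m_\ell\left[1 + (m_\ell - 1)\rho\right]}{\sum_{\ell=1}^{N} m_\ell} \tag{8}$$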
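To make the role of the cluster size distribution concrete, the sketch below computes two design effects from a list of cluster sizes, assuming a true exchangeable correlation. The formulas are the standard GEE results for an exchangeable and an independence working correlation, respectively; treating them as the content of equations (7) and (8) is an assumption here, and the function names are hypothetical.

```python
def design_effect_exchangeable(sizes, rho):
    """Design effect with an exchangeable working correlation (assumed form of (7))."""
    return sum(sizes) / sum(m / (1 + (m - 1) * rho) for m in sizes)


def design_effect_independence(sizes, rho):
    """Design effect with an independence working correlation (assumed form of (8))."""
    return sum(m * (1 + (m - 1) * rho) for m in sizes) / sum(sizes)
```

With equal cluster sizes both functions reduce to 1 + (m − 1)ρ. With unequal sizes, for example clusters of size 2 and 6 and ρ = 0.1, they give 1.375 and 1.4, both above the value 1.3 obtained by plugging the mean cluster size of 4 into the constant-size formula.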
These formulae require the full specification of the distribution of cluster sizes. Alternative estimates of the design effects for CRTs with unequal cluster size can also be used, for example, using the harmonic mean of the cluster sizes,38 or finding the design effect for the corresponding trial with equal cluster sizes and multiplying by the relative efficiency of the trial with equal cluster sizes compared to that with unequal cluster sizes.33,34 An upper bound for the sample size required can be obtained by using $F = 1 + \left[(1 + CV^2)\bar{m} - 1\right]\rho$, which we denote by FB, where $CV = \sigma_m/\bar{m}$ and σm is the standard deviation of the cluster sizes.8,10 The methods presented in Sections 3.3 and 3.4 can be used with any of these design effect or relative efficiency measures. In the examples that follow, we will use FA and FB to simplify the results and ease interpretability and because these are commonly used by investigators when the exact distribution of cluster sizes is unknown a priori.10
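The two simplified design effects can be sketched as follows, assuming FA = 1 + (m − 1)ρ for constant cluster size m and FB = 1 + [(1 + CV²)m̄ − 1]ρ with CV = σm/m̄. The function names, and the helper that scales an IRT sample size by a design effect, are illustrative only.

```python
def design_effect_fa(m, rho):
    """F_A: design effect for a constant cluster size m."""
    return 1 + (m - 1) * rho


def design_effect_fb(mean_m, sd_m, rho):
    """F_B: design effect using the mean and standard deviation of cluster sizes."""
    cv = sd_m / mean_m  # coefficient of variation of the cluster sizes
    return 1 + ((1 + cv ** 2) * mean_m - 1) * rho


def n_crt(n_irt_subjects, design_effect):
    """Total subjects for an unstratified CRT: IRT sample size times design effect."""
    return design_effect * n_irt_subjects
```

For a mean cluster size of 4 and ρ = 0.1, FA is 1.3; allowing a standard deviation of 2 raises FB to 1.4, so an IRT requiring 278 subjects would translate to roughly 361 or 389 subjects, respectively.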
3.3 |. Sample Size Estimation for Stratified CRTs
By way of background, we note that both Donner and Klar,9 and Hayes and Moulton7 have considered approaches for sample size estimation in stratified CRTs but not in the context of analysis using GEEs and under somewhat more restrictive assumptions than we consider. Donner and Klar provide a formula for sample size estimation for stratified analyses assuming that within each stratum there is a constant cluster size (i.e., for each s, msi = ms for i = 1, … , ns), and that the ICC is constant across treatment arms and strata (i.e., ρ1s = ρ0s = ρ*).9 Hayes and Moulton provide an alternative formula that uses the coefficient of variation, $k_s = \sigma_{0s}/\pi_{0s}$, where $\sigma_{0s}$ is the standard deviation of the cluster-specific event probabilities within stratum s and π0s is the within-stratum probability of a subject experiencing the event under control in stratum s as defined above. They allow ks to vary across strata but require a constant cluster size across and within strata.7
We propose here a more general approach for sample size determination that allows for both within-stratum ICCs and within-stratum cluster size distributions that may vary across strata. This approach is based on using a weighted average of within-stratum GEE estimators to estimate the treatment effect. It is very flexible when estimates of the within-stratum parameters are available and can be used with relatively few assumptions when such parameters are not available.
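A within-stratum estimate of the log-odds ratio for treatment compared to control, $\hat{\beta}_s$, is calculated for each stratum s using the observed proportions $\hat{\pi}_{1s}$ and $\hat{\pi}_{0s}$. Denote by Fs the value of the design effect for stratum s. This can be given by within-stratum versions of equations (7) and (8) when exchangeable and independent working correlation structures, respectively, are used in the GEE analysis within each stratum, or by one of the simplified design effects (FA or FB) commonly used. The variance of $\hat{\beta}_s$ in large samples is then given by:
$$\mathrm{Var}(\hat{\beta}_s) \approx \frac{2F_s}{n_s\bar{m}_s}\left[\frac{1}{\pi_{1s}(1-\pi_{1s})} + \frac{1}{\pi_{0s}(1-\pi_{0s})}\right] \tag{9}$$
where $n_s\bar{m}_s$ is the total number of subjects in stratum s.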
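Similar to the approach used by Gail, we compute an overall estimator of the common within-stratum treatment effect, $\hat{\beta}^*$, as a weighted average of the individual within-stratum treatment effects, $\hat{\beta}_s$.36 The estimator of $\beta^*$ with minimal variance uses weights equal to the inverse variance of the within-stratum estimators39:
$$\hat{\beta}^* = \frac{\sum_{s=1}^{S} w_s\hat{\beta}_s}{\sum_{s=1}^{S} w_s}, \qquad w_s = \frac{1}{\mathrm{Var}(\hat{\beta}_s)} \tag{10}$$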
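To test the hypothesis H0 : β* = 0 versus any alternative HA : β* = b* ≠ 0, we perform a two-sided z-test. An α-level test with power 1 − γ to detect a hypothesized effect size b* on the log-odds ratio scale requires a sample size of:
$$N_{CRT(S)} = \frac{2\,(Z_{\alpha/2} + Z_{\gamma})^2}{(b^*)^2}\left[\sum_{s=1}^{S} \frac{f_s}{F_s}\left(\frac{1}{\pi_{1s}(1-\pi_{1s})} + \frac{1}{\pi_{0s}(1-\pi_{0s})}\right)^{-1}\right]^{-1} \tag{11}$$
Note that, like NCRT, NCRT(S) is the total number of subjects, not clusters, required. For each s = 1, …, S, π1s and π0s are related by the hypothesized log-odds ratio b* via the formula $\log[\pi_{1s}/(1-\pi_{1s})] = \log[\pi_{0s}/(1-\pi_{0s})] + b^*$.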
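A minimal sketch of the stratified CRT calculation follows, assuming that equation (11) has the form N = 2(Zα/2 + Zγ)²/[(b*)² Σs fs/(FsVs)], where Vs = 1/(π1s(1 − π1s)) + 1/(π0s(1 − π0s)) and the Fs are supplied by the user from whichever design effect formula is preferred. The function name and defaults are hypothetical.

```python
from math import exp
from statistics import NormalDist


def n_crt_stratified(f, pi0s, design_effects, b_star, alpha=0.05, power=0.80):
    """Total subjects (not clusters) for a stratified CRT (sketch of equation (11)).

    f              -- proportions of individuals in each stratum
    pi0s           -- control-arm event probabilities in each stratum
    design_effects -- within-stratum design effects F_s
    b_star         -- common within-stratum log-odds ratio (non-zero)
    """
    z_sum = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    weight_sum = 0.0
    for f_s, p0, de_s in zip(f, pi0s, design_effects):
        # pi_1s follows from pi_0s and b_star on the logit scale.
        odds1 = p0 / (1 - p0) * exp(b_star)
        p1 = odds1 / (1 + odds1)
        v_s = 1 / (p1 * (1 - p1)) + 1 / (p0 * (1 - p0))
        weight_sum += f_s / (de_s * v_s)
    return 2 * z_sum ** 2 / (b_star ** 2 * weight_sum)
```

Setting every Fs to 1 recovers the stratified IRT sample size, and setting them all to a common value F* multiplies that sample size by F*, which is the special case discussed next.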
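If both the within-stratum ICCs and the within-stratum cluster size distributions are constant over strata, so that the within-stratum design effects do not vary across strata and Fs = F* for every stratum s, then this simplifies to:
$$N_{CRT(S)} = F^* \times \frac{2\,(Z_{\alpha/2} + Z_{\gamma})^2}{(b^*)^2}\left[\sum_{s=1}^{S} f_s\left(\frac{1}{\pi_{1s}(1-\pi_{1s})} + \frac{1}{\pi_{0s}(1-\pi_{0s})}\right)^{-1}\right]^{-1} = F^* \times N_{IRT(S)} \tag{12}$$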
In this special case, therefore, the sample size requirement for the stratified CRT is the common within-stratum design effect, F*, times the sample size for the stratified IRT. When cluster size distributions or within-stratum ICC values vary among the strata, however, this is no longer the case and there is no simple overall design effect for the stratified CRT compared to the stratified IRT.
3.4 |. Ratio of Sample Size for Comparably-Powered Stratified and Unstratified CRTs
3.4.1 |. General Formulae for Ratio of Sample Sizes
Let RCRT = NCRT(S)/NCRT be the ratio of sample sizes required in comparable stratified and unstratified CRTs. To be comparable, the trials will need to detect equivalent stratified and unstratified alternative hypotheses with the same power and Type I error rate, as in the IRT case, and so they must be designed to detect the stratified log-odds ratio b* and overall log-odds ratio b that solve equation (4). As described in Sections 2.4 and 3.1, b* will thus be greater in magnitude than b. In addition, in the CRT case, {Fs : s = 1, … , S} and F must be the within-stratum and overall design effects for corresponding trials, respectively. For design effects parameterized by the ICC, this relationship is explored in Sections 3.4.2 and 3.4.3. Then using equations (11) and (6):
$$R_{CRT} = \left(\frac{b}{b^*}\right)^2 \left[F\,V \sum_{s=1}^{S} \frac{f_s}{F_s V_s}\right]^{-1} \tag{13}$$
where V and Vs are as defined in Section 2.4.
In the special case where the within-stratum ICCs and the within-stratum cluster size distributions are constant over strata and so the within-stratum design effects are constant over strata, Fs = F* for s = 1, … , S, by using equation (5), this simplifies to:
$$R_{CRT} = \frac{F^*}{F}\left(\frac{b}{b^*}\right)^2 \left[V \sum_{s=1}^{S} \frac{f_s}{V_s}\right]^{-1} = Q_{DE} \times R_{IRT} \tag{14}$$
where QDE = F*/F. Thus, for the special case when the within-stratum design effects are constant across strata, the ratio RCRT is the product of two terms: RIRT, the ratio of sample size with stratification to sample size without stratification for an IRT with the same effect size and event probabilities; and QDE, the ratio of the within-stratum design effect to the design effect without stratification, which reflects the difference between the common within-stratum ICC and the overall ICC. Note that when msi = 1 for all clusters i in all strata s and so the CRT is in effect an IRT, then F* = F = 1 regardless of the choice of design effect. In this case, QDE = 1 and hence, as would be expected, RCRT equals RIRT.
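The factorization can be checked in a few lines. The derivation below is a sketch that assumes the general ratio has the form RCRT = (b/b*)² [F V Σs fs/(FsVs)]⁻¹ and that RIRT = (b/b*)² [V Σs fs/Vs]⁻¹; under those assumed forms, setting Fs = F* lets the common design effect be pulled out of the sum.

```latex
\begin{align*}
R_{CRT}
  &= \left(\frac{b}{b^*}\right)^2
     \left[ F\,V \sum_{s=1}^{S} \frac{f_s}{F_s V_s} \right]^{-1}
     && \text{(assumed general form)} \\
  &= \left(\frac{b}{b^*}\right)^2
     \left[ \frac{F}{F^*}\,V \sum_{s=1}^{S} \frac{f_s}{V_s} \right]^{-1}
     && (F_s = F^* \text{ for all } s) \\
  &= \frac{F^*}{F} \left(\frac{b}{b^*}\right)^2
     \left[ V \sum_{s=1}^{S} \frac{f_s}{V_s} \right]^{-1}
   = Q_{DE} \times R_{IRT}.
\end{align*}
```

The practical consequence is that, in this special case, the IRT ratio and the design effect ratio can be evaluated separately and multiplied.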
3.4.2 |. Relationship Between Within-Stratum ICCs and the ICC in the Overall Population
Exploring the sample size requirements of stratified versus unstratified CRTs using equations (13) and (14) requires an understanding of the relationship between the within-stratum Fs values and the overall F value. For many design effects, these will be parameterized by the within-stratum ICCs, ρ0s for s = 1, … , S, and the overall ICC, ρ0, respectively, and the within-stratum and overall cluster size distributions. In the ideal setting, investigators will have reasonable estimates of stratum-specific ICCs available from prior studies or feasibility studies. When this is not the case, however, under some additional assumptions about the stratum-specific ICCs, investigators can use estimates derived analytically from the overall parameters. Here we present one novel approach to finding stratum-specific ICCs that correspond to a known overall ICC.
To determine the relationship between these ICC values, we make the usual assumption that each subject within a given cluster ℓ has the same probability of experiencing the event, denoted by π1ℓ = E[Yℓj|xℓ = 1, ℓ] and π0ℓ = E[Yℓj|xℓ = 0, ℓ] under treatment and control, respectively. We further assume that the π0ℓ and π1ℓ are independent and identically distributed within each randomized arm (including that they do not vary with cluster size), with mean π0 = E[π0ℓ] and (between-cluster) variance $\sigma_0^2 = \mathrm{Var}[\pi_{0\ell}]$ for the control arm. Similarly, let π1 and $\sigma_1^2$ be the mean and variance, respectively, for the treatment arm. Under these assumptions, the marginal probability of outcome among subjects in clusters assigned control is E[Yℓj|xℓ = 0] = π0 with variance Var[Yℓj|xℓ = 0] = π0(1 − π0) and among those in clusters assigned treatment is E[Yℓj|xℓ = 1] = π1 with variance Var[Yℓj|xℓ = 1] = π1(1 − π1). Furthermore, under these assumptions, the ICC for the control arm, ρ0 = Cov[Yℓj, Yℓj′|xℓ = 0]/[π0(1 − π0)] for j ≠ j′, will be non-negative and equal to $\sigma_0^2/[\pi_0(1-\pi_0)]$.40,41 Similarly, $\rho_1 = \sigma_1^2/[\pi_1(1-\pi_1)]$.
For deriving the sample size of a CRT, this model is often used assuming a true exchangeable correlation structure. A common additional assumption is that the ICC, ρ, is the same in the treatment and control arms; that is, ρ1 = ρ0.10,11,42,43 We also make this assumption, but note that this implies that if π1 differs from π0 (i.e., if there is a treatment effect), then the between-cluster variances $\sigma_1^2$ and $\sigma_0^2$ must differ in a corresponding way to achieve ρ1 = ρ0.
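For the stratified CRT, we define the cluster-specific probability of the event under treatment for cluster i in stratum s as π1si = E[Ysij|xsi = 1, s, i] and under control as π0si = E[Ysij|xsi = 0, s, i]. By adding a subscript s to the notation defined above for the unstratified CRT, within stratum s, the cluster-specific probabilities of outcome in the control arm are distributed with mean π0s and variance $\sigma_{0s}^2$. The within-stratum ICC in the control arm for stratum s is given by $\rho_{0s} = \sigma_{0s}^2/[\pi_{0s}(1-\pi_{0s})]$. Since π0 is the marginal probability of experiencing the event, $\pi_0 = \sum_s f_s\pi_{0s}$ by definition. We still assume that the outcome probability for each cluster is independent of cluster size. Importantly, this implies that the cluster size distribution does not vary among the strata if the stratifying variable is predictive of the outcome. Then, the between-cluster variance in the control arm ignoring strata, $\sigma_0^2$, can be partitioned as:
$$\sigma_0^2 = \sum_{s=1}^{S} f_s\sigma_{0s}^2 + \sum_{s=1}^{S} f_s(\pi_{0s} - \pi_0)^2$$
Therefore:
$$\rho_0 = \frac{\sigma_0^2}{\pi_0(1-\pi_0)} = \frac{\sum_{s=1}^{S} f_s\sigma_{0s}^2 + \sum_{s=1}^{S} f_s(\pi_{0s} - \pi_0)^2}{\pi_0(1-\pi_0)}$$
Using the fact that for each s = 1, … , S, $\sigma_{0s}^2 = \rho_{0s}\pi_{0s}(1-\pi_{0s})$, the overall ICC is then given by:
$$\rho_0 = \frac{\sum_{s=1}^{S} f_s\rho_{0s}\pi_{0s}(1-\pi_{0s}) + \sum_{s=1}^{S} f_s(\pi_{0s} - \pi_0)^2}{\pi_0(1-\pi_0)} \tag{15}$$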
It is useful to rewrite equation (15) as:
(16) |
The two terms on the right side of this equation show more clearly the contributions to ρ0 of within-stratum (related to ρ0s) and between-strata components. Note that the overall ICC, ρ0, can be zero only when both the within-stratum ICC, ρ0s, is zero for each stratum s and there is no variability in the within-stratum proportions, π0s, so that π0s = π0 for all s = 1, … , S. Although it may not be of interest in practical settings, it is also useful to note that when ρ0s = 1 for all s = 1, … , S, then ρ0 = 1. This can be shown using the fact that $\sum_{s=1}^{S} f_s = 1$ and $\sum_{s=1}^{S} f_s \pi_{0s} = \pi_0$ in equation (15).
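The decomposition into within-stratum and between-strata contributions can be checked numerically. The following is a minimal sketch, assuming the overall ICC is the sum of a within-stratum term weighted by fs π0s(1 − π0s) and a between-strata term based on the squared deviations (π0s − π0)², both scaled by π0(1 − π0). The function name `icc_components` and the example inputs are illustrative choices, not taken from the authors' R code.

```python
def icc_components(f, pi0s, rho0s):
    """Return the (within, between) contributions to the overall control-arm ICC.

    f     : proportions of subjects in each stratum (should sum to 1)
    pi0s  : control-arm event probabilities in each stratum
    rho0s : within-stratum control-arm ICCs
    """
    pi0 = sum(fs * p for fs, p in zip(f, pi0s))  # marginal event probability
    total = pi0 * (1 - pi0)                      # marginal Bernoulli variance
    within = sum(fs * r * p * (1 - p) for fs, p, r in zip(f, pi0s, rho0s)) / total
    between = sum(fs * (p - pi0) ** 2 for fs, p in zip(f, pi0s)) / total
    return within, between


# Illustrative two-stratum setting (hypothetical within-stratum ICCs of 0.03)
f = [0.5, 0.5]
pi0s = [0.02, 0.08]
within, between = icc_components(f, pi0s, [0.03, 0.03])
rho0 = within + between
print(f"within = {within:.4f}, between = {between:.4f}, overall ICC = {rho0:.4f}")

# Boundary check: within-stratum ICCs of 1 should give an overall ICC of 1
within_one, between_one = icc_components(f, pi0s, [1.0, 1.0])
rho0_all_one = within_one + between_one
print(f"overall ICC when every within-stratum ICC is 1: {rho0_all_one:.4f}")
```

With all within-stratum ICCs set to zero, only the between-strata term remains, which is why the overall ICC cannot be zero when the stratifying variable is predictive of the outcome.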
If the overall ICC, ρ0, is given, and equal within-stratum ICCs are assumed, so that we can let $\rho_{0s} = \rho_0^*$ for s = 1, … , S, then equation (15) can be re-arranged to give:
$$\rho_0^* = \frac{\rho_0 \pi_0 (1 - \pi_0) - \sum_{s=1}^{S} f_s (\pi_{0s} - \pi_0)^2}{\sum_{s=1}^{S} f_s \pi_{0s} (1 - \pi_{0s})} \tag{17}$$
Based on this expression, some useful observations can be made. First, and somewhat obviously, if there is no variation in the within-stratum event probabilities (i.e., π0s = π0 for all s = 1, … , S), then $\rho_0^* = \rho_0$. That is, when the stratifying variable is not predictive of the outcome, the overall and within-stratum ICCs are identical. Second, for given fs and π0s for s = 1, … , S, and hence given π0, it can be seen that $\rho_0^*$ is an increasing function of ρ0. Third, under the definition of the ICC we are using, it can be shown that $\rho_0^* \le \rho_0$ (see Lemma 1 in Appendix A). Fourth, since $\rho_0^*$ must be non-negative, for a given ρ0 and π0, there is a constraint on possible values for fs and π0s, for s = 1, … , S, such that the numerator of this expression is non-negative. Fifth, as shown in Lemma 2 in Appendix A, the assumption that $\rho_{0s} = \rho_0^*$ for all s results in sample size estimates that are approximately conservative for small treatment effects (b* ≈ 0) and/or low-probability events (π0s ≈ 0), under both of the simplified design effects FA and FB.
3.4.3 |. Bounds on the Ratio of Sample Sizes Under Simplifying Assumptions
When, as is common in practice, simplified design effects are used for sample size determination for CRTs,8,10 those design effects can be used to calculate the ratio of sample sizes for a stratified CRT and corresponding unstratified CRT, $R_{CRT}$. For the design effect denoted FA (see section 3.2), assuming that the design effect is constant across strata, i.e., $F_{A,s} = F_A^*$ for all s, equation (13) gives:
$$R_{CRT} = R_{IRT} \cdot \frac{F_A^*}{F_A} = R_{IRT} \, Q_{DE} \tag{18}$$
Using $F_{B,s} = F_B^*$ for all s instead gives the same result but replacing $F_A^*/F_A$ by $F_B^*/F_B$.
This equation provides a key finding on upper and lower bounds for $R_{CRT}$. We begin with an upper bound on $R_{CRT}$. Recognizing that the relationship between $\rho_0^*$ and ρ0 is fixed by equation (17), by the ICC definition we are using and Lemma 1 in Appendix A, $0 \le \rho_0^* \le \rho_0$. Thus, 0 ≤ QDE ≤ 1 since QDE is the ratio of the within-stratum design effect to the overall design effect and the design effect increases with the ICC. So, under the assumption of a common within-stratum design effect given by $F_A^*$ or $F_B^*$, 0 ≤ $R_{CRT}$ ≤ $R_{IRT}$ < 1.
Now turning to a lower bound on $R_{CRT}$, we note that $\rho_0^*$ is an increasing function of ρ0 and, by definition, $\rho_0^* \ge 0$. Thus, for any combination of other parameter values, there is a lower bound on ρ0: ρ0,LB such that $\rho_0^* = 0$. This value, ρ0,LB > 0 when the stratifying variable is predictive of the outcome, can be derived for any setting using equation (16) with ρ0s = 0 for all s. Since QDE is an increasing function of ρ0 (see Lemma 3 in Appendix A), this gives a lower bound for QDE. That is, $Q_{DE} \ge 1/F_{LB}$, where FLB is the value of FA or FB given using ρ0 = ρ0,LB. This lower bound depends only on the usual design effect comparing an unstratified CRT to an unstratified IRT. Overall, then, using FA or FB and assuming a common within-stratum ICC and within-stratum cluster size distribution, 0 < $R_{IRT}$/FLB ≤ $R_{CRT}$ ≤ $R_{IRT}$.
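The bounds on QDE can be illustrated with a short numerical sketch. It assumes the simplified design effect has the commonly used form 1 + (m − 1)ρ for mean cluster size m, and that the common within-stratum ICC follows from re-arranging equation (15) with equal within-stratum ICCs. The function names and the mean cluster size of 5 are hypothetical choices for illustration only.

```python
def stratum_summaries(f, pi0s):
    """Marginal probability, between-strata variance, and within-stratum variance scale."""
    pi0 = sum(fs * p for fs, p in zip(f, pi0s))
    between = sum(fs * (p - pi0) ** 2 for fs, p in zip(f, pi0s))
    within_scale = sum(fs * p * (1 - p) for fs, p in zip(f, pi0s))
    return pi0, between, within_scale


def common_within_icc(rho0, f, pi0s):
    """Common within-stratum ICC implied by an overall ICC (may be negative if inadmissible)."""
    pi0, between, within_scale = stratum_summaries(f, pi0s)
    return (rho0 * pi0 * (1 - pi0) - between) / within_scale


def rho0_lower_bound(f, pi0s):
    """Smallest overall ICC compatible with the strata (all within-stratum ICCs zero)."""
    pi0, between, _ = stratum_summaries(f, pi0s)
    return between / (pi0 * (1 - pi0))


def simple_design_effect(mean_size, rho):
    """Assumed simplified design effect: 1 + (mean cluster size - 1) * ICC."""
    return 1 + (mean_size - 1) * rho


def q_de(rho0, f, pi0s, mean_size):
    """Ratio of the common within-stratum design effect to the overall design effect."""
    rho_star = common_within_icc(rho0, f, pi0s)
    return simple_design_effect(mean_size, rho_star) / simple_design_effect(mean_size, rho0)


f = [0.7, 0.3]
pi0s = [0.02, 0.12]
mean_size = 5          # hypothetical mean cluster size
rho0 = 0.10

rho_lb = rho0_lower_bound(f, pi0s)
f_lb = simple_design_effect(mean_size, rho_lb)
q = q_de(rho0, f, pi0s, mean_size)
q_at_lb = q_de(rho_lb, f, pi0s, mean_size)

print(f"rho0 lower bound = {rho_lb:.4f}, 1/F_LB = {1 / f_lb:.4f}")
print(f"Q_DE at rho0 = {rho0}: {q:.4f} (should lie between 1/F_LB and 1)")
```

At ρ0 = ρ0,LB the within-stratum ICC is zero, so the within-stratum design effect is 1 and the ratio reduces to 1/FLB, which is the lower bound stated above.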
3.5 |. Ratio of Sample Size in a CRT with Two Strata Versus an Unstratified CRT
3.5.1 |. Scenario with a Common Within-Stratum Design Effect
To illustrate the effects of stratification on sample size estimation for CRTs, we extend the illustrative examples of section 2.5 to CRTs, again using two strata and considering three settings with different overall probabilities of events in the control arm (π0). As before, we present results for a hypothesized overall log-odds ratio of b = log(0.5) and assume a constant within-stratum treatment effect b* such that equation (4) holds. We assume first that the two strata have the same distribution of cluster sizes and that $\rho_{01} = \rho_{02} = \rho_0^*$, so that they also have a common design effect given by F1 = F2 = F*. Given values of ρ0, f1 and π01, $\rho_0^*$ is determined through equation (17), subject to the constraint that $\rho_0^* \ge 0$.
Table B1 provides an example of how $\rho_0^*$ varies for ρ0 = 0.05, ρ0 = 0.10, and ρ0 = 0.15 as f1 (and hence also π02) changes in the setting of a low overall probability of events, π0 = 0.05, when π01 = 0.02. Dashes are used on the right side of the table to indicate combinations of parameter values for which equation (17) would return a negative value for $\rho_0^*$, which is inadmissible for the definition of the ICC being used. Clearly, the reduction in $\rho_0^*$ is not proportional to ρ0; in fact, the relative reduction is smaller in magnitude for larger values of ρ0. These results illustrate that in the setting of a constant within-stratum ICC, stratifying by a variable highly predictive of the outcome can greatly reduce the within-stratum ICC compared to the overall ICC.
TABLE B1.
Common Within-Stratum ICC, $\rho_0^*$, for π0 = 0.05, π01 = 0.02, by f1 and ρ0
f1: | 0.1 | 0.2 | 0.3 | 0.4 | 0.5 | 0.6 | 0.7 | 0.8 | 0.9 |
π02: | 0.053 | 0.058 | 0.063 | 0.070 | 0.080 | 0.095 | 0.120 | 0.170 | 0.320 |
$\rho_0^*$ when ρ0 = 0.05: | 0.048 | 0.045 | 0.042 | 0.038 | 0.032 | 0.022 | 0.006 | - | - |
$\rho_0^*$ when ρ0 = 0.10: | 0.098 | 0.096 | 0.093 | 0.088 | 0.083 | 0.074 | 0.058 | 0.026 | - |
$\rho_0^*$ when ρ0 = 0.15: | 0.148 | 0.146 | 0.143 | 0.139 | 0.134 | 0.125 | 0.111 | 0.080 | - |
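The entries of Table B1 can be recomputed from the inputs listed in its header. The sketch below assumes the common within-stratum ICC is obtained by subtracting the between-strata variance from ρ0 π0(1 − π0) and dividing by the weighted sum of fs π0s(1 − π0s); negative results are reported as dashes, as in the table. The function and variable names are illustrative and are not taken from the authors' R code.

```python
def common_within_icc(rho0, f, pi0s):
    """Common within-stratum ICC implied by the overall ICC, or None if inadmissible."""
    pi0 = sum(fs * p for fs, p in zip(f, pi0s))
    between = sum(fs * (p - pi0) ** 2 for fs, p in zip(f, pi0s))
    within_scale = sum(fs * p * (1 - p) for fs, p in zip(f, pi0s))
    rho_star = (rho0 * pi0 * (1 - pi0) - between) / within_scale
    return rho_star if rho_star >= 0 else None


pi0, pi01 = 0.05, 0.02
table = {}
for rho0 in (0.05, 0.10, 0.15):
    row = []
    for k in range(1, 10):
        f1 = k / 10
        # pi02 is fixed by the overall probability: pi0 = f1*pi01 + (1 - f1)*pi02
        pi02 = (pi0 - f1 * pi01) / (1 - f1)
        rho_star = common_within_icc(rho0, [f1, 1 - f1], [pi01, pi02])
        row.append("-" if rho_star is None else f"{rho_star:.3f}")
    table[rho0] = row
    print(f"rho0 = {rho0:.2f}: " + " | ".join(row))
```

Each printed row corresponds to one of the three ICC rows of the table, with f1 running from 0.1 to 0.9.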
We now turn to the ratio of sample sizes for a stratified and unstratified CRT. We use here the simplified design effect $F_A = 1 + (\bar{m} - 1)\rho$, so the assumption of a common design effect is equivalent to assuming a common mean cluster size $\bar{m}$ and a common within-stratum ICC $\rho_0^*$. For the main results of this section, we take ρ0 = 0.10 (this choice is motivated by results from a feasibility study for a tuberculosis prevention CRT where the clusters are households, described in more detail in Section 4).
Figure 2 displays $R_{CRT}$, the ratio of the sample size required for a stratified CRT to the sample size required for the comparable unstratified CRT, calculated via equations (14) and (17) for various values of f1 and $\bar{m}$ (for reference, the case of $\bar{m} = 1$, equivalent to an IRT, is shown by the black line) for ρ0 = 0.10 and π01 = 0.02 (panel a) or π01 = 0.04 (panel b). This figure uses the low-probability setting of π0 = 0.05. To give some practical context, for each value of π01, panels c and d show the corresponding values of π02 by f1 and panels e and f show the corresponding values of $\rho_0^*$ by f1. Looking at panels a and b, it is clear that $R_{CRT}$ decreases, and hence the relative reduction in sample size achieved with stratification increases, as f1 increases, and that this reduction also increases as the mean cluster size increases. Comparing panels a and b, it is also clear that as π01 decreases away from 0.05, $R_{CRT}$ decreases away from a ratio of one, indicating a bigger reduction in sample size due to stratification. The difference between the line for a CRT with fixed $\bar{m}$ versus the line for $\bar{m} = 1$ (i.e., an IRT) at any given f1 reflects the additional relative design effect due to stratification in a CRT versus an IRT, QDE, which is determined by the difference between the within-stratum ICC, $\rho_0^*$, and the overall ICC, ρ0, per equation (17). Since $\rho_0^*$ decreases as f1 increases (and hence the separation of π02 from π01 also increases), QDE decreases and thus $R_{CRT}$ diverges further from the ratio for an IRT, with a greater effect for larger values of $\bar{m}$. This effect can lead to practically important reductions in sample size, particularly in CRTs with larger mean cluster sizes. For example, for , π0 = 0.05, and ρ0 = 0.10, a stratified CRT with a low-risk stratum of π01 = 0.02 with 70% of the subjects and a high-risk stratum of π02 = 0.12 with the remaining 30% of the subjects can have a sample size 20% lower than the comparably-powered unstratified CRT. For , a stratified CRT with a low-risk stratum of π01 = 0.02 with 58% of the subjects and a high-risk stratum of π02 = 0.09 with the remaining 42% of the subjects can achieve the 20% reduction in sample size as well. QDE depends on ρ0 and the distribution of cluster sizes, so it is the interplay of these factors that affects $R_{CRT}$ as a whole when the design effect is common across strata.
Figure 2.
Plots of the Ratio of Sample Sizes for Stratified (Two Strata) and Unstratified Cluster Randomized Trials, $R_{CRT}$ (Panels a and b), the Probability of an Event in the Control Arm of the High-Risk Stratum, π02 (Panels c and d), and the Common Within-Stratum Intra-Cluster Correlation, $\rho_0^*$ (Panels e and f), versus the Proportion of Individuals in the Low-Risk Stratum (f1) for Two Choices of the Probability of an Event in the Control Arm of the Low-Risk Stratum (π01 = 0.02 and π01 = 0.04). Plots are for an ICC in the Unstratified Analysis, ρ0, of 0.10 and an Overall Probability of an Event in the Control Arm, π0, of 0.05. The Design Effect Used is $F_A = 1 + (\bar{m} - 1)\rho$ where $\bar{m}$ is the Mean Cluster Size, Assumed to be Constant Over Strata.
Figure 3 shows parallel results to those in Figure 2 but for the moderate probability setting of π0 = 0.5. The left column of panels shows results for π01 = 0.40 and the right column shows results for π01 = 0.45. This figure shows the same patterns as in Figure 2, with RC RT decreasing as decreases and as the difference between π01 and π0 (and hence also π02) increases, indicating greater between-strata variability in the outcome. Practically important reductions in sample size from using a stratified design (e.g. a reduction of at least 10%, or R ≤ 0.9) may be more readily achieved for potentially plausible combinations of π01 and f1 (and hence π02) when π0 is greater, including at smaller mean cluster sizes (e.g. comparing findings in Figure 3 to those in Figure 2). For example, with π0 = 0.50, , and ρ0 = 0.10, a stratified CRT with a low-risk stratum with π01 = 0.40 containing 63% of the subjects and a high-risk stratum with π02 = 0.67 containing the remaining 37% of the subjects leads to a 20% reduction in sample size (i.e., RC RT = 0.80) compared to the comparably-powered unstratified CRT. A similar reduction in sample size can be achieved when by a low-risk stratum of π01 = 0.40 with 52% of the subjects and a high-risk stratum of π02 = 0.61 with the remaining 48% of the subjects.
Figure 3.
Plots of the Ratio of Sample Sizes for Stratified (Two Strata) and Unstratified Cluster Randomized Trials, $R_{CRT}$ (Panels a and b), the Probability of an Event in the Control Arm of the High-Risk Stratum, π02 (Panels c and d), and the Common Within-Stratum Intra-Cluster Correlation, $\rho_0^*$ (Panels e and f), versus the Proportion of Individuals in the Low-Risk Stratum (f1) for Two Choices of the Probability of an Event in the Control Arm of the Low-Risk Stratum (π01 = 0.40 and π01 = 0.45). Plots are for an ICC in the Unstratified Analysis, ρ0, of 0.10 and an Overall Probability of an Event in the Control Arm, π0, of 0.50. The Design Effect Used is $F_A = 1 + (\bar{m} - 1)\rho$ where $\bar{m}$ is the Mean Cluster Size, Assumed to be Constant Over Strata.
A similar figure for the high-proportion setting of π0 = 0.90 is shown in Figure S2. To examine the sensitivity to ρ0, similar figures are shown for the low-proportion setting of π0 = 0.05 for ρ0 = 0.05 and ρ0 = 0.15; see Figures S3 and S4. For a lower ρ0, holding all else fixed, the reduction in sample size due to stratification is greater. However, a low ρ0 limits how predictive the stratifying variable can be of the outcome. As in the IRT case, for all of these settings, the effect of changing b on RI RT is modest in comparison to changing other parameters, except when b is very negative (i.e., the treatment odds ratio is close to zero).
Overall, these results demonstrate that, for CRTs with small cluster sizes in the low-proportion setting, as for IRTs, stratification by a binary variable has a relatively modest effect on required sample size, unless that variable is highly predictive of the outcome. For larger cluster sizes, however, the additional effect of stratification for a CRT can lead to a substantial reduction in sample size where an IRT would not see a substantial reduction. In moderate- or high-probability settings, even with small cluster sizes, there can be a substantial effect on sample size due to stratification when the stratifying variable is associated with the outcome. This reduction is greater than the reduction in a similar IRT.
3.5.2 |. Two Strata with Varying Within-Stratum ICC and a Common Cluster Size Distribution
We now relax the assumption of a common design effect across strata, specifically by allowing the within-stratum ICC to vary across strata but retaining the assumption that the cluster size distribution is the same in each stratum. We consider two different values of f1: 0.50 and 0.75 (for smaller values of f1 than 0.50, the same patterns hold as for f1 = 0.50 but with smaller effects on RC RT). We varied ρ01 and calculated ρ02 to ensure that ρ0 = 0.10, using equation (15); inadmissible combinations of parameters that resulted in ρ02 < 0 are not considered. We continue to assume that within each stratum s, the ICCs in the treatment and control arms are equal; that is, ρ0s = ρ1s. We also continue to use the simplified design effect FA, overall and within each stratum.
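The step of fixing ρ0 and solving for ρ02 as ρ01 varies can be sketched as follows. This assumes the two-stratum form of the overall ICC, in which ρ0 π0(1 − π0) equals the sum of fs ρ0s π0s(1 − π0s) over the two strata plus the between-strata variance, solved for ρ02. The function name and the grid of ρ01 values are illustrative assumptions.

```python
def rho02_given_rho01(rho0, f1, pi0, pi01, rho01):
    """Solve for the high-risk-stratum ICC that keeps the overall ICC at rho0.

    Returns None when the implied value would be negative (inadmissible).
    """
    f2 = 1 - f1
    pi02 = (pi0 - f1 * pi01) / f2  # fixed by pi0 = f1*pi01 + f2*pi02
    between = f1 * (pi01 - pi0) ** 2 + f2 * (pi02 - pi0) ** 2
    numerator = rho0 * pi0 * (1 - pi0) - between - f1 * rho01 * pi01 * (1 - pi01)
    rho02 = numerator / (f2 * pi02 * (1 - pi02))
    return rho02 if rho02 >= 0 else None


# Low-proportion setting with pi01 = 0.04 and equal-sized strata
rho0, f1, pi0, pi01 = 0.10, 0.5, 0.05, 0.04
for rho01 in (0.00, 0.05, 0.10, 0.15, 0.20, 0.25):
    rho02 = rho02_given_rho01(rho0, f1, pi0, pi01, rho01)
    shown = "inadmissible" if rho02 is None else f"{rho02:.4f}"
    print(f"rho01 = {rho01:.2f} -> rho02 = {shown}")
```

As ρ01 increases, the implied ρ02 decreases linearly until it reaches zero, beyond which the combination is inadmissible.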
Focusing first on the low proportion setting, π0 = 0.05, panels a and b of Figure 4 show the association between RC RT and ρ01 obtained using equation (13) for f1 = 0.5 and f1 = 0.75, respectively. These are plotted for π01 = 0.02 (dotted lines) and π01 = 0.04 (dashed lines) when (orange lines) and (blue lines); note that . Panels c and d show how ρ02 varies as ρ01 changes for f1 = 0.50 and f1 = 0.75, respectively, with different lines within each panel for π01 = 0.02 or 0.04. In each of the four panels, the square indicates the situation considered in previous sections in which , giving a constant within-stratum design effect.
Figure 4.
Plots of the Ratio of Sample Sizes for Stratified (Two Strata) and Unstratified Cluster Randomized Trials, $R_{CRT}$ (Panels a and b), and the Within-Stratum Intra-Cluster Correlation in the High Risk Stratum, ρ02 (Panels c and d), versus Within-Stratum Intra-Cluster Correlation in the Low Risk Stratum (ρ01) for Two Choices of the Proportion of Individuals in the Low Risk Stratum (f1 = 0.50 and f1 = 0.75). Plots are for an ICC in the Unstratified Analysis, ρ0, of 0.10 and an Overall Probability of an Event in the Control Arm, π0, of 0.05. π01 is the Probability of an Event in the Control Arm of the Low Risk Stratum. The Design Effect Used is $F_A = 1 + (\bar{m} - 1)\rho$ where $\bar{m}$ is the Mean Cluster Size, Assumed to be Constant Over Strata.
In panels a and b, when (orange lines), the change in RC RT compared to assuming a common within-stratum ICC across strata, , is minimal. For example, for π01 = 0.02 and either of the f1 values considered, the difference between the highest and lowest values of RC RT among all of the possible ρ01 values is 0.006. When (blue lines in panels a and b), however, the varying within-stratum ICC values can have a big effect on RC RT. For example, for π01 = 0.04, RC RT varies by more than 0.10 between the minimum and maximum values as ρ01 varies when f1 = 0.50 and by almost 0.20 when f1 = 0.75. Review of panels a and d shows that when , the lowest value of RC RT occurs when ρ01 = 0. Conversely, when , the lowest value of RC RT occurs when ρ02 = 0. The fact that RC RT varies more for values of π01 closer to π0 (and hence also π02 closer to π0) reflects the larger possible difference between ρ01 and ρ02 when one of these within-stratum ICCs equals zero. This in turn affects the relative magnitude between the two strata of the within-stratum design effects. Moreover, when is larger, this magnitude is increased and so the effect on RC RT is larger; this is true for larger values than those shown in this plot as well. A general conclusion in the two-strata situation is that stratification will reduce sample size requirements more the greater the difference between the within-stratum ICCs and the effect will be larger for greater mean cluster sizes. Of course, the practical relevance of this variation in RC RT will depend on how different ρ01 and ρ02 might be in practice.
Figure 5 displays the same relationships as Figure 4 but for π0 = 0.50 and considering π01 = 0.40 (dotted lines) and π01 = 0.45 (dashed lines). The general pattern of effects on $R_{CRT}$ is similar to that for π0 = 0.05, but with lower values of $R_{CRT}$ and greater effects of varying within-stratum ICC values on the required sample sizes. Note that for π0 = 0.50, the combination of π01 = 0.40 and f1 = 0.75 is inadmissible, so it is not shown on the plot.
Figure 5.
Plots of the Ratio of Sample Sizes for Stratified (Two Strata) and Unstratified Cluster Randomized Trials, $R_{CRT}$ (Panels a and b), and the Within-Stratum Intra-Cluster Correlation in the High Risk Stratum, ρ02 (Panels c and d), versus Within-Stratum Intra-Cluster Correlation in the Low Risk Stratum (ρ01) for Two Choices of the Proportion of Individuals in the Low Risk Stratum (f1 = 0.50 and f1 = 0.75). Plots are for an ICC in the Unstratified Analysis, ρ0, of 0.10 and an Overall Probability of an Event in the Control Arm, π0, of 0.50. π01 is the Probability of an Event in the Control Arm of the Low Risk Stratum. The Design Effect Used is $F_A = 1 + (\bar{m} - 1)\rho$ where $\bar{m}$ is the Mean Cluster Size, Assumed to be Constant Over Strata.
While varying within-stratum ICC values can have a substantial effect on sample size, it may be difficult for trial planners to obtain reliable estimates of within-stratum ICCs. In the contexts considered here, $R_{CRT}$ is closest to 1 at or near the point where $\rho_{01} = \rho_{02}$, so assuming a common within-stratum ICC across strata is approximately conservative.
3.5.3 |. More than Two Strata
With more than two strata, further reductions in sample size may be possible over and above that achieved with two strata. Take as an example one of the settings discussed in Section 3.5.1. For , π0 = 0.05, and ρ0 = 0.10, assuming a common within-stratum design effect, a hypothesized treatment effect of b = log(0.5) and a common within-stratum treatment effect of b* such that equation (4) holds, stratification into a low-risk stratum with π01 = 0.02 containing 70% of the subjects and a high-risk stratum with π02 = 0.12 containing 30% of the subjects leads to a 20% reduction in sample size. Now assume that the high probability stratum can be further divided into two strata, giving three strata overall with π01 = 0.02 containing 70% of subjects, π02 = 0.08 containing 15% of subjects, and π03 = 0.16 containing 15% of subjects. Then the reduction is 25%, so there is a noticeable additional reduction in sample size beyond the two-strata case. In contrast, if it was the low probability stratum of the two strata that could be divided, then using three strata rather than two strata may not achieve much further reduction in sample size. For example, with π01 = 0.01 containing 35% of subjects, π02 = 0.03 containing 35% of subjects, and π03 = 0.12 containing 30% of subjects, the reduction in sample size is 21%, so only marginally better than that achieved with two strata.
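One ingredient of this comparison, the common within-stratum ICC, can be computed directly for the three designs described above. The sketch below assumes ρ0 = 0.10 and the re-arranged form of equation (15) with equal within-stratum ICCs. It does not reproduce the quoted sample size reductions, which also depend on the mean cluster size and on the IRT ratio from earlier sections. The names used are illustrative.

```python
def common_within_icc(rho0, f, pi0s):
    """Common within-stratum ICC implied by an overall ICC rho0."""
    pi0 = sum(fs * p for fs, p in zip(f, pi0s))
    between = sum(fs * (p - pi0) ** 2 for fs, p in zip(f, pi0s))
    within_scale = sum(fs * p * (1 - p) for fs, p in zip(f, pi0s))
    return (rho0 * pi0 * (1 - pi0) - between) / within_scale


rho0 = 0.10
designs = {
    "two strata": ([0.70, 0.30], [0.02, 0.12]),
    "high-risk stratum split": ([0.70, 0.15, 0.15], [0.02, 0.08, 0.16]),
    "low-risk stratum split": ([0.35, 0.35, 0.30], [0.01, 0.03, 0.12]),
}

results = {}
for name, (f, pi0s) in designs.items():
    pi0 = sum(fs * p for fs, p in zip(f, pi0s))  # each design keeps pi0 = 0.05
    results[name] = common_within_icc(rho0, f, pi0s)
    print(f"{name}: pi0 = {pi0:.3f}, common within-stratum ICC = {results[name]:.3f}")
```

Under these assumptions the within-stratum ICC falls from about 0.058 to about 0.048 when the high-risk stratum is split, but only to about 0.057 when the low-risk stratum is split, which is consistent with the larger additional reduction reported for the first three-stratum design.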
As there are more potential parameters for more than two strata, the effect of stratification will be very trial-dependent. Investigators can use the formulae presented here to determine sample size requirements for trials with an arbitrary number of strata, but future work may be useful in providing general guidance on the effect of stratification on sample size for common scenarios with more than two strata. Note also that these methods assume that there are enough clusters for the asymptotic distribution of the GEE estimator to hold within each stratum, which may be less likely to be true with many strata.
3.5.4 |. Using Other Design Effects
Using a different design effect than what we have been using for illustration (i.e., $F_A = 1 + (\bar{m} - 1)\rho$) changes the specific value of $R_{CRT}$ obtained and of the sample sizes more generally, but likely does not change the general pattern of associations between the ratio and the various input parameters. For example, using the conservative design effect that requires specification of the mean cluster size and coefficient of variation of cluster size within each stratum, $F_{B,s} = 1 + [(1 + CV_{ms}^2)\bar{m}_s - 1]\rho_{0s}$, essentially inflates the $\bar{m}_s$ value by a factor of $(1 + CV_{ms}^2)$ in each stratum s compared to the simplified design effect. This will increase the impact on $R_{CRT}$ of mean cluster size and of differences between strata in ρ0s from our illustrative results. For more complex design effects, like those in equations (7) and (8), which require the specification of the full cluster size distribution in each stratum, estimating whether $R_{CRT}$ will be larger or smaller than that seen in these examples is more complicated. Again, however, the associations between the parameter values (i.e., ρ0, π01, f1) and $R_{CRT}$ can still be evaluated through equation (13).
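The relationship between the two simplified design effects can be sketched in a few lines. This assumes the widely used forms 1 + (m − 1)ρ for the simplified design effect and 1 + [(1 + CV²)m − 1]ρ for the conservative one, where m is the mean cluster size and CV the coefficient of variation of cluster size. The function names and the example inputs are hypothetical and should be checked against the definitions in section 3.2.

```python
def design_effect_simple(mean_size, rho):
    """Assumed simplified design effect based on the mean cluster size only."""
    return 1 + (mean_size - 1) * rho


def design_effect_conservative(mean_size, cv, rho):
    """Assumed conservative design effect that also uses the cluster size CV."""
    return 1 + ((1 + cv ** 2) * mean_size - 1) * rho


# Hypothetical inputs chosen only to show the inflation of the mean cluster size
mean_size, cv, rho = 10, 0.5, 0.05
f_a = design_effect_simple(mean_size, rho)
f_b = design_effect_conservative(mean_size, cv, rho)
print(f"simplified: {f_a:.3f}, conservative: {f_b:.3f}")

# The conservative form equals the simplified form at an inflated mean cluster size
f_a_inflated = design_effect_simple((1 + cv ** 2) * mean_size, rho)
print(f"simplified at inflated mean size: {f_a_inflated:.3f}")
```

When the CV is zero (all clusters the same size) the two forms coincide, so the conservative form can only increase the design effect.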
4 |. EXAMPLE: EVALUATING THE SIZE OF A TUBERCULOSIS PREVENTION TRIAL
Here we illustrate the use of these methods to determine the size of a CRT and the potential sample size reduction if stratification is used. This illustrative example is based on an evaluation performed in the context of designing the PHOENIx trial, a CRT being undertaken by the AIDS Clinical Trial Group and the International Maternal Pediatric Adolescent AIDS Clinical Trials Network. The PHOENIx trial will compare two interventions for preventing the development of tuberculosis (TB) disease among household contacts of index patients starting treatment for multidrug-resistant TB disease. Clusters are the contacts in the same household as the index patient who are considered at higher risk of themselves developing TB disease, potentially because of the exposure to the index patient. A feasibility study was conducted to inform the design of the trial.29
The results of the feasibility study suggest estimated parameters for the unstratified trial of π0 = 0.0645, , CVm = 0.75, and ρ0 = 0.0675. We use the conservative design effect $F_B$ and size the trial to have significance level α = 0.05 and 90% power (γ = 0.1) to detect a treatment effect of b = log(0.5) on the log-odds ratio scale. Using equation (6) gives a required sample size (ignoring issues such as loss to follow-up) for the unstratified trial of $N_{CRT}$ = 2604 individuals (rounding up).
As an example of the potential reduction in sample size due to stratification, we considered strata defined by whether a household was enrolled at a site in South Africa (where half of contacts in the feasibility study were enrolled), versus elsewhere. From the feasibility study, the estimated parameters for the South African stratum, s = 1, were: f1 = 0.50, π01 = 0.085, , CVm1 = 0.76, and ρ01 = 0.044. For the non-South African stratum, s = 2, the estimated parameters were: f2 = 0.50, π02 = 0.044 (so an event probability approximately half of that among participants in South Africa), , CVm2 = 0.71, and ρ02 = 0.109. Since the design effect parameters are different, we use equation (11), with the same α and γ as before, to size the trial. Using equation (4), we find that the corresponding stratified treatment effect is b* = log(0.498). This then gives a required sample size (again ignoring issues such as loss to follow-up) for the stratified trial of NC RT(S) = 2563 individuals. Thus, in this example, stratification by whether a cluster was enrolled in South Africa or elsewhere had minimal impact on sample size requirement (an estimated 1.6% reduction).
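The conversion from the overall treatment effect b to the within-stratum effect b* can be sketched numerically. Equation (4) is not reproduced in this section, so the sketch assumes it requires the common within-stratum odds ratio to yield the same marginal treatment-arm event probability as the overall odds ratio applied to π0. Under that assumption b* can be found by bisection. All function names are illustrative.

```python
import math


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def logit(p):
    return math.log(p / (1.0 - p))


def within_stratum_log_or(b, f, pi0s, tol=1e-12):
    """Find b* so the stratum-averaged treatment probability matches the marginal one.

    Assumption: the marginal treatment-arm probability is expit(logit(pi0) + b).
    """
    pi0 = sum(fs * p for fs, p in zip(f, pi0s))
    target = expit(logit(pi0) + b)

    def marginal_pi1(b_star):
        return sum(fs * expit(logit(p) + b_star) for fs, p in zip(f, pi0s))

    lo, hi = -20.0, 20.0  # marginal_pi1 is increasing in b_star
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if marginal_pi1(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Stratum inputs from the feasibility study example
f = [0.50, 0.50]
pi0s = [0.085, 0.044]
b = math.log(0.5)

b_star = within_stratum_log_or(b, f, pi0s)
print(f"overall pi0 = {sum(fs * p for fs, p in zip(f, pi0s)):.4f}")
print(f"within-stratum odds ratio exp(b*) = {math.exp(b_star):.4f}")

# Relative reduction implied by the two reported sample sizes
reduction = (2604 - 2563) / 2604
print(f"relative reduction in sample size = {100 * reduction:.1f}%")
```

If the assumption about equation (4) holds, the resulting odds ratio should be close to the value of 0.498 reported above, slightly further from the null than the overall odds ratio of 0.5.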
This example illustrates, as expected from our previous results, the limited impact of stratification on the required sample size when the event probability is low and the mean cluster size is small, even for a stratifying covariate that is a reasonably strong predictor of risk (the risk ratio for the South African stratum compared to the other stratum is slightly less than 2). Other settings will yield different results, as shown in the figures.
Investigators can use the R code provided in the online Supporting Information or, for up to three strata, the user-friendly web application at https://leekshaffer.shinyapps.io/stratcrt/ to implement these equations. These methods work well for trials with more strata, as long as the necessary assumptions are met, and for treatments that increase rather than decrease the probability of events. When stratum-specific parameters are not known a priori, a range of estimates can be used to evaluate what sample sizes might be required.
5 |. DISCUSSION
As demonstrated in Section 4, investigators can use the methods presented here, and available in R code in the online Supporting Information, to determine the appropriate size for stratified IRTs and CRTs analyzed using logistic regression (fitted using GEEs for CRTs). These are most useful when stratum-specific parameters can be estimated in advance, but can also be used when these are not well known by making certain assumptions. They are also useful as sensitivity checks to determine the required sample size under a range of different values of these parameters. These developments will enable more precise trial planning, especially for CRTs, where the effect of stratification on sample size has often been ignored.
We have also described situations where stratification will and will not have a practically important effect on the required sample size for a trial. When there is a low overall probability of events, stratification with two strata in IRTs and CRTs with small cluster sizes is unlikely to achieve substantial reductions in the required sample size. When the overall probability of events is moderate or high and the two strata are highly predictive of the outcome, stratification can have a substantial impact even for an IRT or a CRT with small cluster sizes. As the mean cluster size increases, substantial reductions in required sample size can be achieved with stratification, including in situations where there is a low overall probability of events. While the results are shown for a treatment that reduces the probability of event, the same sample size ratios hold when the control and treatment probabilities are swapped and so the alternative hypothesis is that the treatment increases the probability of the event. When the within-stratum ICCs are not assumed to be constant across strata, further reductions in sample size can arise compared to the case of a constant within-stratum ICC, but these additional reductions may be sensitive to the exact ICCs specified.
The methods and results presented here depend on the validity of assumptions needed for use of GEE methods and for sample size estimation when GEE methods are used for analysis.10,11,13 In particular, they rely on the asymptotic properties of the GEE estimator and further, they rely on these properties holding for the within-stratum GEE estimator for each stratum. Specifically, this means that a minimum number of clusters in each stratum must be reached for the methods presented here to properly size the trial. As noted in Section 1, there are many trials for which these assumptions are reasonable, reaching the proposed number of clusters that make GEE methods reasonable for use. When the assumptions are not reasonable, alternative approaches should be considered.
More work is needed to determine how to incorporate bias-corrected variance estimation into these sample size methods and to determine the impact of applying these methods in the analysis of stratified CRTs, especially when there are more than two strata. One potential approach is a modified degrees-of-freedom adjustment, similar to that considered by Mancl and DeRouen, that adjusts the variance by N/(N − q), where q is the number of parameters in the mean model.16 Another option might be to find the approximate t-distribution for the distribution within each stratum, as proposed by Pan and Wall, and find the critical values from the linear combination of these t-distributions.44,45 Future work may also incorporate other variance adjustments proposed for GEEs in light of their relative small-sample properties.16,46–49
Our determination of the relationship between the within-stratum ICCs and the ICC in the overall unstratified population given by equations (15) and (17), and hence also of the ratio, RC RT, of sample sizes needed with and without stratification, rests on the definition of the ICC used. This definition requires that the outcome probability not depend on the cluster size and hence that the within-stratum distribution of cluster sizes be the same across strata. Further work is needed if this is not the case. However, equation (11) can still be used to calculate the sample size needed for a stratified CRT when the within-stratum distribution of cluster sizes varies among strata if the within-stratum design effects are specified. If corresponding within-stratum and overall design effects can be determined, equation (13) can still be used to calculate the ratio of sample sizes for stratified and unstratified CRTs as well.
For stratified CRTs, the reliability of these sample size calculations is limited by the quality of the estimated parameters, including the within-stratum ICCs. So these methods will perform better when within-stratum ICCs and within-stratum event probabilities are available. Currently, the CONSORT statement encourages investigators to report the ICC (or another measure of the effect of clustering).32 More detailed guidelines on how to report ICCs have been suggested, including that any covariate adjustments used in the ICC estimation should be discussed.50 However, at this time, it is still difficult to find within-stratum estimates of the effect of clustering.7 Investigators conducting stratified CRTs should report estimated within-stratum ICCs when possible for use in the planning of future trials. The same is true of the estimated cluster size distributions used in the design effect calculations. Frequently, the exact distribution of cluster sizes will be unavailable and so the conservative design effect FB should be considered as a good option for stratum-specific design effects.
Stratification in randomized trials, whether IRTs or CRTs, can serve a variety of purposes. It can be done out of logistical necessity or to ensure balance of key covariates. In some cases, a stratified trial may better answer the scientific question of interest. Thus, the sample size reductions discussed here should be only one among many factors considered in deciding whether to stratify a trial. These methods will allow sample size to inform that decision-making and allow investigators to better determine the sample size when a decision to stratify has been made.
Supplementary Material
Figure S1. Plots of the Ratio of Sample Sizes for Stratified (Two Strata) and Unstratified Individual Randomized Trials, RI RT (Panel a), and the Probability of an Event in the Control Arm of the High-Risk Stratum, π02 (Panel b), versus the Proportion of Individuals in the Low-Risk Stratum (f1) for High Overall Probability of Events (π0 = 0.90). π01 is the Probability of an Event in the Control Arm of the Low-Risk Stratum.
Figure S2. Plots of the Ratio of Sample Sizes for Stratified (Two Strata) and Unstratified Cluster Randomized Trials, $R_{CRT}$ (Panels a and b), the Probability of an Event in the Control Arm of the High-Risk Stratum, π02 (Panels c and d), and the Common Within-Stratum Intra-Cluster Correlation, $\rho_0^*$ (Panels e and f), versus the Proportion of Individuals in the Low-Risk Stratum (f1) for Two Choices of the Probability of an Event in the Control Arm of the Low-Risk Stratum (π01 = 0.80 and π01 = 0.85). Plots are for an ICC in the Unstratified Analysis, ρ0, of 0.10 and an Overall Probability of an Event in the Control Arm, π0, of 0.90. The Design Effect Used is $F_A = 1 + (\bar{m} - 1)\rho$ where $\bar{m}$ is the Mean Cluster Size, Assumed to be Constant Over Strata.
Figure S3. Plots of the Ratio of Sample Sizes for Stratified (Two Strata) and Unstratified Cluster Randomized Trials, $R_{CRT}$ (Panels a and b), the Probability of an Event in the Control Arm of the High-Risk Stratum, π02 (Panels c and d), and the Common Within-Stratum Intra-Cluster Correlation, $\rho_0^*$ (Panels e and f), versus the Proportion of Individuals in the Low-Risk Stratum (f1) for Two Choices of the Probability of an Event in the Control Arm of the Low-Risk Stratum (π01 = 0.02 and π01 = 0.04). Plots are for an ICC in the Unstratified Analysis, ρ0, of 0.05 and an Overall Probability of an Event in the Control Arm, π0, of 0.05. The Design Effect Used is $F_A = 1 + (\bar{m} - 1)\rho$ where $\bar{m}$ is the Mean Cluster Size, Assumed to be Constant Over Strata.
Figure S4. Plots of the Ratio of Sample Sizes for Stratified (Two Strata) and Unstratified Cluster Randomized Trials, $R_{CRT}$ (Panels a and b), the Probability of an Event in the Control Arm of the High-Risk Stratum, π02 (Panels c and d), and the Common Within-Stratum Intra-Cluster Correlation, $\rho_0^*$ (Panels e and f), versus the Proportion of Individuals in the Low-Risk Stratum (f1) for Two Choices of the Probability of an Event in the Control Arm of the Low-Risk Stratum (π01 = 0.02 and π01 = 0.04). Plots are for an ICC in the Unstratified Analysis, ρ0, of 0.15 and an Overall Probability of an Event in the Control Arm, π0, of 0.05. The Design Effect Used is $F_A = 1 + (\bar{m} - 1)\rho$ where $\bar{m}$ is the Mean Cluster Size, Assumed to be Constant Over Strata.
R Code. Program to reproduce figures in this article and to determine sample size for trials using these methods.
ACKNOWLEDGEMENTS
Research reported in this publication was supported by the National Institute of Allergy and Infectious Diseases under Award Numbers 5T32AI007358-28 and 1F31AI147745 (for L.K.S.) and Grant Number UM1AI068634 (for M.D.H.). We are grateful to the PHOENIx feasibility study for providing the information needed for our illustrative example. We are grateful to the editor, associate editor, and reviewers for their helpful comments.
APPENDIX
A PROOFS OF LEMMAS
Lemma 1. The common within-stratum ICC is less than or equal to the overall ICC (ρ0).
Proof. Given that and :
Hence:
(A1) |
And so:
(A2) |
Now, turning to , using equation (16) with for s = 1, … , S:
Thus . □
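The statement of Lemma 1 can be checked numerically under a simple variance decomposition. The Python sketch below is not the authors' derivation: it assumes that every cluster lies within a single stratum, that fs is the proportion of individuals in stratum s, and that the overall ICC satisfies ρ0 π0(1 − π0) = ρ* Σ fs π0s(1 − π0s) + Σ fs (π0s − π0)², where ρ* denotes a common within-stratum ICC. This identity is an assumed stand-in for the article's equation (17), which is not reproduced in this appendix, and the function name and inputs are hypothetical.

```python
from typing import Sequence


def common_within_stratum_icc(
    rho0: float, f: Sequence[float], pi0s: Sequence[float]
) -> float:
    """Solve an assumed total-covariance identity for a common within-stratum ICC.

    Assumption (not taken from the article):
        rho0 * (within + between) = rho_star * within + between,
    with within  = sum_s f_s * pi0s * (1 - pi0s)
    and  between = sum_s f_s * (pi0s - pi0)**2.
    """
    if abs(sum(f) - 1.0) > 1e-9:
        raise ValueError("stratum proportions must sum to 1")
    pi0 = sum(fs * p for fs, p in zip(f, pi0s))  # overall event probability
    within = sum(fs * p * (1.0 - p) for fs, p in zip(f, pi0s))
    between = sum(fs * (p - pi0) ** 2 for fs, p in zip(f, pi0s))
    # within + between equals pi0 * (1 - pi0) by the law of total variance.
    return (rho0 * (within + between) - between) / within


# Hypothetical two-stratum example with half of the individuals in each stratum.
rho0 = 0.05
rho_star = common_within_stratum_icc(rho0, f=[0.5, 0.5], pi0s=[0.02, 0.08])
print(f"common within-stratum ICC: {rho_star:.4f} (overall ICC: {rho0})")
```

Under this assumed identity, ρ0 − ρ* equals (1 − ρ*) multiplied by the between-stratum share of the variance, which is nonnegative whenever ρ* ≤ 1. The returned value can be negative when the strata differ strongly in event probability, in which case no valid common within-stratum ICC exists for the chosen inputs.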
Lemma 2. Assume that Fs = c1 + c2ρ0s, where c1, c2 are constants that do not vary with s, c1 > 0, and c2 > 0. Then the assumption that for all s = 1, … , S results in sample size estimates that are approximately conservative for small treatment effects and/or events with low probability, when exists.
Proof. From the definition of Vs, and taking all parameters other than ρ01, …, ρ0S fixed, we have that:
(A3) |
We know the values of ρ01, … , ρ0S must satisfy the constraint that ensures the overall ICC is ρ0 given in equations (15) and (16). Now, we seek the values that satisfy this constraint that also maximize the required sample size N_CRT(S) given by equation (11):
where Fs denotes the design effect of stratum s, which depends on ρ0s. Maximizing N_CRT(S) with respect to ρ01, … , ρ0S is equivalent to minimizing:
So we wish to minimize h(ρ01, … , ρ0S) subject to the constraint that:
Under the assumption that Fs = c1 + c2ρ0s—which occurs with the simplified design effects FA and FB when the cluster size distribution does not vary with strata—for any s = 1, … , S:
We now seek a Lagrange multiplier constant λ s.t. for all s = 1, … , S. So we consider the ratio for any s:
(A4) |
For this ratio to be constant for all s, then, requires:
(A5) |
for some constant c3 that does not vary with s. Hence the ρ0s values that are critical points satisfy:
(A6) |
We can then solve this quadratic equation for ρ0s, letting , to get:
(A7) |
for all s = 1, … , S. To satisfy the constraint that ρ0s ≥ 0 for all s, we will use the greater of the two solutions, which will occur when ± is replaced by +, since the other is always negative. We discuss this constraint further below.
Now, we show that these critical point values of ρ0s minimize h (and thus maximize N_CRT(S)) by performing the second-derivative test for constrained optimization using the bordered Hessian H(λ, ρ01, … , ρ0S):
(A8) |
for .
For 3 ≤ j ≤ S + 1, we consider Hj, the j × j principal submatrix of H, and write it as , where Bj is the first (j − 1) elements of B and Dj is the (j − 1) × (j − 1) principal submatrix of D. Then since Dj, a diagonal matrix of nonzero real numbers, is invertible.
(A9) |
(A10) |
(A11) |
Since c2, fs, Vs are all positive and Fs is positive for positive values of ρ0s, the product and the sum are both positive, so det(Hj) < 0 for all j = 3, … , S + 1. So the signs of the determinants of the principal submatrices are all equal to (−1)^1. Since there is one constraint equation g, the second derivative test gives that the set of critical values of ρ0s derived above is a constrained local minimum of h, and thus a constrained local maximum of N_CRT(S). As no other critical values satisfy the constraints, these are the constrained maximizers of N_CRT(S).
Finally, we consider the limiting behavior of these ρ0s values for small values of b* and/or π0s. For any s = 1, … , S:
(A12) |
(A13) |
Most importantly, these values do not depend on s. Thus, when the treatment effect is small and/or the stratum-specific event probabilities are all low, the conservative (N_CRT(S)-maximizing) values of ρ0s, holding all other parameters constant, can be approximated by using the common within-stratum ICC for all strata. This common value must yield the overall ICC ρ0, as given by equation (17) in Section 3.4.2. When this value exists and is positive, there is a neighborhood of b* and/or π0s around 0 where the critical ρ0s values are positive for all s, since the critical values are continuous functions of b* and π0s for 0 < π0s < 1. Hence, under these conditions, the solutions satisfy the additional constraint that ρ0s > 0 for all s, and we can in fact treat the common within-stratum ICC as an approximation of these N_CRT(S)-maximizing critical values. □
Lemma 3. The design effect ratio (QDE) is an increasing function of the overall ICC (ρ0) for design effects FA and FB.
Proof. From equation (17) and the definition of QDE as used in equation (18):
Thus, as ρ0 increases on [0,1], Ch(ρ0) increases and so QDE increases.
For , the same property holds, as can be seen by substituting for in the proof above.
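The monotonicity claimed in Lemma 3 can be illustrated numerically. The Python sketch below rests on three assumptions that are not confirmed by this appendix: the design effect ratio is taken to be the design effect at a common within-stratum ICC divided by the design effect at the overall ICC; the design effect is 1 + (m − 1)ρ with a mean cluster size m that is constant over strata; and the common within-stratum ICC is (ρ0 − k)/(1 − k), where k is the share of the outcome variance that lies between strata. All names and values are hypothetical.

```python
def design_effect(icc: float, mean_size: float) -> float:
    """Equal-cluster-size design effect, 1 + (m - 1) * icc (assumed form)."""
    return 1.0 + (mean_size - 1.0) * icc


def design_effect_ratio(rho0: float, between_share: float, mean_size: float) -> float:
    """Assumed design effect ratio: F(common within-stratum ICC) / F(overall ICC).

    The common within-stratum ICC is taken as (rho0 - k) / (1 - k), where
    k = between_share is the between-stratum share of the outcome variance.
    """
    rho_star = (rho0 - between_share) / (1.0 - between_share)
    return design_effect(rho_star, mean_size) / design_effect(rho0, mean_size)


# Hypothetical inputs: two equal-sized strata with control-arm event
# probabilities 0.02 and 0.08, giving k = 0.0009 / 0.0475.
between_share = 0.0009 / 0.0475
mean_size = 20.0

grid = [0.02 + 0.02 * i for i in range(15)]  # overall ICC from 0.02 to 0.30
ratios = [design_effect_ratio(r, between_share, mean_size) for r in grid]

# Check that the ratio rises with the overall ICC across the grid.
is_increasing = all(a < b for a, b in zip(ratios, ratios[1:]))
print("ratio increases with overall ICC:", is_increasing)
```

With a design effect c1 + c2ρ, the derivative of this assumed ratio with respect to ρ0 has the same sign as c2 k (c1 + c2)/(1 − k), which is positive for 0 < k < 1, so the check should hold for any such inputs. Replacing m with (cv² + 1)m gives the corresponding check for a design effect based on the coefficient of variation of cluster size.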
Footnotes
CONFLICT OF INTEREST
The authors declare no potential conflicts of interest.
DATA AVAILABILITY
R code that implements the key formulae presented in this article for any given parameters and R code that generates the results and figures presented throughout the article are available in the online Supporting Information. Additionally, a user-friendly RStudio Shiny web application that implements these formulae for IRTs and CRTs for up to three strata can be accessed at https://leekshaffer.shinyapps.io/stratcrt/.
References
- 1. Robinson LD, Jewell NP. Some surprising results about covariate adjustment in logistic regression models. Int. Stat. Rev 1991; 59(2): 227–240.
- 2. Demidenko E. Sample size determination for logistic regression revisited. Stat. Med 2007; 26(18): 3385–3397.
- 3. Hernández AV, Steyerberg EW, Habbema J. Covariate adjustment in randomized controlled trials with dichotomous outcomes increases statistical power and reduces sample size requirements. J. Clin. Epidemiol 2004; 57(5): 454–460.
- 4. Kahan BC, Jairath V, Doré CJ, Morris TP. The risks and rewards of covariate adjustment in randomized trials: an assessment of 12 outcomes from 8 studies. Trials 2014; 15: 139.
- 5. Roozenbeek B, Maas A, Lingsma HF, et al. Baseline characteristics and statistical power in randomized controlled trials: selection, prognostic targeting, or covariate adjustment? Crit. Care Med 2009; 37(10): 2683–2690.
- 6. Eldridge S, Kerry S. A Practical Guide to Cluster Randomised Trials in Health Services Research. London, UK: Wiley; 2012.
- 7. Hayes RJ, Moulton LH. Cluster Randomised Trials. 2nd ed. Boca Raton, FL: CRC Press; 2017.
- 8. Eldridge SM, Ashby D, Kerry S. Sample size for cluster randomized trials: effect of coefficient of variation of cluster size and analysis method. Int. J. Epidemiol 2006; 35(5): 1292–1300.
- 9. Donner A, Klar N. Design and Analysis of Cluster Randomization Trials in Health Research. London, UK: Wiley; 2000.
- 10. Rutterford C, Copas A, Eldridge S. Methods for sample size determination in cluster randomized trials. Int. J. Epidemiol 2015; 44(3): 1051–1067.
- 11. Pan W. Sample size and power calculations with correlated binary data. Control. Clin. Trials 2001; 22(3): 211–227.
- 12. Bellamy SL, Gibberd R, Hancock L, et al. Analysis of dichotomous outcome data for community intervention studies. Stat. Methods Med. Res 2000; 9(2): 135–159.
- 13. Liang KY, Zeger SL. Longitudinal data analysis using generalized linear models. Biometrika 1986; 73(1): 13–22.
- 14. Donner A, Klar N. Statistical considerations in the design and analysis of community intervention trials. J. Clin. Epidemiol 1996; 49(4): 435–439.
- 15. Murray DM. Design and Analysis of Group-Randomized Trials. Oxford, UK: Oxford University Press; 1998.
- 16. Mancl LA, DeRouen TA. A covariance estimator for GEE with improved small-sample properties. Biometrics 2001; 57(1): 126–134.
- 17. Murray DM, Varnell SP, Blitstein JL. Design and analysis of group-randomized trials: a review of recent methodological developments. Am. J. Public Health 2004; 94(3): 423–432.
- 18. Li P, Redden DT. Small sample performance of bias-corrected sandwich estimators for cluster-randomized trials with binary outcomes. Stat. Med 2015; 34(2): 281–296.
- 19. Huang S, Fiero MH, Bell ML. Generalized estimating equations in cluster randomized trials with a small number of clusters: Review of practice and simulation study. Clin. Trials 2016; 13(4): 445–449.
- 20. Benger JR, Kirby K, Black S, et al. Effect of a strategy of a supraglottic airway device vs tracheal intubation during out-of-hospital cardiac arrest on functional outcome: the AIRWAYS-2 randomized clinical trial. JAMA 2018; 320(8): 779–791.
- 21. Perkins GD, Lall R, Quinn T, et al. Mechanical versus manual chest compression for out-of-hospital cardiac arrest (PARAMEDIC): a pragmatic, cluster randomised controlled trial. Lancet 2015; 385(9972): 947–955.
- 22. Choudhry NK, Avorn J, Glynn RJ, et al. Full coverage for preventive medications after myocardial infarction. N. Engl. J. Med 2011; 365(22): 2088–2097.
- 23. Engineer CY, Dale E, Agarwal A, et al. Effectiveness of a pay-for-performance intervention to improve maternal and child health services in Afghanistan: a cluster-randomized trial. Int. J. Epidemiol 2016; 45(2): 451–459.
- 24. Cowling BJ, Chan KH, Fang VJ, et al. Facemasks and hand hygiene to prevent influenza transmission in households: a cluster randomized trial. Ann. Intern. Med 2009; 151(7): 437–446.
- 25. George CM, Monira S, Sack DA, et al. Randomized controlled trial of hospital-based hygiene and water treatment intervention (CHoBI7) to reduce cholera. Emerg. Infect. Dis 2016; 22(2): 233–241.
- 26. Guiteras R, Levinsohn J, Mobarak AM. Encouraging sanitation investment in the developing world: a cluster-randomized trial. Science 2015; 348(6237): 903–906.
- 27. Lin A, Ercumen A, Benjamin-Chung J, et al. Effects of water, sanitation, handwashing, and nutritional interventions on child enteric protozoan infections in rural Bangladesh: a cluster-randomized controlled trial. Clin. Infect. Dis 2018; 67(10): 1515–1522.
- 28. Theiss-Nyland K, Qadri F, Colin-Jones R, et al. Assessing the impact of a vi-polysaccharide conjugate vaccine in preventing typhoid infection among Bangladeshi children: a protocol for a phase IIIb trial. Clin. Infect. Dis 2019; 68(Supplement 2): S74–S82.
- 29. Gupta A, Swindells S, Kim S, et al. Feasibility of Identifying Household Contacts of Rifampin- and Multidrug-Resistant Tuberculosis Cases at High Risk of Progression to Tuberculosis Disease. Clin. Infect. Dis 2019. doi: 10.1093/cid/ciz235. Accessed Nov. 23, 2019.
- 30. Donner A. Sample size requirements for stratified cluster randomization designs. Stat. Med 1992; 11(6): 743–750.
- 31. Pocock SJ, Assmann SE, Enos LE, Kasten LE. Subgroup analysis, covariate adjustment and baseline comparisons in clinical trial reporting: current practice and problems. Stat. Med 2002; 21(19): 2917–2930.
- 32. Campbell MK, Piaggio G, Elbourne DR, Altman DG, CONSORT Group. Consort 2010 statement: extension to cluster randomised trials. BMJ 2012; 345: e5661.
- 33. van Breukelen GJP, Candel MJJM, Berger MPF. Relative efficiency of unequal versus equal cluster sizes in cluster randomized and multicentre trials. Stat. Med 2007; 26(13): 2589–2603.
- 34. Liu J, Colditz GA. Relative efficiency of unequal versus equal cluster sizes in cluster randomized trials using generalized estimating equation models. Biom. J 2018; 60(3): 616–638.
- 35. Austin PC. A comparison of the statistical power of different methods for the analysis of cluster randomization trials with binary outcomes. Stat. Med 2007; 26(19): 3550–3565.
- 36. Gail MH. The determination of sample sizes for trials involving several independent 2×2 tables. J. Chronic Dis 1973; 26(10): 669–673.
- 37. Gail MH. Adjusting for covariates that have the same distribution in exposed and unexposed cohorts. In: Moolgavkar SH, Prentice RL, eds. Modern Statistical Methods in Chronic Disease Epidemiology. New York, NY: Wiley; 1986: 3–18.
- 38. Hayes RJ, Bennett S. Simple sample size calculation for cluster-randomized trials. Int. J. Epidemiol 1999; 28(2): 319–326.
- 39. Cochran WG. The combination of estimates from different experiments. Biometrics 1954; 10(1): 101–129.
- 40. Commenges D, Jacqmin H. The intraclass correlation coefficient: distribution-free definition and test. Biometrics 1994; 50(2): 517–526.
- 41. Eldridge SM, Ukoumunne OC, Carlin JB. The intra-cluster correlation coefficient in cluster randomized trials: a review of definitions. Int. Stat. Rev 2009; 77(3): 378–394.
- 42. Shih WJ. Sample size and power calculations for periodontal and other studies with clustered samples using the method of generalized estimating equations. Biom. J 1997; 39(8): 899–908.
- 43. Thomson A, Hayes R, Cousens S. Measures of between-cluster variability in cluster randomized trials with binary outcomes. Stat. Med 2009; 28(12): 1739–1751.
- 44. Pan W, Wall MM. Small-sample adjustments in using the sandwich variance estimator in generalized estimating equations. Stat. Med 2002; 21(10): 1429–1441.
- 45. Walker GA, Saw JG. The distribution of linear combinations of t-variables. J. Am. Stat. Assoc 1978; 73(364): 876–878.
- 46. Kauermann G, Carroll RJ. A note on the efficiency of sandwich covariance matrix estimation. J. Am. Stat. Assoc 2001; 96(456): 1387–1396.
- 47. Fay MP, Graubard BI. Small-sample adjustments for Wald-type tests using sandwich estimators. Biometrics 2001; 57(4): 1198–1206.
- 48. Morel JG, Bokossa MC, Neerchal NK. Small sample correction for the variance of GEE estimators. Biom. J 2003; 45(4): 395–409.
- 49. Leyrat C, Morgan KE, Leurent B, Kahan BC. Cluster randomized trials with a small number of clusters: which analyses should be used? Int. J. Epidemiol 2018; 47(1): 321–331.
- 50. Campbell MK, Grimshaw JM, Elbourne DR. Intracluster correlation coefficients in cluster randomized trials: empirical insights into how should they be reported. BMC Med. Res. Methodol 2004; 4: 9.