Abstract
We consider five asymptotically unbiased estimators of intervention effects on event rates in non‐matched and matched‐pair cluster randomized trials, including ratio of mean counts , ratio of mean cluster‐level event rates , ratio of event rates , double ratio of counts , and double ratio of event rates . In the absence of an indirect effect, they all estimate the direct effect of the intervention. Otherwise, , and estimate the total effect, which comprises the direct and indirect effects, whereas and estimate the direct effect only. We derive the conditions under which each estimator is more precise or powerful than its alternatives. To control bias in studies with a small number of clusters, we propose a set of approximately unbiased estimators. We evaluate their properties by simulation and apply the methods to a trial of seasonal malaria chemoprevention. The approximately unbiased estimators are practically unbiased and their confidence intervals usually have coverage probability close to the nominal level; the asymptotically unbiased estimators perform well when the number of clusters is approximately 32 or more per trial arm. Despite its simplicity, performs comparably with and in trials with a large but realistic number of clusters. When the variability of baseline event rate is large and there is no indirect effect, and tend to offer higher power than , and . We discuss the implications of these findings for the planning and analysis of cluster randomized trials.
Keywords: cluster randomized trial, event rate, incidence rate ratio, ratio estimator, relative incidence
1. INTRODUCTION
The cluster randomized trial (CRT) is an important study design in medical and health research. 1 , 2 , 3 Data on outcome events may be collected by passive surveillance or active surveillance. 4 Passive surveillance methods may or may not provide data at the individual level. That is, they may determine only the number of events in a cluster, without identifying which individual members of the cluster experienced the events. Furthermore, the denominators for standard practice of calculating event rates may not be available. 4 The advantage of passive surveillance is that the monetary and opportunity cost of data collection can be much reduced.
Broadly speaking, there are two approaches to the analysis of CRTs: individual‐level analysis and cluster‐level analysis. Methods for individual‐level analysis of CRTs include random‐effects models and generalized estimating equations. As compared to cluster‐level analysis, individual‐level analysis has the relative advantage of efficiency and ease in covariate adjustment. However, it has the relative disadvantage of being less robust, especially when the number of clusters is small. 1 Furthermore, data collection by passive surveillance may not be compatible with individual‐level analysis. In this manuscript we consider only cluster‐level analysis.
An estimator of the intervention effect in terms of incidence rate ratio, also called relative incidence, that only uses event data is a ratio of the arithmetic mean of the number of outcome events per cluster in the intervention arm to that in the control arm. We call this the “ratio of mean counts”, denoted by .
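As a minimal sketch of the ratio of mean counts described above, the following Python function divides the mean number of events per cluster in the intervention arm by that in the control arm. The function name and the event counts are hypothetical and serve only as an illustration.

```python
from statistics import mean


def ratio_of_mean_counts(events_intervention, events_control):
    """Mean events per cluster in the intervention arm over that in the control arm."""
    return mean(events_intervention) / mean(events_control)


# Hypothetical event counts for four clusters per arm (not trial data).
intervention = [3, 5, 2, 6]   # mean = 4
control = [8, 7, 10, 7]       # mean = 8
print(ratio_of_mean_counts(intervention, control))  # 0.5
```

Only event counts are needed; no person-time data enter the calculation.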
The denominator of event rates, that is, units of person‐time, in CRTs is usually variable across clusters. The person‐time for estimation of an event rate is sometimes approximated by the population size at some point of the study duration. In this article we use the phrases person‐time and population size interchangeably. Typical statistical practice makes comparison of event rates instead of mean number of events between trial arms. It requires extra resources in the collection of person‐time data. A demographic surveillance system, a population census, or rounds of community surveys may be required for this purpose. With both the number of events and person‐time collected for each cluster, one may calculate a cluster‐level event rate for each cluster, denoted by , where and are the number of events and person‐time in the jth cluster in the ith trial arm, respectively. Then, the arithmetic means of the cluster‐level event rates in the intervention and control arms are calculated, denoted by and , respectively. The ratio of the two means, is a popular estimator of the incidence rate ratio. 1 , 5 , 6 This estimator has been evaluated by simulations but not analytically. One simulation study considered scenarios of, approximately, = 0, 0.125, and 0.25, where is the coefficient of variation of the cluster‐level event rate in the ith trial arm and is known as “k” in the literature. 5 It found little bias in . However, another simulation study considered a broader range of . 6 It showed that was practically unbiased when = 0.05 and 0.15, but it was biased when = 0.4. Analytical investigation and simulation evaluation in a broad range of parameter values are warranted.
An alternative estimator of incidence rate ratio can be obtained by first calculating the event rate in each trial arm as the sum of the number of events divided by the sum of person‐time over the clusters, 7 , 8 and then calculating the ratio of these event rate estimates between the trial arms. We call this the “ratio of event rates”, denoted by . While (i = 0,1) is an unweighted average of cluster‐level event rates in the ith trial arm, the alternative estimator of event rate here can be seen as a weighted average of cluster‐level event rates, with the clusters' population sizes as weights.
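The contrast between the unweighted and the population-weighted estimators can be sketched in Python as follows. The function names and the data are hypothetical; the example only illustrates that the two estimators can differ when cluster sizes vary.

```python
def ratio_of_mean_cluster_rates(events1, time1, events0, time0):
    """Unweighted: average the cluster-level rates within each arm, then take the ratio."""
    mean_rate1 = sum(y / t for y, t in zip(events1, time1)) / len(events1)
    mean_rate0 = sum(y / t for y, t in zip(events0, time0)) / len(events0)
    return mean_rate1 / mean_rate0


def ratio_of_event_rates(events1, time1, events0, time0):
    """Weighted: total events over total person-time within each arm, then the ratio."""
    return (sum(events1) / sum(time1)) / (sum(events0) / sum(time0))


# Hypothetical data, two clusters per arm (index 1 = intervention, 0 = control).
events1, time1 = [2, 6], [100, 200]    # cluster rates 0.02 and 0.03
events0, time0 = [4, 16], [100, 200]   # cluster rates 0.04 and 0.08

print(ratio_of_mean_cluster_rates(events1, time1, events0, time0))  # 0.025 / 0.06
print(ratio_of_event_rates(events1, time1, events0, time0))         # (8/300) / (20/300)
```

In this example the larger clusters carry more weight in the ratio of event rates, so the two estimates are close but not equal.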
In CRTs, interventions are often provided only to a specific group of the cluster members instead of all cluster members. For example, in studies of vaccines for pediatric infectious diseases, usually only young children in a specific age range are offered the intervention or its control comparator. Older children and adults are not. We refer to the two groups of cluster members as the target and non‐target groups. The outcome events may occur in both groups. In studies based on passive surveillance, the event data may be collected for the non‐target group in addition to the target group without requiring many additional resources, because the capital cost and infrastructure are already invested for the target group anyway. We consider an estimator that we call “double ratio of counts”, denoted by , by replacing the sums of person‐time in by the sums of number of events in the non‐target groups. Note that this estimator is defined even if the number of events in the non‐target group is zero in some clusters, which is a realistic situation because usually the reason for it being a non‐target group is that the disease incidence is relatively low. The motivation for considering this estimator arises not only from concerns of feasibility and cost of data collection but also from concerns of precision and power. From an epidemiological point of view, sometimes we anticipate that the event counts in the target and non‐target groups are highly correlated, because they are both manifestations of the disease burden in the clusters. In particular, some events are highly localized, for example, infectious diseases occurring in small outbreaks. For such events, the correlation between number of events in the target and non‐target groups, say children inside and outside a vaccination age range, is likely to be much stronger than the correlation between the number of events and amount of person‐time in the target group. 
This advantage in correlation offers a potential for improved precision. Note that and the three estimators aforementioned have different targets of estimation (estimands): , and estimate the total effect of the intervention whereas estimates the direct effect. Details will be discussed in the next section.
We also propose a new estimator that we call “double ratio of event rates”, denoted by . It has the ratio of event rates between the target and non‐target groups in the intervention arm as the numerator and its counterpart in the control arm as the denominator. Details in statistical notations will be provided in the next section. We hypothesize that this estimator will out‐perform in precision and power.
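The two double-ratio estimators can be sketched as below, following the verbal definitions given so far: the double ratio of counts uses the summed non-target event counts in place of summed person-time, and the double ratio of event rates divides the target to non-target rate ratio in the intervention arm by its counterpart in the control arm. All function names, argument names, and data are hypothetical.

```python
def double_ratio_of_counts(y1_target, y1_nontarget, y0_target, y0_nontarget):
    """(Target / non-target events, intervention) over (target / non-target events, control)."""
    return (sum(y1_target) / sum(y1_nontarget)) / (sum(y0_target) / sum(y0_nontarget))


def arm_rate(events, person_time):
    """Arm-level event rate: total events over total person-time."""
    return sum(events) / sum(person_time)


def double_ratio_of_event_rates(y1_t, t1_t, y1_n, t1_n, y0_t, t0_t, y0_n, t0_n):
    """Target vs non-target rate ratio in the intervention arm over that in the control arm."""
    ratio_intervention = arm_rate(y1_t, t1_t) / arm_rate(y1_n, t1_n)
    ratio_control = arm_rate(y0_t, t0_t) / arm_rate(y0_n, t0_n)
    return ratio_intervention / ratio_control


# Hypothetical data, two clusters per arm; one cluster has zero non-target events,
# which is harmless because only the arm-level sums are used.
y1_t, y1_n = [4, 6], [0, 20]      # intervention: target, non-target events
y0_t, y0_n = [12, 8], [9, 11]     # control: target, non-target events
pt_target, pt_nontarget = [100, 100], [200, 200]  # same person-time in both arms

print(double_ratio_of_counts(y1_t, y1_n, y0_t, y0_n))  # (10/20) / (20/20) = 0.5
print(double_ratio_of_event_rates(y1_t, pt_target, y1_n, pt_nontarget,
                                  y0_t, pt_target, y0_n, pt_nontarget))
```

With identical person-time in both arms, as assumed here, the two estimates coincide; they would generally differ otherwise.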
Donner and Klar pointed out that CRTs of binary outcomes may regard a proportion as a ratio and then an appropriate variance estimate can be obtained from sample survey theory. 9 They used the ratio of this estimated variance to the estimated binomial variance to adjust the Chi‐square statistics for hypothesis testing. In the context of toxicological experiments in which litters of animals were the experimental units and a binary outcome was observed for each animal, Rao and Scott proposed using the aforementioned approach to adjust the Chi‐square and Cochran‐Armitage statistics. 10 There has been some subsequent research on using ratio estimators for CRTs with event rate outcomes, including the two simulation studies of aforementioned. 5 , 6 Furthermore, Dufault and Jewell proposed permutation tests of counts of events only, with or without adjustment for differential ascertainment. 4 All of them concerned only CRTs that randomize clusters individually and (implicitly) aim to estimate the total effect.
This study aims to (a) evaluate and compare the performance of the five aforementioned estimators and (b) develop, evaluate, and compare bias‐corrected versions of them. In Section 2 we will analytically assess and develop the methods. In Section 3 we will evaluate the methods by simulation in a broad range of realistic scenarios. In Section 4 we will apply the methods to a study of seasonal malaria chemoprevention. Section 5 gives some concluding remarks.
For brevity, we will focus on CRTs that randomize clusters individually, that is, non‐matched CRTs. Where necessary we also provide the details for matched‐pair CRTs in which one cluster per matched pair is randomized to receive the intervention and the other serves as the control. Introduction to the two types of CRTs can be found in, for example, Hayes and Moulton 1 and Donner and Klar. 2
2. STATISTICAL METHODS
2.1. Intervention effects and event rates
An intervention may have a direct effect and an indirect effect, for example, via reducing disease transmission in the community. 3 , 11 Only the intervention's target group can benefit from the direct effect; both the target and non‐target groups may benefit from the indirect effect, if any. Assume that:
where , and are the total, direct, and indirect effects in terms of log incidence rate ratio. If there is no indirect effect, and . The presence of an indirect effect depends on various factors including the nature of the interventions and outcome events. For example, even though vaccines are often anticipated to generate some degree of indirect effect on efficacy endpoints, they are usually anticipated to have no indirect effect on safety endpoints.
Let and be the number of events and total person‐time in the kth group of the jth cluster in the ith trial arm in the population, respectively, where k = 1 and 0 represent the target and non‐target groups, respectively, i = 1 and 0 represent intervention and control trial arms, respectively. We consider a data generating process that is often used in epidemiologic modeling, that the expected value of given is:
| (1) |
where and represent direct and indirect effects (; ; ), is a random cluster effect with standard deviation ≥ 0 that represents variation in event rates between clusters within each trial arm, and represents the difference in event rates between the target and non‐target group (; ). Note that , the coefficient of variation in cluster‐level event rate. 5 , 6 By randomization, the distributions of and are identical in expectation between the intervention and control arms. From Equation (1), and if . The difference increases and the correlation decreases as increases.
2.2. Asymptotically unbiased estimators
Given a sample dataset of , with randomization and a large number of clusters per trial arm, the ratio of mean counts, , provides an asymptotically unbiased estimator of the total effect that compares the event rates in the intervention and control arms:
When and are small,
causing a small sample bias in the ratio estimator. 7 This and the bias in the other estimators will be discussed in Section 2.3.
Similarly, the ratio of mean cluster‐level event rates,
where , and ratio of event rates,
also provide asymptotically unbiased estimators of the total effect.
The variance of , and can be written as: 5 , 7 , 8
| (2) |
| (3) |
and
| (4) |
Furthermore, in first‐order Taylor series expansion, , 8 and
From Equation (1), . Let , where .
If and give the same sample estimate, a comparison of and boils down to an evaluation of whether
Therefore, under the condition that in both trial arms:
| (5) |
Similarly, if and give the same estimate, under the condition that in both trial arms:
| (6) |
If and give the same estimate, under the condition that in both trial arms , that is,
| (7) |
If , by Jensen's inequality. As we will see in the case study in Section 4, it is possible that approximately equals zero in real‐world situations.
In contrast, with large and randomization, the double ratio of counts () provides an asymptotically unbiased estimator of the direct effect:
The variance of is:
| (8) |
where
Comparisons of the variances of the estimators vs and vs are meaningful only if the indirect effect is absent or trivial and the estimates and . In Section A of Online Supplementary Material 1 we show that if in both trial arms:
| (9) |
Similarly, if in both trial arms:
| (10) |
A strong correlation between number of events in the target and non‐target groups as compared to the correlation between the number of events and person‐time in the target group would favor over and in terms of precision.
If a non‐trivial indirect effect is present, the absolute values of the test statistics
and
are comparable in the sense that they all indicate the probability of rejecting the null hypothesis of the target ratio being one. Let , then if in both trial arms (details in Online Supplementary Material 1):
| (11) |
Similarly, let , then if in both trial arms:
| (12) |
In the special case that or , Equations (11) or (12) reduce to Equations (9) and (10), respectively. Otherwise, assume that the estimates of direct and indirect effects are in the same direction, the closer or is to 1, the more favourable is in terms of power.
Similar to , the ratio of event rates estimator, , where , also provides an asymptotically unbiased estimator of the direct effect. The variance of is:
| (13) |
where
and .
Furthermore,
Then, if in both trial arms (details in Online Supplementary Material 1):
| (14) |
It is natural to expect that the number of events is more strongly correlated with the person‐time in the same group than the other group. Therefore, we anticipate a high chance of in many studies.
For matched‐pair CRTs, is the number of pairs of clusters. Within the jth pair of clusters, one cluster is randomized to receive intervention (i = 1) and the other is the control cluster (i = 0). The paired design version of the five estimators, , and their variances are shown in Appendix Table A1.
2.3. Approximately unbiased estimators
The literature about bias in ratio estimators and the mitigation methods has very much focused on paired observations, mostly concerning an estimator in the form of . 7 , 12 Rao and Pereira considered a ratio‐of‐ratio estimator in the form of or . 13 These previous works showed that the estimators have a bias of order ; bias‐reduction methods were proposed. Useful though they are, they do not deal with non‐matched CRTs and and .
One solution is to determine the expectation and therefore bias of a ratio estimator, and then subtract the bias from the estimator. See, for example, van Kempen and van Vliet 8 and Rao and Pereira. 13 Although it has only been considered in studies of paired observations, the concept is applicable to both non‐matched and matched‐pair CRTs. Following this approach, we propose a set of approximately unbiased estimators. The key results for non‐matched CRTs are shown below. Their matched‐pair counterparts and details of the derivations are available in Section B of Online Supplementary Material 1.
2.3.1. Ratio of mean counts in non‐matched CRTs
The expectation of the asymptotically unbiased estimator and approximately unbiased estimator of the ratio of means estimator are, respectively:
with the unknown population mean approximated by the sample mean to form the sample CV.
2.3.2. Ratio of mean cluster‐level event rates in non‐matched CRTs
The expectation of the asymptotically unbiased estimator and approximately unbiased estimator are, respectively:
| (15) |
with the unknown population mean of cluster event rates replaced by the sample estimate to form the sample CV.
2.3.3. Ratio of event rates in non‐matched CRTs
with the unknown population mean and replaced by their sample estimates to form the sample CVs.
2.3.4. Double ratio of counts in non‐matched CRTs
and can be obtained by replacing , and by , , and in the formula in the previous sub‐section on ratio of event rates.
2.3.5. Double ratio of event rates in non‐matched CRTs
2.3.6. Variances and confidence intervals
The variance of is:
Following the same steps, it can be shown that for 2, 3, 4, and 5 as well.
The distribution of ratios is not normal. For calculation of confidence intervals, we calculate . Using the delta method and the result above, , where have been given in Equations (2), (3), (4), (8), and (13). Confidence intervals (CI) are calculated using the t‐distribution with degrees of freedom for non‐matched CRTs and degrees of freedom for matched‐pair CRTs. 14 The CIs are then exponentiated back to the original scale.
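The log-scale interval described above can be sketched as follows. The delta method gives the approximate standard error of the log ratio as the standard error of the ratio divided by the ratio itself. The function name is hypothetical, and the critical value is passed in by the caller so that no particular degrees of freedom are assumed by the function. The usage example takes the estimate and SE from the first malaria row of Table 2 and uses 2.12, roughly the 97.5th percentile of a t‐distribution with 16 degrees of freedom; that choice of critical value is an assumption made for illustration.

```python
import math


def log_scale_ci(estimate, se, t_crit):
    """CI for a ratio: build the interval on the log scale, then exponentiate.

    se is the standard error of the ratio on its original scale; the delta
    method approximates the standard error of log(ratio) by se / estimate.
    """
    se_log = se / estimate
    centre = math.log(estimate)
    return (math.exp(centre - t_crit * se_log),
            math.exp(centre + t_crit * se_log))


# Estimate and SE from the first malaria row of Table 2; t_crit = 2.12 is an assumption.
low, high = log_scale_ci(1.70, 1.45, 2.12)
print(round(low, 2), round(high, 1))
```

The resulting interval is asymmetric around the point estimate on the original scale, which respects the positivity of a ratio.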
In Equation (4), the calculation of involves an asymptotic variance estimator of . Similarly, in Equations (8) and (13), this variance estimator is involved in the calculation of and , and then the solutions are plugged into the estimators of and , respectively. Cochran showed that this variance estimator gave a considerable under‐estimation. 7 In contrast, he showed that the Jackknife method only mildly over‐estimated the variance and the over‐estimation vanished quickly as the number of observations increased. As such, an alternative method to statistical inference is to use the Jackknife method to estimate , and and then plug these values into the calculation of , 3, 4, and 5 and the respective CIs. We will use , 3, 4, and 5, to denote the estimators when used together with this Jackknife‐based variance estimation method.
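A generic delete-one-cluster Jackknife for the variance of an arm-level event rate (total events over total person-time) can be sketched as below. This is the textbook Jackknife formula and is offered as an illustration under that assumption; the exact variant and the plug-in steps used with the estimators above are not reproduced here. The function name and data are hypothetical.

```python
def jackknife_variance_of_rate(events, person_time):
    """Delete-one-cluster Jackknife variance of sum(events) / sum(person_time)."""
    n = len(events)
    total_y, total_t = sum(events), sum(person_time)
    # Recompute the arm-level rate with each cluster left out in turn.
    leave_one_out = [(total_y - y) / (total_t - t)
                     for y, t in zip(events, person_time)]
    centre = sum(leave_one_out) / n
    return (n - 1) / n * sum((r - centre) ** 2 for r in leave_one_out)


# Hypothetical arm with two clusters of equal size and different event counts.
print(jackknife_variance_of_rate([1, 3], [100, 100]))  # about 1e-4
```

When every cluster has the same event rate, each leave-one-out rate equals the overall rate and the Jackknife variance is zero, as expected.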
3. SIMULATION
3.1. Simulation setting
For non‐matched CRTs, we generated the number of events in the kth group in the jth cluster in the ith trial arm, conditional on the person‐time , by using a Poisson distribution with expected value given by Equation (1). We considered three sets of intervention effects, representing (a) direct protection (with no indirect effect), (b) direct and indirect protection, and (c) no effect, respectively: (a) and ; (b) and ; (c) and . We set for the difference in event rate between the target and non‐target groups. We set person‐times as following a bivariate distribution with means 100, coefficient of variation 0.2, 0.4, or 0.6, skewness 1.5, kurtosis 4, and correlation 0.8. Our choice of takes into account the findings from a recent systematic review of CRTs that the first quartile, median, and third quartile of were 0.22, 0.41, and 0.52, respectively. 15 We used positively skewed distributions because that is implied by the sizeable CV(0.6) and positive values of person‐time. We used the rmvnonnormal macro in Stata for the non‐normal data generation. 16 Additionally, we simulated person‐time using a bivariate normal distribution with means 100 and 0.2 or 0.4, as symmetric distribution is possible under modest CV. A small number of observations (<1% in total) with person‐time either below 5 or above 350 were replaced by 5 or 350, respectively. This is because CRTs often exclude clusters that are very small in size and exclude or sub‐divide very large clusters due to operational and efficiency considerations. 5 , 6 We set the cluster effect, , as following a normal distribution with mean − 2 and 0.05, 0.2, and 0.5. The three levels of correspond to approximately = 0.05, 0.2, and 0.5. 5 Hayes and Bennett noted that is “often ≤ 0.25 and seldom exceeds 0.5 for most health outcomes”. 14
For matched‐pair CRTs, we generated the number of events in the ith trial arm in the jth paired cluster by using a Poisson distribution with expected value in the target group and in the non‐target group. We set the paired cluster effects as following a bivariate normal distribution with means −2 and SDs 0.05, 0.2, or 0.5 and correlation 0.8. The person‐time parameter in the kth group in the jth pair of clusters followed a multivariate non‐normal distribution with means 100, 0.2, 0.4, or 0.6, skewness 1.5, kurtosis 4, and correlation 0.8. Additionally, multivariate normal distribution was used in the case of 0.2 or 0.4. The other parameters in the matched‐pair CRTs were the same as those in the non‐matched CRTs.
In the literature, it has been suggested that non‐matched CRTs should include at least four clusters per trial arm and matched‐pair CRTs should include at least six pairs of clusters. 1 For non‐matched CRTs, we evaluated the properties of the estimators when the number of clusters per trial arm is 4, 6, 8, 12, 16, 32, and 64. For matched‐pair CRTs, we considered 6, 8, 12, 16, 32, and 64 pairs of clusters.
In each scenario, we conducted 10 000 replicates of data generation and in each of them calculated the five asymptotically unbiased estimators and the five approximately unbiased estimators and their variances. We report the relative bias of the mean estimates of the incidence rate ratio, root mean square error (RMSE), coverage probability (CP) of the 95% confidence intervals (CI), and power to reject the null hypothesis that the respective ratio equals one (or type 1 error when the null hypothesis is true). CIs were calculated on the log scale and then exponentiated back to the original scale; these CIs were used for statistical inference. Calculations of relative bias and RMSE were based on the ratios themselves without transformation.
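The performance measures can be computed from the replicate-level results along the following lines. This sketch assumes the usual definitions: relative bias as the difference between the mean estimate and the true ratio, divided by the true ratio; RMSE on the original scale; and coverage as the share of intervals containing the true ratio. The function names and the toy replicates are hypothetical.

```python
import math


def relative_bias(estimates, truth):
    """(Mean of the estimates minus the true ratio) divided by the true ratio."""
    return (sum(estimates) / len(estimates) - truth) / truth


def rmse(estimates, truth):
    """Root mean square error of the estimates on the original (ratio) scale."""
    return math.sqrt(sum((e - truth) ** 2 for e in estimates) / len(estimates))


def coverage_probability(intervals, truth):
    """Proportion of (lower, upper) intervals that contain the true ratio."""
    return sum(lower <= truth <= upper for lower, upper in intervals) / len(intervals)


# Four hypothetical replicates with a true incidence rate ratio of 0.5.
estimates = [0.4, 0.6, 0.5, 0.7]
intervals = [(0.3, 0.6), (0.55, 0.9), (0.2, 0.5), (0.4, 0.8)]
print(relative_bias(estimates, 0.5))          # about 0.1
print(rmse(estimates, 0.5))                   # sqrt(0.015)
print(coverage_probability(intervals, 0.5))   # 0.75
```

Power and type 1 error can be obtained in the same way as one minus the proportion of intervals that contain one.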
3.2. Simulation results
Figures 1, 2, and 3 show the simulation results for non‐matched CRTs in which the intervention had only a direct effect, that is, and , and , which was approximately the median level of variability in cluster size found by a systematic review. 15 To maintain visual clarity, we separately present the results on , , , , , and (upper panel) and , , , and (lower panel).
FIGURE 1.

Relative bias of intervention effect estimators in relation to the number of clusters per trial arm for non‐matched CRTs, by three levels of ; population size per cluster follows a skewed distribution with mean = 100 and CV = 0.4; intervention has a direct effect only
FIGURE 2.

Root mean squared error (RMSE) of intervention effect estimators in relation to the number of clusters per trial arm for non‐matched CRTs, by three levels of ; population size per cluster follows a skewed distribution with mean = 100 and CV = 0.4; intervention has a direct effect only
FIGURE 3.

Coverage probability (CP) of 95% confidence interval (calculated on log‐scale and exponentiated back to the original scale) in relation to the number of clusters per trial arm for non‐matched CRTs, by three levels of ; population size per cluster follows a skewed distribution with mean = 100 and CV = 0.4; intervention has a direct effect only
Figure 1 shows the patterns of relative bias. All the asymptotically unbiased estimators showed positive relative bias that decreased as the number of clusters per arm () increased. Furthermore, the bias of , , and increased as increased. In scenarios with , , and only showed very mild bias and the two curves mostly overlapped. In contrast, the approximately unbiased estimators, , , and , were practically unbiased under all situations considered. The relative bias of and was stable in relation to . The approximately unbiased estimators and were practically unbiased under all situations considered.
In Figure 2, and had smaller RMSE than , but they converged as increased. had slightly smaller RMSE than when and was small. Otherwise they were almost indistinguishable from each other. had smaller RMSE than . Their difference was stable across level of but they converged as increased. The asymptotically unbiased estimators had RMSE similar to or slightly larger than their respective approximately unbiased counterparts', but the difference vanished as increased.
The estimators tended to have coverage probability (CP) smaller than the nominal 95% level, especially when was large (Figure 3). The CP improved as increased. The asymptotically unbiased estimators tended to have similar or slightly lower CP than their respective approximately unbiased counterparts. In the upper panel, had CP close to the nominal 95% across all levels of and . and had CP below 94% in some scenarios of small and large . Using the Jackknife‐based variance estimator described in Section 2 for , denoted by , gave improved CP that was closer to the nominal 95% level than . In the lower panel, , , , and had varying degree of under‐coverage in different scenarios. However, using the Jackknife‐based variance estimators for and , denoted by and , respectively, the CP was close to the nominal level in all situations.
Figure 4 shows type 1 error rates, that is, rejection of null hypothesis in scenarios with and . Otherwise, the parameters are the same as those in Figures 1, 2, 3. There was a tendency for all estimators to have type 1 error rate that exceeded the 5% target level, especially with 0.2 and 32. The inflation reduced as increased. In the upper panel, and , followed by and , performed better than the others. In the lower panel, , , , and had varying level of inflation of type 1 error under different parameter settings. In contrast, and performed well. In no circumstances did they show more than 1% deviation from the 5% target.
FIGURE 4.

Type 1 error rate in relation to the number of clusters per trial arm for non‐matched CRTs, by three levels of ; population size per cluster follows a skewed distribution with mean = 100 and CV = 0.4; intervention has no effect
Figure 5 compares the power of selected estimators that use person‐time as denominators, and , vs selected estimators that use event counts in the non‐target group as denominators, and . We focused on them because they performed well in terms of CP and type 1 error rate. The lower panel introduced an indirect effect, , in addition to a direct effect, . Otherwise, the parameters here are the same as those in Figures 1, 2, 3, 4. When there was direct effect only (upper panel), and were more powerful than and for . In contrast, and were more powerful when . Furthermore, was more powerful than in all the scenarios considered. With the addition of the indirect effect, and were more powerful than at all levels of , but similar to at .
FIGURE 5.

Power of in relation to the number of clusters per trial arm for non‐matched CRTs, by three levels of ; population size per cluster follows a skewed distribution with mean = 100 and CV = 0.4. Upper panel: with direct effect only ; lower panel: with direct and indirect effects
Further simulation results on non‐matched CRTs under other parameter settings and simulation results on matched‐pair CRTs are available in Online Supplementary Material 2. The findings are qualitatively similar to those reported above. Some relatively important additional information is as follows: First, in non‐matched CRTs, the relative bias of and increased substantially as increased except when the number of clusters was about 32 or above. In contrast, , , and the approximately unbiased estimators were much less sensitive to the magnitude of (eg, Figures S1 and S2). Second, in non‐matched CRTs, the CP of reduced to about 93% and its type 1 error rate increased to about 7% when = 0.6, = 0.5 and the number of clusters was 6 or below (eg, Figures S6 and S26). However, in matched‐pair CRTs, performed well in these aspects while had somewhat inflated type 1 error rate and below target CP (eg, Figures S41, S42, S62, and S63).
4. SEASONAL MALARIA CHEMOPREVENTION TRIAL
We use a subset of data from a published study of seasonal malaria chemoprevention (SMC) in Senegalese children to illustrate. 17 The trial set‐up is shown in Figure 6. The trial had a total of 54 clusters. It had the appearance of an “optimal design”, 18 , 19 with nine clusters on intervention (leftmost column) and nine clusters on control condition (rightmost column) for all three time periods (2008 to 2010) that resembled a non‐matched CRT, flanking a standard stepped‐wedge trial (middle columns). The middle columns represent 18 clusters that were randomized to receive SMC from 2009 and another 18 clusters randomized to receive SMC from 2010. However, the trial was not planned according to the optimal design. The original plan was that the trial would continue up to 4 years (2008 to 2011), and the nine clusters on the rightmost column were randomized to receive SMC in 2011. But the trial was terminated after the malaria transmission season in 2010 according to data monitoring and interim analysis results. Furthermore, in the first period (2008), children aged 3 to 59 months in the nine clusters on the leftmost column were given SMC as part of the preparation of the study logistics. In 2009 and 2010, children aged between 3 months and 9 years (inclusive) were given SMC.
FIGURE 6.

Design of seasonal malaria chemoprevention (SMC) trial
A passive surveillance system was implemented in health facilities to determine the number of clinical malaria episodes confirmed by rapid malaria test (primary endpoint) in each cluster, by four age groups (59 months or below; 5‐9; 10‐19; over 19 years). The data was at the cluster, not individual, level. Mortality data was collected for all age groups but only data for children aged 9 years or below was used in the previously published analysis 17 and available to the present analysis. Number of deaths (secondary endpoint) and population size of the clusters at mid‐September each year (approximately the beginning of the annual malaria transmission season), by age groups, were collected by a demographic surveillance system.
For the purpose of illustration, we used the 2008 data from the 18 clusters that resembled a non‐matched CRT. Children aged between 3 to 59 months were the target group. We considered children aged 5 to 9 years the non‐target group.
Table 1 shows the descriptive statistics by endpoints and trial arms. It also includes a simple average of each statistic across the two trial arms as a summary. There was very large between‐cluster variability in malaria incidence rate in the target group, with >1 in both arms. In the target group, the CVs of malaria episodes approximately doubled the CVs of the population size, with . The correlation coefficient between malaria episodes and population size of the target group, , was weak, with average across trial arms being only 0.03. The estimates of malaria incidence rate in the SMC and control arms were and , respectively. In contrast, the correlation between malaria episodes in the target and non‐target group, , was strong, averaged at 0.76.
TABLE 1.
Descriptive summary of number of malaria episodes and deaths, population size and their correlations and coefficient of variation of cluster‐level event rates in 9 intervention clusters and 9 control clusters in a seasonal malaria chemoprevention (SMC) trial in 2008
| Statistics | Malaria: SMC | Malaria: Control | Malaria: Average | Mortality: SMC | Mortality: Control | Mortality: Average |
|---|---|---|---|---|---|---|
| | 8.00 | 3.56 | 5.78 | 7.11 | 2.89 | 5.00 |
| | 15.44 | 5.89 | 10.67 | 1.00 | 0.56 | 0.78 |
| | 2834 | 1741 | 2288 | 2834 | 1741 | 2288 |
| | 1977 | 1187 | 1582 | 1977 | 1187 | 1582 |
| | 1.21 | 1.75 | 1.48 | 0.89 | 0.68 | 0.79 |
| | 1.24 | 1.49 | 1.37 | 1.05 | 0.95 | 1.00 |
| | 1.15 | 1.08 | 1.12 | 1.58 | 1.31 | 1.45 |
| | 0.46 | 0.73 | 0.60 | 0.46 | 0.73 | 0.60 |
| | 0.49 | 0.65 | 0.57 | 0.49 | 0.65 | 0.57 |
| | 0.10 | −0.05 | 0.03 | 0.48 | 0.69 | 0.59 |
| | 0.13 | −0.05 | 0.04 | 0.57 | 0.68 | 0.63 |
| | 0.26 | 0.05 | 0.15 | 0.18 | 0.58 | 0.38 |
| | 0.34 | 0.01 | 0.17 | 0.26 | 0.57 | 0.41 |
| | 0.88 | 0.64 | 0.76 | 0.67 | 0.35 | 0.51 |
| | 0.988 | 0.998 | 0.993 | 0.988 | 0.998 | 0.993 |
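As a rough check on how ratio-type point estimates arise from arm-level summaries, the sketch below recomputes four ratios from the malaria means in Table 1. It assumes that the first four rows of Table 1 are, in order, the mean number of events in the target group, the mean number of events in the non-target group, and the mean population sizes of the target and non-target groups. The row labels are not legible in this copy, so that reading is an assumption, as are all function and key names. With equal numbers of clusters per arm, ratios of means equal ratios of sums.

```python
# Arm-level means for malaria, read from Table 1 (the row interpretation is an assumption).
smc = {"events_target": 8.00, "events_nontarget": 15.44,
       "pop_target": 2834, "pop_nontarget": 1977}
control = {"events_target": 3.56, "events_nontarget": 5.89,
           "pop_target": 1741, "pop_nontarget": 1187}

def ratio_of_mean_counts(a, c):
    """Mean target-group events, intervention arm over control arm."""
    return a["events_target"] / c["events_target"]

def ratio_of_event_rates(a, c):
    """Arm-level event rate (events / population), intervention over control."""
    return (a["events_target"] / a["pop_target"]) / (c["events_target"] / c["pop_target"])

def double_ratio_of_counts(a, c):
    """Target/non-target event ratio, intervention arm over control arm."""
    return ((a["events_target"] / a["events_nontarget"])
            / (c["events_target"] / c["events_nontarget"]))

def double_ratio_of_event_rates(a, c):
    """Target/non-target rate ratio, intervention arm over control arm."""
    def within_arm(arm):
        target = arm["events_target"] / arm["pop_target"]
        nontarget = arm["events_nontarget"] / arm["pop_nontarget"]
        return target / nontarget
    return within_arm(a) / within_arm(c)

for f in (ratio_of_mean_counts, ratio_of_event_rates,
          double_ratio_of_counts, double_ratio_of_event_rates):
    print(f.__name__, round(f(smc, control), 2))
```

Under the assumed row reading, the four values round to 2.25, 1.38, 0.86 and 0.88, which coincide with four of the malaria estimates reported in Table 2. This supports, but does not prove, the assumed interpretation. The ratio of mean cluster-level event rates cannot be recomputed this way because it requires the per-cluster rates.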
Table 2 shows the estimation results. The estimates based on the asymptotically unbiased estimators were all larger than those based on the approximately unbiased estimators. Given this number of clusters, we considered the latter more accurate. For l = 3, 4, and 5, the SEs based on were only slightly larger than those based on . All confidence intervals were quite wide. Although was weak, the estimated was much larger than and , leading to standard error larger than and . Given in both trial arms, . As expected from Equations (9) and (10), with strong and weak , was smaller than and . The estimators for direct effects, and , gave very similar results. Since there was no practical difference between and in either trial arm, as indicated by Equation (14), .
TABLE 2.
Estimates, SE and 95% confidence intervals (CI; exponentiation of log‐transformed values) for malaria and mortality in seasonal malaria chemoprevention trial data in 2008
| Endpoint | Estimator | Estimate | SE | 95% CI |
|---|---|---|---|---|
| Malaria | | 1.70 | 1.45 | (0.28, 10.4) |
| | | 0.62 | 0.67 | (0.06, 6.07) |
| | | 1.01 | 0.97 | (0.13, 7.80) |
| | | 1.01 | 1.00 | (0.12, 8.26) |
| | | 0.74 | 0.37 | (0.25, 2.14) |
| | | 0.74 | 0.40 | (0.23, 2.32) |
| | | 0.76 | 0.38 | (0.26, 2.19) |
| | | 0.76 | 0.41 | (0.24, 2.38) |
| | | 2.25 | 1.45 | (0.57, 8.85) |
| | | 0.94 | 0.67 | (0.21, 4.24) |
| | | 1.38 | 0.97 | (0.31, 6.16) |
| | | 0.86 | 0.37 | (0.34, 2.14) |
| | | 0.88 | 0.38 | (0.35, 2.20) |
| Mortality | | 2.21 | 1.16 | (0.73, 6.74) |
| | | 1.16 | 0.46 | (0.50, 2.67) |
| | | 1.44 | 0.58 | (0.61, 3.40) |
| | | 1.44 | 0.61 | (0.59, 3.54) |
| | | 1.08 | 0.81 | (0.22, 5.26) |
| | | 1.08 | 0.88 | (0.19, 6.02) |
| | | 1.12 | 0.83 | (0.23, 5.41) |
| | | 1.12 | 0.91 | (0.20, 6.33) |
| | | 2.46 | 1.16 | (0.90, 6.70) |
| | | 1.22 | 0.46 | (0.55, 2.70) |
| | | 1.51 | 0.58 | (0.67, 3.42) |
| | | 1.37 | 0.81 | (0.39, 4.78) |
| | | 1.40 | 0.83 | (0.40, 4.92) |
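The caption of Table 2 states that the confidence intervals were obtained by exponentiating limits computed on the log scale. A minimal sketch of one way to do this follows. It assumes a delta-method standard error on the log scale (the SE divided by the estimate) and a Student's t quantile with 16 degrees of freedom (18 clusters minus 2). Neither assumption is stated in the text; they were chosen because they approximately reproduce the tabulated intervals. The function name is hypothetical.

```python
import math

# 97.5th percentile of Student's t with 16 df (assumed df = 18 clusters - 2).
T_975_DF16 = 2.120

def log_scale_ci(estimate, se, t_quantile=T_975_DF16):
    """CI for a ratio: build the interval for log(estimate), then exponentiate.

    Uses the delta-method approximation SE[log(est)] ~= SE(est) / est,
    which is an assumption of this sketch.
    """
    se_log = se / estimate
    half_width = t_quantile * se_log
    log_est = math.log(estimate)
    return math.exp(log_est - half_width), math.exp(log_est + half_width)

# First malaria row of Table 2: estimate 1.70, SE 1.45.
lower, upper = log_scale_ci(1.70, 1.45)
print(round(lower, 2), round(upper, 1))
```

For this row the sketch gives limits close to the tabulated (0.28, 10.4). Small discrepancies for other rows are expected because the inputs used here are rounded to two decimal places.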
Table 1 also shows the descriptive statistics on mortality. There was less between‐cluster variability in mortality rate than in malaria incidence, with in the two arms averaging 0.79. Furthermore, in the target group, the CVs of deaths were approximately double the CVs of the population size, with . Unlike for malaria episodes, the correlation coefficient between deaths and population size of the target group was more substantial, averaging 0.59 across trial arms. The estimates of mortality rate in the SMC and control arms were and , respectively. The correlation between deaths in the target and non‐target groups was moderate, averaging 0.51; this was weaker than that for malaria episodes.
Estimation results on mortality are shown in Table 2. Again, the asymptotically unbiased estimators tended to give larger estimates than their respective approximately unbiased estimators. However, the differences between the two sets of estimates were smaller than those for malaria episodes, as expected from the smaller between‐cluster variability in mortality rate than in malaria incidence rate. The SEs based on were only slightly larger than those based on . All confidence intervals were quite wide. Since was substantial and the estimated was much larger than and , and were smaller than . With fairly similar values of and and relatively large , and were smaller than . There was no clear difference between and in either trial arm, so .
5. DISCUSSION
The proposed approximately unbiased estimators successfully reduce the bias when the number of clusters is small. They also have a smaller RMSE and more accurate coverage probability than the asymptotically unbiased estimators. For studies with fewer than 32 clusters per arm, we recommend the use of the approximately unbiased estimators. Some CRTs do have a large number of clusters per trial arm; for example, a trial of influenza vaccination had over 400 nursing homes per arm 20 and a trial of mass drug administration had over 700 communities per arm. 21 For such studies, the choice between the asymptotically and approximately unbiased estimators is unimportant. Furthermore, with a large number of clusters, the simple estimator has performance very similar to and . At the study planning stage, investigators may take this finding into account when they consider the cost and benefit of collecting person‐time data.
Previous simulation studies evaluated the performance of the estimator . 5 , 6 We caution that the range of parameter values they considered was somewhat narrow. From our analytic solution, the estimator is only asymptotically unbiased. As seen in Equation (15), the bias in the estimator is a non‐linear function of the between‐cluster variability of event rate. In simulation, the bias of became obvious as the variability increased, especially when the number of clusters per trial arm was below 16 or so. In the SMC study of malaria episodes, where the variability was large, the estimate was much larger than . In those situations, the use of our proposed bias‐corrected estimator is preferable to .
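Why a ratio of arm means is only asymptotically unbiased can be illustrated with the textbook second-order Taylor (delta-method) approximation for the ratio of two independent sample means, E[X̄/Ȳ] ≈ (μ_X/μ_Y)(1 + CV_Y²/n). This is the generic result for independent arms and is not the paper's Equation (15), which is not legible in this copy. The function name and the grid of inputs are hypothetical.

```python
def approx_relative_bias(cv_denominator, n_clusters):
    """Approximate relative bias of mean(X) / mean(Y) for independent arms.

    Second-order Taylor expansion:
        E[Xbar / Ybar] / (mu_X / mu_Y) - 1 ~= CV_Y**2 / n
    where CV_Y is the between-cluster coefficient of variation in the
    denominator (control) arm and n is its number of clusters.
    """
    return cv_denominator ** 2 / n_clusters

# Illustrative grid of between-cluster CVs and clusters per arm (hypothetical values).
for cv_value in (0.5, 1.0, 1.5):
    for n in (6, 16, 32, 64):
        bias = approx_relative_bias(cv_value, n)
        print(f"CV={cv_value:.1f}  n={n:2d}  approx. relative bias={bias:.3f}")
```

Under this approximation the upward bias grows with the square of the CV and shrinks in inverse proportion to the number of clusters, which matches the qualitative pattern described above.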
While performs well in terms of bias and RMSE, its variance estimator under‐estimates the true variability as the CV of cluster size or event rate increases; it approaches the target CP as the number of clusters increases. The under‐coverage can be corrected by plugging the jackknife estimates of into the estimator for . We found that and have similar performance except when = 0.6 or = 0.5. In those settings, had a type 1 error rate up to about 2% higher than the nominal 5% level when the number of clusters was six or below, and may be preferred. However, in matched‐pair CRTs, did not out‐perform . Furthermore, previous studies have shown that the estimator of cluster‐level event rate in a trial arm, , is a biased estimator, and that the level of bias does not diminish as the number of clusters increases. 8 In contrast, can be used to obtain an asymptotically unbiased estimate of disease incidence. 7 , 8 Even if is used to estimate the incidence rate ratio, is preferable to as an estimator of the incidence rate in each trial arm.
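A leave-one-cluster-out jackknife can be sketched as follows for any arm-level statistic. This is a generic illustration of the jackknife idea, assuming that clusters are the resampling units. It is not the authors' exact variance formula, and the helper names and toy data are hypothetical.

```python
import math

def jackknife_se(clusters, statistic):
    """Leave-one-out jackknife standard error of statistic(clusters)."""
    n = len(clusters)
    # Recompute the statistic with each cluster removed in turn.
    leave_one_out = [statistic(clusters[:i] + clusters[i + 1:]) for i in range(n)]
    mean_loo = sum(leave_one_out) / n
    variance = (n - 1) / n * sum((t - mean_loo) ** 2 for t in leave_one_out)
    return math.sqrt(variance)

def event_rate(clusters):
    """Arm-level event rate: total events divided by total person-time."""
    total_events = sum(events for events, _ in clusters)
    total_person_time = sum(person_time for _, person_time in clusters)
    return total_events / total_person_time

# Hypothetical (events, person-time) pairs, one per cluster.
arm = [(3, 120.0), (0, 80.0), (9, 150.0), (2, 95.0), (5, 110.0), (1, 60.0)]
print("rate:", event_rate(arm))
print("jackknife SE:", jackknife_se(arm, event_rate))
```

For a simple mean, this jackknife standard error reduces to the familiar sample SD divided by the square root of the number of clusters, which is a convenient sanity check.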
Our consideration of and was in part motivated by an evaluation of a malaria vaccine where indirect effects were unlikely and the incidence of some outcomes such as meningitis (a safety outcome) could be highly variable between clusters, and where balanced randomization with respect to access to health care facilities (where passive surveillance took place) was difficult to ensure. 22 When is large, the number of events in the target group is likely to be more strongly correlated with the number of events in the non‐target group than with the person‐time in the target group. Furthermore, vaccine studies typically anticipate no indirect effect on safety endpoints. So, these estimators are expected to estimate the same target as far as safety is concerned. In simulation studies of scenarios in which the intervention only had a direct effect, we have seen that when was ≤0.2, and tended to be more powerful than and . When increased to 0.5 and in the malaria data in the SMC trial, and had substantially smaller SE than and . For situations like this, and are our estimators of choice. In the various scenarios we evaluated, gave a higher level of statistical power than , except that they had similar performance when was small. In that case, the magnitude of the benefits may not justify the extra cost of collecting person‐time data in both the target and non‐target groups. Otherwise, tends to be preferable to .
We foresee that the estimators that use person‐time in the target group and the estimators that use the number of events in the non‐target group as a denominator may be used in different parts of the same CRT, depending on the aforementioned considerations. For example, or or their extensions may be used in efficacy analysis while and or their extensions may be used in safety analysis.
Furthermore, and are generic quantities in the sense that may be replaced by quantities other than the number of events in non‐target groups to achieve other purposes. For example, there has been interest in the use of “negative control events” to remove the bias arising from differential ascertainment of outcome events in non‐blinded CRTs. 4 The estimator proposed in the literature is in the form of , with and replaced by the number of negative control events in the intervention and control arm, respectively. 4 The previous study did not consider the properties of the estimator in situations with a small number of clusters. The results here apply directly. Another example of application is the CRT with before and after observations (CRT‐BA) design, 23 which collects data in a baseline period before launching the randomized intervention and control comparator. By replacing by the baseline event count, becomes a baseline adjusted estimator of the total effect. It offers a robust alternative for the analysis of CRT‐BA trials.
The strengths of the present study include coverage of both non‐matched and matched‐pair CRTs, analytic evaluation of the bias of existing asymptotically unbiased estimators, proposal of estimators that capitalize on denominators other than cluster population size and their potential applications, development of a bias‐corrected version of these estimators for use in studies with a small number of clusters, and simulation evaluation of the estimators with a realistic range of variability in cluster size. A limitation is that the methods do not handle covariate adjustment. However, good use of restricted randomization may reduce the need for covariate adjustment in the analysis stage. 1
In non‐matched CRTs, one approach for controlling covariate effects is stratified analysis, pooling the stratum‐specific estimates with weights inversely proportional to their variances. The limitations are that it is not practical to stratify on multiple covariates and that continuous covariates must be categorized. Another approach is to apply ANCOVA to cluster‐level data. However, it only works for methods that generate a summary value per cluster, such as and , whose calculation begins with getting an event count and event rate per cluster, respectively. This is different from the calculation of to , which begins with generating an estimate for a trial arm. For example, in , the incidence rate in a trial arm is the sum of events over clusters divided by the sum of person‐time over clusters within the trial arm. There is no summary value for each cluster. Furthermore, when the purpose is to estimate a rate ratio instead of a rate difference, the ANCOVA approach would need to analyze the log‐transformed values instead. The exponentiated intervention effect estimate is then interpreted as a ratio of geometric means, which is not the same as the widely used estimator of ratio of arithmetic means (including and ). In CRTs with a small number of clusters, the bias and bias‐correction for ratios of geometric means of event counts and event rates have yet to be investigated. Another approach is to use Poisson regression analysis with cluster‐level covariates (and individual‐level covariates if available), without the intervention variable as a predictor, to obtain an expected number of events for each cluster. 5 Comparison of the deviations of the observed from the expected number of events between trial arms then offers a covariate adjusted estimate of the intervention effect. Following this idea, if in or in are replaced by this expected number of events, they become covariate adjusted estimators of the total or direct effect, respectively. The bias‐correction method may then be applied to obtain covariate adjusted or . But the variance estimators may not work well, as they do not account for the uncertainty in the prediction of the expected number of events. In short, while several candidate approaches are available, challenges remain. Further research is needed to develop and evaluate these or other approaches to covariate adjustment.
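The stratified approach mentioned above, in which stratum-specific estimates are pooled with weights inversely proportional to their variances, can be sketched as follows. Working on the log rate-ratio scale is an assumption made for this illustration, and the stratum values and function name are hypothetical.

```python
import math

def inverse_variance_pool(estimates, variances):
    """Fixed-effect pooling: weights are 1/variance; pooled variance is 1/sum(weights)."""
    weights = [1.0 / v for v in variances]
    total_weight = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled, pooled_se

# Hypothetical stratum-specific log rate ratios and their variances.
log_rr = [math.log(0.70), math.log(0.85), math.log(0.60)]
var_log_rr = [0.09, 0.04, 0.16]

pooled_log_rr, pooled_se = inverse_variance_pool(log_rr, var_log_rr)
print("pooled rate ratio:", round(math.exp(pooled_log_rr), 3))
print("SE of pooled log rate ratio:", round(pooled_se, 3))
```

Strata with smaller variances dominate the pooled estimate, which is why sparse strata created by stratifying on several covariates at once contribute little and make the approach impractical.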
The size of is important in determining the relative strength of the different estimators. A large tends to dilute this correlation. As such, pilot data and careful consideration of the between‐cluster variability in event rate are important not only for sample size determination but also for the choice of study design and statistical analysis procedures.
CONFLICT OF INTEREST
The authors declare no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Supporting information
Online Supplementary Material 1: Comparisons of asymptotically unbiased estimators and derivations of approximately unbiased estimators
Online Supplementary Material 2: Further simulation results, by number of clusters per trial arm and three levels of SD (α ij ); mean population size per cluster is 100
APPENDIX A.
TABLE A1.
Estimators of incidence rate ratio for matched‐pair cluster randomized trials a
| Label | Estimand b | Estimator | Variance c |
|---|---|---|---|
| Ratio of means | | | |
| Ratio of mean cluster‐level event rates | | | |
| Ratio of event rates | | | |
| Double ratio of counts | | | |
| Double ratio of event rates | | | |
, and are, respectively, the number of events, person‐time/population size and cluster‐level event rates in the cluster that is randomized to receive the ith trial arm (1 for intervention and 0 for control) in the jth pair of clusters and kth group (1 for target and 0 for non‐target group), and .
and : Total effect and direct effect in terms of log incidence rate ratio.
Components of the variances of are: where
Ma X, Milligan P, Lam KF, Cheung YB. Ratio estimators of intervention effects on event rates in cluster randomized trials. Statistics in Medicine. 2022;41(1):128–145. doi: 10.1002/sim.9226
Any opinions, findings and conclusions or recommendations expressed in this material are those of the authors and do not reflect the views of Ministry of Health/National Medical Research Council, Singapore.
Funding information: National Medical Research Council, MOH‐000526
DATA AVAILABILITY STATEMENT
The data used in the case study are available at https://doi.org/10.17037/DATA.117. Requests for access will be reviewed by a Data Access Committee to ensure that use of the data protects participant privacy according to the terms of participant consent and ethics committee approval. Stata code for the simulations will be deposited to figshare by Wiley.
REFERENCES
- 1. Hayes RJ, Moulton LH. Cluster Randomised Trials. Boca Raton, FL: CRC Press; 2009.
- 2. Donner A, Klar N. Design and Analysis of Cluster Randomization Trials in Health Research. Chichester, UK: Wiley; 2010.
- 3. Dron L, Taljaard M, Cheung YB, et al. Global health clinical trials: role and challenges of cluster randomised trials for global health. Lancet Global Health. 2021;9:e701‐e710.
- 4. Dufault SM, Jewell NP. Analysis of counts for cluster randomized trials: negative controls and test‐negative designs. Stat Med. 2020;39(10):1429‐1439.
- 5. Bennett S, Parpia T, Hayes R, Cousens S. Methods for the analysis of incidence rates in cluster randomized trials. Int J Epidemiol. 2002;31(4):839‐846.
- 6. Pacheco GD, Hattendorf J, Colford JM, Mausezahl D, Smith T. Performance of analytical methods for overdispersed counts in cluster randomized trials: sample size, degree of clustering and imbalance. Stat Med. 2009;28:2989‐3011.
- 7. Cochran WG. Sampling Techniques. 3rd ed. New York, NY: Wiley; 1977.
- 8. van Kempen GMP, van Vliet LJ. Mean and variance of ratio estimators used in fluorescence ratio imaging. Cytometry. 2000;39:300‐305.
- 9. Donner A, Klar N. Methods for comparing event rates in intervention studies when the unit of allocation is a cluster. Am J Epidemiol. 1994;140:279‐289.
- 10. Rao JNK, Scott AJ. A simple method for the analysis of clustered binary data. Biometrics. 1992;48(2):577‐585.
- 11. Halloran ME, Longini IM, Struchiner CJ. Design and Analysis of Vaccine Studies. New York, NY: Springer; 2010.
- 12. Durbin J. A note on the application of Quenouille's method of bias reduction to the estimation of ratios. Biometrika. 1959;46:477‐480.
- 13. Rao JNK, Pereira NP. On double ratio estimators. Sankhyā Ind J Stat Ser A. 1968;30:83‐90.
- 14. Hayes RJ, Bennett S. Simple sample size calculation for cluster‐randomized trials. Int J Epidemiol. 1999;28(2):319‐326.
- 15. Kristunas C, Morris T, Gray L. Unequal cluster sizes in stepped‐wedge cluster randomised trials: a systematic review. BMJ Open. 2017;7:e017151.
- 16. Lee S. Generating univariate and multivariate nonnormal data. Stata J. 2015;15(1):95‐109.
- 17. Cisse B, Ba EH, Sokhan C, et al. Effectiveness of seasonal malaria chemoprevention in children under ten years of age in Senegal: a stepped‐wedge cluster‐randomized trial. PLoS Med. 2016;13(11):e1002175.
- 18. Girling AJ, Hemming K. Statistical efficiency and optimal design for stepped cluster studies under linear mixed effects models. Stat Med. 2016;35:2149‐2166.
- 19. Thompson JA, Fielding K, Hargreaves J, Copas A. The optimal design of stepped wedge trials with equal allocation to sequences and a comparison to other trial designs. Clin Trials. 2016;14(6):639‐647.
- 20. Gravenstein S, Davidson HE, Taljaard M, et al. Comparative effectiveness of high‐dose versus standard dose influenza vaccination on numbers of US nursing home residents admitted to hospital: a cluster‐randomized trial. Lancet Respir Med. 2017;5(9):738‐746.
- 21. Keenan JD, Bailey RL, West SK, et al. Azithromycin to reduce childhood mortality in sub‐Saharan Africa. New Engl J Med. 2018;378(17):1583‐1592.
- 22. Milligan P. Statistical Analysis Plan for the Malaria Vaccine Pilot Evaluation, London, UK: London School of Hygiene & Tropical Medicine and World Health Organization; 2021. https://clinicaltrials.gov/ProvidedDocs/65/NCT03806465/SAP_001.pdf. Accessed July 11, 2021.
- 23. Hemming K, Taljaard M. Sample size calculations for stepped wedge and cluster randomised trials: a unified approach. J Clin Epidemiol. 2016;69:137‐146.
