Author manuscript; available in PMC: 2025 Aug 25.
Published in final edited form as: Ann Appl Stat. 2023 May 1;17(2):1592–1614. doi: 10.1214/22-aoas1684

RANDOMIZATION INFERENCE FOR CLUSTER-RANDOMIZED TEST-NEGATIVE DESIGNS WITH APPLICATION TO DENGUE STUDIES: UNBIASED ESTIMATION, PARTIAL COMPLIANCE, AND STEPPED-WEDGE DESIGN

Bingkai Wang 1,a, Suzanne M Dufault 2,c, Dylan S Small 1,b, Nicholas P Jewell 3,d
PMCID: PMC12374749  NIHMSID: NIHMS2104879  PMID: 40855892

Abstract

In 2019, the World Health Organization identified dengue as one of the top 10 global health threats. For the control of dengue, the Applying Wolbachia to Eliminate Dengue (AWED) study group conducted a cluster-randomized trial in Yogyakarta, Indonesia, and used a novel design, called the cluster-randomized test-negative design (CR-TND). This design can yield valid statistical inference with data collected by a passive surveillance system and thus has the advantage of cost-efficiency compared to traditional cluster-randomized trials. We investigate the statistical assumptions and properties of CR-TND under a randomization inference framework, which is known to be robust for small-sample problems. We find that, when the differential healthcare-seeking behavior comparing intervention and control varies across clusters (in contrast to the setting of Dufault and Jewell (Stat. Med. 39 (2020a) 1429–1439) where the differential healthcare-seeking behavior is constant across clusters), current analysis methods for CR-TND can be biased and have inflated type I error. We propose the log-contrast estimator that can eliminate such bias and improve precision by adjusting for covariates. Furthermore, we extend our methods to handle partial intervention compliance and a stepped-wedge design, both of which appear frequently in cluster-randomized trials. Finally, we demonstrate our results by simulation studies and reanalysis of the AWED study.

Keywords: Case control, healthcare-seeking behavior, partial compliance, stepped-wedge design

1. Introduction.

1.1. Motivating example: An intervention to reduce Dengue incidence.

Dengue is a widespread, rapidly increasing arboviral disease, primarily transmitted by Aedes aegypti mosquitoes. Every year, there are an estimated 50 million to 100 million dengue cases globally (Cattarino et al. (2020)). Recent virological research has shown that Aedes aegypti mosquitoes transinfected with the bacterium Wolbachia are less able to transmit arboviral diseases (Rainey et al. (2014), Johnson (2015), Dutra et al. (2016)). In addition, Wolbachia-infected mosquitoes can stably invade wild Aedes aegypti mosquito populations through advantageous reproductive outcomes between Wolbachia-infected and wild mosquitoes (Walker et al. (2011)). Based on these advances, the World Mosquito Program has launched worldwide studies that have successfully used Wolbachia-infected mosquitoes to control arboviral diseases.

The Applying Wolbachia to Eliminate Dengue (AWED) study is an unblinded cluster-randomized trial that evaluated the efficacy of Wolbachia-infected mosquito deployments to reduce dengue incidence in Yogyakarta, Indonesia (Utarini et al. (2021)). In this study, 24 contiguous geographical clusters were equally randomized to receive intervention (initial Wolbachia-infected mosquito deployments) or control (no intervention). Because of the nature of the intervention, it is assigned at the cluster level, that is, all individuals in the same cluster are exposed to the same intervention. During a two- to three-year follow-up, incident dengue cases were recruited by a study-initiated passive surveillance system: among recruited patients with acute fever who presented at several local primary clinics, a laboratory test for dengue was performed; patients with positive test results formed the dengue cases, with cases of other febrile illnesses (OFI) providing a natural control group for comparison. This design thus represents an innovative form of the original test-negative design (TND), a modified case-control design traditionally used for vaccine evaluation (Jackson and Nelson (2013), explained in Section 1.2 below). We refer to this design used by the AWED study as a cluster-randomized test-negative design (CR-TND), following the terminology in Anders et al. (2018a).

Distinct from classical cluster-randomized trials, the passive patient recruitment procedure in the AWED study is impacted by healthcare-seeking behavior of potential patients. That is, observed case counts only reflect the population of active healthcare seekers, instead of the entire population of cluster residents. This is central to the concept of TNDs that are intended to reduce confounding associated with healthcare-seeking behavior (Sullivan, Tchetgen Tchetgen and Cowling (2016)). However, since intervention assignment was not blinded in AWED, healthcare-seeking behavior may be differentially affected by knowledge of intervention assignment. For example, dengue patients in treated clusters might be less concerned with fever symptoms, as compared to control cluster inhabitants, leading to fewer clinical visits and subsequently reduced dengue counts in the treated clusters. In this article we explain how CR-TND addresses this issue, describe the statistical assumptions and properties of CR-TND, discuss and extend currently used methods for CR-TND to handle: (i) cluster variation in healthcare-seeking behaviors, (ii) partial compliance, and (iii) a stepped-wedge design.

1.2. Cluster-randomized studies with a TND.

In cluster-randomized trials where data are passively collected, a CR-TND borrows methods from a standard TND to yield valid inferences. A TND is a modification of a case-control study popularly used for studying vaccine effectiveness in observational studies, where observed data are counts of test-positives (cases) and test-negatives (controls) among vaccinated and unvaccinated healthcare seekers. It most closely resembles a case-cohort design in concept, except that data are passively collected instead of randomly sampled. Under the assumption that vaccination has no effect on the test-negative conditions, the relative risk among healthcare seekers can be estimated by the odds ratio based on these observed counts; estimation of the relative risk for the entire population, however, requires an external validity assumption (Jackson and Nelson (2013), Haber et al. (2015)). Sullivan, Tchetgen Tchetgen and Cowling (2016) considered the TND from a causal perspective and showed that a TND can reduce, but not necessarily eliminate, bias from healthcare-seeking behavior in observational studies.

A CR-TND is similar to a TND in assumptions and data structure. The similarity between CR-TND and TND provides a CR-TND with the statistical foundation for estimating the relative risk and, as a result, makes a CR-TND a cost-effective and simple design for studying rare outcomes compared to a prospective design (Anders et al. (2018a)). The major difference between a CR-TND and a TND is the exposure: in a CR-TND, intervention is randomly assigned at a community (cluster) level, while the exposure in a TND can vary across individuals and be potentially confounded by covariates. Such a difference suggests that, in a CR-TND, the healthcare-seeking behavior exposure is no longer confounded (due to random assignment). However, as for all TNDs, intervention is unblinded in the AWED study, and patients may potentially change their healthcare-seeking behavior, given knowledge of their intervention assignment.

Since Anders et al. (2018a) first introduced the CR-TND, Jewell et al. (2019) proposed a series of cluster-level estimators for the relative risk, basing inference on permutation distributions and approximations thereof. Dufault and Jewell (2020a) later addressed the impact of differential healthcare-seeking behavior caused by unblinded intervention assignment by considering a case-only analysis strategy. A further challenge of differential healthcare-seeking behavior occurs when the latter varies across clusters, which is not considered in Dufault and Jewell (2020a). Furthermore, it is valuable to account for noncompliance or partial compliance with the assigned intervention. All these issues are illustrated in the AWED study and can potentially occur in future studies involving a CR-TND.

In this article we address the above open questions. We extend the randomization inference framework for CR-TNDs, thereby avoiding distributional assumptions on the outcomes, with validity even for a small number of clusters. By statistically formalizing the assumptions for CR-TND, we identify and quantify the bias of existing methods (including the odds ratio estimator, test-positive fraction estimator, and model-based estimators) when differential healthcare-seeking behavior between intervention and control arms varies across clusters. We propose an unbiased estimator for the intervention/treatment relative risk, called the log-contrast estimator, and further incorporate covariate adjustment to improve its precision. In addition, partial compliance is accommodated through an instrumental variable method.

Recognizing the increasing use of a stepped-wedge design for intervention assignment in cluster-randomized trials (Hussey and Hughes (2007), Li et al. (2021)), we also extend CR-TNDs to accommodate staggered intervention assignment procedures. For a stepped-wedge CR-TND we characterize the underlying structure, assumptions, and unbiased estimation methods following a similar strategy for discussing parallel-arm CR-TNDs. Our results build on key results from Ji et al. (2017), Roth and Sant’Anna (2021), which propose randomization inference methods for analyzing a traditional stepped-wedge design. We extend their randomization inference framework and, to the best of our knowledge, provide the first statistical characterization and analysis approach for stepped-wedge CR-TNDs.

The remainder of this article is organized as follows. In the next section we describe our randomization inference framework and present the assumptions for estimation based on a CR-TND. In Section 3 we briefly review existing estimators for the relative risk. In Section 4 we propose methods for unbiased estimation, covariate adjustment, and handling partial compliance. In Section 5 we extend the setup and results for parallel-arm CR-TNDs to stepped-wedge CR-TNDs. In Section 6 we support our theoretical results with simulation studies. In Section 7 we reanalyze AWED study data, providing a comparison of estimators. In Section 8 we summarize our results and discuss future directions.

2. Definition and assumptions.

2.1. Notation.

Consider a cluster-randomized clinical trial that contains $m$ clusters. Each cluster $i$, $i\in\{1,\dots,m\}$, contains $n_i$ subjects defining the population of interest. For subject $j=1,\dots,n_i$ in cluster $i$, let $Y_{ij}$ be a binary test-positive outcome, that is, the outcome of interest, and $Z_{ij}$ be the binary test-negative outcome, which is used to assist the inference for the outcome of interest. In the AWED study, $Y_{ij}$ is an indicator of current dengue infection, and $Z_{ij}$ is an indicator of other febrile illnesses (OFI). The value of $(Y_{ij},Z_{ij})$ is determined by a single laboratory test, where a test-positive result indicates $(Y_{ij},Z_{ij})=(1,0)$ and a test-negative result indicates $(Y_{ij},Z_{ij})=(0,1)$. If a subject has no disease, then $(Y_{ij},Z_{ij})=(0,0)$; however, a subject cannot have $Y_{ij}=Z_{ij}=1$ since, for example, OFI means febrile illnesses other than dengue in the AWED study. We assume that the laboratory test is accurate, that is, the test disease status represents the true disease status with no misclassification. In Section 6 we will use simulation studies to evaluate the impact of imprecise tests on our results. For bias corrections with imperfect tests, see Endo, Funk and Kucharski (2020); we do not pursue this further here other than in simulations.

In the context of TND, subjects may have different healthcare-seeking behavior, which impacts the observed data. For each participant we define $S_{ij}$ as an indicator of whether participant $j$ in cluster $i$ would seek healthcare if they had symptoms that are related to the outcome of interest and/or OFIs. For example, in the AWED study, $S_{ij}=1$ means that the participant would go to a clinic if they had an acute fever. We note that $S_{ij}$ is different from a nonmissing indicator for $Y_{ij}$ and $Z_{ij}$. If a participant had $S_{ij}=1$ and $Y_{ij}=Z_{ij}=0$, then they would not seek healthcare (since they had no relevant symptoms), and $Y_{ij}$, $Z_{ij}$ would not be observed, that is, the data will never show $Y_{ij}=Z_{ij}=0$; if a participant had $S_{ij}=1$ and $\max\{Y_{ij},Z_{ij}\}=1$, then we would observe both $Y_{ij}$ and $Z_{ij}$ since they were ascertained by the same laboratory test. A summary of all possible values of $(S_{ij},Y_{ij},Z_{ij})$ is given in Table 1.

Table 1.

Categorization of population based on possible values of (Sij, Yij, Zij)

$(S_{ij}, Y_{ij}, Z_{ij})$ | Characteristics | Observed?
(1, 1, 0) | healthcare seekers with test-positive illness | Yes
(1, 0, 1) | healthcare seekers with test-negative illness | Yes
(1, 0, 0) | healthcare seekers with no illness | No
(0, 1, 0) | healthcare nonseekers with test-positive illness | No
(0, 0, 1) | healthcare nonseekers with test-negative illness | No
(0, 0, 0) | healthcare nonseekers with no illness | No

In a cluster-randomized trial, let $A_i$ be the intervention indicator for cluster $i$ with $A_i=1$ if assigned to the intervention group and 0 otherwise. We use the Neyman–Rubin potential outcome framework, which makes the consistency assumption

$$Y_{ij}=Y_{ij}(A_i)=A_iY_{ij}(1)+(1-A_i)Y_{ij}(0),\qquad Z_{ij}=Z_{ij}(A_i)=A_iZ_{ij}(1)+(1-A_i)Z_{ij}(0),$$

where $Y_{ij}(a)$ and $Z_{ij}(a)$ are potential outcomes if cluster $i$ were assigned intervention ($a=1$) or control ($a=0$). In addition, the potential outcomes are defined by the cluster-level hypothetical intervention; that is, all subjects in a cluster take the same intervention, avoiding the issue of handling intracluster interference.

Similarly to the consistency assumption above, we assume

$$S_{ij}=S_{ij}(A_i)=A_iS_{ij}(1)+(1-A_i)S_{ij}(0),$$

where $S_{ij}(a)$ encodes the healthcare-seeking behavior of participant $j$ if cluster $i$ were assigned intervention ($a=1$) or control ($a=0$). Our setup for healthcare-seeking behavior is different from the literature on TNDs for observational studies (Sullivan, Tchetgen Tchetgen and Cowling (2016), Westreich and Hudgens (2016), Chua et al. (2020)), which regarded $S_{ij}$ as a preintervention confounder. For a cluster-randomized trial, we treat healthcare-seeking behavior as a post-randomization variable and allow the intervention to change such behavior, that is, $S_{ij}(1)\neq S_{ij}(0)$. This could happen when intervention is unblinded, and knowledge of intervention allocation may change healthcare-seeking behavior.

As in Jewell et al. (2019), we adopt the randomization inference framework and assume that all counterfactuals, $\{Y_{ij}(a),Z_{ij}(a),S_{ij}(a): a=0,1,\ i=1,\dots,m,\ j=1,\dots,n_i\}$, are fixed numbers instead of random variables. We assume $m_1=\sum_{i=1}^{m}A_i$ is fixed and

$$P\{(A_1,\dots,A_m)=(a_1,\dots,a_m)\}=\frac{1}{\binom{m}{m_1}} \tag{1}$$

for all $(a_1,\dots,a_m)$ that satisfy $\sum_{i=1}^{m}a_i=m_1$ and $a_i\in\{0,1\}$ for $i=1,\dots,m$. Then, the only randomness in $(Y_{ij},Z_{ij},S_{ij})$ comes from intervention allocation.
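The randomization distribution in equation (1) can be made concrete with a small sketch. The function names (`all_assignments`, `draw_assignment`) and the choice $m=6$, $m_1=3$ are illustrative assumptions, not part of the original analysis.

```python
import itertools
import math
import random


def all_assignments(m, m1):
    """Yield every assignment vector (a_1, ..., a_m) with exactly m1 ones."""
    for treated in itertools.combinations(range(m), m1):
        chosen = set(treated)
        yield tuple(1 if i in chosen else 0 for i in range(m))


def draw_assignment(m, m1, rng):
    """Draw one assignment uniformly from the C(m, m1) possibilities."""
    chosen = set(rng.sample(range(m), m1))
    return tuple(1 if i in chosen else 0 for i in range(m))


# Hypothetical small trial: 6 clusters, 3 assigned to intervention.
m, m1 = 6, 3
assignments = list(all_assignments(m, m1))
n_assignments = len(assignments)      # number of equally likely allocations
prob_each = 1 / math.comb(m, m1)      # probability in equation (1)
example_draw = draw_assignment(m, m1, random.Random(0))
```

Enumerating all allocations is only feasible for a small number of clusters; for larger trials one would typically sample allocations with something like `draw_assignment`.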

In a CR-TND the observed data are defined as

$$(A_i,O_i^Y,O_i^Z),\quad i=1,\dots,m,$$

where, for each cluster $i$, $O_i^Y=\sum_{j=1}^{n_i}S_{ij}Y_{ij}$ is the observed test-positive count and $O_i^Z=\sum_{j=1}^{n_i}S_{ij}Z_{ij}$ is the observed test-negative count. We, however, do not observe the total case counts for either outcome (i.e., $\sum_{j=1}^{n_i}Y_{ij}$ and $\sum_{j=1}^{n_i}Z_{ij}$) or the number of healthcare seekers (i.e., $\sum_{j=1}^{n_i}S_{ij}$).

In the AWED study, $O_i^Y$ is the number of dengue cases among healthcare seekers, and $O_i^Z$ is the number of OFI cases among healthcare seekers. Due to the mechanism of the passive surveillance system, no information is recorded for those people with dengue or OFI who choose not to seek healthcare. Furthermore, the number of healthcare seekers is not known either, since we do not observe potential healthcare seekers who have no disease.
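As an illustration of how the observed counts arise from the individual-level indicators, the following sketch computes $(O_i^Y, O_i^Z)$ for a single hypothetical cluster containing one resident of each type in Table 1. The function name `observed_counts` and the toy data are assumptions made for illustration only.

```python
def observed_counts(S, Y, Z):
    """Return (O_Y, O_Z) for one cluster from individual-level 0/1 indicators."""
    for y, z in zip(Y, Z):
        if y == 1 and z == 1:
            raise ValueError("a subject cannot have Y = Z = 1")
    o_y = sum(s * y for s, y in zip(S, Y))  # test-positives who seek care
    o_z = sum(s * z for s, z in zip(S, Z))  # test-negatives who seek care
    return o_y, o_z


# Hypothetical cluster with six residents, one per row of Table 1.
S = [1, 1, 1, 0, 0, 0]
Y = [1, 0, 0, 1, 0, 0]
Z = [0, 1, 0, 0, 1, 0]
o_y, o_z = observed_counts(S, Y, Z)

# Quantities that exist in the population but are never observed:
total_cases = sum(Y)        # includes the non-seeker with dengue
total_seekers = sum(S)      # includes the seeker with no illness
```

Only `o_y` and `o_z` would reach the analyst; `total_cases` and `total_seekers` are shown to emphasize what the passive surveillance system misses.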

The parameter of interest is the relative risk on the outcome of interest comparing intervention vs. control, defined as

$$\lambda=\frac{\sum_{j=1}^{n_i}Y_{ij}(1)}{\sum_{j=1}^{n_i}Y_{ij}(0)},$$

where we assume that $\sum_{j=1}^{n_i}Y_{ij}(0)>0$ and the relative risk $\lambda$ is constant across clusters. We discuss in Section 4.1 how to test and relax the latter assumption by adapting existing methods. Our goal is to make inference on $\lambda$ based on the observed data.

2.2. Assumptions for CR-TNDs.

We make the following assumptions for estimating λ, which we refer to as the assumptions for a CR-TND. These assumptions are also seen in Anders et al. (2018a), Dufault and Jewell (2020a), Haber et al. (2015), where they are described under the super-population framework (i.e., assuming the data from each cluster i are independent and identically distributed samples from a common distribution). We modify these assumptions to fit into the randomization inference framework as follows, which places no additional assumption on the counterfactual distributions.

Assumption 1 (CR-TNDs). For each $i=1,\dots,m$:

(i) Intervention has no effect on the test-negative outcomes: $\sum_{j=1}^{n_i}Z_{ij}(1)=\sum_{j=1}^{n_i}Z_{ij}(0)$.

(ii) The relative proportion of healthcare seekers across intervention and control arms does not differ between test-positives and test-negatives,

$$\frac{p_i(1,Y)}{p_i(0,Y)}=\frac{p_i(1,Z)}{p_i(0,Z)},$$

where, for $a=0,1$, $p_i(a,Y)=\frac{\sum_{j=1}^{n_i}S_{ij}(a)Y_{ij}(a)}{\sum_{j=1}^{n_i}Y_{ij}(a)}$ and $p_i(a,Z)=\frac{\sum_{j=1}^{n_i}S_{ij}(a)Z_{ij}(a)}{\sum_{j=1}^{n_i}Z_{ij}(a)}$ are the proportions of healthcare seekers among subjects with $Y_{ij}(a)=1$ and $Z_{ij}(a)=1$, respectively.

Assumption 1(i) indicates that $Z_{ij}$ is a negative-control outcome, and this assumption is commonly made for test-negative designs. For Assumption 1(ii) we assume that if the intervention has an impact on healthcare-seeking behaviors, then such an impact must be the same in test-negative and test-positive populations. We denote $\alpha_i=\frac{p_i(1,Y)}{p_i(0,Y)}=\frac{p_i(1,Z)}{p_i(0,Z)}$, which represents the relative change of ascertainment probability, that is, the probability of being tested, comparing intervention and control arms in the test-positive population (those with $Y_{ij}(a)=1$) or the test-negative population (those with $Z_{ij}(a)=1$).

Assumption 1 characterizes the requirements for a CR-TND. Assumption 1(i) requires that the intervention is not associated with test-negative illnesses. In the AWED study this assumption is plausible since Wolbachia-infected mosquito deployments have no effect on controlling OFI (so long as the latter does not include other flaviviruses, such as Zika or chikungunya). For studies of influenza vaccine effectiveness, this assumption, however, might be debatable, as argued by Haber et al. (2015). Assumption 1(ii) suggests that an unblinded intervention assignment should have the same impact on healthcare-seeking behavior among the test-positive population and test-negative population. This assumption is usually satisfied if the test-negative disease has similar symptoms, compared to the disease of interest, so that a patient cannot self-diagnose based on symptoms. For example, patients with dengue or OFI have common symptoms, including fever and rash, and hence, the true disease status (dengue or OFI) is “blinded” to patients before going to clinics. If the knowledge of intervention allocation changes the healthcare-seeking behavior of a patient, then such changes are not related to the true disease status due to a lack of knowledge of test status. This assumption is likely to hold in the AWED study, as participants only know their test status days after seeking care. On the contrary, when the reasons for seeking healthcare differentiate between test-positives and test-negatives, then Assumption 1(ii) may not hold since the knowledge of intervention allocation may have different impacts on healthcare-seeking behavior.

An alternative set of assumptions for test-negative designs was provided by Jackson and Nelson (2013) and Jewell et al. (2019); these are similar to, but differ from, Assumption 1. We describe these assumptions as Assumption 1′ below.

Assumption 1′. For each $i=1,\dots,m$:

(i′) Among healthcare seekers, the incidence of test-negative outcomes does not differ between intervention and control,

$$\frac{\sum_{j=1}^{n_i}S_{ij}(1)Z_{ij}(1)}{\sum_{j=1}^{n_i}S_{ij}(1)}=\frac{\sum_{j=1}^{n_i}S_{ij}(0)Z_{ij}(0)}{\sum_{j=1}^{n_i}S_{ij}(0)}.$$

(ii′) The intervention effect among healthcare seekers is generalizable to the whole population,

$$\lambda=\frac{\sum_{j=1}^{n_i}Y_{ij}(1)}{\sum_{j=1}^{n_i}Y_{ij}(0)}=\frac{\sum_{j=1}^{n_i}S_{ij}(1)Y_{ij}(1)\big/\sum_{j=1}^{n_i}S_{ij}(1)}{\sum_{j=1}^{n_i}S_{ij}(0)Y_{ij}(0)\big/\sum_{j=1}^{n_i}S_{ij}(0)}.$$

Compared to Assumption 1(i), Assumption 1′(i′) is similar but made on the population of healthcare seekers, instead of the entire population. Assumption 1′(ii′) assumes external validity, which may be debatable in practice (Sullivan, Tchetgen Tchetgen and Cowling (2016), Westreich and Hudgens (2016)). Similar to Assumption 1(ii), Assumption 1′(i′) implicitly requires that the populations of test-positives and test-negatives have similar reasons for seeking healthcare. This emphasizes the importance of the test-negative conditions exhibiting similar symptoms to the test-positive condition. For example, consider a cluster-level intervention that has no effect on dengue, with $S_{ij}(a)=1$ if one seeks care for either dengue or a broken leg (two dissimilar conditions), and $Z_{ij}(a)=0$ for a dengue case and $Z_{ij}(a)=1$ for a broken leg. For participants with broken legs, an (unblinded) dengue intervention would not change their healthcare-seeking behavior (i.e., $S_{ij}(1)=S_{ij}(0)$) or their test results (i.e., $Z_{ij}(1)=Z_{ij}(0)$), yielding $\sum_{j=1}^{n_i}S_{ij}(1)Z_{ij}(1)=\sum_{j=1}^{n_i}S_{ij}(0)Z_{ij}(0)$; for participants with incident dengue, however, the knowledge of intervention status may change their healthcare-seeking behavior, resulting in $\sum_{j=1}^{n_i}S_{ij}(1)\neq\sum_{j=1}^{n_i}S_{ij}(0)$ and leading to a violation of Assumption 1′(i′).

Despite the differences between Assumption 1 and Assumption 1′, the methods proposed in Section 4 work under both sets of assumptions, allowing flexibility in trial planning. Assumption 1 and Assumption 1′ both imply that, for each $i=1,\dots,m$,

$$O_i^Y(1)=\lambda c_i O_i^Y(0),\qquad O_i^Z(1)=c_i O_i^Z(0), \tag{2}$$

where, for $a=0,1$, $O_i^Y(a)=\sum_{j=1}^{n_i}S_{ij}(a)Y_{ij}(a)$ and $O_i^Z(a)=\sum_{j=1}^{n_i}S_{ij}(a)Z_{ij}(a)$ are the potential observed data, given intervention $a$, and $c_i$ is a quantity representing the relative ascertainment, that is, differential healthcare-seeking behavior between intervention and control. Under Assumption 1, $c_i=\alpha_i$ is the relative ascertainment in the population of test-positives and test-negatives; given Assumption 1′, $c_i=\frac{\sum_{j=1}^{n_i}S_{ij}(1)}{\sum_{j=1}^{n_i}S_{ij}(0)}$ is the relative ascertainment in the whole cluster population, including test-positives, test-negatives, and people with no outcome-related symptoms. For conciseness we refer to $c_i$ as the “relative ascertainment” for cluster $i$ throughout. Our results for unbiased estimation rely on equation (2).
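For readers who want the intermediate steps, equation (2) can be derived from either set of assumptions in a few lines. The sketch below uses only the definitions given above and additionally assumes that every denominator involved is nonzero; all sums run over $j=1,\dots,n_i$.

```latex
% Under Assumption 1, with c_i = \alpha_i:
\begin{align*}
O_i^Y(1) &= p_i(1,Y)\sum_j Y_{ij}(1)
          = \alpha_i\,p_i(0,Y)\cdot\lambda\sum_j Y_{ij}(0)
          = \lambda\,\alpha_i\,O_i^Y(0),\\
O_i^Z(1) &= p_i(1,Z)\sum_j Z_{ij}(1)
          = \alpha_i\,p_i(0,Z)\sum_j Z_{ij}(0)
          = \alpha_i\,O_i^Z(0).
\end{align*}
% Under Assumption 1', with c_i = \sum_j S_{ij}(1) / \sum_j S_{ij}(0):
\begin{align*}
O_i^Y(1) &= \frac{\sum_j S_{ij}(1)Y_{ij}(1)}{\sum_j S_{ij}(1)}\sum_j S_{ij}(1)
          = \lambda\,\frac{\sum_j S_{ij}(0)Y_{ij}(0)}{\sum_j S_{ij}(0)}\sum_j S_{ij}(1)
          = \lambda\,c_i\,O_i^Y(0),\\
O_i^Z(1) &= \frac{\sum_j S_{ij}(1)Z_{ij}(1)}{\sum_j S_{ij}(1)}\sum_j S_{ij}(1)
          = \frac{\sum_j S_{ij}(0)Z_{ij}(0)}{\sum_j S_{ij}(0)}\sum_j S_{ij}(1)
          = c_i\,O_i^Z(0).
\end{align*}
```

In the first block, the second equality for $O_i^Y(1)$ uses Assumption 1(ii) together with the definition of $\lambda$, and the second equality for $O_i^Z(1)$ uses Assumption 1(ii) together with Assumption 1(i); in the second block the corresponding steps use Assumption 1′(ii′) and Assumption 1′(i′).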

3. Review of existing estimators for the relative risk.

3.1. The odds ratio estimator.

The odds ratio estimator by Jackson and Nelson (2013) for λ is defined as

$$\hat{\lambda}=\frac{\sum_{i=1}^{m}A_iO_i^Y}{\sum_{i=1}^{m}(1-A_i)O_i^Y}\cdot\frac{\sum_{i=1}^{m}(1-A_i)O_i^Z}{\sum_{i=1}^{m}A_iO_i^Z}. \tag{3}$$

Under Assumption 1′, $m=2m_1$, and $c_i\equiv c$ for $i=1,\dots,m$, Jewell et al. (2019) showed that $E[\log(\hat{\lambda})]\approx\log(\lambda)$ and derived an approximate variance formula for $\log(\hat{\lambda})$, both under the permutation, or randomization, distribution. When the number of clusters is large, one can perform hypothesis testing and construct confidence intervals for $\lambda$, based on a Normal distribution, for which the validity is guaranteed by the Central Limit Theorem and Delta method. Of course, the null hypothesis of no intervention effect can be examined directly from the permutation distribution without approximation.

An assumption for the above procedure is that the relative ascertainment is the same across clusters, that is, $c_i\equiv c$. Without this assumption, $\log(\hat{\lambda})$ can be biased for $\log(\lambda)$, where the bias, given equation (2), is

$$E[\log(\hat{\lambda})]-\log(\lambda)=E\left[\log\left\{\frac{\sum_{i=1}^{m}c_iA_iO_i^Y(0)}{\sum_{i=1}^{m}(1-A_i)O_i^Y(0)}\cdot\frac{\sum_{i=1}^{m}(1-A_i)O_i^Z(0)}{\sum_{i=1}^{m}c_iA_iO_i^Z(0)}\right\}\right].$$

This bias depends on the relationship between $c_i$, $O_i^Y(0)$, and $O_i^Z(0)$ and can be large if $c_i$ is strongly correlated with $O_i^Y(0)/O_i^Z(0)$, for example, when relative ascertainment of participants tends to be greater among clusters with a higher (or lower) frequency of test-positives than test-negatives under no intervention. In Section 6 we use an AWED-based simulation to demonstrate that such bias can be potentially large and result in an inflated type I error.

The assumption $c_i\equiv c$ can be relaxed if we alternatively assume that the data from each cluster are identically distributed and rely on asymptotic results, as is typical in a super-population framework. This alternative assumption is debatable, however, when the number of clusters is limited or when the population of interest is limited entirely to the observed clusters. Under the randomization inference framework, we aim here to estimate $\lambda$ without assuming $c_i\equiv c$.
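The bias expression above can be evaluated exactly for a small trial by enumerating every allocation. The following is a minimal sketch under equation (2); the function names and the four-cluster potential counts are hypothetical and chosen only so that $c_i$ is correlated with $O_i^Y(0)/O_i^Z(0)$.

```python
import itertools
import math


def odds_ratio_estimate(A, O_Y, O_Z):
    """Pooled odds ratio estimator of equation (3)."""
    y1 = sum(y for a, y in zip(A, O_Y) if a == 1)
    y0 = sum(y for a, y in zip(A, O_Y) if a == 0)
    z1 = sum(z for a, z in zip(A, O_Z) if a == 1)
    z0 = sum(z for a, z in zip(A, O_Z) if a == 0)
    return (y1 / y0) * (z0 / z1)


def observed_from_potential(A, O_Y0, O_Z0, c, lam):
    """Apply equation (2): treated clusters show lam*c_i*O_Y0 and c_i*O_Z0."""
    O_Y = [lam * ci * y if a else y for a, ci, y in zip(A, c, O_Y0)]
    O_Z = [ci * z if a else z for a, ci, z in zip(A, c, O_Z0)]
    return O_Y, O_Z


def log_or_bias(O_Y0, O_Z0, c, lam, m1):
    """Exact E[log(lambda_hat)] - log(lambda) over all allocations."""
    m = len(O_Y0)
    logs = []
    for treated in itertools.combinations(range(m), m1):
        A = [1 if i in treated else 0 for i in range(m)]
        O_Y, O_Z = observed_from_potential(A, O_Y0, O_Z0, c, lam)
        logs.append(math.log(odds_ratio_estimate(A, O_Y, O_Z)))
    return sum(logs) / len(logs) - math.log(lam)


# Hypothetical control-arm potential counts for m = 4 clusters, lambda = 1.
O_Y0 = [10, 10, 40, 40]
O_Z0 = [20, 20, 20, 20]
bias_constant_c = log_or_bias(O_Y0, O_Z0, [1, 1, 1, 1], lam=1.0, m1=2)
bias_varying_c = log_or_bias(O_Y0, O_Z0, [1, 1, 3, 3], lam=1.0, m1=2)
```

In this toy configuration the bias vanishes when $c_i$ is constant and is positive when ascertainment is higher in the clusters with more test-positives, which mirrors the qualitative point made above; the magnitudes carry no meaning beyond this example.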

3.2. The test-positive fraction estimator.

Jewell et al. (2019) proposed a test-positive fraction estimator for λ, based on the following statistic:

$$T=\frac{1}{m_1}\sum_{i=1}^{m}A_i\frac{O_i^Y}{O_i^Y+O_i^Z}-\frac{1}{m-m_1}\sum_{i=1}^{m}(1-A_i)\frac{O_i^Y}{O_i^Y+O_i^Z}.$$

Conditioning on the observed test-positive and test-negative counts, that is, $O_+^Y=\sum_{i=1}^{m}O_i^Y$ and $O_+^Z=\sum_{i=1}^{m}O_i^Z$, and assuming $m_1=m/2$, the conditional expectation of $T$, that is, $E[T\mid O_+^Y,O_+^Z]$, is approximated by $\tilde{E}_T$, defined as

$$\tilde{E}_T=\frac{2r(\lambda^2-1)}{\{(2+r)\lambda+r\}\{r\lambda+2+r\}}, \tag{4}$$

where $r=O_+^Z/O_+^Y$; in addition, an approximate (permutation) variance estimator for $T$ was also derived. A point estimator of $\lambda$ is obtained by solving equation (4) with $\tilde{E}_T$ substituted by the observed $T$. Conditioning on $O_+^Y$ and $O_+^Z$, the test-positive fraction estimator can be used for hypothesis testing (using the permutation distribution) and constructing confidence intervals, while the variance for the estimator is less conveniently obtained.
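A point estimate from the test-positive fraction approach can be obtained numerically. The sketch below computes $T$, evaluates the right-hand side of equation (4), and inverts it by bisection on the log scale; it relies on the fact that the right-hand side of (4) is increasing in $\lambda$ for $\lambda>0$ and $r>0$. The function names, the search bounds, and the example counts are assumptions for illustration, not the implementation used by Jewell et al. (2019).

```python
import math


def tpf_statistic(A, O_Y, O_Z):
    """Difference in mean test-positive fractions between arms (statistic T)."""
    fractions = [y / (y + z) for y, z in zip(O_Y, O_Z)]
    m1 = sum(A)
    m0 = len(A) - m1
    treated = sum(f for a, f in zip(A, fractions) if a == 1) / m1
    control = sum(f for a, f in zip(A, fractions) if a == 0) / m0
    return treated - control


def approx_expectation(lam, r):
    """Right-hand side of equation (4)."""
    return 2 * r * (lam ** 2 - 1) / (((2 + r) * lam + r) * (r * lam + 2 + r))


def solve_lambda(T, r, lo=1e-6, hi=1e6, iters=200):
    """Invert equation (4) for lambda by bisection on the log scale."""
    if not approx_expectation(lo, r) < T < approx_expectation(hi, r):
        raise ValueError("T is outside the range attainable on [lo, hi]")
    for _ in range(iters):
        mid = math.sqrt(lo * hi)          # geometric midpoint
        if approx_expectation(mid, r) < T:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)


# Hypothetical counts for four clusters with equal allocation.
A = [1, 1, 0, 0]
O_Y = [5, 10, 20, 30]
O_Z = [20, 30, 20, 30]
T_obs = tpf_statistic(A, O_Y, O_Z)
r_obs = sum(O_Z) / sum(O_Y)
lam_hat = solve_lambda(T_obs, r_obs)
```

Because $\tilde{E}_T$ is bounded between $-2/(2+r)$ and $2/(2+r)$, `solve_lambda` raises an error when the observed $T$ falls outside the range reachable within the search bounds.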

It is reasonable to assume that $O_+^Y$ and $O_+^Z$ are fixed quantities if clusters are similar such that a different intervention allocation has no impact on the observed test-positive or test-negative counts. In general, however, intervention allocation will cause variation in observed test-positive and test-negative counts. Specifically, equation (2) implies that $O_+^Y=\sum_{i=1}^{m}A_iO_i^Y(1)+\sum_{i=1}^{m}(1-A_i)O_i^Y(0)=\sum_{i=1}^{m}(\lambda c_iA_i+1-A_i)O_i^Y(0)$, which is a fixed quantity only if $(\lambda c_i-1)O_i^Y(0)$ is constant across $i$. As a result, the approximation of $E[T\mid O_+^Y,O_+^Z]$ in equation (4) is also random since it involves $r$, the ratio of observed test-negatives to test-positives. In our setting we can derive

$$E[T-\tilde{E}_T]=\frac{1}{m}\sum_{i=1}^{m}\frac{(\lambda-1)U_i}{(\lambda+U_i)(1+U_i)}-E\left[\frac{2r(\lambda^2-1)}{\{(2+r)\lambda+r\}\{r\lambda+2+r\}}\right], \tag{5}$$

where $U_i=O_i^Z(0)/O_i^Y(0)$, $i=1,\dots,m$, are fixed quantities and the expectation is over the permutation distribution without conditioning on $O_+^Y$ and $O_+^Z$. In the special case that $\lambda=1$, equation (5) yields $E[T-\tilde{E}_T]=0$ and the above inference based on $T$ remains valid; in other cases, $E[T-\tilde{E}_T]$ may be nonzero, which can lead to bias and inflated type I error. In Section 6 our simulation study shows that the bias can be either large or small depending on $\lambda$.
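The first term on the right-hand side of equation (5) is the exact permutation expectation of $T$ under equation (2). As a sanity check, the sketch below compares that closed form with a brute-force average over all allocations; the function names and the numerical inputs are hypothetical.

```python
import itertools


def expected_T_formula(U, lam):
    """First term on the right-hand side of equation (5)."""
    return sum((lam - 1) * u / ((lam + u) * (1 + u)) for u in U) / len(U)


def expected_T_enumerated(O_Y0, O_Z0, c, lam, m1):
    """Average of T over every allocation with m1 treated clusters."""
    m = len(O_Y0)
    total, count = 0.0, 0
    for treated in itertools.combinations(range(m), m1):
        treated_sum, control_sum = 0.0, 0.0
        for i in range(m):
            if i in treated:
                y = lam * c[i] * O_Y0[i]      # equation (2), test-positives
                z = c[i] * O_Z0[i]            # equation (2), test-negatives
                treated_sum += y / (y + z)
            else:
                control_sum += O_Y0[i] / (O_Y0[i] + O_Z0[i])
        total += treated_sum / m1 - control_sum / (m - m1)
        count += 1
    return total / count


# Hypothetical potential counts and relative ascertainment for m = 5 clusters.
O_Y0 = [12, 30, 7, 22, 15]
O_Z0 = [40, 35, 28, 50, 19]
c = [0.6, 1.4, 0.9, 2.0, 1.1]
lam = 0.3
U = [z / y for y, z in zip(O_Y0, O_Z0)]
formula_value = expected_T_formula(U, lam)
enumerated_value = expected_T_enumerated(O_Y0, O_Z0, c, lam, m1=2)
```

The two values agree regardless of the $c_i$, because $c_i$ cancels within each cluster's test-positive fraction; the dependence on allocation enters equation (5) only through $r$ in the second term.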

3.3. Model-based estimators.

For analysis of cluster-randomized trials, two commonly used models are generalized linear mixed models (GLMM, Breslow and Clayton (1993)) and generalized estimating equations (GEE, Liang and Zeger (1986)). In CR-TNDs we consider logistic regression for both models, since the observed outcomes naturally produce binary data with each cluster containing $O_i^Y$ ones and $O_i^Z$ zeros. For GLMM we include an intercept and main terms for intervention assignment and cluster-level covariates as fixed effects and a cluster-level random intercept; for GEE we consider the same fixed effects and an exchangeable within-cluster covariance structure. Both models estimate $\log(\lambda)$ by the regression coefficient of the intervention assignment fixed effect.

The performance of GLMM and GEE for analyzing cluster-randomized trials has been extensively studied (Murray, Varnell and Blitstein (2004), McNeish and Stapleton (2016), Jewell et al. (2019)). GEE targets a marginal effect, as we consider here, while GLMM estimates a cluster-specific effect, which can be different from the marginal estimand of interest. Both models rely on strong model assumptions that are challenging to verify, such as the generalized linear model and the correlation structure. When these assumptions are violated, the relative effect estimate and its variance estimate might not be valid. In addition, GEE is known to suffer from inflated type I error when the number of clusters is small. Further, when the relative ascertainment $c_i$ varies across clusters in a CR-TND, both methods can be biased, and we demonstrate this directly in the simulation study.

4. Proposed randomization-inference methods for CR-TNDs.

4.1. The unbiased log-contrast estimator.

We define $L_i=\log(O_i^Y)-\log(O_i^Z)$, which is the log-contrast between test-positives and test-negatives among healthcare seekers. Using the notation of $O_i^Y(a)$ and $O_i^Z(a)$, defined in equation (2), we further define $L_i(a)=\log\{O_i^Y(a)\}-\log\{O_i^Z(a)\}$ as the potential log-contrast given intervention $a\in\{0,1\}$. The consistency assumption implies that $L_i=A_iL_i(1)+(1-A_i)L_i(0)$. In addition, given equation (2), we have $L_i(1)=\log(\lambda)+L_i(0)$ for $i=1,\dots,m$, a constant intervention effect model. Note that $L_i$ is a simple log-odds transformation of the test-positive fraction of Section 3.2.

The log-contrast estimator is defined as

$$\widehat{\log(\lambda)}=\frac{1}{m_1}\sum_{i=1}^{m}A_iL_i-\frac{1}{m-m_1}\sum_{i=1}^{m}(1-A_i)L_i, \tag{6}$$

which is unbiased for $\log(\lambda)$. The variance of $\widehat{\log(\lambda)}$ is

$$\mathrm{Var}\big(\widehat{\log(\lambda)}\big)=\frac{m}{m_1(m-m_1)}\cdot\frac{1}{m-1}\sum_{i=1}^{m}\{L_i(0)-\overline{L(0)}\}^2,$$

where $\overline{L(0)}=m^{-1}\sum_{i=1}^{m}L_i(0)$, and can be unbiasedly estimated by

$$\widehat{\mathrm{Var}}\big(\widehat{\log(\lambda)}\big)=\frac{1}{m_1}\hat{\sigma}_1^2+\frac{1}{m-m_1}\hat{\sigma}_0^2,$$

where $\hat{\sigma}_a^2$ for $a=0,1$ is the sample variance of $L_i$ with $A_i=a$. Given the unbiased variance estimator, one can construct the statistic $T=\widehat{\log(\lambda)}\big/\sqrt{\widehat{\mathrm{Var}}\big(\widehat{\log(\lambda)}\big)}$ and construct the confidence interval for $\log(\lambda)$ and $\lambda$ based on the Normal distribution. Of course, an exact test of the null hypothesis is available using the permutation distribution directly. Compared to the estimators defined in Section 3, our proposed estimator is able to eliminate bias when the relative ascertainment varies across clusters. In addition, the log-contrast estimator is also able to handle unequal randomization (i.e., $m\neq 2m_1$).

The above tests are based on the assumption that the relative risk is constant across clusters, that is, $\lambda_i\equiv\lambda$. This assumption can be tested following the method of Ding, Feller and Miratrix (2016) on $(L_1,\dots,L_m)$. When we reject the null hypothesis that $\lambda_i\equiv\lambda$, we can still estimate the average intervention effect $\tau=\frac{1}{m}\sum_{i=1}^{m}\log(\lambda_i)$ by the above test statistic $T$, while the variance estimator $\widehat{\mathrm{Var}}\big(\widehat{\log(\lambda)}\big)$ overestimates the true variance (Aronow, Green and Lee (2014)), indicating that inference is still valid but can be conservative.
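The log-contrast estimator of equation (6), its variance estimator, a Normal-based interval, and an exact permutation test can be sketched in a few lines. The code assumes strictly positive counts in every cluster and at least two clusters per arm; the function names and example counts are hypothetical, and the permutation test uses the absolute difference in means as its statistic, which is one reasonable choice rather than the only one.

```python
import itertools
import math
from statistics import NormalDist, mean, variance


def log_contrasts(O_Y, O_Z):
    """L_i = log(O_i^Y) - log(O_i^Z); requires strictly positive counts."""
    return [math.log(y) - math.log(z) for y, z in zip(O_Y, O_Z)]


def difference_in_means(A, L):
    treated = [l for a, l in zip(A, L) if a == 1]
    control = [l for a, l in zip(A, L) if a == 0]
    return mean(treated) - mean(control)


def log_contrast_estimate(A, L):
    """Equation (6) and the variance estimator below it."""
    treated = [l for a, l in zip(A, L) if a == 1]
    control = [l for a, l in zip(A, L) if a == 0]
    est = mean(treated) - mean(control)
    var = variance(treated) / len(treated) + variance(control) / len(control)
    return est, var


def wald_interval(est, var, level=0.95):
    """Normal-based interval for lambda (exponentiated log-scale limits)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half = z * math.sqrt(var)
    return math.exp(est - half), math.exp(est + half)


def permutation_p_value(A, L, tol=1e-12):
    """Exact two-sided p-value for the sharp null lambda = 1."""
    m, m1 = len(A), sum(A)
    observed = abs(difference_in_means(A, L))
    hits = total = 0
    for treated in itertools.combinations(range(m), m1):
        chosen = set(treated)
        perm = [1 if i in chosen else 0 for i in range(m)]
        if abs(difference_in_means(perm, L)) >= observed - tol:
            hits += 1
        total += 1
    return hits / total


# Hypothetical counts for four clusters.
A = [1, 1, 0, 0]
O_Y = [5, 10, 20, 30]
O_Z = [20, 30, 20, 30]
L = log_contrasts(O_Y, O_Z)
est, var = log_contrast_estimate(A, L)
lower, upper = wald_interval(est, var)
p_exact = permutation_p_value(A, L)
```

With only four clusters the smallest attainable two-sided permutation p-value is 2/6, which illustrates why the exact test needs a reasonable number of clusters to have any power.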

4.2. Covariate adjustment.

Given the constant intervention effect model on $L_i$, we are able to use the covariate-adjusted estimator of Lin (2013) and Li and Ding (2017) to improve precision. For each cluster $i$, let $X_i=(X_{i1},\dots,X_{ip})$ be a $p$-dimensional vector of cluster-level baseline variables. We assume that each $X_i$ is a fixed vector.

The covariate-adjusted estimator for $\log(\lambda)$ is defined as

$$\widehat{\log(\lambda)}_{\beta}=\widehat{\log(\lambda)}-\beta^{T}\left\{\frac{1}{m_1}\sum_{i=1}^{m}A_iX_i-\frac{1}{m-m_1}\sum_{i=1}^{m}(1-A_i)X_i\right\}, \tag{7}$$

where $\beta\in\mathbb{R}^p$ can be any constant vector. Direct calculation shows that $\widehat{\log(\lambda)}_{\beta}$ is unbiased for $\log(\lambda)$.

Given the assumption of a constant intervention effect, the variance of log(λ)^β is minimized at β=V(X)1C(X,L(1))=V(X)1C(X,L(0)), where V(X)=1m1i=1m(XiX¯)(XiX¯)T and C(X,L(a))=1m1i=1m(XiX¯)(Li(0)L(a)¯)T with X¯=1mi=1mXi for a=0, 1. According to Roth and Sant’Anna (2021), Var(log(λ)^)=Var(log(λ)^β)+mm1(mm1)βTV(X)β, quantifying the variance reduction through covariate adjustment. In practice, since the values of Li(0) are not fully observed, β is unknown. However, one can construct an estimator for β by β^=m1mβ^1+mm1mβ^0, where β^1 and β^0 are the sample least-square coefficients of Li on Xi for the intervention group and control group, respectively. The plug-in estimator β^ for β is a straightforward adaption from Example 9 of Li and Ding (2017), where the only modification is that we use a weighted average to combine β^1 and β^0 since they estimate the same quantity in our constant intervention effect model. Asymptotically, β^ is equivalent to β in terms of efficiency (Li and Ding (2017), Roth and Sant’Anna (2021)). The variance of log(λ)^β^ can be estimated by

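$$\frac{1}{m_1}\hat\sigma^{2}_{1,\hat\beta} + \frac{1}{m-m_1}\hat\sigma^{2}_{0,\hat\beta},$$

where $\hat\sigma^{2}_{1,\hat\beta}$ and $\hat\sigma^{2}_{0,\hat\beta}$ are the unbiased residual variance estimators for the intervention group and control group, respectively, after regressing $L_i$ on the covariates $X_i$.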
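The covariate-adjusted estimator and its variance estimate can be sketched in Python for the special case of a single covariate ($p = 1$). This is an illustrative sketch, not the authors' R code: the function names, the toy data, and the degrees-of-freedom correction (arm size minus 2, one reading of "unbiased residual variance") are assumptions.

```python
from statistics import mean

def slope(x, y):
    """Least-squares slope of y on a single covariate x (intercept included)."""
    xb, yb = mean(x), mean(y)
    sxx = sum((xi - xb) ** 2 for xi in x)
    return sum((xi - xb) * (yi - yb) for xi, yi in zip(x, y)) / sxx

def resid_var(x, y, b):
    """Residual variance of y around an arm-specific intercept plus slope b."""
    xb, yb = mean(x), mean(y)
    rss = sum(((yi - yb) - b * (xi - xb)) ** 2 for xi, yi in zip(x, y))
    return rss / (len(x) - 2)  # assumed degrees of freedom when p = 1

def covariate_adjusted(A, L, X):
    """Return (estimate, variance estimate) of the adjusted log-contrast."""
    m, m1 = len(A), sum(A)
    L1 = [l for a, l in zip(A, L) if a == 1]
    X1 = [x for a, x in zip(A, X) if a == 1]
    L0 = [l for a, l in zip(A, L) if a == 0]
    X0 = [x for a, x in zip(A, X) if a == 0]
    # Weighted average of the arm-specific least-squares slopes
    beta_hat = m1 / m * slope(X1, L1) + (m - m1) / m * slope(X0, L0)
    # Difference in means, corrected for covariate imbalance (equation (7))
    estimate = (mean(L1) - mean(L0)) - beta_hat * (mean(X1) - mean(X0))
    var_hat = (resid_var(X1, L1, beta_hat) / m1
               + resid_var(X0, L0, beta_hat) / (m - m1))
    return estimate, var_hat

# Hypothetical toy data: L is exactly linear in X, true log(lambda) = -0.5
A = [1, 0, 1, 0, 1, 0]
X = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
L = [2.0 * x - 0.5 * a for a, x in zip(A, X)]
est, var = covariate_adjusted(A, L, X)
```

In this noise-free toy example the unadjusted difference in means is distorted by covariate imbalance between the arms, while the adjusted estimate recovers the assumed log relative risk.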

4.3. Dose-response relationship under partial compliance.

In cluster-randomized clinical trials, noncompliance or partial compliance may occur after intervention is assigned. For example, in the AWED study, a Wolbachia exposure index (WEI) was used to measure an individual-level compliance status, defined as a weighted score accounting for the mobility of participants immediately prior to symptom onset and the percentage of Wolbachia-infected mosquitoes (Utarini et al. (2021)) in all city locations, as measured through mosquito trapping. The variable, WEI, is a continuous variable taking values in [0, 1] with a larger value indicating more exposure to the intervention (e.g., 1 means no mobility and 100% Wolbachia-infected mosquitoes around the place of residence). The cluster-level WEI is then defined as the average of WEI among enrolled patients within a cluster, representing the cluster-level compliance status. Averaged over the follow-up period of the AWED study, the cluster-level WEI ranges from 0.66 to 0.75 in intervention clusters and 0.22 to 0.44 in control clusters. We now focus on quantifying the intervention effect based on the actual intervention “received.”

We consider an instrumental variable method with a linear model to capture the dose-response relationship for a general CR-TND. For each cluster $i$, define $D_i \in [0,1]$ as the cluster-level measure of intervention received and $L_i(d,a)$ as the potential outcome of $L_i$, the log-contrast, if cluster $i$ received intervention $a$, $a \in \{0,1\}$, and complied with the intervention as measured by $d$, $d \in [0,1]$. Here, $L_i(d,a)$ extends $L_i(a)$, defined in Section 4.1, by accounting for compliance: if $d = a$, then cluster $i$ receives a perfect exposure to the intervention, or control, respectively, as $a = 1$ or $a = 0$, and $L_i(a,a) = L_i(a)$; if $d \in (0,1)$, then cluster $i$ only has partial intervention compliance. We make the following assumptions to allow identification of the dose-response relationship.

Assumption 2 (Linear dose-response relationship). For $i = 1, \dots, m$:

(i) Consistency: $L_i = L_i(D_i, A_i)$.

(ii) Exclusion restriction: $L_i(d,1) = L_i(d,0)$ for $d \in [0,1]$.

(iii) The linear model: $L_i(d,a) = L_i(0,a) + \gamma d$ for $\gamma \in \mathbb{R}$ and $a \in \{0,1\}$.

Assumption 2(i) connects the observed data and the counterfactual outcome by letting $d = D_i$ and $a = A_i$. Assumption 2(ii) indicates that the intervention assignment has no direct effect on the outcomes, given the actual intervention score $d$. Both Assumptions 2(i) and (ii) are standard assumptions for instrumental variable methods (Angrist, Imbens and Rubin (1996)). Assumption 2(iii) specifies a linear model, indicating that the intervention effect is proportional to the “dose” received.

We base our approach on randomization inference with instrumental variables (Rosenbaum (2002), Section 5.4) and its extension for group randomization (Small, Ten Have and Rosenbaum (2008)), generalizing from a binary intervention setting to a continuous intervention setting. In the special case of perfect compliance, that is, $D_i = A_i$ for all $i$, Assumption 2 is compatible with Assumption 1 or 1′ with $\gamma = \log\lambda$.
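As a quick check of the perfect-compliance case, the following derivation sketches why $\gamma$ coincides with $\log\lambda$. It uses only Assumption 2 and the relation $L_i(1) = \log(\lambda) + L_i(0)$ from the constant intervention effect model:

$$
\begin{aligned}
L_i(1) = L_i(1,1) &= L_i(0,1) + \gamma && \text{(Assumption 2(iii) with } d = 1,\ a = 1\text{)}\\
&= L_i(0,0) + \gamma && \text{(Assumption 2(ii) with } d = 0\text{)}\\
&= L_i(0) + \gamma.
\end{aligned}
$$

Comparing this with $L_i(1) = \log(\lambda) + L_i(0)$ gives $\gamma = \log\lambda$.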

Given the null hypothesis $H_0: \gamma = \gamma_0$, we can compute the covariate-adjusted test statistic and estimate its variance similarly to the covariate-adjusted estimator in Section 4.2. Specifically, let

$$T_{\gamma_0} = \frac{1}{\sqrt{\hat v}}\left\{\frac{1}{m_1}\sum_{i=1}^{m}A_i\left(L_i - \gamma_0 D_i - \hat\beta^{T}X_i\right) - \frac{1}{m-m_1}\sum_{i=1}^{m}(1-A_i)\left(L_i - \gamma_0 D_i - \hat\beta^{T}X_i\right)\right\},$$

where $\hat\beta$ is the regression coefficient vector of $L_i - \gamma_0 D_i$ on $X_i$, and $\hat v$ is the corresponding unbiased residual variance estimator multiplied by $\frac{m}{m_1(m-m_1)}$. Under the null, $T_{\gamma_0}$ follows a standard Normal distribution asymptotically. Then, hypothesis testing can be performed for $H_0$, based on a Normal approximation or by an exact permutation test, and a confidence interval can be obtained by inverting such tests (Rosenbaum (2002), Section 2.6). The point estimate for $\gamma$ can be obtained by a Hodges–Lehmann estimator; see Hodges and Lehmann (1963) or Section 2.7.2 of Rosenbaum (2002).
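The test-inversion procedure can be sketched in Python under simplifying assumptions that are not part of the original text: no covariates (so the residual variance reduces to the sample variance of $L_i - \gamma_0 D_i$ over all clusters), a Normal critical value of 1.96, and a grid search over a hypothetical range of $\gamma_0$ values. The function names and toy data are illustrative only.

```python
from math import sqrt
from statistics import mean, variance

def arm_diff(values, A):
    """Difference in means between intervention (A = 1) and control (A = 0)."""
    treated = [v for v, a in zip(values, A) if a == 1]
    control = [v for v, a in zip(values, A) if a == 0]
    return mean(treated) - mean(control)

def t_stat(gamma0, A, L, D):
    """Test statistic for H0: gamma = gamma0, without covariate adjustment."""
    adj = [l - gamma0 * d for l, d in zip(L, D)]  # remove the hypothesized dose effect
    m, m1 = len(A), sum(A)
    v_hat = variance(adj) * m / (m1 * (m - m1))
    return arm_diff(adj, A) / sqrt(v_hat)

def hodges_lehmann(A, L, D):
    """Value of gamma0 at which the difference in adjusted means equals zero."""
    return arm_diff(L, A) / arm_diff(D, A)

def confidence_interval(A, L, D, lo=-10.0, hi=10.0, step=0.01, crit=1.96):
    """Grid-based inversion: keep every gamma0 that the test does not reject."""
    n = int(round((hi - lo) / step))
    grid = [lo + k * step for k in range(n + 1)]
    kept = [g for g in grid if abs(t_stat(g, A, L, D)) <= crit]
    return (min(kept), max(kept)) if kept else None

# Hypothetical toy data with true gamma = -1
A = [1, 1, 1, 0, 0, 0]
D = [0.8, 0.7, 0.9, 0.2, 0.3, 0.1]
u = [0.1, -0.1, 0.0, 0.0, 0.1, -0.1]
L = [ui - di for ui, di in zip(u, D)]
gamma_hat = hodges_lehmann(A, L, D)
ci = confidence_interval(A, L, D)
```

Reporting only the minimum and maximum of the accepted grid points presumes that the accepted set is a single interval inside the search range; with instrumental variables this need not hold, so the full accepted set should be inspected in practice.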

5. Extensions to the stepped-wedge design.

5.1. Setup.

In a stepped-wedge design an intervention is sequentially assigned to clusters such that all clusters start with no intervention, and, at the end, all receive the intervention. Let $\{1, \dots, T\}$ be the time window of intervention allocation. For each $t \in \{1, \dots, T\}$, $q_t$ clusters start the intervention such that $\sum_{t=1}^{T} q_t = m$. Let $A_i$ be the time that cluster $i$ is assigned to the intervention group and $\Omega$ be the set of all possible values of $(A_1, \dots, A_m)$. Then, $\Omega$ contains $\binom{m}{q_1, \dots, q_T}$ entries. The randomization scheme implies that $P\{(A_1, \dots, A_m) = (t_1, \dots, t_m)\} = 1/\binom{m}{q_1, \dots, q_T}$ for any $(t_1, \dots, t_m) \in \Omega$.
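The size of $\Omega$ and a uniform draw from it can be sketched as follows. The helper names and the small design are hypothetical; the shuffle-based draw is one simple way to sample uniformly from all allocations with the prescribed step sizes.

```python
import random
from math import factorial

def n_allocations(q):
    """Multinomial coefficient m! / (q_1! ... q_T!), the number of entries in Omega."""
    total = factorial(sum(q))
    for qt in q:
        total //= factorial(qt)
    return total

def draw_start_times(q, rng):
    """Draw (A_1, ..., A_m) uniformly from Omega by shuffling the cluster labels."""
    m = sum(q)
    order = list(range(m))
    rng.shuffle(order)
    A = [0] * m
    pos = 0
    for t, qt in enumerate(q, start=1):
        for i in order[pos:pos + qt]:
            A[i] = t  # cluster i starts the intervention at time t
        pos += qt
    return A

q = [2, 1, 1]  # hypothetical design: m = 4 clusters, T = 3 steps
A = draw_start_times(q, random.Random(0))
```

For this toy design there are 12 possible allocations, so each has probability 1/12.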

Similar to the definition of $Y_{ij}(a)$, $Z_{ij}(a)$, $S_{ij}(a)$ in Section 2.1, we analogously define the counterfactuals $Y_{ijt}(a)$, $Z_{ijt}(a)$, $S_{ijt}(a)$, which represent the potential test-positive outcome, test-negative outcome, and healthcare-seeking behavior for participant $j$ in cluster $i$ at time $t$, respectively, if cluster $i$ has intervention status $a$ at time $t$, $t \in \{1, \dots, T\}$ and $a \in \{0,1\}$. Here, we make the simplifying assumption that the intervention effect is immediate and not altered by the duration of the intervention. Estimation of intervention effects that change over time or over the duration of the intervention is discussed further in Section 8. We again make the following consistency assumption:

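$$
\begin{aligned}
Y_{ijt} &= I\{A_i \le t\}Y_{ijt}(1) + I\{A_i > t\}Y_{ijt}(0),\\
Z_{ijt} &= I\{A_i \le t\}Z_{ijt}(1) + I\{A_i > t\}Z_{ijt}(0),\\
S_{ijt} &= I\{A_i \le t\}S_{ijt}(1) + I\{A_i > t\}S_{ijt}(0).
\end{aligned}
$$

Then, the observed data are $\left(A_i, \sum_{j=1}^{n_{it}}S_{ijt}Y_{ijt}, \sum_{j=1}^{n_{it}}S_{ijt}Z_{ijt}\right)$ for $i = 1, \dots, m$ and $t = 1, \dots, T$, where $n_{it}$ is the number of subjects in cluster $i$ at time $t$.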

Our goal is to estimate the relative risk comparing intervention to control, defined as

$$\lambda = \frac{\sum_{j=1}^{n_i}Y_{ijt}(1)}{\sum_{j=1}^{n_i}Y_{ijt}(0)}$$

for each $i = 1, \dots, m$ and $t = 1, \dots, T$, where we assume that $\sum_{j=1}^{n_i}Y_{ijt}(0) > 0$ and the relative risk is constant across clusters and time.

We make the following assumption, an extension of Assumption 1 to stepped-wedge CR-TNDs (SW-TND).

Assumption 3 (SW-TND). For each $i = 1, \dots, m$ and $t = 1, \dots, T$:

(i) The intervention has no effect on the test-negative outcomes: $\sum_{j=1}^{n_i}Z_{ijt}(1) = \sum_{j=1}^{n_i}Z_{ijt}(0)$.

(ii) At any time, the relative proportion of healthcare seekers comparing intervention and control does not differ between test-positives and test-negatives,

$$\frac{p_{it}(1,Y)}{p_{it}(0,Y)} = \frac{p_{it}(1,Z)}{p_{it}(0,Z)} = c_{it},$$

where for $a = 0, 1$, $p_{it}(a,Y) = \frac{\sum_{j=1}^{n_i}S_{ijt}(a)Y_{ijt}(a)}{\sum_{j=1}^{n_i}Y_{ijt}(a)}$ and $p_{it}(a,Z) = \frac{\sum_{j=1}^{n_i}S_{ijt}(a)Z_{ijt}(a)}{\sum_{j=1}^{n_i}Z_{ijt}(a)}$ are the proportions of healthcare seekers among subjects with $Y_{ijt}(a) = 1$ and $Z_{ijt}(a) = 1$, respectively.

For the stepped-wedge design, Assumption 3 essentially assumes Assumption 1 at each time point $t$. If the time period $T$ is relatively short, a test-negative design that satisfies Assumption 1 would be likely to satisfy Assumption 3 also. Note that we do not make any assumption on the temporal trend of the test-positive or test-negative diseases, which can vary across clusters. Similar to the parallel-arm CR-TND, we use $c_{it}$ to denote the relative ascertainment for cluster $i$ at time $t$.

5.2. Estimation.

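We define $L_{it} = \log\left\{\sum_{j=1}^{n_i}S_{ijt}Y_{ijt}\right\} - \log\left\{\sum_{j=1}^{n_i}S_{ijt}Z_{ijt}\right\}$ and, for $a = 0, 1$, $L_{it}(a) = \log\left\{\sum_{j=1}^{n_i}S_{ijt}(a)Y_{ijt}(a)\right\} - \log\left\{\sum_{j=1}^{n_i}S_{ijt}(a)Z_{ijt}(a)\right\}$. With this setup and Assumption 3, it follows that

$$L_{it} = I\{A_i \le t\}L_{it}(1) + I\{A_i > t\}L_{it}(0), \qquad L_{it}(1) = \log(\lambda) + L_{it}(0).$$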
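For completeness, the second identity can be verified in a few lines. By the definition of $p_{it}(a,Y)$ and $p_{it}(a,Z)$, $\sum_{j}S_{ijt}(a)Y_{ijt}(a) = p_{it}(a,Y)\sum_{j}Y_{ijt}(a)$ and likewise for $Z$, so that

$$
\begin{aligned}
L_{it}(1) - L_{it}(0)
&= \log\frac{p_{it}(1,Y)}{p_{it}(0,Y)} + \log\frac{\sum_{j}Y_{ijt}(1)}{\sum_{j}Y_{ijt}(0)} - \log\frac{p_{it}(1,Z)}{p_{it}(0,Z)} - \log\frac{\sum_{j}Z_{ijt}(1)}{\sum_{j}Z_{ijt}(0)}\\
&= \log c_{it} + \log\lambda - \log c_{it} - 0 = \log\lambda,
\end{aligned}
$$

where the first and third terms cancel by Assumption 3(ii), the second term is the definition of $\lambda$, and the last term vanishes by Assumption 3(i). In particular, the relative ascertainment $c_{it}$ drops out regardless of how it varies across clusters and time.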

We construct the following estimator for log(λ), referred to as the stepped-wedge log-contrast estimator:

$$\widehat{\log(\lambda)}^{(SW)} = \sum_{t=1}^{T-1}w_t\left\{\frac{1}{m_t}\sum_{i=1}^{m}I\{A_i \le t\}L_{it} - \frac{1}{m-m_t}\sum_{i=1}^{m}I\{A_i > t\}L_{it}\right\}, \tag{8}$$

where $m_t = \sum_{t'=1}^{t}q_{t'}$ is the number of clusters in the intervention group at time $t$ and $w_t$ are arbitrary prespecified weights with $\sum_{t=1}^{T-1}w_t = 1$. The proposed estimator $\widehat{\log(\lambda)}^{(SW)}$ is a weighted average of the difference-in-means estimators across $t$. Since each of the difference-in-means estimators is unbiased for $\log(\lambda)$, the estimator $\widehat{\log(\lambda)}^{(SW)}$ is also unbiased for $\log(\lambda)$.
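Equation (8) translates directly into a short Python sketch. The function name, the data layout (one list of $T$ log-contrasts per cluster), and the toy data are assumptions made for illustration.

```python
from math import log
from statistics import mean

def sw_log_contrast(A, L, w):
    """Stepped-wedge log-contrast estimator.

    A[i]    : time at which cluster i starts the intervention (1..T)
    L[i][t-1]: observed log-contrast of cluster i at time t
    w       : prespecified weights for t = 1..T-1, summing to one
    """
    m = len(A)
    est = 0.0
    for t, wt in enumerate(w, start=1):
        treated = [L[i][t - 1] for i in range(m) if A[i] <= t]
        control = [L[i][t - 1] for i in range(m) if A[i] > t]
        est += wt * (mean(treated) - mean(control))  # difference in means at time t
    return est

# Hypothetical toy data: a common time trend plus a constant effect log(0.6)
log_lambda = log(0.6)
trend = [0.3, -0.2, 0.5]
A = [1, 2, 3]
L = [[trend[t - 1] + (log_lambda if A[i] <= t else 0.0) for t in (1, 2, 3)]
     for i in range(3)]
est = sw_log_contrast(A, L, w=[0.5, 0.5])
```

The sketch requires at least one treated and one untreated cluster at every $t = 1, \dots, T-1$, which is why the sum stops at $T-1$: at time $T$ no control clusters remain.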

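The variance of $\widehat{\log(\lambda)}^{(SW)}$ is $w^{T}\Sigma w$, where

$$
\begin{aligned}
w &= (w_1, \dots, w_{T-1})^{T},\\
\Sigma &= \begin{pmatrix}
\frac{m}{m_1(m-m_1)}S_{1,1} & \cdots & \frac{m}{m_{T-1}(m-m_1)}S_{T-1,1}\\
\vdots & \ddots & \vdots\\
\frac{m}{m_{T-1}(m-m_1)}S_{T-1,1} & \cdots & \frac{m}{m_{T-1}(m-m_{T-1})}S_{T-1,T-1}
\end{pmatrix},\\
S_{t_1,t_2} &= \frac{1}{m-1}\sum_{i=1}^{m}\left\{L_{it_1}(0) - \overline{L_{t_1}(0)}\right\}\left\{L_{it_2}(0) - \overline{L_{t_2}(0)}\right\},\\
\overline{L_{t}(0)} &= \frac{1}{m}\sum_{i=1}^{m}L_{it}(0).
\end{aligned}
$$

Given $w$, the variance can be unbiasedly estimated by substituting $\hat\Sigma$ for $\Sigma$, where $\hat\Sigma$ is an unbiased estimator of $\Sigma$ with the $(t_1,t_2)$ entry ($t_1 \le t_2$) being $\frac{m}{m_{t_2}(m-m_{t_1})}\hat S_{t_1,t_2}$, where

$$
\hat S_{t_1,t_2} =
\begin{cases}
\hat c_{(1,1)}(t_1,t_2) & \text{if } m_{t_1} \ge \max\{m_{t_2} - m_{t_1},\ m - m_{t_2}\},\\
\hat c_{(0,1)}(t_1,t_2) & \text{if } m_{t_2} - m_{t_1} \ge \max\{m_{t_1},\ m - m_{t_2}\},\\
\hat c_{(0,0)}(t_1,t_2) & \text{if } m - m_{t_2} \ge \max\{m_{t_1},\ m_{t_2} - m_{t_1}\},
\end{cases}
$$

where $\hat c_{(1,1)}(t_1,t_2)$, $\hat c_{(0,1)}(t_1,t_2)$, and $\hat c_{(0,0)}(t_1,t_2)$ are the sample covariances between $L_{it_1}$ and $L_{it_2}$ among clusters with $A_i \le t_1$, $t_1 < A_i \le t_2$, and $A_i > t_2$, respectively.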

For the weights $w$, a convenient option is to set $w_t = (T-1)^{-1}$. To optimize the efficiency of the proposed estimator, we can minimize the variance by setting

$$w = \frac{\Sigma^{-1}1_{T-1}}{1_{T-1}^{T}\Sigma^{-1}1_{T-1}}, \tag{9}$$

where $1_{T-1}$ is a $(T-1)$-dimensional column vector with all entries 1. Given the null hypothesis $H_0: \lambda = \lambda_0$, $w$ can be exactly computed; in other cases, it can be approximated by plugging in $\hat\Sigma$. When $m$ is small and $T$ is large, however, such a plug-in estimator of $w$ can be less stable than a prespecified $w$, since $\hat\Sigma$ may not be invertible and, when it is, its inverse can be highly variable.
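Equation (9) can be sketched in Python without external libraries by solving $\Sigma z = 1_{T-1}$ and normalizing $z$. The Gauss–Jordan solver, the function names, and the $2 \times 2$ covariance matrix are hypothetical choices for illustration; the solver raises an error when the matrix is (nearly) singular, the situation described above for small $m$ and large $T$.

```python
def solve(M, b):
    """Solve M z = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [bi] for row, bi in zip(M, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[piv][col]) < 1e-12:
            raise ValueError("matrix is singular or nearly singular")
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(n):
            if r != col:
                f = aug[r][col] / aug[col][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def optimal_weights(Sigma):
    """Weights proportional to Sigma^{-1} 1, normalized to sum to one."""
    z = solve(Sigma, [1.0] * len(Sigma))
    s = sum(z)
    return [zi / s for zi in z]

def quad_form(w, Sigma):
    """Variance w^T Sigma w of the weighted estimator."""
    k = len(w)
    return sum(w[i] * Sigma[i][j] * w[j] for i in range(k) for j in range(k))

Sigma = [[2.0, 0.0], [0.0, 1.0]]   # hypothetical covariance matrix, T - 1 = 2
w_opt = optimal_weights(Sigma)
var_opt = quad_form(w_opt, Sigma)
var_equal = quad_form([0.5, 0.5], Sigma)
```

In this toy case the noisier time point receives the smaller weight, and the resulting variance is below that of equal weights.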

Given the proposed estimator and variance estimator, statistical inference can then be performed, as described in Section 4.1. Again, it is possible to carry out exact inference using the permutation distribution.

6. Simulation studies.

6.1. Simulation setup.

We consider three simulation studies that are based on historical dengue and OFI data in Yogyakarta. The first simulation study verifies the unbiasedness of the proposed log-contrast estimator for CR-TNDs and compares it to the existing estimators defined in Section 3. The second simulation study demonstrates the validity of the proposed method for partial compliance. The third simulation study focuses on the SW-TND setting, where we evaluate the performance of the stepped-wedge log-contrast estimator. For all simulations we consider λ=1, 0.6, 0.2, which represent a zero, moderate, and large intervention effect, respectively.

The first simulation study is based on reported (serious) dengue cases from 2013–2015 and OFI cases from 2014–2015 for each of the 24 clusters in Yogyakarta. In addition, cluster population size (measured in 10,000s) in 2015 is used as a covariate. All data are available in Table S1 of the Supplementary Material of Utarini et al. (2021). Since no information on relative ascertainment is available, we independently sample $c_i$, $i = 1, \dots, 24$, from a Beta distribution with the shape parameter set to 0.5. Of note, relative ascertainment is generated once and applied to all simulated data sets, representing a fixed latent characteristic of the 24 clusters; if $c_i$ were instead sampled anew for each simulated data set, a distributional assumption (here, the Beta distribution) would be placed on relative ascertainment across data sets, a scenario that is not of interest here, as discussed in Section 4.1. From the above information, $\tilde O_i^{Y}$, $\tilde O_i^{Z}$, and $X_i$ denote ascertained dengue cases, OFI cases, and population size, respectively.

The data are simulated through the following steps. First, we randomly generate $\{O_1^{Y}(0), \dots, O_{24}^{Y}(0)\}$ from a multinomial distribution with parameters $\left(n^{Y}, \tilde O_1^{Y}/n^{Y}, \dots, \tilde O_{24}^{Y}/n^{Y}\right)$ and generate $\{O_1^{Z}(0), \dots, O_{24}^{Z}(0)\}$ from a multinomial distribution with parameters $\left(n^{Z}, \tilde O_1^{Z}/n^{Z}, \dots, \tilde O_{24}^{Z}/n^{Z}\right)$, where $n^{Y} = \sum_{i=1}^{24}\tilde O_i^{Y}$ and $n^{Z} = \sum_{i=1}^{24}\tilde O_i^{Z}$. We then introduce correlation between the outcome and covariate by multiplying $O_i^{Y}(0)$ by 2Xi and dividing $O_i^{Z}(0)$ by 2Xi. Next, we set $O_i^{Y}(1) = \lambda c_i O_i^{Y}(0)$ and $O_i^{Z}(1) = c_i O_i^{Z}(0)$. Finally, letting $(A_1, \dots, A_{24})$ be the random intervention allocation following the distribution (1) with $m_1 = 12$, we define $O_i^{Y} = A_i O_i^{Y}(1) + (1 - A_i)O_i^{Y}(0)$ and $O_i^{Z} = A_i O_i^{Z}(1) + (1 - A_i)O_i^{Z}(0)$. The simulated data are $\{(A_i, O_i^{Y}, O_i^{Z}, X_i)\}_{i=1}^{24}$.
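The data-generating steps above can be sketched in Python as follows. This is a simplified illustration, not the authors' simulation code: the function names and the four-cluster historical counts are hypothetical, the relative ascertainment values are passed in as fixed inputs rather than sampled, and the step that introduces correlation with the covariate is omitted.

```python
import random

def multinomial(n, weights, rng):
    """Draw multinomial counts with n trials and probabilities proportional to weights."""
    draws = rng.choices(range(len(weights)), weights=weights, k=n)
    counts = [0] * len(weights)
    for d in draws:
        counts[d] += 1
    return counts

def simulate_cr_tnd(hist_Y, hist_Z, c, lam, m1, rng):
    """Generate one parallel-arm CR-TND data set from historical counts."""
    m = len(hist_Y)
    Y0 = multinomial(sum(hist_Y), hist_Y, rng)       # control test-positive counts
    Z0 = multinomial(sum(hist_Z), hist_Z, rng)       # control test-negative counts
    Y1 = [lam * ci * y for ci, y in zip(c, Y0)]      # O_i^Y(1) = lambda * c_i * O_i^Y(0)
    Z1 = [ci * z for ci, z in zip(c, Z0)]            # O_i^Z(1) = c_i * O_i^Z(0)
    treated = set(rng.sample(range(m), m1))          # complete randomization of m1 clusters
    A = [1 if i in treated else 0 for i in range(m)]
    OY = [y1 if a else y0 for a, y1, y0 in zip(A, Y1, Y0)]
    OZ = [z1 if a else z0 for a, z1, z0 in zip(A, Z1, Z0)]
    return {"A": A, "OY": OY, "OZ": OZ, "Y0": Y0, "Z0": Z0}

# Hypothetical inputs for four clusters
hist_Y = [30, 50, 20, 40]
hist_Z = [60, 80, 70, 90]
c = [0.8, 1.2, 1.0, 0.9]
sim = simulate_cr_tnd(hist_Y, hist_Z, c, lam=0.6, m1=2, rng=random.Random(1))
```

Keeping `c` fixed across repeated calls mirrors the design choice in the text that relative ascertainment is a fixed latent characteristic of the clusters rather than a random draw per data set.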

In the second simulation we first define $O_i^{Y}(0,0)$ and $O_i^{Z}(0,0)$ as $O_i^{Y}(0)$ and $O_i^{Z}(0)$, respectively, as in the first simulation study. We then independently generate $D_i(1)$, $i = 1, \dots, 24$, from a uniform distribution on [0.6, 1] and $D_i(0)$, $i = 1, \dots, 24$, from a uniform distribution on [0, 0.4], which represent counterfactual partial compliance. Next, we define $O_i^{Y} = O_i^{Y}(D_i(A_i), A_i) = \{A_i c_i + (1 - A_i)\}O_i^{Y}(0,0)e^{\gamma D_i(A_i)}$ and $O_i^{Z} = O_i^{Z}(D_i(A_i), A_i) = \{A_i c_i + (1 - A_i)\}O_i^{Z}(0,0)$, where $c_i$ and $A_i$ are generated in the same way as in the first simulation study. The simulated data are $\{A_i, O_i^{Y}, O_i^{Z}, X_i\}_{i=1}^{24}$.

For the third simulation study we use observed dengue cases collected for every two consecutive years between 2003 and 2014 for each of the 24 clusters in Yogyakarta. Due to lack of data in the years 2004 and 2009, each cluster has nine data points. The data are available in the Supplementary Material of Jewell et al. (2019), Table 2. We use $\tilde O_{it}^{Y}$ to denote the above dengue cases for $i = 1, \dots, 24$ and $t = 1, \dots, 9$ and define $n_t^{Y} = \sum_{i=1}^{24}\tilde O_{it}^{Y}$. OFI data are only available for the years 2014–2015, which are the same as those used in the first simulation study, and we keep the notation of $\tilde O_i^{Z}$ and $n^{Z}$. To generate the OFI case counts for earlier years, we define $\tilde O_{it}^{Z} = \tilde O_i^{Z}n_t^{Y}/n_9^{Y}$ and $n_t^{Z} = n^{Z}n_t^{Y}/n_9^{Y}$. For each $t$ we generate $c_{it}$ and $O_{it}^{Y}(a)$, $O_{it}^{Z}(a)$, $a = 0, 1$, following a similar procedure to the first simulation study by substituting $(\tilde O_{it}^{Y}, \tilde O_{it}^{Z})$ for $(\tilde O_i^{Y}, \tilde O_i^{Z})$, except that the covariate $X_i$ is omitted. Specifically, $c_{it}$ is independently generated for each $i$ and $t$, representing both temporal and cluster variation of relative ascertainment. Intervention starts at $t = 2$ and, for each $t = 2, \dots, 9$, three untreated clusters are randomly selected to start intervention. We then define $O_{it}^{Y} = I\{A_i < t\}O_{it}^{Y}(1) + I\{A_i \ge t\}O_{it}^{Y}(0)$ and $O_{it}^{Z} = I\{A_i < t\}O_{it}^{Z}(1) + I\{A_i \ge t\}O_{it}^{Z}(0)$. The simulated data are $\{(A_i, O_{i1}^{Y}, \dots, O_{i9}^{Y}, O_{i1}^{Z}, \dots, O_{i9}^{Z})\}_{i=1}^{24}$. Furthermore, our data generating distribution implies that, within each simulated data set, the temporal correlation of outcomes varies across clusters.

Table 2.

Simulation results under the CR-TND setting. For each of the estimators, we report the bias, standard error of the estimates (SE), average of the standard error estimates (ASE), probability of rejecting the null hypothesis (PoR), and the coverage probability of nominal 0.95 confidence intervals based on Normal approximations (CP). The bias, SE, and ASE are on the log scale

| Setting | Estimator | Bias | SE | ASE | PoR | CP |
|---|---|---|---|---|---|---|
| λ = 1 (log λ = 0) | Odds ratio | 0.14 | 0.32 | 0.31 | 0.09 | 0.91 |
| | Test-positive fraction | −0.00 | 0.26 | | 0.06 | 0.94 |
| | GLMM | 0.05 | 0.27 | 0.25 | 0.09 | 0.91 |
| | GEE | 0.14 | 0.26 | 0.24 | 0.14 | 0.86 |
| | Log-contrast | 0.00 | 0.31 | 0.32 | 0.06 | 0.94 |
| | Covariate-adjusted | −0.00 | 0.27 | 0.27 | 0.06 | 0.94 |
| λ = 0.6 (log λ = −0.51) | Odds ratio | 0.14 | 0.32 | 0.31 | 0.20 | 0.89 |
| | Test-positive fraction | 0.08 | 0.26 | | 0.37 | 0.94 |
| | GLMM | 0.05 | 0.27 | 0.25 | 0.42 | 0.91 |
| | GEE | 0.13 | 0.27 | 0.24 | 0.35 | 0.86 |
| | Log-contrast | 0.00 | 0.31 | 0.32 | 0.35 | 0.94 |
| | Covariate-adjusted | −0.00 | 0.27 | 0.27 | 0.47 | 0.94 |
| λ = 0.2 (log λ = −1.61) | Odds ratio | 0.14 | 0.32 | 0.35 | 1.00 | 0.92 |
| | Test-positive fraction | 0.34 | 0.23 | | 1.00 | 0.88 |
| | GLMM | 0.07 | 0.27 | 0.25 | 1.00 | 0.89 |
| | GEE | 0.11 | 0.28 | 0.25 | 1.00 | 0.87 |
| | Log-contrast | 0.00 | 0.31 | 0.32 | 1.00 | 0.94 |
| | Covariate-adjusted | −0.00 | 0.27 | 0.27 | 1.00 | 0.94 |

We simulate 10,000 data sets and estimate the log relative risk, $\log(\lambda)$, in each simulation study. In the first simulation we compare the odds ratio estimator, test-positive fraction estimator, GLMM estimator, GEE estimator, and our proposed log-contrast estimator and covariate-adjusted estimator. For the test-positive fraction estimator we follow the method of Jewell et al. (2019) and obtain $\hat\lambda$ by solving equation (4) with plugged-in $T$ and $r$. In the second simulation we evaluate the performance of the estimating procedure, described in Section 4.3, with or without covariate adjustment. For the third simulation study we compare the stepped-wedge log-contrast estimator with equal weights or optimal weights to the GLMM and GEE estimators. To model temporal correlation, the GLMM includes random effects for the cluster intercept and the cluster-by-time intercept, as suggested by Ji et al. (2017); for GEE we use an exchangeable correlation structure. Due to the limited number of clusters, the optimal weights $w$ are estimated assuming the true relative risk is known, since otherwise the estimated covariance matrix $\hat\Sigma$ is often not invertible; our simulation study for the optimal weights is hence designed for hypothesis testing.

The comparison metrics are bias, standard error of the estimates, average of the standard error estimates, probability of rejecting the null hypothesis (i.e., λ=1), and the coverage probability of nominal 0.95 confidence intervals based on Normal approximations.

6.2. Simulation results.

Table 2 summarizes the results of the first simulation study. Among all estimators our proposed log-contrast and covariate-adjusted estimators are unbiased and achieve the desired coverage probability, while all other estimators show bias for most values of λ. Due to varying relative ascertainment across clusters, the odds ratio, GLMM, and GEE estimators have bias and inflated type I error that results in 3–6%, 4–6%, and 8–9% under-coverage, respectively. The test-positive fraction estimator is valid if λ=1, while its bias and type I error increase as the true relative risk moves away from the null; such a bias can be as high as 0.34 when there is a strong intervention effect. In terms of variance, since the covariate is designed to be prognostic, the covariate-adjusted estimator has 24% smaller variance than the log-contrast estimator, showing that adjustment for prognostic baseline variables can improve precision. The model-based estimators have standard errors similar to that of the covariate-adjusted estimator, but their standard error estimates consistently underestimate the true standard error (that is, ASE is smaller than SE), which contributes to the inflated type I error of model-based inference. The test-positive fraction estimator has the smallest standard error among all estimators, but its validity is hampered by its bias; furthermore, a standard error estimate of the test-positive fraction estimator is more difficult to compute directly and so is omitted here; see Jewell et al. (2019).
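The stated variance reduction can be checked approximately from the rounded standard errors in Table 2 (0.27 for the covariate-adjusted estimator and 0.31 for the log-contrast estimator); because the inputs are rounded to two decimals, the result is only indicative:

$$1 - \frac{0.27^{2}}{0.31^{2}} = 1 - \frac{0.0729}{0.0961} \approx 0.24.$$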

To examine the impact of different choices of relative ascertainment, we independently repeat the first simulation study with λ=0.6 100 times, each with a distinct configuration of the $c_i$'s. For each estimator we present the distribution of bias and coverage probability in Figure 1, which further confirms the findings in Table 2. Despite changes in relative ascertainment, the log-contrast and covariate-adjusted estimators remain unbiased and maintain correct coverage probability. The other estimators, however, have more dispersed and shifted distributions of absolute bias and coverage probability, implying bias and incorrect coverage whose magnitude depends on the characteristics of relative ascertainment.

Fig. 1.

Distributions of the absolute value of bias (left panel) and the coverage probability of Normal-approximated 95% confidence interval (right panel) for each estimator over 100 replications of the first simulation study. Each replication uses an independently-generated relative ascertainment which causes variation in the bias and coverage probability.

Table 3 shows the simulation results under partial compliance. Across all γ, our proposed methods have negligible bias and correct coverage, as expected. Furthermore, by adjusting for covariates, power is increased, and the average length of the 95% confidence interval is shortened by 34%, demonstrating the benefit of covariate adjustment in improving precision.

Table 3.

Simulation results for discovering dose-response relationship under partial compliance. We consider the estimating procedure described in Section 4.3, with or without covariate adjustment, and report the bias, probability of rejecting the null hypothesis (PoR), and the coverage probability of nominal 0.95 confidence intervals (CP)

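| γ | Covariate adjustment | Bias | PoR | CP |
|---|---|---|---|---|
| log(1) = 0 | No | 0.00 | 0.05 | 0.95 |
| | Yes | 0.00 | 0.04 | 0.96 |
| log(0.6) = −0.51 | No | −0.01 | 0.10 | 0.95 |
| | Yes | 0.00 | 0.13 | 0.95 |
| log(0.2) = −1.61 | No | −0.01 | 0.46 | 0.95 |
| | Yes | 0.00 | 0.81 | 0.95 |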

Table 4 displays the performance of estimators for the SW-TND. Similar to the first simulation study, GLMM and GEE lead to biased effect estimation, underestimation of the standard error, and under-coverage of the 95% confidence interval. In contrast, the stepped-wedge log-contrast estimators remain unbiased and maintain the 0.05 type I error rate as desired. By using the optimal weights, the variance is reduced by 27%, compared to the equal weights. The equal weights, however, eliminate the uncertainty in the estimation of weights and, hence, are more stable than the optimal weights for data with large $T$ and small $m$. Comparing model-based inference and randomization-based inference, we observe that the former is more powerful than the latter under a stepped-wedge design, resembling the results in Table 5 of Ji et al. (2017). The power gain of model-based inference comes from additional model assumptions: both GLMM and GEE assume the temporal trend is the same across clusters and directly model temporal correlation, while the randomization-based inference does not make such an assumption. When this assumption is violated, the validity of model-based inference is affected, as reflected by the bias and inflated type I error in this simulation study. In addition, the GLMM may further lead to bias and under-coverage since it targets a conditional effect, which differs from the marginal effect we consider here.

Table 4.

Simulation results under the SW-TND setting. For the log-contrast estimator (with equal or optimal weight), GLMM estimator, and GEE estimator, we report the average bias (Bias), standard error of the estimates (SE), average of the standard error estimates (ESE), probability of rejecting the null hypothesis (PoR), and coverage probability (CP). The bias, SE, and ESE are on the log scale

| Setting | Estimator | Bias | SE | ESE | PoR | CP |
|---|---|---|---|---|---|---|
| λ = 1 (log λ = 0) | Log-contrast (equal weight) | 0.00 | 0.21 | 0.20 | 0.06 | 0.94 |
| | Log-contrast (optimal weight) | 0.00 | 0.18 | 0.18 | 0.05 | 0.95 |
| | GLMM | 0.01 | 0.10 | 0.08 | 0.10 | 0.90 |
| | GEE | 0.06 | 0.06 | 0.07 | 0.11 | 0.89 |
| λ = 0.6 (log λ = −0.51) | Log-contrast (equal weight) | 0.00 | 0.21 | 0.20 | 0.70 | 0.94 |
| | Log-contrast (optimal weight) | 0.00 | 0.18 | 0.18 | 0.80 | 0.95 |
| | GLMM | 0.02 | 0.10 | 0.08 | 1.00 | 0.89 |
| | GEE | 0.06 | 0.06 | 0.07 | 1.00 | 0.87 |
| λ = 0.2 (log λ = −1.61) | Log-contrast (equal weight) | 0.00 | 0.21 | 0.20 | 1.00 | 0.94 |
| | Log-contrast (optimal weight) | 0.00 | 0.18 | 0.18 | 1.00 | 0.95 |
| | GLMM | 0.03 | 0.11 | 0.10 | 1.00 | 0.91 |
| | GEE | 0.07 | 0.06 | 0.08 | 1.00 | 0.89 |

Table 5.

Summary of data analysis. The point estimate and 95% confidence interval are on the original scale, while the standard error is on the log scale, that is, $SE\{\widehat{\log(\lambda)}\}$

| Estimator | Estimate | Standard error | 95% confidence interval |
|---|---|---|---|
| Odds ratio | 0.23 | 0.29 | (0.13, 0.40) |
| Test-positive fraction | 0.23 | | (0.07, 0.45) |
| GLMM | 0.24 | 0.18 | (0.17, 0.34) |
| GEE | 0.25 | 0.17 | (0.18, 0.35) |
| Log-contrast | 0.26 | 0.21 | (0.18, 0.40) |
| Covariate-adjusted | 0.25 | 0.16 | (0.18, 0.34) |

6.3. Sensitivity to imprecise tests.

To further explore the sensitivity of our methods to violations of assumptions, we set the laboratory test for determining $(Y_{ij}, Z_{ij})$ to have 10% false-positive and 10% false-negative rates. We then repeat all three simulation studies, where the only change is that estimation is based on the imprecise case counts, instead of the true case counts. In this setting, letting $\{Q_i^{Y}(a), Q_i^{Z}(a)\}$ denote the inaccurate case counts based on the truth $\{O_i^{Y}(a), O_i^{Z}(a)\}$, we have

$$Q_i^{Y}(1) = \lambda c_i Q_i^{Y}(0) + 0.1(1-\lambda)c_i O_i^{Z}(0), \qquad Q_i^{Z}(1) = c_i Q_i^{Z}(0) - 0.1(1-\lambda)c_i O_i^{Y}(0),$$

leading to violations of equation (2), due to the additional terms $0.1(1-\lambda)c_i O_i^{Z}(0)$ and $0.1(1-\lambda)c_i O_i^{Y}(0)$.
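These identities can be recovered under one explicit reading of the misclassification mechanism, which is an assumption of this sketch: each true test-positive is recorded as test-negative with probability 0.1 and vice versa, and counts are replaced by their expectations, so that $Q_i^{Y}(a) = 0.9\,O_i^{Y}(a) + 0.1\,O_i^{Z}(a)$ and $Q_i^{Z}(a) = 0.9\,O_i^{Z}(a) + 0.1\,O_i^{Y}(a)$. Using $O_i^{Y}(1) = \lambda c_i O_i^{Y}(0)$ and $O_i^{Z}(1) = c_i O_i^{Z}(0)$,

$$
\begin{aligned}
Q_i^{Y}(1) &= 0.9\lambda c_i O_i^{Y}(0) + 0.1 c_i O_i^{Z}(0)\\
&= \lambda c_i\{0.9\,O_i^{Y}(0) + 0.1\,O_i^{Z}(0)\} + 0.1(1-\lambda)c_i O_i^{Z}(0)\\
&= \lambda c_i Q_i^{Y}(0) + 0.1(1-\lambda)c_i O_i^{Z}(0),\\[4pt]
Q_i^{Z}(1) &= 0.9 c_i O_i^{Z}(0) + 0.1\lambda c_i O_i^{Y}(0)\\
&= c_i\{0.9\,O_i^{Z}(0) + 0.1\,O_i^{Y}(0)\} - 0.1(1-\lambda)c_i O_i^{Y}(0)\\
&= c_i Q_i^{Z}(0) - 0.1(1-\lambda)c_i O_i^{Y}(0).
\end{aligned}
$$

Both extra terms vanish when $\lambda = 1$, which is consistent with the methods remaining valid under the null.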

The simulation results are given in the Appendix (Tables 6–8). Across all simulation studies and estimators, imprecise tests lead to positive bias, that is, they shrink the estimated intervention effect toward the null. The bias induced by test misclassification is comparable across estimators and increases as the intervention effect gets larger. These findings are consistent with existing results on imprecise tests (Orenstein et al. (2007)). When there is no treatment effect, that is, λ=1, our proposed methods remain valid. When λ is small, the large bias can lead to poor coverage, especially for an SW-TND.

7. Application to the AWED trial.

We reanalyze the AWED study using the estimators defined in Sections 3 and 4. For each cluster the dengue and OFI cases are each aggregated across the follow-up period. For all estimators a Normal approximation is used to construct confidence intervals. For GLMM, GEE, and covariate-adjusted estimators, the population size and the population proportion of children (age < 15) are used as cluster-level baseline variables. The AWED study used covariate constrained randomization to achieve covariate balance and improve precision; this constrained randomization provided the basis for hypothesis testing but was not used in confidence interval calculations. For simplicity of illustration, we also assumed complete randomization in our data application, which would not affect point estimates but might lead to a slight overestimate of the true study standard error (Li and Ding (2020)).

Table 5 gives the results of an intention-to-treat analysis. The point estimates of the six methods differ only slightly, reflecting the combined effect of small-sample random variation and potential bias for the odds ratio, test-positive fraction, GLMM, and GEE estimators; their similarity further confirms the results in Utarini et al. (2021). Among all estimators the covariate-adjusted estimator has the highest precision, while the test-positive fraction estimator is the least precise, as reflected by its wider confidence interval, which is obtained by inverting tests. Comparing the covariate-adjusted and log-contrast estimators, we see that adjusting for prognostic baseline variables can lead to substantial precision gains. The GLMM and GEE estimates also adjust for baseline variables, but their standard error estimates tend to be biased, as shown in simulations.
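As a rough consistency check, a Normal-approximation interval on the original scale can be computed from a point estimate and its log-scale standard error. The sketch below uses the rounded GLMM row of Table 5 as input; the function name is hypothetical, and because the inputs are rounded the result only approximately matches the reported interval.

```python
from math import exp, log

def normal_ci(estimate, se_log, z=1.96):
    """95% interval on the original scale from a log-scale standard error."""
    center = log(estimate)
    return exp(center - z * se_log), exp(center + z * se_log)

# Rounded GLMM entries from Table 5: estimate 0.24, standard error 0.18
lo, hi = normal_ci(0.24, 0.18)
```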

When considering partial compliance, we adopt the linear dose-response model and use the WEI score, defined in Section 4.3, as the actual intervention received. The rate parameter $\gamma$ is estimated at −3.42 (location of maximized p-value) with 95% confidence interval (−4.56, −2.34), implying an improved intervention effect as “dose” increases. In the observed dose range (0.22, 0.75), given in Section 4.3, $p\gamma/100$ can be interpreted as the change of logarithm of relative risk per $p$% increase of the WEI score, assuming the linear model is correctly specified.
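To make the interpretation concrete, the sketch below converts the estimated $\gamma$ into a multiplicative change in relative risk for a given increase in the WEI score. It reads a "$p$% increase" as an increase of $p/100$ on the $[0, 1]$ WEI scale, which is an assumption of this illustration, as is the function name.

```python
from math import exp

def rr_multiplier(gamma, p):
    """Multiplicative change in relative risk for a p/100 increase in WEI."""
    return exp(gamma * p / 100.0)

# With the reported estimate gamma = -3.42 and a 10-point increase in WEI
mult = rr_multiplier(-3.42, 10)
```

Under the linear model, a 10-point increase in WEI therefore corresponds to multiplying the relative risk by roughly 0.71, provided both dose levels lie within the observed range.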

8. Discussion.

Since its introduction, the CR-TND has attracted increasing attention as a cost-efficient and convenient design for cluster-randomized trials that assess the effectiveness of interventions. Building on the fundamental work by Anders et al. (2018a) and Jewell et al. (2019), we reexamined the current assumptions and methods for a CR-TND and presented a new approach that allows for cluster variation in relative participant recruitment. Our proposed estimator, the log-contrast estimator, eliminates the bias that may occur in existing methods and can improve precision by adjusting for cluster-level covariates. Furthermore, we extend our results to handle partial compliance and a stepped-wedge design.

Our proposed approaches are based on cluster-level information, that is, case counts, cluster-level covariates, and compliance data at the cluster level. When available, individual-level data can be summarized into cluster-level data and then analyzed by our proposed methods. Alternatively, one can explore individual-level analyses, using GLMM and GEE, for individual covariate adjustment (described in Section 3.3) and individual-level instrumental variable methods (Small, Ten Have and Rosenbaum (2008), Clarke and Windmeijer (2012)). The validity of GLMM and GEE, however, relies on strong model assumptions; furthermore, Su and Ding (2021) and Wang et al. (2021) both showed that an individual-level analysis can be less precise than a cluster-level analysis in cluster-randomized trials. Individual-level instrumental variable methods, including their extension to CR-TNDs, remain a topic for future research.

Our simulation studies showed that imprecise tests could lead to bias in a CR-TND among all estimators we considered. Such bias could be alleviated by using a consistent diagnostic algorithm and blinded testing procedures (Anders et al. (2018b)) or using methods adapted from Endo, Funk and Kucharski (2020). In addition to imprecise tests, bias could also arise when Assumption 1 or 1′ is violated. In this case, if one could mathematically quantify the magnitude of the violation, then the bias of our methods might be analytically derived and then eliminated. For example, if prior studies showed that the intervention has an effect $\eta$ on test-negative illness, that is, $\sum_{j=1}^{n_i}Z_{ij}(1) = \eta\sum_{j=1}^{n_i}Z_{ij}(0)$, then the log-contrast estimator plus $\log(\eta)$ would be unbiased, given Assumption 1(ii) holds. On the contrary, if violations of assumptions are hard to track, then a valid statistical method could be challenging to derive, and researchers might thus need to reconsider the CR-TND design, for example, looking for an alternative test-negative illness that satisfies the CR-TND assumptions.
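The correction by $\log(\eta)$ can be seen from a short derivation. The notation below, $p_i(a,Y)$, $p_i(a,Z)$ and $c_i$, is assumed to be the parallel-arm analogue of the quantities in Assumption 3 with the time index dropped. If Assumption 1(ii) holds and $\sum_{j}Z_{ij}(1) = \eta\sum_{j}Z_{ij}(0)$, then

$$
\begin{aligned}
L_i(1) - L_i(0)
&= \log\frac{p_i(1,Y)}{p_i(0,Y)} + \log\frac{\sum_{j}Y_{ij}(1)}{\sum_{j}Y_{ij}(0)} - \log\frac{p_i(1,Z)}{p_i(0,Z)} - \log\frac{\sum_{j}Z_{ij}(1)}{\sum_{j}Z_{ij}(0)}\\
&= \log c_i + \log\lambda - \log c_i - \log\eta = \log\lambda - \log\eta.
\end{aligned}
$$

The difference in means of $L_i$ is therefore unbiased for $\log\lambda - \log\eta$, and adding $\log\eta$ restores unbiasedness for $\log\lambda$.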

When dealing with partial compliance, we used a parsimonious linear model for the dose-response relationship since the number of clusters is small and the doses have limited coverage over the [0, 1] interval. For discovering a more complex dose-response relationship, a kink or spline model could be used, although such models would, in general, need substantially more clusters or individual-level compliance data.

For the stepped-wedge design, when the intervention effect varies across clusters by intervention start time and by the duration of intervention, we can borrow methodology from Roth and Sant’Anna (2021) that established theory for estimating a series of estimands in a traditional stepped-wedge design; in addition, their covariate adjustment techniques and finite-sample asymptotic theory can also be adapted for test-negative designs.

The R code for simulations and data analyses is available in the Supplementary Material (Wang et al. (2023)).

APPENDIX: SIMULATION RESULTS FOR IMPRECISE TESTS

The simulation results for imprecise tests are summarized in Tables 6, 7, and 8 for CR-TNDs, partial compliance, and SW-TNDs, respectively.

Table 6.

Simulation results under the CR-TND setting with imprecise laboratory tests (10% false positive and negative rates). For each of the estimators, we report the bias, standard error of the estimates (SE), average of the standard error estimates (ASE), probability of rejecting the null hypothesis (PoR), and the coverage probability of nominal 0.95 confidence intervals based on Normal approximations (CP). The bias, SE, and ASE are on the log scale

| Setting | Estimator | Bias | SE | ASE | PoR | CP |
|---|---|---|---|---|---|---|
| λ = 1 (log λ = 0) | Odds ratio | 0.14 | 0.32 | 0.31 | 0.09 | 0.91 |
| | Test-positive fraction | −0.00 | 0.26 | | 0.06 | 0.94 |
| | GLMM | 0.05 | 0.27 | 0.25 | 0.09 | 0.91 |
| | GEE | 0.14 | 0.26 | 0.24 | 0.14 | 0.86 |
| | Log-contrast | 0.00 | 0.31 | 0.32 | 0.06 | 0.94 |
| | Covariate-adjusted | −0.00 | 0.27 | 0.27 | 0.06 | 0.94 |
| λ = 0.6 (log λ = −0.51) | Odds ratio | 0.23 | 0.37 | 0.37 | 0.12 | 0.88 |
| | Test-positive fraction | 0.20 | 0.27 | | 0.21 | 0.90 |
| | GLMM | 0.21 | 0.20 | 0.19 | 0.35 | 0.78 |
| | GEE | 0.22 | 0.20 | 0.18 | 0.36 | 0.74 |
| | Log-contrast | 0.17 | 0.30 | 0.31 | 0.21 | 0.91 |
| | Covariate-adjusted | 0.16 | 0.18 | 0.18 | 0.46 | 0.85 |
| λ = 0.2 (log λ = −1.61) | Odds ratio | 0.59 | 0.32 | 0.38 | 0.81 | 0.67 |
| | Test-positive fraction | 0.71 | 0.22 | | 0.97 | 0.31 |
| | GLMM | 0.64 | 0.17 | 0.19 | 1.00 | 0.07 |
| | GEE | 0.62 | 0.18 | 0.17 | 1.00 | 0.09 |
| | Log-contrast | 0.63 | 0.26 | 0.27 | 0.96 | 0.35 |
| | Covariate-adjusted | 0.63 | 0.16 | 0.17 | 1.00 | 0.03 |

Table 7.

Simulation results for discovering the dose-response relationship under partial compliance with imprecise laboratory tests (10% false positive and negative rates). We consider the estimating procedure, described in Section 4.3, with or without covariate adjustment and report the bias, probability of rejecting the null hypothesis (PoR), and the coverage probability of nominal 0.95 confidence intervals based on Normal approximations (CP).

| γ | Covariate adjustment | Bias | PoR | CP |
|---|---|---|---|---|
| log(1) = 0 | No | 0.00 | 0.05 | 0.95 |
| log(1) = 0 | Yes | 0.00 | 0.04 | 0.96 |
| log(0.6) = −0.51 | No | 0.16 | 0.10 | 0.94 |
| log(0.6) = −0.51 | Yes | 0.16 | 0.15 | 0.93 |
| log(0.2) = −1.61 | No | 0.62 | 0.52 | 0.75 |
| log(0.2) = −1.61 | Yes | 0.63 | 0.90 | 0.47 |

Table 8.

Simulation results under the SW-TND setting with imprecise laboratory tests (10% false positive and negative rates). For the log-contrast estimator (with equal or optimal weight), GLMM estimator, and GEE estimator, we report the average bias (Bias), standard error of the estimates (SE), average of the standard error estimates (ESE), probability of rejecting the null hypothesis (PoR), and coverage probability (CP). The bias, SE, and ESE are on the log scale.

| Setting | Estimator | Bias | SE | ESE | PoR | CP |
|---|---|---|---|---|---|---|
| λ = 1 (log λ = 0) | Log-contrast (equal weight) | 0.00 | 0.21 | 0.20 | 0.06 | 0.94 |
| | Log-contrast (optimal weight) | 0.00 | 0.18 | 0.18 | 0.05 | 0.95 |
| | GLMM | 0.01 | 0.10 | 0.08 | 0.10 | 0.90 |
| | GEE | 0.06 | 0.06 | 0.07 | 0.11 | 0.89 |
| λ = 0.6 (log λ = −0.51) | Log-contrast (equal weight) | 0.30 | 0.09 | 0.08 | 0.68 | 0.06 |
| | Log-contrast (optimal weight) | 0.27 | 0.08 | 0.09 | 0.75 | 0.07 |
| | GLMM | 0.31 | 0.04 | 0.04 | 1.00 | 0.00 |
| | GEE | 0.32 | 0.02 | 0.03 | 1.00 | 0.00 |
| λ = 0.2 (log λ = −1.61) | Log-contrast (equal weight) | 1.12 | 0.07 | 0.06 | 1.00 | 0.00 |
| | Log-contrast (optimal weight) | 1.10 | 0.09 | 0.10 | 0.99 | 0.00 |
| | GLMM | 1.14 | 0.04 | 0.04 | 1.00 | 0.00 |
| | GEE | 1.15 | 0.02 | 0.03 | 1.00 | 0.00 |

Footnotes

SUPPLEMENTARY MATERIAL

Code (DOI: 10.1214/22-AOAS1684SUPP; .zip). The supplementary file contains the R code associated with the simulations and data analyses. The code is also available at https://github.com/BingkaiWang/CR-TND.

REFERENCES

  1. Anders KL, Cutcher Z, Kleinschmidt I, Donnelly CA, Ferguson NM, Indriani C, Ryan PA, O’Neill SL, Jewell NP et al. (2018a). Cluster-randomized test-negative design trials: A novel and efficient method to assess the efficacy of community-level Dengue interventions. Am. J. Epidemiol 187 2021–2028.
  2. Anders KL, Indriani C, Ahmad RA, Tantowijoyo W, Arguni E, Andari B, Jewell NP, Rances E, O’Neill SL et al. (2018b). The AWED trial (applying Wolbachia to eliminate Dengue) to assess the efficacy of Wolbachia-infected mosquito deployments to reduce Dengue incidence in Yogyakarta, Indonesia: Study protocol for a cluster randomised controlled trial. Trials 19 1–16.
  3. Angrist JD, Imbens GW and Rubin DB (1996). Identification of causal effects using instrumental variables. J. Amer. Statist. Assoc 91 444–455.
  4. Aronow PM, Green DP and Lee DKK (2014). Sharp bounds on the variance in randomized experiments. Ann. Statist 42 850–871. MR3210989. doi:10.1214/13-AOS1200
  5. Breslow NE and Clayton DG (1993). Approximate inference in generalized linear mixed models. J. Amer. Statist. Assoc 88 9–25.
  6. Cattarino L, Rodriguez-Barraquer I, Imai N, Cummings DAT and Ferguson NM (2020). Mapping global variation in Dengue transmission intensity. Sci. Transl. Med 12. doi:10.1126/scitranslmed.aax4144
  7. Chua H, Feng S, Lewnard JA, Sullivan SG, Blyth CC, Lipsitch M and Cowling BJ (2020). The use of test-negative controls to monitor vaccine effectiveness: A systematic review of methodology. Epidemiology 31 43.
  8. Clarke PS and Windmeijer F (2012). Instrumental variable estimators for binary outcomes. J. Amer. Statist. Assoc 107 1638–1652. MR3036422. doi:10.1080/01621459.2012.734171
  9. Ding P, Feller A and Miratrix L (2016). Randomization inference for treatment effect variation. J. R. Stat. Soc. Ser. B. Stat. Methodol 78 655–671. MR3506797. doi:10.1111/rssb.12124
  10. Dufault SM and Jewell NP (2020a). Analysis of counts for cluster randomized trials: Negative controls and test-negative designs. Stat. Med 39 1429–1439. MR4098500. doi:10.1002/sim.8488
  11. Dutra HLC, Rocha MN, Dias FBS, Mansur SB, Caragata EP and Moreira LA (2016). Wolbachia blocks currently circulating Zika virus isolates in Brazilian Aedes aegypti mosquitoes. Cell Host Microbe 19 771–774.
  12. Endo A, Funk S and Kucharski AJ (2020). Bias correction methods for test-negative designs in the presence of misclassification. Epidemiol. Infect 148.
  13. Haber M, An Q, Foppa IM, Shay DK, Ferdinands JM and Orenstein WA (2015). A probability model for evaluating the bias and precision of influenza vaccine effectiveness estimates from case-control studies. Epidemiol. Infect 143 1417–1426.
  14. Hodges JL Jr. and Lehmann EL (1963). Estimates of location based on rank tests. Ann. Math. Stat 34 598–611. MR0152070. doi:10.1214/aoms/1177704172
  15. Hussey MA and Hughes JP (2007). Design and analysis of stepped wedge cluster randomized trials. Contemporary Clinical Trials 28 182–191.
  16. Jackson ML and Nelson JC (2013). The test-negative design for estimating influenza vaccine effectiveness. Vaccine 31 2165–2168.
  17. Jewell NP, Dufault S, Cutcher Z, Simmons CP and Anders KL (2019). Analysis of cluster-randomized test-negative designs: Cluster-level methods. Biostatistics 20 332–346. MR3922137. doi:10.1093/biostatistics/kxy005
  18. Ji X, Fink G, Robyn PJ and Small DS (2017). Randomization inference for stepped-wedge cluster-randomized trials: An application to community-based health insurance. Ann. Appl. Stat 11 1–20. MR3634312. doi:10.1214/16-AOAS969
  19. Johnson KN (2015). The impact of Wolbachia on virus infection in mosquitoes. Viruses 7 5705–5717.
  20. Li X and Ding P (2017). General forms of finite population central limit theorems with applications to causal inference. J. Amer. Statist. Assoc 112 1759–1769. MR3750897. doi:10.1080/01621459.2017.1295865
  21. Li X and Ding P (2020). Rerandomization and regression adjustment. J. R. Stat. Soc. Ser. B. Stat. Methodol 82 241–268. MR4060984
  22. Li F, Hughes JP, Hemming K, Taljaard M, Melnick ER and Heagerty PJ (2021). Mixed-effects models for the design and analysis of stepped wedge cluster randomized trials: An overview. Stat. Methods Med. Res 30 612–639. MR4236826. doi:10.1177/0962280220932962
  23. Liang KY and Zeger SL (1986). Longitudinal data analysis using generalized linear models. Biometrika 73 13–22. MR0836430. doi:10.1093/biomet/73.1.13
  24. Lin W (2013). Agnostic notes on regression adjustments to experimental data: Reexamining Freedman’s critique. Ann. Appl. Stat 7 295–318. MR3086420. doi:10.1214/12-AOAS583
  25. McNeish D and Stapleton LM (2016). Modeling clustered data with very few clusters. Multivar. Behav. Res 51 495–518.
  26. Murray DM, Varnell SP and Blitstein JL (2004). Design and analysis of group-randomized trials: A review of recent methodological developments. Am. J. Publ. Health 94 423–432.
  27. Orenstein EW, De Serres G, Haber MJ, Shay DK, Bridges CB, Gargiullo P and Orenstein WA (2007). Methodologic issues regarding the use of three observational study designs to assess influenza vaccine effectiveness. Int. J. Epidemiol 36 623–631.
  28. Rainey SM, Shah P, Kohl A and Dietrich I (2014). Understanding the Wolbachia-mediated inhibition of arboviruses in mosquitoes: Progress and challenges. J. Gen. Virol 95 517–530. doi:10.1099/vir.0.057422-0
  29. Rosenbaum PR (2002). Observational Studies, 2nd ed. Springer Series in Statistics. Springer, New York. MR1899138. doi:10.1007/978-1-4757-3692-2
  30. Roth J and Sant’Anna PH (2021). Efficient estimation for staggered rollout designs. ArXiv preprint. Available at arXiv:2102.01291.
  31. Small DS, Ten Have TR and Rosenbaum PR (2008). Randomization inference in a group-randomized trial of treatments for depression: Covariate adjustment, noncompliance, and quantile effects. J. Amer. Statist. Assoc 103 271–279. MR2420232. doi:10.1198/016214507000000897
  32. Su F and Ding P (2021). Model-assisted analyses of cluster-randomized experiments. J. R. Stat. Soc. Ser. B. Stat. Methodol 83 994–1015. MR4349125. doi:10.1111/rssb.12468
  33. Sullivan SG, Tchetgen Tchetgen EJ and Cowling BJ (2016). Theoretical basis of the test-negative study design for assessment of influenza vaccine effectiveness. Am. J. Epidemiol 184 345–353.
  34. Utarini A, Indriani C, Ahmad RA, Tantowijoyo W, Arguni E, Ansari MR, Supriyati E, Wardana DS, Meitika Y et al. (2021). Efficacy of Wolbachia-infected mosquito deployments for the control of Dengue. N. Engl. J. Med 384 2177–2186. doi:10.1056/NEJMoa2030243
  35. Walker T, Johnson PH, Moreira LA, Iturbe-Ormaetxe I, Frentiu FD, McMeniman CJ, Leong YS, Dong Y, Axford J et al. (2011). The wMel Wolbachia strain blocks Dengue and invades caged Aedes aegypti populations. Nature 476 450–453.
  36. Wang B, Harhay MO, Small DS, Morris TP and Li F (2021). On the robustness and precision of mixed-model analysis of covariance in cluster-randomized trials. ArXiv preprint. Available at arXiv:2112.00832.
  37. Wang B, Dufault SM, Small DS and Jewell NP (2023). Supplement to “Randomization inference for cluster-randomized test-negative designs with application to Dengue studies: Unbiased estimation, partial compliance, and stepped-wedge design.” doi:10.1214/22-AOAS1684SUPP
  38. Westreich D and Hudgens MG (2016). Invited commentary: Beware the test-negative design. Am. J. Epidemiol 184 354–356.
