Abstract
Two methods are developed for constructing randomization-based confidence sets for the average effect of a treatment on a binary outcome. The methods are nonparametric and require no assumptions about random sampling from a larger population. Both of the resulting 1 − α confidence sets are exact in the sense that the probability of containing the true treatment effect is at least 1 − α. Both types of confidence sets are also guaranteed to have width no greater than one. In contrast, a previously proposed asymptotic confidence interval is not exact and may have width greater than one. The first approach combines Bonferroni-adjusted prediction sets for the attributable effects in the treated and untreated. The second method entails inverting a permutation test. Simulations are presented comparing the two randomization-based confidence sets with the asymptotic interval as well as the standard Wald confidence interval and a commonly used exact interval for the difference in binomial proportions. Results show that, for small to moderate sample sizes, the permutation confidence set attains the narrowest width on average among the methods that maintain nominal coverage. Extensions that allow for stratifying on categorical baseline covariates are also discussed.
Keywords: Additivity, Attributable Effects, Causal Inference, Exact Confidence Interval, Randomization Inference, Stratified Data
1. Introduction
In many settings inference is desired about the effect of a treatment relative to the absence of treatment on a particular outcome. In studies where treatment is randomly assigned, randomization-based inference can be employed to draw conclusions about the effect of treatment. For instance, when the outcome is continuous, randomization-based confidence intervals can be formed using the classic approach of Hodges and Lehmann [1]. In addition to randomization, this approach relies on one key assumption, namely that the effect of treatment is additive, i.e., the same for all individuals. Additivity is a strong assumption that may not hold in many settings, particularly if the outcome is binary [2]. In this paper, two methods are developed for constructing randomization-based confidence sets for the average effect of treatment on a binary outcome without assuming additivity. These sets are formed by (i) combining prediction sets for attributable effects [3], and by (ii) inverting a permutation test.
Specifically, consider a study in which m of n individuals are randomized to treatment and subsequently a binary outcome is measured. Let the binary outcome of interest be denoted by Yj, where Yj = 1 if the event occurs and 0 otherwise for individuals j = 1, …, n. Let treatment assignment be indicated by Zj, where Zj = 1 if assigned treatment and Zj = 0 if assigned placebo. Prior to treatment assignment, assume each individual has two potential outcomes: yj(1) if assigned treatment, and yj(0) if assigned placebo (or control). After treatment assignment, one of the two potential outcomes is observed, so that the observed outcome for individual j is Yj = Zjyj(1) + (1 − Zj)yj(0). Let Z denote the vector of treatment assignments, Y denote the vector of observed outcomes, and y(z) denote the vector of potential outcomes when all n individuals are assigned z ∈ {0, 1}. Define the treatment effect for individual j to be δj = yj(1) − yj(0), so that δj = 1 if treatment causes the event, 0 if treatment has no effect, and −1 if treatment prevents the event. Let δ = y(1) − y(0) be the vector of treatment effects, and let τ = ∑ δj/n be the average treatment effect, where here and in the sequel ∑ denotes summation over j = 1, …, n. Our goal is to construct a confidence set for τ.
In both of the methods to follow, inference on δ will be used as a starting point for inference on τ. Prior to seeing the data, δ ∈ {−1, 0, 1}^n, a set with 3^n elements. Once the data are observed, one of the two potential outcomes is revealed and one is missing. Because the missing outcome is known to equal 0 or 1, once the data are observed δj is restricted to take one of two values for each individual j, so that only 2^n δ vectors are compatible with the observed data. Similarly, prior to observing the data, the parameter τ can take on values in {−n/n, …, 0/n, …, n/n}, a set with 2n + 1 elements of width two, where here and in the sequel we define the width of a set to be the difference between its maximum and minimum values. After observing the data, it is straightforward to show that the set of compatible τ values is
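$$\left\{ \frac{\sum Z_jY_j - \sum (1-Z_j)Y_j - m}{n},\ \frac{\sum Z_jY_j - \sum (1-Z_j)Y_j - m + 1}{n},\ \ldots,\ \frac{\sum Z_jY_j - \sum (1-Z_j)Y_j + n - m}{n} \right\}, \tag{1}$$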
a set with n + 1 elements of width one. Each of the 2^n compatible δ vectors maps to one of these n + 1 compatible τ values. The data are informative in the sense that n of the possible τ values can be rejected (with type I error zero). On the other hand, the null τ value of 0 will always be contained in the set of compatible τ values. This is analogous to a well-known result about “no assumption” large sample treatment effect bounds [4]. The methods below construct confidence sets for τ that are subsets of the set (1) and thus potentially of width less than one.
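The set (1) depends on the data only through the four cell counts of the observed 2 × 2 table, so it can be listed directly. The following minimal Python sketch does this; the function name and the cell-count notation nzy (the number of individuals with Zj = z and Yj = y) are our own, and exact fractions are used so that the width can be checked without rounding.

```python
from fractions import Fraction


def compatible_tau_values(n11, n10, n01, n00):
    """Return the tau values in set (1), given the observed 2 x 2 counts.

    nzy is the number of individuals with Z_j = z and Y_j = y (our notation).
    """
    n = n11 + n10 + n01 + n00
    m = n11 + n10
    # Smallest tau: every missing y_j(0) is 1 and every missing y_j(1) is 0.
    lowest = n11 - n01 - m
    # Largest tau: every missing y_j(0) is 0 and every missing y_j(1) is 1.
    highest = n11 - n01 + (n - m)
    return [Fraction(k, n) for k in range(lowest, highest + 1)]


values = compatible_tau_values(3, 1, 1, 3)
print(len(values), min(values), max(values))  # prints: 9 -1/4 3/4
```

With n = 8 the sketch returns n + 1 = 9 values spanning a width of exactly one, and 0 is always among them because lowest = −(n10 + n01) ≤ 0 ≤ n11 + n00 = highest.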
The two proposed methods are similar in spirit to the classic Hodges-Lehmann confidence interval in that randomization-based tests are inverted to construct the confidence sets. However, unlike the Hodges-Lehmann approach, no assumption is made that the effect is additive. This is critical because in many settings it will be unlikely or implausible that the treatment effect is the same for all individuals. For example, to assume δj = 1 for all j corresponds to the scenario yj(1) = 1 and yj(0) = 0 for all j, i.e., everyone has an event if and only if treated. Moreover, this particular additivity assumption could be rejected with type I error zero if Yj = 0 for at least one individual assigned treatment or Yj = 1 for at least one individual assigned placebo. An analogous statement applies to the assumption that δj = −1 for all j.
The two proposed methods rely on the randomization-based mode of inference wherein the n individuals are viewed as the finite population of interest and probability arises only through the randomization assignment to treatment or placebo [5, chap. 2]. The randomization-based approach to inference has several appealing properties. For example, the resulting inferences are exact without relying on distributional assumptions and do not require large sample approximations. Randomization-based inference also does not require that the observed data constitute a random sample from some infinite population, unlike the more common superpopulation model [6]. This is important in settings where assuming random sampling from the target population may be dubious. For example, individuals who volunteer to participate in a clinical trial may be a biased sample from the general population. Similarly, animals or organisms in a laboratory experiment may differ fundamentally from their counterparts in nature. See [5, 6, 7, 8, 9] for additional discussion related to the various modes of inference for treatment (i.e., causal) effects.
The outline of the rest of this paper is as follows. In Section 2, an approach for finding a confidence set for τ based on attributable effects [3] is proposed. In Section 3, a confidence set for τ is found by inverting a permutation test. In Section 4 the two proposed confidence sets are compared with a large sample confidence interval for τ [6] as well as the usual Wald confidence interval and a commonly used exact interval for the difference in binomial proportions; the different confidence intervals (or sets) are evaluated in simulation studies and illustrated using data from a vaccine adherence trial. In Section 5, extensions to settings with more than one group are considered. Section 6 concludes with a discussion.
2. Attributable Effect Sets
This section describes how a 1 − α confidence set for τ can be constructed by combining prediction sets for attributable effects [3]. The observed data {Z, Y} can be displayed in traditional 2 × 2 form as in Table 1. Noting that ∑ ZjYj = ∑ Zjyj(1), ∑ Zj(1 − Yj) = ∑ Zj(1 − yj(1)), ∑ (1 − Zj)Yj = ∑ (1 − Zj)yj(0), ∑ (1 − Zj)(1 − Yj) = ∑ (1 − Zj)(1 − yj(0)), and yj(1) = yj(0) + δj, Table 1 can be re-expressed as a function of Z, y(0), and A1(Z, δ) = ∑ Zjδj, the attributable effect of treatment in the treated [3], as shown in Table 2. In words, A1(Z, δ) = ∑ Zjyj(1) − ∑ Zjyj(0) is the difference between the number of events that occurred in the treated subjects and the number of events that would have occurred had they, contrary to fact, been exposed to control instead. After observing the data, it can be inferred that A1(Z, δ) ∈ {∑ ZjYj − m, ∑ ZjYj − m + 1, …, ∑ ZjYj}, a set with m + 1 elements. The observed data can be used to construct a prediction set for A1(Z, δ). We refer to these sets as prediction sets rather than confidence sets because A1(Z, δ) is a random variable rather than a parameter. Rosenbaum described how to construct such prediction sets [3]. In particular, consider testing H0 : δ = δ0 for some compatible vector of effects δ0. Under H0, subtracting A1(Z, δ0) from the (1,1) cell of Table 2 and adding A1(Z, δ0) to the (1,2) cell creates a table with fixed margins: the row margins of this “adjusted” table are fixed by design, and the column margins are fixed because ∑ yj(0) does not depend on Z. Let U = ∑ ZjYj − A1(Z, δ) = ∑ Zjyj(0) denote the number of events that would have occurred in the treated individuals had they, contrary to fact, not been treated. Note U is pivotal because its distribution under H0 does not involve δ0, i.e., U follows a hypergeometric distribution with $\Pr(U = u) = \binom{\sum y_j(0)}{u}\binom{n - \sum y_j(0)}{m - u}\Big/\binom{n}{m}$ for u ∈ {max{0, m + ∑ yj(0) − n}, …, min{∑ yj(0), m}}, where under H0 the total ∑ yj(0) = ∑ Yj − A1(Z, δ0) is computable from the observed data.
Let u(δ0) = ∑ ZjYj − A1(Z, δ0), the value of U under H0, and let the two-sided Fisher’s exact test p-value be pδ0(Z, Y) = ∑u Pr(U = u)1{Pr(U = u) ≤ Pr(U = u(δ0))}. Note each of the 2^n compatible δ0 corresponds to one of the m + 1 compatible values of A1(Z, δ0). Therefore, all δ0 that map to the same value of A1(Z, δ0) yield the same p-value when testing H0. Let 𝒫(A1(Z, δ)) = {A1(Z, δ) : pδ(Z, Y) ≥ α} denote the set of compatible attributable effects of treatment in the treated for which the null H0 : δ = δ0 is not rejected at significance level α. The set 𝒫(A1(Z, δ)) is a 1 − α prediction set for A1(Z, δ) in the sense that Pr[A1(Z, δ) ∈ 𝒫(A1(Z, δ))] ≥ 1 − α.
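The construction of 𝒫(A1(Z, δ)) can be sketched in Python directly from the 2 × 2 counts: for each compatible value a1 of A1(Z, δ0), the implied total ∑ yj(0) is read off the first column of Table 2, and the two-sided Fisher p-value is computed from the hypergeometric law of U. The function names, the cell-count arguments nzy (number of individuals with Zj = z and Yj = y), and the small relative tolerance used to treat floating point ties are assumptions of this sketch.

```python
from math import comb


def hypergeom_pmf(u, n, k, m):
    """Pr(U = u) when m of n individuals are treated and k of the n have y_j(0) = 1."""
    if u < max(0, m + k - n) or u > min(k, m):
        return 0.0
    return comb(k, u) * comb(n - k, m - u) / comb(n, m)


def fisher_pvalue(a1, n11, n10, n01, n00):
    """Two-sided Fisher exact p-value shared by every delta0 with A1(Z, delta0) = a1."""
    n, m = n11 + n10 + n01 + n00, n11 + n10
    k = n11 + n01 - a1   # sum_j y_j(0) under H0, from the first column total of Table 2
    u_obs = n11 - a1     # u(delta0), the observed value of the pivot U
    p_obs = hypergeom_pmf(u_obs, n, k, m)
    probs = [hypergeom_pmf(u, n, k, m) for u in range(max(0, m + k - n), min(k, m) + 1)]
    # Relative tolerance so that floating point ties count as "no more likely".
    return sum(p for p in probs if p <= p_obs * (1 + 1e-7))


def prediction_set_a1(n11, n10, n01, n00, alpha=0.05):
    """Compatible values of A1(Z, delta) whose p-value is at least alpha."""
    m = n11 + n10
    return [a1 for a1 in range(n11 - m, n11 + 1)
            if fisher_pvalue(a1, n11, n10, n01, n00) >= alpha]


print(prediction_set_a1(8, 2, 2, 8))
```

For a1 = 0 (the sharp null of no effect for anyone) the p-value reduces to the ordinary two-sided Fisher exact test on the observed table, so only m + 1 tests are needed to obtain the whole prediction set.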
Table 1.
Cross classification of observed counts of treatment Z and outcome Y
| | Y = 1 | Y = 0 | Total |
|---|---|---|---|
| Z = 1 | ∑ ZjYj | ∑ Zj(1 − Yj) | m |
| Z = 0 | ∑ (1 − Zj)Yj | ∑ (1 − Zj)(1 − Yj) | n − m |
| Total | ∑ Yj | ∑ (1 − Yj) | n |
Table 2.
Cross classification of observed counts of treatment Z and outcome Y as a function of the potential outcomes yj(0) and the attributable effect A1(Z, δ)
| | Y = 1 | Y = 0 | Total |
|---|---|---|---|
| Z = 1 | ∑ Zjyj(0) + A1(Z, δ) | ∑ Zj(1 − yj(0)) − A1(Z, δ) | m |
| Z = 0 | ∑ (1 − Zj)yj(0) | ∑ (1 − Zj)(1 − yj(0)) | n − m |
| Total | ∑ yj(0) + A1(Z, δ) | n − ∑ yj(0) − A1(Z, δ) | n |
Similarly, define the attributable effect of treatment in the untreated as A0(Z, δ) = ∑ (1 − Zj)δj. In words, A0(Z, δ) = ∑ (1 − Zj)yj(1) − ∑ (1 − Zj)yj(0) is the difference between the number of events that would have occurred in the control subjects had they, contrary to fact, been treated and the number of events actually observed in the control subjects. After observing the data, it can be inferred that A0(Z, δ) ∈ {−∑ (1 − Zj)Yj, −∑ (1 − Zj)Yj + 1, …, −∑ (1 − Zj)Yj + n − m}, a set with n − m + 1 elements. A 1 − α prediction set can be constructed for A0(Z, δ) in the same fashion as for A1(Z, δ). While the attributable effects A1(Z, δ) and A0(Z, δ) are random variables, their sum is constrained to equal a constant:
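$$A_1(Z,\delta) + A_0(Z,\delta) = \sum Z_j\delta_j + \sum (1 - Z_j)\delta_j = \sum \delta_j = n\tau. \tag{2}$$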
The relationship between the attributable effects and τ in (2) suggests combining prediction sets for A1(Z, δ) and A0(Z, δ) to obtain a confidence set for τ. The following proposition indicates that a confidence set for τ can be formed by combining prediction sets with a Bonferroni type adjustment.
Proposition 1
If {L1,L1 + 1,…,U1} is a 1 − α/2 prediction set for A1(Z, δ), where L1 is the minimum of the prediction set and U1 is the maximum, and {L0,L0 + 1,…,U0} is a 1 − α/2 prediction set for A0(Z, δ), where L0 and U0 are defined similarly, then {(L1 + L0)/n, (L1 + L0 + 1)/n,…, (U1 + U0)/n} is a 1 − α confidence set for τ.
A proof of Proposition 1 is given in the Appendix. Constructing a confidence set for τ as described in Proposition 1 only requires testing n + 2 hypotheses, as there are m + 1 compatible values of A1(Z, δ0) that must be tested and there are n − m + 1 compatible values of A0(Z, δ0) that must be tested. Thus the attributable effect based confidence set for τ is computationally feasible even for large n; this is in contrast to the permutation test approach described next.
Note Proposition 1 relies on a Bonferroni type adjustment. Because A1(Z, δ) and A0(Z, δ) are constrained according to (2), it might be tempting to instead add the lower and upper bounds of two 1 − α prediction sets and divide by n (i.e., without a Bonferroni type adjustment). However, such a naive approach is not guaranteed to provide coverage of at least 1 − α, as demonstrated by the following example. Suppose an experiment is to be conducted with m = 4 of n = 9 individuals to be assigned treatment. As each individual’s pair of outcomes {yj(0), yj(1)} can take on 4 values, there are 4^9 = 262,144 possible sets of potential outcomes for the finite population of individuals. Each of these sets maps to one of the 2n + 1 = 19 values of τ. Consider the subset of these 4^9 sets that map to τ = 1/9. For each of the sets of potential outcomes in this subset, there are $\binom{9}{4} = 126$ possible observed data sets. Applying the naive approach described above of combining two 95% prediction sets without a Bonferroni adjustment to each of the possible observed data sets, only 92% of the resulting confidence sets contain τ = 1/9.
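To make Proposition 1 and the counterexample above concrete, the Python sketch below computes the attributable effects set from the 2 × 2 counts and evaluates its exact coverage for one fixed set of potential outcomes by enumerating all C(n, m) randomizations. The prediction set for A0(Z, δ) is obtained by relabelling (swapping the arms and the roles of yj(1) and yj(0), which turns A0(Z, δ) into −A1 on the relabelled table); this is our reading of "in the same fashion". The function names, the tie tolerance, and the illustrative potential outcomes in the usage lines are assumptions for illustration, not the authors' code.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def hypergeom_pmf(u, n, k, m):
    if u < max(0, m + k - n) or u > min(k, m):
        return 0.0
    return comb(k, u) * comb(n - k, m - u) / comb(n, m)


def fisher_pvalue(a1, n11, n10, n01, n00):
    n, m = n11 + n10 + n01 + n00, n11 + n10
    k, u_obs = n11 + n01 - a1, n11 - a1
    p_obs = hypergeom_pmf(u_obs, n, k, m)
    probs = [hypergeom_pmf(u, n, k, m) for u in range(max(0, m + k - n), min(k, m) + 1)]
    return sum(p for p in probs if p <= p_obs * (1 + 1e-7))


def a1_limits(n11, n10, n01, n00, level):
    """Minimum and maximum of the prediction set for A1 at significance `level`."""
    m = n11 + n10
    kept = [a for a in range(n11 - m, n11 + 1)
            if fisher_pvalue(a, n11, n10, n01, n00) >= level]
    return min(kept), max(kept)


def ae_confidence_set(n11, n10, n01, n00, alpha=0.05, bonferroni=True):
    """Endpoints of {(L1 + L0)/n, ..., (U1 + U0)/n}; bonferroni=False is the naive rule."""
    level = alpha / 2 if bonferroni else alpha
    n = n11 + n10 + n01 + n00
    l1, u1 = a1_limits(n11, n10, n01, n00, level)
    # A0 via relabelling: the swapped table has counts (n01, n00, n11, n10).
    l0_swapped, u0_swapped = a1_limits(n01, n00, n11, n10, level)
    l0, u0 = -u0_swapped, -l0_swapped
    return Fraction(l1 + l0, n), Fraction(u1 + u0, n)


def exact_coverage(y1, y0, m, alpha=0.05, bonferroni=True):
    """Share of all C(n, m) randomizations whose set contains the true tau."""
    n = len(y1)
    tau = Fraction(sum(y1) - sum(y0), n)
    assignments = list(combinations(range(n), m))
    covered = 0
    for treated in assignments:
        t = set(treated)
        y = [y1[j] if j in t else y0[j] for j in range(n)]
        n11 = sum(y[j] for j in t)
        n01 = sum(y) - n11
        lo, hi = ae_confidence_set(n11, m - n11, n01, n - m - n01, alpha, bonferroni)
        covered += lo <= tau <= hi
    return covered / len(assignments)


y1 = [1, 1, 1, 0, 1, 0, 1, 0]  # illustrative potential outcomes (hypothetical)
y0 = [0, 1, 0, 0, 1, 1, 0, 0]
print(exact_coverage(y1, y0, 4), exact_coverage(y1, y0, 4, bonferroni=False))
```

Setting bonferroni=False applies the naive rule discussed above, so the two constructions can be compared on any configuration of potential outcomes; only the Bonferroni version carries the guarantee of Proposition 1.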
3. Inverted Permutation Test
A permutation-based approach can also be employed to find a confidence set for τ. Prior to specifying a null hypothesis H0 : δ = δ0, each individual has one observed and one missing potential outcome; however, under H0, both outcomes are known. A null hypothesis with this property is called sharp. If the missing outcome for individual j is yj(0), it is known under the null to equal Yj − δ0j, and if the missing outcome is yj(1), it is known under the null to equal Yj + δ0j, where δ0j denotes the jth element of δ0. To determine how likely the observed data are under H0, a test statistic can be chosen, its distribution under the null computed, and a measure of extremeness of the observed data defined [8, §4.1]. A natural choice for the test statistic is the difference in observed means
$$T = \frac{\sum Z_jY_j}{m} - \frac{\sum (1 - Z_j)Y_j}{n - m}. \tag{3}$$
Neyman [10] showed that T is an unbiased estimator of τ, i.e., E(T) = τ, where the expected value is taken over all possible hypothetical randomizations of m of the n individuals to treatment under the true δ vector. Because all potential outcomes are known under the sharp null H0, the sampling distribution of T under the null can be determined exactly by computing T for each of the C = $\binom{n}{m}$ possible randomizations. For randomization c = 1, …, C, let tc denote the value of T under H0. Each randomization occurs with probability 1/C, so the permutation test p-value is defined to be $\sum_{c=1}^{C} 1\{|t_c - \tau_0| \ge |t_{obs} - \tau_0|\}/C$, where tobs is the value of T for the observed data and τ0 = ∑ δ0j/n. The subset of compatible δ0 vectors for which the permutation test p-value is greater than or equal to α forms a 1 − α confidence set for δ. The τ0 values corresponding to the δ0 vectors in this confidence set for δ form a 1 − α confidence set for τ.
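A minimal Python sketch of this exact permutation test follows: it fills in the missing potential outcomes under H0, enumerates every randomization with itertools.combinations, and counts how often the centred statistic is at least as extreme as the observed one. The function names and the 1e-12 tie tolerance are assumptions of the sketch, and full enumeration is only practical for small n.

```python
from itertools import combinations


def diff_in_means(y, treated):
    """Statistic T in (3) for outcomes y and a tuple of treated indices."""
    n, m = len(y), len(treated)
    events_treated = sum(y[j] for j in treated)
    return events_treated / m - (sum(y) - events_treated) / (n - m)


def permutation_pvalue(z, y, delta0):
    """Exact p-value for the sharp null H0: delta = delta0."""
    n, m = len(z), sum(z)
    # Fill in the missing potential outcome of every individual under H0.
    y1 = [y[j] if z[j] else y[j] + delta0[j] for j in range(n)]
    y0 = [y[j] - delta0[j] if z[j] else y[j] for j in range(n)]
    if any(v not in (0, 1) for v in y1 + y0):
        raise ValueError("delta0 is not compatible with the observed data")
    tau0 = sum(delta0) / n
    t_obs = diff_in_means(y, tuple(j for j in range(n) if z[j]))
    extreme = total = 0
    for treated in combinations(range(n), m):
        chosen = set(treated)
        y_c = [y1[j] if j in chosen else y0[j] for j in range(n)]
        t_c = diff_in_means(y_c, treated)
        # Small tolerance so that floating point ties count as "at least as extreme".
        extreme += abs(t_c - tau0) >= abs(t_obs - tau0) - 1e-12
        total += 1
    return extreme / total


print(permutation_pvalue([1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 0, 0]))
```

In the toy call, 2 of the 6 randomizations give |T| ≥ 1 under the null of no effect, so the p-value is 1/3.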
Although finding a confidence set for δ entails explicitly testing 2^n hypotheses, finding a confidence set for τ can be accomplished by testing only O(n^4) hypotheses. To see this, let nzy = ∑ 1{Zj = z, Yj = y} for z ∈ {0, 1} and y ∈ {0, 1}. For the n11 individuals with Zj = 1 and Yj = 1, δj can be 0 or 1. Holding the δj values fixed for the other n10 + n01 + n00 individuals, for fixed υ ∈ {0, 1, …, n11} all δ vectors with ∑j:Zj=Yj=1 1{δj = 1} = υ lead to the same τ value and permutation p-value, because individuals in the same cell with the same δj are interchangeable in the permutation distribution; i.e., it is sufficient to test n11 + 1 hypotheses about individuals with Zj = Yj = 1. Similar logic applies to the other three cross-classifications of treatment and outcome, so that it is sufficient to test (n11 + 1)(n10 + 1)(n01 + 1)(n00 + 1) hypotheses to find a confidence set for τ.
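The reduction to (n11 + 1)(n10 + 1)(n01 + 1)(n00 + 1) hypotheses can be sketched as follows: for every combination of counts (υ11, υ10, υ01, υ00), one representative δ0 is built by giving the nonzero compatible effect (+1 when Zj = Yj, −1 otherwise) to the first υ individuals of each cell. The choice of representative, the function names, and the exhaustive enumeration of randomizations are assumptions of this small-n Python sketch.

```python
from fractions import Fraction
from itertools import combinations, product


def permutation_pvalue(z, y, delta0):
    """Exact permutation p-value for H0: delta = delta0 (statistic T in (3))."""
    n, m = len(z), sum(z)
    y1 = [y[j] if z[j] else y[j] + delta0[j] for j in range(n)]
    y0 = [y[j] - delta0[j] if z[j] else y[j] for j in range(n)]
    tau0 = sum(delta0) / n

    def stat(outcomes, treated):
        s = sum(outcomes[j] for j in treated)
        return s / m - (sum(outcomes) - s) / (n - m)

    t_obs = stat(y, [j for j in range(n) if z[j]])
    draws = list(combinations(range(n), m))
    extreme = 0
    for treated in draws:
        chosen = set(treated)
        t_c = stat([y1[j] if j in chosen else y0[j] for j in range(n)], treated)
        extreme += abs(t_c - tau0) >= abs(t_obs - tau0) - 1e-12
    return extreme / len(draws)


def permutation_confidence_set(z, y, alpha=0.05):
    """Tau values whose representative delta0 vectors are not rejected."""
    n = len(z)
    cells = {(a, b): [j for j in range(n) if z[j] == a and y[j] == b]
             for a, b in [(1, 1), (1, 0), (0, 1), (0, 0)]}
    accepted = set()
    for counts in product(*(range(len(members) + 1) for members in cells.values())):
        delta0 = [0] * n
        for ((a, b), members), v in zip(cells.items(), counts):
            for j in members[:v]:
                delta0[j] = 1 if a == b else -1  # the nonzero effect compatible with the cell
        if permutation_pvalue(z, y, delta0) >= alpha:
            accepted.add(Fraction(sum(delta0), n))
    return sorted(accepted)


z = [1, 1, 1, 1, 0, 0, 0, 0]
y = [1, 1, 1, 0, 0, 1, 0, 0]
print(permutation_confidence_set(z, y))
```

For this toy table the cell sizes are 3, 1, 1, and 3, so 4 × 2 × 2 × 4 = 64 representatives are tested instead of all 2^8 = 256 compatible δ vectors.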
As n grows, the O(n^4) hypotheses that must be tested become numerous, and computing permutation confidence sets may become infeasible. In addition to utilizing the compiler package [11], the following two strategies may be employed to improve computational efficiency. First, rather than using all possible randomizations to find the permutation p-value for each hypothesis being tested, a Monte Carlo procedure based on a random sample of the randomizations can be employed to approximate the p-value [12]. Second, the lower limit of the confidence set for τ can be found as follows. Starting with the smallest compatible τ value, compute the permutation p-value for each corresponding δ vector. If at least one p-value is greater than or equal to α, set the lower limit to this value of τ. Otherwise, repeat this process for the next larger compatible τ value until a corresponding δ vector is found whose p-value is greater than or equal to α. The upper limit can be found analogously, starting with the largest compatible τ value.
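Both strategies can be sketched together in Python. The Monte Carlo p-value below draws randomizations uniformly with replacement and uses the simple estimate (number of extreme draws)/(number of draws); other conventions exist, and this choice, the default of 1000 draws, the fixed seed, and all function names are assumptions of the sketch. The lower-limit search groups representative δ0 vectors by τ and stops at the first τ value with a non-rejected δ0; any() also stops testing within a τ value as soon as one p-value reaches α.

```python
import random
from fractions import Fraction
from itertools import product


def mc_permutation_pvalue(z, y, delta0, draws=1000, seed=0):
    """Approximate permutation p-value from randomizations drawn with replacement."""
    rng = random.Random(seed)
    n, m = len(z), sum(z)
    y1 = [y[j] if z[j] else y[j] + delta0[j] for j in range(n)]
    y0 = [y[j] - delta0[j] if z[j] else y[j] for j in range(n)]
    tau0 = sum(delta0) / n

    def stat(outcomes, treated):
        s = sum(outcomes[j] for j in treated)
        return s / m - (sum(outcomes) - s) / (n - m)

    t_obs = stat(y, [j for j in range(n) if z[j]])
    extreme = 0
    for _ in range(draws):
        treated = set(rng.sample(range(n), m))  # one randomization, drawn uniformly
        t_c = stat([y1[j] if j in treated else y0[j] for j in range(n)], treated)
        extreme += abs(t_c - tau0) >= abs(t_obs - tau0) - 1e-12
    return extreme / draws


def representatives_by_tau(z, y):
    """Map each tau value to one representative delta0 per cell-count combination."""
    n = len(z)
    cells = [[j for j in range(n) if (z[j], y[j]) == cell]
             for cell in [(1, 1), (1, 0), (0, 1), (0, 0)]]
    groups = {}
    for counts in product(*(range(len(members) + 1) for members in cells)):
        delta0 = [0] * n
        for members, v in zip(cells, counts):
            for j in members[:v]:
                delta0[j] = 1 if z[j] == y[j] else -1
        groups.setdefault(Fraction(sum(delta0), n), []).append(delta0)
    return groups


def lower_limit(groups, pvalue, alpha=0.05):
    """Scan tau values upward; stop at the first one with a non-rejected delta0."""
    for tau in sorted(groups):
        if any(pvalue(delta0) >= alpha for delta0 in groups[tau]):
            return tau
    return None


z = [1, 1, 1, 1, 0, 0, 0, 0]
y = [1, 1, 1, 0, 0, 1, 0, 0]
groups = representatives_by_tau(z, y)
print(lower_limit(groups, lambda d: mc_permutation_pvalue(z, y, d, draws=500)))
```

The upper limit follows by iterating over sorted(groups, reverse=True) instead.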
4. Illustrations
4.1. Simple Examples
In this section, the attributable effects and permutation confidence sets for τ are compared with an asymptotic confidence interval for τ. Robins [6] proposed the following large sample (1 − α) confidence interval for τ
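$$\left(\hat p_1 - \hat p_0\right) \pm z_{(1-\alpha/2)} \sqrt{\frac{\hat p_1(1-\hat p_1)}{m} + \frac{\hat p_0(1-\hat p_0)}{n-m} + \hat R}, \tag{4}$$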
where p̂1 = ∑ ZjYj/m, p̂0 = ∑ (1 − Zj)Yj/(n − m), R̂ = {(2p̂0 − p̂1)(1 − p̂1) − p̂0(1 − p̂0)}/n if p̂1 ≥ p̂0, R̂ = {(2p̂1 − p̂0)(1 − p̂0) − p̂1(1 − p̂1)}/n if p̂0 > p̂1, and z(1−α/2) denotes the 1 − α/2 quantile of a standard normal distribution. As n → ∞ with m/n → c ∈ (0, 1), the interval (4) will contain τ with probability 1 − α [6].
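The interval (4) is easy to compute from the 2 × 2 counts. Expanding the definition shows that R̂ = −d(1 − d)/n with d = |p̂1 − p̂0|, so R̂ is never positive and the interval is never wider than the Wald interval. The Python sketch below assumes that (4) has the form displayed above, with R̂ added to the usual binomial variance; the function name and the guard against a slightly negative variance from rounding are our own.

```python
from math import sqrt
from statistics import NormalDist


def robins_interval(n11, n10, n01, n00, alpha=0.05):
    """Large sample interval (4) for tau from the observed 2 x 2 counts."""
    n, m = n11 + n10 + n01 + n00, n11 + n10
    p1, p0 = n11 / m, n01 / (n - m)
    if p1 >= p0:
        r_hat = ((2 * p0 - p1) * (1 - p1) - p0 * (1 - p0)) / n
    else:
        r_hat = ((2 * p1 - p0) * (1 - p0) - p1 * (1 - p1)) / n
    z = NormalDist().inv_cdf(1 - alpha / 2)
    variance = p1 * (1 - p1) / m + p0 * (1 - p0) / (n - m) + r_hat
    half_width = z * sqrt(max(variance, 0.0))  # guard against tiny negative rounding
    return p1 - p0 - half_width, p1 - p0 + half_width


print(robins_interval(6, 4, 3, 7))
```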
To compare the methods, consider an experiment with m = 4 of n = 8 individuals assigned treatment. As each individual’s outcomes {yj(0), yj(1)} can take on 4 values, there are 4^8 possible sets of potential outcomes for the finite population of individuals. For each of these 4^8 sets, there are $\binom{8}{4} = 70$ possible observed data sets. For each of the possible combinations of potential outcomes and observed data sets, attributable effects and permutation confidence sets and asymptotic confidence intervals were computed. Figure 1 displays the coverage probability and average width for the three methods at each of the 2n + 1 = 17 values of τ for α = 0.05. To illustrate how the points in Figure 1 were computed, consider the coverage probability of the asymptotic confidence interval for τ = −6/8 in the top panel of Figure 1. Of the 4^8 = 65536 sets of potential outcomes, 120 have τ = −6/8. For these 120 sets of potential outcomes, the asymptotic interval has coverage probability 0.79 for 28 of the sets, 0.71 for 64 of the sets, and 0.79 for the remaining 28 sets, so the coverage probability for the asymptotic interval at τ = −6/8 is the weighted mean, (28 × 0.79 + 64 × 0.71 + 28 × 0.79)/120 ≈ 0.75. The asymptotic confidence intervals fail to provide the desired 95% coverage for many τ values; on the other hand, the attributable effects and permutation confidence sets provide the desired level of coverage for all τ values. The permutation confidence sets have a smaller average width than the attributable effects confidence sets for each value of τ in this experiment.
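The counts and the weighted mean in this worked example can be checked by brute-force enumeration. The Python sketch below only reproduces the arithmetic: it uses the coverage values reported above rather than recomputing the asymptotic intervals.

```python
from itertools import product
from math import comb

# (y_j(0), y_j(1)) can take four values; delta_j = y_j(1) - y_j(0).
pairs = [(0, 0), (0, 1), (1, 0), (1, 1)]
n, m = 8, 4

configs = list(product(pairs, repeat=n))
print(len(configs), comb(n, m))  # prints: 65536 70

at_tau = [c for c in configs if sum(y1 - y0 for y0, y1 in c) == -6]
print(len(at_tau))  # prints: 120

coverages, counts = [0.79, 0.71, 0.79], [28, 64, 28]
weighted = sum(c * k for c, k in zip(coverages, counts)) / sum(counts)
print(round(weighted, 2))  # prints: 0.75
```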
Figure 1.
Coverage probability (top) and average width (bottom) of the attributable effects and permutation test based confidence sets and the asymptotic confidence interval for the average treatment effect τ.
4.2. Simulation Study
To further study the proposed methods, the permutation, attributable effects, and asymptotic approaches were compared to the usual Wald interval
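$$\left(\hat p_1 - \hat p_0\right) \pm z_{(1-\alpha/2)} \sqrt{\frac{\hat p_1(1-\hat p_1)}{m} + \frac{\hat p_0(1-\hat p_0)}{n-m}} \tag{5}$$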
and the Santner-Snell (SS) exact confidence interval for a difference in binomial proportions [13] in a series of simulation studies. The SS confidence interval is the default exact method for a difference in binomial proportions in SAS 9.3 PROC FREQ [14]. While the Wald and SS methods do not assume additivity, both assume, perhaps implicitly, that the observed data are a random sample from some larger superpopulation. In particular, the Wald and SS methods suppose the numbers of events in the treated and control groups are binomial random variables. As explained in §4 of Robins [6], this binomial model follows from assuming either (a) individual potential outcomes are stochastic, Bernoulli random variables with equal mean across individuals, or (b) the treated and control groups constitute a random sample from some larger superpopulation. Robins argues the mean homogeneity assumption of (a) will usually be biologically implausible, and therefore (b) is implicitly being assumed whenever the binomial model is employed.
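Unlike the randomization-based sets, which are subsets of (1), the Wald interval (5) is not confined to width one. The short Python sketch below (function name ours) shows a small table where its width approaches two, and the degenerate zero-width interval it returns when p̂1 and p̂0 are both 0 or 1, one reason it covers poorly near the boundary of the parameter space.

```python
from math import sqrt
from statistics import NormalDist


def wald_interval(n11, n10, n01, n00, alpha=0.05):
    """Wald interval (5) for a difference in binomial proportions."""
    m, n_control = n11 + n10, n01 + n00
    p1, p0 = n11 / m, n01 / n_control
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * sqrt(p1 * (1 - p1) / m + p0 * (1 - p0) / n_control)
    return p1 - p0 - half_width, p1 - p0 + half_width


lo, hi = wald_interval(1, 1, 1, 1)
print(round(hi - lo, 2))  # prints: 1.96
print(wald_interval(5, 0, 0, 5))  # prints: (1.0, 1.0)
```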
Data were simulated under three scenarios: (i) a randomization model, (ii) a randomization model under varying degrees of additivity, and (iii) a superpopulation model. In all simulations where , a random sample with replacement of 100 randomizations was used to approximate permutation test p-values.
Simulations for scenario (i), a randomization model, were carried out for fixed values of n, m, and τ using the following steps:
- Step 0. Potential outcomes were generated by first setting yj(1) = 1 and yj(0) = 0, so that δj = 1, for individuals j = 1, …, τn. Then for j = τn + 1, …, n, the potential outcome yj(1) was sampled from a Bernoulli distribution with mean 0.5. Finally, the potential outcomes yτn+1(0), …, yn(0) were set equal to a random permutation of yτn+1(1), …, yn(1). Generating the potential outcomes in this fashion ensured that the average treatment effect equaled τ.
- Step 1. Observed data were generated by randomly assigning m individuals to treatment and n − m individuals to control. Observed outcomes were then generated based on treatment assignment and the potential outcomes from step 0.
- Step 2. All five 95% confidence intervals (or sets) were computed for the observed data generated in step 1.
- Step 3. Steps 1–2 were repeated 1000 times.
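Steps 0 and 1 above can be sketched in Python as follows. The function names, the use of Python's random module, the arbitrary seed, and rounding τn to an integer are our assumptions; step 2 would call whichever interval routines are being compared, and step 3 loops over steps 1 and 2 with the potential outcomes held fixed.

```python
import random


def scenario_i_potential_outcomes(n, tau, rng):
    """Step 0: potential outcomes whose average treatment effect equals tau."""
    k = round(tau * n)  # individuals 1, ..., tau*n get delta_j = 1
    y1 = [1] * k + [int(rng.random() < 0.5) for _ in range(n - k)]
    # y_j(0) for the remaining individuals is a random permutation of their y_j(1).
    y0 = [0] * k + rng.sample(y1[k:], n - k)
    return y1, y0


def randomize(y1, y0, m, rng):
    """Step 1: assign m of n individuals to treatment and reveal one outcome each."""
    n = len(y1)
    treated = set(rng.sample(range(n), m))
    z = [int(j in treated) for j in range(n)]
    y = [y1[j] if z[j] else y0[j] for j in range(n)]
    return z, y


rng = random.Random(2015)  # arbitrary seed
y1, y0 = scenario_i_potential_outcomes(20, 0.5, rng)
z, y = randomize(y1, y0, 6, rng)
```

Because the remaining yj(0) are a permutation of the remaining yj(1), those individuals contribute zero in total to ∑ δj, so ∑ δj = τn exactly.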
The results for scenario (i) in Table 3 show that the permutation confidence set attained the narrowest width on average among methods that maintained nominal coverage. For all intervals (or sets) the average width decreased as τ increased for fixed n and percent assigned treatment. For fixed n and τ, average width and coverage results were similar for 30% treatment compared to 70% treatment. The asymptotic interval was strictly narrower than the Wald interval, which is guaranteed [6]. Coverage of the asymptotic interval tended to be substantially less than the nominal level for τ = 0.95. For example, the coverage of the asymptotic interval for 70% assigned treatment and τ = 0.95 was only 0.65 even when n = 100.
Table 3.
Simulation results for scenario (i). Table entries give the empirical width [coverage] of 95% confidence sets or intervals, where τ is the true average treatment effect, % treatment is the percent of n total individuals assigned to treatment in each experiment, Perm is the permutation confidence set, AE is the attributable effects confidence set, Asymptotic is the asymptotic confidence interval in [6], Wald is the usual large sample interval for a risk difference, and SS is the Santner-Snell [13] exact confidence interval.
| n | Method | 30%, τ = 0.2 | 30%, τ = 0.5 | 30%, τ = 0.95 | 50%, τ = 0.2 | 50%, τ = 0.5 | 50%, τ = 0.95 | 70%, τ = 0.2 | 70%, τ = 0.5 | 70%, τ = 0.95 |
|---|---|---|---|---|---|---|---|---|---|---|
| 20 | Perm | 0.73[1.00] | 0.70[0.99] | 0.43[1.00] | 0.75[1.00] | 0.70[1.00] | 0.34[1.00] | 0.74[1.00] | 0.70[0.99] | 0.41[1.00] |
| | AE | 0.81[1.00] | 0.76[1.00] | 0.53[1.00] | 0.88[1.00] | 0.75[1.00] | 0.52[1.00] | 0.84[1.00] | 0.76[1.00] | 0.53[1.00] |
| | Asymptotic | 0.80[0.96] | 0.68[1.00] | 0.12[0.30] | 0.76[0.96] | 0.60[0.98] | 0.11[0.49] | 0.82[0.97] | 0.67[0.91] | 0.10[0.70] |
| | Wald | 0.87[0.97] | 0.79[0.96] | 0.14[0.30] | 0.83[0.99] | 0.73[0.98] | 0.14[0.49] | 0.89[0.98] | 0.79[0.94] | 0.14[0.70] |
| | SS | 0.91[1.00] | 0.83[1.00] | 0.52[1.00] | 0.89[1.00] | 0.80[1.00] | 0.41[1.00] | 0.91[1.00] | 0.83[1.00] | 0.49[1.00] |
| 40 | Perm | 0.59[1.00] | 0.53[1.00] | 0.24[1.00] | 0.58[1.00] | 0.51[1.00] | 0.20[1.00] | 0.60[1.00] | 0.52[1.00] | 0.26[1.00] |
| | AE | 0.68[1.00] | 0.61[1.00] | 0.41[1.00] | 0.67[1.00] | 0.58[1.00] | 0.30[1.00] | 0.68[1.00] | 0.61[1.00] | 0.32[1.00] |
| | Asymptotic | 0.60[0.97] | 0.49[0.97] | 0.08[0.91] | 0.55[0.98] | 0.42[1.00] | 0.11[0.76] | 0.60[0.96] | 0.47[0.97] | 0.11[0.79] |
| | Wald | 0.65[0.98] | 0.58[0.99] | 0.12[0.91] | 0.60[1.00] | 0.52[1.00] | 0.13[0.76] | 0.65[0.97] | 0.56[0.98] | 0.14[0.79] |
| | SS | 0.66[1.00] | 0.60[1.00] | 0.29[1.00] | 0.63[1.00] | 0.57[1.00] | 0.26[1.00] | 0.66[1.00] | 0.61[1.00] | 0.31[1.00] |
| 60 | Perm | 0.51[0.99] | 0.43[1.00] | 0.19[1.00] | 0.50[1.00] | 0.42[1.00] | 0.15[1.00] | 0.52[1.00] | 0.44[1.00] | 0.19[1.00] |
| | AE | 0.57[1.00] | 0.49[1.00] | 0.24[1.00] | 0.57[1.00] | 0.50[1.00] | 0.23[1.00] | 0.57[1.00] | 0.52[1.00] | 0.24[1.00] |
| | Asymptotic | 0.50[0.98] | 0.37[0.98] | 0.10[0.78] | 0.45[0.98] | 0.35[0.98] | 0.10[1.00] | 0.50[0.99] | 0.41[0.98] | 0.09[0.64] |
| | Wald | 0.53[0.98] | 0.45[1.00] | 0.13[1.00] | 0.49[1.00] | 0.43[0.99] | 0.13[1.00] | 0.53[0.99] | 0.48[0.99] | 0.12[0.95] |
| | SS | 0.54[0.99] | 0.50[1.00] | 0.23[1.00] | 0.52[1.00] | 0.46[1.00] | 0.20[1.00] | 0.54[1.00] | 0.49[1.00] | 0.23[1.00] |
| 100 | Perm | 0.42[1.00] | 0.35[1.00] | 0.14[1.00] | 0.40[1.00] | 0.33[1.00] | 0.11[1.00] | 0.42[1.00] | 0.36[1.00] | 0.14[0.99] |
| | AE | 0.47[1.00] | 0.41[1.00] | 0.18[1.00] | 0.45[1.00] | 0.39[1.00] | 0.17[1.00] | 0.46[1.00] | 0.42[1.00] | 0.18[1.00] |
| | Asymptotic | 0.38[0.98] | 0.31[0.98] | 0.08[0.72] | 0.35[0.99] | 0.27[0.98] | 0.09[0.88] | 0.39[0.99] | 0.31[0.98] | 0.09[0.65] |
| | Wald | 0.41[0.99] | 0.37[0.99] | 0.11[0.96] | 0.38[1.00] | 0.33[1.00] | 0.11[1.00] | 0.42[0.99] | 0.37[1.00] | 0.11[0.82] |
| | SS | 0.42[0.99] | 0.38[1.00] | 0.17[1.00] | 0.40[1.00] | 0.35[1.00] | 0.15[1.00] | 0.42[0.99] | 0.38[1.00] | 0.17[1.00] |
Simulations for scenario (ii) were carried out similarly to scenario (i) but with varying degrees of additivity. In particular, as a measure of the amount of additivity, let γ = ∑j 1{δj = 0}/n denote the proportion of individuals for whom the treatment has no effect, so that γ ∈ [0, 1], with the degree of additivity increasing as γ → 1. For fixed values of n, m, and γ, simulations proceeded in the same manner as scenario (i) except that a different step 0 was used to generate potential outcomes. Specifically, for j = 1, …, γn, the potential outcome yj(1) was randomly sampled from a Bernoulli distribution with mean 0.5 and yj(0) was set equal to yj(1), so that δj = 0. For individuals j = γn + 1, …, (1 + γ)n/2, the potential outcomes were set to yj(1) = 0 and yj(0) = 1, so that δj = −1. For individuals j = (1 + γ)n/2 + 1, …, n, the potential outcomes were set to yj(1) = 1 and yj(0) = 0, so that δj = 1. Generating the potential outcomes in this fashion ensured that the degree of additivity equaled γ. The results for scenario (ii) in Table 4 show that the permutation confidence set again attained the narrowest width on average among methods that maintained nominal coverage. Coverage of the asymptotic interval tended to be less than the nominal level for n ≤ 60 and γ = 1. For n = 100 the asymptotic interval nearly achieved the nominal level for all nine combinations of m and γ.
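The modified step 0 for scenario (ii) can be sketched in Python as below (function name and the rounding of γn and (1 + γ)n/2 to integers are our assumptions). Since equal numbers of individuals receive δj = −1 and δj = 1 under this design, the average treatment effect τ is 0 in every scenario (ii) setting, which the check after the block confirms.

```python
import random


def scenario_ii_potential_outcomes(n, gamma, rng):
    """Potential outcomes in which a fraction gamma of individuals have delta_j = 0."""
    g = round(gamma * n)            # j = 1, ..., gamma*n: no effect
    h = round((1 + gamma) * n / 2)  # j = gamma*n + 1, ..., (1 + gamma)n/2: delta_j = -1
    y1, y0 = [], []
    for j in range(1, n + 1):
        if j <= g:
            outcome = int(rng.random() < 0.5)
            y1.append(outcome)
            y0.append(outcome)
        elif j <= h:
            y1.append(0)
            y0.append(1)
        else:  # remaining individuals: delta_j = 1
            y1.append(1)
            y0.append(0)
    return y1, y0


y1, y0 = scenario_ii_potential_outcomes(20, 0.8, random.Random(0))
deltas = [a - b for a, b in zip(y1, y0)]
print(deltas.count(0), sum(deltas))  # prints: 16 0
```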
Table 4.
Simulation results for scenario (ii). Table entries give the empirical width [coverage] of 95% confidence sets or intervals, where γ is the degree of additivity, % treatment is the percent of n total individuals assigned to treatment in each experiment, Perm is the permutation confidence set, AE is the attributable effects confidence set, Asymptotic is the asymptotic confidence interval in [6], Wald is the usual large sample interval for a risk difference, and SS is the Santner-Snell [13] exact confidence interval.
| n | Method | 30%, γ = 0.2 | 30%, γ = 0.8 | 30%, γ = 1 | 50%, γ = 0.2 | 50%, γ = 0.8 | 50%, γ = 1 | 70%, γ = 0.2 | 70%, γ = 0.8 | 70%, γ = 1 |
|---|---|---|---|---|---|---|---|---|---|---|
| 20 | Perm | 0.73[1.00] | 0.74[0.99] | 0.72[0.99] | 0.76[1.00] | 0.76[0.99] | 0.66[0.98] | 0.73[1.00] | 0.73[0.99] | 0.67[0.98] |
| | AE | 0.88[1.00] | 0.86[1.00] | 0.84[1.00] | 0.91[1.00] | 0.91[1.00] | 0.77[0.99] | 0.88[1.00] | 0.83[1.00] | 0.80[0.99] |
| | Asymptotic | 0.86[0.99] | 0.83[0.95] | 0.81[0.89] | 0.82[1.00] | 0.80[0.92] | 0.69[0.93] | 0.86[0.99] | 0.80[0.93] | 0.73[0.92] |
| | Wald | 0.91[0.99] | 0.89[0.95] | 0.86[0.95] | 0.85[1.00] | 0.85[0.96] | 0.75[0.94] | 0.91[0.99] | 0.86[0.93] | 0.78[0.93] |
| | SS | 0.94[1.00] | 0.93[0.97] | 0.93[0.96] | 0.91[1.00] | 0.90[0.98] | 0.78[0.99] | 0.94[1.00] | 0.93[0.99] | 0.78[0.98] |
| 40 | Perm | 0.60[1.00] | 0.60[0.98] | 0.59[0.98] | 0.59[1.00] | 0.59[0.99] | 0.54[0.97] | 0.60[1.00] | 0.60[0.99] | 0.56[0.98] |
| | AE | 0.69[1.00] | 0.69[0.99] | 0.67[0.99] | 0.68[1.00] | 0.67[1.00] | 0.64[0.99] | 0.68[1.00] | 0.69[0.99] | 0.64[1.00] |
| | Asymptotic | 0.64[0.99] | 0.62[0.92] | 0.61[0.90] | 0.59[1.00] | 0.57[0.94] | 0.55[0.92] | 0.63[0.99] | 0.62[0.91] | 0.57[0.92] |
| | Wald | 0.66[1.00] | 0.66[0.95] | 0.65[0.93] | 0.61[1.00] | 0.60[0.94] | 0.58[0.95] | 0.65[0.99] | 0.66[0.95] | 0.60[0.94] |
| | SS | 0.67[1.00] | 0.67[0.97] | 0.65[0.96] | 0.65[1.00] | 0.64[0.98] | 0.58[0.97] | 0.67[1.00] | 0.67[0.98] | 0.60[0.97] |
| 60 | Perm | 0.52[1.00] | 0.52[0.99] | 0.52[0.98] | 0.51[1.00] | 0.50[0.98] | 0.47[0.98] | 0.52[1.00] | 0.52[0.99] | 0.49[0.97] |
| | AE | 0.59[1.00] | 0.58[1.00] | 0.58[0.99] | 0.58[1.00] | 0.56[1.00] | 0.53[0.99] | 0.59[1.00] | 0.59[0.99] | 0.56[0.98] |
| | Asymptotic | 0.53[1.00] | 0.52[0.94] | 0.51[0.90] | 0.49[1.00] | 0.47[0.94] | 0.47[0.90] | 0.53[1.00] | 0.52[0.93] | 0.49[0.90] |
| | Wald | 0.54[1.00] | 0.54[0.96] | 0.53[0.94] | 0.50[1.00] | 0.49[0.96] | 0.49[0.95] | 0.54[1.00] | 0.54[0.95] | 0.51[0.93] |
| | SS | 0.55[1.00] | 0.55[0.97] | 0.54[0.97] | 0.53[1.00] | 0.52[0.99] | 0.49[0.96] | 0.55[1.00] | 0.55[0.96] | 0.51[0.96] |
| 100 | Perm | 0.43[1.00] | 0.43[0.99] | 0.42[0.97] | 0.41[1.00] | 0.41[0.99] | 0.39[0.97] | 0.43[1.00] | 0.43[0.98] | 0.41[0.97] |
| | AE | 0.47[1.00] | 0.47[1.00] | 0.47[0.98] | 0.45[1.00] | 0.45[0.99] | 0.43[0.98] | 0.47[1.00] | 0.47[0.99] | 0.44[0.98] |
| | Asymptotic | 0.41[1.00] | 0.41[0.96] | 0.41[0.92] | 0.38[1.00] | 0.38[0.96] | 0.37[0.94] | 0.42[1.00] | 0.41[0.94] | 0.39[0.94] |
| | Wald | 0.42[1.00] | 0.42[0.97] | 0.42[0.94] | 0.39[1.00] | 0.39[0.97] | 0.38[0.95] | 0.42[1.00] | 0.42[0.96] | 0.40[0.94] |
| | SS | 0.43[1.00] | 0.43[0.98] | 0.42[0.95] | 0.41[1.00] | 0.41[0.98] | 0.39[0.96] | 0.43[1.00] | 0.43[0.97] | 0.41[0.96] |
Simulations were conducted under scenario (iii), a superpopulation model, as above but with different steps 0 and 1. In particular, potential outcomes were not generated. Rather, the observed outcome data were generated by first randomly assigning m of n individuals to treatment. Outcomes were then independently sampled from a Bernoulli distribution with mean p1 = 0.5 + Δ/2 for individuals assigned Z = 1 and from a Bernoulli distribution with mean p0 = 0.5 − Δ/2 for individuals assigned Z = 0, where Δ was some fixed value denoting the difference in the probability of an event in the superpopulation when an individual receives treatment compared to not receiving treatment. After generating observed data, all five 95% confidence intervals (or sets) were computed. This process of data generation and interval (or set) computation was repeated 1000 times, and average interval (or set) widths and coverages were computed for the five approaches. The results for scenario (iii) in Table 5 show that the SS confidence interval was the only method to achieve nominal coverage across all simulation setups (with the exception of 30% assigned treatment at Δ = 0.2 when n = 60). The Wald confidence interval did not reliably achieve nominal coverage with Δ = 0.95, an unsurprising result given that the Wald confidence interval is known to cover poorly near the boundary of the parameter space [15]. The asymptotic confidence interval undercovered even with n = 100. The permutation and attributable effects confidence sets performed well, albeit with some slight undercoverage. The permutation confidence set tended to be as narrow as or narrower than SS.
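The data-generating step of scenario (iii) can be sketched in Python as follows (function name ours). Unlike scenarios (i) and (ii), no potential outcomes are fixed in advance: each observed outcome is a fresh Bernoulli draw whose mean depends only on the assigned arm.

```python
import random


def scenario_iii_data(n, m, big_delta, rng):
    """Randomize m of n individuals, then draw outcomes from the superpopulation model."""
    treated = set(rng.sample(range(n), m))
    z = [int(j in treated) for j in range(n)]
    p1, p0 = 0.5 + big_delta / 2, 0.5 - big_delta / 2
    # Each outcome is an independent Bernoulli draw with mean p1 (treated) or p0 (control).
    y = [int(rng.random() < (p1 if z[j] else p0)) for j in range(n)]
    return z, y


z, y = scenario_iii_data(20, 6, 0.5, random.Random(1))
```

As a sanity check, setting Δ = 1 makes p1 = 1 and p0 = 0, so every treated individual has the event and no control does.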
Table 5.
Simulation results for scenario (iii). Table entries give the empirical width [coverage] of 95% confidence sets or intervals, where Δ is the true difference in binomial proportions, % treatment is the percent of n total individuals assigned to treatment in each experiment, Perm is the permutation confidence set, AE is the attributable effects confidence set, Asymptotic is the asymptotic confidence interval in [6], Wald is the usual large sample interval for a difference in binomial proportions, and SS is the Santner-Snell [13] exact confidence interval.
| n | Method | 30%, Δ = 0.2 | 30%, Δ = 0.5 | 30%, Δ = 0.95 | 50%, Δ = 0.2 | 50%, Δ = 0.5 | 50%, Δ = 0.95 | 70%, Δ = 0.2 | 70%, Δ = 0.5 | 70%, Δ = 0.95 |
|---|---|---|---|---|---|---|---|---|---|---|
| 20 | Perm | 0.73[0.95] | 0.68[0.95] | 0.42[0.90] | 0.74[0.96] | 0.69[0.94] | 0.34[0.92] | 0.73[0.95] | 0.67[0.94] | 0.42[0.91] |
|  | AE | 0.83[0.98] | 0.75[0.98] | 0.53[0.90] | 0.85[0.98] | 0.75[0.98] | 0.53[0.92] | 0.83[0.99] | 0.75[0.98] | 0.53[0.91] |
|  | Asymptotic | 0.78[0.84] | 0.63[0.82] | 0.11[0.39] | 0.73[0.86] | 0.59[0.89] | 0.11[0.41] | 0.78[0.85] | 0.61[0.78] | 0.11[0.38] |
|  | Wald | 0.86[0.90] | 0.75[0.89] | 0.13[0.40] | 0.81[0.92] | 0.71[0.90] | 0.14[0.42] | 0.86[0.91] | 0.73[0.85] | 0.13[0.39] |
|  | SS | 0.90[0.96] | 0.82[0.98] | 0.50[1.00] | 0.87[0.97] | 0.79[0.97] | 0.41[0.99] | 0.91[0.96] | 0.81[0.98] | 0.50[1.00] |
| 40 | Perm | 0.59[0.95] | 0.52[0.94] | 0.26[0.91] | 0.58[0.97] | 0.51[0.96] | 0.20[0.92] | 0.59[0.95] | 0.52[0.95] | 0.26[0.94] |
|  | AE | 0.67[0.98] | 0.60[0.98] | 0.32[0.91] | 0.66[0.98] | 0.58[0.98] | 0.30[0.92] | 0.67[0.98] | 0.60[0.98] | 0.32[0.94] |
|  | Asymptotic | 0.59[0.90] | 0.47[0.86] | 0.10[0.61] | 0.54[0.88] | 0.43[0.88] | 0.10[0.61] | 0.58[0.88] | 0.47[0.87] | 0.10[0.66] |
|  | Wald | 0.64[0.92] | 0.56[0.90] | 0.12[0.62] | 0.59[0.95] | 0.52[0.94] | 0.12[0.62] | 0.63[0.90] | 0.56[0.91] | 0.13[0.66] |
|  | SS | 0.66[0.99] | 0.60[0.96] | 0.31[0.99] | 0.63[0.97] | 0.56[0.98] | 0.25[0.98] | 0.65[0.95] | 0.59[0.97] | 0.31[0.99] |
| 60 | Perm | 0.51[0.94] | 0.44[0.94] | 0.20[0.92] | 0.49[0.96] | 0.42[0.96] | 0.15[0.93] | 0.51[0.96] | 0.44[0.95] | 0.20[0.93] |
|  | AE | 0.57[0.96] | 0.51[0.97] | 0.24[0.92] | 0.56[0.98] | 0.50[0.99] | 0.23[0.93] | 0.57[0.97] | 0.50[0.98] | 0.24[0.93] |
|  | Asymptotic | 0.49[0.91] | 0.40[0.87] | 0.10[0.55] | 0.44[0.89] | 0.35[0.90] | 0.09[0.74] | 0.49[0.91] | 0.39[0.89] | 0.09[0.54] |
|  | Wald | 0.53[0.93] | 0.47[0.92] | 0.12[0.80] | 0.49[0.93] | 0.43[0.95] | 0.11[0.76] | 0.53[0.92] | 0.46[0.93] | 0.12[0.78] |
|  | SS | 0.54[0.94] | 0.49[0.96] | 0.24[0.99] | 0.51[0.97] | 0.46[0.97] | 0.19[0.81] | 0.54[0.95] | 0.49[0.96] | 0.23[0.99] |
| 100 | Perm | 0.42[0.95] | 0.35[0.94] | 0.14[0.93] | 0.39[0.96] | 0.33[0.93] | 0.11[0.95] | 0.42[0.96] | 0.35[0.93] | 0.14[0.93] |
|  | AE | 0.46[0.97] | 0.41[0.98] | 0.18[0.96] | 0.45[0.98] | 0.39[0.98] | 0.17[0.96] | 0.46[0.98] | 0.41[0.97] | 0.18[0.96] |
|  | Asymptotic | 0.38[0.90] | 0.31[0.88] | 0.08[0.64] | 0.35[0.90] | 0.27[0.88] | 0.08[0.68] | 0.38[0.92] | 0.31[0.87] | 0.08[0.64] |
|  | Wald | 0.41[0.92] | 0.36[0.93] | 0.11[0.79] | 0.38[0.94] | 0.34[0.94] | 0.10[0.91] | 0.41[0.94] | 0.36[0.93] | 0.11[0.79] |
|  | SS | 0.42[0.95] | 0.38[0.96] | 0.17[0.98] | 0.40[0.95] | 0.35[0.95] | 0.14[0.98] | 0.42[0.96] | 0.38[0.96] | 0.17[0.99] |
4.3. Vaccine Adherence Trial
In a study of adherence to the hepatitis B vaccine series [16], 96 injection drug users were randomized to a monetary incentive group or an outreach arm. Of the 48 individuals in the monetary incentive group, 33 were adherent, and of the 48 in the outreach arm, 11 were adherent. Using (3), T = 33/48 − 11/48 = 22/48 ≈ 0.46, suggesting that 44 (= 96 × 22/48) more individuals would have been adherent to the hepatitis B vaccine series if all 96 individuals had been given monetary incentives than if no individuals had received them. The attributable effects confidence set is contained in the interval [0.23, 0.64]. The permutation confidence set, found using 100 re-randomizations for each hypothesis test, is contained in the interval [0.28, 0.64]. The SS, asymptotic, and Wald confidence intervals are [0.26, 0.63], [0.31, 0.60], and [0.28, 0.64], respectively. Thus, for this example, the permutation confidence set is the narrowest of the three exact approaches (width 0.36, versus 0.37 for SS and 0.41 for attributable effects). The permutation confidence set has the same width as the Wald interval but is wider than the asymptotic interval (0.36 versus 0.29); however, unlike the Wald and asymptotic intervals, the permutation confidence set is guaranteed to cover the true effect with probability at least equal to the nominal level.
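The point estimate and the Wald interval for this example are simple enough to check by hand. The sketch below assumes, consistent with the value T = 22/48 quoted above, that the estimator in (3) is the difference in sample proportions. It reproduces only T, nT, and the Wald interval; the exact sets require the randomization procedures described earlier and are not reproduced here.

```python
import math
from fractions import Fraction
from statistics import NormalDist

# Hepatitis B vaccine adherence data: (adherent, randomized) per arm.
incentive = (33, 48)
outreach = (11, 48)
n = incentive[1] + outreach[1]                        # 96 individuals

t_exact = Fraction(*incentive) - Fraction(*outreach)  # difference in proportions
print(t_exact)                                        # 11/24 (= 22/48)
print(n * t_exact)                                    # 44 additional adherent individuals

# Usual large-sample (Wald) 95% interval for the difference.
p1, p0 = incentive[0] / incentive[1], outreach[0] / outreach[1]
se = math.sqrt(p1 * (1 - p1) / incentive[1] + p0 * (1 - p0) / outreach[1])
z = NormalDist().inv_cdf(0.975)
lo, hi = p1 - p0 - z * se, p1 - p0 + z * se
print(round(lo, 2), round(hi, 2))                     # 0.28 0.64
```

Using `Fraction` keeps T exact, so nT comes out as the integer 44 rather than a rounded float.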
5. Multiple Strata Designs and Observational Studies
The methods above can be extended to studies where stratified randomization is employed, i.e., individuals are randomized to treatment or control within strata. Assume that in each of strata i = 1, …, k, $m_i$ of $n_i$ individuals are randomized to treatment, and that randomization is conducted independently across strata, so that there are $\prod_{i=1}^{k}\binom{n_i}{m_i}$ possible randomizations in total. For stratum i, let $\delta_{ij}$ be the treatment effect for individual j and let $\delta_i$ be the vector of treatment effects. Define Z analogously: for stratum i, $Z_{ij}$ is the treatment assignment for individual j and $Z_i$ is the vector of treatment assignments. The average treatment effect is $\tau = \sum_i \sum_j \delta_{ij}/n$, where $n = \sum_{i=1}^{k} n_i$ and where, here and below, $\sum_i = \sum_{i=1}^{k}$ and $\sum_j = \sum_{j=1}^{n_i}$.
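The stratified set-up can be made concrete by brute force. The Python sketch below enumerates every assignment vector for a small hypothetical design (n1 = 4 with m1 = 2 treated, n2 = 3 with m2 = 1 treated), confirms that independent randomization within strata yields $\prod_i \binom{n_i}{m_i}$ assignments, and evaluates τ for hypothetical unit-level effects $\delta_{ij}$. The design, the effects, and the function names are illustrative assumptions only.

```python
from itertools import combinations, product
from math import comb, prod

def stratified_assignments(ns, ms):
    """Yield every Z = (Z_1, ..., Z_k) when m_i of n_i units are treated
    in stratum i and randomization is independent across strata."""
    per_stratum = []
    for n_i, m_i in zip(ns, ms):
        per_stratum.append([
            tuple(1 if j in treated else 0 for j in range(n_i))
            for treated in combinations(range(n_i), m_i)
        ])
    yield from product(*per_stratum)      # one choice per stratum

def average_effect(delta):
    """tau = sum_i sum_j delta_ij / n with n = sum_i n_i."""
    n = sum(len(d_i) for d_i in delta)
    return sum(sum(d_i) for d_i in delta) / n

ns, ms = [4, 3], [2, 1]                   # hypothetical design
assignments = list(stratified_assignments(ns, ms))
print(len(assignments), prod(comb(n_i, m_i) for n_i, m_i in zip(ns, ms)))  # 18 18

delta = [[1, 0, 0, -1], [1, 1, 0]]        # hypothetical effects delta_ij
print(average_effect(delta))              # 2/7, about 0.2857
```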
The permutation based approach becomes computationally unwieldy in this setting. The computational burden of the permutation confidence set is determined by the product of two factors. The first factor is the number of hypotheses to test: $O(n^4)$ in the one stratum setting, but $O(\max\{n_1, \dots, n_k\}^{4k})$ in the k-strata setting. The second factor is the number of permutations needed to test each hypothesis. In the one stratum problem, this number is $\binom{n}{m}$; in the k-strata case, it is $\prod_{i=1}^{k}\binom{n_i}{m_i}$. Although for fixed $n = \sum_i n_i$ and $m = \sum_i m_i$ the second factor will be smaller in the k-strata case, the first factor will be much larger and will therefore dominate the product. For example, suppose there are n = 100 individuals in k = 4 strata of equal size, so that $n_1 = \cdots = n_4 = 25$; then $\max\{n_1, \dots, n_4\}^{4k} = 25^{16} \gg 100^4 = n^4$.
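The two factors can be compared numerically. This sketch uses the order-of-magnitude bounds quoted above ($n^4$ versus $\max\{n_i\}^{4k}$) for the number of hypotheses and the exact number of possible randomizations for the second factor. The split $m_i$ = 12, 12, 13, 13 of m = 50 treated individuals across the four strata is a hypothetical choice, as are the function names.

```python
from math import comb, prod

def hypotheses_bound(ns):
    """Order of the number of hypotheses: n^4 for one stratum,
    max(n_i)^(4k) for k strata (the bounds quoted in the text)."""
    return sum(ns) ** 4 if len(ns) == 1 else max(ns) ** (4 * len(ns))

def randomizations(ns, ms):
    """Number of possible assignments: C(n, m) or prod_i C(n_i, m_i)."""
    return prod(comb(n_i, m_i) for n_i, m_i in zip(ns, ms))

one = ([100], [50])
four = ([25, 25, 25, 25], [12, 12, 13, 13])   # hypothetical split of m = 50

for label, (ns, ms) in [("1 stratum", one), ("4 strata", four)]:
    h, r = hypotheses_bound(ns), randomizations(ns, ms)
    print(f"{label}: hypotheses ~{h:.2e}, randomizations {r:.2e}, product {h * r:.2e}")
```

The four-strata design has fewer possible randomizations per hypothesis, but its hypothesis bound is larger by a factor of $25^{16}/100^4 \approx 2.3 \times 10^{14}$, so the product is far larger.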
Given these computational challenges, the attributable effects based approach may be preferred in the multiple strata setting. To construct attributable effects based confidence sets, first note that under $H_0: \delta = \delta_0$, or equivalently $\delta_i = \delta_{0i}$ for i = 1, …, k, the observed data can be represented in a k-table analogue of Table 2. Under this null, subtracting the attributable effect of treatment in the treated for stratum i, $A_1(Z_i, \delta_i) = \sum_j Z_{ij}\delta_{ij}$, from the (1,1) cell and adding it to the (1,2) cell of stratum i, for i = 1, …, k, fixes all row and column margins in the k-table analogue of Table 2. As a result, the joint distribution of the corresponding pivotal quantities is a product of independent hypergeometric distributions. The hypothesis $H_0: \delta = \delta_0$ is rejected if the two-sided p-value from an exact Cochran-Mantel-Haenszel test is sufficiently small.
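The adjust-then-test step can be sketched as follows. This Python example is an illustration under stated assumptions, not the paper's implementation. Each stratum is summarized by its treated and control event counts. The hypothesized attributable effect $a_{1i}$ is moved from the (1,1) to the (1,2) cell, the null distribution of the summed (1,1) cells is the convolution of the per-stratum hypergeometric distributions, and the two-sided p-value sums the null probabilities no larger than that of the observed sum. That two-sided rule is one common convention and is an assumption here. The function names are hypothetical.

```python
from math import comb

def hypergeom_pmf(n, row1, col1):
    """PMF of the (1,1) cell of a 2x2 table with row-1 total row1,
    column-1 total col1 and grand total n (all margins fixed)."""
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    return {x: comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)
            for x in range(lo, hi + 1)}

def convolve(p, q):
    """PMF of the sum of two independent integer-valued variables."""
    out = {}
    for a, pa in p.items():
        for b, qb in q.items():
            out[a + b] = out.get(a + b, 0.0) + pa * qb
    return out

def stratified_exact_pvalue(strata, a1):
    """strata: one (treated_events, m_i, control_events, n_i - m_i) per stratum.
    a1: hypothesized attributable effect in the treated, one per stratum."""
    null_dist, s_obs = {0: 1.0}, 0
    for (y_t, m_i, y_c, c_i), a1_i in zip(strata, a1):
        x11 = y_t - a1_i            # subtract A_1i from the (1,1) cell ...
        if not 0 <= x11 <= m_i:     # ... and add it to the (1,2) cell
            return 0.0              # a1_i is incompatible with the data
        col1 = x11 + y_c            # column margin, fixed under H0
        null_dist = convolve(null_dist, hypergeom_pmf(m_i + c_i, m_i, col1))
        s_obs += x11
    p_obs = null_dist[s_obs]
    # Two-sided: total null probability of sums no more likely than observed.
    return min(1.0, sum(p for p in null_dist.values() if p <= p_obs * (1 + 1e-7)))

print(stratified_exact_pvalue([(33, 48, 11, 48)], [0]))    # tiny: no effect is implausible
print(stratified_exact_pvalue([(33, 48, 11, 48)], [22]))   # ~1.0
```

With a single stratum this reduces to a Fisher-type exact test on the adjusted 2 × 2 table; the vaccine data are reused only as a convenient single-stratum input.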
As in the single stratum setting considered in Section 2, this hypothesis test can be inverted to obtain prediction sets for $A_1(Z, \delta)$ and for $A_0(Z, \delta)$. These prediction sets are considerably more difficult to find in the k-strata setting. Because $A_1(Z, \delta) = \sum_i A_1(Z_i, \delta_i)$, there may be multiple combinations of $A_1(Z_1, \delta_1), \dots, A_1(Z_k, \delta_k)$ that sum to the same value of $A_1(Z, \delta)$, and each such combination may lead to a different p-value. A value of $A_1(Z, \delta)$ is included in a 1 − α prediction set if the maximum p-value among these combinations is greater than α. Finding this maximum is an integer programming problem that can be solved using existing software, e.g., the R package rgenoud [17]. Proposition 1 then allows a confidence set for τ to be constructed in the k-strata setting as well.
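For very small designs, the maximization can simply be done by exhaustive enumeration instead of the integer programming approach mentioned above. The Python sketch below assumes the same adjusted-table test as in the stratified p-value discussion (sum of adjusted (1,1) cells as the statistic, two-sided by summing null probabilities no larger than the observed one). It enumerates every feasible combination $(a_{11}, \dots, a_{1k})$, with $a_{1i}$ ranging over the values that keep the adjusted table nonnegative, keeps the largest p-value for each total, and retains totals whose maximum exceeds α. The cost grows exponentially in k, which is why an optimizer is needed in practice. All names and the example tables are hypothetical.

```python
from itertools import product
from math import comb

def hypergeom_pmf(n, row1, col1):
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    return {x: comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)
            for x in range(lo, hi + 1)}

def stratified_exact_pvalue(strata, a1):
    """Exact p-value after moving a1[i] from the (1,1) to the (1,2) cell."""
    dist, s_obs = {0: 1.0}, 0
    for (y_t, m_i, y_c, c_i), a1_i in zip(strata, a1):
        x11 = y_t - a1_i
        pmf = hypergeom_pmf(m_i + c_i, m_i, x11 + y_c)
        new = {}
        for s, ps in dist.items():          # convolve with this stratum
            for x, px in pmf.items():
                new[s + x] = new.get(s + x, 0.0) + ps * px
        dist, s_obs = new, s_obs + x11
    p_obs = dist[s_obs]
    return min(1.0, sum(p for p in dist.values() if p <= p_obs * (1 + 1e-7)))

def prediction_set_a1(strata, alpha):
    """Totals a of A_1 whose maximum p-value over all feasible
    combinations (a_11, ..., a_1k) summing to a exceeds alpha."""
    # Feasible a_1i keep the adjusted (1,1) cell between 0 and m_i.
    ranges = [range(y_t - m_i, y_t + 1) for y_t, m_i, _, _ in strata]
    best = {}
    for combo in product(*ranges):          # brute force: exponential in k
        a = sum(combo)
        best[a] = max(best.get(a, 0.0), stratified_exact_pvalue(strata, combo))
    return sorted(a for a, p in best.items() if p > alpha)

# Two hypothetical strata: 4 of 5 treated and 1 of 5 controls had the event.
print(prediction_set_a1([(4, 5, 1, 5), (4, 5, 1, 5)], alpha=0.05))
```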
The methods in this section may be useful in observational studies where one is willing to assume that treatment selection is independent of potential outcomes conditional on some sufficient set of covariates (i.e., there are no unmeasured confounders). In this setting, an observational study can be envisaged as a stratified randomized trial performed by nature [5, §3.2], [6]. With strata formed by the levels of the measured covariates, these methods can be employed to find exact 1 − α confidence sets for the effect of treatment or exposure on a binary outcome.
6. Discussion
In this paper, we have presented two methods for constructing randomization based confidence sets for the average effect of a treatment on a binary outcome without assuming additivity. The first approach uses prediction sets for attributable effects [3], which are Bonferroni adjusted and combined to form a confidence set. The second method involves inverting a permutation test. Both methods are nonparametric, are guaranteed to yield sets with width no greater than one, require no assumptions about random sampling from a larger population, and are exact in the sense that the probability of containing the true treatment effect is at least 1 − α. The attributable effects method is computationally fast, whereas the permutation method becomes computationally slow as n increases; however, simulations show that the permutation method has smaller average width. Based on the finite population simulation results, the permutation approach is recommended over the attributable effects and asymptotic approaches for n ≤ 100. Additional simulation results (not shown) indicate that the asymptotic approach tends to provide nominal coverage for n > 100, although coverage may still fall below the nominal level for extreme values of τ (e.g., τ ≈ 1). Extensions that allow for stratifying on categorical baseline covariates were also considered. The R package RI2by2 is available on CRAN [18] for computing the attributable effects and permutation confidence sets, as well as the asymptotic confidence interval, in the one stratum setting.
There are several possible future directions for this research. One would be to increase the computational efficiency of the permutation based approach. Both the permutation and attributable effects based confidence sets tend to be conservative: their empirical coverage in the simulation studies was generally greater than the nominal level. Another future direction could therefore explore adaptations of these two approaches that yield less conservative sets. For instance, techniques could be explored (as in [19]) such that the average coverage equals the nominal level, although such procedures would no longer necessarily be exact.
Acknowledgement
This research was supported in part by the National Institutes of Health. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. The authors thank the Associate Editor and reviewer for helpful comments that greatly improved this paper.
Appendix
Proof of Proposition 1
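Let LD = L0 + L1, UD = U0 + U1, A1 = A1(Z, δ), and A0 = A0(Z, δ). Recall that nτ = ∑ δj. After observing the data, A1 ∈ 𝒜1, where 𝒜1 = {− ∑ Zj(1 − Yj), − ∑ Zj(1 − Yj) + 1, …, ∑ ZjYj}, because ∑ ZjYj − A1 is the number of treated individuals who would have had the event without treatment and therefore lies between 0 and ∑ Zj. Therefore: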
where the first inequality follows because [L1,U1] ⊆ 𝒜1, the second inequality is true because L1 − a1 ≤ 0 and U1 − a1 ≥ 0 for all a1 ∈ [L1,U1], and the third inequality follows from the Bonferroni inequality.
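The Bonferroni argument used in this proof can be summarized compactly. The display below is a restatement, not the original display; it assumes that A0 = ∑ (1 − Zj)δj is the attributable effect in the untreated and that [L1, U1] and [L0, U0] are each 1 − α/2 prediction sets, as the Bonferroni adjustment implies.

```latex
\begin{aligned}
n\tau &= \sum_j \delta_j = \sum_j Z_j\delta_j + \sum_j (1 - Z_j)\delta_j = A_1 + A_0,\\
\{L_1 \le A_1 \le U_1\} \cap \{L_0 \le A_0 \le U_0\}
  &\subseteq \{L_1 + L_0 \le A_1 + A_0 \le U_1 + U_0\} = \{L_D \le n\tau \le U_D\},\\
\Pr(L_D \le n\tau \le U_D)
  &\ge 1 - \Pr(A_1 \notin [L_1, U_1]) - \Pr(A_0 \notin [L_0, U_0])
   \ge 1 - \tfrac{\alpha}{2} - \tfrac{\alpha}{2} = 1 - \alpha .
\end{aligned}
```

Dividing LD and UD by n then gives a 1 − α confidence set for τ.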
References
- 1.Hodges J, Lehmann E. Estimates of location based on rank tests. The Annals of Mathematical Statistics. 1963;34(2):598–611.
- 2.LaVange L, Durham T, Koch G. Randomization-based nonparametric methods for the analysis of multicentre trials. Statistical Methods in Medical Research. 2005;14(3):281–301. doi: 10.1191/0962280205sm397oa.
- 3.Rosenbaum P. Effects attributable to treatment: Inference in experiments and observational studies with a discrete pivot. Biometrika. 2001;88(1):219–231.
- 4.Manski C. Nonparametric bounds on treatment effects. The American Economic Review. 1990;80(2):319–323.
- 5.Rosenbaum P. Observational Studies. New York, NY: Springer; 2002.
- 6.Robins J. Confidence intervals for causal parameters. Statistics in Medicine. 1988;7:773–785. doi: 10.1002/sim.4780070707.
- 7.Miettinen O, Cook E. Confounding: essence and detection. American Journal of Epidemiology. 1981;114(4):593–603. doi: 10.1093/oxfordjournals.aje.a113225.
- 8.Rubin D. Practical implications of modes of statistical inference for causal effects and the critical role of the assignment mechanism. Biometrics. 1991;47(4):1213–1234.
- 9.Lehmann E. Nonparametrics: Statistical Methods Based on Ranks. Upper Saddle River, NJ: Prentice Hall; 1998.
- 10.Neyman J. On the application of probability theory to agricultural experiments. Essay on principles. Section 9 (1990 Dabrowska and Speed translation). Statistical Science. 1923;5:465–472.
- 11.R Core Team. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing; 2014. http://www.R-project.org/.
- 12.Mehta C, Patel N. StatXact 5 for Windows: Statistical Software for Exact Nonparametric Inference User Manual. CYTEL Software Corporation; 2003.
- 13.Santner T, Snell M. Small-sample confidence intervals for p1 – p2 and p1/p2 in 2 × 2 contingency tables. Journal of the American Statistical Association. 1980;75(370):386–394.
- 14.SAS Institute Inc. SAS Software, Version 9.3. Cary, NC: 2014. http://www.sas.com/.
- 15.Agresti A, Caffo B. Simple and effective confidence intervals for proportions and differences of proportions result from adding two successes and two failures. The American Statistician. 2000;54(4):280–288.
- 16.Seal K, Kral A, Lorvick J, McNees A, Gee L, Edlin B. A randomized controlled trial of monetary incentives vs. outreach to enhance adherence to the hepatitis B vaccine series among injection drug users. Drug and Alcohol Dependence. 2003;71(2):127–131. doi: 10.1016/s0376-8716(03)00074-7.
- 17.Mebane W, Jr, Sekhon J. Genetic optimization using derivatives: the rgenoud package for R. Journal of Statistical Software. 2011;42(11):1–26.
- 18.Rigdon J. RI2by2: Randomization inference for treatment effects on a binary outcome. R package version 1.2. 2014. doi: 10.1002/sim.6384. http://CRAN.R-project.org/package=RI2by2.
- 19.Thulin M. Coverage-adjusted confidence intervals for a binomial proportion. Scandinavian Journal of Statistics. 2014;41(2):291–300.

