Author manuscript; available in PMC: 2016 Mar 15.
Published in final edited form as: J Appl Stat. 2012 Nov 5;40(2):311–319. doi: 10.1080/02664763.2012.740629

A comment on sample size calculations for binomial confidence intervals

Lai Wei 1,*, Alan D Hutson 1
PMCID: PMC4792103  NIHMSID: NIHMS766152  PMID: 26989292

Abstract

In this article we examine sample size calculations for a binomial proportion based on the confidence interval width of the Agresti–Coull, Wald and Wilson Score intervals. We point out that the commonly used methods based on known and fixed standard errors cannot guarantee the desired confidence interval width given a hypothesized proportion. Therefore, a new adjusted sample size calculation method is introduced, which is based on the conditional expectation of the width of the confidence interval given the hypothesized proportion. With the reduced sample size, the coverage probability is still maintained at the nominal level and is very competitive with the coverage probability for the original sample size.

Keywords: binomial proportion, sample size calculation, expected width, Agresti–Coull interval, Wald interval, Wilson Score interval

1. Introduction

A common problem in biostatistics is the estimation of a binomial parameter, π, based on a binomial observation, Y ~ B(n, π). For example, π may represent the fraction surviving beyond one year, the sensitivity or specificity of a diagnostic test, or the proportion of values outside the normal range, to name just a few common applications. In early stage biomedical investigations or pilot studies, researchers are often more inclined to base their inferences about π = π0 on the (1 − α) × 100% confidence interval for π than on a formal hypothesis test of H0 : π = π0. The reasons for this preference are varied, but generally involve particulars of early phase investigations where very little is known about π, or subspecialities in which confidence intervals are historically the preferred approach, e.g. diagnostic testing.

It is well known that the maximum-likelihood estimator for π is given by $\hat\pi = Y/n$ and that asymptotically $\sqrt{n}(\hat\pi - \pi)/\sqrt{\pi(1-\pi)} \sim N(0,1)$. Historically, this result was the basis for the large sample (1 − α) × 100% approximate confidence interval for π given by

$$\hat\pi \pm z_{\alpha/2}\sqrt{\frac{\hat\pi(1-\hat\pi)}{n}}, \qquad (1)$$

where $z_{\alpha/2}$ is the (1 − α/2)th quantile of the standard normal distribution, often written as $\Phi^{-1}(1-\alpha/2)$. The interval given by Equation (1) is referred to as a Wald interval for π [1].

In recent years, there have been several improvements to the approximate interval given by Equation (1), e.g. see Piegorsch [5] for an excellent survey of these results. The most notable improvement is a straightforward correction developed by Brown et al. [3,4], which extends the approach of Agresti and Coull [1], and is given by

$$\tilde\pi \pm z_{\alpha/2}\sqrt{\frac{\tilde\pi(1-\tilde\pi)}{\tilde n}}, \qquad (2)$$

where the modified estimator of π is given by $\tilde\pi = (Y + z_{\alpha/2}^2/2)/(n + z_{\alpha/2}^2)$ and $\tilde n = n + z_{\alpha/2}^2$; for details see Brown et al. [3,4]. This approach substantially improves the coverage probabilities of the confidence interval for π compared with the Wald-type interval for a general level α.

Wilson [6] gave an interval estimation approach which is the inversion of the score test for π, and is given by

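$$\frac{\hat\pi + (1/2n)\,z_{\alpha/2}^2 \pm z_{\alpha/2}\sqrt{\hat\pi(1-\hat\pi)/n + z_{\alpha/2}^2/(4n^2)}}{1 + (1/n)\,z_{\alpha/2}^2}. \qquad (3)$$

By simple derivation, it can be expressed as

$$\tilde\pi \pm \frac{z_{\alpha/2}}{\tilde n}\sqrt{n\hat\pi(1-\hat\pi) + \frac{z_{\alpha/2}^2}{4}}, \qquad (4)$$

where $\tilde\pi$, $\tilde n$ and $\hat\pi$ are as above.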

This interval is also called the Wilson Score interval. The coverage probability is closer to the nominal value compared with the Wald interval and it has good properties even for a small number of trials and/or an extreme probability.
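As an illustration of Equations (1), (2) and (4), the following minimal Python sketch computes the three intervals for an observed count. The function names (`z_upper`, `wald_interval`, `agresti_coull_interval`, `wilson_interval`) are hypothetical and introduced here only for illustration; they are not part of the original article.

```python
from math import sqrt
from statistics import NormalDist


def z_upper(alpha=0.05):
    """Standard normal quantile z_{alpha/2} = Phi^{-1}(1 - alpha/2)."""
    return NormalDist().inv_cdf(1 - alpha / 2)


def wald_interval(y, n, alpha=0.05):
    """Equation (1): pi_hat +/- z * sqrt(pi_hat * (1 - pi_hat) / n)."""
    z = z_upper(alpha)
    p_hat = y / n
    half = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half, p_hat + half


def agresti_coull_interval(y, n, alpha=0.05):
    """Equation (2): pi_tilde +/- z * sqrt(pi_tilde * (1 - pi_tilde) / n_tilde)."""
    z = z_upper(alpha)
    n_tilde = n + z**2
    p_tilde = (y + z**2 / 2) / n_tilde
    half = z * sqrt(p_tilde * (1 - p_tilde) / n_tilde)
    return p_tilde - half, p_tilde + half


def wilson_interval(y, n, alpha=0.05):
    """Equation (4): pi_tilde +/- (z / n_tilde) * sqrt(n * pi_hat * (1 - pi_hat) + z^2 / 4)."""
    z = z_upper(alpha)
    n_tilde = n + z**2
    p_hat = y / n
    p_tilde = (y + z**2 / 2) / n_tilde
    half = (z / n_tilde) * sqrt(n * p_hat * (1 - p_hat) + z**2 / 4)
    return p_tilde - half, p_tilde + half
```

Written this way, the Wilson Score and Agresti–Coull intervals visibly share the same center and differ only in their half-widths, which coincide when the observed proportion equals 1/2.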

In terms of deriving an estimate for sample size n required to carry out a given study, the basic recommendation based on Equation (1) is to solve the equation

$$\delta_0 = z_{\alpha/2}\sqrt{\frac{\pi_0(1-\pi_0)}{n}} \qquad (5)$$

for n at some desired interval width $w_0 = 2\delta_0$ and given some hypothesized and ‘known’ $\pi_0$, e.g. see the oft-cited article by Arkin and Wachtel [2]. This method is widely used even though it is not particularly accurate, justifying Piegorsch’s [5] comment that it is ‘methodus non-gratus’. Improvements in sample size estimation based on more accurate intervals, such as the one developed by Brown et al. [3,4] at Equation (2), are provided in Piegorsch [5]. We argue, however, that this is the incorrect parameterization of the problem at hand even if the more accurate interval estimators are utilized. That is, we ask whether this is the correct approach to ‘guarantee’ the appropriate confidence interval width given that $\pi_0$ is known. Given that the calculated interval width is a random variable, it is impossible to ensure the fixed width $w_0$ a priori. We can, however, calculate the distribution of the random variable corresponding to the interval width conditional on $\pi_0$ and its associated quantities. Furthermore, we illustrate in later sections that the desired width $w_0$ may not even be achievable, i.e. the range of support for the random variable $W = 2\delta$ falls below $w_0$. Here $W_W$, $W_{AC}$ and $W_S$ denote the estimated width for the Wald, Agresti–Coull and Wilson Score methods, respectively, and the corresponding half interval lengths are given by

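$$\delta_W = z_{\alpha/2}\sqrt{\frac{\hat\pi(1-\hat\pi)}{n}}, \qquad (6)$$

$$\delta_{AC} = z_{\alpha/2}\sqrt{\frac{\tilde\pi(1-\tilde\pi)}{\tilde n}}, \qquad (7)$$

and

$$\delta_S = \frac{z_{\alpha/2}}{\tilde n}\sqrt{n\hat\pi(1-\hat\pi) + \frac{z_{\alpha/2}^2}{4}}. \qquad (8)$$

Note also that even if the range of $W = 2\delta$ does encompass $w_0$, the probability $P(W \le w_0)$ is quite high or even equal to 1. To summarize, the expected value of W is less than $w_0$. This will be illustrated in more detail in the next few sections.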
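For reference, the conventional fixed-width calculation criticized above amounts to solving Equation (5) for n. The sketch below does this in Python; the function name `fixed_width_sample_size` is hypothetical, and rounding up to the next integer is an assumption made here (the article does not state a rounding rule).

```python
from math import ceil
from statistics import NormalDist


def fixed_width_sample_size(pi0, w0, alpha=0.05):
    """Solve Equation (5) for n, given a hypothesized pi0 and a target width w0 = 2 * delta0."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    delta0 = w0 / 2
    # n = z^2 * pi0 * (1 - pi0) / delta0^2, rounded up (rounding rule assumed here)
    return ceil(z**2 * pi0 * (1 - pi0) / delta0**2)


n_half = fixed_width_sample_size(0.5, 0.2)
```

This treats the standard error as known and fixed at π0, which is precisely the assumption questioned in the text: the width actually observed depends on Y and is therefore random.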

In the following sections, we will introduce a new sample size calculation method based on the expected interval width, i.e. solve

$$w_E = E(W \mid \pi_0) \qquad (9)$$

for n corresponding to a desired width of $w_E$ given a fixed sample size, versus a cutpoint $w_0$ that may not even fall within the support of W.

2. Specific case π0 = 1/2

The specific case $\pi_0 = 1/2$ is very important theoretically, since it is often recommended as the default conservative approach when nothing is known about π: it provides the maximum $w_0$ for a given n. It has been noted by Brown et al. [3,4] that the Agresti–Coull confidence interval outperforms the Wald interval. Hence, we will use the Agresti–Coull interval for derivation of the theorems in this section and then generalize the properties in Section 3 for the other two intervals. For the Agresti–Coull interval, the probability distribution of $W_{AC}$ is given by

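$$P(W_{AC} = w(y) \mid \pi_0) = \binom{n}{y}\pi_0^{y}(1-\pi_0)^{n-y}, \qquad (10)$$

where

$$w(y) = 2z_{\alpha/2}\sqrt{\frac{\tilde\pi(y)\,(1-\tilde\pi(y))}{n + z_{\alpha/2}^2}}, \qquad (11)$$

and

$$\tilde\pi(y) = \frac{y + z_{\alpha/2}^2/2}{n + z_{\alpha/2}^2}, \quad y = 0, 1, 2, \ldots, n. \qquad (12)$$

The sample space for $W_{AC}$ is given by the set $\mathcal{W} = \{w(0), w(1), \ldots, w(n)\}$ with associated binomial probabilities given in Equation (10).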
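The distribution in Equations (10) to (12) is easy to enumerate directly. The sketch below is one possible way to do so in Python; the names `ac_width` and `ac_width_summary` are hypothetical. It assumes that the fixed target $w_0$ is the Agresti–Coull width evaluated at y = nπ0, a convention adopted here because it reproduces the $w_0$ column of Table 1, and it uses a small tolerance `tol` so that floating-point ties at $W_{AC} = w_0$ are not counted as strictly smaller.

```python
from math import comb, sqrt
from statistics import NormalDist


def ac_width(y, n, z):
    """Equations (11)-(12): Agresti-Coull width at Y = y (y may be fractional)."""
    n_tilde = n + z**2
    p_tilde = (y + z**2 / 2) / n_tilde
    return 2 * z * sqrt(p_tilde * (1 - p_tilde) / n_tilde)


def ac_width_summary(n, pi0, alpha=0.05, tol=1e-12):
    """Return (w0, P(W_AC < w0 | pi0), E(W_AC | pi0)) by enumerating y = 0..n."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    w0 = ac_width(n * pi0, n, z)  # assumed convention for the fixed target width
    # Equation (10): binomial probability attached to each attainable width w(y)
    pmf = [comb(n, y) * pi0**y * (1 - pi0) ** (n - y) for y in range(n + 1)]
    widths = [ac_width(y, n, z) for y in range(n + 1)]
    p_less = sum(p for p, w in zip(pmf, widths) if w < w0 - tol)
    expected = sum(p * w for p, w in zip(pmf, widths))
    return w0, p_less, expected


w0, p_less, expected = ac_width_summary(10, 0.5)
```

For n = 10 and π0 = 1/2 the three returned values can be compared with the first row of Table 1. The direct use of `comb` is adequate for moderate n; for n in the thousands a log-scale probability mass function would be needed to avoid overflow.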

In order to develop our argument for the appropriate sample size metric, we first present two very straightforward theorems:

THEOREM 1a (n = odd) Given $\pi_0 = 1/2$ and a sample size of n = 2m + 1, m = 0, 1, 2, …, the maximum of $W_{AC}$ is strictly less than $w_0$.

Proof The proof is straightforward in that the maximum for $W_{AC}$ occurs at the value of $\tilde\pi$ where $\tilde\pi(1-\tilde\pi)$ is maximized, which occurs at Y = (n − 1)/2 and Y = (n + 1)/2, corresponding to values for $\tilde\pi$ of $((n-1)/2 + z_{\alpha/2}^2/2)/(n + z_{\alpha/2}^2)$ and $((n+1)/2 + z_{\alpha/2}^2/2)/(n + z_{\alpha/2}^2)$, respectively. The width $w_0$ is maximized for the value of $\pi_0$ for which $\pi_0(1-\pi_0)$ is maximized, which occurs at $\pi_0 = 1/2$. Noting that $\tilde\pi(1-\tilde\pi)$ at Y = (n − 1)/2 and Y = (n + 1)/2 is strictly smaller than $\pi_0(1-\pi_0) = 1/4$ completes the proof.

THEOREM 1b (n = even) Given $\pi_0 = 1/2$ and a sample size of n = 2m, m = 0, 1, 2, …, the maximum value that $W_{AC}$ may attain is equal to $w_0$.

Proof The proof follows a similar logic as Theorem 1a in that the maximum for $W_{AC}$ occurs at Y = n/2.

COROLLARY 1b For sample sizes of n = 2m, m = 0, 1, 2, …, $P(W_{AC} < w_0 \mid \pi_0 = 1/2) \to 1$ as n → ∞.

Proof The proof follows from Theorem 1b in that $P(W_{AC} < w_0 \mid \pi_0 = 1/2) = 1 - P(W_{AC} = w_0 \mid \pi_0 = 1/2)$, where $W_{AC} = w_0$ occurs only at Y = n/2, an event whose binomial probability tends to 0 as n → ∞.

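THEOREM 2 $E(W_{AC} \mid \pi_0 = 1/2) < w_0$.

Proof Expand $W_{AC}$ in a Taylor series about $\pi_0 = 1/2$, such that

$$2z_{\alpha/2}\sqrt{\frac{\tilde\pi(1-\tilde\pi)}{\tilde n}} = \frac{2z_{\alpha/2}}{\sqrt{\tilde n}}\left[\frac12 - \left(\tilde\pi - \frac12\right)^2 - \left(\tilde\pi - \frac12\right)^4 - 2\left(\tilde\pi - \frac12\right)^6 - O\!\left(\left(\tilde\pi - \frac12\right)^8\right)\right]. \qquad (13)$$

Taking expectations on both sides of Equation (13) and noting that $E(\tilde\pi - 1/2)^{2k} > 0$ for all k in the summand completes the proof.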
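The coefficients in Equation (13) can be checked with a short worked derivation. The shorthand $u = \tilde\pi - 1/2$ is introduced here only for convenience and does not appear in the original article; the last line uses the fact that, for the Agresti–Coull interval at $\pi_0 = 1/2$, the fixed width is $w_0 = z_{\alpha/2}/\sqrt{\tilde n}$.

```latex
\begin{aligned}
\tilde\pi(1-\tilde\pi) &= \tfrac14 - u^2, \qquad u = \tilde\pi - \tfrac12, \\
\sqrt{\tilde\pi(1-\tilde\pi)} &= \tfrac12\sqrt{1 - 4u^2}
  = \tfrac12\left(1 - 2u^2 - 2u^4 - 4u^6 - \cdots\right)
  = \tfrac12 - u^2 - u^4 - 2u^6 - O(u^8), \\
E\!\left(W_{AC} \mid \pi_0 = \tfrac12\right)
  &= \frac{2z_{\alpha/2}}{\sqrt{\tilde n}}
     \left[\tfrac12 - E(u^2) - E(u^4) - 2E(u^6) - \cdots\right]
  < \frac{z_{\alpha/2}}{\sqrt{\tilde n}} = w_0 .
\end{aligned}
```

Every coefficient after the constant term in the binomial series of $\sqrt{1-x}$ is negative, so each even moment of $u$ can only pull the expectation below $w_0$.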

The implications of Theorems 1a, b and 2 may be summarized as follows:

  1. From Theorems 1a and b: If we anticipate that $\pi_0 = 1/2$, then solving for n from Equation (7) will guarantee a high probability that the calculated width $W_{AC}$ will be less than the desired width $w_0$. If the sample size turns out to be odd then it is impossible for the calculated interval to be as wide or wider than $w_0$, i.e. $W_{AC} < w_0$ for odd sample sizes.

  2. From Theorem 2: If $\pi_0 = 1/2$, then the value for the width of the confidence interval that we would expect from repeated experiments is less than the desired $w_0$.

3. General case and applicability to other intervals

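In this section, we will investigate the properties of $W_{AC}$ for the case $\pi_0 \ne 1/2$.

THEOREM 3 $P(W_{AC} < w_0 \mid \pi_0 \ne 1/2) \to 1/2$ as n → ∞.

Proof Given $\pi_0 \ne 1/2$, the probability distribution of $W_{AC}$ is given in Equation (10). By Equation (11), we can simplify $P(W_{AC} < w_0 \mid \pi_0 \ne 1/2)$ as

$$P(W_{AC} < w_0 \mid \pi_0 \ne 1/2) = P\big(\tilde\pi(1-\tilde\pi) < \pi_0(1-\pi_0) \mid \pi_0 \ne 1/2\big). \qquad (14)$$

Since the function f(x) = x(1 − x) is symmetric about x = 1/2 and increasing on (0, 1/2), Equation (14) can be further simplified for $\pi_0 < 1/2$ and $\pi_0 > 1/2$, respectively. For $\pi_0 < 1/2$,

$$\begin{aligned}
P\big(\tilde\pi(1-\tilde\pi) < \pi_0(1-\pi_0) \mid \pi_0 < 1/2\big)
&= P\big(\tilde\pi < \pi_0 \ \text{or}\ \tilde\pi > 1-\pi_0 \mid \pi_0 < 1/2\big) \\
&= P\Big(Y < \pi_0(n + z_{\alpha/2}^2) - \frac{z_{\alpha/2}^2}{2} \ \text{or}\ Y > (1-\pi_0)(n + z_{\alpha/2}^2) - \frac{z_{\alpha/2}^2}{2} \,\Big|\, \pi_0 < 1/2\Big), \qquad (15)
\end{aligned}$$

where $P(Y = y \mid \pi_0) = \binom{n}{y}\pi_0^{y}(1-\pi_0)^{n-y}$; the case $\pi_0 > 1/2$ follows by symmetry. Through this derivation, the probability in Equation (14) can be calculated from the binomial probability mass function and approaches 0.5 as n → ∞.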
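The limiting behaviour in Theorem 3 can be explored numerically. The following is a minimal sketch, not the authors' code: `binom_pmf` and `prob_width_below_w0` are hypothetical names, the probability mass function is computed on the log scale to stay stable for large n, and $w_0$ is assumed to be the Agresti–Coull width evaluated at y = nπ0. That convention is chosen because it reproduces the tabulated values; Equation (14) as written compares against $\pi_0(1-\pi_0)$ instead, so results from the two conventions can differ for finite n.

```python
from math import exp, lgamma, log, sqrt
from statistics import NormalDist


def binom_pmf(y, n, p):
    """Binomial pmf computed on the log scale (requires 0 < p < 1)."""
    log_pmf = (lgamma(n + 1) - lgamma(y + 1) - lgamma(n - y + 1)
               + y * log(p) + (n - y) * log(1 - p))
    return exp(log_pmf)


def prob_width_below_w0(n, pi0, alpha=0.05, tol=1e-12):
    """P(W_AC < w0 | pi0), summing Equation (10) over all y with w(y) < w0."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    n_tilde = n + z**2

    def width(y):
        p_tilde = (y + z**2 / 2) / n_tilde
        return 2 * z * sqrt(p_tilde * (1 - p_tilde) / n_tilde)

    w0 = width(n * pi0)  # assumed convention for the fixed target width
    return sum(binom_pmf(y, n, pi0) for y in range(n + 1) if width(y) < w0 - tol)


for n in (10, 100, 1000, 2000):
    print(n, round(prob_width_below_w0(n, 0.95), 4))
```

The printed probabilities can be compared with the corresponding column of Table 2 (π0 = 0.95); they drift toward 0.5 as n grows, in line with the theorem.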

THEOREM 4 $E(W_{AC} \mid \pi_0) < w_0$, where $\pi_0 \in (0, 1)$.

Proof Following the proof of Theorem 2, by taking the expectation on both sides of the Taylor expansion at $\pi_0$, we reach the conclusion for the case $\pi_0 \ne 1/2$ since $E(\tilde\pi - \pi_0)^{2k} > 0$. Theorem 2 gives $E(W_{AC} \mid \pi_0 = 1/2) < w_0$, which completes the proof.

From Theorems 3 and 4, we learn that although $W_{AC}$ is not strictly less than the desired width $w_0$ for $\pi_0 \ne 1/2$, the value for the width of the confidence interval that we would expect from repeated experiments is still less than the desired $w_0$.

Using the same rationale as for the Agresti–Coull interval, we arrive at the same conclusions for the Wald and Wilson Score intervals as in Theorems 1a and b: given $\pi_0 = 1/2$, the maximum of $W_W$ and of $W_S$ is strictly less than $w_0$ for odd n, while the maximum value that $W_W$ and $W_S$ may attain is equal to $w_0$ for even n. For the conditional expectation, $E(W_S \mid \pi_0) < w_0$ where $\pi_0 \in (0, 1)$, which can be developed from the Taylor expansion following the proof of Theorem 2.

4. Comparison of fixed versus random interval approaches

In this section, we examine the fixed confidence interval approaches, including the Agresti–Coull, Wald and Wilson Score confidence intervals, and compare them with our approach for calculating sample sizes based on the expectation of W given $\pi_0$. Different $w_0$’s and the confidence interval widths given $\pi_0$ were calculated for sample sizes 10, 11, 100, 101, 200, 201, 1000, 1001, 2000 and 2001 for the cases π0 = 0.5, 0.95 and 0.975 in Tables 1–9. The probability that W is less than $w_0$ given $\pi_0$ was obtained by summing up the probabilities for which W is less than $w_0$. The conditional expectations of W given $\pi_0$ were also calculated. The n*’s, which are the sample sizes we suggest based on $E(W_{AC} \mid \pi_0) > w_0$, are given for the corresponding sample sizes. Additionally, we investigated the coverage probabilities for the given sample size n and the suggested sample size n* through 100,000 simulations.
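The suggested sample size n* can be sketched as a simple search. The code below rests on stated assumptions rather than on the authors' implementation: n* is read here as the largest sample size not exceeding n whose expected Agresti–Coull width exceeds the fixed $w_0$ computed at n; $w_0$ is taken as the width at y = nπ0; and coverage is computed by exact enumeration rather than by the 100,000 simulations used in the article. All function names are hypothetical.

```python
from math import exp, lgamma, log, sqrt
from statistics import NormalDist

Z = NormalDist().inv_cdf(0.975)  # z_{alpha/2} for alpha = 0.05


def binom_pmf(y, n, p):
    """Binomial pmf on the log scale (requires 0 < p < 1)."""
    return exp(lgamma(n + 1) - lgamma(y + 1) - lgamma(n - y + 1)
               + y * log(p) + (n - y) * log(1 - p))


def ac_center_width(y, n):
    """Agresti-Coull center pi_tilde(y) and full width w(y), Equations (11)-(12)."""
    n_tilde = n + Z**2
    p_tilde = (y + Z**2 / 2) / n_tilde
    return p_tilde, 2 * Z * sqrt(p_tilde * (1 - p_tilde) / n_tilde)


def expected_width(n, pi0):
    """E(W_AC | pi0) for sample size n."""
    return sum(binom_pmf(y, n, pi0) * ac_center_width(y, n)[1] for y in range(n + 1))


def suggested_n(n, pi0):
    """Largest m <= n with E(W_AC | pi0) at size m above w0 at size n (assumed reading of n*)."""
    w0 = ac_center_width(n * pi0, n)[1]
    for m in range(n, 0, -1):
        if expected_width(m, pi0) > w0:
            return m
    return None  # no such sample size found


def exact_coverage(n, pi0):
    """Exact coverage of the Agresti-Coull interval (enumeration instead of simulation)."""
    total = 0.0
    for y in range(n + 1):
        center, width = ac_center_width(y, n)
        if center - width / 2 <= pi0 <= center + width / 2:
            total += binom_pmf(y, n, pi0)
    return total


n_star = suggested_n(10, 0.5)
coverage_n = exact_coverage(10, 0.5)
```

Because the coverage here is exact, it will agree with the simulated coverage columns only up to Monte Carlo error.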

Table 1.

Behavior of the Agresti–Coull confidence interval width $W_{AC}$ as compared to the fixed value $w_0$ at $\pi_0 = 1/2$ and α = 0.05.

Sample size    $w_0$    $P(W_{AC}<w_0\mid\pi_0=1/2)$    $E(W_{AC}\mid\pi_0=1/2)$    n*    $CP_n$    $CP_{n^*}$
10 0.5268 0.7539 0.5125 9 0.9783 0.9621
11 0.5088 1.0000 0.4955 10 0.9342 0.9794
100 0.1923 0.9204 0.1914 99 0.9439 0.9564
101 0.1914 1.0000 0.1905 100 0.9545 0.9438
200 0.1373 0.9437 0.1369 199 0.9454 0.9536
201 0.1369 1.0000 0.1366 200 0.9518 0.9449
1000 0.0619 0.9748 0.0618 999 0.9458 0.9494
1001 0.0618 1.0000 0.0618 1000 0.9510 0.9466
2000 0.0438 0.9822 0.0438 1999 0.9478 0.9514
2001 0.0438 1.0000 0.0438 2000 0.9510 0.9488

Notes: n*, suggested sample size based on $E(W_{AC}\mid\pi_0=1/2) > w_0$; $CP_n$, coverage probability for sample size n; $CP_{n^*}$, coverage probability for sample size n*.

Table 9.

Behavior of the Wilson (Score) confidence interval width WS as compared to the fixed value w0 at π0 = 0.975 and α = 0.05.

Sample size    $w_0$    $P(W_S<w_0\mid\pi_0=0.975)$    $E(W_S\mid\pi_0=0.975)$    n*    $CP_n$    $CP_{n^*}$
10 0.3108 0.7763 0.3036 9 0.9750 0.9804
11 0.2927 0.7569 0.2857 10 0.9693 0.975
100 0.0696 0.5422 0.0676 95 0.9608 0.9682
101 0.0692 0.5357 0.0672 96 0.9581 0.9664
200 0.0465 0.4383 0.0456 193 0.9626 0.9682
201 0.0463 0.6115 0.0454 194 0.9626 0.9677
1000 0.0197 0.4724 0.0196 990 0.9466 0.9485
1001 0.0196 0.5509 0.0196 991 0.9474 0.9477
2000 0.0138 0.4805 0.0138 1990 0.9480 0.9488
2001 0.0138 0.5361 0.0138 1991 0.9469 0.9488

Notes: n*, suggested sample size based on $E(W_S\mid\pi_0=0.975) > w_0$; $CP_n$, coverage probability for sample size n; $CP_{n^*}$, coverage probability for sample size n*.

In Table 1, the values of $W_{AC}$ are less than $w_0$ for all odd sample sizes, thus $w_0$ is not attainable. For even sample sizes, $P(W_{AC} < w_0 \mid \pi_0 = 1/2)$ increases as a function of n as per Corollary 1b. In terms of the practical implications, we would likely require a sample size of n* = n − 1 to carry out the study based on the metric $E(W_{AC} \mid \pi_0 = 1/2)$. Although the preferred sample sizes are smaller than the original sample sizes, we find that the coverage probabilities are closer to nominal level from an examination of the tables. The quantities of interest for the π0 = 0.95 and π0 = 0.975 cases are given in Tables 2 and 3. In general, $P(W_{AC} < w_0 \mid \pi_0)$ becomes closer to 0.5 as the sample size increases. We notice that n* would be n − k, where k is a positive number that increases as n gets larger and π0 gets further from 1/2. For example, for Table 2 we find empirically n* = ⌊n − log(n) + 1⌋ + 1. It is interesting to point out that the coverage probabilities for n and n* are very competitive across all sample sizes and parameter settings.
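The empirical rule quoted for Table 2 can be checked against the tabulated n* column in a few lines. In this sketch the name `empirical_n_star` is hypothetical, and log is assumed to be the natural logarithm, which the article does not state explicitly.

```python
from math import floor, log


def empirical_n_star(n):
    """Empirical rule n* = floor(n - log(n) + 1) + 1, with log taken as natural log (assumed)."""
    return floor(n - log(n) + 1) + 1


# (n, n*) pairs copied from Table 2 (Agresti-Coull, pi0 = 0.95)
table2_n_star = {10: 9, 11: 10, 100: 97, 101: 98, 200: 196,
                 201: 197, 1000: 995, 1001: 996, 2000: 1994, 2001: 1995}

matches = {n: empirical_n_star(n) == n_star for n, n_star in table2_n_star.items()}
```

Under the natural-log reading, the rule agrees with every n* entry listed in Table 2.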

Table 2.

Behavior of the Agresti–Coull confidence interval width WAC as compared to the fixed value w0 at π0 = 0.95 and α = 0.05.

Sample size    $w_0$    $P(W_{AC}<w_0\mid\pi_0=0.95)$    $E(W_{AC}\mid\pi_0=0.95)$    n*    $CP_n$    $CP_{n^*}$
10 0.4002 0.5987 0.3948 9 0.9882 0.9289
11 0.3790 0.5688 0.3737 10 0.9842 0.9886
100 0.0959 0.4360 0.0945 97 0.9663 0.9699
101 0.0954 0.6070 0.0940 98 0.9641 0.9691
200 0.0644 0.4547 0.0638 196 0.9659 0.9516
201 0.0642 0.5766 0.0636 197 0.9662 0.9703
1000 0.0274 0.4797 0.0273 995 0.9522 0.9519
1001 0.0274 0.5346 0.0273 996 0.9510 0.9518
2000 0.0192 0.4857 0.0192 1994 0.9537 0.9490
2001 0.0192 0.5245 0.0192 1995 0.9537 0.9488

Notes: n*, suggested sample size based on $E(W_{AC}\mid\pi_0=0.95) > w_0$; $CP_n$, coverage probability for sample size n; $CP_{n^*}$, coverage probability for sample size n*.

Table 3.

Behavior of the Agresti–Coull confidence interval width WAC as compared to the fixed value w0 at π0 = 0.975 and α = 0.05.

Sample size    $w_0$    $P(W_{AC}<w_0\mid\pi_0=0.975)$    $E(W_{AC}\mid\pi_0=0.975)$    n*    $CP_n$    $CP_{n^*}$
10 0.3831 0.7763 0.3801 9 0.9750 0.9804
11 0.3613 0.7569 0.3582 10 0.9693 0.9750
100 0.0777 0.5422 0.0763 97 0.9608 0.9650
101 0.0771 0.5357 0.0758 98 0.9864 0.9634
200 0.0497 0.4383 0.0490 195 0.9626 0.9668
201 0.0496 0.6115 0.0489 196 0.9626 0.9664
1000 0.0200 0.4724 0.0199 991 0.9466 0.9588
1001 0.0200 0.5509 0.0199 992 0.9474 0.9581
2000 0.0139 0.4805 0.0139 1990 0.9480 0.9564
2001 0.0139 0.5361 0.0139 1991 0.9469 0.9563

Notes: n*, suggested sample size based on $E(W_{AC}\mid\pi_0=0.975) > w_0$; $CP_n$, coverage probability for sample size n; $CP_{n^*}$, coverage probability for sample size n*.

As we can see from the results in Tables 4–9, the majority of the preferred sample sizes for the Wald and Wilson Score confidence intervals are less than the ones we have from the Agresti–Coull interval in the corresponding parameter settings. The n*’s are not attainable for sample sizes 10 and 11 in Tables 5 and 6, which means we cannot find sample sizes such that the expected value of the confidence interval width is greater than the desired confidence interval width. For $\pi_0 = 1/2$, we find empirically the suggested sample size to carry out the study

$$n^*_W = n^*_S = n^*_{AC} - 1 = n - 2. \qquad (16)$$

For $\pi_0 \ne 1/2$, the differences between n and n* are larger than the ones we have from the Agresti–Coull interval. Therefore, if researchers are inclined to choose a sample size calculation method from the Agresti–Coull, Wald or Wilson Score interval without adjustment, we suggest the Agresti–Coull interval since the obtained sample size would be closer to the sample size based on the conditional expectation of the random variable W given $\pi_0$.

Table 4.

Behavior of the Wald confidence interval width $W_W$ as compared to the fixed value $w_0$ at $\pi_0 = 1/2$ and α = 0.05.

Sample size    $w_0$    $P(W_W<w_0\mid\pi_0=1/2)$    $E(W_W\mid\pi_0=1/2)$    n*    $CP_n$    $CP_{n^*}$
10 0.6198 0.7539 0.5856 8 0.8907 0.9307
11 0.5910 1.0000 0.5617 9 0.9342 0.8198
100 0.1960 0.9204 0.1950 98 0.9439 0.9466
101 0.1950 1.0000 0.1941 99 0.9545 0.9567
200 0.1386 0.9437 0.1382 198 0.9454 0.9456
201 0.1382 1.0000 0.1379 199 0.9518 0.9538
1000 0.0620 0.9748 0.0619 998 0.9458 0.9457
1001 0.0619 1.0000 0.0619 999 0.9510 0.9502
2000 0.0438 0.9822 0.0438 1998 0.9478 0.9487
2001 0.0438 1.0000 0.0438 1999 0.9510 0.9519

Notes: n*, suggested sample size based on $E(W_W\mid\pi_0=1/2) > w_0$; $CP_n$, coverage probability for sample size n; $CP_{n^*}$, coverage probability for sample size n*.

Table 5.

Behavior of the Wald confidence interval width WW as compared to the fixed value w0 at π0 = 0.95 and α = 0.05.

Sample size    $w_0$    $P(W_W<w_0\mid\pi_0=0.95)$    $E(W_W\mid\pi_0=0.95)$    n*    $CP_n$    $CP_{n^*}$
10 0.2702 0.5987 0.1608 NAb 0.4022 NAb
11 0.2576 0.5688 0.1595 NAb 0.4311 NAb
100 0.0854 0.4360 0.0829 93 0.8770 0.9439
101 0.0850 0.6070 0.0825 94 0.8819 0.9447
200 0.0604 0.4547 0.0596 194 0.9241 0.9175
201 0.0603 0.5766 0.0594 195 0.9256 0.9188
1000 0.0270 0.4797 0.0269 994 0.9436 0.9474
1001 0.0270 0.5346 0.0269 995 0.9431 0.9542
2000 0.0191 0.4857 0.0191 1994 0.9458 0.9447
2001 0.0191 0.5245 0.0191 1995 0.9458 0.9463

Notes: n*, suggested sample size based on $E(W_W\mid\pi_0=0.95) > w_0$; $CP_n$, coverage probability for sample size n; $CP_{n^*}$, coverage probability for sample size n*; NAb, n* and $CP_{n^*}$ are not attainable for the corresponding sample size.

Table 6.

Behavior of the Wald confidence interval width WW as compared to the fixed value w0 at π0 = 0.975 and α = 0.05.

Sample size    $w_0$    $P(W_W<w_0\mid\pi_0=0.975)$    $E(W_W\mid\pi_0=0.975)$    n*    $CP_n$    $CP_{n^*}$
10 0.1935 0.7763 0.0864 NAb 0.2256 NAb
11 0.1845 0.7569 0.0862 NAb 0.2428 NAb
100 0.0612 0.5422 0.0565 80 0.9163 0.8635
101 0.0609 0.5357 0.0563 81 0.9180 0.8664
200 0.0433 0.4383 0.0420 187 0.8724 0.9410
201 0.0432 0.6115 0.0419 188 0.8740 0.9477
1000 0.0194 0.4724 0.0193 989 0.9497 0.9481
1001 0.0193 0.5509 0.0192 990 0.9287 0.9488
2000 0.0137 0.4805 0.0136 1989 0.9397 0.9500
2001 0.0137 0.5361 0.0136 1990 0.9393 0.9508

Notes: n*, suggested sample size based on $E(W_W\mid\pi_0=0.975) > w_0$; $CP_n$, coverage probability for sample size n; $CP_{n^*}$, coverage probability for sample size n*; NAb, n* and $CP_{n^*}$ are not attainable for the corresponding sample size.

The adjusted sample size calculation method can guarantee that the expected confidence interval width given a hypothesized proportion will not be less than the desired confidence interval width. With the reduced sample size, the coverage probability is still maintained at the nominal level. It would be appealing to extend this idea to other discrete or continuous distributions.

Table 7.

Behavior of the Wilson (Score) confidence interval width $W_S$ as compared to the fixed value $w_0$ at $\pi_0 = 1/2$ and α = 0.05.

Sample size    $w_0$    $P(W_S<w_0\mid\pi_0=1/2)$    $E(W_S\mid\pi_0=1/2)$    n*    $CP_n$    $CP_{n^*}$
10 0.5268 0.7539 0.5066 8 0.9783 0.9307
11 0.5088 1.0000 0.4906 9 0.9342 0.9612
100 0.1923 0.9204 0.1914 98 0.9439 0.9466
101 0.1914 1.0000 0.1905 99 0.9545 0.9567
200 0.1373 0.9437 0.1369 198 0.9454 0.9456
201 0.1369 1.0000 0.1366 199 0.9518 0.9538
1000 0.0619 0.9748 0.0618 998 0.9458 0.9457
1001 0.0618 1.0000 0.0618 999 0.9510 0.9502
2000 0.0438 0.9822 0.0438 1998 0.9478 0.9487
2001 0.0438 1.0000 0.0438 1999 0.9510 0.9519

Notes: n*, suggested sample size based on $E(W_S\mid\pi_0=1/2) > w_0$; $CP_n$, coverage probability for sample size n; $CP_{n^*}$, coverage probability for sample size n*.

Table 8.

Behavior of the Wilson (Score) confidence interval width WS as compared to the fixed value w0 at π0 = 0.95 and α = 0.05.

Sample size    $w_0$    $P(W_S<w_0\mid\pi_0=0.95)$    $E(W_S\mid\pi_0=0.95)$    n*    $CP_n$    $CP_{n^*}$
10 0.3393 0.5987 0.3274 9 0.9136 0.9289
11 0.3216 0.5688 0.3102 10 0.8968 0.9138
100 0.0902 0.4360 0.0884 96 0.9663 0.9422
101 0.0897 0.6070 0.0879 97 0.9641 0.9706
200 0.0622 0.4547 0.0615 195 0.9659 0.9524
201 0.0620 0.5766 0.0613 196 0.9662 0.9529
1000 0.0272 0.4797 0.0271 994 0.9522 0.9521
1001 0.0272 0.5346 0.0271 995 0.9510 0.9518
2000 0.0192 0.4857 0.0191 1994 0.9537 0.9490
2001 0.0192 0.5245 0.0191 1995 0.9537 0.9488

Notes: n*, suggested sample size based on $E(W_S\mid\pi_0=0.95) > w_0$; $CP_n$, coverage probability for sample size n; $CP_{n^*}$, coverage probability for sample size n*.

References

  • 1. Agresti A, Coull B. Approximate is better than ‘exact’ for interval estimation of binomial proportions. Amer Statist. 1998;52:119–126.
  • 2. Arkin C, Wachtel M. How many patients are necessary to assess test performance. JAMA. 1990;263:275–278.
  • 3. Brown L, Cai T, DasGupta A. Interval estimation for a binomial proportion. Statist Sci. 2001;16:101–117.
  • 4. Brown L, Cai T, DasGupta A. Confidence intervals for a binomial proportion and asymptotic expansions. Ann Statist. 2002;30:160–201.
  • 5. Piegorsch W. Sample sizes for improved binomial confidence intervals. Comput Statist Data Anal. 2004;46:309–316.
  • 6. Wilson E. Probable inference, the law of succession, and statistical inference. J Amer Statist Assoc. 1927;22:209–212.
