Abstract
In this article we examine sample size calculations for a binomial proportion based on the confidence interval width of the Agresti–Coull, Wald and Wilson Score intervals. We point out that the commonly used methods, which treat the standard error as known and fixed, cannot guarantee the desired confidence interval width given a hypothesized proportion. Therefore, a new adjusted sample size calculation method is introduced, based on the conditional expectation of the confidence interval width given the hypothesized proportion. With the reduced sample size, the coverage probability is still maintained at the nominal level and is very competitive with the coverage probability for the original sample size.
Keywords: binomial proportion, sample size calculation, expected width, Agresti–Coull interval, Wald interval, Wilson Score interval
1. Introduction
A common problem in biostatistics is the estimation of a binomial parameter, π, based on a binomial observation, Y ~ B(n, π); e.g. π may represent the fraction surviving beyond one year, the sensitivity or specificity of a diagnostic test, or the proportion of values outside the normal range, to name just a few common applications. In early stage biomedical investigations or pilot studies, researchers are oftentimes more inclined to base their inferences about π on the (1 − α) × 100% confidence interval for π rather than on a formal hypothesis test of H0 : π = π0. The reasons for this preference are varied, but generally involve particulars of early phase investigations where very little is known about π, or settings where confidence intervals have historically been the preferred approach within a given subspeciality, e.g. diagnostic testing.
It is well known that the maximum-likelihood estimator for π is given by $\hat{\pi} = Y/n$ and that asymptotically $\hat{\pi} \mathrel{\dot\sim} N\!\left(\pi, \pi(1-\pi)/n\right)$. Historically, this result was the basis for the large sample (1 − α) × 100% approximate confidence interval for π given by
$\hat{\pi} \pm z_{\alpha/2}\sqrt{\dfrac{\hat{\pi}(1-\hat{\pi})}{n}} \qquad (1)$
where $z_{\alpha/2}$ is the $(1 - \alpha/2)$th quantile of the standard normal distribution, oftentimes given as $z_{\alpha/2} = \Phi^{-1}(1 - \alpha/2)$. The interval given by Equation (1) is referred to as a Wald interval for π [1].
In recent years, there have been several improvements to the approximate interval given by Equation (1); see Piegorsch [5] for an excellent survey of these results. The most notable improvement is a straightforward correction developed by Brown et al. [3,4], which extends the approach of Agresti and Coull [1] and is given by
$\tilde{\pi} \pm z_{\alpha/2}\sqrt{\dfrac{\tilde{\pi}(1-\tilde{\pi})}{\tilde{n}}} \qquad (2)$
where the modified estimator of π is given by $\tilde{\pi} = (Y + z_{\alpha/2}^{2}/2)/\tilde{n}$ and $\tilde{n} = n + z_{\alpha/2}^{2}$; for details see Brown et al. [3,4]. This approach substantially improves the coverage probability of the confidence interval for π as compared with the Wald-type interval for a general level α.
Wilson [6] gave an interval estimation approach which is the inversion of the score test for π, and is given by
$\dfrac{\hat{\pi} + z_{\alpha/2}^{2}/(2n)}{1 + z_{\alpha/2}^{2}/n} \pm \dfrac{z_{\alpha/2}}{1 + z_{\alpha/2}^{2}/n}\sqrt{\dfrac{\hat{\pi}(1-\hat{\pi})}{n} + \dfrac{z_{\alpha/2}^{2}}{4n^{2}}} \qquad (3)$
By simple derivation, it can be expressed as
$\tilde{\pi} \pm \dfrac{z_{\alpha/2}\sqrt{n}}{\tilde{n}}\sqrt{\hat{\pi}(1-\hat{\pi}) + \dfrac{z_{\alpha/2}^{2}}{4n}} \qquad (4)$
where $\hat{\pi} = Y/n$, and $\tilde{n}$ and $\tilde{\pi}$ are as above.
This interval is also called the Wilson Score interval. Its coverage probability is closer to the nominal level than that of the Wald interval, and it has good properties even for a small number of trials and/or an extreme probability.
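To make the three estimators concrete, the following Python sketch computes the intervals of Equations (1), (2) and (4) for observed Y successes out of n trials. This is an illustration rather than code from the paper; the function names and the default 95% level are our own choices.

```python
from scipy.stats import norm

def wald_interval(y, n, alpha=0.05):
    """Wald interval, Equation (1)."""
    z = norm.ppf(1 - alpha / 2)
    pi_hat = y / n
    half = z * (pi_hat * (1 - pi_hat) / n) ** 0.5
    return pi_hat - half, pi_hat + half

def agresti_coull_interval(y, n, alpha=0.05):
    """Agresti-Coull interval, Equation (2), with pi~ = (y + z^2/2)/n~ and n~ = n + z^2."""
    z = norm.ppf(1 - alpha / 2)
    n_tilde = n + z ** 2
    pi_tilde = (y + z ** 2 / 2) / n_tilde
    half = z * (pi_tilde * (1 - pi_tilde) / n_tilde) ** 0.5
    return pi_tilde - half, pi_tilde + half

def wilson_interval(y, n, alpha=0.05):
    """Wilson Score interval in the form of Equation (4)."""
    z = norm.ppf(1 - alpha / 2)
    n_tilde = n + z ** 2
    pi_hat = y / n
    pi_tilde = (y + z ** 2 / 2) / n_tilde
    half = z * n ** 0.5 / n_tilde * (pi_hat * (1 - pi_hat) + z ** 2 / (4 * n)) ** 0.5
    return pi_tilde - half, pi_tilde + half

# Example: Y = 8 successes out of n = 10 trials, 95% confidence.
for ci in (wald_interval, agresti_coull_interval, wilson_interval):
    print(ci.__name__, ci(8, 10))
```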
In terms of deriving an estimate of the sample size n required to carry out a given study, the basic recommendation based on Equation (1) is to solve the equation
$z_{\alpha/2}\sqrt{\dfrac{\pi_0(1-\pi_0)}{n}} = \delta_0 \qquad (5)$
for n at some desired interval width w0 = 2δ0, given some hypothesized and ‘known’ π0; e.g. see the oft-cited article by Arkin and Wachtel [2]. This method is widely used even though it is not particularly accurate, justifying Piegorsch’s [5] comment that it is ‘methodus non-gratus’. Improvements in sample size estimation based on more accurate intervals, such as the one developed by Brown et al. [3,4] at Equation (2), are provided in Piegorsch [5]. We argue, however, that this is the incorrect parameterization of the problem at hand even if the more accurate interval estimators are utilized, i.e. is this the correct approach to ‘guarantee’ the appropriate confidence interval width given that π0 is known? Because the calculated interval width is a random variable, it is impossible to ensure a priori the fixed width w0. We can, however, calculate the distribution of the random variable corresponding to the interval width conditional on π0 and its associated quantities. Furthermore, we illustrate in later sections that the desired width w0 may not even be achievable, i.e. the range of support for the random width falls below w0. Here $\hat{W}_{W}$, $\hat{W}_{AC}$ and $\hat{W}_{WS}$ denote the estimated widths for the Wald, Agresti–Coull and Wilson Score methods, respectively, and the corresponding half interval lengths $\hat{\delta}_{W}$, $\hat{\delta}_{AC}$ and $\hat{\delta}_{WS}$ (so that $\hat{W} = 2\hat{\delta}$ for each method) are given by
$\hat{\delta}_{W} = z_{\alpha/2}\sqrt{\dfrac{\hat{\pi}(1-\hat{\pi})}{n}} \qquad (6)$
$\hat{\delta}_{AC} = z_{\alpha/2}\sqrt{\dfrac{\tilde{\pi}(1-\tilde{\pi})}{\tilde{n}}} \qquad (7)$
and
$\hat{\delta}_{WS} = \dfrac{z_{\alpha/2}\sqrt{n}}{\tilde{n}}\sqrt{\hat{\pi}(1-\hat{\pi}) + \dfrac{z_{\alpha/2}^{2}}{4n}} \qquad (8)$
Note also that even when the range of $\hat{W}$ does encompass w0, the probability $P(\hat{W} < w_0 \mid \pi_0)$ is quite high or even equal to 1. To summarize, the expected width $E(\hat{W} \mid \pi_0)$ is less than w0. This will be illustrated in more detail in the next few sections.
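For reference, the conventional calculation of Equation (5) amounts to $n = z_{\alpha/2}^{2}\pi_0(1-\pi_0)/\delta_0^{2}$, rounded up. A minimal sketch, assuming a 95% interval; the function name is ours:

```python
import math
from scipy.stats import norm

def classical_sample_size(pi0, w0, alpha=0.05):
    """Solve Equation (5) for n: z * sqrt(pi0 * (1 - pi0) / n) = w0 / 2."""
    z = norm.ppf(1 - alpha / 2)
    delta0 = w0 / 2
    return math.ceil(z ** 2 * pi0 * (1 - pi0) / delta0 ** 2)

# e.g. pi0 = 0.5 and a desired width of 0.2 gives ceil(1.96^2 * 0.25 / 0.01) = 97.
print(classical_sample_size(0.5, 0.2))
```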
In the following sections, we introduce a new sample size calculation method based on this conditional expectation, i.e. solve
$E(\hat{W} \mid \pi_0) = w_E \qquad (9)$
for n, corresponding to a desired expected width wE given π0, versus a cutpoint w0 that may not even fall within the support of $\hat{W}$.
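A minimal sketch of one reading of Equation (9), using the Agresti–Coull width as the example: average the width over Y ~ B(n, π0) and take the largest n whose expected width still meets the target. The function names, the early-stopping search and the monotonicity assumption are ours, not from the paper.

```python
import numpy as np
from scipy.stats import norm, binom

def expected_ac_width(n, pi0, alpha=0.05):
    """E(W_AC | pi0): average the Agresti-Coull width over Y ~ B(n, pi0)."""
    z = norm.ppf(1 - alpha / 2)
    n_tilde = n + z ** 2
    y = np.arange(n + 1)
    pi_tilde = (y + z ** 2 / 2) / n_tilde
    widths = 2 * z * np.sqrt(pi_tilde * (1 - pi_tilde) / n_tilde)
    return float(np.sum(binom.pmf(y, n, pi0) * widths))

def adjusted_sample_size(pi0, w_e, alpha=0.05):
    """Largest n with E(W_AC | pi0) >= w_e, one reading of Equation (9).
    Assumes the expected width decreases in n, so the search stops at the
    first n whose expected width drops below the target."""
    n = 1
    while expected_ac_width(n, pi0, alpha) >= w_e:
        n += 1
    return n - 1

# Cross-check against Table 1 (pi0 = 0.5): E(W_AC | pi0) at n = 10 is about
# 0.5125, and the largest n whose expected width still reaches the n = 10
# target w0 = 0.5268 is n* = 9.
print(round(expected_ac_width(10, 0.5), 4), adjusted_sample_size(0.5, 0.5268))
```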
2. Specific case
The specific case $\pi_0 = \frac{1}{2}$ is very important theoretically, given that it is oftentimes recommended as the default conservative choice when nothing is known about π, since it provides the maximum w0 for a given n. It has been noted by Brown et al. [3,4] that the Agresti–Coull interval outperforms the Wald interval. Hence, we use the Agresti–Coull interval to derive the theorems in this section and then generalize the properties to the other two intervals in Section 3. For the Agresti–Coull interval, the probability distribution of $\hat{W}_{AC}$ is given by
$P\!\left(\hat{W}_{AC} = \omega_y \mid \pi_0\right) = \binom{n}{y}\pi_0^{\,y}(1-\pi_0)^{\,n-y}, \qquad y = 0, 1, \ldots, n, \qquad (10)$
where
$\omega_y = 2 z_{\alpha/2}\sqrt{\dfrac{\tilde{\pi}_y(1-\tilde{\pi}_y)}{\tilde{n}}} \qquad (11)$
and
$\tilde{\pi}_y = \dfrac{y + z_{\alpha/2}^{2}/2}{\tilde{n}}, \qquad \tilde{n} = n + z_{\alpha/2}^{2}. \qquad (12)$
The sample space for $\hat{W}_{AC}$ is given by the set $\{\omega_0, \omega_1, \ldots, \omega_n\}$, with associated binomial probabilities given in Equation (10).
In order to develop our argument for the appropriate sample size metric, we first present two very straightforward theorems.

THEOREM 1a (n odd) Given $\pi_0 = \frac{1}{2}$ and a sample size of n = 2m + 1, m = 0, 1, 2, …, the maximum of $\hat{W}_{AC}$ is strictly less than w0.
Proof The proof is straightforward in that the maximum of $\hat{W}_{AC}$ occurs at the value of $\tilde{\pi}$ for which $\tilde{\pi}(1-\tilde{\pi})$ is maximized, which occurs at Y = (n − 1)/2 and Y = (n + 1)/2, corresponding to values for $\tilde{\pi}$ of $(\tilde{n}-1)/(2\tilde{n})$ and $(\tilde{n}+1)/(2\tilde{n})$, respectively. The width w0 is maximized for the value of π0 for which $\tilde{\pi}_0(1-\tilde{\pi}_0)$ is maximized, which occurs at $\pi_0 = \frac{1}{2}$, where $\tilde{\pi}_0 = \frac{1}{2}$ and $\tilde{\pi}_0(1-\tilde{\pi}_0) = \frac{1}{4}$. Noting that $\tilde{\pi}(1-\tilde{\pi}) = (\tilde{n}^{2}-1)/(4\tilde{n}^{2}) < \frac{1}{4}$ at Y = (n − 1)/2 and Y = (n + 1)/2 completes the proof.
THEOREM 1b (n even) Given $\pi_0 = \frac{1}{2}$ and a sample size of n = 2m, m = 0, 1, 2, …, the maximum value that $\hat{W}_{AC}$ may attain is equal to w0.
Proof The proof follows a similar logic to that of Theorem 1a in that the maximum of $\hat{W}_{AC}$ occurs at Y = n/2, where $\tilde{\pi} = \frac{1}{2}$ and hence $\hat{W}_{AC} = w_0$.
COROLLARY 1b For sample sizes of n = 2m, m = 0, 1, 2, …, $P\!\left(\hat{W}_{AC} < w_0 \mid \pi_0 = \frac{1}{2}\right) \to 1$ as n → ∞.
Proof The proof follows from Theorem 1b in that $P\!\left(\hat{W}_{AC} = w_0 \mid \pi_0 = \frac{1}{2}\right) = P(Y = n/2) = \binom{n}{n/2}\left(\frac{1}{2}\right)^{n} \to 0$, since $\hat{W}_{AC} = w_0$ occurs only at Y = n/2.
THEOREM 2 $E\!\left(\hat{W}_{AC} \mid \pi_0 = \frac{1}{2}\right) < w_0$.
Proof Expand $\hat{W}_{AC} = g(\tilde{\pi})$, where $g(p) = 2z_{\alpha/2}\sqrt{p(1-p)/\tilde{n}}$, in a Taylor series about $\tilde{\pi} = \frac{1}{2}$, such that
$\hat{W}_{AC} = g\!\left(\tfrac{1}{2}\right) + \sum_{k=1}^{\infty}\frac{g^{(k)}\!\left(\tfrac{1}{2}\right)}{k!}\left(\tilde{\pi} - \tfrac{1}{2}\right)^{k} = w_0 + \sum_{k=1}^{\infty}\frac{g^{(k)}\!\left(\tfrac{1}{2}\right)}{k!}\left(\tilde{\pi} - \tfrac{1}{2}\right)^{k}. \qquad (13)$
Taking expectations on both sides of Equation (13) and noting that $g^{(k)}\!\left(\tfrac{1}{2}\right) E\left\{\left(\tilde{\pi} - \tfrac{1}{2}\right)^{k}\right\} \le 0$ for all k in the summand (the odd central moments of $\tilde{\pi}$ vanish when $\pi_0 = \frac{1}{2}$, the even-order derivatives of g at $\frac{1}{2}$ are negative, and the k = 2 term is strictly negative) completes the proof.
The implications of Theorems 1a, b and 2 may be summarized as follows:
From Theorems 1a and b: If we anticipate that $\pi_0 = \frac{1}{2}$, then solving for n from Equation (7) will guarantee a high probability that the calculated width will be less than the desired width w0. If the sample size turns out to be odd, then it is impossible for the calculated interval to be as wide as or wider than w0, i.e. $P\!\left(\hat{W}_{AC} \ge w_0 \mid \pi_0 = \frac{1}{2}\right) = 0$ for odd sample sizes.
From Theorem 2: If $\pi_0 = \frac{1}{2}$, then the value for the width of the confidence interval that we would expect from repeated experiments is less than the desired w0; a numerical illustration follows.
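The following Python sketch (our own illustration, not from the paper; a 95% interval is assumed) enumerates the support of $\hat{W}_{AC}$ at $\pi_0 = \frac{1}{2}$ for an odd and an even sample size and checks these properties numerically.

```python
import numpy as np
from scipy.stats import norm, binom

def ac_widths_and_target(n, pi0=0.5, alpha=0.05):
    """Support of W_AC (one width per Y = 0,...,n) and the planned width w0 at pi0."""
    z = norm.ppf(1 - alpha / 2)
    n_tilde = n + z ** 2
    y = np.arange(n + 1)
    pi_tilde = (y + z ** 2 / 2) / n_tilde
    widths = 2 * z * np.sqrt(pi_tilde * (1 - pi_tilde) / n_tilde)
    pi_tilde0 = (n * pi0 + z ** 2 / 2) / n_tilde
    w0 = 2 * z * np.sqrt(pi_tilde0 * (1 - pi_tilde0) / n_tilde)
    return widths, w0

for n in (11, 10):                                   # odd, then even sample size
    widths, w0 = ac_widths_and_target(n)
    expected = float(np.sum(binom.pmf(np.arange(n + 1), n, 0.5) * widths))
    print(f"n={n}: max width {widths.max():.4f}  w0 {w0:.4f}  "
          f"E(W_AC | pi0=1/2) {expected:.4f}")
# For n = 11 the maximum width sits strictly below w0 (Theorem 1a); for n = 10
# it equals w0 (Theorem 1b); in both cases the expected width is below w0
# (Theorem 2).
```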
3. General case and applicability to other intervals
In this section, we investigate the properties of $\hat{W}_{AC}$ for the case $\pi_0 \neq \frac{1}{2}$.
THEOREM 3 $P\!\left(\hat{W}_{AC} < w_0 \mid \pi_0\right) \to 0.5$ as n → ∞ for $\pi_0 \neq \frac{1}{2}$.
Proof Given $\pi_0 \neq \frac{1}{2}$, the probability distribution of $\hat{W}_{AC}$ is given in Equation (10). By Equation (11), we can simplify $P(\hat{W}_{AC} < w_0 \mid \pi_0)$ as
$P\!\left(\hat{W}_{AC} < w_0 \mid \pi_0\right) = P\!\left\{\tilde{\pi}(1-\tilde{\pi}) < \tilde{\pi}_0(1-\tilde{\pi}_0)\right\} \qquad (14)$
Based on the properties of the function f(x) = x(1 − x), which is symmetric about $\frac{1}{2}$, increasing on $[0, \frac{1}{2}]$ and decreasing on $[\frac{1}{2}, 1]$, Equation (14) can be further simplified for $\pi_0 < \frac{1}{2}$ and $\pi_0 > \frac{1}{2}$, respectively, as
$P\!\left(\hat{W}_{AC} < w_0 \mid \pi_0\right) = \begin{cases} P(\tilde{\pi} < \tilde{\pi}_0) + P(\tilde{\pi} > 1 - \tilde{\pi}_0), & \pi_0 < \frac{1}{2}, \\[4pt] P(\tilde{\pi} > \tilde{\pi}_0) + P(\tilde{\pi} < 1 - \tilde{\pi}_0), & \pi_0 > \frac{1}{2}, \end{cases} \qquad (15)$
where $\tilde{\pi}_0 = (n\pi_0 + z_{\alpha/2}^{2}/2)/\tilde{n}$. Through this derivation, the probability in Equation (15) can be calculated through the binomial distribution of Y; the first term converges to 0.5 and the second term converges to 0, so the probability approaches 0.5 as n → ∞.
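A numerical check of Theorem 3 (our own sketch, with π0 = 0.95 and a 95% interval assumed) computes $P(\hat{W}_{AC} < w_0 \mid \pi_0)$ exactly from the binomial distribution for increasing n:

```python
import numpy as np
from scipy.stats import norm, binom

def prob_width_below_target(n, pi0, alpha=0.05):
    """P(W_AC < w0 | pi0): total binomial probability of the outcomes Y whose
    Agresti-Coull width falls below the planned width w0 at pi0."""
    z = norm.ppf(1 - alpha / 2)
    n_tilde = n + z ** 2
    y = np.arange(n + 1)
    pi_tilde = (y + z ** 2 / 2) / n_tilde
    widths = 2 * z * np.sqrt(pi_tilde * (1 - pi_tilde) / n_tilde)
    pi_tilde0 = (n * pi0 + z ** 2 / 2) / n_tilde
    w0 = 2 * z * np.sqrt(pi_tilde0 * (1 - pi_tilde0) / n_tilde)
    return float(np.sum(binom.pmf(y, n, pi0)[widths < w0]))

# The probability oscillates with n but settles near 0.5 (Theorem 3); the value
# at n = 100 should be close to the corresponding entry of Table 2 (about 0.436).
for n in (100, 1000, 10000, 100000):
    print(n, round(prob_width_below_target(n, 0.95), 4))
```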
THEOREM 4 $E\!\left(\hat{W}_{AC} \mid \pi_0\right) < w_0$, where π0 ∈ (0, 1).
Proof Following the proof of Theorem 2, expand $\hat{W}_{AC} = g(\tilde{\pi})$ in a Taylor series about $\tilde{\pi}_0 = E(\tilde{\pi} \mid \pi_0)$ and take expectations on both sides. Since g is strictly concave on (0, 1), the expectation is strictly less than $g(\tilde{\pi}_0) = w_0$ for the $\pi_0 \neq \frac{1}{2}$ case, and Theorem 2 gives $E\!\left(\hat{W}_{AC} \mid \pi_0 = \frac{1}{2}\right) < w_0$, which completes the proof.
From Theorems 3 and 4, we learn that although $\hat{W}_{AC}$ is not strictly less than the desired width w0 with probability one for $\pi_0 \neq \frac{1}{2}$, the value for the width of the confidence interval that we would expect from repeated experiments is still less than the desired w0.
Using the same rationale as for the Agresti–Coull interval, we arrive at the same conclusions for the Wald and Wilson Score intervals as in Theorems 1a and b: given $\pi_0 = \frac{1}{2}$, the maxima of $\hat{W}_{W}$ and $\hat{W}_{WS}$ are strictly less than w0 for odd n, while the maximum value that $\hat{W}_{W}$ and $\hat{W}_{WS}$ may attain is equal to w0 for even n. For the conditional expectations, $E(\hat{W}_{W} \mid \pi_0) < w_0$ and $E(\hat{W}_{WS} \mid \pi_0) < w_0$ for all π0 ∈ (0, 1), which can be developed from the Taylor expansion following the proof of Theorem 2.
4. Comparison of fixed versus random interval approaches
In this section, we examine the fixed confidence interval approaches, including the Agresti–Coull, Wald and Wilson Score confidence intervals, and compare them with our approach of calculating sample sizes based on the expectation of $\hat{W}$ given π0. The values of w0 and the confidence interval widths given π0 were calculated for sample sizes 10, 11, 100, 101, 200, 201, 1000, 1001, 2000 and 2001 for the cases π0 = 0.5, 0.95 and 0.975 in Tables 1–9. The probability that $\hat{W}$ is less than w0 given π0 was obtained by summing the probabilities of the outcomes for which $\hat{W}$ is less than w0. The conditional expectation of $\hat{W}$ given π0 was also calculated. The suggested sample sizes n*, based on $E(\hat{W} \mid \pi_0)$, are reported alongside the corresponding sample sizes. Additionally, we investigated the coverage probabilities for the given sample size n and the suggested sample size n* through 100,000 simulations.
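A sketch of such a coverage simulation for the Agresti–Coull interval is given below; the function name, seed and vectorized implementation are our own, and the same scheme applies to the Wald and Wilson Score intervals.

```python
import numpy as np
from scipy.stats import norm

def ac_coverage(n, pi0, alpha=0.05, reps=100_000, seed=2024):
    """Monte Carlo coverage probability of the Agresti-Coull interval at (n, pi0)."""
    z = norm.ppf(1 - alpha / 2)
    rng = np.random.default_rng(seed)
    y = rng.binomial(n, pi0, size=reps)
    n_tilde = n + z ** 2
    pi_tilde = (y + z ** 2 / 2) / n_tilde
    half = z * np.sqrt(pi_tilde * (1 - pi_tilde) / n_tilde)
    covered = (pi_tilde - half <= pi0) & (pi0 <= pi_tilde + half)
    return covered.mean()

# Coverage at a planned n and at the reduced n* (e.g. n = 100 and n* = 99 for
# pi0 = 0.5, as in Table 1) should both sit near the nominal 0.95 level.
print(ac_coverage(100, 0.5), ac_coverage(99, 0.5))
```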
Table 1. Agresti–Coull interval, π0 = 0.5.

| Sample size | $w_0$ | $P(\hat{W}_{AC} < w_0 \mid \pi_0)$ | $E(\hat{W}_{AC} \mid \pi_0)$ | $n^*$ | $CP_n$ | $CP_{n^*}$ |
|---|---|---|---|---|---|---|
| 10 | 0.5268 | 0.7539 | 0.5125 | 9 | 0.9783 | 0.9621 |
| 11 | 0.5088 | 1.0000 | 0.4955 | 10 | 0.9342 | 0.9794 |
| 100 | 0.1923 | 0.9204 | 0.1914 | 99 | 0.9439 | 0.9564 |
| 101 | 0.1914 | 1.0000 | 0.1905 | 100 | 0.9545 | 0.9438 |
| 200 | 0.1373 | 0.9437 | 0.1369 | 199 | 0.9454 | 0.9536 |
| 201 | 0.1369 | 1.0000 | 0.1366 | 200 | 0.9518 | 0.9449 |
| 1000 | 0.0619 | 0.9748 | 0.0618 | 999 | 0.9458 | 0.9494 |
| 1001 | 0.0618 | 1.0000 | 0.0618 | 1000 | 0.9510 | 0.9466 |
| 2000 | 0.0438 | 0.9822 | 0.0438 | 1999 | 0.9478 | 0.9514 |
| 2001 | 0.0438 | 1.0000 | 0.0438 | 2000 | 0.9510 | 0.9488 |

Notes: n*, suggested sample size based on $E(\hat{W}_{AC} \mid \pi_0)$; $CP_n$, coverage probability for sample size n; $CP_{n^*}$, coverage probability for sample size n*.
Table 9. Wilson Score interval, π0 = 0.975.

| Sample size | $w_0$ | $P(\hat{W}_{WS} < w_0 \mid \pi_0)$ | $E(\hat{W}_{WS} \mid \pi_0)$ | $n^*$ | $CP_n$ | $CP_{n^*}$ |
|---|---|---|---|---|---|---|
| 10 | 0.3108 | 0.7763 | 0.3036 | 9 | 0.9750 | 0.9804 |
| 11 | 0.2927 | 0.7569 | 0.2857 | 10 | 0.9693 | 0.975 |
| 100 | 0.0696 | 0.5422 | 0.0676 | 95 | 0.9608 | 0.9682 |
| 101 | 0.0692 | 0.5357 | 0.0672 | 96 | 0.9581 | 0.9664 |
| 200 | 0.0465 | 0.4383 | 0.0456 | 193 | 0.9626 | 0.9682 |
| 201 | 0.0463 | 0.6115 | 0.0454 | 194 | 0.9626 | 0.9677 |
| 1000 | 0.0197 | 0.4724 | 0.0196 | 990 | 0.9466 | 0.9485 |
| 1001 | 0.0196 | 0.5509 | 0.0196 | 991 | 0.9474 | 0.9477 |
| 2000 | 0.0138 | 0.4805 | 0.0138 | 1990 | 0.9480 | 0.9488 |
| 2001 | 0.0138 | 0.5361 | 0.0138 | 1991 | 0.9469 | 0.9488 |

Notes: n*, suggested sample size based on $E(\hat{W}_{WS} \mid \pi_0)$; $CP_n$, coverage probability for sample size n; $CP_{n^*}$, coverage probability for sample size n*.
In Table 1, the $E(\hat{W}_{AC} \mid \pi_0)$ values are less than w0 for all odd sample sizes, thus w0 is not attainable. For even sample sizes, $P(\hat{W}_{AC} < w_0 \mid \pi_0)$ increases as a function of n, as per Corollary 1b. In terms of the practical implications, we would likely require a sample size of only n* = n − 1 to carry out the study based on the metric $E(\hat{W}_{AC} \mid \pi_0)$. Although the preferred sample sizes are smaller than the original sample sizes, we find from an examination of the tables that the coverage probabilities remain close to the nominal level. The quantities of interest for the π0 = 0.95 and π0 = 0.975 cases are given in Tables 2 and 3. In general, $P(\hat{W}_{AC} < w_0 \mid \pi_0)$ becomes closer to 0.5 as the sample size increases. We notice that n* = n − k, where k is a positive integer that increases as n gets larger and π0 gets further from $\frac{1}{2}$. For example, for Table 2 we find empirically that n* = ⌊n − log(n) + 1⌋ + 1, as checked in the sketch below. It is interesting to point out that the coverage probabilities for n and n* are very competitive across all sample sizes and parameter settings.
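The empirical relationship for Table 2 can be checked directly; this is a small sketch of ours, and the natural logarithm is assumed in the formula.

```python
import math

# n* values for the Agresti-Coull interval with pi0 = 0.95, as reported in
# Table 2, keyed by the planned sample size n.
table2_n_star = {10: 9, 11: 10, 100: 97, 101: 98, 200: 196, 201: 197,
                 1000: 995, 1001: 996, 2000: 1994, 2001: 1995}

for n, n_star in table2_n_star.items():
    formula = math.floor(n - math.log(n) + 1) + 1   # natural log assumed
    print(n, n_star, formula, n_star == formula)
```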
Table 2. Agresti–Coull interval, π0 = 0.95.

| Sample size | $w_0$ | $P(\hat{W}_{AC} < w_0 \mid \pi_0)$ | $E(\hat{W}_{AC} \mid \pi_0)$ | $n^*$ | $CP_n$ | $CP_{n^*}$ |
|---|---|---|---|---|---|---|
| 10 | 0.4002 | 0.5987 | 0.3948 | 9 | 0.9882 | 0.9289 |
| 11 | 0.3790 | 0.5688 | 0.3737 | 10 | 0.9842 | 0.9886 |
| 100 | 0.0959 | 0.4360 | 0.0945 | 97 | 0.9663 | 0.9699 |
| 101 | 0.0954 | 0.6070 | 0.0940 | 98 | 0.9641 | 0.9691 |
| 200 | 0.0644 | 0.4547 | 0.0638 | 196 | 0.9659 | 0.9516 |
| 201 | 0.0642 | 0.5766 | 0.0636 | 197 | 0.9662 | 0.9703 |
| 1000 | 0.0274 | 0.4797 | 0.0273 | 995 | 0.9522 | 0.9519 |
| 1001 | 0.0274 | 0.5346 | 0.0273 | 996 | 0.9510 | 0.9518 |
| 2000 | 0.0192 | 0.4857 | 0.0192 | 1994 | 0.9537 | 0.9490 |
| 2001 | 0.0192 | 0.5245 | 0.0192 | 1995 | 0.9537 | 0.9488 |

Notes: n*, suggested sample size based on $E(\hat{W}_{AC} \mid \pi_0)$; $CP_n$, coverage probability for sample size n; $CP_{n^*}$, coverage probability for sample size n*.
Table 3. Agresti–Coull interval, π0 = 0.975.

| Sample size | $w_0$ | $P(\hat{W}_{AC} < w_0 \mid \pi_0)$ | $E(\hat{W}_{AC} \mid \pi_0)$ | $n^*$ | $CP_n$ | $CP_{n^*}$ |
|---|---|---|---|---|---|---|
| 10 | 0.3831 | 0.7763 | 0.3801 | 9 | 0.9750 | 0.9804 |
| 11 | 0.3613 | 0.7569 | 0.3582 | 10 | 0.9693 | 0.9750 |
| 100 | 0.0777 | 0.5422 | 0.0763 | 97 | 0.9608 | 0.9650 |
| 101 | 0.0771 | 0.5357 | 0.0758 | 98 | 0.9864 | 0.9634 |
| 200 | 0.0497 | 0.4383 | 0.0490 | 195 | 0.9626 | 0.9668 |
| 201 | 0.0496 | 0.6115 | 0.0489 | 196 | 0.9626 | 0.9664 |
| 1000 | 0.0200 | 0.4724 | 0.0199 | 991 | 0.9466 | 0.9588 |
| 1001 | 0.0200 | 0.5509 | 0.0199 | 992 | 0.9474 | 0.9581 |
| 2000 | 0.0139 | 0.4805 | 0.0139 | 1990 | 0.9480 | 0.9564 |
| 2001 | 0.0139 | 0.5361 | 0.0139 | 1991 | 0.9469 | 0.9563 |

Notes: n*, suggested sample size based on $E(\hat{W}_{AC} \mid \pi_0)$; $CP_n$, coverage probability for sample size n; $CP_{n^*}$, coverage probability for sample size n*.
As we can see from the results in Tables 4–9, the majority of the preferred sample sizes for the Wald and Wilson Score confidence intervals are smaller than the ones obtained from the Agresti–Coull interval under the corresponding parameter settings. The n* values are not attainable for the smallest sample sizes in Tables 5 and 6, which means that we cannot find a sample size such that the expected confidence interval width is at least the desired confidence interval width. In this case, we find empirically that the suggested sample size to carry out the study is
(16)
For the Wald and Wilson Score intervals, the differences between n and n* are larger than those obtained from the Agresti–Coull interval. Therefore, if researchers are inclined to choose a sample size calculation method based on the Agresti–Coull, Wald or Wilson Score interval without adjustment, we suggest the Agresti–Coull interval, since the resulting sample size is closest to the sample size based on the conditional expectation of the random width given π0.
Table 4. Wald interval, π0 = 0.5.

| Sample size | $w_0$ | $P(\hat{W}_{W} < w_0 \mid \pi_0)$ | $E(\hat{W}_{W} \mid \pi_0)$ | $n^*$ | $CP_n$ | $CP_{n^*}$ |
|---|---|---|---|---|---|---|
| 10 | 0.6198 | 0.7539 | 0.5856 | 8 | 0.8907 | 0.9307 |
| 11 | 0.5910 | 1.0000 | 0.5617 | 9 | 0.9342 | 0.8198 |
| 100 | 0.1960 | 0.9204 | 0.1950 | 98 | 0.9439 | 0.9466 |
| 101 | 0.1950 | 1.0000 | 0.1941 | 99 | 0.9545 | 0.9567 |
| 200 | 0.1386 | 0.9437 | 0.1382 | 198 | 0.9454 | 0.9456 |
| 201 | 0.1382 | 1.0000 | 0.1379 | 199 | 0.9518 | 0.9538 |
| 1000 | 0.0620 | 0.9748 | 0.0619 | 998 | 0.9458 | 0.9457 |
| 1001 | 0.0619 | 1.0000 | 0.0619 | 999 | 0.9510 | 0.9502 |
| 2000 | 0.0438 | 0.9822 | 0.0438 | 1998 | 0.9478 | 0.9487 |
| 2001 | 0.0438 | 1.0000 | 0.0438 | 1999 | 0.9510 | 0.9519 |

Notes: n*, suggested sample size based on $E(\hat{W}_{W} \mid \pi_0)$; $CP_n$, coverage probability for sample size n; $CP_{n^*}$, coverage probability for sample size n*.
Table 5. Wald interval, π0 = 0.95.

| Sample size | $w_0$ | $P(\hat{W}_{W} < w_0 \mid \pi_0)$ | $E(\hat{W}_{W} \mid \pi_0)$ | $n^*$ | $CP_n$ | $CP_{n^*}$ |
|---|---|---|---|---|---|---|
| 10 | 0.2702 | 0.5987 | 0.1608 | NA^b | 0.4022 | NA^b |
| 11 | 0.2576 | 0.5688 | 0.1595 | NA^b | 0.4311 | NA^b |
| 100 | 0.0854 | 0.4360 | 0.0829 | 93 | 0.8770 | 0.9439 |
| 101 | 0.0850 | 0.6070 | 0.0825 | 94 | 0.8819 | 0.9447 |
| 200 | 0.0604 | 0.4547 | 0.0596 | 194 | 0.9241 | 0.9175 |
| 201 | 0.0603 | 0.5766 | 0.0594 | 195 | 0.9256 | 0.9188 |
| 1000 | 0.0270 | 0.4797 | 0.0269 | 994 | 0.9436 | 0.9474 |
| 1001 | 0.0270 | 0.5346 | 0.0269 | 995 | 0.9431 | 0.9542 |
| 2000 | 0.0191 | 0.4857 | 0.0191 | 1994 | 0.9458 | 0.9447 |
| 2001 | 0.0191 | 0.5245 | 0.0191 | 1995 | 0.9458 | 0.9463 |

Notes: n*, suggested sample size based on $E(\hat{W}_{W} \mid \pi_0)$; $CP_n$, coverage probability for sample size n; $CP_{n^*}$, coverage probability for sample size n*; ^b, n* and $CP_{n^*}$ are not attainable for the corresponding sample size.
Table 6. Wald interval, π0 = 0.975.

| Sample size | $w_0$ | $P(\hat{W}_{W} < w_0 \mid \pi_0)$ | $E(\hat{W}_{W} \mid \pi_0)$ | $n^*$ | $CP_n$ | $CP_{n^*}$ |
|---|---|---|---|---|---|---|
| 10 | 0.1935 | 0.7763 | 0.0864 | NA^b | 0.2256 | NA^b |
| 11 | 0.1845 | 0.7569 | 0.0862 | NA^b | 0.2428 | NA^b |
| 100 | 0.0612 | 0.5422 | 0.0565 | 80 | 0.9163 | 0.8635 |
| 101 | 0.0609 | 0.5357 | 0.0563 | 81 | 0.9180 | 0.8664 |
| 200 | 0.0433 | 0.4383 | 0.0420 | 187 | 0.8724 | 0.9410 |
| 201 | 0.0432 | 0.6115 | 0.0419 | 188 | 0.8740 | 0.9477 |
| 1000 | 0.0194 | 0.4724 | 0.0193 | 989 | 0.9497 | 0.9481 |
| 1001 | 0.0193 | 0.5509 | 0.0192 | 990 | 0.9287 | 0.9488 |
| 2000 | 0.0137 | 0.4805 | 0.0136 | 1989 | 0.9397 | 0.9500 |
| 2001 | 0.0137 | 0.5361 | 0.0136 | 1990 | 0.9393 | 0.9508 |

Notes: n*, suggested sample size based on $E(\hat{W}_{W} \mid \pi_0)$; $CP_n$, coverage probability for sample size n; $CP_{n^*}$, coverage probability for sample size n*; ^b, n* and $CP_{n^*}$ are not attainable for the corresponding sample size.
The adjusted sample size calculation method guarantees that the expected confidence interval width given a hypothesized proportion will not be less than the desired confidence interval width. With the reduced sample size, the coverage probability is still maintained at the nominal level. It would be appealing to extend this idea to other discrete or continuous distributions.
Table 7. Wilson Score interval, π0 = 0.5.

| Sample size | $w_0$ | $P(\hat{W}_{WS} < w_0 \mid \pi_0)$ | $E(\hat{W}_{WS} \mid \pi_0)$ | $n^*$ | $CP_n$ | $CP_{n^*}$ |
|---|---|---|---|---|---|---|
| 10 | 0.5268 | 0.7539 | 0.5066 | 8 | 0.9783 | 0.9307 |
| 11 | 0.5088 | 1.0000 | 0.4906 | 9 | 0.9342 | 0.9612 |
| 100 | 0.1923 | 0.9204 | 0.1914 | 98 | 0.9439 | 0.9466 |
| 101 | 0.1914 | 1.0000 | 0.1905 | 99 | 0.9545 | 0.9567 |
| 200 | 0.1373 | 0.9437 | 0.1369 | 198 | 0.9454 | 0.9456 |
| 201 | 0.1369 | 1.0000 | 0.1366 | 199 | 0.9518 | 0.9538 |
| 1000 | 0.0619 | 0.9748 | 0.0618 | 998 | 0.9458 | 0.9457 |
| 1001 | 0.0618 | 1.0000 | 0.0618 | 999 | 0.9510 | 0.9502 |
| 2000 | 0.0438 | 0.9822 | 0.0438 | 1998 | 0.9478 | 0.9487 |
| 2001 | 0.0438 | 1.0000 | 0.0438 | 1999 | 0.9510 | 0.9519 |

Notes: n*, suggested sample size based on $E(\hat{W}_{WS} \mid \pi_0)$; $CP_n$, coverage probability for sample size n; $CP_{n^*}$, coverage probability for sample size n*.
Table 8. Wilson Score interval, π0 = 0.95.

| Sample size | $w_0$ | $P(\hat{W}_{WS} < w_0 \mid \pi_0)$ | $E(\hat{W}_{WS} \mid \pi_0)$ | $n^*$ | $CP_n$ | $CP_{n^*}$ |
|---|---|---|---|---|---|---|
| 10 | 0.3393 | 0.5987 | 0.3274 | 9 | 0.9136 | 0.9289 |
| 11 | 0.3216 | 0.5688 | 0.3102 | 10 | 0.8968 | 0.9138 |
| 100 | 0.0902 | 0.4360 | 0.0884 | 96 | 0.9663 | 0.9422 |
| 101 | 0.0897 | 0.6070 | 0.0879 | 97 | 0.9641 | 0.9706 |
| 200 | 0.0622 | 0.4547 | 0.0615 | 195 | 0.9659 | 0.9524 |
| 201 | 0.0620 | 0.5766 | 0.0613 | 196 | 0.9662 | 0.9529 |
| 1000 | 0.0272 | 0.4797 | 0.0271 | 994 | 0.9522 | 0.9521 |
| 1001 | 0.0272 | 0.5346 | 0.0271 | 995 | 0.9510 | 0.9518 |
| 2000 | 0.0192 | 0.4857 | 0.0191 | 1994 | 0.9537 | 0.9490 |
| 2001 | 0.0192 | 0.5245 | 0.0191 | 1995 | 0.9537 | 0.9488 |

Notes: n*, suggested sample size based on $E(\hat{W}_{WS} \mid \pi_0)$; $CP_n$, coverage probability for sample size n; $CP_{n^*}$, coverage probability for sample size n*.
References
1. Agresti A, Coull B. Approximate is better than ‘exact’ for interval estimation of binomial proportions. Amer Statist. 1998;52:119–126.
2. Arkin C, Wachtel M. How many patients are necessary to assess test performance? JAMA. 1990;263:275–278.
3. Brown L, Cai T, DasGupta A. Interval estimation for a binomial proportion. Statist Sci. 2001;16:101–117.
4. Brown L, Cai T, DasGupta A. Confidence intervals for a binomial proportion and asymptotic expansions. Ann Statist. 2002;30:160–201.
5. Piegorsch W. Sample sizes for improved binomial confidence intervals. Comput Statist Data Anal. 2004;46:309–316.
6. Wilson E. Probable inference, the law of succession, and statistical inference. J Amer Statist Assoc. 1927;22:209–212.