Abstract
In this article we examine sample size calculations for a binomial proportion based on the confidence interval width of the Agresti–Coull, Wald and Wilson Score intervals. We point out that the commonly used methods, which treat the standard error as known and fixed, cannot guarantee the desired confidence interval width given a hypothesized proportion. Therefore, a new adjusted sample size calculation method is introduced, based on the conditional expectation of the confidence interval width given the hypothesized proportion. With the reduced sample size, the coverage probability is still maintained at the nominal level and is very competitive with the coverage probability for the original sample size.
Keywords: binomial proportion, sample size calculation, expected width, Agresti–Coull interval, Wald interval, Wilson Score interval
1. Introduction
A common problem in biostatistics is the estimation of a binomial parameter, π, based on a binomial observation, Y ~ B(n, π); e.g. π may represent the fraction surviving beyond one year, the sensitivity or specificity of a diagnostic test, or the proportion of values outside the normal range, to name just a few common applications. In early stage biomedical investigations or pilot studies, researchers are oftentimes more inclined to base their inferences about π on the (1 − α) × 100% confidence interval for π rather than on a formal hypothesis test of H0 : π = π0. The reasons for this preference are varied, but generally involve particulars of early phase investigations where very little is known about π, or settings where confidence intervals have historically been the preferred approach within a given subspeciality, e.g. diagnostic testing.
It is well known that the maximum-likelihood estimator for π is given by $\hat{\pi} = Y/n$ and that asymptotically $\hat{\pi} \mathrel{\dot\sim} N\!\left(\pi, \pi(1-\pi)/n\right)$. Historically, this result was the basis for the large sample (1 − α) × 100% approximate confidence interval for π given by
$\hat{\pi} \pm z_{\alpha/2}\sqrt{\dfrac{\hat{\pi}(1-\hat{\pi})}{n}} \qquad (1)$
where $z_{\alpha/2}$ is the $(1 - \alpha/2)$th quantile of the standard normal distribution, oftentimes given as $z_{\alpha/2} = \Phi^{-1}(1 - \alpha/2)$. The interval given by Equation (1) is referred to as a Wald interval for π [1].
In recent years, there have been several improvements to the approximate interval given by Equation (1); see Piegorsch [5] for an excellent survey of these results. The most notable improvement is a straightforward correction developed by Brown et al. [3,4], which extends the approach of Agresti and Coull [1] and is given by
$\tilde{\pi} \pm z_{\alpha/2}\sqrt{\dfrac{\tilde{\pi}(1-\tilde{\pi})}{\tilde{n}}} \qquad (2)$
where the modified estimator of π is given by $\tilde{\pi} = (Y + z_{\alpha/2}^{2}/2)/\tilde{n}$ and $\tilde{n} = n + z_{\alpha/2}^{2}$; for details see Brown et al. [3,4]. This approach substantially improves the coverage probability of the confidence interval for π as compared with the Wald-type interval for a general level α.
Wilson [6] gave an interval estimation approach which is the inversion of the score test for π, and is given by
$\dfrac{\hat{\pi} + z_{\alpha/2}^{2}/(2n)}{1 + z_{\alpha/2}^{2}/n} \pm \dfrac{z_{\alpha/2}}{1 + z_{\alpha/2}^{2}/n}\sqrt{\dfrac{\hat{\pi}(1-\hat{\pi})}{n} + \dfrac{z_{\alpha/2}^{2}}{4n^{2}}} \qquad (3)$
By simple derivation, it can be expressed as
$\tilde{\pi} \pm \dfrac{z_{\alpha/2}\sqrt{n}}{\tilde{n}}\sqrt{\hat{\pi}(1-\hat{\pi}) + \dfrac{z_{\alpha/2}^{2}}{4n}} \qquad (4)$
where $\hat{\pi} = Y/n$, and $\tilde{n}$ and $\tilde{\pi}$ are as above.
This interval is also called the Wilson Score interval. Its coverage probability is closer to the nominal level than that of the Wald interval, and it has good properties even for a small number of trials and/or an extreme probability.
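To make the three estimators concrete, the following Python sketch computes the intervals of Equations (1), (2) and (4) for observed Y successes out of n trials. This is an illustration rather than code from the paper; the function names and the default 95% level are our own choices.

```python
from scipy.stats import norm

def wald_interval(y, n, alpha=0.05):
    """Wald interval, Equation (1)."""
    z = norm.ppf(1 - alpha / 2)
    pi_hat = y / n
    half = z * (pi_hat * (1 - pi_hat) / n) ** 0.5
    return pi_hat - half, pi_hat + half

def agresti_coull_interval(y, n, alpha=0.05):
    """Agresti-Coull interval, Equation (2), with pi~ = (y + z^2/2)/n~ and n~ = n + z^2."""
    z = norm.ppf(1 - alpha / 2)
    n_tilde = n + z ** 2
    pi_tilde = (y + z ** 2 / 2) / n_tilde
    half = z * (pi_tilde * (1 - pi_tilde) / n_tilde) ** 0.5
    return pi_tilde - half, pi_tilde + half

def wilson_interval(y, n, alpha=0.05):
    """Wilson Score interval in the form of Equation (4)."""
    z = norm.ppf(1 - alpha / 2)
    n_tilde = n + z ** 2
    pi_hat = y / n
    pi_tilde = (y + z ** 2 / 2) / n_tilde
    half = z * n ** 0.5 / n_tilde * (pi_hat * (1 - pi_hat) + z ** 2 / (4 * n)) ** 0.5
    return pi_tilde - half, pi_tilde + half

# Example: Y = 8 successes out of n = 10 trials, 95% confidence.
for ci in (wald_interval, agresti_coull_interval, wilson_interval):
    print(ci.__name__, ci(8, 10))
```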
In terms of deriving an estimate of the sample size n required to carry out a given study, the basic recommendation based on Equation (1) is to solve the equation
$z_{\alpha/2}\sqrt{\dfrac{\pi_0(1-\pi_0)}{n}} = \delta_0 \qquad (5)$
for n at some desired interval width w0 = 2δ0, given some hypothesized and ‘known’ π0; e.g. see the oft-cited article by Arkin and Wachtel [2]. This method is widely used even though it is not particularly accurate, justifying Piegorsch’s [5] comment that it is ‘methodus non-gratus’. Improvements in sample size estimation based on more accurate intervals, such as the one developed by Brown et al. [3,4] at Equation (2), are provided in Piegorsch [5]. We argue, however, that this is the incorrect parameterization of the problem at hand even if the more accurate interval estimators are utilized, i.e. is this the correct approach to ‘guarantee’ the appropriate confidence interval width given that π0 is known? Because the calculated interval width is a random variable, it is impossible to ensure a priori the fixed width w0. We can, however, calculate the distribution of the random variable corresponding to the interval width conditional on π0 and its associated quantities. Furthermore, we illustrate in later sections that the desired width w0 may not even be achievable, i.e. the range of support for the random width falls below w0. Here $\hat{W}_{W}$, $\hat{W}_{AC}$ and $\hat{W}_{WS}$ denote the estimated widths for the Wald, Agresti–Coull and Wilson Score methods, respectively, and the corresponding half interval lengths $\hat{\delta}_{W}$, $\hat{\delta}_{AC}$ and $\hat{\delta}_{WS}$ (so that $\hat{W} = 2\hat{\delta}$ for each method) are given by
$\hat{\delta}_{W} = z_{\alpha/2}\sqrt{\dfrac{\hat{\pi}(1-\hat{\pi})}{n}} \qquad (6)$
$\hat{\delta}_{AC} = z_{\alpha/2}\sqrt{\dfrac{\tilde{\pi}(1-\tilde{\pi})}{\tilde{n}}} \qquad (7)$
and
$\hat{\delta}_{WS} = \dfrac{z_{\alpha/2}\sqrt{n}}{\tilde{n}}\sqrt{\hat{\pi}(1-\hat{\pi}) + \dfrac{z_{\alpha/2}^{2}}{4n}} \qquad (8)$
Note also that even when the range of $\hat{W}$ does encompass w0, the probability $P(\hat{W} < w_0 \mid \pi_0)$ is quite high or even equal to 1. To summarize, the expected width $E(\hat{W} \mid \pi_0)$ is less than w0. This will be illustrated in more detail in the next few sections.
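For reference, the conventional calculation of Equation (5) amounts to $n = z_{\alpha/2}^{2}\pi_0(1-\pi_0)/\delta_0^{2}$, rounded up. A minimal sketch, assuming a 95% interval; the function name is ours:

```python
import math
from scipy.stats import norm

def classical_sample_size(pi0, w0, alpha=0.05):
    """Solve Equation (5) for n: z * sqrt(pi0 * (1 - pi0) / n) = w0 / 2."""
    z = norm.ppf(1 - alpha / 2)
    delta0 = w0 / 2
    return math.ceil(z ** 2 * pi0 * (1 - pi0) / delta0 ** 2)

# e.g. pi0 = 0.5 and a desired width of 0.2 gives ceil(1.96^2 * 0.25 / 0.01) = 97.
print(classical_sample_size(0.5, 0.2))
```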
In the following sections, we introduce a new sample size calculation method based on this conditional expectation, i.e. solve
$E(\hat{W} \mid \pi_0) = w_E \qquad (9)$
for n, corresponding to a desired expected width wE given π0, versus a cutpoint w0 that may not even fall within the support of $\hat{W}$.
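A minimal sketch of one reading of Equation (9), using the Agresti–Coull width as the example: average the width over Y ~ B(n, π0) and take the largest n whose expected width still meets the target. The function names, the early-stopping search and the monotonicity assumption are ours, not from the paper.

```python
import numpy as np
from scipy.stats import norm, binom

def expected_ac_width(n, pi0, alpha=0.05):
    """E(W_AC | pi0): average the Agresti-Coull width over Y ~ B(n, pi0)."""
    z = norm.ppf(1 - alpha / 2)
    n_tilde = n + z ** 2
    y = np.arange(n + 1)
    pi_tilde = (y + z ** 2 / 2) / n_tilde
    widths = 2 * z * np.sqrt(pi_tilde * (1 - pi_tilde) / n_tilde)
    return float(np.sum(binom.pmf(y, n, pi0) * widths))

def adjusted_sample_size(pi0, w_e, alpha=0.05):
    """Largest n with E(W_AC | pi0) >= w_e, one reading of Equation (9).
    Assumes the expected width decreases in n, so the search stops at the
    first n whose expected width drops below the target."""
    n = 1
    while expected_ac_width(n, pi0, alpha) >= w_e:
        n += 1
    return n - 1

# Cross-check against Table 1 (pi0 = 0.5): E(W_AC | pi0) at n = 10 is about
# 0.5125, and the largest n whose expected width still reaches the n = 10
# target w0 = 0.5268 is n* = 9.
print(round(expected_ac_width(10, 0.5), 4), adjusted_sample_size(0.5, 0.5268))
```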
2. Specific case
The specific case $\pi_0 = \frac{1}{2}$ is very important theoretically, given that it is oftentimes recommended as the default conservative choice when nothing is known about π, since it provides the maximum w0 for a given n. It has been noted by Brown et al. [3,4] that the Agresti–Coull interval outperforms the Wald interval. Hence, we use the Agresti–Coull interval to derive the theorems in this section and then generalize the properties to the other two intervals in Section 3. For the Agresti–Coull interval, the probability distribution of $\hat{W}_{AC}$ is given by
$P\!\left(\hat{W}_{AC} = \omega_y \mid \pi_0\right) = \binom{n}{y}\pi_0^{\,y}(1-\pi_0)^{\,n-y}, \qquad y = 0, 1, \ldots, n, \qquad (10)$
where
$\omega_y = 2 z_{\alpha/2}\sqrt{\dfrac{\tilde{\pi}_y(1-\tilde{\pi}_y)}{\tilde{n}}} \qquad (11)$
and
$\tilde{\pi}_y = \dfrac{y + z_{\alpha/2}^{2}/2}{\tilde{n}}, \qquad \tilde{n} = n + z_{\alpha/2}^{2}. \qquad (12)$
The sample space for $\hat{W}_{AC}$ is given by the set $\{\omega_0, \omega_1, \ldots, \omega_n\}$, with associated binomial probabilities given in Equation (10).
In order to develop our argument for the appropriate sample size metric, we first present two very straightforward theorems.

THEOREM 1a (n odd) Given $\pi_0 = \frac{1}{2}$ and a sample size of n = 2m + 1, m = 0, 1, 2, …, the maximum of $\hat{W}_{AC}$ is strictly less than w0.
Proof The proof is straightforward in that the maximum of $\hat{W}_{AC}$ occurs at the value of $\tilde{\pi}$ for which $\tilde{\pi}(1-\tilde{\pi})$ is maximized, which occurs at Y = (n − 1)/2 and Y = (n + 1)/2, corresponding to values for $\tilde{\pi}$ of $(\tilde{n}-1)/(2\tilde{n})$ and $(\tilde{n}+1)/(2\tilde{n})$, respectively. The width w0 is maximized for the value of π0 for which $\tilde{\pi}_0(1-\tilde{\pi}_0)$ is maximized, which occurs at $\pi_0 = \frac{1}{2}$, where $\tilde{\pi}_0 = \frac{1}{2}$ and $\tilde{\pi}_0(1-\tilde{\pi}_0) = \frac{1}{4}$. Noting that $\tilde{\pi}(1-\tilde{\pi}) = (\tilde{n}^{2}-1)/(4\tilde{n}^{2}) < \frac{1}{4}$ at Y = (n − 1)/2 and Y = (n + 1)/2 completes the proof.
THEOREM 1b (n even) Given $\pi_0 = \frac{1}{2}$ and a sample size of n = 2m, m = 0, 1, 2, …, the maximum value that $\hat{W}_{AC}$ may attain is equal to w0.
Proof The proof follows a similar logic to that of Theorem 1a in that the maximum of $\hat{W}_{AC}$ occurs at Y = n/2, where $\tilde{\pi} = \frac{1}{2}$ and hence $\hat{W}_{AC} = w_0$.
COROLLARY 1b For sample sizes of n = 2m, m = 0, 1, 2, …, $P\!\left(\hat{W}_{AC} < w_0 \mid \pi_0 = \frac{1}{2}\right) \to 1$ as n → ∞.
Proof The proof follows from Theorem 1b in that $P\!\left(\hat{W}_{AC} = w_0 \mid \pi_0 = \frac{1}{2}\right) = P(Y = n/2) = \binom{n}{n/2}\left(\frac{1}{2}\right)^{n} \to 0$, since $\hat{W}_{AC} = w_0$ occurs only at Y = n/2.
THEOREM 2 $E\!\left(\hat{W}_{AC} \mid \pi_0 = \frac{1}{2}\right) < w_0$.
Proof Expand $\hat{W}_{AC} = g(\tilde{\pi})$, where $g(p) = 2z_{\alpha/2}\sqrt{p(1-p)/\tilde{n}}$, in a Taylor series about $\tilde{\pi} = \frac{1}{2}$, such that
$\hat{W}_{AC} = g\!\left(\tfrac{1}{2}\right) + \sum_{k=1}^{\infty}\frac{g^{(k)}\!\left(\tfrac{1}{2}\right)}{k!}\left(\tilde{\pi} - \tfrac{1}{2}\right)^{k} = w_0 + \sum_{k=1}^{\infty}\frac{g^{(k)}\!\left(\tfrac{1}{2}\right)}{k!}\left(\tilde{\pi} - \tfrac{1}{2}\right)^{k}. \qquad (13)$
Taking expectations on both sides of Equation (13) and noting that $g^{(k)}\!\left(\tfrac{1}{2}\right) E\left\{\left(\tilde{\pi} - \tfrac{1}{2}\right)^{k}\right\} \le 0$ for all k in the summand (the odd central moments of $\tilde{\pi}$ vanish when $\pi_0 = \frac{1}{2}$, the even-order derivatives of g at $\frac{1}{2}$ are negative, and the k = 2 term is strictly negative) completes the proof.
The implications of Theorems 1a, b and 2 may be summarized as follows:
From Theorems 1a and b: If we anticipate that $\pi_0 = \frac{1}{2}$, then solving for n from Equation (7) will guarantee a high probability that the calculated width will be less than the desired width w0. If the sample size turns out to be odd, then it is impossible for the calculated interval to be as wide as or wider than w0, i.e. $P\!\left(\hat{W}_{AC} \ge w_0 \mid \pi_0 = \frac{1}{2}\right) = 0$ for odd sample sizes.
From Theorem 2: If $\pi_0 = \frac{1}{2}$, then the value for the width of the confidence interval that we would expect from repeated experiments is less than the desired w0; a numerical illustration follows.
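The following Python sketch (our own illustration, not from the paper; a 95% interval is assumed) enumerates the support of $\hat{W}_{AC}$ at $\pi_0 = \frac{1}{2}$ for an odd and an even sample size and checks these properties numerically.

```python
import numpy as np
from scipy.stats import norm, binom

def ac_widths_and_target(n, pi0=0.5, alpha=0.05):
    """Support of W_AC (one width per Y = 0,...,n) and the planned width w0 at pi0."""
    z = norm.ppf(1 - alpha / 2)
    n_tilde = n + z ** 2
    y = np.arange(n + 1)
    pi_tilde = (y + z ** 2 / 2) / n_tilde
    widths = 2 * z * np.sqrt(pi_tilde * (1 - pi_tilde) / n_tilde)
    pi_tilde0 = (n * pi0 + z ** 2 / 2) / n_tilde
    w0 = 2 * z * np.sqrt(pi_tilde0 * (1 - pi_tilde0) / n_tilde)
    return widths, w0

for n in (11, 10):                                   # odd, then even sample size
    widths, w0 = ac_widths_and_target(n)
    expected = float(np.sum(binom.pmf(np.arange(n + 1), n, 0.5) * widths))
    print(f"n={n}: max width {widths.max():.4f}  w0 {w0:.4f}  "
          f"E(W_AC | pi0=1/2) {expected:.4f}")
# For n = 11 the maximum width sits strictly below w0 (Theorem 1a); for n = 10
# it equals w0 (Theorem 1b); in both cases the expected width is below w0
# (Theorem 2).
```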
3. General case and applicability to other intervals
In this section, we investigate the properties of $\hat{W}_{AC}$ for the case $\pi_0 \neq \frac{1}{2}$.
THEOREM 3 $P\!\left(\hat{W}_{AC} < w_0 \mid \pi_0\right) \to 0.5$ as n → ∞ for $\pi_0 \neq \frac{1}{2}$.
Proof Given $\pi_0 \neq \frac{1}{2}$, the probability distribution of $\hat{W}_{AC}$ is given in Equation (10). By Equation (11), we can simplify $P(\hat{W}_{AC} < w_0 \mid \pi_0)$ as
$P\!\left(\hat{W}_{AC} < w_0 \mid \pi_0\right) = P\!\left\{\tilde{\pi}(1-\tilde{\pi}) < \tilde{\pi}_0(1-\tilde{\pi}_0)\right\} \qquad (14)$
Based on the properties of the function f(x) = x(1 − x), which is symmetric about $\frac{1}{2}$, increasing on $[0, \frac{1}{2}]$ and decreasing on $[\frac{1}{2}, 1]$, Equation (14) can be further simplified for $\pi_0 < \frac{1}{2}$ and $\pi_0 > \frac{1}{2}$, respectively, as
$P\!\left(\hat{W}_{AC} < w_0 \mid \pi_0\right) = \begin{cases} P(\tilde{\pi} < \tilde{\pi}_0) + P(\tilde{\pi} > 1 - \tilde{\pi}_0), & \pi_0 < \frac{1}{2}, \\[4pt] P(\tilde{\pi} > \tilde{\pi}_0) + P(\tilde{\pi} < 1 - \tilde{\pi}_0), & \pi_0 > \frac{1}{2}, \end{cases} \qquad (15)$
where $\tilde{\pi}_0 = (n\pi_0 + z_{\alpha/2}^{2}/2)/\tilde{n}$. Through this derivation, the probability in Equation (15) can be calculated through the binomial distribution of Y; the first term converges to 0.5 and the second term converges to 0, so the probability approaches 0.5 as n → ∞.
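A numerical check of Theorem 3 (our own sketch, with π0 = 0.95 and a 95% interval assumed) computes $P(\hat{W}_{AC} < w_0 \mid \pi_0)$ exactly from the binomial distribution for increasing n:

```python
import numpy as np
from scipy.stats import norm, binom

def prob_width_below_target(n, pi0, alpha=0.05):
    """P(W_AC < w0 | pi0): total binomial probability of the outcomes Y whose
    Agresti-Coull width falls below the planned width w0 at pi0."""
    z = norm.ppf(1 - alpha / 2)
    n_tilde = n + z ** 2
    y = np.arange(n + 1)
    pi_tilde = (y + z ** 2 / 2) / n_tilde
    widths = 2 * z * np.sqrt(pi_tilde * (1 - pi_tilde) / n_tilde)
    pi_tilde0 = (n * pi0 + z ** 2 / 2) / n_tilde
    w0 = 2 * z * np.sqrt(pi_tilde0 * (1 - pi_tilde0) / n_tilde)
    return float(np.sum(binom.pmf(y, n, pi0)[widths < w0]))

# The probability oscillates with n but settles near 0.5 (Theorem 3); the value
# at n = 100 should be close to the corresponding entry of Table 2 (about 0.436).
for n in (100, 1000, 10000, 100000):
    print(n, round(prob_width_below_target(n, 0.95), 4))
```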
THEOREM 4 $E\!\left(\hat{W}_{AC} \mid \pi_0\right) < w_0$, where π0 ∈ (0, 1).
Proof Following the proof of Theorem 2, expand $\hat{W}_{AC} = g(\tilde{\pi})$ in a Taylor series about $\tilde{\pi}_0 = E(\tilde{\pi} \mid \pi_0)$ and take expectations on both sides. Since g is strictly concave on (0, 1), the expectation is strictly less than $g(\tilde{\pi}_0) = w_0$ for the $\pi_0 \neq \frac{1}{2}$ case, and Theorem 2 gives $E\!\left(\hat{W}_{AC} \mid \pi_0 = \frac{1}{2}\right) < w_0$, which completes the proof.
From Theorems 3 and 4, we learn that although $\hat{W}_{AC}$ is not strictly less than the desired width w0 with probability one for $\pi_0 \neq \frac{1}{2}$, the value for the width of the confidence interval that we would expect from repeated experiments is still less than the desired w0.
Using the same rationale as for the Agresti–Coull interval, we arrive at the same conclusions for the Wald and Wilson Score intervals as in Theorems 1a and b: given $\pi_0 = \frac{1}{2}$, the maxima of $\hat{W}_{W}$ and $\hat{W}_{WS}$ are strictly less than w0 for odd n, while the maximum value that $\hat{W}_{W}$ and $\hat{W}_{WS}$ may attain is equal to w0 for even n. For the conditional expectations, $E(\hat{W}_{W} \mid \pi_0) < w_0$ and $E(\hat{W}_{WS} \mid \pi_0) < w_0$ for all π0 ∈ (0, 1), which can be developed from the Taylor expansion following the proof of Theorem 2.
4. Comparison of fixed versus random interval approaches
In this section, we examine the fixed confidence interval approaches, including the Agresti–Coull, Wald and Wilson Score confidence intervals, and compare them with our approach of calculating sample sizes based on the expectation of $\hat{W}$ given π0. The values of w0 and the confidence interval widths given π0 were calculated for sample sizes 10, 11, 100, 101, 200, 201, 1000, 1001, 2000 and 2001 for the cases π0 = 0.5, 0.95 and 0.975 in Tables 1–9. The probability that $\hat{W}$ is less than w0 given π0 was obtained by summing the probabilities of the outcomes for which $\hat{W}$ is less than w0. The conditional expectation of $\hat{W}$ given π0 was also calculated. The suggested sample sizes n*, based on $E(\hat{W} \mid \pi_0)$, are reported alongside the corresponding sample sizes. Additionally, we investigated the coverage probabilities for the given sample size n and the suggested sample size n* through 100,000 simulations.
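A sketch of such a coverage simulation for the Agresti–Coull interval is given below; the function name, seed and vectorized implementation are our own, and the same scheme applies to the Wald and Wilson Score intervals.

```python
import numpy as np
from scipy.stats import norm

def ac_coverage(n, pi0, alpha=0.05, reps=100_000, seed=2024):
    """Monte Carlo coverage probability of the Agresti-Coull interval at (n, pi0)."""
    z = norm.ppf(1 - alpha / 2)
    rng = np.random.default_rng(seed)
    y = rng.binomial(n, pi0, size=reps)
    n_tilde = n + z ** 2
    pi_tilde = (y + z ** 2 / 2) / n_tilde
    half = z * np.sqrt(pi_tilde * (1 - pi_tilde) / n_tilde)
    covered = (pi_tilde - half <= pi0) & (pi0 <= pi_tilde + half)
    return covered.mean()

# Coverage at a planned n and at the reduced n* (e.g. n = 100 and n* = 99 for
# pi0 = 0.5, as in Table 1) should both sit near the nominal 0.95 level.
print(ac_coverage(100, 0.5), ac_coverage(99, 0.5))
```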
Table 1. Agresti–Coull interval, π0 = 0.5.

| Sample size | $w_0$ | $P(\hat{W}_{AC} < w_0 \mid \pi_0)$ | $E(\hat{W}_{AC} \mid \pi_0)$ | $n^*$ | $CP_n$ | $CP_{n^*}$ |
|---|---|---|---|---|---|---|
| 10 | 0.5268 | 0.7539 | 0.5125 | 9 | 0.9783 | 0.9621 |
| 11 | 0.5088 | 1.0000 | 0.4955 | 10 | 0.9342 | 0.9794 |
| 100 | 0.1923 | 0.9204 | 0.1914 | 99 | 0.9439 | 0.9564 |
| 101 | 0.1914 | 1.0000 | 0.1905 | 100 | 0.9545 | 0.9438 |
| 200 | 0.1373 | 0.9437 | 0.1369 | 199 | 0.9454 | 0.9536 |
| 201 | 0.1369 | 1.0000 | 0.1366 | 200 | 0.9518 | 0.9449 |
| 1000 | 0.0619 | 0.9748 | 0.0618 | 999 | 0.9458 | 0.9494 |
| 1001 | 0.0618 | 1.0000 | 0.0618 | 1000 | 0.9510 | 0.9466 |
| 2000 | 0.0438 | 0.9822 | 0.0438 | 1999 | 0.9478 | 0.9514 |
| 2001 | 0.0438 | 1.0000 | 0.0438 | 2000 | 0.9510 | 0.9488 |

Notes: n*, suggested sample size based on $E(\hat{W}_{AC} \mid \pi_0)$; $CP_n$, coverage probability for sample size n; $CP_{n^*}$, coverage probability for sample size n*.
Table 9. Wilson Score interval, π0 = 0.975.

| Sample size | $w_0$ | $P(\hat{W}_{WS} < w_0 \mid \pi_0)$ | $E(\hat{W}_{WS} \mid \pi_0)$ | $n^*$ | $CP_n$ | $CP_{n^*}$ |
|---|---|---|---|---|---|---|
| 10 | 0.3108 | 0.7763 | 0.3036 | 9 | 0.9750 | 0.9804 |
| 11 | 0.2927 | 0.7569 | 0.2857 | 10 | 0.9693 | 0.975 |
| 100 | 0.0696 | 0.5422 | 0.0676 | 95 | 0.9608 | 0.9682 |
| 101 | 0.0692 | 0.5357 | 0.0672 | 96 | 0.9581 | 0.9664 |
| 200 | 0.0465 | 0.4383 | 0.0456 | 193 | 0.9626 | 0.9682 |
| 201 | 0.0463 | 0.6115 | 0.0454 | 194 | 0.9626 | 0.9677 |
| 1000 | 0.0197 | 0.4724 | 0.0196 | 990 | 0.9466 | 0.9485 |
| 1001 | 0.0196 | 0.5509 | 0.0196 | 991 | 0.9474 | 0.9477 |
| 2000 | 0.0138 | 0.4805 | 0.0138 | 1990 | 0.9480 | 0.9488 |
| 2001 | 0.0138 | 0.5361 | 0.0138 | 1991 | 0.9469 | 0.9488 |

Notes: n*, suggested sample size based on $E(\hat{W}_{WS} \mid \pi_0)$; $CP_n$, coverage probability for sample size n; $CP_{n^*}$, coverage probability for sample size n*.
In Table 1, the $E(\hat{W}_{AC} \mid \pi_0)$ values are less than w0 for all odd sample sizes, thus w0 is not attainable. For even sample sizes, $P(\hat{W}_{AC} < w_0 \mid \pi_0)$ increases as a function of n, as per Corollary 1b. In terms of the practical implications, we would likely require a sample size of only n* = n − 1 to carry out the study based on the metric $E(\hat{W}_{AC} \mid \pi_0)$. Although the preferred sample sizes are smaller than the original sample sizes, we find from an examination of the tables that the coverage probabilities remain close to the nominal level. The quantities of interest for the π0 = 0.95 and π0 = 0.975 cases are given in Tables 2 and 3. In general, $P(\hat{W}_{AC} < w_0 \mid \pi_0)$ becomes closer to 0.5 as the sample size increases. We notice that n* = n − k, where k is a positive integer that increases as n gets larger and π0 gets further from $\frac{1}{2}$. For example, for Table 2 we find empirically that n* = ⌊n − log(n) + 1⌋ + 1, as checked in the sketch below. It is interesting to point out that the coverage probabilities for n and n* are very competitive across all sample sizes and parameter settings.
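The empirical relationship for Table 2 can be checked directly; this is a small sketch of ours, and the natural logarithm is assumed in the formula.

```python
import math

# n* values for the Agresti-Coull interval with pi0 = 0.95, as reported in
# Table 2, keyed by the planned sample size n.
table2_n_star = {10: 9, 11: 10, 100: 97, 101: 98, 200: 196, 201: 197,
                 1000: 995, 1001: 996, 2000: 1994, 2001: 1995}

for n, n_star in table2_n_star.items():
    formula = math.floor(n - math.log(n) + 1) + 1   # natural log assumed
    print(n, n_star, formula, n_star == formula)
```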
Table 2. Agresti–Coull interval, π0 = 0.95.

| Sample size | $w_0$ | $P(\hat{W}_{AC} < w_0 \mid \pi_0)$ | $E(\hat{W}_{AC} \mid \pi_0)$ | $n^*$ | $CP_n$ | $CP_{n^*}$ |
|---|---|---|---|---|---|---|
| 10 | 0.4002 | 0.5987 | 0.3948 | 9 | 0.9882 | 0.9289 |
| 11 | 0.3790 | 0.5688 | 0.3737 | 10 | 0.9842 | 0.9886 |
| 100 | 0.0959 | 0.4360 | 0.0945 | 97 | 0.9663 | 0.9699 |
| 101 | 0.0954 | 0.6070 | 0.0940 | 98 | 0.9641 | 0.9691 |
| 200 | 0.0644 | 0.4547 | 0.0638 | 196 | 0.9659 | 0.9516 |
| 201 | 0.0642 | 0.5766 | 0.0636 | 197 | 0.9662 | 0.9703 |
| 1000 | 0.0274 | 0.4797 | 0.0273 | 995 | 0.9522 | 0.9519 |
| 1001 | 0.0274 | 0.5346 | 0.0273 | 996 | 0.9510 | 0.9518 |
| 2000 | 0.0192 | 0.4857 | 0.0192 | 1994 | 0.9537 | 0.9490 |
| 2001 | 0.0192 | 0.5245 | 0.0192 | 1995 | 0.9537 | 0.9488 |

Notes: n*, suggested sample size based on $E(\hat{W}_{AC} \mid \pi_0)$; $CP_n$, coverage probability for sample size n; $CP_{n^*}$, coverage probability for sample size n*.
Table 3. Agresti–Coull interval, π0 = 0.975.

| Sample size | $w_0$ | $P(\hat{W}_{AC} < w_0 \mid \pi_0)$ | $E(\hat{W}_{AC} \mid \pi_0)$ | $n^*$ | $CP_n$ | $CP_{n^*}$ |
|---|---|---|---|---|---|---|
| 10 | 0.3831 | 0.7763 | 0.3801 | 9 | 0.9750 | 0.9804 |
| 11 | 0.3613 | 0.7569 | 0.3582 | 10 | 0.9693 | 0.9750 |
| 100 | 0.0777 | 0.5422 | 0.0763 | 97 | 0.9608 | 0.9650 |
| 101 | 0.0771 | 0.5357 | 0.0758 | 98 | 0.9864 | 0.9634 |
| 200 | 0.0497 | 0.4383 | 0.0490 | 195 | 0.9626 | 0.9668 |
| 201 | 0.0496 | 0.6115 | 0.0489 | 196 | 0.9626 | 0.9664 |
| 1000 | 0.0200 | 0.4724 | 0.0199 | 991 | 0.9466 | 0.9588 |
| 1001 | 0.0200 | 0.5509 | 0.0199 | 992 | 0.9474 | 0.9581 |
| 2000 | 0.0139 | 0.4805 | 0.0139 | 1990 | 0.9480 | 0.9564 |
| 2001 | 0.0139 | 0.5361 | 0.0139 | 1991 | 0.9469 | 0.9563 |

Notes: n*, suggested sample size based on $E(\hat{W}_{AC} \mid \pi_0)$; $CP_n$, coverage probability for sample size n; $CP_{n^*}$, coverage probability for sample size n*.
As we can see from the results in Tables 4–9, the majority of the preferred sample sizes for the Wald and Wilson Score confidence intervals are smaller than the ones obtained from the Agresti–Coull interval under the corresponding parameter settings. The n* values are not attainable for the smallest sample sizes in Tables 5 and 6, which means that we cannot find a sample size such that the expected confidence interval width is at least the desired confidence interval width. In this case, we find empirically that the suggested sample size to carry out the study is
(16)
For the Wald and Wilson Score intervals, the differences between n and n* are larger than those obtained from the Agresti–Coull interval. Therefore, if researchers are inclined to choose a sample size calculation method based on the Agresti–Coull, Wald or Wilson Score interval without adjustment, we suggest the Agresti–Coull interval, since the resulting sample size is closest to the sample size based on the conditional expectation of the random width given π0.
Table 4. Wald interval, π0 = 0.5.

| Sample size | $w_0$ | $P(\hat{W}_{W} < w_0 \mid \pi_0)$ | $E(\hat{W}_{W} \mid \pi_0)$ | $n^*$ | $CP_n$ | $CP_{n^*}$ |
|---|---|---|---|---|---|---|
| 10 | 0.6198 | 0.7539 | 0.5856 | 8 | 0.8907 | 0.9307 |
| 11 | 0.5910 | 1.0000 | 0.5617 | 9 | 0.9342 | 0.8198 |
| 100 | 0.1960 | 0.9204 | 0.1950 | 98 | 0.9439 | 0.9466 |
| 101 | 0.1950 | 1.0000 | 0.1941 | 99 | 0.9545 | 0.9567 |
| 200 | 0.1386 | 0.9437 | 0.1382 | 198 | 0.9454 | 0.9456 |
| 201 | 0.1382 | 1.0000 | 0.1379 | 199 | 0.9518 | 0.9538 |
| 1000 | 0.0620 | 0.9748 | 0.0619 | 998 | 0.9458 | 0.9457 |
| 1001 | 0.0619 | 1.0000 | 0.0619 | 999 | 0.9510 | 0.9502 |
| 2000 | 0.0438 | 0.9822 | 0.0438 | 1998 | 0.9478 | 0.9487 |
| 2001 | 0.0438 | 1.0000 | 0.0438 | 1999 | 0.9510 | 0.9519 |

Notes: n*, suggested sample size based on $E(\hat{W}_{W} \mid \pi_0)$; $CP_n$, coverage probability for sample size n; $CP_{n^*}$, coverage probability for sample size n*.
Table 5. Wald interval, π0 = 0.95.

| Sample size | $w_0$ | $P(\hat{W}_{W} < w_0 \mid \pi_0)$ | $E(\hat{W}_{W} \mid \pi_0)$ | $n^*$ | $CP_n$ | $CP_{n^*}$ |
|---|---|---|---|---|---|---|
| 10 | 0.2702 | 0.5987 | 0.1608 | NA^b | 0.4022 | NA^b |
| 11 | 0.2576 | 0.5688 | 0.1595 | NA^b | 0.4311 | NA^b |
| 100 | 0.0854 | 0.4360 | 0.0829 | 93 | 0.8770 | 0.9439 |
| 101 | 0.0850 | 0.6070 | 0.0825 | 94 | 0.8819 | 0.9447 |
| 200 | 0.0604 | 0.4547 | 0.0596 | 194 | 0.9241 | 0.9175 |
| 201 | 0.0603 | 0.5766 | 0.0594 | 195 | 0.9256 | 0.9188 |
| 1000 | 0.0270 | 0.4797 | 0.0269 | 994 | 0.9436 | 0.9474 |
| 1001 | 0.0270 | 0.5346 | 0.0269 | 995 | 0.9431 | 0.9542 |
| 2000 | 0.0191 | 0.4857 | 0.0191 | 1994 | 0.9458 | 0.9447 |
| 2001 | 0.0191 | 0.5245 | 0.0191 | 1995 | 0.9458 | 0.9463 |

Notes: n*, suggested sample size based on $E(\hat{W}_{W} \mid \pi_0)$; $CP_n$, coverage probability for sample size n; $CP_{n^*}$, coverage probability for sample size n*; ^b, n* and $CP_{n^*}$ are not attainable for the corresponding sample size.
Table 6. Wald interval, π0 = 0.975.

| Sample size | $w_0$ | $P(\hat{W}_{W} < w_0 \mid \pi_0)$ | $E(\hat{W}_{W} \mid \pi_0)$ | $n^*$ | $CP_n$ | $CP_{n^*}$ |
|---|---|---|---|---|---|---|
| 10 | 0.1935 | 0.7763 | 0.0864 | NA^b | 0.2256 | NA^b |
| 11 | 0.1845 | 0.7569 | 0.0862 | NA^b | 0.2428 | NA^b |
| 100 | 0.0612 | 0.5422 | 0.0565 | 80 | 0.9163 | 0.8635 |
| 101 | 0.0609 | 0.5357 | 0.0563 | 81 | 0.9180 | 0.8664 |
| 200 | 0.0433 | 0.4383 | 0.0420 | 187 | 0.8724 | 0.9410 |
| 201 | 0.0432 | 0.6115 | 0.0419 | 188 | 0.8740 | 0.9477 |
| 1000 | 0.0194 | 0.4724 | 0.0193 | 989 | 0.9497 | 0.9481 |
| 1001 | 0.0193 | 0.5509 | 0.0192 | 990 | 0.9287 | 0.9488 |
| 2000 | 0.0137 | 0.4805 | 0.0136 | 1989 | 0.9397 | 0.9500 |
| 2001 | 0.0137 | 0.5361 | 0.0136 | 1990 | 0.9393 | 0.9508 |

Notes: n*, suggested sample size based on $E(\hat{W}_{W} \mid \pi_0)$; $CP_n$, coverage probability for sample size n; $CP_{n^*}$, coverage probability for sample size n*; ^b, n* and $CP_{n^*}$ are not attainable for the corresponding sample size.
The adjusted sample size calculation method guarantees that the expected confidence interval width given a hypothesized proportion will not be less than the desired confidence interval width. With the reduced sample size, the coverage probability is still maintained at the nominal level. It would be appealing to extend this idea to other discrete or continuous distributions.
Table 7. Wilson Score interval, π0 = 0.5.

| Sample size | $w_0$ | $P(\hat{W}_{WS} < w_0 \mid \pi_0)$ | $E(\hat{W}_{WS} \mid \pi_0)$ | $n^*$ | $CP_n$ | $CP_{n^*}$ |
|---|---|---|---|---|---|---|
| 10 | 0.5268 | 0.7539 | 0.5066 | 8 | 0.9783 | 0.9307 |
| 11 | 0.5088 | 1.0000 | 0.4906 | 9 | 0.9342 | 0.9612 |
| 100 | 0.1923 | 0.9204 | 0.1914 | 98 | 0.9439 | 0.9466 |
| 101 | 0.1914 | 1.0000 | 0.1905 | 99 | 0.9545 | 0.9567 |
| 200 | 0.1373 | 0.9437 | 0.1369 | 198 | 0.9454 | 0.9456 |
| 201 | 0.1369 | 1.0000 | 0.1366 | 199 | 0.9518 | 0.9538 |
| 1000 | 0.0619 | 0.9748 | 0.0618 | 998 | 0.9458 | 0.9457 |
| 1001 | 0.0618 | 1.0000 | 0.0618 | 999 | 0.9510 | 0.9502 |
| 2000 | 0.0438 | 0.9822 | 0.0438 | 1998 | 0.9478 | 0.9487 |
| 2001 | 0.0438 | 1.0000 | 0.0438 | 1999 | 0.9510 | 0.9519 |

Notes: n*, suggested sample size based on $E(\hat{W}_{WS} \mid \pi_0)$; $CP_n$, coverage probability for sample size n; $CP_{n^*}$, coverage probability for sample size n*.
Table 8. Wilson Score interval, π0 = 0.95.

| Sample size | $w_0$ | $P(\hat{W}_{WS} < w_0 \mid \pi_0)$ | $E(\hat{W}_{WS} \mid \pi_0)$ | $n^*$ | $CP_n$ | $CP_{n^*}$ |
|---|---|---|---|---|---|---|
| 10 | 0.3393 | 0.5987 | 0.3274 | 9 | 0.9136 | 0.9289 |
| 11 | 0.3216 | 0.5688 | 0.3102 | 10 | 0.8968 | 0.9138 |
| 100 | 0.0902 | 0.4360 | 0.0884 | 96 | 0.9663 | 0.9422 |
| 101 | 0.0897 | 0.6070 | 0.0879 | 97 | 0.9641 | 0.9706 |
| 200 | 0.0622 | 0.4547 | 0.0615 | 195 | 0.9659 | 0.9524 |
| 201 | 0.0620 | 0.5766 | 0.0613 | 196 | 0.9662 | 0.9529 |
| 1000 | 0.0272 | 0.4797 | 0.0271 | 994 | 0.9522 | 0.9521 |
| 1001 | 0.0272 | 0.5346 | 0.0271 | 995 | 0.9510 | 0.9518 |
| 2000 | 0.0192 | 0.4857 | 0.0191 | 1994 | 0.9537 | 0.9490 |
| 2001 | 0.0192 | 0.5245 | 0.0191 | 1995 | 0.9537 | 0.9488 |

Notes: n*, suggested sample size based on $E(\hat{W}_{WS} \mid \pi_0)$; $CP_n$, coverage probability for sample size n; $CP_{n^*}$, coverage probability for sample size n*.
References
1. Agresti A, Coull B. Approximate is better than ‘exact’ for interval estimation of binomial proportions. Amer Statist. 1998;52:119–126.
2. Arkin C, Wachtel M. How many patients are necessary to assess test performance? JAMA. 1990;263:275–278.
3. Brown L, Cai T, DasGupta A. Interval estimation for a binomial proportion. Statist Sci. 2001;16:101–117.
4. Brown L, Cai T, DasGupta A. Confidence intervals for a binomial proportion and asymptotic expansions. Ann Statist. 2002;30:160–201.
5. Piegorsch W. Sample sizes for improved binomial confidence intervals. Comput Statist Data Anal. 2004;46:309–316.
6. Wilson E. Probable inference, the law of succession, and statistical inference. J Amer Statist Assoc. 1927;22:209–212.