Author manuscript; available in PMC: 2014 Jan 8.
Published in final edited form as: Biom J. 2013 Nov 11;56(1):10.1002/bimj.201300048. doi: 10.1002/bimj.201300048

Stratified Fisher's Exact Test and its Sample Size Calculation

Sin-Ho Jung
PMCID: PMC3884832  NIHMSID: NIHMS535358  PMID: 24395208

Summary

The chi-squared test has been a popular approach to the analysis of a 2 × 2 table when the sample sizes for the four cells are large. When the large-sample assumption does not hold, however, we need an exact testing method such as Fisher's test. When the study population is heterogeneous, we often partition the subjects into multiple strata, so that each stratum consists of homogeneous subjects and hence the stratified analysis has improved power. While the Mantel-Haenszel test has been widely used as an extension of the chi-squared test to stratified 2 × 2 tables with a large-sample approximation, an extension of Fisher's test for stratified exact testing has been lacking. In this paper, we discuss an exact testing method for stratified 2 × 2 tables that reduces to the standard Fisher's test in the case of a single 2 × 2 table, and propose a sample size calculation method for it that can be useful for designing a study with rare cell frequencies.

Keywords: Conditional type I error, Exact test, Hypergeometric distribution, Many 2 × 2 tables, Odds ratio

1 Introduction

In this paper, we discuss an exact test for stratified 2 × 2 tables with rare cell frequencies. Since the stratified exact test is simplified to the standard Fisher's (1935) exact test in single 2 × 2 table cases, we call it stratified Fisher's test.

Suppose that we want to compare the response probabilities between two groups, experimental (or case) and control. In a two-group comparison, the characteristics of study subjects are often heterogeneous. In this case, the heterogeneity is characterized by some stratification factors, and a stratified method is applied in the final analysis. When the distribution of the stratification factors is identical between the two groups, an unstratified test that ignores the population heterogeneity controls the type I error rate but loses efficiency. If the distribution of the stratification factors differs between the two groups, however, an unstratified test does not control the type I error rate. We want to test whether the two groups have equal response probabilities while accounting for the heterogeneity of the population defined by strata.

Multiple asymptotic testing methods have been proposed for testing on stratified 2 × 2 tables. Under the assumption that the odds ratios are identical among strata, Cochran (1954) proposes an asymptotic method for testing if the common odds ratio is 1 or not. Under the assumption of constant risk ratios across strata, Gart (1985) proposes an asymptotic method for testing if the common risk ratio is 1 or not. Woolson et al. (1986) and Nam (1992) propose sample size calculation methods for Cochran test, and Nam (1998) proposes a sample size method for Gart test.

In order to use these tests for testing on many 2 × 2 tables, we have to check the assumptions of common odds ratios or risk ratios in advance. For testing the common odds ratio assumption, Zelen (1971) proposes an exact method, which is implemented by StatXact, and Breslow and Day (1980) propose an asymptotic method.

If these assumptions do not seem to be valid, we need a robust test requiring no assumptions on the primary parameters for testing. Mantel and Haenszel (1959) propose an asymptotic test for testing if two groups have equal response probabilities without any assumption of common odds ratio or common risk ratio. Jung et al. (2007) propose a sample size calculation method for Mantel-Haenszel test. In this paper, we extend Fisher's exact test for testing stratified 2 × 2 tables with rare cell frequencies, and propose its sample size calculation method. These methods can be used in designing and analyzing small case-control studies or clinical trials. The input parameters to be specified for the sample size calculation of stratified Fisher's exact test are exactly the same as those for the sample size calculation of Mantel-Haenszel test. We will compare the performance of the proposed test with that of the asymptotic Mantel-Haenszel test and the standard Fisher's exact test ignoring strata under some practical settings.

2 Stratified Fisher's Exact Test

Suppose that there are J strata. Let $N$ denote the total sample size, and $n_j$ the sample size in stratum $j$ ($\sum_{j=1}^{J} n_j = N$). Among the $n_j$ subjects in stratum $j$ ($= 1, \ldots, J$), $m_j$ are allocated to group 1 (case or experimental) and $\bar{m}_j = n_j - m_j$ to group 2 (control). For stratum $j$, group 1 has a response probability $p_j$ and group 2 has a response probability $q_j$. Let $\bar{p}_j = 1 - p_j$, $\bar{q}_j = 1 - q_j$, and let $\theta_j = p_j \bar{q}_j / (q_j \bar{p}_j)$ denote the odds ratio in stratum $j$. Suppose that we want to test

$$H_0: \theta_1 = \cdots = \theta_J = 1$$

against

$$H_1: \theta_j > 1 \ \text{for some } j = 1, \ldots, J.$$

For stratum j(= 1, ..., J), let xj and yj denote the numbers of responders for groups 1 and 2, respectively, and zj = xj + yj denote the total number of responses. The frequency data in stratum j can be described as in Table 1.

Table 1.

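Frequency data of 2 × 2 table for stratum j(= 1, ..., J)

| Response | Group: Case | Group: Control | Total |
|---|---|---|---|
| Yes | $x_j$ | $y_j$ | $z_j$ |
| No | $m_j - x_j$ | $\bar{m}_j - y_j$ | $n_j - z_j$ |
| Total | $m_j$ | $\bar{m}_j$ | $n_j$ |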

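We propose to reject $H_0$ in favor of $H_1$ if $S = \sum_{j=1}^{J} x_j$ is large. Under $H_0$, conditioning on the margin totals $(z_j, m_j, n_j)$, $x_j$ has the hypergeometric distribution

$$f_0(x_j \mid z_j, m_j, n_j) = \frac{\binom{m_j}{x_j}\binom{\bar{m}_j}{z_j - x_j}}{\sum_{i=m_{j-}}^{m_{j+}} \binom{m_j}{i}\binom{\bar{m}_j}{z_j - i}}$$

for $m_{j-} \le x_j \le m_{j+}$, where $\bar{m}_j = n_j - m_j$ is the size of group 2, $m_{j-} = \max(0, z_j - \bar{m}_j)$ and $m_{j+} = \min(z_j, m_j)$. Let $z = (z_1, \ldots, z_J)$, $m = (m_1, \ldots, m_J)$ and $n = (n_1, \ldots, n_J)$. Given $(z, m, n)$, the conditional p-value for the observed value $s = \sum_{j=1}^{J} x_j$, $pv = pv(s \mid z, m, n)$, is obtained by

$$pv = P(S \ge s \mid z, m, n, H_0) = \sum_{i_1=m_{1-}}^{m_{1+}} \cdots \sum_{i_J=m_{J-}}^{m_{J+}} I\Big(\sum_{j=1}^{J} i_j \ge s\Big) \prod_{j=1}^{J} f_0(i_j \mid z_j, m_j, n_j).$$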

Given type I error rate α*, we reject H0 if pv < α*.

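Similarly for the other one-sided alternative hypothesis

$$H_2: \theta_j < 1 \ \text{for some } j = 1, \ldots, J,$$

the conditional p-value given $(z, m, n)$ is obtained by

$$pv = P(S \le s \mid z, m, n, H_0) = \sum_{i_1=m_{1-}}^{m_{1+}} \cdots \sum_{i_J=m_{J-}}^{m_{J+}} I\Big(\sum_{j=1}^{J} i_j \le s\Big) \prod_{j=1}^{J} f_0(i_j \mid z_j, m_j, n_j).$$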

A two-sided p-value may be calculated as two times the minimum of the two one-sided p-values. Without loss of generality, we focus our discussions on the one-sided alternative hypothesis H1 in our paper.

Note that Mantel-Haenszel test also rejects $H_0$ in favor of $H_1$ for a large value of $S$, and its p-value is calculated using the standardized test statistic

$$W = \frac{S - E}{\sqrt{V}}$$

which is asymptotically N(0, 1) under $H_0$, where $E = \sum_{j=1}^{J} E_j$, $V = \sum_{j=1}^{J} V_j$, $E_j = z_j m_j / n_j$ and $V_j = z_j m_j \bar{m}_j (n_j - z_j) / \{n_j^2 (n_j - 1)\}$. Westfall, Zaykin and Young (2002) propose a permutation procedure for stratified Mantel-Haenszel test, which permutes the two-sample binary data within each stratum in the context of multiple testing. Their permutation maintains the margin totals for 2 × 2 tables, $\{(z_j, m_j, n_j), 1 \le j \le J\}$, and $E_j$ and $V_j$ depend on the margin totals only, so that the permutation-based Mantel-Haenszel test will be identical to our stratified Fisher's exact test if they go through all the possible $\prod_{j=1}^{J} (m_{j+} - m_{j-} + 1)$ permutations. Their permutation test is implemented by SAS. Compared to our exact test, the permutation test requires a much longer computing time. Furthermore, a permutation test often randomly selects partial permutations to approximate the exact p-value. In this case, the resulting approximate p-value will be different depending on the selected seed number for random number generation or the number of permutations, while the exact method always provides a constant exact p-value.

A real data example is taken from Li et al. (1979), where the investigators are interested in whether thymosin (experimental), compared to placebo (control), has any effect in the treatment of bronchogenic carcinoma patients receiving radiotherapy. Table 2 summarizes the data for three strata. The one-sided p-values are 0.1563 by the stratified Fisher's exact test and 0.0760 by Mantel-Haenszel test. Stratified Fisher's test has a larger p-value than Mantel-Haenszel test because of its conservative type I error control as demonstrated in Section 4 or because of the very small numbers of failures across the strata that can lead to a biased p-value for the asymptotic Mantel-Haenszel test.

Table 2.

Response to thymosin in bronchogenic carcinoma patients (T=thymosin, P=placebo)

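| | Stratum 1 T | Stratum 1 P | Stratum 1 Total | Stratum 2 T | Stratum 2 P | Stratum 2 Total | Stratum 3 T | Stratum 3 P | Stratum 3 Total |
|---|---|---|---|---|---|---|---|---|---|
| Success | 10 | 12 | 22 | 9 | 11 | 20 | 8 | 7 | 15 |
| Failure | 1 | 1 | 2 | 0 | 1 | 1 | 0 | 3 | 3 |
| Total | 11 | 13 | 24 | 9 | 12 | 21 | 8 | 10 | 18 |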
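
As an illustration of the two tests discussed in this section, the following is a minimal Python sketch applied to the Table 2 data. It is not the author's program: the function names are hypothetical, and the null distribution of S is obtained by convolving the per-stratum hypergeometric distributions, which is assumed here as an equivalent but cheaper way of evaluating the nested sum in the conditional p-value.

```python
from math import comb, erf, sqrt

def hypergeom_pmf(z, m, n):
    """Null distribution f0 of x given margins (z, m, n), as {x: prob}."""
    mbar = n - m                               # size of group 2
    lo, hi = max(0, z - mbar), min(z, m)       # support of x
    w = {x: comb(m, x) * comb(mbar, z - x) for x in range(lo, hi + 1)}
    total = sum(w.values())
    return {x: v / total for x, v in w.items()}

def sum_distribution(strata):
    """Distribution of S = sum_j x_j under H0, by convolution over strata."""
    dist = {0: 1.0}
    for z, m, n in strata:
        pmf = hypergeom_pmf(z, m, n)
        new = {}
        for s, ps in dist.items():
            for x, px in pmf.items():
                new[s + x] = new.get(s + x, 0.0) + ps * px
        dist = new
    return dist

def stratified_fisher_pvalue(xs, strata):
    """One-sided conditional p-value P(S >= s | z, m, n, H0)."""
    s_obs = sum(xs)
    return sum(p for s, p in sum_distribution(strata).items() if s >= s_obs)

def mantel_haenszel_pvalue(xs, strata):
    """One-sided asymptotic p-value based on W = (S - E) / sqrt(V)."""
    s_obs = sum(xs)
    E = sum(z * m / n for z, m, n in strata)
    V = sum(z * m * (n - m) * (n - z) / (n**2 * (n - 1)) for z, m, n in strata)
    w = (s_obs - E) / sqrt(V)
    return 0.5 * (1.0 - erf(w / sqrt(2.0)))    # upper tail of N(0, 1)

# Table 2: successes on thymosin (x_j) and margins (z_j, m_j, n_j) per stratum
xs = [10, 9, 8]
strata = [(22, 11, 24), (20, 9, 21), (15, 8, 18)]

p_exact = stratified_fisher_pvalue(xs, strata)
p_mh = mantel_haenszel_pvalue(xs, strata)
# The text reports 0.1563 (stratified Fisher) and 0.0760 (Mantel-Haenszel).
print(f"{p_exact:.4f} {p_mh:.4f}")
```

The convolution needs only the per-stratum supports, so its cost grows with the sum of the support sizes rather than with their product.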

3 Power and Sample Size Calculation

Jung et al. (2007) propose a sample size calculation method for Mantel-Haenszel test. In this section, we derive a sample size formula for stratified Fisher's exact test by specifying the values of the same input parameters as those for Mantel-Haenszel test by Jung et al. (2007). Following are input parameters to be specified for a sample size calculation.

Input Parameters

  • Type I and II error probabilities: (α*, β*)

  • Success probabilities for group 2 (control): (q1, ..., qJ)

  • Odds ratios: (θ1, ..., θJ) under H1, where θj > 0. Note that, given $q_j$ and $\theta_j$, the success probability for group 1 (experimental) is given as $p_j = \theta_j q_j / (\bar{q}_j + \theta_j q_j)$, with $\bar{q}_j = 1 - q_j$, in stratum j(= 1, ..., J).

  • Prevalence for each stratum: (a1, ..., aJ), where $a_j = E(n_j/N)$. Note that $a_j > 0$ and $\sum_{j=1}^{J} a_j = 1$.

  • Allocation probability for group 1 (experimental) within each stratum, (b1, ..., bJ), where bj = E(mj/nj) with 0 < bj < 1.
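
As a worked instance of the conversion from $(q_j, \theta_j)$ to $p_j$, take two of the settings used later in the numerical studies, $(q_1, \theta_1) = (0.1, 5)$ and $(q_2, \theta_2) = (0.3, 10)$; the arithmetic below is only an illustration of the formula in the list above.

$$
p_1 = \frac{5 \times 0.1}{0.9 + 5 \times 0.1} = \frac{0.5}{1.4} \approx 0.357,
\qquad
p_2 = \frac{10 \times 0.3}{0.7 + 10 \times 0.3} = \frac{3}{3.7} \approx 0.811 .
$$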

3.1 When Group and Stratum Allocations are Random

In designing a study, N is fixed at a predetermined size corresponding to a specified power. At the moment, we assume that, given N, the strata sizes and the sample sizes for the two groups within each stratum are randomly selected by the prevalence rate of each category in the population. Hence, given N, $\{(x_j, z_j, m_j, n_j), 1 \le j \le J\}$ are random variables with the following marginal or conditional probability mass functions, which are indexed by the above input parameters.

Distribution Functions

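  • Conditional distribution of $x_j$ given $(z_j, m_j, n_j)$:
    $$f_j(x_j \mid z_j, m_j, n_j) = \frac{\binom{m_j}{x_j}\binom{\bar{m}_j}{z_j - x_j}\theta_j^{x_j}}{\sum_{i=m_{j-}}^{m_{j+}} \binom{m_j}{i}\binom{\bar{m}_j}{z_j - i}\theta_j^{i}}$$
    for $m_{j-} \le x_j \le m_{j+}$, where $m_{j-} = \max(0, z_j - \bar{m}_j)$, $m_{j+} = \min(z_j, m_j)$ and $j = 1, \ldots, J$. Under $H_0$, this is simplified to $f_0(x_j \mid z_j, m_j, n_j)$.
  • Conditional distribution of $z_j$ given $(m_j, n_j)$: Given $(m_j, n_j)$, $x_j \sim B(m_j, p_j)$ and $y_j \sim B(\bar{m}_j, q_j)$ are independent, so that the conditional probability mass function of $z_j = x_j + y_j$ is expressed as
    $$g_j(z_j \mid m_j, n_j) = \sum_{x=m_{j-}}^{m_{j+}} \binom{m_j}{x} p_j^{x} \bar{p}_j^{\,m_j - x} \binom{\bar{m}_j}{z_j - x} q_j^{z_j - x} \bar{q}_j^{\,\bar{m}_j - z_j + x}$$
    for $z_j = 0, 1, \ldots, n_j$ and $j = 1, \ldots, J$, where $B(m, p)$ denotes the binomial distribution with number of trials $m$ and success probability $p$. Under $H_0$, this is simplified to
    $$g_{0j}(z_j \mid m_j, n_j) = q_j^{z_j} \bar{q}_j^{\,n_j - z_j} \sum_{x=m_{j-}}^{m_{j+}} \binom{m_j}{x}\binom{\bar{m}_j}{z_j - x}.$$
    Note that $\binom{0}{0} p^0 (1 - p)^0 = 1$ for $p \in (0, 1)$.
  • Conditional distribution of $m_j$ given $n_j$: At the moment, we assume that, given a total sample size $n_j$ of stratum $j$, the sample size of group 1, $m_j$, is a binomial random variable with probability mass function
    $$h_j(m_j \mid n_j) = \binom{n_j}{m_j} b_j^{m_j} (1 - b_j)^{n_j - m_j}$$
    for $0 \le m_j \le n_j$ and $j = 1, \ldots, J$.
  • Conditional distribution of $(n_1, \ldots, n_J)$ given $N$ is multinomial with probability mass function
    $$l_N(n_1, \ldots, n_J) = \frac{N!}{\prod_{j=1}^{J} n_j!} \prod_{j=1}^{J} a_j^{n_j}$$
    for $0 \le n_1 \le N, \ldots, 0 \le n_J \le N$ and $\sum_{j=1}^{J} n_j = N$.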
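
The four distribution functions above can be written down directly. The sketch below is one possible Python transcription, with hypothetical function names (`f_pmf`, `g_pmf`, `h_pmf`, `l_pmf`) and a small illustrative stratum (m = 4, n = 10, q = 0.3, θ = 10) chosen only to check that each function sums to one and that the joint binomial probability of (x, z) factors as g times f.

```python
from math import comb, factorial, prod

def support(z, m, n):
    """Range [m_minus, m_plus] of x given the margins (z, m, n)."""
    mbar = n - m
    return max(0, z - mbar), min(z, m)

def f_pmf(x, z, m, n, theta):
    """Conditional pmf of x given (z, m, n) with odds ratio theta."""
    lo, hi = support(z, m, n)
    if not lo <= x <= hi:
        return 0.0
    mbar = n - m
    denom = sum(comb(m, i) * comb(mbar, z - i) * theta**i
                for i in range(lo, hi + 1))
    return comb(m, x) * comb(mbar, z - x) * theta**x / denom

def g_pmf(z, m, n, p, q):
    """Conditional pmf of z = x + y given (m, n), x and y binomial."""
    lo, hi = support(z, m, n)
    mbar = n - m
    return sum(comb(m, x) * p**x * (1 - p)**(m - x)
               * comb(mbar, z - x) * q**(z - x) * (1 - q)**(mbar - z + x)
               for x in range(lo, hi + 1))

def h_pmf(m, n, b):
    """Binomial pmf of the group 1 size m given the stratum size n."""
    return comb(n, m) * b**m * (1 - b)**(n - m)

def l_pmf(ns, a):
    """Multinomial pmf of the stratum sizes ns given N = sum(ns)."""
    coef = factorial(sum(ns))
    for nj in ns:
        coef //= factorial(nj)
    return coef * prod(aj**nj for aj, nj in zip(a, ns))

# Illustrative stratum (values are assumptions for this check only)
m, n, q, theta = 4, 10, 0.3, 10.0
p = theta * q / ((1 - q) + theta * q)

total_g = sum(g_pmf(z, m, n, p, q) for z in range(n + 1))
total_f = sum(f_pmf(x, 5, m, n, theta) for x in range(n + 1))
total_h = sum(h_pmf(k, n, 0.5) for k in range(n + 1))

# Joint binomial probability of (x, z) = (3, 5) versus g * f
joint = (comb(m, 3) * p**3 * (1 - p)**(m - 3)
         * comb(n - m, 2) * q**2 * (1 - q)**(n - m - 2))
factored = g_pmf(5, m, n, p, q) * f_pmf(3, 5, m, n, theta)
print(total_g, total_f, total_h, joint, factored)
```

The factorization check reflects why only the odds ratio enters the conditional distribution of $x_j$: the terms that depend on $x_j$ in the joint probability collect into $\theta_j^{x_j}$.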

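We first derive the power function for a given sample size N using these distribution functions. Given $(z, m, n)$ and type I error rate $\alpha^*$, the critical value $c_{\alpha^*} = c_{\alpha^*}(z, m, n)$ is the smallest integer $c$ satisfying

$$P(S \ge c \mid z, m, n, H_0) = \sum_{i_1=m_{1-}}^{m_{1+}} \cdots \sum_{i_J=m_{J-}}^{m_{J+}} I\Big(\sum_{j=1}^{J} i_j \ge c\Big) \prod_{j=1}^{J} f_0(i_j \mid z_j, m_j, n_j) \le \alpha^*.$$

Note that $s \ge c_{\alpha^*}(z, m, n)$ if and only if $pv(s \mid z, m, n) \le \alpha^*$. We call $\alpha(z, m, n) = P(S \ge c_{\alpha^*} \mid z, m, n, H_0)$ the conditional type I error rate given $(z, m, n)$. Similarly, the conditional power $1 - \beta(z, m, n)$ given $(z, m, n)$ is obtained by

$$P(S \ge c_{\alpha^*} \mid z, m, n, H_1) = \sum_{i_1=m_{1-}}^{m_{1+}} \cdots \sum_{i_J=m_{J-}}^{m_{J+}} I\Big(\sum_{j=1}^{J} i_j \ge c_{\alpha^*}\Big) \prod_{j=1}^{J} f_j(i_j \mid z_j, m_j, n_j).$$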

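For a chosen N, the marginal type I error rate and power are given as

$$
\begin{aligned}
\alpha_N &\equiv E\{\alpha(z, m, n) \mid H_0\} = E_n\big(E_m\big[E_z\{\alpha(z, m, n) \mid m, n, H_0\} \mid n\big]\big) \\
&= \sum_{n \in D_N} \sum_{m_1=0}^{n_1} \cdots \sum_{m_J=0}^{n_J} \sum_{z_1=0}^{n_1} \cdots \sum_{z_J=0}^{n_J} \alpha(z_1, \ldots, z_J; m_1, \ldots, m_J; n_1, \ldots, n_J) \\
&\qquad \times \Big\{\prod_{j=1}^{J} g_{0j}(z_j \mid m_j, n_j)\Big\} \Big\{\prod_{j=1}^{J} h_j(m_j \mid n_j)\Big\}\, l_N(n_1, \ldots, n_J)
\end{aligned}
\tag{1}
$$

and

$$
\begin{aligned}
1 - \beta_N &\equiv E\{1 - \beta(z, m, n) \mid H_1\} = E_n\big(E_m\big[E_z\{1 - \beta(z, m, n) \mid m, n, H_1\} \mid n\big]\big) \\
&= \sum_{n \in D_N} \sum_{m_1=0}^{n_1} \cdots \sum_{m_J=0}^{n_J} \sum_{z_1=0}^{n_1} \cdots \sum_{z_J=0}^{n_J} \{1 - \beta(z_1, \ldots, z_J; m_1, \ldots, m_J; n_1, \ldots, n_J)\} \\
&\qquad \times \Big\{\prod_{j=1}^{J} g_j(z_j \mid m_j, n_j)\Big\} \Big\{\prod_{j=1}^{J} h_j(m_j \mid n_j)\Big\}\, l_N(n_1, \ldots, n_J),
\end{aligned}
\tag{2}
$$

respectively, where $D_N = \{(n_1, \ldots, n_J) : 0 \le n_1 \le N, \ldots, 0 \le n_J \le N, \sum_{j=1}^{J} n_j = N\}$ and $E_w(\cdot)$ denotes the expected value with respect to a random vector $w$.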

Since $\alpha(z, m, n) \le \alpha^*$ for all $(z, m, n)$, we have $\alpha_N \le \alpha^*$. Given power $1 - \beta^*$, the required sample size is chosen as the smallest integer $N$ satisfying $1 - \beta_N \ge 1 - \beta^*$. In other words, while the statistical testing controls the conditional type I error $\alpha(z, m, n)$, the sample size is determined to guarantee a specified level of marginal power. In summary, a sample size is calculated as follows.

Sample Size Calculation

  1. Specify input parameters: J, (α*, β*), (q1, ..., qJ), (θ1, ..., θJ), (a1, ..., aJ), (b1, ..., bJ).

  2. Starting from the sample size for Mantel-Haenszel test, $N_{MH}$, do the following, increasing N by 1 each time:
    • B1
      For every $(z, m, n)$ with $z_j \in [0, n_j]$, $m_j \in [0, n_j]$, $n_j \in [0, N]$ for $j = 1, \ldots, J$, and $\sum_{j=1}^{J} n_j = N$,
      1. Find $c_{\alpha^*} = c_{\alpha^*}(z, m, n)$.
      2. Calculate $1 - \beta(z, m, n) = P(S \ge c_{\alpha^*} \mid z, m, n, H_1)$.
    • B2
      Calculate $1 - \beta_N = E\{1 - \beta(z, m, n) \mid H_1\}$.
  3. Stop step 2 (B) if $1 - \beta_N \ge 1 - \beta^*$. This N is the required sample size.

3.2 When Stratum Allocation is Fixed

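In a case-control study or a clinical trial, one may want to assign a fixed proportion of subjects to stratum j, say 100aj%, regardless of its prevalence. In this case, in calculating $\alpha_N$ and $1 - \beta_N$, the $n_j$ are fixed at $N a_j$ and the step to calculate the expectations with respect to $n$ is omitted. That is, given $N$, we set $n_1 = [N a_1], \ldots, n_{J-1} = [N a_{J-1}]$, $n_J = N - \sum_{j=1}^{J-1} n_j$, and calculate (1) and (2) by

$$
\alpha_N = \sum_{m_1=0}^{n_1} \cdots \sum_{m_J=0}^{n_J} \sum_{z_1=0}^{n_1} \cdots \sum_{z_J=0}^{n_J} \alpha(z_1, \ldots, z_J; m_1, \ldots, m_J; n_1, \ldots, n_J) \times \Big\{\prod_{j=1}^{J} g_{0j}(z_j \mid m_j, n_j)\Big\} \Big\{\prod_{j=1}^{J} h_j(m_j \mid n_j)\Big\}
$$

and

$$
1 - \beta_N = \sum_{m_1=0}^{n_1} \cdots \sum_{m_J=0}^{n_J} \sum_{z_1=0}^{n_1} \cdots \sum_{z_J=0}^{n_J} \{1 - \beta(z_1, \ldots, z_J; m_1, \ldots, m_J; n_1, \ldots, n_J)\} \times \Big\{\prod_{j=1}^{J} g_j(z_j \mid m_j, n_j)\Big\} \Big\{\prod_{j=1}^{J} h_j(m_j \mid n_j)\Big\},
$$

where $[a]$ denotes $a$ rounded to the nearest integer.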

3.3 When Both Group and Stratum Allocations are Fixed

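In a more simplified study design, we may want to further prespecify the allocation proportion of group 1 within each stratum. In this case, given $N$, $n_j$ and $m_j$ are fixed at $[N a_j]$ and $[N a_j b_j]$, respectively, and the calculation of (1) and (2) is further simplified to

$$
\alpha_N = \sum_{z_1=0}^{n_1} \cdots \sum_{z_J=0}^{n_J} \alpha(z_1, \ldots, z_J; m_1, \ldots, m_J; n_1, \ldots, n_J) \prod_{j=1}^{J} g_{0j}(z_j \mid m_j, n_j)
$$

and

$$
1 - \beta_N = \sum_{z_1=0}^{n_1} \cdots \sum_{z_J=0}^{n_J} \{1 - \beta(z_1, \ldots, z_J; m_1, \ldots, m_J; n_1, \ldots, n_J)\} \prod_{j=1}^{J} g_j(z_j \mid m_j, n_j).
$$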
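
For the fully fixed design of this subsection, the marginal type I error rate, the marginal power and the search over N can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the program used for the paper: all function names are hypothetical, $[\cdot]$ is assumed to round halves up, the tail distribution of S is obtained by convolution, and the small design (N = 20, a = b = (0.5, 0.5), q = (0.1, 0.3)) and the target power of 0.5 are chosen only to keep the enumeration short.

```python
from itertools import product
from math import comb

def cond_pmf(z, m, n, theta):
    """f_j(x | z, m, n) as {x: prob}; theta = 1 gives the null f_0."""
    mbar = n - m
    lo, hi = max(0, z - mbar), min(z, m)
    w = {x: comb(m, x) * comb(mbar, z - x) * theta**x for x in range(lo, hi + 1)}
    tot = sum(w.values())
    return {x: v / tot for x, v in w.items()}

def g_pmf(z, m, n, p, q):
    """g_j(z | m, n): pmf of the total number of responders in a stratum."""
    mbar = n - m
    lo, hi = max(0, z - mbar), min(z, m)
    return sum(comb(m, x) * p**x * (1 - p)**(m - x)
               * comb(mbar, z - x) * q**(z - x) * (1 - q)**(mbar - z + x)
               for x in range(lo, hi + 1))

def convolve(pmfs):
    """Distribution of S = sum_j x_j for independent strata."""
    dist = {0: 1.0}
    for pmf in pmfs:
        new = {}
        for s, ps in dist.items():
            for x, px in pmf.items():
                new[s + x] = new.get(s + x, 0.0) + ps * px
        dist = new
    return dist

def upper_tail(dist, c):
    return sum(p for s, p in dist.items() if s >= c)

def critical_value(null_dist, alpha):
    """Smallest c with P(S >= c | H0) <= alpha."""
    c = max(null_dist) + 1                     # tail probability 0 here
    while c > min(null_dist) and upper_tail(null_dist, c - 1) <= alpha:
        c -= 1
    return c

def rejection_prob(ms, ns, qs, thetas, alpha):
    """Marginal P(reject H0) with (m_j, n_j) fixed; thetas = 1 gives alpha_N."""
    ps = [t * q / ((1 - q) + t * q) for t, q in zip(thetas, qs)]
    total = 0.0
    for zs in product(*(range(n + 1) for n in ns)):
        weight = 1.0
        for z, m, n, p, q in zip(zs, ms, ns, ps, qs):
            weight *= g_pmf(z, m, n, p, q)
        null = convolve(cond_pmf(z, m, n, 1.0) for z, m, n in zip(zs, ms, ns))
        alt = convolve(cond_pmf(z, m, n, t)
                       for z, m, n, t in zip(zs, ms, ns, thetas))
        total += weight * upper_tail(alt, critical_value(null, alpha))
    return total

def fixed_design(N, a, b):
    """n_j = [N a_j] (last stratum takes the remainder), m_j = [N a_j b_j]."""
    half_up = lambda v: int(v + 0.5)           # assumption: halves round up
    ns = [half_up(N * aj) for aj in a[:-1]]
    ns.append(N - sum(ns))
    ms = [min(n, half_up(N * aj * bj)) for n, aj, bj in zip(ns, a, b)]
    return ms, ns

def smallest_n(n_start, a, b, qs, thetas, alpha, power, n_max=200):
    """First N >= n_start whose marginal power reaches the target."""
    for N in range(n_start, n_max + 1):
        ms, ns = fixed_design(N, a, b)
        if rejection_prob(ms, ns, qs, thetas, alpha) >= power:
            return N
    return None

a, b, qs = (0.5, 0.5), (0.5, 0.5), (0.1, 0.3)
ms, ns = fixed_design(20, a, b)
alpha_N = rejection_prob(ms, ns, qs, (1.0, 1.0), 0.05)
power_N = rejection_prob(ms, ns, qs, (5.0, 10.0), 0.05)
N_half = smallest_n(10, a, b, qs, (5.0, 10.0), 0.05, 0.5)
print(ms, ns, alpha_N, power_N, N_half)
```

Because every conditional type I error rate is at most α*, the value returned for θ = (1, 1) cannot exceed α*, which mirrors the conservativeness discussed in the text. In an actual design calculation the search would start from the Mantel-Haenszel sample size rather than from an arbitrary small N.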

4 Numerical Studies

We want to compare the small sample performance of stratified Fisher's test and Mantel-Haenszel test using simulations. We generate B = 10,000 simulation samples of size N = 25, 50 or 75 with J = 2 strata under a1 = 0.25, 0.5 or 0.75; (b1, b2) = (1/4, 3/4), (1/2, 1/2) or (3/4, 1/4); q1 = 0.1, q2 = 0.3 or 0.7; (θ1, θ2) = (1, 1), (5, 10), (7.5, 7.5) or (10, 5). Stratified Fisher's test, the standard (unstratified) Fisher's test and Mantel-Haenszel test are applied to each simulation sample, and the empirical power of each test is calculated as the proportion of simulation samples rejecting H0 with one-sided α* = 0.05. The exact type I error rate and power for stratified Fisher's test can be calculated by using the methods in Section 3, but through simulations we want to compare the performance of these testing methods applied to the same data sets. We consider large odds ratios to investigate the performance of Fisher's tests and Mantel-Haenszel test with small sample sizes.

Table 3 summarizes the simulation results. Under H0 : θ1 = θ2 = 1, stratified Fisher's test is conservative overall. With 10,000 simulations and α* = 0.05, the 95% confidence limits for the empirical type I error rate are 0.05 ± 0.004. Due to the discreteness of the exact tests and the conservative control of the conditional type I error at all possible outcomes, stratified Fisher's test is always conservative as expected, especially with a small sample size (N = 25). The unstratified test has a type I error rate similar to that of stratified Fisher's test when allocation proportions are identical between the two strata (i.e. b1 = b2 = 1/2). However, if a larger proportion of patients is allocated to group 1 in the stratum with the higher response probability (i.e. b1 = 1/4 and b2 = 3/4), then unstratified Fisher's test becomes anti-conservative. On the other hand, if a larger proportion of patients is allocated to group 1 in the stratum with the smaller response probability (i.e. b1 = 3/4 and b2 = 1/4), then unstratified Fisher's test becomes very conservative. In this sense, a test ignoring the strata can be biased unless the allocation proportions are identical across strata. With N = 25 or 50, Mantel-Haenszel test is anti-conservative with q2 = 0.7 (i.e. when the two strata have very different response rates) or with a1 = 0.75 (i.e. when a small number of subjects are allocated to the stratum with large response probabilities). The anti-conservativeness diminishes as N increases, but is still of some concern with a1 = 0.75 and N = 75.
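
The ± 0.004 half-width quoted above is consistent with a normal-approximation interval for a binomial proportion. Assuming that this is the interval intended, the arithmetic for B = 10,000 samples and a true rate of 0.05 is

$$
1.96 \sqrt{\frac{0.05 \times 0.95}{10{,}000}} \approx 1.96 \times 0.00218 \approx 0.0043 .
$$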

Table 3.

Empirical power of stratified Fisher's test/unstratified Fisher's test/Mantel-Haenszel test with one-sided α* = 0.05, J = 2 strata, and q1 = 0.1

| N | a1 | (b1, b2) | q2 | (θ1, θ2) = (1, 1) | (5, 10) | (7.5, 7.5) | (10, 5) |
|---|---|---|---|---|---|---|---|
| 25 | .25 | (1/4, 3/4) | .3 | .0094/.0354/.0370 | .4812/.7864/.6809 | .4305/.7230/.6332 | .3470/.5989/.5387 |
| 25 | .25 | (1/4, 3/4) | .7 | .0149/.1785/.0572 | .2464/.7160/.5062 | .2556/.7143/.5011 | .2395/.6782/.4642 |
| 25 | .25 | (1/2, 1/2) | .3 | .0179/.0176/.0522 | .6164/.5878/.7595 | .5764/.5762/.7179 | .4668/.4855/.6261 |
| 25 | .25 | (1/2, 1/2) | .7 | .0161/.0189/.0493 | .2608/.2398/.5003 | .2909/.2693/.5202 | .2803/.2671/.4854 |
| 25 | .25 | (3/4, 1/4) | .3 | .0171/.0055/.0599 | .4447/.2817/.6448 | .4061/.3229/.6053 | .3354/.3220/.5223 |
| 25 | .25 | (3/4, 1/4) | .7 | .0059/.0007/.0314 | .0847/.0211/.2523 | .1073/.0354/.2934 | .1192/.0435/.3024 |
| 25 | .5 | (1/4, 3/4) | .3 | .0101/.0560/.0476 | .3750/.7984/.5898 | .4091/.7852/.6219 | .3862/.7073/.5919 |
| 25 | .5 | (1/4, 3/4) | .7 | .0157/.2987/.0668 | .2285/.8140/.4737 | .2847/.8356/.5371 | .3156/.8318/.5563 |
| 25 | .5 | (1/2, 1/2) | .3 | .0118/.0120/.0455 | .4852/.4608/.6653 | .5366/.5253/.6958 | .5055/.5131/.6689 |
| 25 | .5 | (1/2, 1/2) | .7 | .0164/.0210/.0507 | .2564/.2177/.4796 | .3441/.2914/.5603 | .3980/.3447/.6036 |
| 25 | .5 | (3/4, 1/4) | .3 | .0130/.0032/.0542 | .3260/.1740/.5389 | .3627/.2698/.5717 | .3467/.3266/.5583 |
| 25 | .5 | (3/4, 1/4) | .7 | .0052/.0001/.0255 | .0951/.0146/.2684 | .1547/.0308/.3673 | .2019/.0511/.4286 |
| 25 | .75 | (1/4, 3/4) | .3 | .0096/.0380/.0564 | .2830/.6508/.5041 | .3815/.6876/.5959 | .4382/.6830/.6485 |
| 25 | .75 | (1/4, 3/4) | .7 | .0125/.2124/.0669 | .2238/.6819/.4463 | .3325/.7500/.5620 | .4054/.7967/.6364 |
| 25 | .75 | (1/2, 1/2) | .3 | .0104/.0106/.0425 | .3572/.3681/.5478 | .4739/.4814/.6543 | .5553/.5570/.7143 |
| 25 | .75 | (1/2, 1/2) | .7 | .0097/.0166/.0478 | .2514/.2274/.4532 | .3981/.3510/.5950 | .5014/.4460/.6863 |
| 25 | .75 | (3/4, 1/4) | .3 | .0033/.0009/.0321 | .1904/.1315/.3921 | .2896/.2468/.5013 | .3479/.3443/.5649 |
| 25 | .75 | (3/4, 1/4) | .7 | .0019/.0004/.0162 | .0999/.0258/.2703 | .1980/.0592/.4100 | .2937/.1077/.5202 |
| 50 | .25 | (1/4, 3/4) | .3 | .0188/.0808/.0419 | .8271/.9837/.9073 | .7918/.9713/.8816 | .6851/.9156/.7950 |
| 50 | .25 | (1/4, 3/4) | .7 | .0228/.3873/.0525 | .5607/.9696/.7422 | .5702/.9702/.7431 | .5318/.9555/.7115 |
| 50 | .25 | (1/2, 1/2) | .3 | .0287/.0272/.0540 | .9343/.9174/.9657 | .9064/.8958/.9463 | .8343/.8353/.8983 |
| 50 | .25 | (1/2, 1/2) | .7 | .0219/.0278/.0460 | .6610/.5045/.7930 | .6856/.5609/.8058 | .6426/.5581/.7657 |
| 50 | .25 | (3/4, 1/4) | .3 | .0248/.0051/.0546 | .8273/.5754/.9048 | .7890/.6340/.8734 | .6789/.6166/.7898 |
| 50 | .25 | (3/4, 1/4) | .7 | .0170/.0002/.0473 | .4361/.0335/.6240 | .4653/.0614/.6427 | .4568/.0870/.6180 |
| 50 | .5 | (1/4, 3/4) | .3 | .0192/.1345/.0522 | .7341/.9864/.8457 | .7670/.9848/.8652 | .7416/.9670/.8373 |
| 50 | .5 | (1/4, 3/4) | .7 | .0229/.6002/.0586 | .5288/.9872/.7099 | .6113/.9911/.7734 | .6658/.9930/.8047 |
| 50 | .5 | (1/2, 1/2) | .3 | .0208/.0212/.0498 | .8547/.8247/.9174 | .8818/.8670/.9344 | .8610/.8560/.9173 |
| 50 | .5 | (1/2, 1/2) | .7 | .0243/.0289/.0527 | .6250/.4701/.7606 | .7359/.5956/.8400 | .7785/.6685/.8656 |
| 50 | .5 | (3/4, 1/4) | .3 | .0199/.0014/.0507 | .7192/.3740/.8281 | .7542/.5538/.8476 | .7333/.6580/.8377 |
| 50 | .5 | (3/4, 1/4) | .7 | .0142/.0000/.0413 | .4254/.0163/.5971 | .5495/.0442/.7021 | .6067/.0854/.7388 |
| 50 | .75 | (1/4, 3/4) | .3 | .0181/.1024/.0550 | .6018/.9394/.7492 | .7407/.9580/.8486 | .7877/.9579/.8821 |
| 50 | .75 | (1/4, 3/4) | .7 | .0197/.4640/.0624 | .4961/.9534/.6698 | .6632/.9773/.8036 | .7632/.9863/.8708 |
| 50 | .75 | (1/2, 1/2) | .3 | .0208/.0211/.0525 | .7304/.7025/.8277 | .8521/.8385/.9153 | .8978/.8935/.9425 |
| 50 | .75 | (1/2, 1/2) | .7 | .0206/.0250/.0521 | .6033/.4931/.7383 | .7891/.6826/.8764 | .8757/.7958/.9325 |
| 50 | .75 | (3/4, 1/4) | .3 | .0142/.0016/.0458 | .5771/.3242/.7119 | .7015/.5407/.8109 | .7597/.6874/.8567 |
| 50 | .75 | (3/4, 1/4) | .7 | .0083/.0000/.0321 | .4141/.0343/.5740 | .6104/.1125/.7415 | .7184/.2060/.8282 |
| 75 | .25 | (1/4, 3/4) | .3 | .0250/.1274/.0474 | .9552/.9995/.9779 | .9335/.9980/.9639 | .8760/.9887/.9250 |
| 75 | .25 | (1/4, 3/4) | .7 | .0279/.5583/.0554 | .7572/.9975/.8654 | .7694/.9969/.8714 | .7293/.9949/.8366 |
| 75 | .25 | (1/2, 1/2) | .3 | .0262/.0279/.0472 | .9917/.9872/.9961 | .9844/.9806/.9919 | .9561/.9538/.9726 |
| 75 | .25 | (1/2, 1/2) | .7 | .0291/.0316/.0519 | .8660/.7010/.9223 | .8730/.7500/.9236 | .8442/.7538/.8998 |
| 75 | .25 | (3/4, 1/4) | .3 | .0249/.0028/.0559 | .9545/.7684/.9761 | .9339/.8248/.9646 | .8701/.8162/.9197 |
| 75 | .25 | (3/4, 1/4) | .7 | .0235/.0001/.0467 | .7096/.0394/.8142 | .7305/.0862/.8230 | .7008/.1235/.7999 |
| 75 | .5 | (1/4, 3/4) | .3 | .0230/.1974/.0472 | .9042/.9993/.9472 | .9214/.9995/.9586 | .9059/.9976/.9490 |
| 75 | .5 | (1/4, 3/4) | .7 | .0266/.7866/.0547 | .7246/.9999/.8375 | .8066/.9996/.8946 | .8395/.9999/.9145 |
| 75 | .5 | (1/2, 1/2) | .3 | .0245/.0258/.0498 | .9660/.9506/.9818 | .9768/.9694/.9873 | .9686/.9661/.9812 |
| 75 | .5 | (1/2, 1/2) | .7 | .0280/.0307/.0544 | .8388/.6587/.9044 | .9096/.7792/.9496 | .9294/.8395/.9611 |
| 75 | .5 | (3/4, 1/4) | .3 | .0234/.0013/.0487 | .8984/.5448/.9407 | .9153/.7408/.9519 | .9025/.8300/.9408 |
| 75 | .5 | (3/4, 1/4) | .7 | .0212/.0000/.0477 | .6646/.0130/.7738 | .7788/.0473/.8618 | .8244/.1121/.8891 |
| 75 | .75 | (1/4, 3/4) | .3 | .0238/.1521/.0543 | .8057/.9919/.8838 | .9024/.9964/.9468 | .9277/.9967/.9627 |
| 75 | .75 | (1/4, 3/4) | .7 | .0238/.6531/.0567 | .6984/.9953/.8135 | .8531/.9983/.9209 | .9172/.9990/.9576 |
| 75 | .75 | (1/2, 1/2) | .3 | .0236/.0237/.0491 | .9092/.8914/.9491 | .9672/.9591/.9837 | .9830/.9786/.9907 |
| 75 | .75 | (1/2, 1/2) | .7 | .0249/.0317/.0527 | .8070/.6657/.8778 | .9352/.8521/.9631 | .9751/.9312/.9873 |
| 75 | .75 | (3/4, 1/4) | .3 | .0193/.0016/.0469 | .7965/.4821/.8670 | .8917/.7363/.9323 | .9260/.8699/.9581 |
| 75 | .75 | (3/4, 1/4) | .7 | .0171/.0000/.0413 | .6444/.0400/.7582 | .8299/.1583/.8996 | .9038/.3032/.9475 |

When allocation proportions are equal across the strata, ignoring the strata results in a slight loss of statistical power. Stratified Fisher's test is less powerful than Mantel-Haenszel test, but the difference in power decreases as N increases. For all three testing methods, the power increases when more subjects are allocated to the stratum with the larger odds ratio, e.g. θ1 < θ2 and a1 < a2.

Table 4 reports sample sizes for Mantel-Haenszel test and stratified Fisher's test. Also reported are sample sizes for stratified Fisher's test by fixing (m, n) or only n at their expected values. The design parameters are set at one-sided α* = 0.05; 1 – β* = 0.9; J = 2 strata; a1 = 0.25, 0.5 or 0.75; (b1, b2) = (0.25, 0.25), (0.25,0.75), (0.5,0.5), (0.75,0.25) or (0.75,0.75); (q1, q2) = (0.1, 0.3); (θ1, θ2) = (5, 10), (7.5, 7.5) or (10, 5). For stratified Fisher's test, fixing (m, n) at their expected values reduces N, while fixing only n requires almost the same N compared to the case with random (m, n). The sample sizes are minimized with a balanced allocation, i.e. b1 = b2 = 1/2. We also observe that the cases of (b1, b2), (1–b1, b2), (b1, 1–b2) and (1–b1, 1–b2) require similar sample sizes. That is, when the allocation between two groups is unbalanced, the required sample size does not much depend on whether the larger group is control or experimental across the different strata.

Table 4.

Sample size for Mantel-Haenszel test/stratified Fisher's test with (m, n) fixed/stratified Fisher's test with n fixed/stratified Fisher's test under J = 2 strata, (q1, q2) = (0.1, 0.3), one-sided α* = 0.05, and 1 – β* = 0.9

| a1 | (b1, b2) | (θ1, θ2) = (5, 10) | (7.5, 7.5) | (10, 5) |
|---|---|---|---|---|
| 0.25 | (0.25, 0.25) | 46/59/61/61 | 51/64/66/66 | 65/80/82/82 |
| 0.25 | (0.25, 0.75) | 45/53/60/61 | 50/62/66/66 | 65/79/82/82 |
| 0.25 | (0.5, 0.5) | 36/43/45/45 | 39/48/49/49 | 50/59/62/62 |
| 0.25 | (0.75, 0.25) | 46/59/61/61 | 51/64/66/67 | 65/80/83/83 |
| 0.25 | (0.75, 0.75) | 46/53/60/61 | 51/60/66/66 | 65/76/83/83 |
| 0.5 | (0.25, 0.25) | 58/72/75/76 | 55/72/72/71 | 58/72/75/75 |
| 0.5 | (0.25, 0.75) | 58/65/75/75 | 54/65/70/71 | 58/72/75/75 |
| 0.5 | (0.5, 0.5) | 45/53/56/54 | 43/50/53/53 | 45/54/56/56 |
| 0.5 | (0.75, 0.25) | 59/72/75/76 | 56/66/71/71 | 59/72/76/78 |
| 0.5 | (0.75, 0.75) | 59/69/75/76 | 55/63/71/71 | 59/68/76/76 |
| 0.75 | (0.25, 0.25) | 78/96/97/98 | 59/75/76/76 | 52/64/67/67 |
| 0.75 | (0.25, 0.75) | 77/89/97/97 | 59/69/76/76 | 52/64/67/67 |
| 0.75 | (0.5, 0.5) | 61/70/73/73 | 47/55/57/57 | 41/49/51/51 |
| 0.75 | (0.75, 0.25) | 80/96/98/99 | 61/75/77/77 | 53/65/69/69 |
| 0.75 | (0.75, 0.75) | 80/85/98/99 | 61/69/77/77 | 53/62/69/69 |

Under each setting, the sample size for stratified Fisher's test is about 30% larger than that of Mantel-Haenszel test. This difference results from the conservative type I error and power control of stratified Fisher's test. For example, from Table 3, with (a1, b1, b2, q1, q2) = (0.5, 0.25, 0.75, 0.1, 0.3), stratified Fisher's test controls the type I error at 0.0230 with N = 75 and has a power of 0.9042 at (θ1, θ2) = (5, 10). Under this design setting, stratified Fisher's test requires a sample size of N = 75 with (α*, 1 – β*) = (0.05, 0.9) from Table 4. For Mantel-Haenszel test, the required sample size with (α*, 1 – β*) = (0.0230, 0.9042) under the same design setting is N = 73, which is close to the N = 75 required for stratified Fisher's test. In other words, the conservativeness of the Fisher test results from the discreteness of the exact testing distributions. Mantel-Haenszel test approximates this exact distribution when the sample size is large. Crans and Schuster (2008) propose to conduct Fisher's test with a larger type I error α* = α + ∊ (∊ > 0) so that the maximal marginal type I error rate within the whole range [0, 1] of the response probability under H0 becomes close to the intended α level.

Suppose that we want to design a study similar to that of Li et al. (1979). Since this is a balanced randomized study, we fix (n1, n2, n3) at (N/3, N/3, N/3) and (m1, m2, m3) at (N/6, N/6, N/6). We further assume that (q1, q2, q3) = (0.9, 0.75, 0.6), and (θ1, θ2, θ3) = (1, 30, 30). (The estimates from Table 2 are $\hat{\theta}_1 = 0.833$ and $\hat{\theta}_2 = \hat{\theta}_3 = \infty$.) In order to control the one-sided conditional type I error at α* = 0.1 and the marginal power at 1 – β* = 0.9, we need N = 83. Under the design, this sample size provides marginal αN = 0.0625 and power 1 – βN = 0.9087.
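
The odds ratio estimates quoted in parentheses can be traced back to Table 2, assuming they are the usual sample odds ratios (successes on thymosin times failures on placebo, divided by failures on thymosin times successes on placebo):

$$
\hat{\theta}_1 = \frac{10 \times 1}{1 \times 12} \approx 0.833,
\qquad
\hat{\theta}_2 = \frac{9 \times 1}{0 \times 11} = \infty,
\qquad
\hat{\theta}_3 = \frac{8 \times 3}{0 \times 7} = \infty .
$$

The infinite estimates in strata 2 and 3 arise because no failures were observed on thymosin in those strata.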

5 Discussions

Numerous testing methods have been proposed to test on two binomial proportions adjusting for stratum effect based on different assumptions. For example, Cochran (1954) test assumes common odds ratios across strata and Gart (1985) assumes common relative risks. Mantel-Haenszel test makes no assumption on the parameters. These methods are based on large sample theories, so that their testing results may be distorted with a small sample size or sparse data.

In this paper, we propose to use an exact test extending Fisher's test to the analysis of many 2 × 2 tables together with its sample size calculation method. This test does not make any assumptions of large sample size or equal parameter values across strata, so that no assumptions need to be checked before conducting the test. The power and sample sizes are compared between the exact test and Mantel-Haenszel test using simulations and the proposed sample size formulas. While the type I error for Mantel-Haenszel test can be anti-conservative with a small sample size or sparse data, the exact test always controls the type I error below a specified level. When the effect size is so large that the required sample size is small (say, about N = 70 or smaller), the exact test needs a sample size about 20% to 30% larger than that of Mantel-Haenszel test. However, because the sample sizes are small, the increase in this case is not very large in absolute number (say, 10 to 20), so that, for robustness of the testing results, we propose to use the exact test with a slightly increased sample size rather than obtaining a biased result by an asymptotic test.

If J ≥ 3, the sample size calculation for stratified Fisher's test requires a long computing time. We found, in calculating the marginal type I error rate and power, that fixing the sizes of strata (n1, ..., nJ) at their expected numbers provides very accurate sample sizes for the stratified Fisher's test even when (n1, ..., nJ) are random, while drastically reducing the computing time.

REFERENCES

  • 1. Breslow NE, Day NE. The Analysis of Case-Control Studies. IARC Scientific Publications; No. 32, Lyon, France: 1980.
  • 2. Cochran WG. Some methods of strengthening the common χ2 tests. Biometrics. 1954;10:417–451.
  • 3. Crans GG, Schuster JJ. How conservative is Fisher's exact test? A quantitative evaluation of the two-sample comparative binomial trial. Statistics in Medicine. 2008;27:3598–3611. doi: 10.1002/sim.3221.
  • 4. Fisher RA. The logic of inductive inference (with discussion). Journal of Royal Statistical Society. 1935;98:39–82.
  • 5. Gart JJ. Approximate tests and interval estimation of the common relative risk in the combination of 2 × 2 tables. Biometrika. 1985;72:673–677.
  • 6. Jung SH, Chow SC, Chi EM. A note on sample size calculation based on propensity analysis in nonrandomized trials. Journal of Biopharmaceutical Statistics. 2007;17:35–41. doi: 10.1080/10543400601044790.
  • 7. Li SH, Simon RM, Gart JJ. Small sample properties of the Mantel-Haenszel test. Biometrika. 1979;66:181–183.
  • 8. Mantel N, Haenszel W. Statistical aspects of the analysis of data from retrospective studies of disease. Journal of the National Cancer Institute. 1959;22:719–748.
  • 9. Nam JM. Sample size determination for case-control studies and the comparison of stratified and unstratified analyses. Biometrics. 1992;48:389–395.
  • 10. Nam JM. Power and sample size for stratified prospective studies using the score method for testing relative risk. Biometrics. 1998;54:331–336.
  • 11. Westfall PH, Zaykin DV, Young SS. Multiple tests for genetic effects in association studies. In: Looney Stephen., editor. Methods in Molecular Biology, vol. 184 Biostatistical Methods. Humana Press; Totowa, NJ: 2002. pp. 143–168.
  • 12. Woolson RF, Bean JA, Rojas PB. Sample size for case-control studies using Cochran's statistic. Biometrics. 1986;42:927–932.
  • 13. Zelen M. The analyses of several 2×2 contingency tables. Biometrika. 1971;58:129–137.
