Summary
Exact unconditional tests have been widely applied to test the difference between two probabilities for 2×2 matched-pairs binary data with small sample sizes. In this context, Lloyd (2008, Biometrics 64, 716–723) proposed an E + M p-value that showed better performance than the existing M p-value and C p-value. However, the analytical calculation of the E + M p-value requires that the Barnard convexity condition be satisfied, which can be challenging to prove theoretically. In this paper, by a simple reformulation, we show that a weaker condition, conditional monotonicity, is sufficient to calculate all three p-values (M, C and E + M) and their corresponding exact sizes. Moreover, this conditional monotonicity condition is applicable to non-inferiority tests.
Keywords: Barnard convexity, conditional monotonicity, exact unconditional test, matched-pairs, non-inferiority test
1. Introduction
Lloyd (2008) considered exact unconditional tests for binary matched-pairs data when the sample size is small. A 2 × 2 table (as Table 1 in Lloyd (2008)) can summarize this type of data in four random cell counts X = (X00, X01, X10, X11) where Xij, i, j = 0, 1, is the number of matched-pairs with responses i for Treatment 1 and j for Treatment 2. Vector X is assumed to have a multinomial distribution with probabilities π = (π00, π01, π10, π11). The problem of interest is to test the difference between success probabilities in the two groups, i.e., H0: π1+ ≤ π+1 versus H1: π1+ > π+1, where π1+ = π11 + π10 for Treatment 1 and π+1 = π11 + π01 for Treatment 2. This is equivalent to testing
$$H_0: \pi_{01} \ge \pi_{10} \quad \text{versus} \quad H_1: \pi_{01} < \pi_{10}. \tag{1}$$
To calculate p-values (M, C and E + M) and their exact sizes for testing (1), Lloyd (2008) used the Barnard Convexity Condition (BCC) to reduce the two-dimensional nuisance parameter space to one dimension. However, a rigorous proof that the BCC is satisfied is not always available. Therefore, numerical methods are often used to verify the BCC; this is the case for the E p-value when calculating the E + M p-value (Lloyd, 2008) and for the C p-value when calculating its exact size (Berger and Sidik, 2003).
In this paper, we show that the conditional monotonicity condition (CMC), a weaker condition than the BCC, is sufficient for the dimension reduction in the calculation of p-values (M, C and E + M) and their exact sizes for testing (1), and that this dimension reduction technique can be extended to the non-inferiority case.
2. The BCC and the CMC
Denote observed values of X by (x00, x01, x10, x11) and let k = x10 + x01 be the number of discordant pairs and x = x01. Following Lloyd (2008), we consider three statistics: McNemar’s test statistic, the likelihood ratio test statistic and the conditional sign test statistic. For any of the three test statistics above, denoted by T(x, k), the BCC (Lloyd, 2008) is satisfied if
$$T(x, k) \le T(x + 1, k) \quad \text{and} \quad T(x, k + 1) \le T(x, k) \tag{2}$$
for all (x, k). That is, T(x, k) is nondecreasing in x for any fixed k, and is nonincreasing in k for any fixed x. Note that Berger and Sidik (2003) used a slightly different definition of the BCC: T̃(x01, x10) ≙ T(x01, x01 + x10) is nondecreasing in x01 and nonincreasing in x10. It is easy to prove that this definition is stronger than Lloyd’s definition in (2).
The conditional monotonicity condition (CMC) is defined as
$$T(x, k) \le T(x + 1, k) \quad \text{for all } (x, k), \tag{3}$$
which is only one of the constraints in (2), and thus, is a weaker condition than either Lloyd’s BCC or Berger and Sidik’s BCC. For example, while the test statistic satisfies the CMC, T(x, k) does not satisfy the BCC by either definition. Note that McNemar’s, the likelihood ratio and the conditional sign test statistics all satisfy the CMC.
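The two conditions can be compared numerically on a small grid. The sketch below is illustrative only: `mcnemar_z` assumes the signed form (x01 − x10)/√(x01 + x10) for McNemar's statistic, which is an assumption rather than a formula taken from the text, and `toy_stat` is a hypothetical statistic that is not the example referred to above.

```python
from math import sqrt

def mcnemar_z(x, k):
    """Assumed signed McNemar form (x01 - x10) / sqrt(x01 + x10), with x = x01."""
    return 0.0 if k == 0 else (2 * x - k) / sqrt(k)

def toy_stat(x, k):
    """Hypothetical statistic that increases in both x and k."""
    return x + k

def satisfies_cmc(stat, n, tol=1e-12):
    """Check T(x, k) <= T(x + 1, k) for all 0 <= x < k <= n."""
    return all(stat(x, k) <= stat(x + 1, k) + tol
               for k in range(n + 1) for x in range(k))

def satisfies_bcc(stat, n, tol=1e-12):
    """Check the CMC plus T(x, k + 1) <= T(x, k) for all 0 <= x <= k < n."""
    nonincreasing_in_k = all(stat(x, k + 1) <= stat(x, k) + tol
                             for k in range(n) for x in range(k + 1))
    return satisfies_cmc(stat, n, tol) and nonincreasing_in_k

for name, stat in [("mcnemar_z", mcnemar_z), ("toy_stat", toy_stat)]:
    print(name, satisfies_cmc(stat, 15), satisfies_bcc(stat, 15))
```

Under these assumptions the assumed McNemar form passes both checks for n = 15, while the toy statistic passes the CMC check and fails the BCC check, which illustrates that the CMC is the weaker requirement.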
3. The p-value calculation
Following Lloyd (2008), we reparameterize π01 and π10 by φ and η as φ := π10 + π01 and η := π01/φ. Then the null space is H0 = {(π01, π10): π01 ≥ π10} = {(φ, η): 0 ≤ φ ≤ 1, η ≥ 1/2}. When the test statistic T(x, k) satisfies the CMC, let h(t, k) = sup{x: T(x, k) ≤ t}, which is nondecreasing in t. We define h(t, k) to be −1 if the set {x: T(x, k) ≤ t} is empty. Using the definition of h(t, k), for a given observed statistic tobs, the rejection set can be rewritten as {(x′, k′): T(x′, k′) ≤ tobs} = {(x′, k′): 0 ≤ k′ ≤ n, 0 ≤ x′ ≤ h(tobs, k′)}.
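As a minimal sketch of this rewriting, the code below computes h(t, k) by direct search and checks that the rejection set built from h matches the set defined through T. The statistic `mcnemar_z` is again an assumed signed McNemar form, and the function names are hypothetical.

```python
from math import sqrt

def mcnemar_z(x, k):
    """Assumed signed McNemar form (x01 - x10) / sqrt(x01 + x10), with x = x01."""
    return 0.0 if k == 0 else (2 * x - k) / sqrt(k)

def h(stat, t, k):
    """sup{x in 0..k : T(x, k) <= t}, or -1 when no x qualifies."""
    feasible = [x for x in range(k + 1) if stat(x, k) <= t]
    return max(feasible) if feasible else -1

def rejection_set(stat, t_obs, n):
    """{(x', k') : 0 <= k' <= n, 0 <= x' <= h(t_obs, k')}; relies on the CMC."""
    return {(x, k) for k in range(n + 1)
            for x in range(h(stat, t_obs, k) + 1)}

n = 10
t_obs = mcnemar_z(1, 6)
direct = {(x, k) for k in range(n + 1) for x in range(k + 1)
          if mcnemar_z(x, k) <= t_obs}
print(rejection_set(mcnemar_z, t_obs, n) == direct)
```

The equality of the two sets depends on the CMC: for a statistic that is not nondecreasing in x, the slice {x′: T(x′, k′) ≤ tobs} need not be an initial segment 0, …, h(tobs, k′).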
3.1 M p-value and C p-value
We adopt the convention that smaller values of the test statistic favor H1 and note that the probability mass function of (X, K) can be decomposed into P(φ, η)(X = x, K = k) = b(k; n, φ)b(x; k, η), where b(x; n, p) is the binomial probability mass function with n trials and success probability p (Lloyd, 2008). We also note that the M p-value is a special case of the C p-value; therefore, we focus our discussion on the C p-value.
Berger and Boos (1994) proposed a C p-value that confines the maximization within a well-behaved confidence interval in a general setting. Berger and Sidik (2003) adopted this C p-value definition for 2×2 matched-pairs data. Let Iγ be a 100(1 − γ)% confidence interval for φ (Berger and Sidik, 2003). Now we can reformulate the C p-value as
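$$
\begin{aligned}
C(x, k) &= \sup_{(\phi, \eta) \in H_0,\ \phi \in I_\gamma} P_{(\phi, \eta)}\{T(X, K) \le t_{obs}\} + \gamma \\
&= \sup_{\phi \in I_\gamma,\ \eta \ge 1/2} \sum_{k'=0}^{n} b(k'; n, \phi)\, B\{h(t_{obs}, k'); k', \eta\} + \gamma \\
&= \sup_{\phi \in I_\gamma} \sum_{k'=0}^{n} b(k'; n, \phi)\, B\{h(t_{obs}, k'); k', 1/2\} + \gamma.
\end{aligned}
\tag{4}
$$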
The last equality in (4) results from the fact that the binomial cumulative distribution function B(x; n, p) is nonincreasing in p for fixed x and n. Note that when γ = 0, the confidence-interval region is the null space H0 and thus the C p-value becomes the M p-value, which is defined as sup(φ, η)∈H0 P(φ, η){T(X, K) ≤ tobs} (Suissa and Shuster, 1991) and equals $\sup_{0 \le \phi \le 1} \sum_{k'=0}^{n} b(k'; n, \phi)\, B\{h(t_{obs}, k'); k', 1/2\}$. Thus, the maximization over two nuisance parameters (φ, η) in the M p-value and the C p-value can be reformulated into the maximization over the single parameter φ by using only the CMC property of the test statistic.
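The one-dimensional maximization can be sketched as follows. This is not the authors' implementation: the supremum over φ is approximated on an equally spaced grid, `mcnemar_z` is an assumed signed McNemar form, and the interval endpoints `phi_lo` and `phi_hi` stand in for a confidence interval Iγ that the caller would have to supply. With the default interval [0, 1] and `gamma=0` the function returns the M p-value.

```python
from math import comb, sqrt

def binom_pmf(x, n, p):
    """b(x; n, p), zero outside 0..n."""
    if x < 0 or x > n:
        return 0.0
    return comb(n, x) * p**x * (1 - p)**(n - x)

def binom_cdf(x, n, p):
    """B(x; n, p); evaluates to 0 when x = -1 (empty rejection slice)."""
    return sum(binom_pmf(i, n, p) for i in range(min(x, n) + 1))

def mcnemar_z(x, k):
    """Assumed signed McNemar form, with x = x01 and k = x01 + x10."""
    return 0.0 if k == 0 else (2 * x - k) / sqrt(k)

def h(stat, t, k):
    """sup{x in 0..k : T(x, k) <= t}, or -1 when no x qualifies."""
    feasible = [x for x in range(k + 1) if stat(x, k) <= t]
    return max(feasible) if feasible else -1

def sup_over_phi(stat, x, k, n, phi_lo=0.0, phi_hi=1.0, gamma=0.0, grid=400):
    """Grid approximation of sup_phi sum_k' b(k'; n, phi) B{h(t_obs, k'); k', 1/2} + gamma."""
    t_obs = stat(x, k)
    # The eta-maximization is already done: eta is fixed at the boundary 1/2.
    tail = [binom_cdf(h(stat, t_obs, kp), kp, 0.5) for kp in range(n + 1)]
    phis = [phi_lo + (phi_hi - phi_lo) * i / grid for i in range(grid + 1)]
    best = max(sum(binom_pmf(kp, n, phi) * tail[kp] for kp in range(n + 1))
               for phi in phis)
    return best + gamma

print(sup_over_phi(mcnemar_z, x=1, k=6, n=10))
```

A grid search is used only for transparency; any one-dimensional optimizer could replace it, at the cost of having to guard against local maxima.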
3.2 E + M p-value
The E + M p-value proposed by Lloyd (2008) consists of two steps. First the E p-value is defined as
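$$E(x, k) = P_{(\hat\phi, 1/2)}\{T(X, K) \le t_{obs}\} = \sum_{k'=0}^{n} b(k'; n, \hat\phi)\, B\{h(t_{obs}, k'); k', 1/2\}, \tag{5}$$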
where φ̂ = k/n is both the restricted and unrestricted MLE of φ under H0. Since the E p-value is not necessarily valid (for the definition of a valid p-value, see Berger and Boos, 1994), Lloyd (2008) further proposed a maximization step on the E p-value to obtain a valid p-value
$$(E + M)(x, k) = \sup_{(\phi, \eta) \in H_0} P_{(\phi, \eta)}\{E(X, K) \le e_{obs}\}, \tag{6}$$
where eobs = E(x, k).
It is difficult to prove that the E p-value is nonincreasing in k when x is fixed, and thus the E p-value may not satisfy the BCC. Lloyd (2008) numerically verified the BCC for the E p-value. Our Theorem 1 guarantees that the E p-value satisfies the CMC, so the dimension reduction technique in (4) can be applied to the E p-value to obtain the E + M p-value.
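The two steps can be sketched for a small n as below. The sketch assumes that the E p-value is the null probability of {T ≤ tobs} evaluated at φ̂ = k/n and η = 1/2, approximates the M step on a grid over φ, and uses an assumed signed McNemar form for T; all function names are hypothetical.

```python
from math import comb, sqrt

def binom_pmf(x, n, p):
    if x < 0 or x > n:
        return 0.0
    return comb(n, x) * p**x * (1 - p)**(n - x)

def binom_cdf(x, n, p):
    return sum(binom_pmf(i, n, p) for i in range(min(x, n) + 1))

def mcnemar_z(x, k):
    """Assumed signed McNemar form, with x = x01 and k = x01 + x10."""
    return 0.0 if k == 0 else (2 * x - k) / sqrt(k)

def h_row(row, t, tol=1e-12):
    """sup{x : row[x] <= t} for one fixed k, or -1 when no x qualifies."""
    feasible = [x for x, v in enumerate(row) if v <= t + tol]
    return max(feasible) if feasible else -1

def tails(table, t, n):
    """B{h(t, k'); k', 1/2} for k' = 0..n, ordering the sample space by `table`."""
    return [binom_cdf(h_row(table[kp], t), kp, 0.5) for kp in range(n + 1)]

def mix(tail, n, phi):
    """Sum over k' of b(k'; n, phi) * tail[k']."""
    return sum(binom_pmf(kp, n, phi) * tail[kp] for kp in range(n + 1))

def e_and_em_tables(stat, n, grid=200):
    cells = [(x, k) for k in range(n + 1) for x in range(k + 1)]
    t_table = [[stat(x, k) for x in range(k + 1)] for k in range(n + 1)]
    # E step: plug in phi_hat = k / n on the boundary eta = 1/2.
    e_table = [[0.0] * (k + 1) for k in range(n + 1)]
    for x, k in cells:
        e_table[k][x] = mix(tails(t_table, t_table[k][x], n), n, k / n)
    # M step: order the sample space by E and maximize over phi alone.
    phis = [i / grid for i in range(grid + 1)]
    em_table = [[0.0] * (k + 1) for k in range(n + 1)]
    for x, k in cells:
        tail = tails(e_table, e_table[k][x], n)
        em_table[k][x] = max(mix(tail, n, phi) for phi in phis)
    return e_table, em_table

e_table, em_table = e_and_em_tables(mcnemar_z, n=8, grid=50)
print(e_table[6][1], em_table[6][1])
```

The M step reuses the same one-dimensional formula as the M p-value, with E playing the role of the test statistic; this is legitimate only because E is nondecreasing in x for fixed k.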
3.3 Exact size of the p-values
Let P̃(x, k) denote any of the M, C and E + M p-values. The exact size of P̃ at nominal level α is defined as
$$\sup_{(\phi, \eta) \in H_0} P_{(\phi, \eta)}\{\tilde P(X, K) \le \alpha\} = \sup_{(\phi, \eta) \in H_0} \sum_{k=0}^{n} b(k; n, \phi)\, B\{\tilde h(\alpha; k); k, \eta\},$$
where h̃(α; k) = sup{x′: P̃(x′, k) ≤ α}. Berger and Sidik (2003) numerically checked the BCC for the C p-value when using the dimension reduction to calculate the exact size for the C p-value. By the following Theorem 1, we can show that all of the p-values (M, C and E + M) satisfy the CMC and thus the same dimension reduction technique in (4) can be applied to calculate their exact sizes.
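A minimal sketch of the exact-size calculation for the M p-value is given below. It assumes that the size is the supremum over φ of the null probability of {P̃ ≤ α} on the boundary η = 1/2, approximates both suprema on the same grid, and uses an assumed signed McNemar form; the helper names are hypothetical.

```python
from math import comb, sqrt

def binom_pmf(x, n, p):
    if x < 0 or x > n:
        return 0.0
    return comb(n, x) * p**x * (1 - p)**(n - x)

def binom_cdf(x, n, p):
    return sum(binom_pmf(i, n, p) for i in range(min(x, n) + 1))

def mcnemar_z(x, k):
    """Assumed signed McNemar form, with x = x01 and k = x01 + x10."""
    return 0.0 if k == 0 else (2 * x - k) / sqrt(k)

def h_row(row, t, tol=1e-12):
    """sup{x : row[x] <= t} for one fixed k, or -1 when no x qualifies."""
    feasible = [x for x, v in enumerate(row) if v <= t + tol]
    return max(feasible) if feasible else -1

def tails(table, t, n):
    return [binom_cdf(h_row(table[kp], t), kp, 0.5) for kp in range(n + 1)]

def mix(tail, n, phi):
    return sum(binom_pmf(kp, n, phi) * tail[kp] for kp in range(n + 1))

def m_table(stat, n, phis):
    """M p-value for every (x, k), maximizing over the grid `phis` only."""
    t_table = [[stat(x, k) for x in range(k + 1)] for k in range(n + 1)]
    table = []
    for k in range(n + 1):
        row = []
        for x in range(k + 1):
            tail = tails(t_table, t_table[k][x], n)
            row.append(max(mix(tail, n, phi) for phi in phis))
        table.append(row)
    return table

def exact_size(p_table, alpha, n, phis):
    """Grid approximation of sup_phi sum_k b(k; n, phi) B{h~(alpha; k); k, 1/2}."""
    tail = tails(p_table, alpha, n)
    return max(mix(tail, n, phi) for phi in phis)

n, alpha = 10, 0.05
phis = [i / 100 for i in range(101)]
print(exact_size(m_table(mcnemar_z, n, phis), alpha, n, phis))
```

Because the p-value table and the size are maximized over the same grid, the computed size cannot exceed α up to rounding; a finer grid brings both quantities closer to their true suprema.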
Theorem 1
If a test statistic T(x, k) satisfies the conditional monotonicity condition, i.e., it is nondecreasing in x for any fixed k, then the associated p-values, M(x, k), C(x, k), E(x, k) and (E + M)(x, k), also satisfy the conditional monotonicity condition.
Proof
For a fixed k and x1 ≤ x2, let tobs,1 = T(x1, k) and tobs,2 = T(x2, k); then tobs,1 ≤ tobs,2 because T(x, k) is nondecreasing in x. Since h(t, k) is also nondecreasing in t, h(tobs,1, k′) ≤ h(tobs,2, k′) for any k′ ∈ {0, …, n}. Thus we have B{h(tobs,1, k′); k′, 1/2} ≤ B{h(tobs,2, k′); k′, 1/2}. Therefore, M(x, k) and C(x, k) are nondecreasing in x for fixed k. When k is fixed, φ̂ is also fixed; therefore, E(x, k) in (5) is nondecreasing in x, and applying the same argument with E in place of T shows that (E + M)(x, k) in (6) is nondecreasing in x.
4. Extension to Non-inferiority Trials
In practice, the non-inferiority test of the null hypothesis π01 − π10 ≥ δ versus the alternative π01 − π10 < δ, for a prespecified margin δ, is of interest. Several test statistics have been proposed to test these hypotheses; these include the score statistic (Nam, 1997; Tango, 1998) and the likelihood ratio statistic (Lloyd and Moldovan, 2008). To the best of our knowledge, rigorous proofs of the BCC for the score and the likelihood ratio statistics are not available in the literature, nor are proofs of the BCC for the E p-value and the C p-value. Here, we can prove that both of these test statistics satisfy the weaker condition CMC (the proof for the likelihood ratio test statistic is given in the Appendix), and so do the E p-value and the C p-value. We apply the same transformations φ = π01 + π10 and η = π01/φ. Note that φ has a lower bound δ under the null hypothesis. Following the derivation in Section 3, the M p-value for the non-inferiority test is M(x, k) = supφ∈(δ,1) Φ(x, k, φ), where
$$\Phi(x, k, \phi) = \sum_{k'=0}^{n} b(k'; n, \phi)\, B\{h(t_{obs}, k'); k', (\phi + \delta)/(2\phi)\}.$$
The C p-value is C(x, k) = supφ∈Iγ Φ(x, k, φ) + γ, where Iγ is the 100(1 − γ)% confidence interval for φ in Sidik (2003). In Lloyd and Moldovan (2008), the E p-value was based on the restricted MLE φ̃ of φ. We define the E p-value to be E(x, k) = Φ(x, k, φ̂), where φ̂ = (1 − δ)k/n + δ; φ̂ is always within [δ, 1] and is a function of k only. Thus, the E p-value satisfies the CMC, which ensures that the M step required for the E + M p-value can be carried out. In our numerical evaluations, the E + M p-values obtained from these two different E p-values are practically identical. As in the superiority case, the CMC is sufficient to calculate p-values and their exact sizes for non-inferiority tests with a one-dimensional maximization.
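The non-inferiority versions of the M and E p-values can be sketched in the same way. The sketch assumes that Φ(x, k, φ) is the null probability of the rejection set evaluated on the boundary η = (φ + δ)/(2φ), that δ > 0, and that the supremum is approximated on a grid over [δ, 1]. The statistic `diff_stat` is a hypothetical stand-in that satisfies the CMC; it is not the score or likelihood ratio statistic.

```python
from math import comb

def binom_pmf(x, n, p):
    if x < 0 or x > n:
        return 0.0
    return comb(n, x) * p**x * (1 - p)**(n - x)

def binom_cdf(x, n, p):
    return sum(binom_pmf(i, n, p) for i in range(min(x, n) + 1))

def diff_stat(x, k):
    """Hypothetical stand-in statistic x01 - x10 = 2x - k (nondecreasing in x)."""
    return 2 * x - k

def h(stat, t, k):
    """sup{x in 0..k : T(x, k) <= t}, or -1 when no x qualifies."""
    feasible = [x for x in range(k + 1) if stat(x, k) <= t]
    return max(feasible) if feasible else -1

def phi_profile(stat, x, k, n, delta, phi):
    """Assumed form: sum_k' b(k'; n, phi) B{h(t_obs, k'); k', (phi + delta) / (2 phi)}."""
    t_obs = stat(x, k)
    eta = (phi + delta) / (2 * phi)  # boundary of the null space for this phi
    return sum(binom_pmf(kp, n, phi) * binom_cdf(h(stat, t_obs, kp), kp, eta)
               for kp in range(n + 1))

def noninferiority_p_values(stat, x, k, n, delta, grid=200):
    """Return (M, E): grid supremum over [delta, 1] and the plug-in value at phi_hat."""
    phis = [delta + (1 - delta) * i / grid for i in range(grid + 1)]
    m_value = max(phi_profile(stat, x, k, n, delta, phi) for phi in phis)
    phi_hat = (1 - delta) * k / n + delta  # depends on k only
    e_value = phi_profile(stat, x, k, n, delta, phi_hat)
    return m_value, e_value

print(noninferiority_p_values(diff_stat, x=1, k=6, n=10, delta=0.1))
```

Because φ̂ depends on k only, the plug-in value inherits monotonicity in x from the statistic, which is what the subsequent M step needs.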
Acknowledgments
We would like to thank the associate editor and two referees for providing valuable comments that greatly improved the paper.
Appendix
The CMC of likelihood ratio test statistic in non-inferiority trials
By using transformations φ = π01+ π10 and η = π01/φ, the null space becomes {(φ, η): δ ≤ φ ≤ 1, η ≥ (δ + φ)/2φ }, where 0 ≤ δ ≤ 1. The log likelihood function can be written as l = l(φ, η) = k log φ + (n − k) log(1 − φ) + x log η + (k − x) log(1 − η) + C, where C is independent of the parameters (φ, η). The restricted MLE φ̃ of φ satisfies ∂l/∂φ = 0 under π01 − π10 = δ, i.e.,
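$$\frac{x}{\tilde\phi + \delta} + \frac{k - x}{\tilde\phi - \delta} - \frac{n - k}{1 - \tilde\phi} = 0. \tag{A.1}$$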
Simple algebra verifies that φ̃ is the greater of the two solutions of (A.1) and is no less than δ. Furthermore, differentiating (A.1) with respect to x gives ∂φ̃/∂x · (−U) = 1/(φ̃ − δ) − 1/(φ̃ + δ), where U = x/(φ̃ + δ)² + (k − x)/(φ̃ − δ)² + (n − k)/(1 − φ̃)² is nonnegative. Since the right-hand side is also nonnegative, ∂φ̃/∂x ≤ 0.
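The restricted MLE can be computed in closed form and its monotonicity checked numerically. The sketch assumes that (A.1) is the score equation x/(φ + δ) + (k − x)/(φ − δ) − (n − k)/(1 − φ) = 0; clearing denominators gives the quadratic nφ² − (k − a)φ − {a + (n − k)δ²} = 0 with a = (k − 2x)δ, which is a derivation made here for illustration. The function names are hypothetical, and the check is restricted to interior cells 0 < x < k < n.

```python
from math import sqrt

def restricted_mle_phi(x, k, n, delta):
    """Larger root of n*phi^2 - (k - a)*phi - (a + (n - k)*delta^2) = 0, a = (k - 2x)*delta."""
    a = (k - 2 * x) * delta
    b = k - a
    c = a + (n - k) * delta**2
    return (b + sqrt(b * b + 4 * n * c)) / (2 * n)

def score_a1(phi, x, k, n, delta):
    """Left-hand side of the assumed score equation (A.1)."""
    return x / (phi + delta) + (k - x) / (phi - delta) - (n - k) / (1 - phi)

n, k, delta = 20, 12, 0.1
roots = [restricted_mle_phi(x, k, n, delta) for x in range(1, k)]
for x, phi in zip(range(1, k), roots):
    print(x, round(phi, 4), abs(score_a1(phi, x, k, n, delta)) < 1e-9)
```

For n = 20, k = 12 and δ = 0.1, the value x0 = (k + nδ)/2 = 7 gives φ̃ = k/n = 0.6, so the restricted and unrestricted estimates of φ coincide there.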
Let (φ̂, η̂) = (k/n, x/k) be the unrestricted MLE of (φ, η). Define D = l(φ̂, η̂) − l(φ̃, (δ + φ̃)/(2φ̃)); then the likelihood ratio statistic is ZLR = sign(2x − k − nδ)√(2D). With (A.1), we obtain d = ∂D/∂x = log[x(φ̃ − δ)/{(k − x)(φ̃ + δ)}], and ∂d/∂x = {1/(φ̃ − δ) − 1/(φ̃ + δ)} · ∂φ̃/∂x + 1/x + 1/(k − x). Here, ∂d/∂x ≥ 0 can be established if U ≥ {x(k − x)/k}{1/(φ̃ + δ)² + 1/(φ̃ − δ)² − 2/[(φ̃ + δ)(φ̃ − δ)]}, which is true by observing that x/(φ̃ + δ)² ≥ x(k − x)/{k(φ̃ + δ)²} and (k − x)/(φ̃ − δ)² ≥ x(k − x)/{k(φ̃ − δ)²}.
Denote x0 = (k + nδ)/2. When x = x0, we have φ̂ = φ̃, so d(x0) is zero. Note that d is nondecreasing in x. If x ≥ x0, then d(x) ≥ 0 and thus D is nondecreasing, and so is ZLR. If x < x0, then d(x) ≤ 0 and thus D is nonincreasing, and then ZLR is nondecreasing due to the negative sign of 2x − k − nδ. Therefore, ZLR is nondecreasing in x for any fixed k.
References
- Berger RL, Sidik K. Exact unconditional tests for a 2 × 2 matched-pairs design. Statistical Methods in Medical Research. 2003;12:91–108. doi: 10.1191/0962280203sm312ra.
- Berger RL, Boos DD. P values maximized over a confidence set for the nuisance parameter. Journal of the American Statistical Association. 1994;89:1012–1016.
- Lloyd CJ. A new exact and more powerful unconditional test of no treatment effect from binary matched pairs. Biometrics. 2008;64:716–723. doi: 10.1111/j.1541-0420.2007.00936.x.
- Lloyd CJ, Moldovan MV. A more powerful exact test of noninferiority from binary matched-pairs data. Statistics in Medicine. 2008;27:3540–3549. doi: 10.1002/sim.3229.
- Nam J. Establishing equivalence of two treatments and sample size requirements in matched-pairs design. Biometrics. 1997;53:1422–1430.
- Sidik K. Exact unconditional tests for testing non-inferiority in matched-pairs design. Statistics in Medicine. 2003;22:265–278. doi: 10.1002/sim.1261.
- Suissa S, Shuster JJ. The 2 × 2 matched pairs trial: Exact unconditional design and analysis. Biometrics. 1991;47:361–372.
- Tango T. Equivalence test and confidence interval for the difference in proportions for the paired-sample design. Statistics in Medicine. 1998;17:891–908. doi: 10.1002/(sici)1097-0258(19980430)17:8<891::aid-sim780>3.0.co;2-b.
