Abstract
Sequential designs can be used to save computation time in implementing Monte Carlo hypothesis tests. The motivation is to stop resampling if the early resamples provide enough information on the significance of the p-value of the original Monte Carlo test. In this paper, we consider a sequential design called the B-value design, based on the B-value proposed by Lan and Wittes, and construct the sequential design bounding the resampling risk, the probability that the accept/reject decision differs from the decision based on complete enumeration. For the B-value design, whose exact implementation can be carried out with the algorithm proposed in Fay, Kim and Hachey, we first compare the expected resample size across designs with comparable resampling risk. We show that the B-value design yields considerable savings in expected resample size compared to a fixed resample size or simple curtailed design, and an expected resample size comparable to that of the iterative push out design of Fay and Follmann. The B-value design is more practical than the iterative push out design in that it remains tractable even for small values of the resampling risk, which was a challenge with the iterative push out design. We also propose an approximate B-value design that can be constructed without specially developed software and that provides analytic insights on the choice of parameter values in constructing the exact B-value design.
Keywords: B-Value, Bootstrap, Permutation, Sequential Design, Approximation
1 Introduction
When we implement Monte Carlo hypothesis tests (MC tests) such as bootstrap or permutation tests, computation time can often be saved by early stopping. The main idea is to stop early and reject/accept the null hypothesis if the early replications provide enough evidence for a significant/insignificant p-value of the original MC test. A simple sequential design is a curtailed design, where we stop and reject or accept the null hypothesis once the replications collected so far are enough to ensure the rejection or acceptance of the null hypothesis under a full enumeration of the MC test. Fay et al. (2007) proposed using a truncated sequential probability ratio test (tSPRT) boundary, which is minimax with respect to the resampling risk, and provided an algorithm to calculate a valid p-value after reaching the stopping boundary. As discussed in Fay et al. (2007), there is another way to determine an MC boundary: by minimizing the resampling risk over a class of possible distributions for the p-value. Fay and Follmann (2002) used the class of distributions for the p-values generated by location shifts for a standard normal test statistic and approximated the associated distributions for the p-values within this class using beta distributions. Within this class, Fay and Follmann (2002) found by numerical search the parameter values which gave the largest resampling risk for fixed boundaries, defining a "worst case" distribution. Their approach was then to determine the design for the MC test which bounded the resampling risk for that worst case distribution and would therefore bound the resampling risk for the entire class of distributions. Fay and Follmann (2002) used this approach in motivating the iterative push out (IPO) design. In Fay and Follmann (2002), the IPO boundary was shown to provide savings in the expected resample size compared to the fixed resample size design and a simple curtailed design.
Its implementation, however, is not practical for small values of resampling risk.
Fay et al. (2007) did not fully explore using the Fay and Follmann (2002) approach for bounding the resampling risk within a class of distributions for the p-value, but only plotted resampling risk as a function of fixed p-values. In this paper, we explore the Fay and Follmann (2002) approach and focus on sequential designs which bound or approximately bound the resampling risk within the previously mentioned class of distributions for the p-value. In doing so, we use the B-value boundary, which is a tSPRT boundary written in terms of the B-value proposed by Lan and Wittes (1988). The B-value is a statistic which projects the expected value of the test statistic at the end of the study, conditional on the information accumulated so far, and we make a decision based on this projected value. We construct the B-value boundary either by using the numerical algorithm introduced in Fay et al. (2007), the FKH algorithm, or by using approximations introduced in this paper. We call the latter design the approximate B-value design; it is much easier to construct, but it only provides a decision rule, without estimating the p-value. In order to numerically construct the B-value boundary, we implement the algorithm developed in Fay et al. (2007), which, unlike the IPO design, provides a tractable design to bound the resampling risk below small values (e.g., below 0.01). Considering the class of p-value distributions used in Fay and Follmann (2002), we empirically show that the B-value design can provide expected resample sizes significantly smaller than those of the fixed resample size and simple curtailed sequential designs, and an expected resample size comparable to that of the IPO design. As another way of implementing the B-value design, we propose a method to construct an approximate B-value boundary by using results developed in sequential analysis. The approximate B-value design asymptotically bounds the resampling risk, and the associated test asymptotically maintains its size. These approximations help one to understand analytic characteristics of the sequential MC test, and the approximate B-value design can be implemented without the aid of special software.
This paper is organized as follows. In Section 2, we introduce the notation and define the B-value design. We also summarize some exact properties of the B-value design, compared to the fixed resample size, curtailed and IPO designs. In Section 3, we propose the approximate B-value design, which only requires a simple numerical integration in order to make an accept/reject decision early, subject to a desirable resampling risk. Section 4 includes an example and further remarks.
2 B-value Design
2.1 Overview
Let T be a test statistic whose large values lead to the rejection of the null hypothesis, H0, and let T0 = T(d0) denote the value of T observed for the original sample. When independent replications of data, say d1, d2, …, are taken and we obtain Ti = T(di), the p-value of the MC test is defined as p(d0) = P(Ti ≥ T0 ∣ d0) over all possible replications of T. For given d0, consider p(d0) = p as fixed. With such p, our goal is to stop taking Monte Carlo samples when we are reasonably sure that either p ≤ α (i.e., the decision after an infinite resample is to reject the original null hypothesis) or p > α (i.e., the decision after an infinite resample is to fail to reject the original null hypothesis). One way to think about the Monte Carlo problem is that we have independent and identically distributed random variables, Xi = I(Ti ≥ T0), each following a Bernoulli distribution with parameter p, and we want to design a study to decide between the two possibilities for the parameter p, either p ≤ α or p > α.
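The reduction above can be sketched directly: each Monte Carlo replication is converted into the indicator Xi = I(Ti ≥ T0), so the sequential procedure only ever sees a Bernoulli(p) stream. A minimal illustration, where `resample_stat` is a hypothetical stand-in for one Monte Carlo replication of T:

```python
import random

def mc_indicator_stream(t0, resample_stat, n_max):
    """Yield X_i = I(T_i >= T_0), i.i.d. Bernoulli(p) indicators, where p is
    the complete-enumeration p-value.  `resample_stat` is a hypothetical
    callable producing one Monte Carlo replication of the test statistic T."""
    for _ in range(n_max):
        yield int(resample_stat() >= t0)

# Toy usage: a standard normal "replication" with t0 = 0 gives p = 0.5.
rng = random.Random(0)
xs = list(mc_indicator_stream(0.0, lambda: rng.gauss(0, 1), 2000))
```

The sequential designs below operate only on such a stream, regardless of how expensive each replication of T is to compute.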
Suppose that X1, X2, …, are independent and identically distributed Bernoulli random variables with parameter p, and consider the problem of testing p ≤ α versus p > α. Various sequential procedures can be applied to test p ≤ α against p > α, and in this paper we use the B-value proposed by Lan and Wittes (1988) to project the current trend to the end of a study. For our purposes, we can calculate the B-value after every Monte Carlo sample, and use these B-values to conduct the sequential testing efficiently.
Consider the statistic Zn to test whether p = α or not with n observations,

Zn = (Sn − nα) / √{nα(1 − α)},

where Sn = X1 + ⋯ + Xn. For the pre-determined maximum number of resamples, m, the B-value at tn = n/m is defined as

B(tn) = √tn Zn = (Sn − nα) / √{mα(1 − α)}.
Consider a sequential test where we stop resampling at the n-th sample and reject if B(tn) ≥ c1 for some n ≤ m (and thus do not reject the original null hypothesis H0), or stop and fail to reject if B(tn) ≤ c2 for some n ≤ m (and thus reject the original null hypothesis H0), where c1 and c2 are appropriately chosen constants. This sequential test produces parallel boundaries on plots of (n, Sn). We modify it by incorporating the idea of a simple curtailed test, which stops at the n-th sample and rejects if Sn ≥ ⌊α(m + 1)⌋ = r1, and stops and accepts if n − Sn ≥ ⌈(1 − α)(m + 1)⌉ = r0. Let

U(n) = nα + c1√{mα(1 − α)} and L(n) = nα + c2√{mα(1 − α)},

and define

BUpper = {(n, U(n)) : n̲1 ≤ n < n̄1} ∪ {(n, r1) : n̄1 ≤ n ≤ m},
BLower = {(n, L(n)) : n̲0 ≤ n < n̄0} ∪ {(n, n − r0) : n̄0 ≤ n ≤ m},

where n̲1 is the smallest value of n such that U(n) ≤ n and n̲0 is the smallest n for which L(n) ≥ 0. Also, denote the smallest value of n such that L(n) ≤ n − r0 as n̄0 and the smallest value of n such that U(n) ≥ r1 as n̄1. Then, the stopping boundary is formed by B = BUpper ∪ BLower. See Figure 1 for the boundary with c1 = −c2 = 2.58281, m = 3,620, and α = 0.05.
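Since B(tn) ≥ c1 is equivalent to Sn ≥ nα + c1√{mα(1 − α)}, and similarly for the lower crossing, the stopping thresholds on Sn can be tabulated in a few lines. The sketch below only illustrates the boundary shape: the integer rounding conventions are ours, and the exact boundary in the paper is constructed with the FKH algorithm.

```python
import math

def bvalue_boundary(m, alpha, c1, c2):
    """Stopping thresholds on S_n implied by the parallel B-value rule
    B(n/m) = (S_n - n*alpha) / sqrt(m*alpha*(1 - alpha)) combined with
    simple curtailment at counts r1 and r0.  Returns (upper, lower):
    sampling stops as soon as S_n >= upper[n] or S_n <= lower[n]."""
    sd = math.sqrt(m * alpha * (1 - alpha))
    r1 = math.floor(alpha * (m + 1))          # upper curtailment count
    r0 = math.ceil((1 - alpha) * (m + 1))     # lower curtailment count
    upper, lower = {}, {}
    for n in range(1, m + 1):
        # B(n/m) >= c1  <=>  S_n >= U(n) = n*alpha + c1*sd; capped at r1
        upper[n] = min(math.ceil(n * alpha + c1 * sd), r1)
        # B(n/m) <= c2  <=>  S_n <= L(n) = n*alpha + c2*sd; or n - S_n >= r0
        lower[n] = max(math.floor(n * alpha + c2 * sd), n - r0)
    return upper, lower
```

For the parameters of Figure 1 (m = 3,620, α = 0.05, c1 = −c2 = 2.58281) the two thresholds meet at n = m, so the boundary is closed: every sample path must stop by the m-th resample.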
Figure 1.
A B-value Boundary with m = 3,620, α = 0.05 and c1 = −c2 = 2.58281
2.2 The B-value Design Bounding the Resampling Risk
Let the stopping rule for the design be denoted by a b × 2 matrix, B, such that the j-th row denotes values of (n, s) where sampling stops, and consider a sequential MC test φ defined on the B-value boundary as follows: φ(n, s) = 0 would lead us not to reject the original null hypothesis (i.e., reject the hypothesis that p ≤ α) and φ(n, s) = 1 rejects the original null hypothesis (i.e., do not reject the hypothesis that p ≤ α). Then, the size of the sequential MC test, φ, is α*(φ, B) = EU{EB(φ ∣ P)}, where EB is the expectation over the Monte Carlo sampling design after replacing p with the random variable P, and EU represents the expectation with respect to P under the continuous uniform distribution. For the B-value design with φ defined this way, we require the design to bound the resampling risk under the distribution for P, both under the null and a class of alternative distributions, where the resampling risk,

γ(φ, F, B) = EF{E(φ I(P > α) ∣ P)} + EF{E((1 − φ) I(P ≤ α) ∣ P)},

is the expected probability that the accept/reject decision differs from complete enumeration for a distribution of P ∼ F. In other words, we require γ(φ, F, B) ≤ γ* for all F ∊ 𝓕, where 𝓕 is the class of distributions which includes the null (i.e., the uniform) and many alternative distributions, and γ* is some pre-specified bound on the resampling risk.
Fay and Follmann (2002) proposed a design that starts with a simple design and iteratively "pushes out" the boundary at the point where not stopping will decrease the resampling risk the most per added expected sample size. The pushing out algorithm will not be reviewed here, but we note that it is computationally expensive to carry out, so that IPO designs that bound the resampling risk at or below γ* = 0.01 are intractable. Fay and Follmann (2002) found distributions that met the resampling criterion by first defining a class of designs which may be expanded in some systematic way such that γ gets smaller as the designs get larger, then finding the distribution F̂P ∊ 𝓕 which appears to give the largest values of γ(φ, F̂P, B) for each member of the class of designs, and finally picking the smallest design in the class such that γ(φ, F̂P, B) ≤ γ*. They estimated FP with beta distributions and then searched for the F̂P which gave the largest resampling risk for fixed boundaries of different sizes over all possible values of β, the probability of Type II error; the numerical search showed that 1 − β ≈ .47 gave the largest resampling risk for fixed boundaries with α = 0.05. As discussed in Fay et al. (2007), the tSPRT boundary, or equivalently the B-value boundary, can then be determined by searching over the values of c1 and c2 for fixed m such that γ(φ, F̂*, B) < γ*, where F̂* is the "worst case" distribution chosen as in Fay and Follmann (2002).
In the rest of this section, we consider the B-value boundary of Section 2.1 and the test based on the valid p-value proposed in Fay et al. (2007), φFKH. Their valid p-value when (n, Sn) is a boundary point is defined as p̂v(Sn, n) = Fp̂(p̂MLE), where p̂MLE(Sn, n) = Sn/n is the maximum likelihood estimator of p for any closed boundary and Fp̂ is the cumulative distribution function of p̂MLE. Fp̂ is defined in (5.2) of Fay et al. (2007) and can be computed by using the FKH algorithm. For the B-value boundary obtained by the FKH algorithm and the test function defined as
the following tables indicate the efficiency of the B-value design. In Table 1, we compare the B-value design with various values of c = c1 = −c2 to the IPO design of Fay and Follmann (2002) for α = .05 and γ* = .025. The distributions of the p-value considered in the tables are F̂*, chosen as the "worst case" beta distribution, a uniform distribution, and two point mass distributions. Following the notation of Fay and Follmann (2002), we let Hα,1−β be a distribution such that Hα,1−β(p) = 1 − Φ{Φ−1(1 − p) − Φ−1(1 − α) + Φ−1(β)}, where Φ is the cumulative distribution function of a standard normal distribution, and let Ĥα,1−β denote the beta distribution whose mean is the same as the mean of Hα,1−β and for which Ĥα,1−β(α) = Hα,1−β(α) = 1 − β. In this numerical experiment, we use F̂* = Ĥ0.05,0.47 following Fay and Follmann (2002), and consider the case of c = c1 = −c2 and ε = 2(1 − Φ(c)). The choices of ε (or c) are rather arbitrary in Tables 1 and 2, while m is chosen to be the smallest sample size to bound the resampling risk for the given c. These values of m in Tables 1 and 2 are chosen by using the FKH algorithm, which requires repeatedly constructing the B-value boundary and computing the associated risk, and some insights on such choices are provided in Section 3 with the approximate B-value design. We see that the expected resample size, E(N), for the B-value design with ε = 0.05 is close to the E(N) from the IPO design. The E(N) for the B-value designs decreases with increasing ε (or decreasing c), and some B-value designs have lower E(N) than the IPO design for most distributions tried. Note that changing the value of ε (or c) does not change the resampling risk within the range of values tried, which will be discussed further with the approximate B-value design in the next section. In Table 2, we compare the B-value design to the fixed resample size design and the curtailed design at γ* = 0.01, a value for which the IPO design is intractable. We see that the B-value designs can give much lower values of E(N) than the other designs, indicating that the B-value design is more efficient than the fixed resample size and simple curtailed designs. In general, we see that for larger values of ε (i.e., smaller values of c), the B-value design shows smaller E(N). Thus, for any predetermined m, it would be desirable to use the design with the smallest value of c, while still bounding the resampling risk.
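The pairing of c and ε = 2(1 − Φ(c)) used to index the designs in Tables 1 and 2 is a plain normal-tail computation; a small helper (function names are ours) reproduces it:

```python
from statistics import NormalDist

def eps_from_c(c):
    """epsilon = 2(1 - Phi(c)): the two-sided standard normal tail
    probability used to index the B-value designs."""
    return 2.0 * (1.0 - NormalDist().cdf(c))

def c_from_eps(eps):
    """Inverse map: c = Phi^{-1}(1 - eps/2)."""
    return NormalDist().inv_cdf(1.0 - eps / 2.0)
```

For example, ε = 0.05 corresponds to c ≈ 1.960 and ε = 0.025 to c ≈ 2.241, matching the values of c used in the tables.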
Table 1.
E(N∣B) with α = 0.05, P ∼ B, and γ(φ, Ĥα,.47, B) = 0.025
| Ĥα,.47 | Uniform | Point mass at α | Point mass at 0.001 | |||||
|---|---|---|---|---|---|---|---|---|
| E(N) | Risk | E(N) | Risk | E(N) | Risk | E(N) | Risk | |
| B-value Design | ||||||||
| (ε, m) = (0.005, 580) | 279.637 | 0.02499 | 87.812 | 0.00719 | 536.429 | 0.52660 | 301.020 | 0 |
| (ε, m) = (0.025, 580) | 242.704 | 0.02499 | 76.056 | 0.00719 | 534.795 | 0.52660 | 240.816 | 0 |
| (ε, m) = (0.05, 580) | 221.092 | 0.02499 | 69.231 | 0.00719 | 530.807 | 0.52657 | 210.204 | 0 |
| (ε, m) = (0.1, 580) | 194.643 | 0.02500 | 61.072 | 0.00720 | 518.236 | 0.52627 | 176.531 | 0 |
| (ε, m) = (0.2, 600) | 163.118 | 0.02475 | 51.169 | 0.00713 | 494.378 | 0.52310 | 139.796 | 0 |
| IPO Design | ||||||||
| m=576 | 213.508 | 0.0250 | 62.85 | 0.0072 | 459.6047 | 0.522 | 251.7032 | 0 |
Table 2.
E(N∣B) with α = 0.05, P ∼ B, and γ(φ, Ĥα,.47, B) = 0.01
| Ĥα,.47 | Uniform | Point mass at α | Point mass at 0.06 | |||||
|---|---|---|---|---|---|---|---|---|
| E(N) | Risk | E(N) | Risk | E(N) | Risk | E(N) | Risk | |
| B-value Design | ||||||||
| (ε, m) = (0.005, 3620) | 954.475 | 0.00999 | 292.708 | 0.00289 | 3507.646 | 0.51065 | 2861.852 | 0.00426 |
| (ε, m) = (0.025, 3620) | 803.470 | 0.00999 | 245.940 | 0.00289 | 3485.032 | 0.51065 | 2575.559 | 0.00426 |
| (ε, m) = (0.05, 3620) | 723.584 | 0.00999 | 221.320 | 0.00289 | 3444.662 | 0.51063 | 2360.361 | 0.00426 |
| (ε, m) = (0.1, 3620) | 626.848 | 0.00999 | 191.519 | 0.00289 | 3334.562 | 0.51049 | 2060.366 | 0.00430 |
| (ε, m) = (0.2, 3681) | 510.256 | 0.01000 | 155.840 | 0.00289 | 3076.974 | 0.51058 | 1671.372 | 0.00452 |
| Curtailed Design | ||||||||
| m=3620 | 2371.276 | 0.00999 | 722.753 | 0.00289 | 3515.422 | 0.50913 | 3016.334 | 0.00431 |
| Fixed Design | ||||||||
| m=3620 | 3620 | 0.00999 | 3620 | 0.00289 | 3620 | 0.51065 | 3620 | 0.00426 |
3 Approximate B-value Design
Although much more tractable than the IPO design, the numerical calculation of the B-value design in Section 2.2 is still computationally expensive and requires specially developed software. We now consider an alternative design which we call the approximate B-value design. This design does not require a special program to construct the boundary, and it also provides some analytic insights on the properties of the B-value design illustrated in Section 2.2.
In this section, we use the B-value boundary defined in Section 2.1, but define the test function φ as follows: φ(n, s) = 1 if (n, s) ∊ BLower without reaching BUpper, and φ(n, s) = 0 if (n, s) ∊ BUpper without reaching BLower. Then (n, s) such that φ(n, s) = 0 would lead us not to reject the original null hypothesis, and (n, s) such that φ(n, s) = 1 leads us to reject the original null hypothesis. We also let N denote the resample size at stopping. We note that the φ of this section, which is based only on which boundary is crossed, is not exactly equivalent to the φFKH of Section 2.2, which is based on the valid p-value estimated at each stopping point. Although not reported here, we observed in the cases we tried that φFKH and φ differ only on a very small portion of the boundary, typically in the curtailed part, where the valid p-value is close to α. In this sense, we call the test proposed in this section an approximate B-value test. Now, aiming to construct an approximate B-value design bounding the resampling risk, we discuss how to determine c1, c2, and m, and thus B, such that γ(φ, F, B) ≤ γ* for a given choice of γ*. Since the approximate design does not automatically achieve the size of α, we also examine the size of the test following the approximate B-value design. Finally, we review how to estimate the expected resample size for the B-value design.
3.1 Approximate Upperbound of the Resampling Risk
In this section we show how to determine approximate bounds on the resampling risk given a B-value stopping boundary and the test function φ defined above. Suppose that a Monte Carlo sample forms a discrete stochastic process {X1, X2, …}, where Xi has a Bernoulli distribution with E(Xi) = p, and let Sn = X1 + ⋯ + Xn. For the approximate B-value test associated with the B-value boundary of Section 2.1, we can summarize the decision rule using the following two random variables denoting the minimum values of n at which each boundary is crossed:
Recall that the resampling risk is γ(φ, F, B) = EFE(φI(P > α) ∣ P) + EFE((1 − φ)I(P ≤ α) ∣ P) where P ∼ F. Then for α < p < 1,
and for 0 < p ≤ α,
In order to use asymptotic results developed for the B-values, consider the stopping boundaries formed only by the B-values without considering the truncation conditions: for each p, let
and
where , , and asymptotically behaves like a Brownian motion with drift . Then,
and
By using some asymptotic results developed in the context of sequential analysis, we obtain
| (1) |
where f is a density function of P, and
with ρ ≅ 0.583, a correction factor for a discrete process.
See the Appendix for the derivation of the upperbound in (1). Note that I1 and I2 depend only on α and m, not on c1 and c2, and the major contribution to I1 + I2 comes from those p close to α. Also, R1 + R2 is negligible compared to I1 + I2. These are expected from the formulations of I1, I2, R1 and R2 as shown on the right sides of the expressions above. Since I1 works as an approximate bound of the resampling risk associated with stopping at the lower boundary between and m, and I2 serves as an approximate bound of the resampling risk associated with stopping at the upper boundary between and m, we note that the major contribution to the resampling risk comes from decisions made on these parts of the boundary, BUpper for and BLower for , rather than those on the parallel boundary points, BUpper for and BLower for .
Since I1 and I2 decrease as m increases at a given α, we can choose m to make I1 + I2 slightly less than γ*. Then any choice of (c1, c2) making R1 + R2 ≤ γ* − I1 − I2 would asymptotically guarantee that the design bounds the resampling risk. However, the larger the chosen values of c1 and |c2|, the larger the expected resample size, and thus, for a given or predetermined m, it would be desirable to choose c1 and |c2| as small as possible while still bounding the resampling risk.
The approximate B-value design can be implemented as follows. For a given choice of α and γ* such that γ(φ, Ĥα,.47, B) ≤ γ*, we first choose m such that I1 + I2, for f ∼ Ĥα,.47, is slightly smaller than γ*. Then for such m, we find c = c1 = −c2 such that g(c) = R1 + R2 ≤ γ* − I1 − I2. For example, if we want to achieve α = 0.05 and γ(φ, Ĥ0.05,0.47, B) ≤ 0.01, we can use a simple integration to find I1 + I2 = 0.005046 + 0.004930 = 0.009976 for the choice of m = 3,620 and f ∼ Beta(a = 0.3889, b = 2.5234) ≈ Ĥ0.05,0.47 of Table 2. Finding c such that g(c) = R1 + R2 ≤ γ* − I1 − I2 = 0.000024 would then bound the resampling risk under 0.01. We now note that for the values of c in Table 2, 2.241, 1.960, and 1.645, corresponding to ε = 2(1 − Φ(c)) = 0.025, 0.05, and 0.1, respectively, g(c) = R1 + R2 increases as c decreases, with R1 + R2 < 1.9 × 10−7, < 2.1 × 10−6, and < 2.3 × 10−5, respectively. Thus the smallest value of c among these three that asymptotically bounds the resampling risk under γ* = 0.01 is c = 1.645, which matches Table 2, and a numerical search over all c less than or equal to 2.241 resulted in c = 1.635 for α = 0.05, γ* = 0.01 and m = 3,620. This indicates that for any given or predetermined m, we can first compute I1 + I2 to get an idea of γ* and then easily find the smallest design (i.e., the smallest c) satisfying the constraint on the resampling risk, while the construction of the exact boundary and the choice of m and c associated with φFKH would require rather time-consuming searches.
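As a rough cross-check on the magnitude of I1 + I2, one can integrate the normal-approximation probability that Sm falls on the wrong side of mα against the worst-case beta density. This is only a stand-in for the exact integrands of (1): it omits the discreteness correction ρ, so it gauges the order of magnitude rather than reproducing the reported values.

```python
import math

def beta_pdf(x, a, b):
    """Density of Beta(a, b) at x in (0, 1), via log-gamma for stability."""
    log_B = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return math.exp((a - 1) * math.log(x) + (b - 1) * math.log(1 - x) - log_B)

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def approx_truncation_risk(m, alpha, a=0.3889, b=2.5234, grid=20000):
    """Rough stand-in for I1 + I2: probability that S_m lies on the wrong
    side of m*alpha when P ~ Beta(a, b), by the normal approximation to the
    binomial, integrated on a midpoint grid (a sketch only)."""
    total, h = 0.0, 1.0 / grid
    for i in range(grid):
        p = (i + 0.5) * h
        z = math.sqrt(m) * abs(p - alpha) / math.sqrt(p * (1 - p))
        total += norm_cdf(-z) * beta_pdf(p, a, b) * h
    return total
```

Consistent with the discussion above, this quantity depends only on α and m and shrinks as m grows, which is why m, not c, is the knob that controls the dominant part of the resampling risk.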
Remark 1. The first upperbound of the above expression can be improved by incorporating and , but these probabilities are typically negligible compared to and , respectively.
Remark 2. When the distribution of P is a point mass at α, the resampling risk is greater than 0.5 since
and and are negligible. This explains the values of the resampling risk in Tables 1 and 2 obtained for the point mass at α.
3.2 Size of the Approximate B-value Design
Based on the similar arguments used in Section 3.1, for α given and large m,
where , ρ ≅ 0.583, and g12(p, c) is obtained with η = 0, , and in Equation 3.28 of Siegmund (1985).
Then, for large m, an upperbound for the size of the test, α*(φ, B), is obtained as
where , , , and .
We note that α2 and α4 are negligible compared to α1 and α3, and also that α3 is relatively smaller than α1. The former is because is negligible with Pr(τ− < m) being small for p > α and Pr(Wm > 0) being small for p ≤ α, and the latter is expected from a positive drift of Wm when p > α. This is illustrated by the numerical integration results summarized in Table 3. In Table 3, we observe that the major contribution to α̃ comes from α1 + α3, which depends only on m, not on c, and that α̃ is usually slightly over α. Considering that and that is negligible, the approximate B-value design is expected to retain the size, at least approximately. Note that c = 0.908 in the last case was obtained as the smallest value of c bounding the approximate upperbound of the resampling risk, γ̃, under 0.01 at m = 4,999.
Table 3.
Approximate Size
| c | ε = 2(1 − Φ(c)) | γ̃ = I1 + I2 + R1 + R2 | α | m | α1 | α3 | α̃ |
|---|---|---|---|---|---|---|---|
| 2.241 | 0.025 | 0.024842 | 0.05 | 580 | 0.046756 | 0.004016 | 0.050772 |
| 2.241 | 0.025 | 0.009976 | 0.05 | 3620 | 0.048616 | 0.001509 | 0.050124 |
| 2.241 | 0.025 | 0.012958 | 0.01 | 3620 | 0.009404 | 0.000732 | 0.010135 |
| 0.908 | 0.364 | 0.009994 | 0.05 | 4999 | 0.048814 | 0.001276 | 0.050451 |
3.3 Expected Resample Size
As discussed in Section 3.1, the approximate B-value design can be determined by choosing m to make I1 + I2 smaller than the desirable resampling risk bound of γ* and then by choosing the smallest c such that R1 + R2 ≤ γ* − I1 − I2. We now introduce approximations for E(N), which would help us to study asymptotic efficiencies of B-value designs chosen to bound the resampling risk. As discussed in Sections 3.1 and 3.2, the B-value stopping time can be approximated as a stopping time of a Brownian motion process observed in discrete time, Wn, associated with a truncated parallel boundary, and thus we can use (1.15) of Samuel-Cahn (1974) or (3.17) of Siegmund (1985) with the correction suggested for the discrete time scale. Using these arguments, for Tb = min {n : Wn ≥ b}, we obtain
where , , and ρ ≅ 0.583. Using arguments similar to those in (3.37) of Siegmund (1985), we may then use
| (2) |
as an approximate lower bound of Ep[min(T, m)] for T = min(Tb, T−b). For the settings in Tables 1 and 2 with Ĥα,.47 and ε = 0.005, 0.025, 0.05, and 0.1 (or c = 2.807, 2.241, 1.960, and 1.645), we find the bound to be 291.836, 249.616, 226.213, and 197.680 for m = 580, and 965.547, 810.270, 727.441, and 629.124 for m = 3,620, respectively. These bounds based on (2) are only approximate, and also Ep(N) ≤ Ep(min(T, m)), but as discussed in Section 3.6 of Siegmund (1985), we observe that these approximations are close to the values of E(N) in Tables 1 and 2. This indicates that one can use (2) to get a quick and reasonably accurate estimate of the expected resample size without using the FKH algorithm to numerically construct the boundary and compute E(N) for each choice of m and c, which helps one to choose m and c efficiently by comparing approximate values of E(N).
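The accuracy of approximations like (2) can also be checked by brute force: simulate the walk against the parallel boundary with curtailment and average the stopping times. A sketch, assuming the symmetric choice c = c1 = −c2 and our own rounding of the curtailment counts (the paper instead evaluates E(N) exactly via the FKH algorithm):

```python
import math
import random

def expected_resamples(p, m, alpha, c, n_sims=2000, seed=1):
    """Monte Carlo estimate of E_p(N) for the parallel B-value boundary
    B(n/m) = (S_n - n*alpha)/sqrt(m*alpha*(1-alpha)) with stopping when
    |B| >= c or when simple curtailment applies.  A simulation sketch."""
    rng = random.Random(seed)
    sd = math.sqrt(m * alpha * (1 - alpha))
    r1 = math.floor(alpha * (m + 1))
    r0 = math.ceil((1 - alpha) * (m + 1))
    total = 0
    for _ in range(n_sims):
        s = 0
        for n in range(1, m + 1):
            s += rng.random() < p       # one Bernoulli(p) resample
            b = (s - n * alpha) / sd
            if b >= c or b <= -c or s >= r1 or n - s >= r0:
                break
        total += n
    return total / n_sims
```

As expected from the drift argument, the walk stops quickly when p is far from α and takes longest when p is near α, where the drift of the B-value is close to zero.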
4 Example and Discussion
Now, we illustrate the application of the B-value design to the permutation test of Kim et al. (2000), developed to determine the number of change-points in a segmented line regression model. They noted that the test statistic to test the null hypothesis that there are k0 change-points against the alternative that there are k1 change-points, for 0 ≤ k0 < k1, does not follow the classical asymptotic theory, and so suggested using a permutation distribution of the test statistic to estimate the p-value. Joinpoint, software developed by Kim et al. (2000) and available at http://srab.cancer.gov/joinpoint, sequentially conducts the permutation tests to select the number of change-points, and the sequential Monte Carlo tests proposed in Fay et al. (2007) and also in this paper are expected to improve the computational efficiency of Joinpoint. Figure 2 includes cancer incidence rates for (a) Hodgkin lymphoma and (b) Liver and bile duct cancer. These incidence rates are the age-adjusted rates for all races and both sexes combined for the SEER-9 population, which includes about 9.5% of the U.S. population (Surveillance, Epidemiology and End Results Program at the National Cancer Institute). The permutation tests of no change-point versus one change-point, implemented in R with 3,620 permuted data sets, produced p-values of 0.3944 and 0.0011 for Hodgkin lymphoma and Liver and bile duct cancer, respectively, which led us to choose the straight line regression model for Hodgkin lymphoma and the one change-point model for Liver and bile duct cancer. The two plots in the lower part of Figure 2 show the trajectories of the B-value test with m = 3,620, α = 0.05 and c = 2.58281 for these two cancer sites, where we plotted Sn, marked as a triangle at each n, and a portion of the parallel boundary in Figure 1 up to n = 1,000. Sn crossed the upper boundary at n = 93 for Hodgkin lymphoma, implying rejection of the hypothesis p ≤ α, while Sn crossed the lower boundary at n = 698 for Liver and bile duct cancer, leading us not to reject the hypothesis p ≤ α. From the discussion in Sections 2 and 3, we know that the resampling risk of this B-value test is less than or equal to 0.01 and that it is much more efficient than the simple curtailed test with r0 = 3,440 and r1 = 181.
Figure 2.
Hodgkin lymphoma and Liver and bile duct cancer
In this paper, we considered the B-value design to conduct MC tests with fewer replications while controlling the resampling risk. To control the resampling risk, we considered the class of p-value distributions used in Fay and Follmann (2002), and compared the performance of the B-value design with those of a simple curtailed design and the IPO design proposed in Fay and Follmann (2002). The exact B-value design, which can be obtained by using the FKH algorithm, is shown to be more efficient than the curtailed design in terms of the number of resamples, and more efficient than the IPO design in terms of computational complexity. We then proposed the approximate B-value design, which provides some analytic insights on the choice of parameters in numerically constructing the B-value design and which also serves as an approximate alternative that does not require a special algorithm.
There are some open problems that we leave for future research. Although the method of this paper works for a general choice of c1 and c2, we considered only the symmetric choice of c1 and c2 in the numerical examples. Such a symmetric choice of c1 and c2 is equivalent to the truncated SPRT boundary of Fay et al. (2007) with the Type I and II errors set equal, and it might be of interest to consider asymmetric choices of c1 and c2. Exploring how to construct more general designs aiming to minimize the expected resample size involves more details and requires further research. Also, the resampling risk defined in this paper gives equal weight to Type I and II errors, and one may want to consider a resampling risk that weighs the two types of errors differently. For a specific choice of weights, the method of Section 3 can be easily applied to approximate the resampling risk, size, and expected resample size, but how to choose the weights needs to be studied.
In this paper, we discussed how to construct the best design for a given or predetermined m with I1 + I2 < γ*, empirically showing that E(N) decreases as c = c(m) decreases for the given m. As indicated in the numerical examples, however, there are many designs, that is, many choices of (m, c), that satisfy the same constraints on α and γ*, so determining an optimal design over all (m, c) is an interesting problem to pursue.
In order to conduct a sequential test of p ≤ α versus p > α, there are other sequential procedures proposed in the literature. For example, one may consider the likelihood ratio boundary as follows. For , the sequential likelihood ratio test rejects if
for some m0 ≤ k ≤ mf and a > 0, where L denotes the likelihood function. This condition is equivalent to
which can be written in terms of the following stopping time:
where H(x) = x ln x + (1 − x) ln(1 − x) − x ln(α) − (1 − x) ln(1 − α). Then, some asymptotic results, as in Siegmund (1985, Chapter 5), can be used to construct an asymptotic sequential design, but it requires more complicated details than those in Section 3. We plan to pursue this in future research to compare the performance of the B-value design with other sequential boundaries such as the likelihood ratio boundary.
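The function H above is the Kullback–Leibler divergence between Bernoulli(x) and Bernoulli(α), and the stopping condition then takes the generalized likelihood-ratio form n H(Sn/n) ≥ a. A small sketch of the resulting stopping rule, assuming that form (function names are ours):

```python
import math

def H(x, alpha):
    """H(x) = x ln x + (1-x) ln(1-x) - x ln(alpha) - (1-x) ln(1-alpha),
    the Kullback-Leibler divergence between Bernoulli(x) and Bernoulli(alpha)."""
    def xlogy(u, v):
        # Convention 0 * log(0) = 0, so H is defined at x = 0 and x = 1.
        return 0.0 if u == 0.0 else u * math.log(v)
    return (xlogy(x, x) + xlogy(1.0 - x, 1.0 - x)
            - xlogy(x, alpha) - xlogy(1.0 - x, 1.0 - alpha))

def glr_stop(xs, alpha, a):
    """First n with n * H(S_n / n, alpha) >= a, or None if the likelihood
    ratio boundary is never crossed within the given sample."""
    s = 0
    for n, x in enumerate(xs, start=1):
        s += x
        if n * H(s / n, alpha) >= a:
            return n
    return None
```

Note that H(α) = 0 and H grows as the observed fraction Sn/n moves away from α, so the rule stops sooner the further p is from α, paralleling the behavior of the B-value boundary.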
Acknowledgments
Kim's research was partially supported by NIH Contract HHSN 261200700273P. The author thanks Dr. Michael P. Fay for very helpful discussions and extensive assistance in improving the presentation of the paper, and Mark Hachey for the initial construction of Tables 1 and 2.
Appendix: Approximate Upperbound for γ(φ, F, B)
Note that
and
where , and ρ ≅ 0.583. Thus,
and
Therefore, when c1 = −c2 = c, for large m, we have
Note that works as an upperbound of , which makes the most contribution to γ1 with negligible. Similarly, the major contribution to γ2 comes from and is negligible.
References
- Fay MP, Follmann DA. Designing Monte Carlo implementations of permutation or bootstrap hypothesis tests. American Statistician. 2002;56:63–70.
- Fay MP, Kim HJ, Hachey M. On using truncated sequential probability ratio test boundaries for Monte Carlo implementation of hypothesis tests. Journal of Computational and Graphical Statistics. 2007;16:946–967. doi:10.1198/106186007X257025.
- Kim HJ, Fay M, Feuer EJ, Midthune DN. Permutation tests for joinpoint regression with applications to cancer rates. Statistics in Medicine. 2000;19:335–351. doi:10.1002/(sici)1097-0258(20000215)19:3<335::aid-sim336>3.0.co;2-z.
- Lan KKG, Wittes J. The B-value: a tool for monitoring data. Biometrics. 1988;44:579–585.
- Samuel-Cahn E. Repeated significance test II, for hypotheses about the normal distribution. Communications in Statistics. 1974;3(8):711–733.
- Siegmund D. Sequential Analysis. New York: Springer-Verlag; 1985.


