Abstract
This paper develops an empirical likelihood approach to testing for stochastic ordering between two univariate distributions under right censorship. The proposed test is based on a maximally selected local empirical likelihood statistic. The asymptotic null distribution is expressed in terms of a Brownian bridge. The new procedure is shown via a simulation study to have superior power to the log-rank and weighted Kaplan–Meier tests under crossing hazard alternatives. The approach is illustrated using data from a randomized clinical trial involving the treatment of severe alcoholic hepatitis.
Keywords: Crossing survival/hazard functions, Order restricted inference, Survival analysis, Two-sample problem
1. Introduction
When comparing survival patterns between two treatment groups in a randomized clinical trial (RCT), it is often of interest to examine whether one group has a uniformly higher survival rate. For example, in a recent RCT involving patients with severe alcoholic hepatitis, the objective was to compare a combination therapy of prednisolone plus N-acetylcysteine with prednisolone alone. Testing whether the combination therapy has a consistently higher (or lower) survival rate throughout the follow-up period addresses this issue directly, as opposed to the standard practice of testing against an omnibus alternative (i.e., any difference between the survival functions). This paper develops such a testing procedure, allowing an ordering between two survival curves to be established uniformly over time.
We frame our approach in terms of the classical notion of stochastic ordering. Namely, a survival function S1 is said to be stochastically larger than another survival function S2 if S1(t) ≥ S2(t) for all t ≥ 0; this is denoted as S1 ⪰ S2. We consider the problem of testing the two-sided alternative
$$H_0\colon S_1 = S_2 \quad \text{versus} \quad H_1\colon S_1 \succ S_2 \ \text{or} \ S_2 \succ S_1 \qquad (1)$$
based on right-censored random samples from each population (≻ denotes ⪰ with strict inequality for some t). Our approach will first be developed for testing the one-sided alternative
$$H_0\colon S_1 = S_2 \quad \text{versus} \quad H_1\colon S_1 \succ S_2 \qquad (2)$$
and then extended to the two-sided alternative using the union-intersection principle. Our approach also leads to a test for the null hypothesis of stochastic ordering (S1 ⪯ S2 or S1 ⪰ S2) versus the alternative of crossing survival functions.
Commonly used two-sample tests for censored data include the log-rank test and weighted Kaplan–Meier (WKM) tests (Pepe and Fleming, 1989); both can be carried out in one-sided or two-sided versions. The log-rank test is based on an integrated weighted difference between hazard functions, and is thus designed to detect ordered hazards rather than more general stochastic ordering. Other tests based on weighted differences between hazard functions, such as the class of weighted log-rank statistics studied by Gill (1980), share this property. The WKM class of tests targets stochastically ordered alternatives by estimating an integrated weighted difference between survival functions, but such test statistics depend on an ad hoc weight function that needs to be specified throughout follow-up.
We derive our procedure using the empirical likelihood (EL) method. EL involves forming a ratio of two nonparametric likelihoods subject to constraints on the parameters of interest. The method originates with Thomas and Grunkemeier (1975), who constructed pointwise confidence intervals for survival functions from right-censored data. EL has also been used to provide confidence regions for parameters defined by estimating equations (Owen, 1988, 2001), in numerous censored and uncensored settings. EL enjoys many appealing properties: highly accurate confidence regions, self-studentization and the possibility of Bartlett correctability. There is also evidence that EL-based tests have optimal power (see, e.g., Kitamura et al., 2012). On the other hand, order restricted inference is known to be challenging for EL (see, e.g., Owen, 2001, Ch. 10), and much less has been done in this direction. El Barmi (1996) explored EL tests for order-restricted hypotheses of the form g(θ) ≤ 0, where g is some smooth function and θ is a finite-dimensional parameter specified by estimating equations (see also Yu et al., 2011). Other recent contributions in this direction have been made by Andrews and Guggenberger (2009) and Canay (2010). As for order restrictions on distribution functions, El Barmi and McKeague (2013) studied EL-based tests for stochastic ordering, while Davidov et al. (2010) investigated EL-based tests for likelihood ratio ordering under a semiparametric biased sampling model. However, these tests are limited to uncensored data.
Our contribution is to provide a class of EL-based tests for stochastic ordering for right-censored data. First consider the one-sided alternative in (2). The idea is to construct a localized EL statistic for H0(t): S1(t) = S2(t) versus H1(t): S1(t) > S2(t) at each given t. The key step in this construction is to recast the stochastic ordering constraint into an inequality involving a single Lagrange multiplier. The proposed test then rejects H0 for large values of the maximally selected EL statistic. A maximally selected test statistic is used (as opposed to an integral-type statistic) because it is more sensitive to local differences between the survival functions. Kolmogorov–Smirnov type test statistics (not based on EL) for stochastic ordering have been proposed by El Barmi and Mukerjee (2005) and Davidov and Herman (2009). Besides localization, another possible approach might be to use the full nonparametric likelihood (Dykstra, 1982; Park et al., 2012a) and compute its ratio under S1 ≻ S2 versus S1 = S2; however, we find the localization approach to be much more tractable. Localization has been used by Einmahl and McKeague (2003), Davidov and Herman (2012) and El Barmi and McKeague (2013) for testing various nonparametric hypotheses, except that those papers considered integral-type test statistics and restricted attention to uncensored data. Park et al. (2012b) proposed a localized NPMLE under stochastic ordering (for right-censored data), but its asymptotic distribution is not known, so it is unclear how a formal test could be developed using their approach.
Various ways of formulating EL in right-censored data settings have been proposed. The standard approach for censored data (Thomas and Grunkemeier, 1975; Li, 1995) maximizes the censored-data likelihood subject to constraint(s) on the parameter of interest. Wang and Jing (2001) instead used the nonparametric likelihood for uncensored data, plugging in the Kaplan–Meier (KM) estimator of the censoring distribution. We use the former approach as it is tractable and more natural in our setting. There are in fact two different versions of EL for censored data, namely the binomial and Poisson versions (see, e.g., Murphy, 1995); we utilize the binomial version.
The paper is organized as follows. In Section 2.1 we set up the general framework and notation to be used throughout the paper. While our focus is on the two-sample test in Section 2.3, for clarity of exposition the one-sample test will be introduced first (in Section 2.2). Various extensions are discussed in Section 2.4: stochastic ordering in the null hypothesis, two-sided alternatives, and crossing survival functions. Section 3 presents results from a simulation study: the proposed two-sample EL test is shown to outperform the log-rank and WKM tests under different stochastically ordered alternatives, including alternatives with crossing hazards. Application of the proposed test to the randomized clinical trial (RCT) mentioned earlier is given in Section 4, and some concluding remarks are placed in Section 5.
2. EL tests for stochastic ordering under right censorship
2.1. Preliminaries
We begin by introducing notation for the one-sample case. Let Xi and Ci for i = 1,…, n be i.i.d. with unknown survival functions S and G, respectively; only min(Xi, Ci) and I(Xi ≤ Ci) are observed. The lifetimes Xi and the censoring times Ci are assumed to be independent. Also, S is assumed to be continuous and S(0) = G(0) = 1. Order the distinct uncensored lifetimes as 0 < T1 < … < Tm < ∞. For each Ti (i = 1,…, m), let ri be the number at risk just before Ti, di the number of deaths at Ti, and hi the hazard at Ti. Let N(t) be the number of observed lifetimes that are less than or equal to t. Then the nonparametric likelihood (depending on the unknown survival function) supported on the observed lifetimes is proportional to
$$L(S) = \prod_{i=1}^{m} h_i^{d_i} (1 - h_i)^{r_i - d_i} \qquad (3)$$
for hi ∈ [0,1]. The NPMLE for S(t), namely the KM estimator , is asymptotically normal with variance S2(t)S2(t), where . This variance can be consistently estimated by the well-known Greenwood formula, , where .
For the two-sample case, we use a similar framework to the one-sample setup, with a further subscript j indicating the j-th sample in the corresponding notation. The nonparametric likelihood is proportional to L(S1, S2) ≡ L(S1)L(S2). Additionally, the sample proportion nj/n is assumed to converge to some pj > 0, where n = n1 + n2. The pooled variance estimate σ̂²(t) now equals the weighted average (n/n1)σ̂₁²(t) + (n/n2)σ̂₂²(t), consistently estimating σ²(t) = σ₁²(t)/p₁ + σ₂²(t)/p₂.
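The quantities ri, di, Ŝ(t) and the Greenwood estimate σ̂²(t) defined above can be computed directly from a right-censored sample. The following Python sketch is our own illustration (the paper's supplementary code is in R, and the function and variable names here are ours, not the paper's):

```python
import numpy as np

def km_greenwood(times, events):
    """Kaplan-Meier estimate and Greenwood variance terms.

    times  : observed times min(X_i, C_i)
    events : 1 if the lifetime was observed (X_i <= C_i), 0 if censored
    Returns the distinct uncensored times T_i, at-risk counts r_i,
    death counts d_i, the KM estimate S_hat(T_i), and
    sigma2_hat(T_i) = n * sum_{T_j <= T_i} d_j / {r_j (r_j - d_j)}.
    """
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=int)
    n = len(times)
    T = np.unique(times[events == 1])                    # ordered uncensored lifetimes
    r = np.array([(times >= t).sum() for t in T])        # number at risk just before T_i
    d = np.array([((times == t) & (events == 1)).sum() for t in T])  # deaths at T_i
    S = np.cumprod(1.0 - d / r)                          # KM estimator at each T_i
    sigma2 = n * np.cumsum(d / (r * (r - d)))            # Greenwood-type variance estimate
    return T, r, d, S, sigma2
```

For instance, with observed times 1,…,5 and only the last observation censored, the KM estimate steps down through 0.8, 0.6, 0.4, 0.2, and σ̂²(4) = 5(1/20 + 1/12 + 1/6 + 1/2) = 4.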
2.2. One-sample case
Suppose we wish to compare the survival function S with a given survival function S0 for evidence of stochastic ordering. Formally, consider testing the null hypothesis H0: S = S0 versus the alternative hypothesis H1: S ≻ S0. Our procedure is to first construct a test statistic for the “local” hypotheses H0(t): S(t) = S0(t) versus H1(t): S(t) > S0(t) for a given t, and then to address the general hypotheses via a functional of the local statistics.
To construct the local test statistic at time t, consider the EL ratio
$$\mathcal{R}(t) = \frac{\sup\{L(S)\colon S(t) = S_0(t)\}}{\sup\{L(S)\colon S(t) \ge S_0(t)\}},$$
where we use the conventions sup ∅ = 0 and 0/0 = 1. This follows the formulation of Thomas and Grunkemeier (1975) except with a one-sided alternative. Note that the numerator and denominator of R(t) maximize (3) over (h1,… , hm) ∈ [0,1]m subject to the constraints
$$\prod_{i \le N(t)} (1 - h_i) = S_0(t) \quad\text{and}\quad \prod_{i \le N(t)} (1 - h_i) \ge S_0(t), \qquad (4)$$
respectively. We solve this constrained maximization problem using the Karush–Kuhn–Tucker (KKT) method (Boyd and Vandenberghe, 2004), a generalization of the Lagrange method that allows inequality constraints. As the constraints are placed only on the lifetimes up to t, the terms after t turn out to be the same in both the numerator and denominator and thus cancel out. Also, for some t the maximum is attained on the boundary of the constraint set, in which case = 1. Specifically, in Appendix A we establish the following expression for the EL ratio:
where , and the Lagrange multiplier is determined by the equality in (4) when hi is replaced with . Here we have suppressed the dependence of and on t.
Based on the above expression, we can derive large sample properties of the local EL test statistic −2 log R(t). This is done by approximating −2 log R(t) via a Taylor expansion as a function of the difference between log Ŝ(t) (recall from Section 2.1 that Ŝ is the KM estimator) and log S0(t). We then make use of asymptotic properties of Ŝ to establish the weak convergence of −2 log R(t). The asymptotic null distribution turns out to be chi-bar square. Namely, for t such that 0 < S0(t) < 1 and G(t) > 0,
$$-2\log\mathcal{R}(t) \xrightarrow{d} (Z^{+})^{2}$$
under H0(t), where Z ~ N(0,1) and Z+ = max(Z, 0). This result can be used to test the local hypotheses H0(t) versus H1(t).
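Since (Z⁺)² places mass ½ at zero and otherwise follows a χ²₁ distribution, the local p-value is half of the upper χ²₁ tail. A minimal Python check of this arithmetic (scipy assumed available; `local_pvalue` is our own name, not from the paper):

```python
from scipy.stats import chi2, norm

def local_pvalue(stat):
    """p-value for the local test: P((Z^+)^2 > stat) = 0.5 * P(chi^2_1 > stat)."""
    return 0.5 * chi2.sf(stat, df=1)

# The alpha = 0.05 local critical value solves P(Z > sqrt(c)) = 0.05,
# i.e. c = norm.ppf(0.95)**2, approximately 2.71.
critical_value = norm.ppf(0.95) ** 2
```

Evaluating `local_pvalue` at this critical value returns 0.05, confirming the half-tail rule.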
To test for the alternative of stochastic ordering, we propose the following maximally selected EL statistic:
$$K_n = \sup_{t \in [t_1, t_2]} \{-2\log\mathcal{R}(t)\}, \qquad (5)$$
where 0 < t1 < t2 < ∞ are to be specified. We suppress the dependence of Kn on t1 and t2. Guidance on the choice of [t1, t2] is provided later.
Our first result gives the asymptotic null distribution of Kn. The proof is omitted, because it is similar to the two-sample case (presented in Appendix B).
Theorem 1. Suppose S0 is continuous. Then under H0, for t1 and t2 satisfying S0(t1) < 1 and S0(t2)G(t2) > 0,
$$K_n \xrightarrow{d} \sup_{x \in [x_1, x_2]} \frac{\{B^{+}(x)\}^{2}}{x(1-x)},$$
where B is a standard Brownian bridge on [0,1], B+ = max(B, 0), xj = b(tj) for j = 1, 2, and b(t) = σ2(t)/{1 + σ2(t)}.
To implement the test, we pre-specify one of the intervals [t1, t2] or [x1, x2] = [b(t1), b(t2)] and determine the other via b(t) or b−1(x) = inf{t : b(t) ≥ x}. However, b is unknown, so one of the two intervals has to be estimated. If we fix [t1, t2] and estimate [x1, x2] (by [x̂1, x̂2] say), then we cannot tabulate critical values in advance, because [x̂1, x̂2] varies across different data sets. On the other hand, pre-determining [x1, x2] allows “universal” critical values, and this is the approach we take. Both the choice of [x1, x2] and details of implementation will be provided in the next subsection.
2.2.1. Calibrating the test
This section discusses issues in calibrating the test. The first is the choice of [x1, x2]. Second, having chosen [x1, x2], we explain how to estimate [t1, t2] and implement the proposed EL test. Justification for this calibration procedure will be provided for the two-sample case in Appendix C (the justification is similar for the one-sample case), where a version of Kn with estimated [t1, t2] is defined. Critical values for the test are then obtained via simulation in Section 3.
The choice of [x1, x2] is important because the interval width can affect the power of the EL test. In a similar context, this issue has been discussed by Davidov and Herman (2009); they proposed a (non-EL-based) test of stochastic ordering for uncensored data via localization, and pointed out that a narrower [x1, x2] gives smaller critical values, but may fail to capture deviations (from H0) outside the interval. Our simulation study (in Section 3) shows that the choice x1 = 0.2 and x2 = 0.98 performs well in terms of balancing power and accuracy, and this is what we recommend in practice.
Having specified [x1, x2], we need to estimate [t1, t2]. Under suitable conditions on b−1, tl can be consistently estimated by t̂l = b̂−1(xl) for l = 1, 2, where
$$\hat b(t) = \frac{\hat\sigma^2(t)}{1 + \hat\sigma^2(t)}$$
is a consistent estimator of b(t). We can then compute Kn accordingly, based on the estimates t̂1 and t̂2. To ensure stability of Kn in small samples, we further modify [t̂1, t̂2] so that values of t̂l outside the interval [T1, Tm] (recall from Section 2.1 that these are the smallest and largest observed lifetimes) are discarded. Note that this modification makes no difference asymptotically, since t̂l falls in [T1, Tm] eventually.
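The estimation step t̂l = b̂−1(xl) amounts to locating the first jump point at which the nondecreasing step function b̂ reaches xl. A Python sketch (names are ours; `T` and `sigma2_hat` would come from a KM computation as in Section 2.1):

```python
import numpy as np

def estimate_t(T, sigma2_hat, x):
    """First uncensored lifetime T_i at which
    b_hat(T_i) = sigma2_hat_i / (1 + sigma2_hat_i) reaches x.
    Returns None if x is never reached (the statistic is then
    restricted to the observed range [T_1, T_m])."""
    b_hat = sigma2_hat / (1.0 + sigma2_hat)     # nondecreasing in t
    idx = np.searchsorted(b_hat, x, side="left")
    return T[idx] if idx < len(T) else None
```

For example, if σ̂² takes the values 0.05, 0.25, 1.0, 4.0 at lifetimes 1, 2, 3, 4, then b̂ steps through roughly 0.048, 0.2, 0.5, 0.8, so x = 0.2 maps to t̂ = 2, while x = 0.98 is never reached.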
2.3. Two-sample case
We now adapt our approach to the two-sample case. The “local” hypotheses are H0(t): S1(t) = S2(t) versus H1(t): S1(t) > S2(t) for given t. The local EL ratio at time t is defined to be
$$\mathcal{R}(t) = \frac{\sup\{L(S_1)L(S_2)\colon S_1(t) = S_2(t)\}}{\sup\{L(S_1)L(S_2)\colon S_1(t) \ge S_2(t)\}}. \qquad (6)$$
The numerator and denominator optimize L(S1)L(S2) subject to the corresponding constraints on ∏_{i ≤ Nj(t)}(1 − hij) for each sample j = 1, 2. As before, an explicit form of the EL ratio can be obtained via the Lagrange method (see Appendix A for more details):
$$\mathcal{R}(t) = \left[\prod_{j=1}^{2}\prod_{i \le N_j(t)} \left(\frac{\tilde h_{ij}}{\hat h_{ij}}\right)^{d_{ij}} \left(\frac{1-\tilde h_{ij}}{1-\hat h_{ij}}\right)^{r_{ij}-d_{ij}}\right]^{I(\hat\lambda < 0)}, \qquad (7)$$
where ĥij = dij/rij, and the constrained hazard estimates h̃ij and the Lagrange multiplier λ̂ are given in Appendix A. The local EL test statistic −2 log R(t) is shown to converge in distribution to chi-bar square under H0(t), a direct consequence of (18) in the proof of the next theorem.
To test H0 vs. H1, we propose the maximally selected EL statistic Kn as in (5), except that R(t) is now given in (7). The following result gives the asymptotic null distribution of Kn (see Appendix B for the proof).
Theorem 2. Suppose H0 holds and the common survival function S0 is continuous. For t1 and t2 satisfying S0(t1) < 1 and S0(t2)Gj(t2) > 0 for j = 1, 2,
$$K_n \xrightarrow{d} \sup_{x \in [x_1, x_2]} \frac{\{B^{+}(x)\}^{2}}{x(1-x)},$$
where B is a standard Brownian bridge on [0, 1], B+ = max(B, 0), xj = b(tj) for j = 1, 2, and b(t) = σ2(t)/{1 + σ2(t)}.
As in the one-sample case, we pre-specify [x1, x2] and estimate [t1, t2] when implementing the test. Justification for this calibration procedure is provided in Appendix C, where a version of Kn with estimated [t1, t2] is defined. The issues discussed in Section 2.2.1 carry over as well.
2.4. Extensions
2.4.1. Stochastically ordered null
We have developed our EL test by treating S1 = S2 as the null hypothesis. In this section we describe how our approach also applies to the (broader) null hypothesis S1 ⪯ S2 versus the alternative S1 ≻ S2. The local EL ratio in this case (in contrast to R(t) in (6)) takes the form
$$\widetilde{\mathcal{R}}(t) = \frac{\sup\{L(S_1)L(S_2)\colon S_1(t) \le S_2(t)\}}{\sup\{L(S_1)L(S_2)\}},$$
where now there is an inequality constraint in the numerator, and no constraint in the denominator because the union of the local null and alternative hypotheses removes any restriction on (S1(t), S2(t)) ∈ [0, 1]2. As the NPMLE is the KM estimator, if Ŝ1(t) ≤ Ŝ2(t) the numerator of R̃(t) coincides with the unconstrained maximum and thus equals the denominator. If Ŝ1(t) > Ŝ2(t), it can be shown that the numerator attains its maximum on the boundary S1(t) = S2(t) of the constraint set (using log-concavity of (3)). Therefore
$$\widetilde{\mathcal{R}}(t) = \left[\frac{\sup\{L(S_1)L(S_2)\colon S_1(t) = S_2(t)\}}{\sup\{L(S_1)L(S_2)\}}\right]^{I(\hat S_1(t) > \hat S_2(t))}.$$
It can then be shown that R̃(t) = R(t), using (7) and the equivalence between the events {λ̂ < 0} and {Ŝ1(t) > Ŝ2(t)} proved in Appendix A. Hence the maximally selected EL statistic coincides with the statistic Kn constructed earlier. Also, the same null distribution can be used because S1 = S2 is least favorable under S1 ⪯ S2. Moreover, the test is consistent in the “interior” of the stochastically ordered null: if S1(t) < S2(t) for all t, then Kn tends to zero in probability (since the indicator term in Lemma 3 vanishes for all t ∈ [t1, t2] with probability tending to 1).
2.4.2. Two-sided testing
The one-sided tests introduced in the previous sections have immediate extensions to two-sided versions. The two-sided alternative in (1) is the union of the two one-sided alternatives (S1 ≻ S2 or S2 ≻ S1). Based on the union-intersection principle, the test statistic is the maximum of the two one-sided test statistics. The asymptotic null distribution of this test statistic is sup_{x∈[x1, x2]} [B²(x)/{x(1 − x)}], where B is a standard Brownian bridge and [x1, x2] is as in Theorem 2. The test can therefore be calibrated in much the same way as for the one-sided test.
2.4.3. Crossing survival functions
The possibility of crossing survival functions needs to be excluded for our (one-sided) test to be meaningful. This is because the one-sided test (asymptotically) rejects the null hypothesis if S1(t) > S2(t) for some t and S1(t′) < S2(t′) for some other t′. To address this issue, we recommend carrying out the one-sided test against the alternative in each direction (i.e., one test versus S1 ≻ S2 and a second test versus S1 ≺ S2). If both tests reject, then there is evidence of crossing survival functions, excluding stochastic ordering. If only one of the tests rejects, then the interpretation is that there is evidence of stochastic ordering in that specific direction.
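The recommended two-directional procedure amounts to the following decision logic (a sketch with our own names; `k12` and `k21` denote the one-sided statistics for the two directions and `crit` their common critical value):

```python
def interpret(k12, k21, crit):
    """Combine the two one-sided EL tests into one conclusion.

    k12:  statistic for the alternative that S1 is stochastically larger
    k21:  statistic for the reverse direction
    crit: common critical value (both tests share the same null limit)
    """
    if k12 > crit and k21 > crit:
        return "crossing"      # evidence of crossing survival functions
    if k12 > crit:
        return "S1 larger"     # evidence that S1 is stochastically larger
    if k21 > crit:
        return "S2 larger"     # evidence that S2 is stochastically larger
    return "no evidence"       # neither direction rejects
```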
A formal test for crossing survival functions (against the null of stochastic ordering in either direction, S1 ⪯ S2 or S1 ⪰ S2) can be devised using the intersection-union principle, taking the minimum of the two one-sided test statistics as the test statistic. The R code (provided online) for implementing the one-sided test is readily adapted for this purpose, with critical values obtained from simulating
$$\min\left\{\sup_{x \in [x_1, x_2]} \frac{\{B^{+}(x)\}^{2}}{x(1-x)},\ \sup_{x \in [x_1, x_2]} \frac{\{B^{-}(x)\}^{2}}{x(1-x)}\right\},$$
where B− is the negative part of the Brownian bridge B.
3. Simulation study
In this section, we report the results of a simulation study for the two-sample case. We restrict our attention to one-sided tests, but results for the two-sided tests are similar. We first tabulate selected critical values, and then compare the performance of the EL test with the (one-sided) log-rank and WKM tests in terms of accuracy and power.
3.1. Critical values and accuracy
Quantiles of the limiting distribution in Lemma 4 of Appendix C are used as critical values for the EL test. These are computed by simulation based on 100,000 replications of a standard Brownian bridge over a fine grid on [0, 1] (100,000 equidistant points), for selected values of x1 and x2 (see Table 1).
Table 1:
Critical values for the EL test for selected x1, x2 and α.
| x2 \ x1 | 0.1 (α=0.01) | 0.1 (α=0.05) | 0.1 (α=0.1) | 0.15 (α=0.01) | 0.15 (α=0.05) | 0.15 (α=0.1) | 0.2 (α=0.01) | 0.2 (α=0.05) | 0.2 (α=0.1) |
|---|---|---|---|---|---|---|---|---|---|
| 0.975 | 11.822 | 8.255 | 6.648 | 11.672 | 8.074 | 6.489 | 11.542 | 7.953 | 6.365 |
| 0.98 | 11.912 | 8.329 | 6.720 | 11.758 | 8.159 | 6.556 | 11.619 | 8.028 | 6.442 |
| 0.985 | 11.996 | 8.415 | 6.807 | 11.851 | 8.253 | 6.658 | 11.739 | 8.131 | 6.532 |
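The tabulation scheme behind Table 1 can be sketched in Python as follows (our own scaled-down Monte Carlo illustration, with far fewer replications and grid points than the 100,000 used for the table, so the result only roughly approximates tabulated values such as 8.028 for x1 = 0.2, x2 = 0.98, α = 0.05):

```python
import numpy as np

def el_critical_value(x1=0.2, x2=0.98, alpha=0.05,
                      n_rep=5000, n_grid=2000, seed=0):
    """Approximate the upper-alpha quantile of
    sup_{x in [x1, x2]} {B^+(x)}^2 / {x (1 - x)}
    for a standard Brownian bridge B on [0, 1]."""
    rng = np.random.default_rng(seed)
    x = np.arange(1, n_grid) / n_grid       # interior grid points
    mask = (x >= x1) & (x <= x2)
    w = x[mask] * (1.0 - x[mask])
    sups = np.empty(n_rep)
    for k in range(n_rep):
        # Brownian motion W on the grid, then bridge B(x) = W(x) - x W(1)
        W = np.cumsum(rng.normal(scale=np.sqrt(1.0 / n_grid), size=n_grid))
        B = W[:-1] - x * W[-1]
        sups[k] = np.max(np.maximum(B[mask], 0.0) ** 2 / w)
    return np.quantile(sups, 1.0 - alpha)
```

Discretizing the supremum over a finite grid biases the estimate slightly downward, which is one reason a very fine grid is used for the published table.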
To compute empirical significance levels, we simulate lifetimes from the piecewise exponential distribution displayed as the solid line in the upper left panel of Figure 1. We consider an exponential censoring distribution, G1 = G2 = Exp(λ), where λ is chosen to give a censoring rate (CR) of 10% or 25%. Our one-sided EL statistic is compared with the one-sided log-rank statistic. Another class of tests for comparison is the one-sided WKM; following the recommendations of Pepe and Fleming (1989), we select the WKM statistic with the pooled variance estimator and the weight function they recommend.
Figure 1:
The piecewise exponential survival functions (top row) and the hazard functions (bottom row) in Model A (first column): S1 (solid) and S2 (dashed), and in Model B (second column): S1 (solid) and S2 (dashed).
Results on the size of our EL test are given in Table 2, where we use [x1, x2]=[0.2,0.98]. The test is slightly conservative in small samples but approaches the nominal level as the sample size increases. Such conservativeness has been seen in other maximal deviation-type statistics for stochastic ordering (Davidov and Herman, 2009). The empirical significance levels of the one-sided log-rank test and the WKM test under the same settings are closer to the nominal level, but sometimes on the anticonservative side.
Table 2:
Empirical significance levels based on 10,000 replications.
| CR | group size | EL (α=0.05) | log-rank (α=0.05) | WKM (α=0.05) | EL (α=0.01) | log-rank (α=0.01) | WKM (α=0.01) |
|---|---|---|---|---|---|---|---|
| 10% | 50 | 0.040 | 0.057 | 0.055 | 0.007 | 0.013 | 0.011 |
| 10% | 80 | 0.041 | 0.052 | 0.054 | 0.008 | 0.010 | 0.010 |
| 10% | 200 | 0.045 | 0.051 | 0.048 | 0.009 | 0.011 | 0.011 |
| 25% | 50 | 0.037 | 0.057 | 0.054 | 0.006 | 0.012 | 0.012 |
| 25% | 80 | 0.041 | 0.051 | 0.056 | 0.008 | 0.009 | 0.010 |
| 25% | 200 | 0.046 | 0.054 | 0.050 | 0.010 | 0.010 | 0.011 |
3.2. Power comparisons
In this section, we compare the small sample power of the proposed test with the one-sided WKM and log-rank tests. Two models of lifetime distributions are considered, both with piecewise-constant hazards. In Model A, the hazard functions cross while the survival functions still remain stochastically ordered (see Figure 1, first column). In this case, the one-sided log-rank test can fail to detect the difference between the survival curves because it is designed to detect ordered hazards. In Model B, the two groups have different hazards initially but the same hazard later on, so the difference between the survival functions gradually wears off (see Figure 1, second column). This is a common phenomenon which is also seen in our real data example in Section 4. For both models, we consider exponential and uniform censoring distributions: G1 = G2 = Exp(λ1) or Uniform(0, c1), with λ1 or c1 chosen to give a CR of 10% or 25% for group 1.
Results are given in Table 3 for the EL test using [x1, x2] = [0.2, 0.98]. Note that it outperforms the other tests in all the cases considered, especially in the crossing hazards scenario (Model A). The much lower power of WKM in Model A is surprising, because this test was shown to work well under crossing hazard alternatives in some previous simulation examples (Pepe and Fleming, 1989). The superior performance of our test may be due to two factors: first, our test is based on nonparametric likelihood, so it can be expected to be more powerful than tests that depend on an ad hoc weight function; second, we are using a maximal deviation-type statistic, rather than a weighted average, so our test may be more sensitive to local differences in the survival functions.
Table 3:
Power at α = 0.05 based on 10,000 replications. Model A: survival functions as in Figure 1, upper left panel. Model B: survival functions as in Figure 1, upper right panel.
| model | group size | test | exp. censoring (CR 10%) | exp. censoring (CR 25%) | unif. censoring (CR 10%) | unif. censoring (CR 25%) |
|---|---|---|---|---|---|---|
| Model A | 50 | EL | 0.851 | 0.833 | 0.849 | 0.834 |
| | | log-rank | 0.318 | 0.379 | 0.314 | 0.373 |
| | | WKM | 0.328 | 0.391 | 0.330 | 0.431 |
| | 80 | EL | 0.975 | 0.968 | 0.975 | 0.971 |
| | | log-rank | 0.416 | 0.503 | 0.415 | 0.501 |
| | | WKM | 0.426 | 0.507 | 0.433 | 0.569 |
| Model B | 50 | EL | 0.689 | 0.672 | 0.688 | 0.676 |
| | | log-rank | 0.625 | 0.659 | 0.621 | 0.650 |
| | | WKM | 0.521 | 0.583 | 0.521 | 0.613 |
| | 80 | EL | 0.876 | 0.862 | 0.877 | 0.869 |
| | | log-rank | 0.782 | 0.815 | 0.784 | 0.812 |
| | | WKM | 0.660 | 0.729 | 0.675 | 0.775 |
We have also investigated power under proportional hazards configurations, and our test closely matches the performance of the log-rank and WKM tests (results available upon request). These results show that for stochastically ordered alternatives, the proposed EL test can compete effectively with the log-rank and WKM tests, especially when the hazard functions cross.
Table 4 gives size and power for various choices of x1 and x2 reflecting light or heavy truncation. It is clear from the last two rows that light truncation on the left (small x1) results in poorer accuracy and lower power compared with the top row, which corresponds to our recommendation [x1, x2] = [0.2, 0.98]. Yet the performance is not very sensitive to the choice of x2, so our preference is to choose x2 close to 1 in order to reduce truncation.
Table 4:
Size and power for various choices of x1 and x2 based on 10,000 replications, α = 0.05, n1 = n2 = 50, and exponential censoring with censoring rate 10%. Model A: survival functions as in Figure 1, upper left panel. Model B: survival functions as in Figure 1, upper right panel. For size, only the solid survival functions are used.
| x1 | x2 | critical value | size (Model A) | size (Model B) | power (Model A) | power (Model B) |
|---|---|---|---|---|---|---|
| 0.2 | 0.98 | 8.028 | 0.040 | 0.040 | 0.851 | 0.689 |
| 0.2 | 0.8 | 6.879 | 0.037 | 0.039 | 0.890 | 0.703 |
| 0.02 | 0.98 | 8.829 | 0.029 | 0.028 | 0.806 | 0.628 |
| 0.02 | 0.8 | 8.048 | 0.023 | 0.025 | 0.838 | 0.612 |
4. Application
An RCT for the treatment of severe alcoholic hepatitis (Nguyen-Khac et al., 2011) is analyzed. The data are obtained by digitizing the published KM curves and reconstructing survival and censoring information using the algorithm developed by Guyot et al. (2012). The purpose of the trial was to assess whether a combination therapy of prednisolone plus N-acetylcysteine is better than prednisolone alone (the currently recommended treatment). A total of 174 patients were randomized to receive the combination (n1 = 85) or prednisolone alone (n2 = 89), and the primary endpoint was 6-month survival. The KM curves (see the top panel of Figure 2) suggest a stochastic ordering between the two groups.
Figure 2:
Estimates of survival functions (top) and cumulative hazards (bottom) for prednisolone plus N-acetylcysteine (solid line) versus prednisolone alone (dashed line).
Application of the one-sided EL test indicates that the combination therapy group has a stochastically larger survival pattern than patients receiving only prednisolone (p = 0.018). In comparison, the WKM and the one-sided log-rank tests yield p-values of 0.021 and 0.037, respectively. Examining the cumulative hazards plot (see the bottom panel of Figure 2), we can see that the slopes (i.e., hazards) of the two curves differ noticeably only during the initial 40 days. Such a scenario of an initial hazard difference was considered in Model B of Section 3.2, where our EL test was shown to be better adapted to detecting a difference between the two treatment groups.
Nguyen-Khac et al. (2011) used the two-sided log-rank test and reported a p-value of 0.07, concluding that the combination therapy does not improve 6-month survival. In contrast, our two-sided EL test shows that the two treatment groups are significantly different and that there is a uniformly higher survival rate in one of the groups (p = 0.036, computed by the supplementary R program that implements the two-sided EL test). In this case the EL test gives a more significant result that leads to a qualitatively different conclusion from the log-rank test.
5. Discussion
We have developed a class of EL-based tests for both one- and two-sided stochastically ordered alternatives under right censoring. The proposed test statistic for one-sided alternatives is the maximally selected local EL statistic and is asymptotically distribution-free. The test statistic for two-sided alternatives is taken as the maximum of the two one-sided test statistics. A simulation study shows that our test can be much more powerful than the log-rank and WKM tests under alternatives with crossing hazards. We applied our test to a RCT involving patients with severe alcoholic hepatitis and found a more significant result than the log-rank and WKM tests.
Our test statistics utilize a data-dependent interval [t1, t2], much like the data-dependent weight-function used in integral-type tests based on hazard or survival functions. Due to instability in the tails (caused by right-censoring), test statistics based on right-censored data invariably require such data-dependent tuning, and this feature cannot be avoided as far as we know. We could specify t1 and t2 in advance, but that would be inadvisable because of the instability in the test statistic arising when there are too few uncensored survival times outside the interval. However, in contrast to methods that rely on the selection of a complete weight function throughout follow-up (e.g., the WKM test), it is actually much easier and more transparent to select just the two tuning parameters (x1 and x2) needed in our case. Although t1 and t2 could be specified using a data-dependent rule (such as 5% of the data in each tail), this approach would have the disadvantage of needing tailor-made critical values for each dataset.
Our test targets stochastically ordered alternatives through construction of a non-parametric likelihood ratio (EL). It can be expected to be more powerful than commonly used two-sample tests that either are not tailored for such alternatives or depend on an ad hoc weight function. Moreover, it can provide more information about the nature of the difference between S1 and S2 compared to the omnibus alternative S1 ≠ S2, in which case the functional parameters S1 and S2 may be ordered in one direction at certain time points, but ordered in the reverse direction at other time points. Our test can be used to detect crossing survival functions by applying it in each possible direction; we recommend that the test be used in this way in order to distinguish stochastic ordering from crossing survival functions.
Our central contribution is the development of the first EL-based test for ordered survival functions in right-censored data settings, and we envision the test to be useful in clinical trials, in reliability engineering, and health policy applications. It would also be of interest to extend our approach to allow the testing of stochastic ordering in k-sample censored data settings, and to explore how it could be used for other types of ordering between distributions, such as increasing convex ordering, likelihood ratio ordering and uniform stochastic ordering (or hazard rate ordering).
Acknowledgements
Computing resources for this paper came from the Extreme Science and Engineering Discovery Environment (XSEDE) supported by NSF Grant OCI-1053575. Ian McKeague was partially supported by NIH Grant R01GM095722-01 and NSF Grant DMS-1307838. The authors thank Hammou El Barmi and the referees for numerous helpful comments.
Appendix A Derivation of the local EL statistic
We derive the local EL ratio (7) for the two-sample case. The one-sample case is similar and the proof is omitted.
First, we will obtain a closed-form expression for the denominator of (6) by the KKT method. After a log transformation, the optimization problem becomes minimizing
over $(h_{11},\dots,h_{m_1 1}, h_{12},\dots,h_{m_2 2}) \in [0,1]^m$ ($m = m_1 + m_2$) subject to the constraints
Since the domain [0, 1]m is convex, the objective and constraint functions are convex and differentiable, and Slater’s condition is satisfied, the KKT conditions are necessary and sufficient for optimality. More specifically, the Lagrangian is defined as a function such that
The optimal solution is denoted as (, , ), with the superscript indicating the correspondence of the denominator with H1. The dependence of the solution on t is omitted here for simplicity but will appear in the proof of Theorem 2 (see Appendix B) when the EL ratio is considered as a process indexed by t. The optimal solution must satisfy the KKT conditions:
| (8) |
| (9) |
| (10) |
| (11) |
which are known as stationarity, primal feasibility, dual feasibility, and complementary slackness, respectively. The stationarity condition yields for i = Nj(t) + 1, …, mj and
for i = 1, …, Nj(t), for each j = 1, 2. Define Dj = max over i = 1, …, Nj(t) of (dij − rij). Since (, ) should lie in the domain [0, 1]m, we have that , where Dj ≤ 0 for j = 1, 2.
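As a toy illustration of the KKT structure above (not the paper's exact objective, which is not reproduced in this excerpt), consider the standard one-sample discrete-hazard binomial log-likelihood: the unconstrained stationarity condition gives the hazard MLE h_i = d_i/r_i, and convexity makes this stationary point a global minimizer. The counts below are made up for illustration.

```python
import numpy as np

def neg_loglik(h, d, r):
    """Negative discrete-hazard log-likelihood:
    -sum_i [ d_i log h_i + (r_i - d_i) log(1 - h_i) ]."""
    return -np.sum(d * np.log(h) + (r - d) * np.log(1.0 - h))

d = np.array([3.0, 1.0, 2.0])   # event counts d_i (illustrative)
r = np.array([10.0, 6.0, 4.0])  # risk-set sizes r_i (illustrative)
h_mle = d / r                   # stationary point from the KKT conditions

# Convexity makes the stationary point a global minimizer: any feasible
# perturbation can only increase the objective.
rng = np.random.default_rng(0)
for _ in range(100):
    h_pert = np.clip(h_mle + rng.normal(scale=0.01, size=3), 1e-6, 1 - 1e-6)
    assert neg_loglik(h_pert, d, r) >= neg_loglik(h_mle, d, r)
print("unconstrained hazard MLE:", h_mle)
```

Under an added constraint linking the two samples, the stationary points acquire a Lagrange-multiplier tilt, which is the form derived above.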
The numerator of can be handled in a similar fashion. Denoting the optimal solution to the Lagrangian by (, , ), it turns out has the same form as but with replaced by , and only needs to satisfy and
| (12) |
Note that the estimated hazards after time t under no constraints, namely for v = 0, 1 and i = Nj(t) + 1, …, mj, are the same in the numerator and denominator, so these terms cancel. This leads to
| (13) |
We next further simplify by analyzing the relationship between and , namely by showing that when and when . Defining
for j = 1, 2 and
we can see that , satisfies , and satisfies . Notice that a(λ) is strictly increasing in λ on (D1, −D2), tending to 0 as λ ↓ D1 and to ∞ as λ ↑ −D2. Also, condition (11) implies that either or
| (14) |
must hold, and since (14) is equivalent to , we obtain that is either 0 or . These observations along with (9) and (10) imply the following:
Case 1: If , then by (10) we have . Since is either 0 or , we obtain that .
Case 2: If , then by monotonicity of a(λ) we have a(0) < 1. Suppose , then a(0) ≥ 1 by (9), which contradicts a(0) < 1. So we have .
Case 3: If , then because is either 0 or , we can see that . Then from (13) we have
This is exactly (7). We use the simplified notation and to replace and , respectively.
Another version of (7) will be used in the proof of Theorem 2: replacing and in (7) by and , respectively. This version is based on the equality of the events and , which can be seen by noting that a(λ) is strictly increasing, and .
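The monotonicity of a(λ) noted above makes the Lagrange multiplier straightforward to compute by bisection: a(λ) is strictly increasing on (D1, −D2), tends to 0 at D1 and to ∞ at −D2, so a(λ) = 1 has a unique root there. The exact form of a(·) involves quantities not reproduced in this excerpt; the stand-in below has the same monotonicity and boundary behaviour, purely for illustration.

```python
def solve_lambda(a, D1, negD2, tol=1e-10):
    """Unique root of a(lam) = 1 on the open interval (D1, -D2),
    assuming a is continuous, strictly increasing, with a -> 0 at D1
    and a -> infinity at -D2."""
    eps = 1e-12 * max(1.0, abs(D1), abs(negD2))
    lo, hi = D1 + eps, negD2 - eps  # stay inside the open interval
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if a(mid) < 1.0:
            lo = mid  # a is increasing, so the root lies to the right
        else:
            hi = mid
    return 0.5 * (lo + hi)

D1, D2 = -2.0, -3.0                       # Dj = max_i (d_ij - r_ij) <= 0
a = lambda lam: (lam - D1) / (-D2 - lam)  # illustrative increasing surrogate
lam_hat = solve_lambda(a, D1, -D2)        # analytic root here: (D1 - D2) / 2
print(lam_hat)
```

Any monotone root-finder (e.g., Brent's method) could be substituted; bisection is shown because it needs nothing beyond the monotonicity established above.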
Appendix B Proof of Theorem 2
We will need the following lemma giving an asymptotic expansion of the localized EL statistic in terms of and .
Lemma 3.
where the Op term holds uniformly in t over [t1, t2].
Proof. We first find the asymptotic order of uniformly for t ∈ [t1, t2], then we derive an asymptotic expansion of uniformly for t ∈ [t1, t2]. Next, by a Taylor series expansion, we approximate as a function of . Based on the two expansions, we obtain the desired result.
First, we find the asymptotic order of the Lagrange multiplier . Since comes from the numerator of the EL ratio (6), it satisfies the equality constraint (12). McKeague and Zhao (2002) studied the same Lagrange multiplier derived from optimizing the nonparametric likelihood under an equality constraint on the ratio of two survival functions, so by their Lemma A.1,
| (15) |
uniformly for t ∈ [t1, t2].
Next we derive an asymptotic expansion of . The expansion is obtained by Taylor expanding the l.h.s. of
and then rearranging terms. In detail, the j-th term (j = 1, 2) on the l.h.s. has, by an argument similar to that in Hollander et al. (1997, p. 225), the expansion
where Δj = 1 for j = 1 and −1 for j = 2. Combining the two terms and using nj/n → pj gives
Rearranging the terms, we have
| (16) |
Next, we find an asymptotic expansion of as a function of . We begin, based on (7), by writing as
times an indicator . The j-th term above has, by an argument similar to that in Li (1995, p. 102), the expansion
for j = 1, 2. Using nj/n → pj, and the fact that is equivalent to , we can combine the terms for j = 1, 2 and obtain
This and (16) give the desired result. □
Remark. Lemma 3 shows that is asymptotically equivalent to squaring the positive part of a scaled difference between the log of KM estimators from the two samples. The inclusion of only the positive part of the difference can be attributed to the stochastically ordered form of our alternative hypothesis. We have compared the small sample performance of Kn and its counterpart based on this squared difference (results not shown), and it turns out the latter tends to be too conservative.
The advantage of using the EL approach, as opposed to a test statistic derived from the first term in the expansion of Lemma 3, is that we expect higher-order accuracy (cf. Hall and La Scala, 1990). This is parallel to the parametric result in which the likelihood ratio test is asymptotically equivalent to the Wald test, but the former has better higher-order accuracy (see, e.g., Mukerjee, 1994).
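To make the Wald-type counterpart in Lemma 3 concrete, the sketch below computes Kaplan–Meier estimates for two simulated censored samples and an unscaled version of the statistic: the squared positive part of the difference of their logs on a time grid. The scaling by n and by a variance estimate, and the sign convention for the one-sided alternative, follow the paper's definitions and are omitted here; the simulated data are purely illustrative.

```python
import numpy as np

def kaplan_meier(times, events, grid):
    """KM survival estimate evaluated at each point of `grid`
    (assumes continuous data, so ties are negligible)."""
    order = np.argsort(times)
    t_sorted, e_sorted = times[order], events[order]
    surv, s, idx, at_risk = [], 1.0, 0, len(times)
    for g in grid:
        while idx < len(times) and t_sorted[idx] <= g:
            if e_sorted[idx] == 1:
                s *= 1.0 - 1.0 / at_risk
            at_risk -= 1
            idx += 1
        surv.append(s)
    return np.array(surv)

rng = np.random.default_rng(1)
life1, life2 = rng.exponential(1.5, 50), rng.exponential(1.0, 50)
cens1, cens2 = rng.exponential(3.0, 50), rng.exponential(3.0, 50)
obs1, ev1 = np.minimum(life1, cens1), (life1 <= cens1).astype(int)
obs2, ev2 = np.minimum(life2, cens2), (life2 <= cens2).astype(int)

grid = np.linspace(0.1, 1.2, 25)
S1 = kaplan_meier(obs1, ev1, grid)
S2 = kaplan_meier(obs2, ev2, grid)
# guard against zero tails before taking logs
logdiff = np.log(np.maximum(S1, 1e-12)) - np.log(np.maximum(S2, 1e-12))
stat = np.max(np.maximum(logdiff, 0.0) ** 2)  # squared positive part
print(stat)
```
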
We now complete the proof of Theorem 2.
We first obtain the weak convergence of as a process on [t1, t2], based on Lemma 3 and large sample properties of the KM estimator. Then by a transformation of the limiting process and the continuous mapping theorem, we get the limiting distribution of Kn.
To obtain the limit process of , we begin by finding the weak convergence of , as the asymptotic expansion of in Lemma 3 suggests. For each j = 1, 2, it has been shown (see, e.g., Andersen et al., 1993, pp. 191 and 263) that
as n → ∞ on D[0, t2], where Uj(t) is a Gaussian martingale with Uj(0) = 0 and . Therefore, under H0, the continuous mapping theorem implies
| (17) |
where U(t) is a Gaussian martingale with U(0) = 0 and cov(U(s), U(t)) = σ2(min(s, t)).
Next, we establish the weak convergence of . By (17) and the continuous mapping theorem, we have
in D[t1, t2], where U+ = max(U, 0). Then by the uniform consistency of with respect to σ2(t) and Slutsky’s Lemma, we have
in D[t1, t2]. This and Lemma 3 imply
| (18) |
in D[t1, t2].
Lastly, the asymptotic null distribution of Kn is obtained as follows. First notice that
are both zero mean Gaussian processes with the same covariance function, so they have the same distribution. We then have equal in distribution to
This, together with (17) and the continuous mapping theorem, implies that converges in distribution to
The result follows from noticing that the r.h.s. of the above is the same as
where x1 = b(t1) and x2 = b(t2).
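The limiting null distribution can be approximated by Monte Carlo. The functional simulated below, sup over [x1, x2] of (B(x)+)² / (x(1 − x)) for a Brownian bridge B, is an assumed standardized one-sided form consistent with the time change x = b(t); the paper's own display (not reproduced in this excerpt) should be used in any real implementation.

```python
import numpy as np

def critical_value(x1, x2, alpha=0.05, nsim=5000, ngrid=500, seed=0):
    """(1 - alpha)-quantile of the simulated sup functional of a
    Brownian bridge restricted to [x1, x2]."""
    rng = np.random.default_rng(seed)
    grid = np.linspace(0.0, 1.0, ngrid + 1)
    dx = np.diff(grid)
    mask = (grid >= x1) & (grid <= x2)
    xs = grid[mask]
    sups = np.empty(nsim)
    for k in range(nsim):
        incr = rng.normal(scale=np.sqrt(dx))
        W = np.concatenate(([0.0], np.cumsum(incr)))  # Brownian motion
        B = W - grid * W[-1]                          # Brownian bridge
        sups[k] = np.max(np.maximum(B[mask], 0.0) ** 2 / (xs * (1.0 - xs)))
    return np.quantile(sups, 1.0 - alpha)

print(round(critical_value(0.1, 0.9), 2))
```

Increasing `nsim` and `ngrid` trades computing time for accuracy; the grid should be fine enough that the discretized sup does not materially undershoot the continuous one.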
Appendix C Validating the calibration procedure
The following result justifies the approach of pre-specifying [x1, x2] and estimating [t1, t2], as outlined in Section 2.2.1.
Lemma 4. Suppose S0 is continuous. Then under H0, for 0 < x1 < x2 < 1,
provided b−1(·) is continuous at x1 and x2, where is just Kn with t1 and t2 replaced by and , respectively.
Proof. The idea is to obtain the joint convergence of , and , and then to apply the continuous mapping theorem.
First, we show the weak convergence of . We will apply (18) in the proof of Theorem 2, but we need to translate the conditions to be in terms of x1 and x2 instead of t1 and t2. Given 0 < x1 < x2 < 1 at which b−1(·) is continuous, it suffices to show that t1 = b−1(x1) and t2 = b−1(x2) satisfy the conditions S0(t1) < 1 and S0(t2)Gj(t2) > 0 for j = 1, 2. To show S0(t1) < 1, we simply use b(t1) = x1 > 0, which implies σ2(t1) > 0 and thus S0(t1) < 1. To show S0(t2)Gj(t2) > 0 for j = 1, 2, we argue by contradiction. Suppose S0(t2)Gj(t2) = 0 for some j = 1, 2. Since b is continuous (by continuity of S0) and nondecreasing, we can pick an ϵ < 1 – x2 and δ small enough such that x2 ≤ b(t2 + δ) < x2 + ϵ < 1. Because b−1 is continuous at x2, there is no “flat” of b around t2, and thus δ can be chosen so that b is strictly increasing in [t2, t2 + δ]. This and S0(t2)Gj(t2) = 0 lead to b(t2 + δ) = 1, which contradicts b(t2 + δ) < x2 + ϵ < 1. So we have S0(t2)Gj(t2) > 0 for j = 1, 2, as required.
Next, we show for j = 1, 2. The proof makes use of the theory of Z-estimators (see, e.g., van der Vaart, 2000, Theorem 5.9). Let , Ψ(t) = b(t) – x1, and Θ = [τ1, τ2] such that [t1, t2] ⊂ Θ ⊂ (0, ∞). We already know Ψn(t1) = op(1) and Ψ(t1) = 0. It suffices to show that and inft:∣t–t1∣≥ϵ∣Ψ(t)∣ > 0 for all ϵ > 0. The former is implied by the uniform consistency of (and thus b), and the latter by the continuity of b−1 at x1. Therefore we have . The same argument applies to .
Lastly, the asymptotic null distribution of is obtained as follows. From the weak convergence of and for j = 1, 2, we have the joint convergence in D[t1, t2] × Θ2 (see, e.g., van der Vaart, 2000, Theorem 18.10 (v)). Then applying a similar argument in the last part of the proof for Theorem 2 and the continuous mapping theorem, we get the desired result. □
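The plug-in step validated by Lemma 4 amounts to inverting a nondecreasing estimate of b at the pre-specified levels x1 and x2. A minimal sketch, with a smooth toy function standing in for the estimator of b from Section 2.2.1 (which is not reproduced here):

```python
import numpy as np

def generalized_inverse(grid, bvals, x):
    """Smallest grid point t with bvals(t) >= x, for nondecreasing bvals
    (the generalized inverse / Z-estimator root on a grid)."""
    idx = np.searchsorted(bvals, x, side="left")
    return grid[min(idx, len(grid) - 1)]

grid = np.linspace(0.0, 5.0, 501)  # time grid, step 0.01
b_hat = 1.0 - np.exp(-grid)        # toy nondecreasing b; inverse is -log(1 - x)
t1_hat = generalized_inverse(grid, b_hat, 0.2)
t2_hat = generalized_inverse(grid, b_hat, 0.8)
print(t1_hat, t2_hat)
```

Continuity of b⁻¹ at x1 and x2, as required by Lemma 4, is what makes this grid inversion stable as the grid is refined.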
Supplementary material
R programs implementing the procedures developed in this article are available online.
References
- Andersen PK, Borgan Ø, Gill RD, and Keiding N (1993). Statistical Models Based on Counting Processes. New York: Springer.
- Andrews DWK and Guggenberger P (2009). Validity of subsampling and plug-in asymptotic inference for parameters defined by moment inequalities. Econometric Theory, 25:669–709.
- Boyd S and Vandenberghe L (2004). Convex Optimization. Cambridge University Press.
- Canay IA (2010). EL inference for partially identified models: Large deviations optimality and bootstrap validity. Journal of Econometrics, 156(2):408–425.
- Davidov O, Fokianos K, and Iliopoulos G (2010). Order-restricted semiparametric inference for the power bias model. Biometrics, 66(2):549–557.
- Davidov O and Herman A (2009). New tests for stochastic order with application to case control studies. Journal of Statistical Planning and Inference, 139(8):2614–2623.
- Davidov O and Herman A (2012). Ordinal dominance curve based inference for stochastically ordered distributions. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 74(5):825–847.
- Dykstra RL (1982). Maximum likelihood estimation of the survival functions of stochastically ordered random variables. Journal of the American Statistical Association, 77(379):621–628.
- Einmahl JHJ and McKeague IW (2003). Empirical likelihood based hypothesis testing. Bernoulli, 9(2):267–290.
- El Barmi H (1996). Empirical likelihood ratio test for or against a set of inequality constraints. Journal of Statistical Planning and Inference, 55(2):191–204.
- El Barmi H and McKeague IW (2013). Empirical likelihood based tests for stochastic ordering. Bernoulli, 19:295–307.
- El Barmi H and Mukerjee H (2005). Inferences under a stochastic ordering constraint: The k-sample case. Journal of the American Statistical Association, 100(469):252–261.
- Gill RD (1980). Censoring and Stochastic Integrals. Mathematisch Centrum.
- Guyot P, Ades A, Ouwens M, and Welton N (2012). Enhanced secondary analysis of survival data: reconstructing the data from published Kaplan–Meier survival curves. BMC Medical Research Methodology, 12(1):9.
- Hall P and La Scala B (1990). Methodology and algorithms of empirical likelihood. International Statistical Review, 58(2):109–127.
- Hollander M, McKeague IW, and Yang J (1997). Likelihood ratio-based confidence bands for survival functions. Journal of the American Statistical Association, 92:215–226.
- Kitamura Y, Santos A, and Shaikh AM (2012). On the asymptotic optimality of empirical likelihood for testing moment restrictions. Econometrica, 80(1):413–423.
- Li G (1995). On nonparametric likelihood ratio estimation of survival probabilities for censored data. Statistics & Probability Letters, 25:95–104.
- McKeague IW and Zhao Y (2002). Simultaneous confidence bands for ratios of survival functions via empirical likelihood. Statistics & Probability Letters, 60:405–415.
- Mukerjee R (1994). Comparison of tests in their original forms. Sankhyā: The Indian Journal of Statistics, Series A, 56(1):118–127.
- Murphy SA (1995). Likelihood ratio-based confidence intervals in survival analysis. Journal of the American Statistical Association, 90(432):1399–1405.
- Nguyen-Khac E, Thevenot T, Piquet M-A, Benferhat S, Goria O, Chatelain D, Tramier B, Dewaele F, Ghrib S, Rudler M, Carbonell N, Tossou H, Bental A, Bernard-Chabert B, and Dupas J-L (2011). Glucocorticoids plus N-acetylcysteine in severe alcoholic hepatitis. New England Journal of Medicine, 365(19):1781–1789.
- Owen AB (1988). Empirical likelihood ratio confidence intervals for a single functional. Biometrika, 75(2):237–249.
- Owen AB (2001). Empirical Likelihood. Chapman & Hall/CRC.
- Park Y, Kalbfleisch JD, and Taylor JMG (2012a). Constrained nonparametric maximum likelihood estimation of stochastically ordered survivor functions. Canadian Journal of Statistics, 40(1):22–39.
- Park Y, Taylor JMG, and Kalbfleisch JD (2012b). Pointwise nonparametric maximum likelihood estimator of stochastically ordered survivor functions. Biometrika, 99(2):327–343.
- Pepe MS and Fleming TR (1989). Weighted Kaplan–Meier statistics: a class of distance tests for censored survival data. Biometrics, 45(2):497–507.
- Thomas DR and Grunkemeier GL (1975). Confidence interval estimation of survival probabilities for censored data. Journal of the American Statistical Association, 70:865–871.
- van der Vaart AW (2000). Asymptotic Statistics. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press.
- Wang Q-H and Jing B-Y (2001). Empirical likelihood for a class of functionals of survival distribution with censored data. Annals of the Institute of Statistical Mathematics, 53:517–527.
- Yu W, El Barmi H, and Ying Z (2011). Restricted one way analysis of variance using the empirical likelihood ratio test. Journal of Multivariate Analysis, 102(3):629–640.