Abstract
This paper develops an empirical likelihood approach to testing for stochastic ordering between two univariate distributions under right censorship. The proposed test is based on a maximally selected local empirical likelihood statistic. The asymptotic null distribution is expressed in terms of a Brownian bridge. The new procedure is shown via a simulation study to have superior power to the log-rank and weighted Kaplan–Meier tests under crossing hazard alternatives. The approach is illustrated using data from a randomized clinical trial involving the treatment of severe alcoholic hepatitis.
Keywords: Crossing survival/hazard functions, Order restricted inference, Survival analysis, Two-sample problem
1. Introduction
When comparing survival patterns between two treatment groups in a randomized clinical trial (RCT), it is often of interest to examine whether one group has a uniformly higher survival rate. For example, in a recent RCT involving patients with severe alcoholic hepatitis, the objective was to compare a combination therapy of prednisolone plus N-acetylcysteine with prednisolone alone. Testing whether the combination therapy has a consistently higher (or lower) survival rate throughout the follow-up period addresses this issue directly, as opposed to the standard practice of testing against an omnibus alternative (i.e., any difference between the survival functions). This paper develops such a testing procedure, allowing an ordering between two survival curves to be established uniformly over time.
We frame our approach in terms of the classical notion of stochastic ordering. Namely, a survival function S1 is said to be stochastically larger than another survival function S2 if S1(t) ≥ S2(t) for all t ≥ 0; this is denoted as S1 ⪰ S2. We consider the problem of testing the two-sided alternative
$$H_0\colon S_1 = S_2 \quad \text{versus} \quad H_1\colon S_1 \succ S_2 \ \text{or} \ S_2 \succ S_1 \qquad (1)$$
based on right-censored random samples from each population (≻ denotes ⪰ with strict inequality for some t). Our approach will first be developed for testing the one-sided alternative
$$H_0\colon S_1 = S_2 \quad \text{versus} \quad H_1\colon S_1 \succ S_2 \qquad (2)$$
and then extended to the two-sided alternative using the union-intersection principle. Our approach also leads to a test for the null hypothesis of stochastic ordering (S1 ⪯ S2 or S1 ⪰ S2) versus the alternative of crossing survival functions.
Commonly used two-sample tests for censored data include the log-rank test and weighted Kaplan–Meier (WKM) tests (Pepe and Fleming, 1989); both can be carried out in one-sided or two-sided versions. The log-rank test is based on an integrated weighted difference between hazard functions, and is thus designed to detect ordered hazards rather than more general stochastic ordering. Other tests based on weighted differences between hazard functions, such as the class of weighted log-rank statistics studied by Gill (1980), share this property. The WKM class of tests targets stochastically ordered alternatives by estimating an integrated weighted difference between survival functions, but such test statistics depend on an ad hoc weight function that needs to be specified throughout follow-up.
We derive our procedure using the empirical likelihood (EL) method. EL involves forming a ratio of two nonparametric likelihoods subject to constraints on the parameters of interest. The method originates with Thomas and Grunkemeier (1975), who constructed pointwise confidence intervals for survival functions from right-censored data. EL has also been used to provide confidence regions for parameters defined by estimating equations (Owen, 1988, 2001), in numerous censored and uncensored settings. EL enjoys many appealing properties: highly accurate confidence regions, self-studentization and the possibility of Bartlett correctability. There is also evidence that EL-based tests have optimal power (see, e.g., Kitamura et al., 2012). On the other hand, order restricted inference is known to be challenging for EL (see, e.g., Owen, 2001, Ch. 10), and much less has been done in this direction. El Barmi (1996) explored EL tests for order-restricted hypotheses of the form g(θ) ≤ 0, where g is some smooth function and θ is a finite-dimensional parameter specified by estimating equations (see also Yu et al., 2011). Other recent contributions in this direction have been made by Andrews and Guggenberger (2009) and Canay (2010). As for order restrictions on distribution functions, El Barmi and McKeague (2013) studied EL-based tests for stochastic ordering, while Davidov et al. (2010) investigated EL-based tests for likelihood ratio ordering under a semiparametric biased sampling model. However, these tests are limited to uncensored data.
Our contribution is to provide a class of EL-based tests for stochastic ordering for right-censored data. First consider the one-sided alternative in (2). The idea is to construct a localized EL statistic for H0(t): S1(t) = S2(t) versus H1(t): S1(t) > S2(t) at each given t. The key step in this construction is to recast the stochastic ordering constraint into an inequality involving a single Lagrange multiplier. The proposed test then rejects H0 for large values of the maximally selected EL statistic. A maximally selected test statistic is used (as opposed to an integral-type statistic) because it is more sensitive to local differences between the survival functions. Kolmogorov–Smirnov type test statistics (not based on EL) for stochastic ordering have been proposed by El Barmi and Mukerjee (2005) and Davidov and Herman (2009). Besides localization, another possible approach might be to use the full nonparametric likelihood (Dykstra, 1982; Park et al., 2012a) and compute its ratio under S1 ≻ S2 versus S1 = S2; however, we find the localization approach to be much more tractable. Localization has been used by Einmahl and McKeague (2003), Davidov and Herman (2012) and El Barmi and McKeague (2013) for testing various nonparametric hypotheses, except that those papers considered integral-type test statistics and restricted attention to uncensored data. Park et al. (2012b) proposed a localized NPMLE under stochastic ordering (for right-censored data), but its asymptotic distribution is not known, so it is unclear how a formal test could be developed using their approach.
Various ways of formulating EL in right-censored data settings have been proposed. The standard approach for censored data (Thomas and Grunkemeier, 1975; Li, 1995) maximizes the censored-data likelihood subject to constraint(s) on the parameter of interest. Wang and Jing (2001) instead used the nonparametric likelihood for uncensored data, plugging in the Kaplan–Meier (KM) estimator of the censoring distribution. We use the former approach as it is tractable and more natural in our setting. There are in fact two different versions of EL for censored data, namely the binomial and Poisson versions (see, e.g., Murphy, 1995); we utilize the binomial version.
The paper is organized as follows. In Section 2.1 we set up the general framework and notation to be used throughout the paper. While our focus is on the two-sample test in Section 2.3, for clarity of exposition the one-sample test will be introduced first (in Section 2.2). Various extensions are discussed in Section 2.4: stochastic ordering in the null hypothesis, two-sided alternatives, and crossing survival functions. Section 3 presents results from a simulation study: the proposed two-sample EL test is shown to outperform the log-rank and WKM tests under different stochastically ordered alternatives, including alternatives with crossing hazards. Application of the proposed test to the randomized clinical trial (RCT) mentioned earlier is given in Section 4, and some concluding remarks are placed in Section 5.
2. EL tests for stochastic ordering under right censorship
2.1. Preliminaries
We begin by introducing notation for the one-sample case. Let Xi and Ci for i = 1,…, n be i.i.d. with unknown survival functions S and G, respectively; only min(Xi, Ci) and I(Xi ≤ Ci) are observed. The lifetimes Xi and the censoring times Ci are assumed to be independent. Also, S is assumed to be continuous and S(0) = G(0) = 1. Order the distinct uncensored lifetimes as 0 < T1 < … < Tm < ∞. For each Ti (i = 1,…, m), let ri be the number at risk just before Ti, di the number of deaths at Ti, and hi the hazard at Ti. Let N(t) be the number of observed lifetimes that are less than or equal to t. Then the nonparametric likelihood (depending on the unknown survival function) supported on the observed lifetimes is proportional to
$$L(S) = \prod_{i=1}^{m} h_i^{d_i} (1 - h_i)^{r_i - d_i} \qquad (3)$$
for hi ∈ [0,1]. The NPMLE for S(t), namely the KM estimator , is asymptotically normal with variance S2(t)S2(t), where . This variance can be consistently estimated by the well-known Greenwood formula, , where .
For the two-sample case, we use a similar framework to the one-sample setup, with a further subscript j indicating the j-th sample in the corresponding notation. The nonparametric likelihood is proportional to L(S1, S2) ≡ L(S1)L(S2). Additionally, the sample proportion nj/n is assumed to converge to some pj > 0, where n = n1 + n2. The pooled variance estimate σ̂²(t) now equals the weighted average (n/n1)σ̂₁²(t) + (n/n2)σ̂₂²(t), consistently estimating σ²(t) = σ₁²(t)/p₁ + σ₂²(t)/p₂.
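The quantities ri, di, Ŝ(t) and the Greenwood estimate σ̂²(t) defined above can be computed directly from a right-censored sample. The following Python sketch is our own illustration (the paper's supplementary code is in R, and the function and variable names here are ours, not the paper's):

```python
import numpy as np

def km_greenwood(times, events):
    """Kaplan-Meier estimate and Greenwood variance terms.

    times  : observed times min(X_i, C_i)
    events : 1 if the lifetime was observed (X_i <= C_i), 0 if censored
    Returns the distinct uncensored times T_i, at-risk counts r_i,
    death counts d_i, the KM estimate S_hat(T_i), and
    sigma2_hat(T_i) = n * sum_{T_j <= T_i} d_j / {r_j (r_j - d_j)}.
    """
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=int)
    n = len(times)
    T = np.unique(times[events == 1])                    # ordered uncensored lifetimes
    r = np.array([(times >= t).sum() for t in T])        # number at risk just before T_i
    d = np.array([((times == t) & (events == 1)).sum() for t in T])  # deaths at T_i
    S = np.cumprod(1.0 - d / r)                          # KM estimator at each T_i
    sigma2 = n * np.cumsum(d / (r * (r - d)))            # Greenwood-type variance estimate
    return T, r, d, S, sigma2
```

For instance, with observed times 1,…,5 and only the last observation censored, the KM estimate steps down through 0.8, 0.6, 0.4, 0.2, and σ̂²(4) = 5(1/20 + 1/12 + 1/6 + 1/2) = 4.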
2.2. One-sample case
Suppose we wish to compare the survival function S with a given survival function S0 for evidence of stochastic ordering. Formally, consider testing the null hypothesis H0: S = S0 versus the alternative hypothesis H1: S ≻ S0. Our procedure is to first construct a test statistic for the “local” hypotheses H0(t): S(t) = S0(t) versus H1(t): S(t) > S0(t) for a given t, and then to address the general hypotheses via a functional of the local statistics.
To construct the local test statistic at time t, consider the EL ratio
$$\mathcal{R}(t) = \frac{\sup\{L(S)\colon S(t) = S_0(t)\}}{\sup\{L(S)\colon S(t) \ge S_0(t)\}},$$
where we use the conventions sup ∅ = 0 and 0/0 = 1. This follows the formulation of Thomas and Grunkemeier (1975) except with a one-sided alternative. Note that the numerator and denominator of R(t) maximize (3) over (h1,… , hm) ∈ [0,1]m subject to the constraints
$$\prod_{i \le N(t)} (1 - h_i) = S_0(t) \quad\text{and}\quad \prod_{i \le N(t)} (1 - h_i) \ge S_0(t), \qquad (4)$$
respectively. We solve this constrained maximization problem using the Karush–Kuhn–Tucker (KKT) method (Boyd and Vandenberghe, 2004), a generalization of the Lagrange method that allows inequality constraints. As the constraints are placed only on the lifetimes up to t, the terms after t turn out to be the same in both the numerator and denominator and thus cancel out. Also, for some t the maximum is attained on the boundary of the constraint set, in which case = 1. Specifically, in Appendix A we establish the following expression for the EL ratio:
where , and the Lagrange multiplier is determined by the equality in (4) when hi is replaced with . Here we have suppressed the dependence of and on t.
Based on the above expression, we can derive large sample properties of the local EL test statistic −2 log R(t). This is done by approximating −2 log R(t) via a Taylor expansion as a function of the difference between log Ŝ(t) (recall from Section 2.1 that Ŝ is the KM estimator) and log S0(t). We then make use of asymptotic properties of Ŝ to establish the weak convergence of −2 log R(t). The asymptotic null distribution turns out to be chi-bar square. Namely, for t such that 0 < S0(t) < 1 and G(t) > 0,
$$-2\log\mathcal{R}(t) \xrightarrow{d} (Z^{+})^{2}$$
under H0(t), where Z ~ N(0,1) and Z+ = max(Z, 0). This result can be used to test the local hypotheses H0(t) versus H1(t).
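Since (Z⁺)² places mass ½ at zero and otherwise follows a χ²₁ distribution, the local p-value is half of the upper χ²₁ tail. A minimal Python check of this arithmetic (scipy assumed available; `local_pvalue` is our own name, not from the paper):

```python
from scipy.stats import chi2, norm

def local_pvalue(stat):
    """p-value for the local test: P((Z^+)^2 > stat) = 0.5 * P(chi^2_1 > stat)."""
    return 0.5 * chi2.sf(stat, df=1)

# The alpha = 0.05 local critical value solves P(Z > sqrt(c)) = 0.05,
# i.e. c = norm.ppf(0.95)**2, approximately 2.71.
critical_value = norm.ppf(0.95) ** 2
```

Evaluating `local_pvalue` at this critical value returns 0.05, confirming the half-tail rule.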
To test for the alternative of stochastic ordering, we propose the following maximally selected EL statistic:
$$K_n = \sup_{t \in [t_1, t_2]} \{-2\log\mathcal{R}(t)\}, \qquad (5)$$
where 0 < t1 < t2 < ∞ are to be specified. We suppress the dependence of Kn on t1 and t2. Guidance on the choice of [t1, t2] is provided later.
Our first result gives the asymptotic null distribution of Kn. The proof is omitted, because it is similar to the two-sample case (presented in Appendix B).
Theorem 1. Suppose S0 is continuous. Then under H0, for t1 and t2 satisfying S0(t1) < 1 and S0(t2)G(t2) > 0,
$$K_n \xrightarrow{d} \sup_{x \in [x_1, x_2]} \frac{\{B^{+}(x)\}^{2}}{x(1-x)},$$
where B is a standard Brownian bridge on [0,1], B+ = max(B, 0), xj = b(tj) for j = 1, 2, and b(t) = σ2(t)/{1 + σ2(t)}.
To implement the test, we pre-specify one of the intervals [t1, t2] or [x1, x2] = [b(t1), b(t2)] and determine the other via b(t) or b−1(x) = inf{t : b(t) ≥ x}. However, b is unknown, so one of the two intervals has to be estimated. If we fix [t1, t2] and estimate [x1, x2] (by [x̂1, x̂2] say), then we cannot tabulate critical values in advance, because [x̂1, x̂2] varies across different data sets. On the other hand, pre-determining [x1, x2] allows “universal” critical values, and this is the approach we take. Both the choice of [x1, x2] and details of implementation will be provided in the next subsection.
2.2.1. Calibrating the test
This section discusses issues in calibrating the test. The first is the choice of [x1, x2]. Second, having chosen [x1, x2], we explain how to estimate [t1, t2] and implement the proposed EL test. Justification for this calibration procedure will be provided for the two-sample case in Appendix C (the justification is similar for the one-sample case), where a version of Kn with estimated [t1, t2] is defined. Critical values for the test are then obtained via simulation in Section 3.
The choice of [x1, x2] is important because the interval width can affect the power of the EL test. In a similar context, this issue has been discussed by Davidov and Herman (2009); they proposed a (non-EL-based) test of stochastic ordering for uncensored data via localization, and pointed out that a narrower [x1, x2] gives smaller critical values, but may fail to capture deviations (from H0) outside the interval. Our simulation study (in Section 3) shows that the choice x1 = 0.2 and x2 = 0.98 performs well in terms of balancing power and accuracy, and this is what we recommend in practice.
Having specified [x1, x2], we need to estimate [t1, t2]. Under suitable conditions on b−1, tl can be consistently estimated by t̂l = b̂−1(xl) for l = 1, 2, where
$$\hat b(t) = \frac{\hat\sigma^2(t)}{1 + \hat\sigma^2(t)}$$
is a consistent estimator of b(t). We can then compute Kn accordingly, based on the estimates t̂1 and t̂2. To ensure stability of Kn in small samples, we further modify [t̂1, t̂2] so that values of t̂l outside the interval [T1, Tm] (recall from Section 2.1 that these are the smallest and largest observed lifetimes) are discarded. Note that this modification makes no difference asymptotically, since t̂l falls in [T1, Tm] eventually.
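The estimation step t̂l = b̂−1(xl) amounts to locating the first jump point at which the nondecreasing step function b̂ reaches xl. A Python sketch (names are ours; `T` and `sigma2_hat` would come from a KM computation as in Section 2.1):

```python
import numpy as np

def estimate_t(T, sigma2_hat, x):
    """First uncensored lifetime T_i at which
    b_hat(T_i) = sigma2_hat_i / (1 + sigma2_hat_i) reaches x.
    Returns None if x is never reached (the statistic is then
    restricted to the observed range [T_1, T_m])."""
    b_hat = sigma2_hat / (1.0 + sigma2_hat)     # nondecreasing in t
    idx = np.searchsorted(b_hat, x, side="left")
    return T[idx] if idx < len(T) else None
```

For example, if σ̂² takes the values 0.05, 0.25, 1.0, 4.0 at lifetimes 1, 2, 3, 4, then b̂ steps through roughly 0.048, 0.2, 0.5, 0.8, so x = 0.2 maps to t̂ = 2, while x = 0.98 is never reached.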
2.3. Two-sample case
We now adapt our approach to the two-sample case. The “local” hypotheses are H0(t): S1(t) = S2(t) versus H1(t): S1(t) > S2(t) for given t. The local EL ratio at time t is defined to be
$$\mathcal{R}(t) = \frac{\sup\{L(S_1)L(S_2)\colon S_1(t) = S_2(t)\}}{\sup\{L(S_1)L(S_2)\colon S_1(t) \ge S_2(t)\}}. \qquad (6)$$
The numerator and denominator optimize L(S1)L(S2) subject to the corresponding constraints on ∏_{i ≤ Nj(t)}(1 − hij) for each sample j = 1, 2. As before, an explicit form of the EL ratio can be obtained via the Lagrange method (see Appendix A for more details):
$$\mathcal{R}(t) = \left[\prod_{j=1}^{2}\prod_{i \le N_j(t)} \left(\frac{\tilde h_{ij}}{\hat h_{ij}}\right)^{d_{ij}} \left(\frac{1-\tilde h_{ij}}{1-\hat h_{ij}}\right)^{r_{ij}-d_{ij}}\right]^{I(\hat\lambda < 0)}, \qquad (7)$$
where ĥij = dij/rij, and the constrained hazard estimates h̃ij and the Lagrange multiplier λ̂ are given in Appendix A. The local EL test statistic −2 log R(t) is shown to converge in distribution to chi-bar square under H0(t), a direct consequence of (18) in the proof of the next theorem.
To test H0 vs. H1, we propose the maximally selected EL statistic Kn as in (5), except that R(t) is now given in (7). The following result gives the asymptotic null distribution of Kn (see Appendix B for the proof).
Theorem 2. Suppose H0 holds and the common survival function S0 is continuous. For t1 and t2 satisfying S0(t1) < 1 and S0(t2)Gj(t2) > 0 for j = 1, 2,
$$K_n \xrightarrow{d} \sup_{x \in [x_1, x_2]} \frac{\{B^{+}(x)\}^{2}}{x(1-x)},$$
where B is a standard Brownian bridge on [0, 1], B+ = max(B, 0), xj = b(tj) for j = 1, 2, and b(t) = σ2(t)/{1 + σ2(t)}.
As in the one-sample case, we pre-specify [x1, x2] and estimate [t1, t2] when implementing the test. Justification for this calibration procedure is provided in Appendix C, where a version of Kn with estimated [t1, t2] is defined. The issues discussed in Section 2.2.1 carry over as well.
2.4. Extensions
2.4.1. Stochastically ordered null
We have developed our EL test by treating S1 = S2 as the null hypothesis. In this section we describe how our approach also applies to the (broader) null hypothesis S1 ⪯ S2 versus the alternative S1 ≻ S2. The local EL ratio in this case (in contrast to R(t) in (6)) takes the form
$$\widetilde{\mathcal{R}}(t) = \frac{\sup\{L(S_1)L(S_2)\colon S_1(t) \le S_2(t)\}}{\sup\{L(S_1)L(S_2)\}},$$
where now there is an inequality constraint in the numerator, and no constraint in the denominator because the union of the local null and alternative hypotheses removes any restriction on (S1(t), S2(t)) ∈ [0, 1]2. As the NPMLE is the KM estimator, if Ŝ1(t) ≤ Ŝ2(t) the numerator of R̃(t) coincides with the unconstrained maximum and thus equals the denominator. If Ŝ1(t) > Ŝ2(t), it can be shown that the numerator attains its maximum on the boundary S1(t) = S2(t) of the constraint set (using log-concavity of (3)). Therefore
$$\widetilde{\mathcal{R}}(t) = \left[\frac{\sup\{L(S_1)L(S_2)\colon S_1(t) = S_2(t)\}}{\sup\{L(S_1)L(S_2)\}}\right]^{I(\hat S_1(t) > \hat S_2(t))}.$$
It can then be shown that R̃(t) = R(t), using (7) and the equivalence between the events {λ̂ < 0} and {Ŝ1(t) > Ŝ2(t)} proved in Appendix A. Hence the maximally selected EL statistic coincides with the statistic Kn constructed earlier. Also, the same null distribution can be used because S1 = S2 is least favorable under S1 ⪯ S2. Moreover, the test is consistent in the “interior” of the stochastically ordered null: if S1(t) < S2(t) for all t, then Kn tends to zero in probability (since the indicator term in Lemma 3 vanishes for all t ∈ [t1, t2] with probability tending to 1).
2.4.2. Two-sided testing
The one-sided tests introduced in the previous sections have immediate extensions to two-sided versions. The two-sided alternative in (1) is the union of the two one-sided alternatives (S1 ≻ S2 or S2 ≻ S1). Based on the union-intersection principle, the test statistic is the maximum of the two one-sided test statistics. The asymptotic null distribution of this test statistic is sup_{x∈[x1, x2]} [B²(x)/{x(1 − x)}], where B is a standard Brownian bridge and [x1, x2] is as in Theorem 2. The test can therefore be calibrated in much the same way as for the one-sided test.
2.4.3. Crossing survival functions
The possibility of crossing survival functions needs to be excluded for our (one-sided) test to be meaningful. This is because the one-sided test (asymptotically) rejects the null hypothesis if S1(t) > S2(t) for some t and S1(t′) < S2(t′) for some other t′. To address this issue, we recommend carrying out the one-sided test against the alternative in each direction (i.e., one test versus S1 ≻ S2 and a second test versus S1 ≺ S2). If both tests reject, then there is evidence of crossing survival functions, excluding stochastic ordering. If only one of the tests rejects, then the interpretation is that there is evidence of stochastic ordering in that specific direction.
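The recommended two-directional procedure amounts to the following decision logic (a sketch with our own names; `k12` and `k21` denote the one-sided statistics for the two directions and `crit` their common critical value):

```python
def interpret(k12, k21, crit):
    """Combine the two one-sided EL tests into one conclusion.

    k12:  statistic for the alternative that S1 is stochastically larger
    k21:  statistic for the reverse direction
    crit: common critical value (both tests share the same null limit)
    """
    if k12 > crit and k21 > crit:
        return "crossing"      # evidence of crossing survival functions
    if k12 > crit:
        return "S1 larger"     # evidence that S1 is stochastically larger
    if k21 > crit:
        return "S2 larger"     # evidence that S2 is stochastically larger
    return "no evidence"       # neither direction rejects
```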
A formal test for crossing survival functions (against the null of stochastic ordering in either direction, S1 ⪯ S2 or S1 ⪰ S2) can be devised using the intersection-union principle, taking the minimum of the two one-sided test statistics as the test statistic. The R code (provided online) for implementing the one-sided test is readily adapted for this purpose, with critical values obtained from simulating
$$\min\left\{\sup_{x \in [x_1, x_2]} \frac{\{B^{+}(x)\}^{2}}{x(1-x)},\ \sup_{x \in [x_1, x_2]} \frac{\{B^{-}(x)\}^{2}}{x(1-x)}\right\},$$
where B− is the negative part of the Brownian bridge B.
3. Simulation study
In this section, we report the results of a simulation study for the two-sample case. We restrict our attention to one-sided tests, but results for the two-sided tests are similar. We first tabulate selected critical values, and then compare the performance of the EL test with the (one-sided) log-rank and WKM tests in terms of accuracy and power.
3.1. Critical values and accuracy
Quantiles of the limiting distribution in Lemma 4 of Appendix C are used as critical values for the EL test. These are computed by simulation based on 100,000 replications of a standard Brownian bridge over a fine grid on [0, 1] (100,000 equidistant points), for selected values of x1 and x2 (see Table 1).
Table 1:
Critical values for the EL test for selected x1, x2 and α.
| x2 \ x1 | 0.1 (α=0.01) | 0.1 (α=0.05) | 0.1 (α=0.1) | 0.15 (α=0.01) | 0.15 (α=0.05) | 0.15 (α=0.1) | 0.2 (α=0.01) | 0.2 (α=0.05) | 0.2 (α=0.1) |
|---|---|---|---|---|---|---|---|---|---|
| 0.975 | 11.822 | 8.255 | 6.648 | 11.672 | 8.074 | 6.489 | 11.542 | 7.953 | 6.365 |
| 0.98 | 11.912 | 8.329 | 6.720 | 11.758 | 8.159 | 6.556 | 11.619 | 8.028 | 6.442 |
| 0.985 | 11.996 | 8.415 | 6.807 | 11.851 | 8.253 | 6.658 | 11.739 | 8.131 | 6.532 |
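The tabulation scheme behind Table 1 can be sketched in Python as follows (our own scaled-down Monte Carlo illustration, with far fewer replications and grid points than the 100,000 used for the table, so the result only roughly approximates tabulated values such as 8.028 for x1 = 0.2, x2 = 0.98, α = 0.05):

```python
import numpy as np

def el_critical_value(x1=0.2, x2=0.98, alpha=0.05,
                      n_rep=5000, n_grid=2000, seed=0):
    """Approximate the upper-alpha quantile of
    sup_{x in [x1, x2]} {B^+(x)}^2 / {x (1 - x)}
    for a standard Brownian bridge B on [0, 1]."""
    rng = np.random.default_rng(seed)
    x = np.arange(1, n_grid) / n_grid       # interior grid points
    mask = (x >= x1) & (x <= x2)
    w = x[mask] * (1.0 - x[mask])
    sups = np.empty(n_rep)
    for k in range(n_rep):
        # Brownian motion W on the grid, then bridge B(x) = W(x) - x W(1)
        W = np.cumsum(rng.normal(scale=np.sqrt(1.0 / n_grid), size=n_grid))
        B = W[:-1] - x * W[-1]
        sups[k] = np.max(np.maximum(B[mask], 0.0) ** 2 / w)
    return np.quantile(sups, 1.0 - alpha)
```

Discretizing the supremum over a finite grid biases the estimate slightly downward, which is one reason a very fine grid is used for the published table.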
To compute empirical significance levels, we simulate lifetimes from the piecewise exponential distribution displayed as the solid line in the upper left panel of Figure 1. We consider an exponential censoring distribution, G1 = G2 = Exp(λ), where λ is chosen to give a censoring rate (CR) of 10% or 25%. Our one-sided EL statistic is compared with the one-sided log-rank statistic. Another class of tests for comparison is the one-sided WKM; following the recommendations of Pepe and Fleming (1989), we select the WKM statistic with the pooled variance estimator and the weight function they recommend.
Figure 1:
The piecewise exponential survival functions (top row) and the hazard functions (bottom row) in Model A (first column): S1 (solid) and S2 (dashed), and in Model B (second column): S1 (solid) and S2 (dashed).
Results on the size of our EL test are given in Table 2, where we use [x1, x2]=[0.2,0.98]. The test is slightly conservative in small samples but approaches the nominal level as the sample size increases. Such conservativeness has been seen in other maximal deviation-type statistics for stochastic ordering (Davidov and Herman, 2009). The empirical significance levels of the one-sided log-rank test and the WKM test under the same settings are closer to the nominal level, but sometimes on the anticonservative side.
Table 2:
Empirical significance levels based on 10,000 replications.
| CR | group size | EL (α=0.05) | log-rank (α=0.05) | WKM (α=0.05) | EL (α=0.01) | log-rank (α=0.01) | WKM (α=0.01) |
|---|---|---|---|---|---|---|---|
| 10% | 50 | 0.040 | 0.057 | 0.055 | 0.007 | 0.013 | 0.011 |
| 10% | 80 | 0.041 | 0.052 | 0.054 | 0.008 | 0.010 | 0.010 |
| 10% | 200 | 0.045 | 0.051 | 0.048 | 0.009 | 0.011 | 0.011 |
| 25% | 50 | 0.037 | 0.057 | 0.054 | 0.006 | 0.012 | 0.012 |
| 25% | 80 | 0.041 | 0.051 | 0.056 | 0.008 | 0.009 | 0.010 |
| 25% | 200 | 0.046 | 0.054 | 0.050 | 0.010 | 0.010 | 0.011 |
3.2. Power comparisons
In this section, we compare the small sample power of the proposed test with the one-sided WKM and log-rank tests. Two models of lifetime distributions are considered, both with piecewise-constant hazards. In Model A, the hazard functions cross while the survival functions still remain stochastically ordered (see Figure 1, first column). In this case, the one-sided log-rank test can fail to detect the difference between the survival curves because it is designed to detect ordered hazards. In Model B, the two groups have different hazards initially but the same hazard later on, so the difference between the survival functions gradually wears off (see Figure 1, second column). This is a common phenomenon which is also seen in our real data example in Section 4. For both models, we consider exponential and uniform censoring distributions: G1 = G2 = Exp(λ1) or Uniform(0, c1), with λ1 or c1 chosen to give a CR of 10% or 25% for group 1.
Results are given in Table 3 for the EL test using [x1, x2] = [0.2, 0.98]. Note that it outperforms the other tests in all the cases considered, especially in the crossing hazards scenario (Model A). The much lower power of WKM in Model A is surprising, because this test was shown to work well under crossing hazard alternatives in some previous simulation examples (Pepe and Fleming, 1989). The superior performance of our test may be due to two factors: first, our test is based on nonparametric likelihood, so it can be expected to be more powerful than tests that depend on an ad hoc weight function; second, we are using a maximal deviation-type statistic, rather than a weighted average, so our test may be more sensitive to local differences in the survival functions.
Table 3:
Power at α = 0.05 based on 10,000 replications. Model A: survival functions as in Figure 1, upper left panel. Model B: survival functions as in Figure 1, upper right panel.
| model | group size | test | exp. censoring (CR 10%) | exp. censoring (CR 25%) | unif. censoring (CR 10%) | unif. censoring (CR 25%) |
|---|---|---|---|---|---|---|
| Model A | 50 | EL | 0.851 | 0.833 | 0.849 | 0.834 |
| | | log-rank | 0.318 | 0.379 | 0.314 | 0.373 |
| | | WKM | 0.328 | 0.391 | 0.330 | 0.431 |
| | 80 | EL | 0.975 | 0.968 | 0.975 | 0.971 |
| | | log-rank | 0.416 | 0.503 | 0.415 | 0.501 |
| | | WKM | 0.426 | 0.507 | 0.433 | 0.569 |
| Model B | 50 | EL | 0.689 | 0.672 | 0.688 | 0.676 |
| | | log-rank | 0.625 | 0.659 | 0.621 | 0.650 |
| | | WKM | 0.521 | 0.583 | 0.521 | 0.613 |
| | 80 | EL | 0.876 | 0.862 | 0.877 | 0.869 |
| | | log-rank | 0.782 | 0.815 | 0.784 | 0.812 |
| | | WKM | 0.660 | 0.729 | 0.675 | 0.775 |
We have also investigated power under proportional hazards configurations, and our test closely matches the performance of the log-rank and WKM tests (results available upon request). These results show that for stochastically ordered alternatives, the proposed EL test can compete effectively with the log-rank and WKM tests, especially when the hazard functions cross.
Table 4 gives size and power for various choices of x1 and x2 reflecting light or heavy truncation. It is clear from the last two rows that light truncation on the left (small x1) results in poorer accuracy and lower power compared with the top row, which corresponds to our recommendation [x1, x2] = [0.2, 0.98]. Yet the performance is not very sensitive to the choice of x2, so our preference is to choose x2 close to 1 in order to reduce truncation.
Table 4:
Size and power for various choices of x1 and x2 based on 10,000 replications, α = 0.05, n1 = n2 = 50, and exponential censoring with censoring rate 10%. Model A: survival functions as in Figure 1, upper left panel. Model B: survival functions as in Figure 1, upper right panel. For size, only the solid survival functions are used.
| x1 | x2 | critical value | size (Model A) | size (Model B) | power (Model A) | power (Model B) |
|---|---|---|---|---|---|---|
| 0.2 | 0.98 | 8.028 | 0.040 | 0.040 | 0.851 | 0.689 |
| 0.2 | 0.8 | 6.879 | 0.037 | 0.039 | 0.890 | 0.703 |
| 0.02 | 0.98 | 8.829 | 0.029 | 0.028 | 0.806 | 0.628 |
| 0.02 | 0.8 | 8.048 | 0.023 | 0.025 | 0.838 | 0.612 |
4. Application
An RCT for the treatment of severe alcoholic hepatitis (Nguyen-Khac et al., 2011) is analyzed. The data are obtained by digitizing the published KM curves and reconstructing survival and censoring information using the algorithm developed by Guyot et al. (2012). The purpose of the trial was to assess whether a combination therapy of prednisolone plus N-acetylcysteine is better than prednisolone alone (the currently recommended treatment). A total of 174 patients were randomized to receive the combination (n1 = 85) or prednisolone alone (n2 = 89), and the primary endpoint was 6-month survival. The KM curves (see the top panel of Figure 2) suggest a stochastic ordering between the two groups.
Figure 2:
Estimates of survival functions (top) and cumulative hazards (bottom) for prednisolone plus N-acetylcysteine (solid line) versus prednisolone alone (dashed line).
Application of the one-sided EL test indicates that the combination therapy group has a stochastically larger survival pattern than patients receiving only prednisolone (p = 0.018). In comparison, the WKM and the one-sided log-rank tests yield p-values of 0.021 and 0.037, respectively. Examining the cumulative hazards plot (see the bottom panel of Figure 2), we can see that the slopes (i.e., hazards) of the two curves differ noticeably only during the initial 40 days. Such a scenario of an initial hazard difference was considered in Model B of Section 3.2, where our EL test was shown to be better adapted to detecting a difference between the two treatment groups.
Nguyen-Khac et al. (2011) used the two-sided log-rank test and reported a p-value of 0.07, concluding that the combination therapy does not improve 6-month survival. In contrast, our two-sided EL test shows that the two treatment groups are significantly different and that there is a uniformly higher survival rate in one of the groups (p = 0.036, computed by the supplementary R program that implements the two-sided EL test). In this case the EL test gives a more significant result that leads to a qualitatively different conclusion from the log-rank test.
5. Discussion
We have developed a class of EL-based tests for both one- and two-sided stochastically ordered alternatives under right censoring. The proposed test statistic for one-sided alternatives is the maximally selected local EL statistic and is asymptotically distribution-free. The test statistic for two-sided alternatives is taken as the maximum of the two one-sided test statistics. A simulation study shows that our test can be much more powerful than the log-rank and WKM tests under alternatives with crossing hazards. We applied our test to a RCT involving patients with severe alcoholic hepatitis and found a more significant result than the log-rank and WKM tests.
Our test statistics utilize a data-dependent interval [t1, t2], much like the data-dependent weight-function used in integral-type tests based on hazard or survival functions. Due to instability in the tails (caused by right-censoring), test statistics based on right-censored data invariably require such data-dependent tuning, and this feature cannot be avoided as far as we know. We could specify t1 and t2 in advance, but that would be inadvisable because of the instability in the test statistic arising when there are too few uncensored survival times outside the interval. However, in contrast to methods that rely on the selection of a complete weight function throughout follow-up (e.g., the WKM test), it is actually much easier and more transparent to select just the two tuning parameters (x1 and x2) needed in our case. Although t1 and t2 could be specified using a data-dependent rule (such as 5% of the data in each tail), this approach would have the disadvantage of needing tailor-made critical values for each dataset.
Our test targets stochastically ordered alternatives through construction of a non-parametric likelihood ratio (EL). It can be expected to be more powerful than commonly used two-sample tests that either are not tailored for such alternatives or depend on an ad hoc weight function. Moreover, it can provide more information about the nature of the difference between S1 and S2 compared to the omnibus alternative S1 ≠ S2, in which case the functional parameters S1 and S2 may be ordered in one direction at certain time points, but ordered in the reverse direction at other time points. Our test can be used to detect crossing survival functions by applying it in each possible direction; we recommend that the test be used in this way in order to distinguish stochastic ordering from crossing survival functions.
Our central contribution is the development of the first EL-based test for ordered survival functions in right-censored data settings, and we envision the test to be useful in clinical trials, in reliability engineering, and health policy applications. It would also be of interest to extend our approach to allow the testing of stochastic ordering in k-sample censored data settings, and to explore how it could be used for other types of ordering between distributions, such as increasing convex ordering, likelihood ratio ordering and uniform stochastic ordering (or hazard rate ordering).
Acknowledgements
Computing resources for this paper came from the Extreme Science and Engineering Discovery Environment (XSEDE) supported by NSF Grant OCI-1053575. Ian McKeague was partially supported by NIH Grant R01GM095722-01 and NSF Grant DMS-1307838. The authors thank Hammou El Barmi and the referees for numerous helpful comments.
Appendix A Derivation of the local EL statistic
We derive the local EL ratio (7) for the two-sample case. The one-sample case is similar and the proof is omitted.
First, we will obtain a closed-form expression for the denominator of (6) by the KKT method. After a log transformation, the optimization problem becomes minimizing
over $(h_{11},\dots,h_{m_1 1}, h_{12},\dots,h_{m_2 2}) \in [0,1]^m$ ($m = m_1 + m_2$) subject to the constraints
Since the domain [0, 1]m is convex, the objective and constraint functions are convex and differentiable, and Slater’s condition is satisfied, the KKT conditions are necessary and sufficient for optimality. More specifically, the Lagrangian is defined as a function such that
The optimal solution is denoted as (, , ), with the superscript indicating the correspondence of the denominator with H1. The dependence of the solution on t is omitted here for simplicity but will appear in the proof of Theorem 2 (see Appendix B) when the EL ratio is considered as a process indexed by t. The optimal solution must satisfy the KKT conditions:
| (8) |
| (9) |
| (10) |
| (11) |
which are known as stationarity, primal feasibility, dual feasibility, and complementary slackness, respectively. The stationarity condition yields for i = Nj(t) + 1, …, mj and
for i = 1, …, Nj(t), for each j = 1, 2. Define Dj = max over i = 1, …, Nj(t) of (dij − rij). Since (, ) should lie in the domain [0, 1]m, we have that , where Dj ≤ 0 for j = 1, 2.
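As a toy illustration of the KKT structure above (not the paper's exact objective, which is not reproduced in this excerpt), consider the standard one-sample discrete-hazard binomial log-likelihood: the unconstrained stationarity condition gives the hazard MLE h_i = d_i/r_i, and convexity makes this stationary point a global minimizer. The counts below are made up for illustration.

```python
import numpy as np

def neg_loglik(h, d, r):
    """Negative discrete-hazard log-likelihood:
    -sum_i [ d_i log h_i + (r_i - d_i) log(1 - h_i) ]."""
    return -np.sum(d * np.log(h) + (r - d) * np.log(1.0 - h))

d = np.array([3.0, 1.0, 2.0])   # event counts d_i (illustrative)
r = np.array([10.0, 6.0, 4.0])  # risk-set sizes r_i (illustrative)
h_mle = d / r                   # stationary point from the KKT conditions

# Convexity makes the stationary point a global minimizer: any feasible
# perturbation can only increase the objective.
rng = np.random.default_rng(0)
for _ in range(100):
    h_pert = np.clip(h_mle + rng.normal(scale=0.01, size=3), 1e-6, 1 - 1e-6)
    assert neg_loglik(h_pert, d, r) >= neg_loglik(h_mle, d, r)
print("unconstrained hazard MLE:", h_mle)
```

Under an added constraint linking the two samples, the stationary points acquire a Lagrange-multiplier tilt, which is the form derived above.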
The numerator of can be handled in a similar fashion. Denoting the optimal solution to the Lagrangian by (, , ), it turns out has the same form as but with replaced by , and only needs to satisfy and
| (12) |
Note that the estimated hazards after time t under no constraints, namely for v = 0, 1 and i = Nj(t) + 1, …, mj, are the same in the numerator and denominator, so these terms cancel. This leads to
| (13) |
We next further simplify by analyzing the relationship between and , namely by showing that when and when . Defining
for j = 1, 2 and
we can see that , satisfies , and satisfies . Notice that a(λ) is strictly increasing in λ on (D1, −D2), tending to 0 as λ ↓ D1 and to ∞ as λ ↑ −D2. Also, condition (11) implies that either or
| (14) |
must hold, and since (14) is equivalent to , we obtain that is either 0 or . These observations along with (9) and (10) imply the following:
Case 1: If , then by (10) we have . Since is either 0 or , we obtain that .
Case 2: If , then by monotonicity of a(λ) we have a(0) < 1. Suppose , then a(0) ≥ 1 by (9), which contradicts a(0) < 1. So we have .
Case 3: If , then because is either 0 or , we can see that . Then from (13) we have
This is exactly (7). We use the simplified notation and to replace and , respectively.
Another version of (7) will be used in the proof of Theorem 2: replacing and in (7) by and , respectively. This version is based on the equality of the events and , which can be seen by noting that a(λ) is strictly increasing, and .
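The monotonicity of a(λ) noted above makes the Lagrange multiplier straightforward to compute by bisection: a(λ) is strictly increasing on (D1, −D2), tends to 0 at D1 and to ∞ at −D2, so a(λ) = 1 has a unique root there. The exact form of a(·) involves quantities not reproduced in this excerpt; the stand-in below has the same monotonicity and boundary behaviour, purely for illustration.

```python
def solve_lambda(a, D1, negD2, tol=1e-10):
    """Unique root of a(lam) = 1 on the open interval (D1, -D2),
    assuming a is continuous, strictly increasing, with a -> 0 at D1
    and a -> infinity at -D2."""
    eps = 1e-12 * max(1.0, abs(D1), abs(negD2))
    lo, hi = D1 + eps, negD2 - eps  # stay inside the open interval
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if a(mid) < 1.0:
            lo = mid  # a is increasing, so the root lies to the right
        else:
            hi = mid
    return 0.5 * (lo + hi)

D1, D2 = -2.0, -3.0                       # Dj = max_i (d_ij - r_ij) <= 0
a = lambda lam: (lam - D1) / (-D2 - lam)  # illustrative increasing surrogate
lam_hat = solve_lambda(a, D1, -D2)        # analytic root here: (D1 - D2) / 2
print(lam_hat)
```

Any monotone root-finder (e.g., Brent's method) could be substituted; bisection is shown because it needs nothing beyond the monotonicity established above.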
Appendix B Proof of Theorem 2
We will need the following lemma giving an asymptotic expansion of the localized EL statistic in terms of and .
Lemma 3.
where the Op term holds uniformly in t over [t1, t2].
Proof. We first find the asymptotic order of uniformly for t ∈ [t1, t2], then we derive an asymptotic expansion of uniformly for t ∈ [t1, t2]. Next, by a Taylor series expansion, we approximate as a function of . Based on the two expansions, we obtain the desired result.
First, we find the asymptotic order of the Lagrange multiplier . Since comes from the numerator of the EL ratio (6), it satisfies the equality constraint (12). McKeague and Zhao (2002) studied the same Lagrange multiplier derived from optimizing the nonparametric likelihood under an equality constraint on the ratio of two survival functions, so by their Lemma A.1,
| (15) |
uniformly for t ∈ [t1, t2].
Next we derive an asymptotic expansion of . The expansion is obtained by Taylor expanding the l.h.s. of
and then rearranging terms. In detail, the j-th term (j = 1, 2) on the l.h.s. has, by an argument similar to that in Hollander et al. (1997, p. 225), the expansion
where Δj = 1 for j = 1 and −1 for j = 2. Combining the two terms and using nj/n → pj gives
Rearranging the terms, we have
| (16) |
Next, we find an asymptotic expansion of as a function of . We begin, based on (7), by writing as
times an indicator . The j-th term above has, by an argument similar to that in Li (1995, p. 102), the expansion
for j = 1, 2. Using nj/n → pj, and the fact that is equivalent to , we can combine the terms for j = 1, 2 and obtain
This and (16) give the desired result. □
Remark. Lemma 3 shows that is asymptotically equivalent to squaring the positive part of a scaled difference between the log of KM estimators from the two samples. The inclusion of only the positive part of the difference can be attributed to the stochastically ordered form of our alternative hypothesis. We have compared the small sample performance of Kn and its counterpart based on this squared difference (results not shown), and it turns out the latter tends to be too conservative.
The advantage of using the EL approach, as opposed to a test statistic derived from the first term in the expansion of Lemma 3, is that we expect higher-order accuracy (cf. Hall and La Scala, 1990). This is parallel to the parametric result in which the likelihood ratio test is asymptotically equivalent to the Wald test, but the former has better higher-order accuracy (see, e.g., Mukerjee, 1994).
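To make the Wald-type counterpart in Lemma 3 concrete, the sketch below computes Kaplan–Meier estimates for two simulated censored samples and an unscaled version of the statistic: the squared positive part of the difference of their logs on a time grid. The scaling by n and by a variance estimate, and the sign convention for the one-sided alternative, follow the paper's definitions and are omitted here; the simulated data are purely illustrative.

```python
import numpy as np

def kaplan_meier(times, events, grid):
    """KM survival estimate evaluated at each point of `grid`
    (assumes continuous data, so ties are negligible)."""
    order = np.argsort(times)
    t_sorted, e_sorted = times[order], events[order]
    surv, s, idx, at_risk = [], 1.0, 0, len(times)
    for g in grid:
        while idx < len(times) and t_sorted[idx] <= g:
            if e_sorted[idx] == 1:
                s *= 1.0 - 1.0 / at_risk
            at_risk -= 1
            idx += 1
        surv.append(s)
    return np.array(surv)

rng = np.random.default_rng(1)
life1, life2 = rng.exponential(1.5, 50), rng.exponential(1.0, 50)
cens1, cens2 = rng.exponential(3.0, 50), rng.exponential(3.0, 50)
obs1, ev1 = np.minimum(life1, cens1), (life1 <= cens1).astype(int)
obs2, ev2 = np.minimum(life2, cens2), (life2 <= cens2).astype(int)

grid = np.linspace(0.1, 1.2, 25)
S1 = kaplan_meier(obs1, ev1, grid)
S2 = kaplan_meier(obs2, ev2, grid)
# guard against zero tails before taking logs
logdiff = np.log(np.maximum(S1, 1e-12)) - np.log(np.maximum(S2, 1e-12))
stat = np.max(np.maximum(logdiff, 0.0) ** 2)  # squared positive part
print(stat)
```
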
We now complete the proof of Theorem 2.
We first obtain the weak convergence of as a process on [t1, t2], based on Lemma 3 and large sample properties of the KM estimator. Then by a transformation of the limiting process and the continuous mapping theorem, we get the limiting distribution of Kn.
To obtain the limit process of , we begin by finding the weak convergence of , as the asymptotic expansion of in Lemma 3 suggests. For each j = 1, 2, it has been shown (see, e.g., Andersen et al., 1993, pp. 191 and 263) that
as n → ∞ on D[0, t2], where Uj(t) is a Gaussian martingale with Uj(0) = 0 and . Therefore, under H0, the continuous mapping theorem implies
| (17) |
where U(t) is a Gaussian martingale with U(0) = 0 and cov(U(s), U(t)) = σ2(min(s, t)).
Next, we establish the weak convergence of . By (17) and the continuous mapping theorem, we have
in D[t1, t2], where U+ = max(U, 0). Then by the uniform consistency of with respect to σ2(t) and Slutsky’s Lemma, we have
in D[t1, t2]. This and Lemma 3 imply
| (18) |
in D[t1, t2].
Lastly, the asymptotic null distribution of Kn is obtained as follows. First notice that
are both zero mean Gaussian processes with the same covariance function, so they have the same distribution. We then have equal in distribution to
This, together with (17) and the continuous mapping theorem, implies that converges in distribution to
The result follows from noticing that the r.h.s. of the above is the same as
where x1 = b(t1) and x2 = b(t2).
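The limiting null distribution can be approximated by Monte Carlo. The functional simulated below, sup over [x1, x2] of (B(x)+)² / (x(1 − x)) for a Brownian bridge B, is an assumed standardized one-sided form consistent with the time change x = b(t); the paper's own display (not reproduced in this excerpt) should be used in any real implementation.

```python
import numpy as np

def critical_value(x1, x2, alpha=0.05, nsim=5000, ngrid=500, seed=0):
    """(1 - alpha)-quantile of the simulated sup functional of a
    Brownian bridge restricted to [x1, x2]."""
    rng = np.random.default_rng(seed)
    grid = np.linspace(0.0, 1.0, ngrid + 1)
    dx = np.diff(grid)
    mask = (grid >= x1) & (grid <= x2)
    xs = grid[mask]
    sups = np.empty(nsim)
    for k in range(nsim):
        incr = rng.normal(scale=np.sqrt(dx))
        W = np.concatenate(([0.0], np.cumsum(incr)))  # Brownian motion
        B = W - grid * W[-1]                          # Brownian bridge
        sups[k] = np.max(np.maximum(B[mask], 0.0) ** 2 / (xs * (1.0 - xs)))
    return np.quantile(sups, 1.0 - alpha)

print(round(critical_value(0.1, 0.9), 2))
```

Increasing `nsim` and `ngrid` trades computing time for accuracy; the grid should be fine enough that the discretized sup does not materially undershoot the continuous one.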
Appendix C Validating the calibration procedure
The following result justifies the approach of pre-specifying [x1, x2] and estimating [t1, t2], as outlined in Section 2.2.1.
Lemma 4. Suppose S0 is continuous. Then under H0, for 0 < x1 < x2 < 1,
provided b−1(·) is continuous at x1 and x2, where is just Kn with t1 and t2 replaced by and , respectively.
Proof. The idea is to obtain the joint convergence of , and , and then to apply the continuous mapping theorem.
First, we show the weak convergence of . We will apply (18) in the proof of Theorem 2, but we need to translate the conditions to be in terms of x1 and x2 instead of t1 and t2. Given 0 < x1 < x2 < 1 at which b−1(·) is continuous, it suffices to show that t1 = b−1(x1) and t2 = b−1(x2) satisfy the conditions S0(t1) < 1 and S0(t2)Gj(t2) > 0 for j = 1, 2. To show S0(t1) < 1, we simply use b(t1) = x1 > 0, which implies σ2(t1) > 0 and thus S0(t1) < 1. To show S0(t2)Gj(t2) > 0 for j = 1, 2, we argue by contradiction. Suppose S0(t2)Gj(t2) = 0 for some j = 1, 2. Since b is continuous (by continuity of S0) and nondecreasing, we can pick an ϵ < 1 – x2 and δ small enough such that x2 ≤ b(t2 + δ) < x2 + ϵ < 1. Because b−1 is continuous at x2, there is no “flat” of b around t2, and thus δ can be chosen so that b is strictly increasing in [t2, t2 + δ]. This and S0(t2)Gj(t2) = 0 lead to b(t2 + δ) = 1, which contradicts b(t2 + δ) < x2 + ϵ < 1. So we have S0(t2)Gj(t2) > 0 for j = 1, 2, as required.
Next, we show for j = 1, 2. The proof makes use of the theory of Z-estimators (see, e.g., van der Vaart, 2000, Theorem 5.9). Let , Ψ(t) = b(t) – x1, and Θ = [τ1, τ2] such that [t1, t2] ⊂ Θ ⊂ (0, ∞). We already know Ψn(t1) = op(1) and Ψ(t1) = 0. It suffices to show that and inft:∣t–t1∣≥ϵ∣Ψ(t)∣ > 0 for all ϵ > 0. The former is implied by the uniform consistency of (and thus b), and the latter by the continuity of b−1 at x1. Therefore we have . The same argument applies to .
Lastly, the asymptotic null distribution of is obtained as follows. From the weak convergence of and for j = 1, 2, we have the joint convergence in D[t1, t2] × Θ2 (see, e.g., van der Vaart, 2000, Theorem 18.10 (v)). Then applying a similar argument in the last part of the proof for Theorem 2 and the continuous mapping theorem, we get the desired result. □
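The plug-in step validated by Lemma 4 amounts to inverting a nondecreasing estimate of b at the pre-specified levels x1 and x2. A minimal sketch, with a smooth toy function standing in for the estimator of b from Section 2.2.1 (which is not reproduced here):

```python
import numpy as np

def generalized_inverse(grid, bvals, x):
    """Smallest grid point t with bvals(t) >= x, for nondecreasing bvals
    (the generalized inverse / Z-estimator root on a grid)."""
    idx = np.searchsorted(bvals, x, side="left")
    return grid[min(idx, len(grid) - 1)]

grid = np.linspace(0.0, 5.0, 501)  # time grid, step 0.01
b_hat = 1.0 - np.exp(-grid)        # toy nondecreasing b; inverse is -log(1 - x)
t1_hat = generalized_inverse(grid, b_hat, 0.2)
t2_hat = generalized_inverse(grid, b_hat, 0.8)
print(t1_hat, t2_hat)
```

Continuity of b⁻¹ at x1 and x2, as required by Lemma 4, is what makes this grid inversion stable as the grid is refined.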
Supplementary material
R programs implementing the procedures developed in this article are available online.
References
- Andersen PK, Borgan Ø, Gill RD, and Keiding N (1993). Statistical Models Based on Counting Processes. New York: Springer.
- Andrews DWK and Guggenberger P (2009). Validity of subsampling and plug-in asymptotic inference for parameters defined by moment inequalities. Econometric Theory, 25:669–709.
- Boyd S and Vandenberghe L (2004). Convex Optimization. Cambridge University Press.
- Canay IA (2010). EL inference for partially identified models: Large deviations optimality and bootstrap validity. Journal of Econometrics, 156(2):408–425.
- Davidov O, Fokianos K, and Iliopoulos G (2010). Order-restricted semiparametric inference for the power bias model. Biometrics, 66(2):549–557.
- Davidov O and Herman A (2009). New tests for stochastic order with application to case control studies. Journal of Statistical Planning and Inference, 139(8):2614–2623.
- Davidov O and Herman A (2012). Ordinal dominance curve based inference for stochastically ordered distributions. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 74(5):825–847.
- Dykstra RL (1982). Maximum likelihood estimation of the survival functions of stochastically ordered random variables. Journal of the American Statistical Association, 77(379):621–628.
- Einmahl JHJ and McKeague IW (2003). Empirical likelihood based hypothesis testing. Bernoulli, 9(2):267–290.
- El Barmi H (1996). Empirical likelihood ratio test for or against a set of inequality constraints. Journal of Statistical Planning and Inference, 55(2):191–204.
- El Barmi H and McKeague IW (2013). Empirical likelihood based tests for stochastic ordering. Bernoulli, 19:295–307.
- El Barmi H and Mukerjee H (2005). Inferences under a stochastic ordering constraint: The k-sample case. Journal of the American Statistical Association, 100(469):252–261.
- Gill RD (1980). Censoring and Stochastic Integrals. Mathematisch Centrum.
- Guyot P, Ades A, Ouwens M, and Welton N (2012). Enhanced secondary analysis of survival data: reconstructing the data from published Kaplan–Meier survival curves. BMC Medical Research Methodology, 12(1):9.
- Hall P and La Scala B (1990). Methodology and algorithms of empirical likelihood. International Statistical Review, 58(2):109–127.
- Hollander M, McKeague IW, and Yang J (1997). Likelihood ratio-based confidence bands for survival functions. Journal of the American Statistical Association, 92:215–226.
- Kitamura Y, Santos A, and Shaikh AM (2012). On the asymptotic optimality of empirical likelihood for testing moment restrictions. Econometrica, 80(1):413–423.
- Li G (1995). On nonparametric likelihood ratio estimation of survival probabilities for censored data. Statistics & Probability Letters, 25:95–104.
- McKeague IW and Zhao Y (2002). Simultaneous confidence bands for ratios of survival functions via empirical likelihood. Statistics & Probability Letters, 60:405–415.
- Mukerjee R (1994). Comparison of tests in their original forms. Sankhyā: The Indian Journal of Statistics, Series A, 56(1):118–127.
- Murphy SA (1995). Likelihood ratio-based confidence intervals in survival analysis. Journal of the American Statistical Association, 90(432):1399–1405.
- Nguyen-Khac E, Thevenot T, Piquet M-A, Benferhat S, Goria O, Chatelain D, Tramier B, Dewaele F, Ghrib S, Rudler M, Carbonell N, Tossou H, Bental A, Bernard-Chabert B, and Dupas J-L (2011). Glucocorticoids plus N-acetylcysteine in severe alcoholic hepatitis. New England Journal of Medicine, 365(19):1781–1789.
- Owen AB (1988). Empirical likelihood ratio confidence intervals for a single functional. Biometrika, 75(2):237–249.
- Owen AB (2001). Empirical Likelihood. Chapman & Hall/CRC.
- Park Y, Kalbfleisch JD, and Taylor JMG (2012a). Constrained nonparametric maximum likelihood estimation of stochastically ordered survivor functions. Canadian Journal of Statistics, 40(1):22–39.
- Park Y, Taylor JMG, and Kalbfleisch JD (2012b). Pointwise nonparametric maximum likelihood estimator of stochastically ordered survivor functions. Biometrika, 99(2):327–343.
- Pepe MS and Fleming TR (1989). Weighted Kaplan–Meier statistics: a class of distance tests for censored survival data. Biometrics, 45(2):497–507.
- Thomas DR and Grunkemeier GL (1975). Confidence interval estimation of survival probabilities for censored data. Journal of the American Statistical Association, 70:865–871.
- van der Vaart AW (2000). Asymptotic Statistics. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press.
- Wang Q-H and Jing B-Y (2001). Empirical likelihood for a class of functionals of survival distribution with censored data. Annals of the Institute of Statistical Mathematics, 53:517–527.
- Yu W, El Barmi H, and Ying Z (2011). Restricted one way analysis of variance using the empirical likelihood ratio test. Journal of Multivariate Analysis, 102(3):629–640.