Pointwise nonparametric maximum likelihood estimator of stochastically ordered survivor functions

Yongseok Park; Jeremy M G Taylor; John D Kalbfleisch

doi:10.1093/biomet/ass006

Biometrika. 2012 Jun;99(2):327–343.

PMCID: PMC3635706 PMID: 23843661

Abstract

In this paper, we consider estimation of survivor functions from groups of observations with right-censored data when the groups are subject to a stochastic ordering constraint. Many methods and algorithms have been proposed to estimate distribution functions under such restrictions, but none have completely satisfactory properties when the observations are censored. We propose a pointwise constrained nonparametric maximum likelihood estimator, which is defined at each time t by the estimates of the survivor functions subject to constraints applied at time t only. We also propose an efficient method to obtain the estimator. The estimator of each constrained survivor function is shown to be nonincreasing in t, and its consistency and asymptotic distribution are established. A simulation study suggests better small and large sample properties than for alternative estimators. An example using prostate cancer data illustrates the method.

Keywords: Censored data, Constrained nonparametric maximum likelihood estimator, Kaplan–Meier estimator, Maximum likelihood estimator, Order restriction

1. Introduction

Stochastic ordering is an important concept and has a wide range of applications, in such fields as biomedical research, economics and system reliability. We often encounter situations where there is prior knowledge of stochastic ordering among distributions. For example, in a cancer study, we expect patients with a lower stage of cancer at diagnosis to have lower death rates at all times than those with a higher stage. In addition to the natural desire for estimators of the distributions to satisfy the same expected ordering restrictions as the underlying distributions, there is the potential for improved efficiency by applying the constraints in the estimation method.

For random variables T₁ and T₂ with corresponding survivor functions S₁(t) and S₂(t), T₁ is stochastically larger than T₂, T₁ ⩾_st T₂, if S₁(t) ⩾ S₂(t) for all t (Lehmann, 1955). For G groups, the concept can be generalized to partial ordering; specifically, we say that T_g (g = 1, …, G) satisfy the partial-ordering constraints defined by the constraint set E ⊂ {1, …, G}² if for any (i, j) ∈ E, T_i ⩾_st T_j. Special cases of this are simple ordering, in which T₁ ⩾_st ⋯ ⩾_st T_G, for which E = {(1, 2), (2, 3), …, (G − 1, G)}; tree ordering, in which T₁ ⩾_st T₂, T₁ ⩾_st T₃, …, T₁ ⩾_st T_G, for which E = {(1, 2), (1, 3), …, (1, G)}; umbrella ordering, in which T₁ ⩾_st ⋯ ⩾_st T_i ⩽_st T_{i+1} ⩽_st ⋯ ⩽_st T_G, for which E = {(1, 2), (2, 3), …, (i − 1, i), (i + 1, i), (i + 2, i + 1), …, (G, G − 1)}; and factorial ordering such as T₁ ⩾_st T₂ ⩾_st T₄, T₁ ⩾_st T₃ ⩾_st T₄, for which E = {(1, 2), (2, 4), (1, 3), (3, 4)}.
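As a small illustration of the constraint sets listed above, the following sketch builds E for three of the orderings and checks a vector of survivor values at one time point against it. The function names are hypothetical and chosen only for this example; groups are labelled 1, …, G as in the text.

```python
def simple_ordering(G):
    """E = {(1, 2), (2, 3), ..., (G - 1, G)}."""
    return [(g, g + 1) for g in range(1, G)]


def tree_ordering(G):
    """E = {(1, 2), (1, 3), ..., (1, G)}."""
    return [(1, g) for g in range(2, G + 1)]


def umbrella_ordering(G, i):
    """E = {(1, 2), ..., (i - 1, i), (i + 1, i), ..., (G, G - 1)}."""
    before = [(g, g + 1) for g in range(1, i)]   # pairs up to group i
    after = [(g + 1, g) for g in range(i, G)]    # reversed pairs after group i
    return before + after


def satisfies(E, S):
    """True if S[i] >= S[j] for every (i, j) in E; S maps group label to S_g(t)."""
    return all(S[i] >= S[j] for i, j in E)
```

A pair (i, j) in E always reads "group i is stochastically larger than group j", so checking a candidate set of estimates at a fixed time reduces to one comparison per pair.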

We consider independent right-censored samples of the form (Y_gi, Δ_gi) (g = 1, …, G; i =1, …, n_g), where Y_gi is the observed time and Δ_gi is the event indicator. We assume that the censoring mechanism is independent, so that the generalized likelihood is

$$L\{S_1(\cdot), \dots, S_G(\cdot)\} = \prod_{g=1}^{G} \prod_{i=1}^{n_g} \{S_g(Y_{gi}-) - S_g(Y_{gi})\}^{\Delta_{gi}} \{S_g(Y_{gi})\}^{1-\Delta_{gi}}.$$

(1)

The E-constrained nonparametric maximum likelihood estimator maximizes (1) subject to the partial-ordering constraint E. Brunk et al. (1966) studied the constrained nonparametric maximum likelihood estimator in the two-sample case without censoring. Dykstra (1982), as corrected by Park et al. (2012), extended this result to right-censored data. In the case of three or more populations with general partial-ordering constraints, Hoff (2003) and Lim et al. (2009) proposed different computational methods for obtaining the constrained nonparametric maximum likelihood estimator.

This estimator has the undesirable property that a violation of a constraint in the Kaplan–Meier estimators (Kaplan & Meier, 1958) at an earlier time affects the estimator at a later time, even if there is no violation at this later time. A number of authors have noted that the constrained nonparametric maximum likelihood estimator can have relatively large pointwise bias and mean squared error at a fixed t and have suggested alternatives (Rojo & Ma, 1996; Rojo, 2004; El Barmi & Mukerjee, 2005) that can have better mean squared error properties. Park et al. (2012) noted a correction to the constrained nonparametric maximum likelihood estimator presented by Dykstra (1982), which led to improved properties, but this corrected estimator still often has poorer pointwise properties than other estimators, some of which are relatively simple to define. In the two-sample problem, Lo (1987) suggested swapping the Kaplan–Meier estimates of the survivor functions when the constraint is violated. Rojo (2004) proposed estimating both survivor functions as the weighted average of the two Kaplan–Meier estimators at times when the constraint is violated, where the weights are based on the initial sample sizes. El Barmi & Mukerjee (2005) extended Rojo’s estimators to the simple ordering situation using isotonic regression. The simulation study in Park et al. (2012) shows that some of these estimators have smaller mean squared error than the constrained nonparametric maximum likelihood estimator when the censoring distributions are equal, but when the censoring distributions differ substantially between groups, the alternative estimators may have larger mean squared error than the constrained nonparametric maximum likelihood estimator. Moreover, these alternative estimators have not been explicitly extended to a general partial-ordering case.

When we consider finite sample properties of an estimator Ŝ(t), we typically consider pointwise criteria, such as pointwise bias or pointwise mean squared error at each fixed t. In contrast to pointwise estimators such as those described in Rojo (2004) and Lo (1987), the constrained nonparametric maximum likelihood estimator estimates the whole survival curve. So it is perhaps not surprising that Rojo’s estimator typically has better properties when evaluated using metrics such as pointwise mean squared error. On the other hand, these pointwise estimators do not adapt well to unequal censoring distributions between groups, whereas the constrained nonparametric maximum likelihood estimator does. This motivated us to propose a new constrained estimator, a pointwise constrained nonparametric maximum likelihood estimator, or pointwise constrained estimator for convenience.

Definition 1 (Pointwise constrained estimator). For each specified time x, let S̃_g(t; x) be the maximum likelihood estimator of S_g(t) under the constraint S_i (x) ⩾ S_j (x) for all (i, j) ∈ E. Then Ŝ_g(t) = S̃_g(t; t) (g = 1, …, G) for all t is the pointwise constrained estimator of the survivor function S_g under the partial stochastic ordering constraint E.

2. Estimation methods

2.1. Notation and likelihood

To obtain the pointwise constrained estimator as given in Definition 1, it is required to maximize the likelihood (1) subject to the constraints S_i (x) ⩾ S_j (x) for all (i, j) ∈ E for a fixed time x. This will give the estimates of S̃₁(t; x), …, S̃_G(t; x) and the constrained maximization will be repeated for all times x of interest.

Let X_gj (j = 1, …, m_g) be the distinct event times in group g and define X_g0 = 0 and X_{g(m_g+ 1)} = ∞ (g = 1, …, G). Let N_g(t) be the number at risk at time t in group g and let M_g(t) be the number of distinct events in (0, t] in group g. Let d_gj and n_gj be, respectively, the number of events and the number at risk in group g at time X_gj.

It is convenient to redefine the problem in terms of hazards. Let h_g(t) = log{S_g(t)/S_g(t−)}, so that 1 − exp{h_g(t)} is the discrete hazard in group g at time t. The loglikelihood of (1) is

$$\log L(h_1, \dots, h_G) = \sum_{g=1}^{G} \Bigl\{ \sum_{i=1}^{m_g} \bigl( d_{gi} \log[1 - \exp\{h_g(X_{gi})\}] + (n_{gi} - d_{gi}) h_g(X_{gi}) \bigr) + N_g(x) h_g^{\delta}(x) \Bigr\},$$

(2)

where h_g = {h_g(X_g1), …, h_g(X_{gm_g}), $h_{g}^{δ}$ (x)} (g = 1, …, G). The corresponding constraints are $\sum_{j = 1}^{M_{p} (x)} h_{p} (X_{p j}) + h_{p}^{δ} (x) ⩾ \sum_{j = 1}^{M_{r} (x)} h_{r} (X_{r j}) + h_{r}^{δ} (x)$ , for all (p, r) ∈ E, and $h_{g}^{δ}$ (x) ⩽ 0. In this, $h_{g}^{δ}$ (x) = I (x ≠ X_{gM_g(x)})h_g(x), which is included to account for the fact that if x = X_{gM_g(x)}, we do not have the extra term N_g(x)h_g(x) in the loglikelihood (2).

2.2. Linearly constrained convex minimization

There is a large literature on general approaches to linearly constrained convex minimization problems. There are essentially three types of algorithms: interior point, primal active set and dual active set methods. In general, our data contain many more observed event times than groups. Interior point and primal active set methods simultaneously optimize over the large number of quantities $h_{g}^{δ}$ (x) and h_g(X_gi) (g = 1, …, G; i = 1, …, m_g) at each time x of interest, and so are not computationally efficient in our setting. Dual active set methods may involve many fewer parameters, but the dual function itself is difficult to express as a function of Lagrange multipliers and the feasible range of these multipliers is difficult to specify in our problem. So the dual active set method is also difficult to implement in our context.

In § 2.4, we transform the problem of maximizing the loglikelihood (2) subject to the linear constraints to another simple concave maximization problem subject to linear constraints by using the profile likelihood. In preparation for this, we first discuss the constrained maximum likelihood estimator of the survivor function in the one-sample case.

2.3. Maximum likelihood estimator of the survivor function subject to a single constraint

In the one-sample case without constraints, the maximum likelihood estimator has probability mass only at the observed event times. The loglikelihood analogous to (2) is

$$\log L(h) = \sum_{j=1}^{m} [d_j \log\{1 - \exp h(X_j)\} + (n_j - d_j) h(X_j)],$$

(3)

where h = {h(X₁), …, h(X_m)} and (3) is maximized at h(X_j) = log(1 − d_j/n_j) (j = 1, …, m), which corresponds to the Kaplan–Meier estimator.
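The unconstrained maximizer above can be sketched in a few lines. This is a minimal illustration, not the authors' code: `event_table` and `kaplan_meier` are hypothetical names, and the input is assumed to be a list of observed times with 0/1 event indicators.

```python
import math


def event_table(times, events):
    """Distinct event times X_j with event counts d_j and numbers at risk n_j."""
    X = sorted({t for t, e in zip(times, events) if e == 1})
    d = [sum(1 for t, e in zip(times, events) if e == 1 and t == x) for x in X]
    n = [sum(1 for t in times if t >= x) for x in X]
    return X, d, n


def kaplan_meier(times, events, t):
    """S*(t) = exp{sum over X_j <= t of h(X_j)} with h(X_j) = log(1 - d_j/n_j)."""
    X, d, n = event_table(times, events)
    # h is -inf when everyone at risk has an event, which gives S*(t) = 0.
    h = [math.log(1 - dj / nj) if dj < nj else -math.inf for dj, nj in zip(d, n)]
    return math.exp(sum(hj for xj, hj in zip(X, h) if xj <= t))
```

Working on the log scale mirrors the reparameterization in terms of h used throughout § 2.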

Consider now the maximum likelihood estimator subject to the constraint S(x) = exp(q). The maximum likelihood estimator of the survivor function will have positive probability mass at event times X_i and nonnegative probability mass at time x. The optimization problem is to maximize the loglikelihood of h = {h(X₁), …, h(X_m), h^δ(x)},

$$\log L(h) = \sum_{i=1}^{m} [d_i \log\{1 - \exp h(X_i)\} + (n_i - d_i) h(X_i)] + N(x) h^{\delta}(x),$$

subject to $\sum_{j = 1}^{M (x)} h (X_{j}) + h^{δ} (x) = q$ and h^δ(x) ⩽ 0.

Let K(q; x) = −N(x) if M(x) = 0, and otherwise let K(q; x) = max{−N(x), k̂}, where k̂ is the unique solution of the equation $\sum_{j=1}^{M(x)} \log\{1 - d_j/(n_j + k)\} = q$. Here, k̂ = ∞ if q = 0 and k̂ = d_{M(x)} − n_{M(x)} if q = −∞. Let $\hat{h}^{\delta}(q; x) = q - \sum_{j=1}^{M(x)} \hat{h}(q; X_j)$, where

$$\hat{h}(q; X_i) = \begin{cases} \log\Bigl\{1 - \dfrac{d_i}{n_i + K(q; x)}\Bigr\}, & i \le M(x), \\ \log\Bigl(1 - \dfrac{d_i}{n_i}\Bigr), & i > M(x). \end{cases}$$

(4)

Theorem 1. The maximum likelihood estimator of S(t) subject to the constraint S(x) = exp(q) at a given x is Ŝ(t) = exp{∑_{X_j⩽t} ĥ(q; X_j) + I(t ⩾ x) ĥ^δ(q; x)} (t ⩽ τ), where τ is the last observed time.

Proof. See the Appendix.
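The quantities K(q; x), ĥ(q; X_j) and ĥ^δ(q; x) can be computed numerically. The sketch below is one possible approach and not the authors' implementation: it finds k̂ by bisection, relying on the left-hand side of the defining equation being increasing in k. The names `solve_k` and `constrained_fit` are hypothetical, `d` and `n` hold d_j and n_j for the event times X_j ⩽ x, `N_x` is N(x), and q is assumed finite and negative.

```python
import math


def solve_k(d, n, q, iters=200):
    """Bisection for k solving sum_j log{1 - d_j/(n_j + k)} = q (q < 0, finite)."""
    def g(k):
        total = 0.0
        for dj, nj in zip(d, n):
            if nj + k - dj <= 0:          # outside the domain: treat as -infinity
                return -math.inf
            total += math.log((nj + k - dj) / (nj + k))
        return total

    lo = max(dj - nj for dj, nj in zip(d, n))   # g(k) -> -inf as k decreases to lo
    step = 1.0
    while g(lo + step) < q:                     # g(k) -> 0 as k -> inf, so this stops
        step *= 2.0
    hi = lo + step
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(mid) < q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def constrained_fit(d, n, N_x, q):
    """Return K(q; x), the list of h-hat(q; X_j) for X_j <= x, and h-hat^delta(q; x)."""
    if len(d) == 0:
        K = -float(N_x)
    else:
        K = max(-float(N_x), solve_k(d, n, q))
    h = [math.log(1 - dj / (nj + K)) for dj, nj in zip(d, n)]
    h_delta = q - sum(h)                        # zero (up to rounding) when K = k-hat
    return K, h, h_delta
```

When q equals the log of the Kaplan–Meier estimate at x, the solution is k̂ = 0 and the fit reduces to the unconstrained one. When k̂ falls below −N(x), K is held at −N(x) and the remaining probability is placed at x through a negative ĥ^δ(q; x).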

Thomas & Grunkemeier (1975) and Li (1995) considered the maximization problem described above. However, Thomas & Grunkemeier (1975) solved the problem with the equality constraint $\sum_{j = 1}^{M (x)} h (X_{j}) = q$ , which implicitly assumes that ĥ(x) = 0 if x is not an observed event time, whereas Li (1995) incorrectly claimed to prove that ĥ(x) = 0 unless x is an observed event time. In fact, the maximization problem described above involves two constraints: $\sum_{j = 1}^{M (x)} h (X_{j}) + h^{δ} (x) = q$ and h^δ (x) ⩽ 0. It is possible that ĥ^δ (x) < 0 if K (q; x) = −N(x). The inequality constraint, h^δ (x) ⩽ 0, has been neglected in these approaches. It is necessary, however, to apply the Karush–Kuhn–Tucker conditions (Kuhn & Tucker, 1951) to all possible inequality constraints, including the bounds on the parameters, and to omit only the redundant constraints.

2.4. Reformulation of the problem using profile likelihood

The profile loglikelihood of S(x) = exp(q) at a given x is

$$\ell(q; x) = \sup_{h \in R} \log L(h) = \sum_{i=1}^{m} \bigl\{ d_i \log[1 - \exp\{\hat{h}(q; X_i)\}] + (n_i - d_i) \hat{h}(q; X_i) \bigr\} + N(x) \hat{h}^{\delta}(q; x),$$

(5)

where $R = \{h : \sum_{i=1}^{M(x)} h(X_i) + h^{\delta}(x) = q\}$, and ĥ(q; X_i) and ĥ^δ(q; x) are defined in (4).

Lemma 1. The derivative of the profile loglikelihood (5) with respect to q is − K (q; x).

Proof. See the Appendix.

For given x, maximizing the loglikelihood (2) subject to the constraints in E can be redefined as maximizing the profile loglikelihood ℓ (q₁, …, q_G; x), which equals

$$\sum_{g=1}^{G} \ell_g(q_g; x) = \sum_{g=1}^{G} \Bigl( \sum_{i=1}^{M_g(x)} \bigl[ (n_{gi} - d_{gi}) \log\{n_{gi} + K_g(q_g; x) - d_{gi}\} - n_{gi} \log\{n_{gi} + K_g(q_g; x)\} \bigr] + I\{K_g(q_g; x) = -N_g(x)\} N_g(x) \Bigl[ q_g - \sum_{j=1}^{M_g(x)} \log\Bigl\{1 - \frac{d_{gj}}{n_{gj} + K_g(q_g; x)}\Bigr\} \Bigr] \Bigr),$$

(6)

subject to constraints q_i ⩾ q_j, for all (i, j) ∈ E and q_g ⩽ 0 (g = 1, …, G). In this formulation, only G parameters q = (q₁, …, q_G) need to be estimated, and Ŝ_g(x) = exp(q̂_g), where q̂ = (q̂₁, …, q̂_G) is the maximum likelihood estimator of q.

Any of the general methods described in § 2.2 can be used to maximize the profile loglikelihood (6) under the corresponding linear constraints. The profile loglikelihood (6) and its derivative, dℓ (q; x)/dq^T = {−K₁(q₁; x), …, −K_G(q_G; x)}^T, are easily calculated.

To obtain the pointwise constrained estimator Ŝ_g(t) (g = 1, …, G) for all t, it is not necessary to maximize the profile likelihood at every t. It can be seen that the pointwise constrained estimator may jump only at observed event times and at times just after observed censoring times. Let {X′_j} be the union of all distinct times Y_gi if Δ_gi > 0 and $Y_{g i}^{+}$ if Δ_gi = 0. Here, $Y_{g i}^{+}$ can be taken as Y_gi + ε for a small ε > 0. We calculate Ŝ_g(X′_j), and then Ŝ_g(t) is a step function with jumps only at X′_j, i.e., Ŝ_g(t) = Ŝ_g(a), where a = max{X′_j : X′_j ⩽ t}.
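The evaluation grid {X′_j} and the step-function lookup described above might be coded as follows. This is a sketch under stated assumptions: the function names are hypothetical, each group is a list of (Y, Δ) pairs, the offset `eps` plays the role of the small ε, and the estimate is taken to be 1 before the first grid time.

```python
import bisect


def evaluation_times(samples, eps=1e-8):
    """Union over groups of event times Y and shifted censoring times Y + eps."""
    grid = set()
    for group in samples:
        for y, delta in group:
            grid.add(y if delta > 0 else y + eps)
    return sorted(grid)


def step_value(grid, values, t):
    """Value at t of the step function equal to values[j] on [grid[j], grid[j+1])."""
    k = bisect.bisect_right(grid, t)   # number of grid times <= t
    return 1.0 if k == 0 else values[k - 1]
```

With this grid the constrained maximization only has to be solved once per element of `grid`, rather than at every t.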

The following theorem shows that Ŝ_g(t) (g = 1, …, G) is a valid survivor function.

Theorem 2. The pointwise constrained estimator Ŝ_g(t) obtained from maximizing the profile likelihood (6) is a nonincreasing function in t for each g = 1, …, G. That is, for any 0 ⩽ x < y ⩽ τ_g, Ŝ_g(x) ⩾ Ŝ_g(y).

Proof. See the Supplementary Material.

2.5. Generalized pool-adjacent-violators algorithm in the simple ordering case

Suppose that G survivor functions satisfy the simple stochastic ordering constraint T₁ ⩾_st ⋯ ⩾_st T_G, and we aim to compute the pointwise constrained estimator at time x. A generalized pool-adjacent-violators algorithm can be used as developed in Best et al. (1999), because the profile loglikelihood is the sum of concave functions. The generalized pool-adjacent-violators algorithm leads to a set of blocks, B₁, …, B_r, where r ⩾ 1, B_k = {u_{k−1} + 1, …, u_k} (k = 1, …, r) and 0 = u₀ < ⋯ < u_r = G. This is described in an algorithm in the Supplementary Material. The final estimate of the survivor function for each group in a block B_k takes a common value exp(q̂_k), where q̂_k maximizes the block profile loglikelihood ℓ_{B_k}(q; x) = ∑_{i∈B_k} ℓ_i(q; x), and 0 ⩾ q̂₁ > ⋯ > q̂_r.
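For G = 2 the block structure is easy to see: either the Kaplan–Meier estimates already satisfy the constraint at x, or the two groups are pooled into one block whose common q̂ solves K₁(q; x) + K₂(q; x) = 0, by Lemma 1. The sketch below illustrates only this two-group case and is not the algorithm in the Supplementary Material. All names are hypothetical; each group is passed as (d, n, N_x) for the event times up to x, and every d_j is assumed smaller than n_j.

```python
import math


def k_hat(d, n, q, iters=200):
    """Bisection for k solving sum_j log{1 - d_j/(n_j + k)} = q (q < 0)."""
    def g(k):
        total = 0.0
        for dj, nj in zip(d, n):
            if nj + k - dj <= 0:
                return -math.inf
            total += math.log((nj + k - dj) / (nj + k))
        return total
    lo = max(dj - nj for dj, nj in zip(d, n))
    step = 1.0
    while g(lo + step) < q:
        step *= 2.0
    hi = lo + step
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if g(mid) < q else (lo, mid)
    return 0.5 * (lo + hi)


def K(d, n, N_x, q):
    """K(q; x) = -N(x) if there are no events up to x, else max{-N(x), k-hat}."""
    return -float(N_x) if not d else max(-float(N_x), k_hat(d, n, q))


def km_log(d, n):
    """Log of the Kaplan-Meier estimate at x."""
    return sum(math.log(1 - dj / nj) for dj, nj in zip(d, n))


def pooled_q(groups, q_lo, q_hi, iters=100):
    """Bisection on q for sum_g K_g(q; x) = 0; the sum is nondecreasing in q."""
    for _ in range(iters):
        mid = 0.5 * (q_lo + q_hi)
        if sum(K(d, n, N_x, mid) for d, n, N_x in groups) < 0:
            q_lo = mid
        else:
            q_hi = mid
    return 0.5 * (q_lo + q_hi)


def two_sample_estimate(g1, g2):
    """Pointwise constrained estimates of (S_1(x), S_2(x)) under S_1(x) >= S_2(x)."""
    q1, q2 = km_log(g1[0], g1[1]), km_log(g2[0], g2[1])
    if q1 >= q2:                        # constraint inactive: keep Kaplan-Meier
        return math.exp(q1), math.exp(q2)
    q = pooled_q([g1, g2], q1, q2)      # constraint active: one pooled block
    return math.exp(q), math.exp(q)
```

Without censoring the pooled value coincides with the pooled empirical survival proportion, which gives a quick sanity check: groups with 3 of 5 and 4 of 5 subjects surviving past x pool to 7/10.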

3. Consistency and asymptotic distribution

Let $S_{g}^{*}$ (t) be the Kaplan–Meier estimator of S_g(t) and let $S_{g}^{c}$ (t) be the censoring survivor function for group g. Further, let τ_g = inf{t : S_g(t) $S_{g}^{c}$ (t) = 0} (g = 1, …, G). Under the condition that there are no common jumps between the event and censoring distributions, Stute & Wang (1993) showed that the Kaplan–Meier estimator $S_{g}^{*}$ (t) is uniformly consistent for S_g(t) on [0, τ_g). A similar result holds for the pointwise constrained estimator. The following theorem is proved in the Supplementary Material.

Theorem 3. Let Ŝ_g(t) be the pointwise constrained estimator given in Definition 1. Under the condition of no common jumps of S_g(t) and $S_{g}^{c}$ (t), sup_{t<τ_g} | Ŝ_g(t) − S_g(t) | → 0 with probability 1 as n_g → ∞ (g = 1, …, G).

Let W_g(V_g) be a Brownian motion on [0, ∞) with variance function V_g(t). As shown in Gill (1983), $n_g^{1/2} (S_g^{*} - S_g) S_g^{-1} \to W_g(V_g)$ in distribution on [0, τ_g] as n_g → ∞, where $V_g(t) = -\int_0^t \{S_g^2(x-) S_g^c(x-)\}^{-1} \, dS_g(x)$. For a fixed time x, $n_g^{1/2} \{S_g^{*}(x) - S_g(x)\} \to N\{0, \sigma_g^2(x)\}$ in distribution, where $\sigma_g^2(x) = V_g(x) S_g^2(x)$.

Let $n = \sum_{g=1}^{G} n_g$, assume that $\lim_{n \to \infty} n_g/n = c_g > 0$, and let $Z_g^{*}(x) = n^{1/2} \{S_g^{*}(x) - S_g(x)\}$ (g = 1, …, G). Then $\{Z_1^{*}(x), \dots, Z_G^{*}(x)\}^T \to \{Z_1(x), \dots, Z_G(x)\}^T$ in distribution, where $Z_g(x) \sim N\{0, \sigma_g^2(x)/c_g\}$ and Z₁(x), …, Z_G(x) are independent.

Theorem 4. For a fixed time x < min{τ_k : L_g(x) ⩽ k ⩽ U_g(x)} and under the simple ordering constraint T₁ ⩾_st ⋯ ⩾_st T_G,

$$n_g^{1/2} \{\hat{S}_g(x) - S_g(x)\} \to c_g^{1/2} \min_{L_g(x) \le \ell \le g} \, \max_{g \le u \le U_g(x)} \frac{\sum_{k=\ell}^{u} Z_k(x) w_k(x)}{\sum_{k=\ell}^{u} w_k(x)}$$

(7)

in distribution, where w_g(x) = c_g/ $σ_{g}^{2}$ (x), L_g(x) = min{i : S_i(x) = S_g(x)} and U_g(x) = max{i : S_i(x) = S_g(x)}.

Proof. See the Supplementary Material.

In the Supplementary Material, the asymptotic distribution of the pointwise constrained estimator is discussed for situations where the number at risk in some groups is zero.

Let Š_g(x) be the estimate of S_g(x) obtained by applying the isotonic regression algorithm to $S_{g}^{*}$ (x) with weights w_g(x) (g = 1, …, G), subject to the constraint S₁(x) ⩾ ⋯ ⩾ S_G(x). Then, Š_g(x) has a minimax form (Barlow et al., 1972)

$$\check{S}_g(x) = \min_{1 \le \ell \le g} \, \max_{g \le u \le G} \frac{\sum_{k=\ell}^{u} S_k^{*}(x) w_k(x)}{\sum_{k=\ell}^{u} w_k(x)} .$$
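The minimax form can be evaluated directly for small G. The following sketch is illustrative only: the function name is hypothetical, indices run from 0 rather than 1, and the brute-force evaluation is cubic in G, whereas a pool-adjacent-violators routine would give the same fit more efficiently.

```python
def minimax_isotonic(s, w):
    """Weighted isotonic fit for a nonincreasing sequence via the min-max formula.

    s[k] are the unrestricted estimates S*_k(x) and w[k] the weights w_k(x).
    """
    G = len(s)

    def av(l, u):
        # Weighted average of s[l..u], inclusive.
        num = sum(s[k] * w[k] for k in range(l, u + 1))
        den = sum(w[k] for k in range(l, u + 1))
        return num / den

    return [
        min(max(av(l, u) for u in range(g, G)) for l in range(0, g + 1))
        for g in range(G)
    ]
```

For example, the inputs (0.6, 0.8, 0.5) with equal weights violate the ordering between the first two groups, which are pooled to their average while the third value is left unchanged.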

From El Barmi & Mukerjee (2005, Theorem 2), it can be seen that

$$n_g^{1/2} \{\check{S}_g(x) - S_g(x)\} \to c_g^{1/2} \min_{L_g(x) \le \ell \le g} \, \max_{g \le u \le U_g(x)} \frac{\sum_{k=\ell}^{u} Z_k(x) w_k(x)}{\sum_{k=\ell}^{u} w_k(x)}$$

in distribution. From (7), it follows that Ŝ_g(x) and Š_g(x) are asymptotically equivalent. We hypothesize that this equivalence to isotonic regression will also hold under the partial-ordering constraint. This yields the following conjecture for the asymptotic distribution of the pointwise constrained estimator.

Conjecture 1. For a fixed time x,

$$n_g^{1/2} \{\hat{S}_g(x) - S_g(x)\} \to c_g^{1/2} f_g\Bigl\{Z_1(x), \dots, Z_G(x); \frac{c_1}{\sigma_1^2}, \dots, \frac{c_G}{\sigma_G^2}, x\Bigr\}$$

in distribution as n → ∞ for all x given S_g(x) $S_{g}^{c}$ (x) > 0. Here Ψ_g(x) = {i : S_i(x) = S_g(x)}, E_g(x) = {(i, j) ∈ E : i, j ∈ Ψ_g(x)} and f_g(z₁, …, z_G; w₁, …, w_G, x) is the solution function for μ_g that minimizes $\sum_{i = 1}^{G} w_{i} {(z_{i} - μ_{i})}^{2}$ subject to μ_i ⩾ μ_j for all (i, j) ∈ E_g(x).

If this conjecture is correct, inference methods developed for isotonic regression could also be useful for the pointwise constrained estimator.

4. Comparison with the Kaplan–Meier estimator when sample size is large

4.1. Simple ordering case

In the simple ordering case with no censoring, El Barmi & Mukerjee (2005) showed that their isotonic regression estimator has smaller asymptotic mean squared error than the unrestricted Kaplan–Meier estimator. A similar result holds for the pointwise constrained estimator compared with the Kaplan–Meier estimator when there is right censoring.

Theorem 5. Consider the simple ordering constraint T₁ ⩾_st ⋯ ⩾_st T_G. For a fixed x with $S_{k}^{c}$ (x)S_k(x) > 0 for all k = 1, …, G, let $n_{k}^{1 / 2}$ {Ŝ_k(x) − S_k(x)} → Ẑ_k and $n_{k}^{1 / 2}$ { $S_{k}^{*}$ (x) − S_k(x)} → Z_k in distribution. If there exists at least one g′ ≠ g with S_g′(x) = S_g(x), then E( ${\hat{Z}}_{g}^{2}$ ) < E( $Z_{g}^{2}$ ). If no such g′ exists, then Ŝ_g(x) and $S_{g}^{*}$ (x) are asymptotically equivalent.

Thus, the pointwise constrained estimator has smaller asymptotic mean squared error than the Kaplan–Meier estimator. In fact, a stronger inequality relation holds, namely pr(| Ẑ_g | ⩽ ε) > pr(| Z_g | ⩽ ε) for all ε > 0. In § 4.2, we calculate the asymptotic bias and asymptotic mean squared error of the pointwise constrained estimator in the two-sample case.

4.2. The two sample case, G = 2

If S₁(x) > S₂(x), then, asymptotically, the constraint is irrelevant and $n_{1}^{1 / 2}$ {Ŝ₁(x) − S₁(x)} → σ₁(x)Z̄₁ and $n_{2}^{1 / 2}$ {Ŝ₂(x) − S₂(x)} → σ₂(x) Z̄₂ in distribution, as n₁, n₂ → ∞, where Z̄₁ and Z̄₂ are independent standard normal random variables.

Let n₂/n₁ → c as n₁, n₂ → ∞. We consider asymptotic properties when S₁(x) = S₂(x). From Theorem 4, we can show that

$$\begin{aligned}
n_1^{1/2} \{\hat{S}_1(x) - S_1(x)\} &\to \sigma_1(x) \max\Bigl\{\bar{Z}_1, \frac{\bar{Z}_1 + c(x)^{1/2} \bar{Z}_2}{1 + c(x)}\Bigr\}, \\
n_2^{1/2} \{\hat{S}_2(x) - S_2(x)\} &\to \sigma_2(x) \min\Bigl\{\bar{Z}_2, \frac{c(x) \bar{Z}_2 + c(x)^{1/2} \bar{Z}_1}{1 + c(x)}\Bigr\},
\end{aligned}$$

(8)

in distribution, where c(x) = c $σ_{1}^{2}$ (x)/ $σ_{2}^{2}$ (x). Direct calculation from (8) shows that the asymptotic mean squared errors are

$$\begin{aligned}
\lim_{n_1 \to \infty} E[n_1 \{\hat{S}_1(x) - S_1(x)\}^2] &= \frac{\{2 + c(x)\} \sigma_1^2(x)}{2 \{1 + c(x)\}}, \\
\lim_{n_2 \to \infty} E[n_2 \{\hat{S}_2(x) - S_2(x)\}^2] &= \frac{\{1 + 2 c(x)\} \sigma_2^2(x)}{2 \{1 + c(x)\}} .
\end{aligned}$$

(9)

These are always smaller than the unrestricted counterparts $σ_{1}^{2}$ (x) and $σ_{2}^{2}$ (x).
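The closed forms in (9) can be checked by simulating the limiting variables in (8). The sketch below is only a numerical sanity check with hypothetical function names; it works in units of σ_g²(x), so each returned value is the multiplier of σ_g²(x), and it uses a fixed seed so that the output is reproducible.

```python
import math
import random


def mse_formula(c_x):
    """Multipliers of sigma_1^2(x) and sigma_2^2(x) in (9) for a given c(x)."""
    return (2 + c_x) / (2 * (1 + c_x)), (1 + 2 * c_x) / (2 * (1 + c_x))


def mse_monte_carlo(c_x, draws=200_000, seed=1):
    """Monte Carlo second moments of the max and min limits in (8), sigma_g = 1."""
    rng = random.Random(seed)
    r = math.sqrt(c_x)
    total1 = total2 = 0.0
    for _ in range(draws):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        lim1 = max(z1, (z1 + r * z2) / (1 + c_x))
        lim2 = min(z2, (c_x * z2 + r * z1) / (1 + c_x))
        total1 += lim1 ** 2
        total2 += lim2 ** 2
    return total1 / draws, total2 / draws
```

Both multipliers lie strictly between 1/2 and 1 for any c(x) > 0, which is the sense in which the constrained estimator improves on the unrestricted variance when S₁(x) = S₂(x).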

Let S̃₁(x) and S̃₂(x) be the estimators of Rojo (2004) or El Barmi & Mukerjee (2005). On the basis of definitions of their estimators, when S₁(x) = S₂(x), the asymptotic mean squared errors are given by

$$\begin{aligned}
E[n_1 \{\tilde{S}_1(x) - S_1(x)\}^2] &= \sigma_1^2(x) + \frac{c \{\sigma_2^2(x) - (2 + c) \sigma_1^2(x)\}}{2 (1 + c)^2}, \\
E[n_2 \{\tilde{S}_2(x) - S_2(x)\}^2] &= \sigma_2^2(x) + \frac{c \sigma_1^2(x) - (1 + 2 c) \sigma_2^2(x)}{2 (1 + c)^2} .
\end{aligned}$$

(10)

It can be shown that the asymptotic mean squared error of Ŝ_g(x) is less than or equal to that of S̃_g(x) (g = 1, 2), with equality only when $σ_{1}^{2}$ (x) = $σ_{2}^{2}$ (x), in which case S̃_g(x) and Ŝ_g(x) are asymptotically equivalent. From (10) we see that when $σ_{2}^{2}$ (x)/ $σ_{1}^{2}$ (x) > c₂/c₁ + 2, Rojo’s estimator S̃₁(x) is asymptotically less efficient than the Kaplan–Meier estimator $S_{1}^{*}$ (x) and when $σ_{1}^{2}$ (x)/ $σ_{2}^{2}$ (x) > c₁/c₂ + 2, S̃₂(x) is asymptotically less efficient than $S_{2}^{*}$ (x).

From (8), the asymptotic biases of Ŝ₁(x) and Ŝ₂(x) are

$$\begin{aligned}
\lim_{n_1 \to \infty} E[n_1^{1/2} \{\hat{S}_1(x) - S_1(x)\}] &= \sigma_1(x) \int_{-\infty}^{\infty} \int_{c(x)^{1/2} z_1}^{\infty} \frac{c(x)^{1/2} z_2 - c(x) z_1}{1 + c(x)} f_{\bar{Z}_2}(z_2) f_{\bar{Z}_1}(z_1) \, dz_2 \, dz_1 \\
&= \sigma_1(x) \Bigl[\frac{c(x)}{2 \pi \{1 + c(x)\}}\Bigr]^{1/2} < \sigma_1(x) \Bigl(\frac{1}{2 \pi}\Bigr)^{1/2}, \\
\lim_{n_2 \to \infty} E[n_2^{1/2} \{\hat{S}_2(x) - S_2(x)\}] &= -\sigma_2(x) \Bigl[\frac{1}{2 \pi \{1 + c(x)\}}\Bigr]^{1/2} > -\sigma_2(x) \Bigl(\frac{1}{2 \pi}\Bigr)^{1/2} .
\end{aligned}$$

5. Confidence intervals

5.1. Asymptotic approaches

While there is a substantial literature on the estimation of survivor functions under stochastic ordering constraints, there has been little discussion of constructing confidence intervals for ordered survivor functions. Rojo (2004) demonstrated weak convergence to a Gaussian process of his estimator, from which confidence bands could be constructed. For the most part, however, asymptotic results are not particularly useful since, if the true inequalities at time t are strict, then the asymptotic distribution of Ŝ_g(t) is the same as that of the Kaplan–Meier estimator and the corresponding approximate confidence interval would be unaffected by the restrictions. In our opinion, the most promising approach to constructing confidence intervals in these problems is through resampling methods that reflect the finite sample aspects. We consider some such approaches in the next section.

5.2. Bootstrap methods

We used a nonparametric resampling scheme, in which survival time and censoring indicator pairs are drawn with replacement from the data separately for each group. For each bootstrap sample, a bootstrap estimate ${\hat{S}}_{g}^{b}$ (t) (b = 1, …, B) is obtained by applying the pointwise constrained estimator. Simple confidence intervals based on these bootstrap estimates can be constructed using the percentile or the basic bootstrap method (Efron & Tibshirani, 1993; Davison & Hinkley, 1997). For a nominal level of (1 − 2α), the percentile confidence interval for S_g(t) is { ${\hat{S}}_{g, α}^{B}$ (t), ${\hat{S}}_{g, 1 - α}^{B}$ (t)}, where ${\hat{S}}_{g, α}^{B}$ (t) is the αth quantile of the bootstrap distribution. The basic bootstrap method utilizes ideas of pivotal statistics, and can also be improved by use of transformations such as h(s) = arcsin(s^1/2). The confidence interval for the basic bootstrap method is given by (h⁻¹[2h{Ŝ_g(t)} − h{ ${\hat{S}}_{g, 1 - α}^{B}$ (t)}], h⁻¹[2h{Ŝ_g(t)} − h{ ${\hat{S}}_{g, α}^{B}$ (t)}]).
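Given the B bootstrap estimates at a fixed t, the two simple intervals can be computed as sketched below. This is an illustration rather than the authors' code. The function names are hypothetical, the quantile is taken as an order statistic of the sorted bootstrap values (one common convention), and the clipping inside `h_inv` is an added safeguard so that the back-transformed limits stay in [0, 1].

```python
import math


def quantile(sorted_vals, p):
    """The ceil(p (B + 1))-th smallest bootstrap value, clipped to ranks 1..B."""
    B = len(sorted_vals)
    k = min(max(math.ceil(p * (B + 1)), 1), B)
    return sorted_vals[k - 1]


def h(s):
    """Variance-stabilizing transformation h(s) = arcsin(s^{1/2})."""
    return math.asin(math.sqrt(s))


def h_inv(y):
    """Inverse of h, with y clipped to [0, pi/2] (an assumption of this sketch)."""
    return math.sin(min(max(y, 0.0), math.pi / 2)) ** 2


def percentile_interval(boot, alpha):
    """Percentile interval from the alpha and 1 - alpha bootstrap quantiles."""
    b = sorted(boot)
    return quantile(b, alpha), quantile(b, 1 - alpha)


def basic_interval(s_hat, boot, alpha):
    """Basic bootstrap interval on the arcsin-square-root scale."""
    lo_q, hi_q = percentile_interval(boot, alpha)
    lower = h_inv(2 * h(s_hat) - h(hi_q))
    upper = h_inv(2 * h(s_hat) - h(lo_q))
    return lower, upper
```

The basic interval reflects the bootstrap quantiles around the point estimate on the transformed scale, so the upper bootstrap quantile determines the lower limit and vice versa.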

While these simple methods are easy to apply, a number of different methods have been developed which have improved properties. The work in Andrews (2000) suggests that the use of the bootstrap for inference problems with order restrictions on the parameters may be particularly challenging. We investigated a number of different alternatives to the two simple bootstrap methods and present below a method which had reasonably good properties for the cases considered.

For the restricted estimation problem, the distribution of Ŝ_g(t) − S_g(t) will generally not be symmetric or centred around zero and will differ from one group g to the next. It is to be expected that the bootstrap distribution ${\hat{S}}_{g}^{b}$ (t) − Ŝ_g(t) will be similarly biased. The method we propose uses the bootstrap distribution to correct the bias, but is adjusted so as not to over-correct. Consider pointwise estimators Ŝ₁(t) and Ŝ₂(t), where S₁(t) ⩾ S₂(t). Let ${\bar{S}}_{g}^{B}$ (t) be the mean of the bootstrap estimates ${\hat{S}}_{g}^{b}$ (t). The basic bootstrap method considers a pseudo estimator given by [2h{Ŝ_g(t)} − h{ ${\hat{S}}_{g, α}^{B}$ (t)}]. While ${\bar{S}}_{1}^{B} (t) ⩾ {\bar{S}}_{2}^{B} (t)$ , the mean of the pseudo estimator may not satisfy the order constraint, i.e., it is possible that 2h{Ŝ₁(t)} − h{ ${\bar{S}}_{1}^{B}$ (t)} < 2h{Ŝ₂(t)} − h{ ${\bar{S}}_{2}^{B}$ (t)}. This can be considered as an overcorrection and it might be expected that the properties of the confidence interval could be improved if this overcorrection is modified. Let ${\tilde{S}}_{g}^{B}$ (t, a_g) = h{Ŝ_g(t)} + a_g[h{Ŝ_g(t)} − h{ ${\bar{S}}_{g}^{B}$ (t)}], where 0 ⩽ a_g ⩽ 1. Although ${\tilde{S}}_{g}^{B}$ (t, a_g) will satisfy the ordering constraint for a_g = 0, for a_g = 1, this may not be true. Given a set of a_g such that the ${\tilde{S}}_{g}^{B}$ (t, a_g) (g = 1, …, G) satisfy the ordering constraints, the proposed adjusted basic bootstrap confidence interval is (h⁻¹[2h{Ŝ_g(t)} − h{ ${\hat{S}}_{g, 1 - α}^{B}$ (t)} + δ_g], h⁻¹[2h{Ŝ_g(t)} − h{ ${\hat{S}}_{g, α}^{B}$ (t)} + δ_g]), where δ_g = (a_g − 1)[h{Ŝ_g(t)} − h{ ${\bar{S}}_{g}^{B}$ (t)}].

We propose the following method to obtain a set of a_g that satisfy the constraints. Let a₁ = a₂ = ⋯ = a_G = a and find the largest a that does not result in a violation of an order restriction. Use this value of a for the groups i and j for which ${\tilde{S}}_{i}^{B}$ (t, a_i) = ${\tilde{S}}_{j}^{B}$ (t, a_j). For the remaining groups increase a until a new violation is about to occur, and use the new value of a for the groups that have the active constraint for ${\tilde{S}}_{i}^{B}$ (t, a) and have not already had a fixed value of a_i. Continue in this way, gradually increasing a, using the value of a when constraints become active, until all values of a_g have been set or a = 1. An algorithm to obtain a_g (g = 1, …, G) is given in the Supplementary Material.
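For two groups the search for a has a closed form, because the gap between the two adjusted values is linear in a. The sketch below covers only this two-group case and is not the algorithm in the Supplementary Material. The function names are hypothetical; `h_hat[g]` stands for h{Ŝ_g(t)} and `h_bar[g]` for h applied to the bootstrap mean, with group 0 constrained to be at least group 1.

```python
def largest_common_a(h_hat, h_bar):
    """Largest a in [0, 1] keeping S-tilde_0(t, a) >= S-tilde_1(t, a)."""
    u = h_hat[0] - h_hat[1]                              # gap at a = 0, nonnegative
    v = (h_hat[0] - h_bar[0]) - (h_hat[1] - h_bar[1])    # change in the gap per unit a
    if v >= 0:
        return 1.0                                       # the gap never shrinks
    return min(1.0, u / (-v))                            # gap reaches zero at u / (-v)


def adjusted_centre(h_hat, h_bar, a):
    """S-tilde_g(t, a) = h_hat_g + a (h_hat_g - h_bar_g), on the transformed scale."""
    return [hh + a * (hh - hb) for hh, hb in zip(h_hat, h_bar)]


def shifts(h_hat, h_bar, a):
    """delta_g = (a - 1)(h_hat_g - h_bar_g), added to both basic-interval limits."""
    return [(a - 1) * (hh - hb) for hh, hb in zip(h_hat, h_bar)]
```

When a = 1 the shifts are zero and the interval is the usual basic bootstrap interval; smaller values of a pull the interval back towards the point estimate.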

5.3. Confidence interval centred on a constrained estimator

Hwang & Peddada (1994) suggested a method in which a confidence interval is computed for the unrestricted estimator and then shifted and centred on the constrained estimator. They showed that, under fairly general conditions, the coverage probability for the shifted interval will exceed the nominal level. For the survivor function, we apply this to intervals on a log transformed scale and consider the approximate 100(1 − 2α)% confidence interval, Ŝ_g(x) exp{±z_α $σ_{g}^{*}$ (x)}, where $σ_{g}^{*}$ (x) is the standard error estimate of log $S_{g}^{*}$ (x) (Kalbfleisch & Prentice 2002, p. 17), and z_α is the αth percentile of the standard normal distribution.
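A shifted interval of this form could be computed as below. This is a sketch under an explicit assumption: the standard error of log S*_g(x) is taken to be the Greenwood-type expression, the square root of the sum of d_j/{n_j(n_j − d_j)} over event times up to x, which is assumed to be the estimate the cited reference describes. The function names are hypothetical.

```python
import math


def greenwood_log_se(d, n):
    """Assumed standard error of log S*(x): sqrt(sum_j d_j / {n_j (n_j - d_j)})."""
    return math.sqrt(sum(dj / (nj * (nj - dj)) for dj, nj in zip(d, n)))


def shifted_interval(s_constrained, d, n, z=1.96):
    """Interval S-hat(x) exp(-z se), S-hat(x) exp(+z se), centred on the constrained fit."""
    se = greenwood_log_se(d, n)
    return s_constrained * math.exp(-z * se), s_constrained * math.exp(z * se)
```

The width is driven entirely by the unrestricted standard error, and only the centre uses the constrained estimate. The upper limit can exceed 1 in small samples; whether to truncate it is left open here.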

6. Simulation studies

6.1. Two-sample case when sample size is small

We have conducted numerous simulation studies to compare the finite sample properties of three different constrained estimators, Rojo’s estimator (Rojo, 2004), the constrained nonparametric maximum likelihood estimator (Park et al., 2012) and the pointwise constrained estimator, and compared them to the unconstrained Kaplan–Meier estimator in the two-sample case. In this paper, we show results for scenarios where G = 2 and S₁(t) ⩾ S₂(t) for all t.

The upper and lower plots of each panel in Fig. 1 show differences of root mean squared errors of estimators of S₁(t) and S₂(t) over a range of values of t, compared with the pointwise constrained estimator. In cases with the same censoring distributions, Fig. 1(a), Rojo’s estimator and the pointwise constrained estimator have smaller root mean squared error than the other estimators. However, if populations 1 and 2 have different censoring distributions, the pointwise constrained estimator has the smallest root mean squared error among all estimators at almost all times. Rojo’s estimator does not adjust well to the unequal censoring distributions, Figs. 1(b)–(f), even when the censoring rates are close to each other, Fig. 1(d). The pointwise constrained estimator is the only estimator that dominates the Kaplan–Meier estimator at almost all times in all situations considered. Each simulation is based on 10 000 replications.

6.2. Two-sample case: asymptotic properties

We define the asymptotic relative efficiency as the inverse ratio of the mean squared errors and compare the asymptotic relative efficiency of the three constrained estimators to the Kaplan–Meier estimator in the two sample case in Fig. 2. The underlying distributions are S₁(t) = S₂(t) = exp(−t), $S_{1}^{c}$ (t) = 1 and $S_{2}^{c}$ (t) = exp(−2t). The constraint is asymptotically relevant at all times. We set lim_{n₁,n₂→∞} n₁/n₂ = 1. The asymptotic relative efficiency of the full constrained nonparametric maximum likelihood estimator is based on simulated data with a very large sample size. Asymptotic relative efficiencies of the pointwise constrained estimator and Rojo’s estimator are calculated using (9) and (10).
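For this scenario the asymptotic relative efficiency of the pointwise constrained estimator has a closed form. The sketch below evaluates it from (9), using σ_g²(t) = V_g(t)S_g²(t) with V_g as defined in § 3. The integrals are worked out by hand for these exponential distributions: V₁(t) = e^t − 1 and V₂(t) = (e^{3t} − 1)/3. The function names are hypothetical.

```python
import math


def sigma2(t):
    """sigma_1^2(t), sigma_2^2(t) for S_g(t) = exp(-t), S_1^c = 1, S_2^c(t) = exp(-2t)."""
    s = math.exp(-t)
    v1 = math.exp(t) - 1.0                 # integral of e^x over [0, t]
    v2 = (math.exp(3.0 * t) - 1.0) / 3.0   # integral of e^{3x} over [0, t]
    return v1 * s * s, v2 * s * s


def are_pointwise(t, c=1.0):
    """ARE of the pointwise constrained estimator relative to Kaplan-Meier, from (9)."""
    s1, s2 = sigma2(t)
    c_x = c * s1 / s2                      # c(x) = c sigma_1^2(x) / sigma_2^2(x)
    return 2 * (1 + c_x) / (2 + c_x), 2 * (1 + c_x) / (1 + 2 * c_x)
```

Near t = 0 censoring has had little effect, c(x) is close to 1, and both efficiencies are close to 4/3. As t grows the heavily censored group 2 gains more from the constraint than group 1.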

Fig. 2 — Comparison of asymptotic relative efficiencies under the constraint T₁ ⩾_st T₂. The underlying distributions are: S₁(t) = S₂(t) = exp(−t), $S_{1}^{c}$ (t) = 1 and $S_{2}^{c}$ (t) = exp(−2t). Kaplan–Meier estimator (thin dashed); constrained nonparametric maximum likelihood estimator (thick solid); Rojo’s estimator (thick dashed); pointwise constrained estimator (thin solid).

The pointwise constrained estimator dominates all other estimators for all t, whereas Rojo’s estimator could be inefficient for some t, as seen in Fig. 2(a). Compared with the Kaplan–Meier estimator, the full constrained nonparametric maximum likelihood estimator is less efficient at all times in this setting.

6.3. Simple ordering case

In this section, we compare finite sample properties of the pointwise constrained estimator with the Kaplan–Meier estimator in the simple ordering case and investigate the confidence intervals described in § 5. We consider three groups with underlying distributions T₁ ∼ exp(1), T₂ ∼ exp(1.1) and T₃ ∼ exp(1.4) and a uniform censoring distribution C ∼ U(0, 4.3), which gives an overall censoring rate of about 20%. Sample sizes are n₁ = n₃ = 40 and n₂ = 20. The simulation is based on 10 000 replicates.
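The stated overall censoring rate of about 20% can be checked in closed form. For T ∼ exp(λ) and an independent C ∼ U(0, a), P(C < T) = {1 − exp(−λa)}/(λa). The sketch below weights the three group-specific probabilities by the sample sizes. It assumes that exp(λ) denotes an exponential distribution with hazard λ, and the function names are ours, not the paper's.

```python
import math

def censoring_prob(rate, a):
    """P(C < T) for T ~ exponential with hazard `rate` and C ~ Uniform(0, a)."""
    return (1.0 - math.exp(-rate * a)) / (rate * a)

def overall_censoring_rate(rates, sizes, a):
    """Sample-size-weighted average of the group censoring probabilities."""
    total = sum(sizes)
    return sum(n * censoring_prob(r, a) for r, n in zip(rates, sizes)) / total

rates = [1.0, 1.1, 1.4]   # assumed hazards of T1, T2, T3
sizes = [40, 20, 40]      # n1, n2, n3
overall = overall_censoring_rate(rates, sizes, a=4.3)
print(round(overall, 3))
```

Under these assumptions the group-specific censoring probabilities decrease with the hazard, so group 1 is censored most often and group 3 least often.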

Figure 3 shows the mean squared error of the pointwise constrained estimator and the Kaplan–Meier estimator. The figure shows efficiency gains for the pointwise constrained estimator at all times for all groups, with the largest gains for the estimation of S₂(t), where the mean squared error of the pointwise constrained estimator is less than half of the mean squared error of the Kaplan–Meier estimator at almost all times.

Fig. 3 — Comparison of the Kaplan–Meier estimator and the pointwise constrained estimator in the three-sample case. Kaplan–Meier estimator for groups 1 (black solid), 2 (thick solid) and 3 (thin solid); pointwise constrained estimator for groups 1 (black dashed), 2 (thick dashed) and 3 (thin dashed).

Bootstrap intervals are based on 1999 bootstrap estimates. We evaluate confidence intervals at times 0.26 and 0.63, where the survival rates of group 2 are 0.75 and 0.5, respectively. We also conducted simulation studies for two additional cases with different distributions; see Table 1. The coverage rates and average widths of the confidence intervals described in § 5 are shown in Table 1. As expected, the confidence interval centred on the pointwise constrained estimator, Ŝ_g exp(±1.96 $σ_{g}^{*}$ ), is overly conservative, with a large average width and a higher coverage rate. The bootstrap methods give confidence intervals with substantially reduced widths, but the coverage rates can be somewhat low for some groups, especially when using the percentile or the basic bootstrap methods. The transformation and the adjusted methods described in § 5 both give slightly better coverage rates. The overall best results are obtained by combining the basic bootstrap with the arcsin(s^1/2) transformation and controlling for bias overcorrection.
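As a rough illustration of the basic bootstrap interval on the arcsin(s^1/2) scale, the following sketch takes a point estimate and a list of bootstrap replicates of a survival probability and returns the back-transformed interval. It is estimator-agnostic and does not implement the adjustment for bias overcorrection of § 5. The function name, the use of order statistics for the bootstrap quantiles and the clipping of the transformed limits to [0, π/2] are our assumptions.

```python
import math

def basic_bootstrap_arcsin(s_hat, boot, alpha=0.05):
    """Basic bootstrap interval for a survival probability, computed on the
    arcsin(sqrt(s)) scale and transformed back to the probability scale."""
    def phi(s):
        # Variance-stabilising transformation; inputs are clipped to [0, 1].
        return math.asin(math.sqrt(min(max(s, 0.0), 1.0)))

    B = len(boot)
    k = int(round((B + 1) * alpha / 2))      # e.g. k = 50 when B = 1999
    if k < 1:
        raise ValueError("too few bootstrap replicates for this alpha")
    t = sorted(phi(s) for s in boot)
    q_lo, q_hi = t[k - 1], t[B - k]          # k-th and (B + 1 - k)-th order statistics

    # Basic bootstrap: reflect the bootstrap quantiles about the point estimate.
    lower = 2.0 * phi(s_hat) - q_hi
    upper = 2.0 * phi(s_hat) - q_lo

    # Assumption: clip to [0, pi/2] so that the back-transformation is monotone.
    lower = min(max(lower, 0.0), math.pi / 2)
    upper = min(max(upper, 0.0), math.pi / 2)
    return math.sin(lower) ** 2, math.sin(upper) ** 2
```

With B = 1999 replicates and α = 0.05, the limits use the 50th and 1950th ordered bootstrap values, which is the usual reason for choosing B so that (B + 1)α/2 is an integer.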

Table 1.

Percent coverage (average width) of nominal 95% confidence intervals

	t = 0.26			t = 0.63
Distribution	exp(1)	exp(1.1)	exp(1.4)	exp(1)	exp(1.1)	exp(1.4)
Percentile	91 (20.8)	95 (22.9)	95 (24.9)	94 (27.2)	95 (28.2)	94 (27.4)
Basic	89 (20.8)	90 (22.9)	91 (24.9)	91 (27.2)	88 (28.2)	90 (27.4)
With adjustment	90 (20.8)	92 (22.9)	93 (24.9)	92 (27.2)	91 (28.2)	91 (27.4)
arcsin(s^1/2)	94 (22.5)	93 (23.1)	93 (24.4)	93 (27.3)	90 (28.0)	92 (27.8)
With adjustment	95 (22.4)	94 (23.2)	94 (24.5)	94 (27.3)	93 (28.1)	94 (27.8)
Ŝ_g exp{± 1.96 $σ_{g}^{*}$ }	92 (26.7)	98 (37.8)	97 (28.3)	95 (33.6)	99 (46.5)	97 (31.7)
Distribution	exp(1)	exp(1.05)	exp(1.2)	exp(1)	exp(1.05)	exp(1.2)
Percentile	90 (20.2)	96 (21.5)	94 (23.4)	92 (26.2)	96 (26.5)	94 (26.6)
Basic	90 (20.2)	92 (21.5)	93 (23.4)	92 (26.2)	91 (26.5)	92 (26.6)
With adjustment	91 (20.2)	94 (21.5)	94 (23.4)	92 (26.2)	93 (26.5)	93 (26.6)
arcsin(s^1/2)	95 (21.9)	94 (21.8)	94 (22.7)	94 (26.4)	93 (26.4)	94 (26.8)
With adjustment	96 (21.7)	95 (21.9)	95 (22.9)	95 (26.4)	95 (26.4)	95 (26.8)
Ŝ_g exp{± 1.96 $σ_{g}^{*}$ }	93 (26.9)	98 (37.6)	98 (27.1)	95 (34)	99 (47.2)	98 (31.5)
Distribution	exp(1)	exp(1.2)	exp(1.6)	exp(1)	exp(1.2)	exp(1.6)
Percentile	92 (21.6)	95 (24.6)	95 (26.2)	94 (28.1)	95 (29.8)	94 (27.5)
Basic	90 (21.6)	87 (24.6)	90 (26.2)	91 (28.1)	86 (29.8)	89 (27.5)
With adjustment	91 (21.6)	90 (24.6)	92 (26.2)	92 (28.1)	89 (29.8)	91 (27.5)
arcsin(s^1/2)	93 (23.2)	90 (24.7)	92 (25.8)	93 (28.2)	88 (29.6)	92 (28.0)
With adjustment	95 (23.1)	93 (24.8)	94 (25.9)	94 (28.2)	92 (29.7)	94 (28.0)
Ŝ_g exp{± 1.96 $σ_{g}^{*}$ }	93 (26.6)	98 (38.9)	96 (29.4)	95 (33.2)	99 (47.0)	96 (31.3)


Sample sizes are n₁ = 40, n₂ = 20 and n₃ = 40 and the censoring distribution is U(0, 4.3). The five bootstrap confidence intervals are the percentile method, and the basic bootstrap method with or without arcsin(s^1/2) transformation and with or without an adjustment for bias overcorrection. Ŝ_g exp{±1.96 $σ_{g}^{*}$ } is the centred method of Hwang & Peddada (1994). Results are based on 10 000 simulation samples.

7. Example

The data are from prostate cancer patients who received radiation therapy at the University of Michigan Hospital, a portion of the data used in Proust-Lima & Taylor (2009). Five hundred and three patients without planned hormonal therapy are used to estimate the survivor function of time to first recurrence of prostate cancer. For this analysis, recurrence is defined as the first of local recurrence, distant metastasis or initiation of salvage hormone therapy.

It is expected that patients with higher baseline prostate-specific antigen levels have a higher recurrence rate than those with lower baseline prostate-specific antigen values. The Gleason grade is a measure of the aggressiveness of the tumour cells obtained from microscopic inspection of a biopsy prior to the treatment. It is also expected that patients with a lower Gleason grade will have a lower recurrence rate. In this example, we divided the patients into six groups labelled A1, A2, A3, B1, B2 and B3 based on whether or not their baseline prostate-specific antigen is less than 10, and whether their Gleason grade is ⩽6, =7 or ⩾8. Patients with baseline prostate-specific antigen <10 and Gleason ⩽6 are labelled as A1, patients with baseline prostate-specific antigen <10 and Gleason =7 as A2 etc. The natural set of constraints for the survivor functions are A1 ⩾ A2 ⩾ A3, B1 ⩾ B2 ⩾ B3, A1 ⩾ B1, A2 ⩾ B2 and A3 ⩾ B3.

The Kaplan–Meier estimates for each group are shown in Fig. 4(a). The unrestricted Kaplan–Meier estimates do not satisfy the stochastic ordering constraints. Specifically, between 1 and 2.5 years, the groups A2, B2 and B3 do not satisfy the ordering constraints and after 5 years the orderings of A2 and A3, and B2 and B3 are incorrect.

The pointwise constrained estimates, shown in Fig. 4(b), satisfy the stochastic ordering constraints at all times. Between 1 and 2.5 years, the survivor functions take a common value in groups A2, B2 and B3 and after 5 years, groups A2 and A3 and groups B2 and B3 have common estimates. At around 12.5 years, there is a jump in the survivor function estimate for groups B2 and B3, even though there are no observed events at that time. This happens because the number of individuals at risk in the stochastically smaller group B3 at time t = 12.5 changes, which results in ĥ^δ(t) < 0, as discussed in § 2.3.

Detailed results of point estimates and corresponding confidence intervals for some selected times are shown in Table 2.

Table 2.

Estimates and confidence intervals (%) of survivor functions for some selected times in the prostate cancer example

	Time (years)	1.5	5	8
A1	Kaplan–Meier estimator	99.4 (97.6, 100)	93.9 (90.0, 97.2)	83.6 (76.4, 91.1)
A1	Pointwise constrained estimator	99.4 (97.6, 99.9)	93.9 (90.0, 97.2)	83.6 (76.4, 90.2)
A2	Kaplan–Meier estimator	99.1 (96.6, 99.9)	83.4 (75.6, 90.2)	73.0 (63.2, 82.3)
A2	Pointwise constrained estimator	99.1 (96.5, 99.9)	83.4 (76.6, 89.9)	73.0 (64.0, 83.0)
A3	Kaplan–Meier estimator	80.0 (36.0, 98.0)	70.0 (44.8, 92.7)	70.0 (44.8, 100)
A3	Pointwise constrained estimator	88.7 (70.9, 94.9)	70.0 (53.2, 91.6)	70.0 (52.6, 93.6)
B1	Kaplan–Meier estimator	98.0 (92.2, 99.7)	78.3 (71.2, 86.1)	67.0 (57.8, 77.8)
B1	Pointwise constrained estimator	98.0 (95.2, 99.7)	78.3 (71.2, 86.1)	67.0 (57.8, 77.8)
B2	Kaplan–Meier estimator	86.8 (79.6, 93.6)	48.8 (38.9, 62.2)	34.2 (22.9, 52.6)
B2	Pointwise constrained estimator	88.7 (83.0, 93.9)	48.8 (39.6, 59.9)	39.8 (30.3, 54.0)
B3	Kaplan–Meier estimator	96.4 (86.2, 99.8)	47.9 (33.4, 67.5)	47.9 (33.5, 69.4)
B3	Pointwise constrained estimator	88.7 (83.5, 93.8)	47.9 (38.4, 69.2)	39.8 (29.9, 54.2)


Nominal 95% bootstrap confidence intervals using arcsin(s^1/2) transformation and controlling bias overcorrection are shown in parentheses.
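The constraint set of this example can be encoded as ordered pairs and checked against any column of Table 2. The sketch below does this for the point estimates at 8 years; the data structure and function name are ours and only illustrate the check, not the estimation method.

```python
# Each pair (g, h) encodes the constraint S_g(t) >= S_h(t), as listed in the text.
CONSTRAINTS = [
    ("A1", "A2"), ("A2", "A3"),
    ("B1", "B2"), ("B2", "B3"),
    ("A1", "B1"), ("A2", "B2"), ("A3", "B3"),
]

def violated_pairs(estimates, constraints=CONSTRAINTS):
    """Return the constraints (g, h) for which the estimate of S_g is below that of S_h."""
    return [(g, h) for g, h in constraints if estimates[g] < estimates[h]]

# Point estimates (%) at 8 years, copied from Table 2.
km_8y = {"A1": 83.6, "A2": 73.0, "A3": 70.0, "B1": 67.0, "B2": 34.2, "B3": 47.9}
pc_8y = {"A1": 83.6, "A2": 73.0, "A3": 70.0, "B1": 67.0, "B2": 39.8, "B3": 39.8}

print(violated_pairs(km_8y))   # [('B2', 'B3')]
print(violated_pairs(pc_8y))   # []
```

At 8 years the Kaplan–Meier values in Table 2 violate only the B2 ⩾ B3 constraint, and the pointwise constrained estimates replace the two values by a common one.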

8. Discussion

The pointwise constrained estimator is a likelihood-based pointwise estimator. Unlike with the full constrained nonparametric maximum likelihood estimator, the violation of a constraint at one time does not affect the estimates at other times. When the constraints are violated, the pointwise constrained estimator gives a common estimate obtained by maximizing the likelihood. Compared with other estimators that use averaging based on initial sample sizes (Rojo, 2004; El Barmi & Mukerjee, 2005), it has better properties when censoring exists.

When there is no censoring, Rojo’s estimator in the two-sample case and El Barmi and Mukerjee’s estimator in the simple ordering case are identical to the pointwise constrained estimator. However, if censoring exists, these estimators can be quite different, especially when the censoring distributions differ significantly between groups. Another feature of El Barmi and Mukerjee’s estimator is the range of times for which the estimator is defined. Specifically, it is defined only until the minimum of the times of the last observations in all groups. Thus, if the last observed time in one group is much earlier than in other groups, then estimates in all other groups are undefined at subsequent times even though there may be a large number of observations at risk. On the other hand, the pointwise constrained estimator for a group is defined up to the last observed time of that group.

The pointwise constrained estimator can have jumps at nonevent times. Thus, the likelihood ratio statistics of the restricted survivor function, first introduced by Thomas & Grunkemeier (1975) and discussed by Li (1995) and Murphy (1995), are not exactly correct, because they assume that jumps can occur only at event times. The likelihood ratio test and the confidence interval based on it may therefore need to be revised.

Methods to construct confidence intervals in order restricted problems are not well developed. Bootstrap methods generally work better when the distributions are approximately normal after some transformations. When constraints are present, it is not clear whether there exists any such transformation. We proposed a method to control overcorrection of bias when using the basic bootstrap methods and found improved properties of confidence intervals. Further investigation of this approach on other applications could be useful.

Acknowledgments

This research was partially supported by the National Institutes of Health, U.S.A. We thank the editor, associate editor and two referees for their many valuable comments.

Appendix.

Proof of Theorem 1. Let λ₁ and λ₂ be Lagrange multipliers. The corresponding Lagrangian function is

$$\begin{aligned} \Lambda(h, \lambda) = {} & \sum_{i=1}^{m} \left[ d_{i} \log\{1 - \exp h(X_{i})\} + (n_{i} - d_{i}) h(X_{i}) \right] + N(x) h^{\delta}(x) \\ & + \lambda_{1} \left\{ \sum_{j=1}^{M(x)} h(X_{j}) + h^{\delta}(x) - q \right\} - \lambda_{2} h^{\delta}(x). \end{aligned}$$

The Karush–Kuhn–Tucker conditions that must be satisfied at the solution ĥ are:

$$-\frac{d_{i} \exp \hat{h}(X_{i})}{1 - \exp \hat{h}(X_{i})} + (n_{i} - d_{i}) + \hat{\lambda}_{1} = 0, \quad i \leqslant M(x), \tag{A1}$$

$$-\frac{d_{i} \exp \hat{h}(X_{i})}{1 - \exp \hat{h}(X_{i})} + (n_{i} - d_{i}) = 0, \quad i > M(x), \tag{A2}$$

$$N(x) + \hat{\lambda}_{1} - \hat{\lambda}_{2} = 0, \tag{A3}$$

$$\sum_{j=1}^{M(x)} \hat{h}(X_{j}) + \hat{h}^{\delta}(x) - q = 0, \tag{A4}$$

$$\hat{h}^{\delta}(x) \leqslant 0, \tag{A5}$$

$$\hat{\lambda}_{2} \hat{h}^{\delta}(x) = 0, \tag{A6}$$

$$\hat{\lambda}_{2} \geqslant 0. \tag{A7}$$

From (A1), we have ĥ(X_i) = log{1 − d_i/(n_i + λ̂₁)} for i ⩽ M(x). From (A6), either λ̂₂ = 0 or ĥ^δ(x) = 0. If λ̂₂ = 0, then λ̂₁ = −N(x) from (A3), which is only valid when $\hat{h}^{\delta}(x) = q - \sum_{j=1}^{M(x)} \log[1 - d_{j}/\{n_{j} - N(x)\}] \leqslant 0$; otherwise ĥ^δ(x) = 0 and, from (A4), λ̂₁ is the solution in λ of the equation $q - \sum_{j=1}^{M(x)} \log\{1 - d_{j}/(n_{j} + \lambda)\} = 0$, which is only valid when λ̂₁ ⩾ −N(x), since λ̂₂ = N(x) + λ̂₁ ⩾ 0 by (A3) and (A7). Since $\sum_{j=1}^{M(x)} \log\{1 - d_{j}/(n_{j} + k)\}$ is an increasing function of k, we can see that λ̂₁ = max{k̂, −N(x)}, where k̂ is the solution of the equation $\sum_{j=1}^{M(x)} \log\{1 - d_{j}/(n_{j} + k)\} - q = 0$. It follows that λ̂₁ is exactly the same as K(q; x) defined in Theorem 1. Therefore, the unique solution from solving (A1)–(A7) is as given in Equation (4).
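The case analysis in this proof translates directly into a small numerical routine. The sketch below finds k̂ by bisection, sets λ̂₁ = max{k̂, −N(x)} and recovers ĥ(X_i) and ĥ^δ(x) from (A1) and (A4). The function name, the bisection scheme and the toy inputs are our assumptions; it also assumes q < 0 and takes N(x) as a given number.

```python
import math

def solve_theorem1(d, n, N_x, q, iters=200):
    """Return (lambda1_hat, h_hat, h_delta_hat) following the proof above.

    d[i], n[i]: events and number at risk at the event times X_i, i <= M(x)
    N_x:        the quantity N(x) appearing in the Lagrangian
    q:          required value of sum_j h(X_j) + h_delta(x); assumed q < 0
    """
    if q >= 0:
        raise ValueError("this sketch assumes q < 0")

    def g(k):
        return sum(math.log(1.0 - di / (ni + k)) for di, ni in zip(d, n)) - q

    # g is increasing for k > max(d_i - n_i) and tends to -q > 0 as k grows.
    lo = max(di - ni for di, ni in zip(d, n))   # open lower end of the domain
    step = 1.0
    while g(lo + step) < 0.0:
        step *= 2.0
    hi = lo + step
    for _ in range(iters):                      # plain bisection for k_hat
        mid = 0.5 * (lo + hi)
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    k_hat = 0.5 * (lo + hi)

    lam1 = max(k_hat, -N_x)
    h = [math.log(1.0 - di / (ni + lam1)) for di, ni in zip(d, n)]
    # h_delta = 0 when lam1 = k_hat; otherwise it is given by (A4) and is negative.
    h_delta = 0.0 if k_hat >= -N_x else q - sum(h)
    return lam1, h, h_delta
```

For the toy inputs d = (1, 1), n = (10, 8) and N(x) = 5, a target of q = log 0.7 is met with ĥ^δ(x) = 0, whereas q = log 0.5 gives λ̂₁ = −N(x) and a strictly negative ĥ^δ(x), that is, a jump at the nonevent time x.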

Proof of Lemma 1. We consider separately two cases where K (q; x) > −N(x) and K (q; x) = −N(x).

If K (q; x) > −N(x), then ĥ^δ (x) = 0 and ĥ(q; X_i) = log(1 − d_i/n_i) for i > M(x), which does not depend on q. For any i ⩽ M(x),

$$\begin{aligned} \frac{d}{d \hat{h}(q; X_{i})} \left[ d_{i} \log\{1 - \exp \hat{h}(q; X_{i})\} + (n_{i} - d_{i}) \hat{h}(q; X_{i}) \right] & = \frac{-d_{i} \exp\{\hat{h}(q; X_{i})\}}{1 - \exp\{\hat{h}(q; X_{i})\}} + n_{i} - d_{i} \\ & = n_{i} - \frac{d_{i}}{1 - \exp\{\hat{h}(q; X_{i})\}} \\ & = n_{i} - \{n_{i} + K(q; x)\} = -K(q; x). \end{aligned}$$

Thus,

$$\begin{aligned} \frac{d}{dq} \mathrm{pl}_{h}(q; x) & = \frac{d}{dq} \left( \sum_{i=1}^{m} \left[ d_{i} \log\{1 - \exp \hat{h}(q; X_{i})\} + (n_{i} - d_{i}) \hat{h}(q; X_{i}) \right] + N(x) \hat{h}^{\delta}(x) \right) \\ & = \sum_{i=1}^{M(x)} -K(q; x) \frac{d \hat{h}(q; X_{i})}{dq} = -K(q; x) \frac{d}{dq} \sum_{i=1}^{M(x)} \hat{h}(q; X_{i}) = -K(q; x). \end{aligned}$$

If K(q; x) = −N(x), then ĥ(q; X_i) = log[1 − d_i/{n_i − N(x)}] for i ⩽ M(x) and ĥ(q; X_i) = log(1 − d_i/n_i) for i > M(x) are not functions of q. It follows that

$$\begin{aligned} \frac{d}{dq} \ell(q; x) & = \frac{d}{dq} \left( \sum_{i=1}^{m} \left[ d_{i} \log\{1 - \exp \hat{h}(q; X_{i})\} + (n_{i} - d_{i}) \hat{h}(q; X_{i}) \right] + N(x) \hat{h}^{\delta}(q; x) \right) \\ & = N(x) = -K(q; x). \end{aligned}$$
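The identity obtained in both cases, that the derivative with respect to q equals −K(q; x), can be checked numerically with a central finite difference. The sketch below evaluates only the part of the profile loglikelihood that depends on q, namely the terms with i ⩽ M(x) plus N(x)ĥ^δ. The function names, the bisection solver for k̂ and the toy inputs are our assumptions, and q < 0 is assumed.

```python
import math

def K_and_profile(d, n, N_x, q):
    """Return (K(q; x), q-dependent part of the profile loglikelihood), q < 0."""
    def g(k):
        return sum(math.log(1.0 - di / (ni + k)) for di, ni in zip(d, n)) - q

    lo = max(di - ni for di, ni in zip(d, n))   # g is increasing for k > lo
    step = 1.0
    while g(lo + step) < 0.0:
        step *= 2.0
    hi = lo + step
    for _ in range(200):                        # bisection for k_hat
        mid = 0.5 * (lo + hi)
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    K = max(0.5 * (lo + hi), -N_x)

    h = [math.log(1.0 - di / (ni + K)) for di, ni in zip(d, n)]
    h_delta = q - sum(h)                        # zero, up to rounding, when K = k_hat
    pl = sum(di * math.log(1.0 - math.exp(hj)) + (ni - di) * hj
             for di, ni, hj in zip(d, n, h)) + N_x * h_delta
    return K, pl

def derivative_check(d, n, N_x, q, eps=1e-6):
    """Return (-K(q; x), central finite-difference derivative of the profile)."""
    K, _ = K_and_profile(d, n, N_x, q)
    _, up = K_and_profile(d, n, N_x, q + eps)
    _, down = K_and_profile(d, n, N_x, q - eps)
    return -K, (up - down) / (2.0 * eps)

for s in (0.7, 0.5):                            # one value of q in each case of the proof
    print(derivative_check([1, 1], [10, 8], 5, math.log(s)))
```

With these toy inputs, q = log 0.7 falls in the case K(q; x) > −N(x) and q = log 0.5 in the case K(q; x) = −N(x); in both, the two returned numbers should agree to within finite-difference error.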

Supplementary material

Supplementary material available at Biometrika online includes the generalized pool-adjacent-violators algorithm for the simple ordering case in § 2.5, proofs of Theorems 2–4, and an algorithm to calculate a_g described in § 5.2.

References

Andrews DWK. Inconsistency of the bootstrap when a parameter is on the boundary of the parameter space. Econometrica. 2000;68:399–405.
Barlow RE, Bartholomew D, Bremner JM, Brunk HD. Statistical Inference under Order Restrictions. New York: Wiley; 1972.
Best MJ, Chakravarti N, Ubhaya VA. Minimizing separable convex functions subject to simple chain constraints. SIAM J Optimiz. 1999;10:658–72.
Brunk HD, Franck WE, Hanson DL, Hogg RV. Maximum likelihood estimation of the distributions of two stochastically ordered random variables. J Am Statist Assoc. 1966;61:1067–80.
Davison AC, Hinkley DV. Bootstrap Methods and their Application. Cambridge: Cambridge University Press; 1997.
Dykstra RL. Maximum likelihood estimations of the survival functions of stochastically ordered random variables. J Am Statist Assoc. 1982;77:621–8.
Efron B, Tibshirani RJ. An Introduction to the Bootstrap. New York: Chapman and Hall; 1993.
El Barmi H, Mukerjee H. Inferences under a stochastic ordering constraint: the k-sample case. J Am Statist Assoc. 2005;100:252–61.
Gill R. Large sample behaviour of the product-limit estimator on the whole line. Ann Statist. 1983;11:49–58.
Hoff PD. Nonparametric estimation of convex models via mixtures. Ann Statist. 2003;31:174–200.
Hwang JTG, Peddada SD. Confidence interval estimation subject to order restrictions. Ann Statist. 1994;22:67–93.
Kalbfleisch JD, Prentice RL. The Statistical Analysis of Failure Time Data. 2nd ed. New York: Wiley; 2002.
Kaplan EL, Meier P. Nonparametric estimation from incomplete observations. J Am Statist Assoc. 1958;53:457–81.
Kuhn HW, Tucker AW. Nonlinear programming. In: Neyman J, editor. Proc 2nd Berkeley Symp. Berkeley: University of California Press; 1951. pp. 481–92.
Lehmann EL. Ordered families of distributions. Ann Math Statist. 1955;26:399–419.
Li G. On nonparametric likelihood ratio estimation of survival probabilities for censored data. Statist Prob Lett. 1995;25:95–104.
Lim J, Kim SJ, Wang X. Estimating stochastically ordered survival functions via geometric programming. J Comp Graph Statist. 2009;18:978–94.
Lo SH. Estimation of distribution functions under order restrictions. Statist Dec. 1987;5:251–62.
Murphy SA. Likelihood ratio-based confidence intervals in survival analysis. J Am Statist Assoc. 1995;90:1399–405.
Park Y, Kalbfleisch JD, Taylor JMG. Constrained nonparametric maximum likelihood estimation of stochastically ordered survivor functions. Can J Statist. 2012;40:22–39.
Proust-Lima C, Taylor JMG. Development and validation of a dynamic prognostic tool for prostate cancer recurrence using repeated measures of posttreatment PSA: a joint modeling approach. Biostatistics. 2009;10:535–49. doi: 10.1093/biostatistics/kxp009.
Rojo J. On the estimation of survival functions under a stochastic order constraint. In: Perez-Abreu V, Rojo J, editors. The First Erich L Lehmann Symposium: Optimality. Vol. 44. Beachwood, OH: Institute of Mathematical Statistics; 2004. pp. 37–61.
Rojo J, Ma Z. On the estimation of stochastically ordered survival functions. J Statist Comp Simul. 1996;55:1–21.
Stute W, Wang J-L. The strong law under random censorship. Ann Statist. 1993;21:1591–607.
Thomas DR, Grunkemeier GL. Confidence interval estimation of survival probabilities for censored data. J Am Statist Assoc. 1975;70:865–71.
