Abstract
In this paper, we consider estimation of survivor functions from groups of observations with right-censored data when the groups are subject to a stochastic ordering constraint. Many methods and algorithms have been proposed to estimate distribution functions under such restrictions, but none have completely satisfactory properties when the observations are censored. We propose a pointwise constrained nonparametric maximum likelihood estimator, which is defined at each time t by the estimates of the survivor functions subject to constraints applied at time t only. We also propose an efficient method to obtain the estimator. The estimator of each constrained survivor function is shown to be nonincreasing in t, and its consistency and asymptotic distribution are established. A simulation study suggests better small and large sample properties than for alternative estimators. An example using prostate cancer data illustrates the method.
Keywords: Censored data, Constrained nonparametric maximum likelihood estimator, Kaplan–Meier estimator, Maximum likelihood estimator, Order restriction
1. Introduction
Stochastic ordering is an important concept and has a wide range of applications, in such fields as biomedical research, economics and system reliability. We often encounter situations where there is prior knowledge of stochastic ordering among distributions. For example, in a cancer study, we expect patients with a lower stage of cancer at diagnosis to have lower death rates at all times than those with a higher stage. In addition to the natural desire for estimators of the distributions to satisfy the same expected ordering restrictions as the underlying distributions, there is the potential for improved efficiency by applying the constraints in the estimation method.
For random variables T1 and T2 with corresponding survivor functions S1(t) and S2(t), T1 is stochastically larger than T2, written T1 ⩾st T2, if S1(t) ⩾ S2(t) for all t (Lehmann, 1955). For G groups, the concept can be generalized to partial ordering; specifically, we say that Tg (g = 1, …, G) satisfy the partial-ordering constraints defined by the constraint set E ⊂ {1, …, G}² if Ti ⩾st Tj for every (i, j) ∈ E. Special cases are simple ordering, in which T1 ⩾st ⋯ ⩾st TG, for which E = {(1, 2), (2, 3), …, (G − 1, G)}; tree ordering, in which T1 ⩾st T2, T1 ⩾st T3, …, T1 ⩾st TG, for which E = {(1, 2), (1, 3), …, (1, G)}; umbrella ordering, in which T1 ⩾st ⋯ ⩾st Ti ⩽st Ti+1 ⩽st ⋯ ⩽st TG, for which E = {(1, 2), (2, 3), …, (i − 1, i), (i + 1, i), (i + 2, i + 1), …, (G, G − 1)}; and factorial ordering such as T1 ⩾st T2 ⩾st T4, T1 ⩾st T3 ⩾st T4, for which E = {(1, 2), (2, 4), (1, 3), (3, 4)}.
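As an illustration of how the constraint sets above can be encoded, the following minimal sketch builds E for the simple, tree and umbrella orderings and lists the pairs violated by a set of survivor probabilities at one fixed time. The function names and the dictionary representation are assumptions made for this example only.

```python
from typing import Dict, List, Tuple

Pair = Tuple[int, int]  # (i, j) encodes the constraint T_i >=st T_j


def simple_ordering(G: int) -> List[Pair]:
    """E = {(1, 2), (2, 3), ..., (G - 1, G)}."""
    return [(g, g + 1) for g in range(1, G)]


def tree_ordering(G: int) -> List[Pair]:
    """E = {(1, 2), (1, 3), ..., (1, G)}."""
    return [(1, g) for g in range(2, G + 1)]


def umbrella_ordering(G: int, i: int) -> List[Pair]:
    """E = {(1, 2), ..., (i - 1, i), (i + 1, i), ..., (G, G - 1)}."""
    left = [(g, g + 1) for g in range(1, i)]
    right = [(g + 1, g) for g in range(i, G)]
    return left + right


def violations(E: List[Pair], s: Dict[int, float]) -> List[Pair]:
    """Pairs (i, j) in E for which s[i] < s[j] at the fixed time considered."""
    return [(i, j) for (i, j) in E if s[i] < s[j]]
```

For example, `umbrella_ordering(5, 3)` returns the pairs (1, 2), (2, 3), (4, 3), (5, 4), which matches the general form of E given above with i = 3.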
We consider independent right-censored samples of the form (Ygi, Δgi) (g = 1, …, G; i =1, …, ng), where Ygi is the observed time and Δgi is the event indicator. We assume that the censoring mechanism is independent, so that the generalized likelihood is
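$$L = \prod_{g=1}^{G} \prod_{i=1}^{n_g} \{S_g(Y_{gi}-) - S_g(Y_{gi})\}^{\Delta_{gi}} \, S_g(Y_{gi})^{1-\Delta_{gi}}. \tag{1}$$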
The E-constrained nonparametric maximum likelihood estimator maximizes (1) subject to the partial-ordering constraint E. Brunk et al. (1966) studied the constrained nonparametric maximum likelihood estimator in the two-sample case without censoring. Dykstra (1982), as corrected by Park et al. (2012), extended this result to right-censored data. In the case of three or more populations with general partial-ordering constraints, Hoff (2003) and Lim et al. (2009) proposed different computational methods for obtaining the constrained nonparametric maximum likelihood estimator.
This estimator has the undesirable property that a violation of a constraint in the Kaplan–Meier estimators (Kaplan & Meier, 1958) at an earlier time affects the estimator at a later time, even if there is no violation at this later time. A number of authors have noted that the constrained nonparametric maximum likelihood estimator can have relatively large pointwise bias and mean squared error at a fixed t and have suggested alternatives (Rojo & Ma, 1996; Rojo, 2004; El Barmi & Mukerjee, 2005) that can have better mean squared error properties. Park et al. (2012) noted a correction to the constrained nonparametric maximum likelihood estimator presented by Dykstra (1982), which led to improved properties, but this corrected estimator still often has poorer pointwise properties than other estimators, some of which are relatively simple to define. In the two-sample problem, Lo (1987) suggested swapping the Kaplan–Meier estimates of the survivor functions when the constraint is violated. Rojo (2004) proposed estimating both survivor functions as the weighted average of the two Kaplan–Meier estimators at times when the constraint is violated, where the weights are based on the initial sample sizes. El Barmi & Mukerjee (2005) extended Rojo’s estimators to the simple ordering situation using isotonic regression. The simulation study in Park et al. (2012) shows that some of these estimators have smaller mean squared error than the constrained nonparametric maximum likelihood estimator when the censoring distributions are equal, but when the censoring distributions differ substantially between groups, the alternative estimators may have larger mean squared error than the constrained nonparametric maximum likelihood estimator. Moreover, these alternative estimators have not been explicitly extended to a general partial-ordering case.
When we consider finite sample properties of an estimator Ŝ(t), we typically consider pointwise criteria, such as pointwise bias or pointwise mean squared error at each fixed t. In contrast to pointwise estimators such as those described in Rojo (2004) and Lo (1987), the constrained nonparametric maximum likelihood estimator estimates the whole survival curve. It is therefore perhaps not surprising that Rojo’s estimator typically has better properties when evaluated using metrics such as pointwise mean squared error. On the other hand, these pointwise estimators do not adapt well to unequal censoring distributions between groups, whereas the constrained nonparametric maximum likelihood estimator does. This motivated us to propose a new constrained estimator, the pointwise constrained nonparametric maximum likelihood estimator, or pointwise constrained estimator for convenience.
Definition 1 (Pointwise constrained estimator). For each specified time x, let S̃g(t; x) be the maximum likelihood estimator of Sg(t) under the constraint Si (x) ⩾ Sj (x) for all (i, j) ∈ E. Then Ŝg(t) = S̃g(t; t) (g = 1, …, G) for all t is the pointwise constrained estimator of the survivor function Sg under the partial stochastic ordering constraint E.
2. Estimation methods
2.1. Notation and likelihood
To obtain the pointwise constrained estimator as given in Definition 1, it is required to maximize the likelihood (1) subject to the constraints Si (x) ⩾ Sj (x) for all (i, j) ∈ E for a fixed time x. This will give the estimates of S̃1(t; x), …, S̃G(t; x) and the constrained maximization will be repeated for all times x of interest.
Let Xgj (j = 1, …, mg) be the distinct event times in group g and define Xg0 = 0 and Xg(mg+ 1) = ∞ (g = 1, …, G). Let Ng(t) be the number at risk at time t in group g and let Mg(t) be the number of distinct events in (0, t] in group g. Let dgj and ngj be, respectively, the number of events and the number at risk in group g at time Xgj.
It is convenient to redefine the problem in terms of hazards. Let hg(t) = log{Sg(t)/Sg(t−)}, so that 1 − exp{hg(t)} is the discrete hazard in group g at time t. The loglikelihood of (1) is
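$$\ell(h_1, \ldots, h_G; x) = \sum_{g=1}^{G} \Big[ \sum_{j=1}^{m_g} \big\{ d_{gj} \log[1 - \exp\{h_g(X_{gj})\}] + (n_{gj} - d_{gj}) h_g(X_{gj}) \big\} + N_g(x) h_g^{\delta}(x) \Big], \tag{2}$$
where hg = {hg(Xg1), …, hg(Xgmg), $h_g^{\delta}(x)$} (g = 1, …, G). The corresponding constraints are $\sum_{j=1}^{M_p(x)} h_p(X_{pj}) + h_p^{\delta}(x) \geq \sum_{j=1}^{M_r(x)} h_r(X_{rj}) + h_r^{\delta}(x)$ for all (p, r) ∈ E, and $h_g^{\delta}(x) \leq 0$ (g = 1, …, G). In this, $h_g^{\delta}(x)$ = I (x ≠ XgMg(x))hg(x), which is included to account for the fact that if x = XgMg(x), we do not have the extra term Ng(x)hg(x) in the loglikelihood (2).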
2.2. Linearly constrained convex minimization
There is a large literature on general approaches to linearly constrained convex minimization problems. There are essentially three types of algorithms: interior point, primal active set and dual active set methods. In general, our data contain many more observed event times than groups. Interior point and primal active set methods simultaneously optimize over the large number of quantities $h_g^{\delta}(x)$ and hg(Xgi) (g = 1, …, G; i = 1, …, mg) at each time x of interest, and so are not computationally efficient in our setting. Dual active set methods may involve many fewer parameters, but the dual function itself is difficult to express as a function of Lagrange multipliers and the feasible range of these multipliers is difficult to specify in our problem. So the dual active set method is also difficult to implement in our context.
In § 2.4, we transform the problem of maximizing the loglikelihood (2) subject to the linear constraints to another simple concave maximization problem subject to linear constraints by using the profile likelihood. In preparation for this, we first discuss the constrained maximum likelihood estimator of the survivor function in the one-sample case.
2.3. Maximum likelihood estimator of the survivor function subject to a single constraint
In the one-sample case without constraints, the maximum likelihood estimator has probability mass only at the observed event times. The loglikelihood analogous to (2) is
$$\ell(h) = \sum_{j=1}^{m} \big[ d_j \log\{1 - e^{h(X_j)}\} + (n_j - d_j) h(X_j) \big], \tag{3}$$
where h = {h(X1), …, h(Xm)} and (3) is maximized at h(Xj) = log(1 − dj/nj) (j = 1, …, m), which corresponds to the Kaplan–Meier estimator.
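The relation h(Xj) = log(1 − dj/nj) can be checked numerically. The sketch below computes these log-hazard increments from one right-censored sample and exponentiates their running sum to obtain the Kaplan–Meier estimate; the function names and the toy data in the usage note are illustrative assumptions.

```python
import math


def km_log_hazards(times, events):
    """Return [(X_j, h_j)] with h_j = log(1 - d_j / n_j) at each distinct event time."""
    data = list(zip(times, events))
    distinct = sorted({t for t, e in data if e})
    out = []
    for x in distinct:
        n = sum(1 for t, _ in data if t >= x)        # number at risk at x
        d = sum(1 for t, e in data if t == x and e)  # number of events at x
        h = math.log(1.0 - d / n) if d < n else float("-inf")
        out.append((x, h))
    return out


def survivor(t, hazards):
    """Kaplan-Meier estimate S(t) = exp(sum of h_j over X_j <= t)."""
    return math.exp(sum(h for x, h in hazards if x <= t))
```

With observed times (1, 2, 2, 3, 4) and event indicators (1, 1, 0, 1, 0), the numbers at risk at the event times 1, 2 and 3 are 5, 4 and 2, so the estimate steps through 4/5, 3/5 and 3/10.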
Consider now the maximum likelihood estimator subject to the constraint S(x) = exp(q). The maximum likelihood estimator of the survivor function will have positive probability mass at event times Xi and nonnegative probability mass at time x. The optimization problem is to maximize the loglikelihood of h = {h(X1), …, h(Xm), hδ(x)},
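$$\ell(h) = \sum_{j=1}^{m} \big[ d_j \log\{1 - e^{h(X_j)}\} + (n_j - d_j) h(X_j) \big] + N(x) h^{\delta}(x),$$
subject to $\sum_{j=1}^{M(x)} h(X_j) + h^{\delta}(x) = q$ and hδ(x) ⩽ 0.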
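Let K (q; x) = −N(x) if M(x) = 0, and otherwise let K (q; x) = max(−N(x), k̂), where k̂ is the unique solution of the equation $\sum_{j=1}^{M(x)} \log\{1 - d_j/(n_j + k)\} = q$. Here, k̂ = ∞ if q = 0 and k̂ = dM(x) − nM(x) if q = −∞. Let $\hat{h}^{\delta}(q; x) = q - \sum_{j=1}^{M(x)} \hat{h}(q; X_j)$, where
$$\hat{h}(q; X_j) = \begin{cases} \log[1 - d_j/\{n_j + K(q; x)\}], & j \leq M(x), \\ \log(1 - d_j/n_j), & j > M(x). \end{cases} \tag{4}$$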
Theorem 1. The maximum likelihood estimator of S(t) subject to constraint S(x) = exp(q) at a given x is Ŝ(t) = exp {∑Xj⩽t ĥ(q; Xj) + I (t ⩾ x)hδ (q; x) } (t ⩽ τ), where τ is the last observed time.
Proof. See the Appendix.
Thomas & Grunkemeier (1975) and Li (1995) considered the maximization problem described above. However, Thomas & Grunkemeier (1975) solved the problem with the equality constraint $\sum_{j=1}^{M(x)} h(X_j) = q$, which implicitly assumes that ĥ(x) = 0 if x is not an observed event time, whereas Li (1995) gave an erroneous proof that ĥ(x) = 0 unless x is an observed event time. In fact, the maximization problem described above involves two constraints: $\sum_{j=1}^{M(x)} h(X_j) + h^{\delta}(x) = q$ and hδ (x) ⩽ 0. It is possible that ĥδ (x) < 0 if K (q; x) = −N(x). The inequality constraint, hδ (x) ⩽ 0, has been neglected in these approaches. It is necessary, however, to apply the Karush–Kuhn–Tucker conditions (Kuhn & Tucker, 1951) to all possible inequality constraints, including the bounds on the parameters, and only omit the redundant constraints.
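The role of the bound hδ(x) ⩽ 0 can be seen in a small numerical sketch. It assumes that the defining equation for k̂ has the Lagrange-multiplier form ∑j⩽M(x) log{1 − dj/(nj + k)} = q, which agrees with the limiting values of k̂ stated in § 2.3, and that the constrained hazards are log{1 − dj/(nj + K)} for event times up to x. The function name, argument layout and bisection solver are assumptions of this example and are not taken from the paper.

```python
import math


def constrained_hazards(n, d, M, N_x, q):
    """One-sample MLE sketch under S(x) = exp(q), for finite q < 0.

    n[j], d[j]: numbers at risk and of events at the (j+1)th distinct event time.
    M: number of event times <= x.  N_x: number at risk at x.
    Returns (K, h, h_delta).
    """
    def h_at(j, k):
        p = 1.0 - d[j] / (n[j] + k)
        return math.log(p) if p > 0.0 else float("-inf")

    def g(k):  # log S(x) implied by multiplier k when no mass is placed at x
        return sum(h_at(j, k) for j in range(M))

    if M == 0:
        K = -float(N_x)
    else:
        lo = max(d[j] - n[j] for j in range(M))  # g(k) -> -inf as k decreases to lo
        hi = 1.0
        while g(hi) < q:                         # g(k) -> 0 as k -> infinity
            hi *= 2.0
        for _ in range(200):                     # bisection on the increasing function g
            mid = 0.5 * (lo + hi)
            if g(mid) < q:
                lo = mid
            else:
                hi = mid
        K = max(-float(N_x), hi)

    h = [h_at(j, K) for j in range(M)] + [h_at(j, 0.0) for j in range(M, len(n))]
    # Mass at x is needed only when the bound K = -N(x) is active.
    h_delta = q - sum(h[:M]) if K == -float(N_x) else 0.0
    return K, h, h_delta
```

With n = (5, 4, 2), d = (1, 1, 1), M = 2 and N(x) = 2, setting q = log 0.3 gives k̂ ≈ −2.14 < −N(x), so K = −2 and the estimator places a factor exp(hδ) = 0.9 at x itself, a mass that an approach ignoring the inequality constraint would miss.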
2.4. Reformulation of the problem using profile likelihood
The profile loglikelihood of S(x) = exp(q) at a given x is
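$$\ell(q; x) = \sum_{i=1}^{m} \big[ d_i \log\{1 - e^{\hat{h}(q; X_i)}\} + (n_i - d_i) \hat{h}(q; X_i) \big] + N(x) \hat{h}^{\delta}(q; x), \tag{5}$$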
where , and ĥ(q; Xi) and ĥδ(q; x) are defined in (4).
Lemma 1. The derivative of the profile loglikelihood (5) with respect to q is − K (q; x).
Proof. See the Appendix.
For given x, maximizing the loglikelihood (2) subject to the constraints in E can be redefined as maximizing the profile loglikelihood ℓ (q1, …, qG; x), which equals
$$\sum_{g=1}^{G} \ell_g(q_g; x), \tag{6}$$
subject to constraints qi ⩾ qj, for all (i, j) ∈ E and qg ⩽ 0 (g = 1, …, G). In this formulation, only G parameters q = (q1, …, qG) need to be estimated, and Ŝg(x) = exp(q̂g), where q̂ = (q̂1, …, q̂G) is the maximum likelihood estimator of q.
Any of the general methods described in § 2.2 can be used to maximize the profile loglikelihood (6) under the corresponding linear constraints. The profile loglikelihood (6) and its derivative, dℓ (q; x)/dqT = {−K1(q1; x), …, −KG(qG; x)}T, are easily calculated.
To obtain the pointwise constrained estimator Ŝg(t) (g = 1, …, G) for all t, it is not necessary to maximize the profile likelihood at every t. It can be seen that the pointwise constrained estimator may jump only at observed event times and at times just after observed censoring times. Let {X′j} be the union of all distinct times Ygi if Δgi > 0 and $Y_{gi}^{+}$ if Δgi = 0. Here, $Y_{gi}^{+}$ can be taken as Ygi + ∊ for a small ∊ > 0. We calculate Ŝg(X′j), and then Ŝg(t) is a step function with jumps only at X′j, i.e., Ŝg(t) = Ŝg(a), where a = max{X′j : X′j ⩽ t}.
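The construction of the evaluation grid {X′j} and the step-function extension can be sketched as follows. The value of ∊, the data layout and the convention that the estimate equals 1 before the first grid point are assumptions of this example.

```python
import bisect


def candidate_times(samples, eps=1e-8):
    """Sorted distinct grid: Y for events (delta > 0), Y + eps for censored times."""
    grid = set()
    for group in samples:              # one list of (y, delta) pairs per group
        for y, delta in group:
            grid.add(y if delta > 0 else y + eps)
    return sorted(grid)


def step_value(t, grid, values):
    """S(t) = S(a) with a = max{X'_j <= t}; taken as 1 before the first grid point."""
    k = bisect.bisect_right(grid, t)
    return 1.0 if k == 0 else values[k - 1]
```

Only the values at the grid points need to be obtained by constrained maximization; `step_value` then returns the estimate at any other time.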
The following theorem shows that Ŝg(t) (g = 1, …, G) is a valid survivor function.
Theorem 2. The pointwise constrained estimator Ŝg(t) obtained from maximizing the profile likelihood (6) is a nonincreasing function in t for each g = 1, …, G. That is, for any 0 ⩽ x < y ⩽ τg, Ŝg(x) ⩾ Ŝg(y).
Proof. See the Supplementary Material.
2.5. Generalized pool-adjacent-violators algorithm in the simple ordering case
Suppose that G survivor functions satisfy the simple stochastic ordering constraint T1 ⩾st ⋯ ⩾st TG, and we aim to compute the pointwise constrained estimator at time x. A generalized pool-adjacent-violators algorithm, as developed in Best et al. (1999), can be used because the profile loglikelihood is a sum of concave functions. The algorithm produces a set of blocks B1, …, Br, where r ⩾ 1, $B_k = \{u_{k-1} + 1, \ldots, u_k\}$ (k = 1, …, r) and 0 = u0 < ⋯ < ur = G; it is described in the Supplementary Material. The final estimate of the survivor function for each group in block Bk takes the common value exp(q̂k), where q̂k maximizes the block profile loglikelihood ℓBk (q; x) = ∑i∈Bk ℓi (q; x), and 0 ⩾ q̂1 > ⋯ > q̂r.
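The pooling idea can be illustrated with a generic sketch. It is not the algorithm given in the Supplementary Material: each group contributes an arbitrary concave function of q, block maximizers are found by a simple ternary search instead of using the derivative −K(q; x), and a finite lower bound stands in for q = −∞. All names and defaults are assumptions of this example.

```python
def argmax_concave(f, lo, hi, iters=200):
    """Ternary search for the maximizer of a concave function on [lo, hi]."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if f(m1) < f(m2):
            lo = m1
        else:
            hi = m2
    return 0.5 * (lo + hi)


def generalized_pava(funcs, lo=-50.0, hi=0.0):
    """Maximize sum_g f_g(q_g) subject to hi >= q_1 >= ... >= q_G >= lo."""
    blocks = []  # list of (group indices, common maximizer)
    for g, f in enumerate(funcs):
        blocks.append(([g], argmax_concave(f, lo, hi)))
        # Pool adjacent blocks while the nonincreasing order is violated.
        while len(blocks) > 1 and blocks[-2][1] < blocks[-1][1]:
            idx = blocks[-2][0] + blocks[-1][0]
            pooled = argmax_concave(
                lambda q, idx=idx: sum(funcs[i](q) for i in idx), lo, hi)
            blocks[-2:] = [(idx, pooled)]
    q = [0.0] * len(funcs)
    for idx, val in blocks:
        for i in idx:
            q[i] = val
    return q, [idx for idx, _ in blocks]
```

With quadratic inputs the procedure reduces to ordinary weighted isotonic regression, which gives a convenient check: unconstrained maximizers (−0.5, −0.2, −0.9) are pooled to (−0.35, −0.35, −0.9).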
3. Consistency and asymptotic distribution
Let (t) be the Kaplan–Meier estimator of Sg(t) and let (t) be the censoring survivor function for group g. Further, let τg = inf{t : Sg(t) (t) = 0} (g = 1, …, G). Under the condition that there are no common jumps between the event and censoring distributions, Stute & Wang (1993) showed that the Kaplan–Meier estimator (t) is uniformly consistent for Sg(t) on [0, τg). A similar result holds for the pointwise constrained estimator. The following theorem is proved in the Supplementary Material.
Theorem 3. Let Ŝg(t) be the pointwise constrained estimator given in Definition 1. Under the condition of no common jumps of Sg(t) and (t), supt<τg | Ŝg(t) − Sg(t) | → 0 with probability 1 as ng → ∞ (g = 1, …, G).
Let Wg(Vg) be a Brownian motion on [0, ∞) with variance function Vg(t). As shown in Gill (1983), ng1/2( − Sg) → W(Vg) in distribution on [0, τg] as ng → ∞ where . For a fixed time x, ng1/2{ (x) − Sg(x)} → N{0, (x)} in distribution, where (x) = Vg(x) (x).
Let and assume that limn→∞ ng/n = cg > 0 and let (x) = n1/2 { (x) − Sg(x)} (g = 1, …, G). Then { (x), …, (x)}T → {Z1(x), …, ZG(x)}T in distribution, where Zg(x) ∼ N{0, (x)/cg} and Z1(x), …, ZG(x) are independent.
Theorem 4. For a fixed time x < min{τk : Lg(x) ⩽ k ⩽ Ug(x)} and under the simple ordering constraint T1 ⩾st ⋯ ⩾st TG,
(7) |
in distribution, where wg(x) = cg/ (x), Lg(x) = min{i : Si(x) = Sg(x)} and Ug(x) = max{i : Si(x) = Sg(x)}.
Proof. See the Supplementary Material.
In the Supplementary Material, the asymptotic distribution of the pointwise constrained estimator is discussed for situations where the number at risk in some groups is zero.
Let Šg(x) be the estimate of Sg(x) by applying the isotonic regression algorithm to (x) with weights wg(x) (g = 1, …, G), subject to constraint S1(x) ⩾st ⋯ ⩾st SG(x). Then, Šg(x) has a minimax form (Barlow et al., 1972)
From El Barmi & Mukerjee (2005, Theorem 2), it can be seen that
in distribution. From (7), it follows that Ŝg(x) and Šg(x) are asymptotically equivalent. We hypothesize that this equivalence to isotonic regression will also hold under the partial-ordering constraint. This yields the following conjecture for the asymptotic distribution of the pointwise constrained estimator.
Conjecture 1. For a fixed time x,
in distribution as n → ∞ for all x given Sg(x) (x) > 0. Here Ψg(x) = {i : Si(x) = Sg(x)}, Eg(x) = {(i, j) ∈ E : i, j ∈ Ψg(x)} and fg(z1, …, zG; w1, …, wG, x) is the solution function for μg that minimizes subject to μi ⩾ μj for all (i, j) ∈ Eg(x).
If this conjecture is correct, inference methods developed for isotonic regression could also be useful for the pointwise constrained estimator.
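The weighted isotonic regression estimate Šg(x) used in the comparison above can be computed directly from a min-max formula. The sketch below assumes the standard form for a nonincreasing fit, in which the fitted value for group g is the minimum over r ⩽ g of the maximum over s ⩾ g of the weighted average of groups r, …, s; the function names are hypothetical and the inputs stand for unconstrained estimates and weights at a fixed x.

```python
def weighted_average(y, w, r, s):
    """Weighted average of y[r..s] (inclusive) with weights w."""
    num = sum(w[j] * y[j] for j in range(r, s + 1))
    den = sum(w[j] for j in range(r, s + 1))
    return num / den


def antitonic_minimax(y, w):
    """Weighted least-squares fit subject to mu_1 >= ... >= mu_G via the min-max form."""
    G = len(y)
    return [
        min(
            max(weighted_average(y, w, r, s) for s in range(g, G))
            for r in range(g + 1)
        )
        for g in range(G)
    ]
```

For unconstrained values (0.5, 0.8, 0.1) with equal weights, the first two groups violate the ordering and are replaced by their average, giving (0.65, 0.65, 0.1).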
4. Comparison with the Kaplan–Meier estimator when sample size is large
4.1. Simple ordering case
In the simple ordering case with no censoring, El Barmi & Mukerjee (2005) showed that their isotonic regression estimator has smaller asymptotic mean squared error than the unrestricted Kaplan–Meier estimator. A similar result holds for the pointwise constrained estimator compared with the Kaplan–Meier estimator when there is right censoring.
Theorem 5. Consider the simple ordering constraint T1 ⩾st ⋯ ⩾st TG. For a fixed x with (x)Sk(x) > 0 for all k = 1, …, G, let {Ŝk(x) − Sk(x)} → Ẑk and { (x) − Sk(x)} → Zk in distribution. If there exists at least one g′ with Sg′(x) = Sg(x), then E( ) < E( ). If no such g′ exists, then Ŝg(x) and (x) are asymptotically equivalent.
Thus, the asymptotic mean squared error of the pointwise constrained estimator is never larger than that of the Kaplan–Meier estimator, and is strictly smaller whenever another group has the same survivor probability at x. In fact, a stronger inequality holds in that case: pr(| Ẑg | ⩽ ∊) > pr(| Zg | ⩽ ∊) for all ∊ > 0. In § 4.2, we calculate the asymptotic bias and asymptotic mean squared error of the pointwise constrained estimator in the two-sample case.
4.2. The two sample case, G = 2
If S1(x) > S2(x), then, asymptotically, the constraint is irrelevant and {Ŝ1(x) − S1(x)} → σ1(x)Z̄1 and {Ŝ2(x) − S2(x)} → σ2(x) Z̄2 in distribution, as n1, n2 → ∞, where Z̄1 and Z̄2 are independent standard normal random variables.
Let n2/n1 → c as n1, n2 → ∞. We consider asymptotic properties when S1(x) = S2(x). From Theorem 4, we can show that
(8) |
in distribution, where c(x) = c (x)/ (x). Direct calculation from (8) shows that the asymptotic mean squared errors are
(9) |
These are always smaller than the unrestricted counterparts (x) and (x).
Let S̃1(x) and S̃2(x) be the estimators of Rojo (2004) or El Barmi & Mukerjee (2005). On the basis of definitions of their estimators, when S1(x) = S2(x), the asymptotic mean squared errors are given by
(10) |
It can be shown that the asymptotic mean squared error of Ŝg(x) is less than or equal to that of S̃g(x) (g = 1, 2), with equality only when (x) = (x), in which case S̃g(x) and Ŝg(x) are asymptotically equivalent. From (10) we see that when (x)/ (x) > c2/c1 + 2, Rojo’s estimator S̃1(x) is asymptotically less efficient than the Kaplan–Meier estimator (x) and when (x)/ (x) > c1/c2 + 2, S̃2(x) is asymptotically less efficient than (x).
From (8), the asymptotic biases of Ŝ1(x) and Ŝ2(x) are
5. Confidence intervals
5.1. Asymptotic approaches
While there is a substantial literature on the estimation of survivor functions under stochastic ordering constraints, there has been little discussion of constructing confidence intervals for ordered survivor functions. Rojo (2004) demonstrated weak convergence to a Gaussian process of his estimator, from which confidence bands could be constructed. For the most part, however, asymptotic results are not particularly useful since, if the true inequalities at time t are strict, then the asymptotic distribution of Ŝg(t) is the same as that of the Kaplan–Meier estimator and the corresponding approximate confidence interval would be unaffected by the restrictions. In our opinion, the most promising approach to constructing confidence intervals in these problems is through resampling methods that reflect the finite sample aspects. We consider some such approaches in the next section.
5.2. Bootstrap methods
We used a nonparametric resampling scheme, in which survival time and censoring indicator pairs are drawn with replacement from the data separately for each group. For each bootstrap sample, a bootstrap estimate (t) (b = 1, …, B) is obtained by applying the pointwise constrained estimator. Simple confidence intervals based on these bootstrap estimates can be constructed using the percentile or the basic bootstrap method (Efron & Tibshirani, 1993; Davison & Hinkley, 1997). For a nominal level of (1 − 2α), the percentile confidence interval for Sg(t) is { (t), (t)}, where (t) is αth percentile of the bootstrap distribution. The basic bootstrap method utilizes ideas of pivotal statistics, and can also be improved by use of transformations such as h(s) = arcsin(s1/2). The confidence interval for the basic bootstrap method is given by (h−1[2h{Ŝg(t)} − h{ (t)}], h−1[2h{Ŝg(t)} − h{ (t)}]).
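A minimal sketch of the resampling scheme and of the two simple intervals is given below. It assumes that the αth percentile of B bootstrap values is taken as the order statistic with index (B + 1)α, a common convention that the text does not specify; the function names are hypothetical, and the estimator applied to each bootstrap sample is left to the caller.

```python
import math
import random


def resample_groups(groups, rng):
    """Draw (time, indicator) pairs with replacement, separately within each group."""
    return [[rng.choice(g) for _ in g] for g in groups]


def order_stat(sorted_vals, p):
    """Order statistic with index (B + 1) * p, clipped to the range 1..B."""
    B = len(sorted_vals)
    k = min(max(int(round((B + 1) * p)), 1), B)
    return sorted_vals[k - 1]


def percentile_interval(boot, alpha):
    b = sorted(boot)
    return order_stat(b, alpha), order_stat(b, 1.0 - alpha)


def h(s):
    return math.asin(math.sqrt(s))


def h_inv(u):
    u = min(max(u, 0.0), math.pi / 2.0)  # keep the back-transform inside [0, 1]
    return math.sin(u) ** 2


def basic_interval(est, boot, alpha, transform=True):
    """Basic bootstrap interval, optionally on the arcsin(sqrt(s)) scale."""
    lo_q, hi_q = percentile_interval(boot, alpha)
    if transform:
        return h_inv(2 * h(est) - h(hi_q)), h_inv(2 * h(est) - h(lo_q))
    return 2 * est - hi_q, 2 * est - lo_q
```

In use, `resample_groups` would be called B times, the pointwise constrained estimator recomputed on each resample, and the resulting values at a given t passed as `boot` together with the original estimate `est`.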
While these simple methods are easy to apply, a number of different methods have been developed which have improved properties. The work in Andrews (2000) suggests that the use of the bootstrap for inference problems with order restrictions on the parameters may be particularly challenging. We investigated a number of different alternatives to the two simple bootstrap methods and present below a method which had reasonably good properties for the cases considered.
For the restricted estimation problem, the distribution of Ŝg(t) − Sg(t) will generally not be symmetric or centred around zero and will differ from one group g to the next. It is to be expected that the bootstrap distribution (t) − Ŝg(t) will be similarly biased. The method we propose uses the bootstrap distribution to correct the bias, but is adjusted so as not to over-correct. Consider pointwise estimators Ŝ1(t) and Ŝ2(t), where S1(t) ⩾ S2(t). Let (t) be the mean of the bootstrap estimates (t). The basic bootstrap method considers a pseudo estimator given by [2h{Ŝg(t)} − h{ (t)}]. While , the mean of the pseudo estimator may not satisfy the order constraint, i.e., it is possible that 2h{Ŝ1(t)} − h{ (t)} < 2h{Ŝ2(t)} − h{ (t)}. This can be considered as an overcorrection and it might be expected that the properties of the confidence interval could be improved if this overcorrection is modified. Let (t, ag) = h{Ŝg(t)} + ag[h{Ŝg(t)} − h{ (t)}], where 0 ⩽ ag ⩽ 1. Although (t, ag) will satisfy the ordering constraint for ag = 0, for ag = 1, this may not be true. Given a set of ag such that the (t, ag) (g = 1, …, G) satisfy the ordering constraints, the proposed adjusted basic bootstrap confidence interval is (h−1[2h{Ŝg(t)} − h{ (t)} + δg], h−1[2h{Ŝg(t)} − h{ (t)} + δg]), where δg = (ag − 1)[h{Ŝg(t)} − h{ (t)}].
We propose the following method to obtain a set of ag that satisfy the constraints. Let a1 = a2 = ⋯ = aG = a and find the largest a that does not result in a violation of an order restriction. Use this value of a for the groups i and j for which (t, ai) = (t, aj). For the remaining groups increase a until a new violation is about to occur, and use the new value of a for the groups that have the active constraint for (t, a) and have not already had a fixed value of ai. Continue in this way, gradually increasing a, using the value of a when constraints become active, until all values of ag have been set or a = 1. An algorithm to obtain ag (g = 1, …, G) is given in the Supplementary Material.
5.3. Confidence interval centred on a constrained estimator
Hwang & Peddada (1994) suggested a method in which a confidence interval is computed for the unrestricted estimator and then shifted and centred on the constrained estimator. They showed that, under fairly general conditions, the coverage probability for the shifted interval will exceed the nominal level. For the survivor function, we apply this to intervals on a log transformed scale and consider the approximate 100(1 − 2α)% confidence interval, Ŝg(x) exp{±zα (x)}, where (x) is the standard error estimate of log (x) (Kalbfleisch & Prentice 2002, p. 17), and zα is the αth percentile of the standard normal distribution.
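The shifted interval can be sketched in a few lines. The example assumes that the standard error of the log Kaplan–Meier estimate is given by Greenwood's formula on the log scale, the square root of ∑ dj/{nj(nj − dj)} over event times up to x, and uses the multiplier 1.96 for a 95% interval; the function names are hypothetical and no truncation of the upper limit at 1 is applied.

```python
import math


def greenwood_se_log(n, d):
    """Standard error of log S_KM(x) from the risk sets n_j and events d_j up to x."""
    return math.sqrt(sum(dj / (nj * (nj - dj)) for nj, dj in zip(n, d)))


def centred_interval(s_constrained, se_log_km, z=1.96):
    """Interval S_hat * exp(-z se), S_hat * exp(+z se), centred on the constrained estimate."""
    return (s_constrained * math.exp(-z * se_log_km),
            s_constrained * math.exp(z * se_log_km))
```

The width on the log scale comes entirely from the unrestricted estimator; only the centre is moved to the constrained estimate.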
6. Simulation studies
6.1. Two-sample case when sample size is small
We have conducted numerous simulation studies to compare the finite sample properties of three different constrained estimators, Rojo’s estimator (Rojo, 2004), the constrained nonparametric maximum likelihood estimator (Park et al., 2012) and the pointwise constrained estimator, and compared them to the unconstrained Kaplan–Meier estimator in the two-sample case. In this paper, we show results for scenarios where G = 2 and S1(t) ⩾ S2(t) for all t.
The upper and lower plots of each panel in Fig. 1 show differences of root mean squared errors of estimators of S1(t) and S2(t) over a range of values of t, compared with the pointwise constrained estimator. In cases with the same censoring distributions, Fig. 1(a), Rojo’s estimator and the pointwise constrained estimator have smaller root mean squared error than the other estimators. However, if populations 1 and 2 have different censoring distributions, the pointwise constrained estimator has the smallest root mean squared error among all estimators at almost all times. Rojo’s estimator does not adjust well to the unequal censoring distributions, Figs. 1(b)–(f), even when the censoring rates are close to each other, Fig. 1(d). The pointwise constrained estimator is the only estimator that dominates the Kaplan–Meier estimator at almost all times in all situations considered. Each simulation is based on 10 000 replications.
6.2. Two-sample case: asymptotic properties
We define the asymptotic relative efficiency as the inverse ratio of the mean squared errors and compare the asymptotic relative efficiency of the three constrained estimators to the Kaplan–Meier estimator in the two sample case in Fig. 2. The underlying distributions are S1(t) = S2(t) = exp(−t), (t) = 1 and (t) = exp(−2t). The constraint is asymptotically relevant at all times. We set limn1,n2→∞ n1/n2 = 1. The asymptotic relative efficiency of the full constrained nonparametric maximum likelihood estimator is based on simulated data with a very large sample size. Asymptotic relative efficiencies of the pointwise constrained estimator and Rojo’s estimator are calculated using (9) and (10).
The pointwise constrained estimator dominates all other estimators for all t, whereas Rojo’s estimator could be inefficient for some t, as seen in Fig. 2(a). Compared with the Kaplan–Meier estimator, the full constrained nonparametric maximum likelihood estimator is less efficient at all times in this setting.
6.3. Simple ordering case
In this section, we compare finite sample properties of the pointwise constrained estimator with the Kaplan–Meier estimator in the simple ordering case and investigate the confidence intervals described in § 5. We consider three groups with underlying distributions T1 ∼ exp(1), T2 ∼ exp(1.1) and T3 ∼ exp(1.4) and a uniform censoring distribution C ∼ U(0, 4.3), which gives an overall censoring rate of about 20%. Sample sizes are n1 = n3 = 40 and n2 = 20. The simulation is based on 10 000 replicates.
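The simulation design above can be reproduced with a short sketch. It reads exp(λ) as an exponential distribution with rate λ, which is consistent with the group 2 survival probabilities quoted at t = 0.26 and t = 0.63, and uses the closed form pr(C < T) = {1 − exp(−4.3λ)}/(4.3λ) for the censoring probability; the function names are hypothetical.

```python
import math
import random


def censoring_probability(rate, upper=4.3):
    """pr(C < T) for T ~ exponential(rate) and C ~ U(0, upper)."""
    return (1.0 - math.exp(-rate * upper)) / (rate * upper)


def simulate_group(n, rate, upper, rng):
    """Return n pairs (Y, Delta) with Y = min(T, C) and Delta = 1 if T <= C."""
    out = []
    for _ in range(n):
        t = rng.expovariate(rate)
        c = rng.uniform(0.0, upper)
        out.append((min(t, c), 1 if t <= c else 0))
    return out


sizes = [40, 20, 40]
rates = [1.0, 1.1, 1.4]
overall = sum(n * censoring_probability(r) for n, r in zip(sizes, rates)) / sum(sizes)
```

Weighting the three group-specific probabilities by the sample sizes gives `overall` close to 0.20, in line with the stated overall censoring rate.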
Figure 3 shows the mean squared error of the pointwise constrained estimator and the Kaplan–Meier estimator. The figure shows efficiency gains for the pointwise constrained estimator at all times for all groups, with the largest gains for the estimation of S2(t), where the mean squared error of the pointwise constrained estimator is less than half of the mean squared error of the Kaplan–Meier estimator at almost all times.
Bootstrap intervals are based on 1999 bootstrap estimates. We evaluate confidence intervals at times 0.26 and 0.63, where the survival probabilities of group 2 are 0.75 and 0.5, respectively. We also conducted simulation studies for two additional cases with different distributions; see Table 1. The coverage rates and average widths of the confidence intervals described in § 5 are shown in Table 1. As expected, the confidence interval centred on the pointwise constrained estimator, Ŝg exp(±1.96 ), is overly conservative, with large average width and generally higher coverage rates. The bootstrap methods give substantially narrower confidence intervals, but the coverage rates can be somewhat low for some groups, especially when using the percentile or the basic bootstrap methods. The transformation and the adjusted methods described in § 5 both give slightly better coverage rates. The overall best results are obtained with the combination of the basic bootstrap with arcsin(s1/2) transformation and controlling for bias overcorrection.
Table 1.
Method | t = 0.26 | t = 0.26 | t = 0.26 | t = 0.63 | t = 0.63 | t = 0.63 |
---|---|---|---|---|---|---|
Distribution | exp(1) | exp(1.1) | exp(1.4) | exp(1) | exp(1.1) | exp(1.4) |
Percentile | 91 (20.8) | 95 (22.9) | 95 (24.9) | 94 (27.2) | 95 (28.2) | 94 (27.4) |
Basic | 89 (20.8) | 90 (22.9) | 91 (24.9) | 91 (27.2) | 88 (28.2) | 90 (27.4) |
Basic, with adjustment | 90 (20.8) | 92 (22.9) | 93 (24.9) | 92 (27.2) | 91 (28.2) | 91 (27.4) |
Basic, arcsin(s1/2) | 94 (22.5) | 93 (23.1) | 93 (24.4) | 93 (27.3) | 90 (28.0) | 92 (27.8) |
Basic, arcsin(s1/2), with adjustment | 95 (22.4) | 94 (23.2) | 94 (24.5) | 94 (27.3) | 93 (28.1) | 94 (27.8) |
Ŝg exp{± 1.96 } | 92 (26.7) | 98 (37.8) | 97 (28.3) | 95 (33.6) | 99 (46.5) | 97 (31.7) |
Distribution | exp(1) | exp(1.05) | exp(1.2) | exp(1) | exp(1.05) | exp(1.2) |
Percentile | 90 (20.2) | 96 (21.5) | 94 (23.4) | 92 (26.2) | 96 (26.5) | 94 (26.6) |
Basic | 90 (20.2) | 92 (21.5) | 93 (23.4) | 92 (26.2) | 91 (26.5) | 92 (26.6) |
Basic, with adjustment | 91 (20.2) | 94 (21.5) | 94 (23.4) | 92 (26.2) | 93 (26.5) | 93 (26.6) |
Basic, arcsin(s1/2) | 95 (21.9) | 94 (21.8) | 94 (22.7) | 94 (26.4) | 93 (26.4) | 94 (26.8) |
Basic, arcsin(s1/2), with adjustment | 96 (21.7) | 95 (21.9) | 95 (22.9) | 95 (26.4) | 95 (26.4) | 95 (26.8) |
Ŝg exp{± 1.96 } | 93 (26.9) | 98 (37.6) | 98 (27.1) | 95 (34) | 99 (47.2) | 98 (31.5) |
Distribution | exp(1) | exp(1.2) | exp(1.6) | exp(1) | exp(1.2) | exp(1.6) |
Percentile | 92 (21.6) | 95 (24.6) | 95 (26.2) | 94 (28.1) | 95 (29.8) | 94 (27.5) |
Basic | 90 (21.6) | 87 (24.6) | 90 (26.2) | 91 (28.1) | 86 (29.8) | 89 (27.5) |
Basic, with adjustment | 91 (21.6) | 90 (24.6) | 92 (26.2) | 92 (28.1) | 89 (29.8) | 91 (27.5) |
Basic, arcsin(s1/2) | 93 (23.2) | 90 (24.7) | 92 (25.8) | 93 (28.2) | 88 (29.6) | 92 (28.0) |
Basic, arcsin(s1/2), with adjustment | 95 (23.1) | 93 (24.8) | 94 (25.9) | 94 (28.2) | 92 (29.7) | 94 (28.0) |
Ŝg exp{± 1.96 } | 93 (26.6) | 98 (38.9) | 96 (29.4) | 95 (33.2) | 99 (47.0) | 96 (31.3) |
Entries are coverage rates (%), with average widths in parentheses. Sample sizes are n1 = 40, n2 = 20 and n3 = 40 and the censoring distribution is U(0, 4.3). The five bootstrap confidence intervals are the percentile method, and the basic bootstrap method with or without arcsin(s1/2) transformation and with or without an adjustment for bias overcorrection. Ŝg exp{±1.96 } is the centred method of Hwang & Peddada (1994). Results are based on 10 000 simulation samples.
7. Example
The data are from prostate cancer patients who received radiation therapy at the University of Michigan Hospital, a portion of the data used in Proust-Lima & Taylor (2009). Five hundred and three patients without planned hormonal therapy are used to estimate the survivor function of time to first recurrence of prostate cancer. For this analysis, recurrence is defined as the first of local recurrence, distant metastasis or initiation of salvage hormone therapy.
It is expected that patients with higher baseline prostate-specific antigen levels have a higher recurrence rate than those with lower baseline prostate-specific antigen values. The Gleason grade is a measure of the aggressiveness of the tumour cells obtained from microscopic inspection of a biopsy prior to the treatment. It is also expected that patients with a lower Gleason grade will have a lower recurrence rate. In this example, we divided the patients into six groups labelled A1, A2, A3, B1, B2 and B3 based on whether or not their baseline prostate-specific antigen is less than 10, and whether their Gleason grade is ⩽6, =7 or ⩾8. Patients with baseline prostate-specific antigen <10 and Gleason ⩽6 are labelled as A1, patients with baseline prostate-specific antigen <10 and Gleason =7 as A2, and so on. The natural set of constraints for the survivor functions is A1 ⩾ A2 ⩾ A3, B1 ⩾ B2 ⩾ B3, A1 ⩾ B1, A2 ⩾ B2 and A3 ⩾ B3.
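The constraint set for this two-factor layout can be generated mechanically. The sketch below builds the pairs for any grid of row and column labels, with each pair (i, j) meaning that group i is stochastically larger than group j; the function name and the string labels are assumptions of this example.

```python
def grid_ordering(rows, cols):
    """Pairs (i, j) meaning S_i >= S_j for a rows-by-cols factorial layout.

    Within each row, groups are ordered along the columns; within each
    column, groups are ordered along the rows.
    """
    E = []
    for r in rows:
        for a, b in zip(cols, cols[1:]):
            E.append((r + a, r + b))
    for c in cols:
        for a, b in zip(rows, rows[1:]):
            E.append((a + c, b + c))
    return E


E_example = grid_ordering(["A", "B"], ["1", "2", "3"])
```

Here `E_example` contains the seven pairs listed above: four from the Gleason ordering within each antigen level and three from the antigen ordering within each Gleason category.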
The Kaplan–Meier estimates of each groups are shown in Fig. 4(a). The unrestricted Kaplan–Meier estimates do not satisfy the stochastic ordering constraints. Specifically, between 1 and 2.5 years, the groups A2, B2 and B3 do not satisfy the ordering constraints and after 5 years the orderings of A2 and A3, and B2 and B3 are incorrect.
The pointwise constrained estimates, shown in Fig. 4(b), satisfy the stochastic ordering constraints at all times. Between 1 and 2.5 years, the survivor functions take a common value in groups A2, B2 and B3 and after 5 years, groups A2 and A3 and groups B2 and B3 have common estimates. At around 12.5 years, there is a jump in the survivor function estimate for groups B2 and B3, even though there are no observed events at that time. This happens because the number of individuals at risk in the stochastically smaller group B3 at time t = 12.5 changes, which results in ĥδ(t) < 0, as discussed in § 2.3.
Detailed results of point estimates and corresponding confidence intervals for some selected times are shown in Table 2.
Table 2.
Group | Estimator | 1.5 years | 5 years | 8 years |
---|---|---|---|---|
A1 | Kaplan–Meier estimator | 99.4 (97.6, 100) | 93.9 (90.0, 97.2) | 83.6 (76.4, 91.1) |
A1 | Pointwise constrained estimator | 99.4 (97.6, 99.9) | 93.9 (90.0, 97.2) | 83.6 (76.4, 90.2) |
A2 | Kaplan–Meier estimator | 99.1 (96.6, 99.9) | 83.4 (75.6, 90.2) | 73.0 (63.2, 82.3) |
A2 | Pointwise constrained estimator | 99.1 (96.5, 99.9) | 83.4 (76.6, 89.9) | 73.0 (64.0, 83.0) |
A3 | Kaplan–Meier estimator | 80.0 (36.0, 98.0) | 70.0 (44.8, 92.7) | 70.0 (44.8, 100) |
A3 | Pointwise constrained estimator | 88.7 (70.9, 94.9) | 70.0 (53.2, 91.6) | 70.0 (52.6, 93.6) |
B1 | Kaplan–Meier estimator | 98.0 (92.2, 99.7) | 78.3 (71.2, 86.1) | 67.0 (57.8, 77.8) |
B1 | Pointwise constrained estimator | 98.0 (95.2, 99.7) | 78.3 (71.2, 86.1) | 67.0 (57.8, 77.8) |
B2 | Kaplan–Meier estimator | 86.8 (79.6, 93.6) | 48.8 (38.9, 62.2) | 34.2 (22.9, 52.6) |
B2 | Pointwise constrained estimator | 88.7 (83.0, 93.9) | 48.8 (39.6, 59.9) | 39.8 (30.3, 54.0) |
B3 | Kaplan–Meier estimator | 96.4 (86.2, 99.8) | 47.9 (33.4, 67.5) | 47.9 (33.5, 69.4) |
B3 | Pointwise constrained estimator | 88.7 (83.5, 93.8) | 47.9 (38.4, 69.2) | 39.8 (29.9, 54.2) |
Nominal 95% bootstrap confidence intervals using arcsin(s1/2) transformation and controlling bias overcorrection are shown in parentheses.
8. Discussion
The pointwise constrained estimator is a likelihood-based pointwise estimator. Unlike for the full constrained nonparametric maximum likelihood estimator, a violation of a constraint at one time does not affect the estimates at other times. When the constraints are violated, the pointwise constrained estimator gives a common estimate obtained by maximizing the likelihood. Compared with other estimators that use averaging based on initial sample sizes (Rojo, 2004; El Barmi & Mukerjee, 2005), it has better properties when censoring exists.
When there is no censoring, Rojo’s estimator in the two-sample case and El Barmi and Mukerjee’s estimator in the simple ordering case are identical to the pointwise constrained estimator. However, if censoring exists, these estimators can be quite different, especially when the censoring distributions differ significantly between groups. Another feature of El Barmi and Mukerjee’s estimator is the range of times for which the estimator is defined. Specifically, it is defined only until the minimum of the times of the last observations in all groups. Thus, if the last observed time in one group is much earlier than in other groups, then estimates in all other groups are undefined at subsequent times even though there may be a large number of observations at risk. On the other hand, the pointwise constrained estimator for a group is defined up to the last observed time of that group.
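To make the uncensored case concrete: without censoring, the data at a fixed time t reduce to the number of subjects in each group surviving beyond t, which is binomial. Under simple ordering, the order-restricted maximum likelihood estimates of binomial proportions are obtained by pooling adjacent violators with weights equal to the sample sizes (Barlow et al., 1972), which is the sample-size-weighted averaging referred to above. The sketch below illustrates that calculation only; the function name and interface are assumptions made here, and it does not handle censoring.

```python
def pooled_survival(survivors, sizes):
    """Order-restricted estimates p_1 >= p_2 >= ... >= p_G at one fixed time.

    survivors[g]: number in group g still event-free at that time
    sizes[g]:     initial sample size of group g (no censoring assumed)
    """
    blocks = []  # each block: [pooled survivors, pooled size, groups in block]
    for s, n in zip(survivors, sizes):
        blocks.append([s, n, 1])
        # Pool while the newest block has a larger proportion than its predecessor.
        while (len(blocks) > 1
               and blocks[-1][0] * blocks[-2][1] > blocks[-2][0] * blocks[-1][1]):
            s2, n2, k2 = blocks.pop()
            blocks[-1][0] += s2
            blocks[-1][1] += n2
            blocks[-1][2] += k2
    estimates = []
    for s, n, k in blocks:
        estimates.extend([s / n] * k)  # every group in a block shares one value
    return estimates
```

For example, with sample sizes 40, 20 and 40 and 30, 18 and 20 survivors, the unrestricted proportions 0.75, 0.90 and 0.50 violate the ordering between the first two groups, which are then pooled to 48/60 = 0.80 while the third group keeps 0.50.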
The pointwise constrained estimator can have jumps at nonevent times. Consequently, the likelihood ratio statistics for the restricted survivor function, first introduced by Thomas & Grunkemeier (1975) and discussed by Li (1995) and Murphy (1995), are not exactly correct, because they assume that jumps can occur only at event times. The likelihood ratio test and the confidence intervals based on it may therefore need to be revised.
Methods to construct confidence intervals in order restricted problems are not well developed. Bootstrap methods generally work better when the distributions are approximately normal after some transformations. When constraints are present, it is not clear whether there exists any such transformation. We proposed a method to control overcorrection of bias when using the basic bootstrap methods and found improved properties of confidence intervals. Further investigation of this approach on other applications could be useful.
Acknowledgments
This research was partially supported by the National Institutes of Health, U.S.A. We thank the editor, associate editor and two referees for their many valuable comments.
Appendix.
Proof of Theorem 1. Let λ1 and λ2 be Lagrange multipliers. The corresponding Lagrangian function is
The Karush–Kuhn–Tucker conditions that must be satisfied at the solution ĥ are:
(A1) |
(A2) |
(A3) |
(A4) |
(A5) |
(A6) |
(A7) |
From (A1), we have ĥ1 (Xi) = log{1 − di/(ni + λ̂1)} for i ⩽ M1(x). Either λ̂2 = 0 or ĥδ (x) = 0 from (A6). If λ̂2 = 0, then λ̂1 = −N(x) from (A3), which is only valid when ; otherwise ĥδ (x) = 0 and λ̂1 is the solution of the equation, , from (A4), which is only valid when λ̂1 ⩾ − N(x) from (A3). Since is an increasing function in k, we can see that λ̂1 = max{k̂, −N(x)}, where k̂ is the solution of the equation . It follows that λ̂1 is exactly the same as K (q; x) defined in Theorem 1. Therefore, the unique solution from solving (A2)–(A7) is as given in Equation (4).
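The role of the Lagrange multiplier can be illustrated numerically in the simpler one-sample setting of Thomas & Grunkemeier (1975), where the survivor function at x is constrained to equal a given value q and the constrained hazards have the same form, 1 − di/(ni + λ). The sketch below is not the two-sample solution of Theorem 1: the function names are hypothetical, and the equation being solved, ∏{1 − di/(ni + λ)} = q, is the one-sample analogue. It relies on the same monotonicity in the multiplier that the proof uses.

```python
import math

def constrained_multiplier(d, n, q):
    """Solve prod_i {1 - d_i/(n_i + lam)} = q for lam by bisection.

    d, n: numbers of events (each >= 1) and numbers at risk at the event
          times up to x; q: target survival probability, 0 < q < 1.
    """
    def log_surv(lam):
        return sum(math.log(1.0 - di / (ni + lam)) for di, ni in zip(d, n))

    target = math.log(q)
    # log_surv decreases to -infinity as lam decreases to this bound
    # and increases to 0 as lam grows, so a unique root exists.
    lo = max(di - ni for di, ni in zip(d, n))
    step = 1.0
    hi = lo + step
    while log_surv(hi) < target:
        step *= 2.0
        hi = lo + step
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mid == lo or mid == hi:  # interval can no longer be split
            break
        if log_surv(mid) < target:
            lo = mid
        else:
            hi = mid
    return hi

def constrained_survival(d, n, lam):
    """Survival probability implied by the multiplier lam."""
    return math.prod(1.0 - di / (ni + lam) for di, ni in zip(d, n))
```

When q equals the Kaplan–Meier estimate the multiplier is zero; a larger q gives a positive multiplier and a smaller q a negative one.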
Proof of Lemma 1. We consider separately two cases where K (q; x) > −N(x) and K (q; x) = −N(x).
If K (q; x) > −N(x), then ĥδ (x) = 0 and ĥ(q; Xi) = log(1 − di/ni) for i > M(x), which does not depend on q. For any i ⩽ M(x),
Thus,
If K (q; x) = −N(x), then ĥ(q; Xi) = log[1 − di/{ni − N(x)}] for i ⩽ M(x) and ĥ(q; Xi) = log(1 − di/ni) for i > M(x) are not functions of q. It follows that
References
- Andrews DWK. Inconsistency of the bootstrap when a parameter is on the boundary of the parameter space. Econometrica. 2000;68:399–405.
- Barlow RE, Bartholomew D, Bremner JM, Brunk HD. Statistical Inference under Order Restrictions. New York: Wiley; 1972.
- Best MJ, Chakravarti N, Ubhaya VA. Minimizing separable convex functions subject to simple chain constraints. SIAM J Optimiz. 1999;10:658–72.
- Brunk HD, Franck WE, Hanson DL, Hogg RV. Maximum likelihood estimation of the distributions of two stochastically ordered random variables. J Am Statist Assoc. 1966;61:1067–80.
- Davison AC, Hinkley DV. Bootstrap Methods and their Application. Cambridge: Cambridge University Press; 1997.
- Dykstra RL. Maximum likelihood estimations of the survival functions of stochastically ordered random variables. J Am Statist Assoc. 1982;77:621–8.
- Efron B, Tibshirani RJ. An Introduction to the Bootstrap. New York: Chapman and Hall; 1993.
- El Barmi H, Mukerjee H. Inferences under a stochastic ordering constraint: the k-sample case. J Am Statist Assoc. 2005;100:252–61.
- Gill R. Large sample behaviour of the product-limit estimator on the whole line. Ann Statist. 1983;11:49–58.
- Hoff PD. Nonparametric estimation of convex models via mixtures. Ann Statist. 2003;31:174–200.
- Hwang JTG, Peddada SD. Confidence interval estimation subject to order restrictions. Ann Statist. 1994;22:67–93.
- Kalbfleisch JD, Prentice RL. The Statistical Analysis of Failure Time Data. 2nd ed. New York: Wiley; 2002.
- Kaplan EL, Meier P. Nonparametric estimation from incomplete observations. J Am Statist Assoc. 1958;53:457–81.
- Kuhn HW, Tucker AW. Nonlinear programming. In: Neyman J, editor. Proc 2nd Berkeley Symp. Berkeley: University of California Press; 1951. pp. 481–92.
- Lehmann EL. Ordered families of distributions. Ann Math Statist. 1955;26:399–419.
- Li G. On nonparametric likelihood ratio estimation of survival probabilities for censored data. Statist Prob Lett. 1995;25:95–104.
- Lim J, Kim SJ, Wang X. Estimating stochastically ordered survival functions via geometric programming. J Comp Graph Statist. 2009;18:978–94.
- Lo SH. Estimation of distribution functions under order restrictions. Statist Dec. 1987;5:251–62.
- Murphy SA. Likelihood ratio-based confidence intervals in survival analysis. J Am Statist Assoc. 1995;90:1399–405.
- Park Y, Kalbfleisch JD, Taylor JMG. Constrained nonparametric maximum likelihood estimation of stochastically ordered survivor functions. Can J Statist. 2012;40:22–39.
- Proust-Lima C, Taylor JMG. Development and validation of a dynamic prognostic tool for prostate cancer recurrence using repeated measures of posttreatment PSA: a joint modeling approach. Biostatistics. 2009;10:535–49. doi: 10.1093/biostatistics/kxp009.
- Rojo J. On the estimation of survival functions under a stochastic order constraint. In: Perez-Abreu V, Rojo J, editors. The First Erich L Lehmann Symposium: Optimality. Vol. 44. Beachwood, OH: Institute of Mathematical Statistics; 2004. pp. 37–61.
- Rojo J, Ma Z. On the estimation of stochastically ordered survival functions. J Statist Comp Simul. 1996;55:1–21.
- Stute W, Wang J-L. The strong law under random censorship. Ann Statist. 1993;21:1591–607.
- Thomas DR, Grunkemeier GL. Confidence interval estimation of survival probabilities for censored data. J Am Statist Assoc. 1975;70:865–71.