Author manuscript; available in PMC: 2018 Jan 27.
Published in final edited form as: Stat Methods Med Res. 2016 Mar 16;27(2):384–397. doi: 10.1177/0962280216630342

A robust semi-parametric warping estimator of the survivor function with an application to two-group comparisons

Alan D Hutson 1
PMCID: PMC5786501  NIHMSID: NIHMS933108  PMID: 26988931

Abstract

In this note, we develop a novel semi-parametric estimator of the survival curve that is comparable to the product-limit estimator under very relaxed assumptions. The estimator is based on a beta parametrization that warps the empirical distribution of the observed censored and uncensored data. The parameters are obtained using a pseudo-maximum likelihood approach that adjusts the survival curve to account for the censored observations. In the univariate setting, the new estimator tends to extend the range of the survival estimate further than the product-limit estimator given a high degree of censoring. The key feature of this paper, however, is a new two-group semi-parametric exact permutation test for comparing survival curves that is generally superior to the classic log-rank and Wilcoxon tests and provides the best global power across a variety of alternatives. The new test is readily extended to the k group setting.

Keywords: Product-limit estimator, log-rank test, Wilcoxon-rank sum test, censored data

1 Introduction

The product-limit estimator1 of the survival function is one of the most popular and well-studied methods for estimating the survival distribution. The original paper1 is one of the most cited statistical papers of all time.2 Its deficiencies are also well-documented, particularly when a large proportion of extreme observations is right censored.3,4 Even so, the product-limit estimator has great utility as a descriptor of data with support on the positive real line given a right censoring mechanism, and it reduces nicely to the classic empirical distribution function estimator given no censoring.

The convergence properties of the product-limit estimator are technically difficult to derive, but have been thoroughly investigated; e.g. see Chen and Lo,5 who prove that the product-limit estimator converges in probability to the population survival function under certain conditions within the range of the observed data. In this note we introduce a novel semi-parametric “warping” estimator of the survival function, which mimics the behavior of the product-limit estimator very closely and serves as the basis both for a new method of estimating the survival function and for a new two-group test for comparing survival curves.

As background for the derivation of the product-limit estimator of the survival function we start with standard notation. Towards this end, let X1, X2, …, Xn denote i.i.d. failure times and let C1, C2, …, Cn denote the corresponding i.i.d. noninformative right censoring times. Given right censoring we observe only g ≤ n of the X’s. Now let 0 < x(1) < x(2) < ⋯ < x(g) be the distinct (no ties) ordered observed failure times. The classic maximum likelihood based derivation of the product-limit estimator starts by assuming the underlying distribution is discrete with probabilities πj = P(X = x(j)), j = 1, 2, …, g, g ≤ n. Given a discrete hazard of hj = P(X = x(j) | X ≥ x(j)) for 0 ≤ hj ≤ 1 we have that π1 = h1, π2 = (1 − h1)h2, ⋯, πg = (1 − h1)(1 − h2)⋯(1 − hg−1)hg. Then the estimator of S(x) = P(X > x) = ∏_{x(j) ≤ x} (1 − hj) in the discrete case is given as

Ŝ(x) = ∏_{x(j) ≤ x} (1 − ĥj)  (1)

where the estimates of the discrete hazard parameters follow from the likelihood6

log L = ∑_{j=1}^{g} [dj log hj + (rj − dj) log(1 − hj)]  (2)

where dj denotes the number of events and rj denotes the number at risk at time x(j), j = 1, 2, ⋯, g. The maximization of (2) with respect to the parameters hj, j = 1, 2, ⋯, g yields the estimates ĥj = dj/rj. For a technical treatment of the behavior of the product-limit estimator and how it translates to the continuous case see Chen and Lo.5 It is well-known, but not immediately obvious, that the product-limit estimator reduces to the classic empirical estimator of the survival function Ŝ(x) = 1 − ∑_{i=1}^{n} I(x(i) ≤ x)/n when there are no censored observations, where I(⋅) denotes the indicator function. We will be comparing our new estimator to the product-limit estimator at (1) throughout this note.

Two common methods for comparing survival curves that go hand-in-hand with the product-limit estimator are the ubiquitous log-rank test7 and the generalization of the Wilcoxon rank-sum test.8 The log-rank test is fully efficient against alternatives in which the hazard rates are proportional across time. The generalized Wilcoxon test weights comparisons of the survival curves more heavily at earlier time points. In this note we develop a new two-group rank-based comparison based on our semi-parametric warped estimator of the quantile function and show that in many common scenarios it outperforms the classic tests in terms of relative efficiency, and hence could reduce the cost of carrying out large-scale phase III trials that incorporate a time-to-event endpoint.

In section 2, we develop the new warped univariate survival function estimator. In section 3, we provide a small simulation study comparing the product-limit estimator to the new estimator. In section 4, we provide a basic data example using a well-known dataset. In section 5, we develop an exact permutation test and its corresponding approximation based on pseudo-likelihood methods for comparing two survival curves. In section 6, we provide a two-group example. Finally, in section 7, we provide a simulation study for the new test and compare it to the classic log-rank and Wilcoxon tests most commonly found in statistical software packages.

2 Survival function estimator

Prior to developing our method we first define the common survival analysis notation. As with the product-limit estimator let X1, X2, ⋯, Xn denote the i.i.d. absolutely continuous failure times and let C1, C2, ⋯, Cn denote the corresponding i.i.d. absolutely continuous noninformative right censoring times such that we observe Ti = min(Xi, Ci) and censoring indicator δi=I(Xi<Ci), i = 1, 2, ⋯, n. Within the context of the methods of this section we assume X and C have support over the entire positive real line. It is well known that the survival function for T is equal to the product of the survival functions for X and C, and given as

ST(t)=SX(t)SC(t) (3)

where the corresponding distribution functions are denoted FT(t) = 1 − ST(t), FX(t) = 1 − SX(t) and FC(t) = 1 − SC(t), respectively. Hence, in one sense the distance between ST(t) and SX(t), and between FT(t) and FX(t), is dictated by the censoring survivor function SC(t) (or FC): as SC(t) increases toward 1, i.e. as censoring becomes lighter, ST(t) and SX(t), and FT(t) and FX(t), become closer in terms of absolute distance over the range of t. As is well known, in the standard survival analysis framework only the Ti’s are completely observed; the Xi’s are not, unless there are no censored observations (and vice versa for the Ci’s). This observation is the basis of our approach: we use the observed Ti’s to estimate SX(t) directly. The question is whether there is any benefit to such an approach given existing estimation methods such as the product-limit estimator. The foundation of our approach is that theoretically, for a known distribution function FT(t), we should be able to warp FT(t) such that

FX(t) = g(FT(t))  (4)

for some function g under some basic assumptions.

In terms of the choice for g there are several candidates that map the (0,1) probability space to (0,1) in a monotone continuous one-to-one fashion. For our purpose we chose the beta distribution function g(x) = Bp,q(x) = ∫₀ˣ t^{p−1}(1 − t)^{q−1} dt / β(p, q), where β(p, q) = ∫₀¹ t^{p−1}(1 − t)^{q−1} dt. In this context, for known FT(t), we can define a class of parametric distribution and survival functions as follows

FX(t) = Bp,q[FT(t)]  (5)
SX(t) = 1 − Bp,q[FT(t)]  (6)

with corresponding density and quantile functions given as

fX(t) = bp,q[FT(t)] fT(t)  (7)
QX(u) = QT[Bp,q⁻¹(u)]  (8)

where bp,q(⋅) denotes a beta density, Bp,q⁻¹(u) denotes the beta distribution quantile function, and QT(u) = FT⁻¹(u), 0 < u < 1. The parameters of interest to be estimated are p and q. The case of no censored observations corresponds to p = 1 and q = 1.
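The warp at equations (5) to (8) is straightforward to evaluate numerically. The sketch below is an assumed Python/SciPy translation (the paper’s own code is in SAS), with an exponential FT and illustrative values p = 2 and q = 0.5 that are not taken from the paper; it checks that the warp preserves monotonicity and that the quantile function at (8) inverts the distribution function at (5).

```python
# Minimal sketch of the beta warp in equations (5)-(8); p, q are illustrative.
import numpy as np
from scipy import stats

p, q = 2.0, 0.5                      # hypothetical warp parameters
F_T = lambda t: 1.0 - np.exp(-t)     # known distribution function of T
Q_T = lambda u: -np.log(1.0 - u)     # its quantile function

def F_X(t):
    # equation (5): F_X(t) = B_{p,q}[F_T(t)]
    return stats.beta.cdf(F_T(t), p, q)

def S_X(t):
    # equation (6): S_X(t) = 1 - B_{p,q}[F_T(t)]
    return 1.0 - F_X(t)

def Q_X(u):
    # equation (8): Q_X(u) = Q_T[B^{-1}_{p,q}(u)]
    return Q_T(stats.beta.ppf(u, p, q))

t = np.linspace(0.01, 5, 200)
assert np.all(np.diff(F_X(t)) > 0)        # warp preserves monotonicity
assert abs(F_X(Q_X(0.3)) - 0.3) < 1e-8    # quantile inverts the CDF
```

Any strictly monotone continuous one-to-one map of (0,1) onto (0,1) could be substituted for the beta distribution function in this construction.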

As an example to illustrate our foundational concept, we simulated pairs of X’s and C’s for sample sizes n = 10, 100, 1000, 10,000, each from independent standard exponential distributions with SX(t) = SC(t) = exp(−t), which yields the known ST(t) = exp(−2t) as the underlying truth. Suppose we modeled the data in a standard fashion assuming SX(t) = exp(−θt), using standard maximum likelihood techniques, versus warping an assumed known distribution function for T given as FT(t) = 1 − exp(−2t) with density fT(t) = 2 exp(−2t), such that the warping distribution and density functions for X using the definitions above are given as FX(t) = Bp,q(1 − exp(−2t)) and fX(t) = 2bp,q(1 − exp(−2t)) exp(−2t), respectively. We can then estimate p and q by maximum likelihood using the likelihood function

L = ∏_{i=1}^{n} fX(ti)^{δi} (1 − FX(ti))^{1−δi}  (9)
  = ∏_{i=1}^{n} {bp,q[FT(ti)] fT(ti)}^{δi} (1 − Bp,q[FT(ti)])^{1−δi}  (10)

For this scenario, with 1000 Monte Carlo simulations per sample size, we calculated the average maximum absolute difference and simulation standard error between ŜX(t) = exp(−θ̂t) and ŜX(t) = 1 − Bp̂,q̂(1 − exp(−2t)) for t = 0 to 2 by 0.01. The results are presented in Table 1. We can see heuristically that the two estimates converge over the selected range of t as the sample size increases. This is a typical result for this model across various parametric families. Philosophically, one does not know which model may be driving the underlying stochastic process, and it is certainly reasonable to consider the parametric density gX(t) = 2bp,q(1 − exp(−2t)) exp(−2t) as one reasonable choice among many parametric models. However, this point is not the overarching goal of this note; rather, this model can be modified into a strong semi-parametric approximation approach for general and flexible modeling of a survival process, as given below.
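The warped parametric fit at (9) and (10) can be sketched as follows, assuming Python with SciPy rather than the paper’s SAS code. With X ~ exp(1) and C ~ exp(1), FT(t) = 1 − exp(−2t) is known exactly, and in this particular configuration the warp model holds exactly with p = 1 and q = 1/2, since 1 − (1 − u)^{1/2} is the Beta(1, 1/2) distribution function; the sample size, seed, and optimizer below are illustrative choices.

```python
# Sketch of the parametric warp fit, equation (10). Here the truth is
# p = 1, q = 1/2, so the MLE should land near those values.
import numpy as np
from scipy import stats, optimize

rng = np.random.default_rng(1)
n = 5000
x = rng.exponential(1.0, n)          # failure times, X ~ exp(1)
c = rng.exponential(1.0, n)          # censoring times, C ~ exp(1)
t = np.minimum(x, c)
delta = (x < c).astype(float)        # 1 = event observed

u = 1.0 - np.exp(-2.0 * t)           # F_T(t), known exactly in this setup

def neg_loglik(theta):
    p, q = np.exp(theta)             # optimize on the log scale: p, q > 0
    # equation (10); the f_T(t_i) factor does not involve (p, q)
    ll = delta * stats.beta.logpdf(u, p, q) \
       + (1.0 - delta) * stats.beta.logsf(u, p, q)
    return -np.sum(ll)

res = optimize.minimize(neg_loglik, x0=np.log([1.0, 1.0]), method="Nelder-Mead")
p_hat, q_hat = np.exp(res.x)
assert abs(p_hat - 1.0) < 0.15 and abs(q_hat - 0.5) < 0.15
```

Working on the log scale is a convenience that keeps both parameters positive without a constrained optimizer.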

Table 1.

Simulation results: Average absolute maximum difference between the two parametric survival curve estimators for t = 0 to 2 by 0.01.

n Mean ± Std Error
10 0.086 ± 0.079
100 0.020 ± 0.016
1000 0.006 ± 0.004
10,000 0.001 ± 0.001

Semi-parametric warped survival model

Instead of assuming a known continuous parametric form for FT(t) as in equations (5) to (8), we now estimate FT(t) using a rescaled version of the empirical distribution function estimator, F̂T(t) = ∑_{i=1}^{n} I(ti ≤ t)/(n + 1), which maintains the same large sample properties as the classic empirical distribution function estimator. We also utilize f̂T(t) = 1/n in a standard fashion. Rescaling F̂T(t), so that it never attains the boundary values 0 or 1, avoids the difficulties of applying it as the argument of the beta density and distribution functions within our pseudo-likelihood estimation procedure, described below, for obtaining estimates of p and q. In the trivial case of no censoring this approach results in distribution and survival function estimators with a 1/(n + 1) step between ordered observations, corresponding to the expected value between successive ordered uniform (0,1) observations.
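The rescaled empirical distribution function is a one-liner. The sketch below is an assumed Python/NumPy translation; evaluated at the observations themselves it reproduces the uniform rankits for the toy data T = (3, 6, 4, 2*) discussed in the pseudo-likelihood subsection below.

```python
# The rescaled EDF used for \hat F_T, with the 1/(n+1) scaling that keeps
# its values strictly inside (0, 1).
import numpy as np

def edf_rescaled(t_obs, t):
    """\\hat F_T(t) = sum_i I(t_i <= t) / (n + 1)."""
    t_obs = np.asarray(t_obs, dtype=float)
    t = np.asarray(t, dtype=float)
    return np.sum(t_obs[:, None] <= t[None, :], axis=0) / (len(t_obs) + 1.0)

# T = (3, 6, 4, 2*): censoring does not change the rankit values themselves
t_obs = np.array([3.0, 6.0, 4.0, 2.0])
r = edf_rescaled(t_obs, t_obs)
assert np.allclose(r, [2/5, 4/5, 3/5, 1/5])   # the uniform rankits
```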

Now we have our semi-parametric forms for the estimators of the distribution, survival and density functions given as

F̂X(t) = Bp̂,q̂(F̂T(t))  (11)
ŜX(t) = 1 − Bp̂,q̂(F̂T(t))  (12)
f̂X(t) = bp̂,q̂(F̂T(t))/n  (13)

respectively, where as mentioned earlier Bp,q(⋅) denotes a beta distribution function and bp,q(⋅) denotes a beta density. We will illustrate that this model behaves quite similarly to the product-limit estimator when the censoring distribution has positive support. Some modifications are necessary for shifted censoring distributions and are given later in this section. The next step is to outline how to obtain estimates for p and q.

Pseudo-likelihood estimator for the parameters p and q

In essence, we have a model for the rankits of X through T that accounts for censoring, rather than a model for the raw data. For example, suppose T = (3, 6, 4, 2*), where * denotes a censored observation; then the corresponding transformed data are given by the uniform rankits through F̂T(t) as 2/5, 4/5, 3/5, 1*/5. We are essentially fitting a beta density to the rankits while accounting for censored observations, which is why we term this a semi-parametric model. The pseudo-likelihood function for our semi-parametric rankit-based approach therefore has the form

L = ∏_{i=1}^{n} bp,q[F̂T(ti)]^{δi} (1 − Bp,q[F̂T(ti)])^{1−δi}  (14)

since f̂T(ti) = 1/n is constant over all i. The maximization of L at (14), or of log L, to find p̂ and q̂ follows standard maximum likelihood methods. Standard numerical software routines such as SAS PROC NLMIXED (SAS Institute, Cary, NC) may be used to find p̂ and q̂. In general, we recommend a grid search over a range of starting values for p and q. The properties of the pseudo-likelihood estimators are given as follows.
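A minimal sketch of the pseudo-likelihood fit at (14) follows, assuming Python with SciPy rather than the SAS PROC NLMIXED route mentioned above. The optimizer works on log(p), log(q) to keep both parameters positive, and the simulated sample (X ~ exp(1), C ~ exp(1/2), as in the example below) is illustrative; `fit_warp` is a hypothetical helper name.

```python
# Pseudo-likelihood fit of (p, q), equation (14): the parametric F_T is
# replaced by the rescaled EDF (the uniform rankits), and f_T = 1/n drops out.
import numpy as np
from scipy import stats, optimize

def fit_warp(t, delta):
    """Return (p_hat, q_hat) maximizing the pseudo-likelihood at (14)."""
    n = len(t)
    u = stats.rankdata(t) / (n + 1.0)   # rescaled EDF at the observations
    def neg_logpl(theta):
        p, q = np.exp(theta)            # log scale keeps p, q > 0
        return -np.sum(delta * stats.beta.logpdf(u, p, q)
                       + (1 - delta) * stats.beta.logsf(u, p, q))
    res = optimize.minimize(neg_logpl, np.log([1.0, 1.0]), method="Nelder-Mead")
    return np.exp(res.x)

rng = np.random.default_rng(2)
x = rng.exponential(1.0, 500)           # X ~ exp(1)
c = rng.exponential(2.0, 500)           # C ~ exp(1/2), i.e. rate 1/2
t, delta = np.minimum(x, c), (x < c).astype(float)
p_hat, q_hat = fit_warp(t, delta)

# Warped survival estimator at the observations, equation (12)
S_hat = 1.0 - stats.beta.cdf(stats.rankdata(t) / (len(t) + 1.0), p_hat, q_hat)
assert np.all((S_hat > 0) & (S_hat < 1))
```

Bootstrap resampling of (t, delta) pairs around `fit_warp` would give the variance estimates recommended after Theorem 1.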

Theorem 1

As n → ∞, √n (p̂ − p, q̂ − q) has a centered bivariate normal distribution with variance–covariance matrix B⁻¹ΣB⁻¹, where B is the standard maximum likelihood information matrix associated with the beta density bp,q, and Σ is the variance–covariance matrix of a two-dimensional random vector whose components are given by

∂{δ1 log bp,q[FT(T1)] + (1 − δ1) log(1 − Bp,q[FT(T1)])}/∂p + Wp(T1)  (15)
∂{δ1 log bp,q[FT(T1)] + (1 − δ1) log(1 − Bp,q[FT(T1)])}/∂q + Wq(T1)  (16)

where δ1=I(X1<C1) and

Wp(T1) = ∫ I(F(X1) < u) ∂²/∂p∂u [log bp,q(u)] dBp,q(u)  (17)
Wq(T1) = ∫ I(F(X1) < u) ∂²/∂q∂u [log bp,q(u)] dBp,q(u)  (18)

Proof

The technical details have been worked out in an elegant fashion for the case of a semi-parametric copula model with marginal distribution functions estimated by the empirical distribution function estimator.9 The result in Theorem 1 follows directly from the theoretical developments in section 4 of the copula paper9 by simply replacing the multivariate copula function with the univariate beta density, which is essentially a special case of the higher-dimensional copula model. Estimates of the variance–covariance matrix B⁻¹ΣB⁻¹ are not straightforward to obtain, and we recommend bootstrap resampling for this purpose.

Note that the pseudo-likelihood approach is not that dissimilar to the likelihood approach underlying the product-limit estimator when one considers the alternative to the likelihood given at (2). If we denote the ordered observed Ti’s as t(1) < t(2) < ⋯ < t(n) and let πi = P(X = t(i)) denote the probability mass at t(i), the likelihood for the product-limit estimator is rank based in the T’s and can alternatively be expressed as

L = ∏_{i=1}^{n−1} πi^{δi} (1 − ∑_{j=1}^{i} πj)^{1−δi} × (1 − ∑_{j=1}^{n−1} πj)  (19)

In fact the product-limit estimator and the semi-parametric warped estimator behave very similarly, which should not be surprising given the functional forms of the likelihoods above. However, the semi-parametric warped estimator has some advantages over the product-limit estimator in terms of inference and in the face of heavy censoring.

Example

As an illustration of the new semi-parametric warp estimation method we simulated n = 1000 pairs of failure and censoring times with X ~ exp(1) and C ~ exp(1/2). From equation (14) we obtained p̂ = 0.96 and q̂ = 0.34. We plotted the survival curve for the semi-parametric warped estimator overlaid with the product-limit estimator and the true underlying survival function S(t(i)) = exp(−t(i)), i = 1, 2, ⋯, 1000, where t(i) denotes the ith ordered observation. The plot depicted in Figure 1 provides a glimpse into the large sample behavior of the estimator, which is similar to that of the product-limit estimator. Careful inspection of the tail shows that the last value of Ŝ(t) for the product-limit estimator is at t = 1.74, while the semi-parametric warp estimator has a value of Ŝ(t) over the full range of the t’s, up to t = 2.3. This illustrates the potential advantage of the new estimator in the face of heavy censoring: it extends the estimate further over the range of T. The reason for this behavior is that the new estimator is based on a transformation of a proper empirical distribution function F̂T. Hence, there will always be corresponding values of ŜX(t) even if a proportion of the last observations are censored, in contrast to the product-limit estimator. The larger the number of censored observations in the tail of the distribution, the larger the range of estimates for ŜX(t) available from the semi-parametric warped estimator as compared to the product-limit estimator. As a simple illustration, let our contrived data set consist of the observed times 1+, 2, 3, 4+, 5+, with “+” denoting a right-censored observation. Then the product-limit estimator has estimates ŜX(1) = 1, ŜX(2) = .75, ŜX(3) = .5, ŜX(4) undefined (or .5 depending on the definition) and ŜX(5) undefined (or 0 depending upon the definition). The semi-parametric warped estimator has estimates ŜX(1) = .96, ŜX(2) = .88, ŜX(3) = .75, ŜX(4) = .58 and ŜX(5) = .35.
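The contrived data set above can be worked through directly. The sketch below, again an assumed Python/SciPy translation, computes the product-limit estimates by hand under the carry-forward convention (so Ŝ stays at .5 after the last event rather than being undefined) and fits the warped estimator via the pseudo-likelihood at (14); the optimizer settings are illustrative and the fitted values need not match the paper’s to the digits shown.

```python
# Contrived data 1+, 2, 3, 4+, 5+: KM by hand vs the warped estimator.
import numpy as np
from scipy import stats, optimize

t = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
delta = np.array([0.0, 1.0, 1.0, 0.0, 0.0])    # 0 = censored ("+")

# Product-limit estimator at each observed time (carry-forward convention)
km, s = [], 1.0
for i in range(len(t)):
    at_risk = len(t) - i
    if delta[i] == 1:
        s *= 1.0 - 1.0 / at_risk
    km.append(s)
assert np.allclose(km, [1.0, 0.75, 0.5, 0.5, 0.5])  # flat after last event

# Semi-parametric warped estimator, equations (12) and (14)
u = (np.arange(len(t)) + 1.0) / (len(t) + 1.0)      # rankits i/(n+1)
nll = lambda th: -np.sum(delta * stats.beta.logpdf(u, *np.exp(th))
                         + (1 - delta) * stats.beta.logsf(u, *np.exp(th)))
p_hat, q_hat = np.exp(optimize.minimize(nll, [0.0, 0.0],
                                        method="Nelder-Mead").x)
S_warp = 1.0 - stats.beta.cdf(u, p_hat, q_hat)
assert np.all(np.diff(S_warp) < 0)   # defined and decreasing at all 5 times
```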

Figure 1.

Figure 1

Estimate of Sˆ(x) from simulated data comparing the semi-parametric warped model versus the product-limit estimator versus true underlying survival distribution from a sample size of n = 1000 with X ~ exp(1) and C ~ exp(1/2).

Modification for shifted censoring distribution

A modification of the distribution, survival and density functions at equations (5) to (8) is necessary when C has known support (d, ∞). This may occur, for example, in a clinical trial with a survival endpoint and a minimum defined follow-up time, e.g. 5 years, such that d = 5. For a known d, the modified distribution, survival and density functions take the forms

FX(t) = FT(t) if t ≤ d;  β Bp,q(FT(t)) if t > d  (20)
SX(t) = ST(t) if t ≤ d;  1 − β Bp,q(FT(t)) if t > d  (21)
fX(t) = fT(t) if t ≤ d;  β bp,q(FT(t)) fT(t) if t > d  (22)

where

β = FT(d) / Bp,q(FT(d))  (23)

The semi-parametric estimator follows as before by first replacing FT(t) with F̂T(t) = ∑_{i=1}^{n} I(ti ≤ t)/(n + 1) and fT(t) with f̂T(t) = 1/n, then applying the same likelihood function as given at (14), noting that the warping only occurs after d. This is similar to fitting a truncated distribution to the observations greater than d.
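The shifted-censoring forms at (20) to (23) can be sketched as a piecewise function; the values of p, q and d below are illustrative, and (23) enters as the constant that makes FX continuous at the shift point d.

```python
# Piecewise shifted-censoring warp, equations (20)-(23); p, q, d illustrative.
import numpy as np
from scipy import stats

p, q, d = 0.5, 0.8, 1.0              # hypothetical parameters; d = shift
F_T = lambda t: 1.0 - np.exp(-t)     # stand-in for the (estimated) F_T

def F_X(t):
    t = np.asarray(t, dtype=float)
    beta_const = F_T(d) / stats.beta.cdf(F_T(d), p, q)   # equation (23)
    # equation (20): identity before d, scaled beta warp after d
    return np.where(t <= d, F_T(t),
                    beta_const * stats.beta.cdf(F_T(t), p, q))

# (23) guarantees continuity at the shift point d
assert abs(float(F_X(d - 1e-9)) - float(F_X(d + 1e-9))) < 1e-6
```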

Example

As an illustration of the new semi-parametric warp estimation method for a shifted censoring distribution we simulated n = 1000 pairs of failure and censoring times with X ~ exp(1) and C ~ U(1, 2), such that d = 1. From equation (14), using the forms of the density and survivor functions at (22) and (21), we obtained p̂ = 0.48 and q̂ = 0.78 for the warped part of the distribution. We plotted the survival curve for the semi-parametric warped estimator overlaid with the product-limit estimator. The plot depicted in Figure 2 provides a glimpse into the large sample behavior of the estimator, which is similar to that of the product-limit estimator. As in the non-shifted example earlier, the warped estimator provides a more extended estimate relative to the range of T. The two estimators are virtually equivalent for T < d, with step sizes of 1/n and 1/(n + 1), respectively.

Figure 2.

Figure 2

Estimate of Sˆ(x) from simulated data comparing the semi-parametric warped model versus the product-limit estimator from a sample size of n = 1000 with X ~ exp(1) and C ~ U(1,2).

3 Univariate simulation study

In this section, we illustrate how well our semi-parametric warped survival model fits a general set of distributions when compared to the product-limit estimator, rather than simulating directly from the known true model defined at (5). The point is to demonstrate that the new semi-parametric warped survival estimator is well-suited for general estimation purposes.

We chose to simulate data of sample sizes n = 50, 200, 500 and computed the mean squared error (MSE) for the lower, middle, and upper quartiles, Q(1/4), Q(1/2), and Q(3/4), respectively, from the Weibull(α, λ) family of distributions with distribution function F(t) = 1 − e^{−(t/λ)^α}. The censoring distribution was a standard exponential distribution. The results based on 1000 Monte Carlo replications are given in Table 2. In addition, the proportion of times an estimator was available is provided, i.e. the value for a given quartile may be missing if the last observation or observations are censored. The MSE values are based solely on observed values. The quantile estimator corresponding to the warped survival model is given as

Q̂X(u) = F̂T⁻¹(Bp̂,q̂⁻¹(u))  (24)

where F̂T⁻¹(⋅) denotes the empirical quantile function for T, Bp,q⁻¹(⋅) is the beta quantile function, and p̂ and q̂ come from the pseudo-likelihood estimation at (14).
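The quantile estimator at (24) composes the empirical quantile function of T with the beta quantile function. A minimal sketch, assuming p̂ and q̂ have already been fitted; here they are fixed at the no-warp values p = q = 1, where (24) must reduce to the ordinary empirical quantile.

```python
# Warped quantile estimator, equation (24).
import numpy as np
from scipy import stats

def warped_quantile(u, t_obs, p_hat, q_hat):
    v = stats.beta.ppf(u, p_hat, q_hat)     # B^{-1}_{p,q}(u)
    return np.quantile(t_obs, v)            # empirical quantile of T at v

rng = np.random.default_rng(3)
t_obs = rng.exponential(1.0, 200)
q25 = warped_quantile(0.25, t_obs, 1.0, 1.0)   # p = q = 1: no warp
assert abs(q25 - np.quantile(t_obs, 0.25)) < 1e-9
```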

Table 2.

MSE for Weibull quartile estimation with C ~ exp(1) with percent of observed quartiles. A comparison of the product-limit (PL) estimator to the warped survival (WS) estimator.

Estimator n Q(1/4) % Q(1/2) % Q(3/4) %
Weibull(1,1)
 WS 50 0.006 100 0.023 100 0.118 100
 PL 50 0.008 100 0.041 100 0.226   93.3
 WS 200 0.002 100 0.006 100 0.031 100
 PL 200 0.002 100 0.009 100 0.047   99.9
 WS 500 0.001 100 0.002 100 0.012 100
 PL 500 0.001 100 0.003 100 0.016 100
Weibull(3,3)
 WS 50 0.130   99.0 0.264   99.4 0.724   99.0
 PL 50 0.222   95.2 0.253   80.9 0.336   61.2
 WS 200 0.037 100 0.067 100 0.133 100
 PL 200 0.055 100 0.106   99.3 0.173   89.1
 WS 500 0.016   99.9 0.026 100 0.062 100
 PL 500 0.018   99.9 0.033 100 0.095   98.6
Weibull(3/4,3/4)
 WS 50 0.003 100 0.016 100 0.107 100
 PL 50 0.004 100 0.025 100 0.202   94.9
 WS 200 0.001 100 0.004 100 0.031 100
 PL 200 0.001 100 0.005 100 0.050   99.9
 WS 500 <0.001 100 0.002 100 0.012 100
 PL 500 <0.001 100 0.002 100 0.015 100
Weibull(3/4,3)
 WS 50 0.059 100 0.102 100 4.933 100
 PL 50 0.118   99.8 0.292   81.0 3.879   31.5
 WS 200 0.012 100 0.043 100 1.924 100
 PL 200 0.017 100 0.087   98.0 1.629   43.7
 WS 500 0.005 100 0.043 100 1.094 100
 PL 500 0.006 100 0.087 100 1.14   58.1

As seen from Table 2, the warped estimator provides an estimate of the quartiles a greater proportion of the time and consistently has a better MSE than the product-limit estimator whenever the latter is available in the majority of simulations. The exception, where the warped estimator has the worse MSE, occurs when the product-limit estimator fails to produce a quartile estimate in a substantial proportion of cases. If we were to use the maximum observation5 as per large sample theory, the MSE values would be similar to those of the warped estimator. As can be seen, this phenomenon occurs even for large samples: the product-limit estimator fails to produce a traditional estimate of the upper quartile, whereas the warped estimator provided an upper quartile estimate 100% of the time for moderate to large samples with reasonable MSE values.

4 Univariate data example

In order to illustrate our warped survival estimator versus the product-limit estimator we utilized a textbook example10 of n = 43 time-to-infection observations from a surgically placed catheter group. The data are as follows:

  • Infection times: 1.5, 3.5, 4.5, 4.5, 5.5, 8.5, 8.5, 9.5, 10.5, 11.5, 15.5, 16.5, 18.5, 23.5, 26.5

  • Censored observations: 2.5, 2.5, 3.5, 3.5, 3.5, 4.5, 5.5, 6.5, 6.5, 7.5, 7.5, 7.5, 7.5, 8.5, 9.5, 10.5, 11.5, 12.5, 12.5, 13.5, 14.5, 14.5, 21.5, 21.5, 22.5, 22.5, 25.5, 27.5

The plots of the two curves are overlaid in Figure 3. The warped curve is more refined in that it has more “steps”. The quartile estimates from the warped survival curve based on (24), along with the corresponding 95% bootstrap percentile confidence intervals based on 10,000 resamples, are Q̂(1/4) = 9.5 (7.5, 14.5), Q̂(1/2) = 21.5 (13.5, 25.5), and Q̂(3/4) = 25.5 (21.5, .). Similarly, for the product-limit estimator based on standard methods, the point estimates and corresponding 95% confidence intervals are Q̂(1/4) = 10.5 (5.5, 16.5), Q̂(1/2) = 18.5 (11.5, .), and Q̂(3/4) = 26.5 (23.5, .). Practically speaking, tied observations are not an issue since the empirical estimator for the observed values readily accommodates ties.

Figure 3.

Figure 3

Time to infection estimates for surgically placed catheter group based on the semi-parametric warped model versus the product-limit estimator.

5 Two sample test for survival curve differences

In this section, we develop a novel two-group pseudo-likelihood ratio test of H0 : SX(t) = SY(t) versus H1 : SX(t) ≠ SY(t) for the populations corresponding to the random variables X and Y. The new test extends easily to the k group setting and utilizes the semi-parametric warped estimator developed above at its core. The test follows along the lines of the Wilcoxon rank-sum test, in that the test statistic is a function of the pooled ranks of X and Y. The test has exact type I error control and can be applied for general two- and k-group comparisons.

Towards this end, let X1, X2, ⋯, Xnx denote i.i.d. absolutely continuous failure times and let C1, C2, ⋯, Cnx denote the corresponding i.i.d. absolutely continuous non-informative right censoring times, such that we observe Ri = min(Xi, Ci) and δi^x = I(Xi < Ci), i = 1, 2, ⋯, nx, from group 1. Similarly, let Y1, Y2, ⋯, Yny denote i.i.d. absolutely continuous failure times and D1, D2, ⋯, Dny the corresponding i.i.d. absolutely continuous non-informative right censoring times, such that we observe Tj = min(Yj, Dj) and δj^y = I(Yj < Dj), j = 1, 2, ⋯, ny, from group 2. Let the total sample size be denoted n = nx + ny. The key assumption for this test is that the censoring distributions for group 1 and group 2 are equivalent, i.e. FC = FD, as might be expected in a randomized clinical trial setting.

In the semi-parametric warped setting, our test for the equivalence of survival curves and/or distribution functions may now be written compactly as

H0: Bpx,qx[FR(t)] = Bpy,qy[FT(t)]  (25)
H1: Bpx,qx[FR(t)] ≠ Bpy,qy[FT(t)]

where FR(t) and FT(t) are the distribution functions for the observed values.

Construction of the pseudo-likelihood ratio test

In order to test the hypothesis at (25) we propose a pseudo-likelihood ratio test using the following steps:

  1. Estimate the distribution function under H0 for the combined values for Ri, i = 1, 2, ⋯, nx and Tj, j = 1, 2, ⋯, ny as
    F̂pooled(t) = ∑_{i=1}^{n} I(zi ≤ t)/(n + 1),  (26)
    where the n × 1 vector z = (r|t) is the concatenation of the two sets of observed value vectors from group 1 and group 2, respectively. This format is similar to using the pooled ranks in the Wilcoxon rank-sum test, where the pooled ranks are given as (n+1)Fˆpooled(t).
  2. Denote the observed values of Fˆpooled(t) as a function of z corresponding to group 1 as Fˆpooled1(t) and group 2 as Fˆpooled2(t), i.e. the rankits broken out by group.

  3. Denote the pseudo-likelihood under H0 as
    L0 = ∏_{i=1}^{nx} bpz,qz[F̂pooled1(zi)]^{δi^x} (1 − Bpz,qz[F̂pooled1(zi)])^{1−δi^x} × ∏_{i=nx+1}^{n} bpz,qz[F̂pooled2(zi)]^{δi^y} (1 − Bpz,qz[F̂pooled2(zi)])^{1−δi^y}  (27)
    where pz = px = py and qz = qx = qy under H0 at (25).
  4. Denote the pseudo-likelihood under H1 as
    L1 = ∏_{i=1}^{nx} bpx,qx[F̂pooled1(zi)]^{δi^x} (1 − Bpx,qx[F̂pooled1(zi)])^{1−δi^x} × ∏_{i=nx+1}^{n} bpy,qy[F̂pooled2(zi)]^{δi^y} (1 − Bpy,qy[F̂pooled2(zi)])^{1−δi^y}  (28)
  5. Denote the observed value of the pseudo-likelihood ratio statistic as Δ = −2(log(L0) − log(L1)).

  6. Permute the group labels B times and for each permutation calculate the permuted pseudo-likelihood ratio statistic Δi = −2(log(L0) − log(L1)), i = 1, 2, …, B, recomputing L0 and L1 on the permuted data.

  7. The Monte Carlo approximate permutation p-value is given as ∑_{i=1}^{B} I(Δi > Δ)/B. The exact p-value can be calculated across all permutations as necessary; however, for large B the approximation will be sufficiently accurate.
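The steps above can be sketched end-to-end as follows, assuming Python with SciPy; `fit_nll`, `delta_stat` and `perm_test` are hypothetical helper names, the group 2 distribution is an arbitrary shifted alternative, and B is kept small here purely for speed (the example in the paper uses B = 1000). Note that the pooled rankits and the δ’s are invariant under permutation of the group labels, so L0 need only be fitted once.

```python
# Pseudo-likelihood ratio permutation test, steps 1-7.
import numpy as np
from scipy import stats, optimize

def fit_nll(u, delta):
    # minimized negative log pseudo-likelihood, equation (14) form
    nll = lambda th: -np.sum(delta * stats.beta.logpdf(u, *np.exp(th))
                             + (1 - delta) * stats.beta.logsf(u, *np.exp(th)))
    return optimize.minimize(nll, [0.0, 0.0], method="Nelder-Mead").fun

def perm_test(t, delta, grp, B=50, seed=0):
    rng = np.random.default_rng(seed)
    u = stats.rankdata(t) / (len(t) + 1.0)   # pooled rankits, step 1
    log_l0 = -fit_nll(u, delta)              # invariant under permutation
    def delta_stat(g):
        # steps 3-5: Delta = -2(log L0 - log L1)
        log_l1 = -(fit_nll(u[g == 0], delta[g == 0])
                   + fit_nll(u[g == 1], delta[g == 1]))
        return -2.0 * (log_l0 - log_l1)
    obs = delta_stat(grp)
    exceed = sum(delta_stat(rng.permutation(grp)) > obs
                 for _ in range(B))          # step 6
    return exceed / B                        # step 7

rng = np.random.default_rng(4)
nx = ny = 25
x = rng.exponential(1.0, nx)                 # group 1 failure times
y = rng.exponential(0.3, ny)                 # group 2: shifted alternative
c = rng.exponential(1.0, nx + ny)            # common censoring distribution
t = np.minimum(np.concatenate([x, y]), c)
delta = (np.concatenate([x, y]) < c).astype(float)
grp = np.repeat([0, 1], [nx, ny])
pval = perm_test(t, delta, grp, B=50)
assert 0.0 <= pval <= 1.0
```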

Comment 1

It is critical to note the key point of this work: our test is exact in terms of controlling the type I error, given the permutation framework, even if the underlying model does not perfectly fit the data. Hence, this test can be considered a general semi-parametric test for comparing survival curves.

Comment 2

The approach above is readily extended to the k group setting by simply adding additional terms to L0 and L1 at (27) and (28), respectively, corresponding to group 3 and upwards.

In general, the large sample distributions of pseudo-likelihood ratio tests are complex11 and are asymptotically chi-squared only under certain conditions; generally, the distribution consists of a weighted average of chi-squared distributions. However, we have found heuristically that Δ = −2(log(L0) − log(L1)) is well approximated by a chi-squared distribution with 2 degrees of freedom, similar to standard large sample theory, and provides almost identical p-values to the permutation test. An extensive simulation study comparing the new test (and its chi-squared approximation) with the classic log-rank and Wilcoxon rank-sum tests follows the illustrative example in the next section.

6 Two-group comparison

In this section, we provide a simple two-group survival curve comparison using the surgically placed catheter data (group 1) from section 4 and the percutaneously placed catheter data10 (group 2), given as:

  • Infection times: 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 2.5, 2.5, 3.5, 6.5, 15.5

  • Censored observations: 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 1.5, 1.5, 1.5, 1.5, 2.5, 2.5, 2.5, 2.5, 3.5, 3.5, 3.5, 3.5, 4.5, 4.5, 4.5, 5.5, 5.5, 5.5, 5.5, 5.5, 6.5, 7.5, 7.5, 7.5, 8.5, 8.5, 8.5, 9.5, 9.5, 10.5, 10.5, 10.5, 11.5, 11.5, 12.5, 12.5, 12.5, 12.5, 14.5, 14.5, 16.5, 16.5, 18.5, 19.5, 19.5, 19.5, 20.5, 22.5, 24.5, 25.5, 26.5, 26.5, 28.5

The product-limit estimates for the survival curves are given in Figure 4, where the percutaneous group is shifted to the left. The emphasis going forward is that our new test provides an alternative, and oftentimes much more powerful, test than the standard log-rank or Wilcoxon tests found in most statistical software packages. For the permutation two-group warped survival test we used B = 1000 permutations. The permutation p-value was 0.005. The approximate pseudo-likelihood based p-value, based on a chi-squared distribution with 2 degrees of freedom, was 0.0004. The classic log-rank and Wilcoxon p-values were p = 0.4013 and p = 0.0097, respectively. In our simulation study in the next section we show that the new warped survival test is oftentimes much more powerful than both the log-rank and Wilcoxon tests and that the chi-squared approximation is virtually identical to the permutation test. The permutation p-value is exact under the exchangeability assumption even if the warped survival model only approximates the true underlying population structure.

Figure 4.

Figure 4

Product-limit estimator for the time to infection estimates for surgically placed catheter group versus the percutaneous catheter group.

The example SAS code corresponding to this section may be found at https://www.researchgate.net/publication/289540278_SAS_CODE_FOR_A_Robust_Semi-Parametric_Warping_Estimator_of_the_Survivor_Function_With_an_Application_to_Two_Group_Comparisons

7 Simulation comparison of the new test with the log-rank and Wilcoxon tests

We compared the type I error and power between the exact permutation (EP) Method, approximate likelihood ratio (ALR), the classic log-rank (LR) test, and the classic Wilcoxon (WN) test for testing H0 : SX(t) = SY(t) versus H1 : SX(t) = SY(t) for Weibull (W), Log-Normal (LN), and Log-Logistic (LL) distributions. The results are found in Tables 3 to 5. For the simulations we used 500 Monte Carlo resamples, nx = ny = 35 and a non-informative censoring distribution C ~ exp(1) for both groups. For the Monte Carlo approximation to the permutation test we used 100 Monte Carlo permutations. The upper left column for each simulation table presents the type I error control for each method. As expected all methods control the type I error appropriately. In terms of statistical power the new permutation method provides the best global power across methods. There are a few instances where the log-rank test or the Wilcoxon test is superior to each other; however, in each instance the new permutation test is comparable and generally provides the best global coverage given an unknown underlying distribution. For example in Table 3 when group 2 is W(3,1), the power for the permutation test is 0.924 as compared to 0.118 for the log-rank test and 0.650 for the Wilcoxon test. However, when the group 2 distribution is W(2,1/2) the power for the permutation test is 0.856 as compared to 0.636 for the log-rank test and 0.178 for the Wilcoxon test. In this instance, in these examples the power magnitude for the log-rank test and the Wilcoxon test is reversed, yet the permutation test maintains better power for each alternative. In some instances, the new test is dramatically more sensitive than the log-rank and/or the Wilcoxon test towards testing the hypothesis of interest, e.g. in Table 4 with the LN(0,1/2) alternative for group 2 the power for the permutation test is 0.504 as compared to 0.064 for the log-rank test and 0.234 for the Wilcoxon test. 
These results hold consistently across a variety of distributions and sample sizes (results not shown). Finally, it should be noted that the ALR test approximates the permutation test well and could be used for large sample sizes, where the permutation test may be computationally infeasible.
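The Monte Carlo permutation scheme underlying the EP test is generic: pool the two samples, repeatedly relabel the pooled observations, and recompute the test statistic. The following minimal sketch in Python/NumPy illustrates the mechanics using a placeholder statistic (absolute difference in medians of the observed times) rather than the paper's pseudo-likelihood ratio statistic; the helper names and the W(shape, scale = 1) parametrization are assumptions for illustration only.

```python
import numpy as np

def perm_pvalue(x, y, stat, n_perm=1000, rng=None):
    """Monte Carlo approximation to a two-group permutation p-value.

    `stat` maps (group1, group2) -> scalar; larger values indicate
    stronger evidence against H0 : S_X(t) = S_Y(t).
    """
    rng = np.random.default_rng(rng)
    pooled = np.concatenate([x, y])
    nx = len(x)
    observed = stat(x, y)
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(pooled)
        if stat(perm[:nx], perm[nx:]) >= observed:
            count += 1
    # Add-one correction keeps the estimate in (0, 1]
    return (count + 1) / (n_perm + 1)

# Placeholder statistic (NOT the paper's pseudo-likelihood ratio):
def abs_median_diff(a, b):
    return abs(np.median(a) - np.median(b))

# Mimic the simulation design: T = min(X, C), C ~ exp(1), n = 35 per group
rng = np.random.default_rng(0)
x = np.minimum(rng.weibull(1.0, 35), rng.exponential(1.0, 35))  # group 1: W(1,1)
y = np.minimum(rng.weibull(3.0, 35), rng.exponential(1.0, 35))  # group 2: W(3,1)
p = perm_pvalue(x, y, abs_median_diff, n_perm=1000, rng=1)
```

With the actual pseudo-likelihood ratio statistic substituted for `abs_median_diff`, and all permutations enumerated rather than sampled, the p-value is exact under the usual exchangeability assumption.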

Table 3.

Comparison of type I error and power between the exact permutation (EP) method, approximate likelihood ratio (ALR), the classic log-rank (LR) test, and the classic Wilcoxon (WN) test for testing H0 : SX(t) = SY(t) versus H1 : SX(t) ≠ SY(t) with group 1 distributed as W(1,1), nx = ny = 35 and censoring distribution C ~ exp(1) for both groups.

Estimator Group 2
W(1,1) W(1/2,1) W(1/2,2) W(1,1/2) W(1,3)
EP 0.046 0.522 0.460 0.452 0.688
ALR 0.044 0.484 0.430 0.426 0.684
LR 0.036 0.126 0.070 0.556 0.768
WN 0.048 0.394 0.098 0.432 0.648
Estimator W(3,1) W(2,1/2) W(3,3) W(1/3,1/3) W(2,2)
EP 0.924 0.856 0.999 0.980 0.918
ALR 0.944 0.874 0.999 0.978 0.920
LR 0.118 0.636 0.999 0.750 0.876
WN 0.650 0.178 0.999 0.976 0.944

Table 4.

Comparison of type I error and power between the exact permutation (EP) method, approximate likelihood ratio (ALR), the classic log-rank (LR) test, and the classic Wilcoxon (WN) test for testing H0 : SX(t) = SY(t) versus H1 : SX(t) ≠ SY(t) with group 1 distributed as LN(0,1), nx = ny = 35 and censoring distribution C ~ exp(1) for both groups.

Estimator Group 2
LN(0,1) LN(−1/2,1/2) LN(−1/2,1) LN(−1/2,2) LN(0,1/2)
EP 0.048 0.718 0.224 0.758 0.504
ALR 0.040 0.716 0.214 0.756 0.520
LR 0.056 0.548 0.298 0.310 0.064
WN 0.042 0.186 0.302 0.688 0.234
Estimator LN(0,2) LN(1/2,1/2) LN(1/2,1) LN(1/2,2) LN(−3,3)
EP 0.494 0.862 0.294 0.390 0.999
ALR 0.496 0.868 0.280 0.392 0.999
LR 0.090 0.574 0.334 0.074 0.998
WN 0.282 0.886 0.348 0.066 0.998

Table 5.

Comparison of type I error and power between the exact permutation (EP) method, approximate likelihood ratio (ALR), the classic log-rank (LR) test, and the classic Wilcoxon (WN) test for testing H0 : SX(t) = SY(t) versus H1 : SX(t) ≠ SY(t) with group 1 distributed as LL(0,1), nx = ny = 35 and censoring distribution C ~ exp(1) for both groups.

Estimator Group 2
LL(0,1) LL(−1/2,1/2) LL(−1/2,1) LL(−1/2,2) LL(0,1/2)
EP 0.044 0.520 0.224 0.500 0.438
ALR 0.036 0.528 0.214 0.438 0.452
LR 0.050 0.186 0.298 0.180 0.084
WN 0.030 0.080 0.302 0.440 0.252
Estimator LL(0,2) LL(1/2,1/2) LL(1/2,1) LL(1/2,2) LL(−3,3)
EP 0.328 0.652 0.116 0.262 0.998
ALR 0.282 0.654 0.112 0.232 0.996
LR 0.092 0.428 0.152 0.044 0.904
WN 0.204 0.740 0.158 0.090 0.992

8 Summary remarks

In this note, we consider the classic mathematical statistics relationship ST(t) = SX(t)SC(t), where T = min(X, C), as described in the introduction. The key idea is that FX(t) ≈ g(FT(t)), where g is what we termed a warping function. The choice of g is arbitrary as long as g(FT(t)) is a proper distribution function. Since for a given sample size n we can arrive at a proper estimator of FT(t), given by F̂_T(t) = Σ_{i=1}^{n} I(t_i ≤ t)/(n + 1), it follows that for an appropriate g, g(F̂_T(t)) is also a proper distribution function estimator. For our purposes we defined g to be a beta distribution function, which allows g(F̂_T(t)) to take on several shapes similar to those that have been evolving in the parametric literature on beta transformation families.12 The method is quite robust in that, in the most basic sense, we are working with a transformation of rankits and not the raw data values. The theoretical framework for this estimator follows nicely from previous work on pseudo-likelihood methods.
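The construction above can be sketched directly. In the sketch below, g is the Beta(a, b) distribution function evaluated via the regularized incomplete beta function; in the paper, a and b would be obtained by the pseudo-maximum likelihood approach, which is not reproduced here, and `warped_cdf` is a hypothetical helper name introduced only for illustration.

```python
import numpy as np
from scipy.special import betainc  # betainc(a, b, x) = Beta(a, b) CDF at x

def warped_cdf(t_eval, times, a, b):
    """Beta-warped distribution function estimate g(F_hat_T(t)).

    F_hat_T(t) = sum_i I(t_i <= t) / (n + 1) is the rescaled empirical
    CDF of the observed (censored and uncensored) times; g is the
    Beta(a, b) CDF, so the composition is itself a proper CDF in t.
    """
    times = np.sort(np.asarray(times, dtype=float))
    n = times.size
    ft = np.searchsorted(times, t_eval, side="right") / (n + 1.0)
    return betainc(a, b, ft)

obs = np.array([0.3, 0.7, 1.2, 2.5, 4.0])                 # observed times, n = 5
f_identity = warped_cdf(np.array([1.0]), obs, 1.0, 1.0)   # Beta(1,1) warp = identity
f_warped = warped_cdf(np.array([1.0]), obs, 2.0, 2.0)     # a genuine warp of F_hat_T
```

With a = b = 1 the warp is the identity and the estimator reduces to the rescaled empirical distribution function, consistent with the no-censoring remark in the introduction; the corresponding survival estimate is simply 1 minus the warped CDF.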

The utility of this semi-parametric approach lies not so much in the estimation of SX as in two- and k-group comparisons of survival curves. We provide an exact α-level test that, as might be expected, has substantial power gains over the log-rank test when the proportional hazards assumption does not hold, and over the Wilcoxon test when the weighted proportional hazards assumption does not hold. For large samples we provide an asymptotic chi-square approximation, which also appears to provide very accurate type I error control in medium sample size settings. In general, the new exact permutation test remains relatively efficient even when the various proportional hazards assumptions do hold; it was a rare event in our simulation study for the log-rank or Wilcoxon test to be more efficient than the new semi-parametric approach. Hence, as a global test we believe this new test has a great deal of utility, particularly when the proportional hazards assumption is unreasonable, e.g. a delayed treatment effect in one group relative to another, which may lead to a change-point type relationship in the ratio of the respective hazard functions. Moreover, under the classic permutation-test exchangeability assumptions, the test is always exact in terms of type I error control.

Acknowledgments

We wish to acknowledge the referees. Their comments greatly enhanced the presentation of our proposed approach.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This study was supported by the National Center for Advancing Translational Sciences of the National Institutes of Health under award number UL1TR001412.

Footnotes

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

References

  • 1. Kaplan EL, Meier P. Nonparametric estimation from incomplete observations. J Am Stat Assoc. 1958;53:457–481.
  • 2. Ryan TP, Woodall WH. The most-cited statistical papers. J Appl Stat. 2005;32:461–474.
  • 3. Miller RG Jr. What price Kaplan-Meier? Biometrics. 1983;39:1077–1081.
  • 4. Oakes D. A note on the Kaplan-Meier estimator. Am Statist. 1983;37:39–40.
  • 5. Chen K, Lo S-H. On the rate of uniform convergence of the product-limit estimator: Strong and weak laws. Ann Stat. 1997;25:1050–1087.
  • 6. Cox DR, Oakes D. Analysis of survival data. New York: Chapman & Hall/CRC; 1984.
  • 7. Mantel N, Haenszel W. Statistical aspects of the analysis of data from retrospective studies of disease. J Natl Cancer Inst. 1959;22:719–748.
  • 8. Prentice RL. Linear rank tests with right censored data. Biometrika. 1978;65:167–179.
  • 9. Genest C, Ghoudi K, Rivest LP. A semiparametric estimation procedure of dependence parameters in multivariate families of distributions. Biometrika. 1995;82:543–552.
  • 10. Klein JP, Moeschberger ML. Survival analysis: Techniques for censored and truncated data. New York: Springer; 2003.
  • 11. Liang K-Y, Self SG. On the asymptotic behaviour of the pseudolikelihood ratio test statistic. J Roy Stat Soc Ser B. 1996;58:785–796.
  • 12. Jones MC. Families of distributions arising from distributions of order statistics. Test. 2004;13:1–43.