Abstract
Background and objectives:
In survival analysis both the Kaplan-Meier estimate and the Cox model enjoy broad acceptance. We present an improved spline-based survival estimate and offer fully automated software for its implementation. We explore the use of natural cubic splines that are constrained to be monotone. Apart from its superiority over the Kaplan-Meier estimator, our approach overcomes limitations of other known smoothing approaches and can accommodate covariates. Unlike other spline methods, computational problems and issues of overfitting are avoided, since no attempt is made to maximize a likelihood once the Kaplan-Meier estimator is obtained. An application to laryngeal cancer data, a simulation study, and illustrations of the broad applicability of the method and its software are provided. In addition to presenting our approaches, this work contributes to bridging a communication gap between clinicians and statisticians that is often apparent in the medical literature.
Methods:
We employ a two-stage approach: we first obtain the stepwise cumulative hazard and then use a natural cubic spline to smooth its steps under restrictions of monotonicity between any consecutive knots. The underlying region of monotonicity is a non-linear region that encompasses the full family of monotone third-degree polynomials. We approximate it linearly and thereby reduce the problem to a restricted least squares one with linear restrictions, which ensures convexity. We evaluate our method through simulations against traditional competing approaches.
Results:
Our method is compared to the popular Kaplan-Meier estimate both in terms of mean squared error and in terms of coverage. Overfitting is avoided by construction, as our spline approximates the empirical estimate of the cumulative hazard itself and is not fitted directly to the data.
Conclusions:
The proposed approach will enable clinical researchers to obtain improved survival estimates and valid confidence intervals over the full range of the survival data. Our methods outperform conventional approaches and can be readily utilized in settings beyond survival analysis, such as diagnostic testing.
Keywords: Constrained splines, Cox model, Restricted least squares, Kaplan-Meier, MATLAB, Smooth distribution function, Survival
1. Introduction
In clinical studies that assess patient survival times, a central task is to estimate the survival curve for patients with a given disease of interest, or the underlying survival curves for the different treatment arms of the study. Based on such estimates, median survival times and other measures can be calculated. For example, different trajectories of the survival curves among different treatments can be visualized, along with their confidence intervals (CIs). This is commonly addressed with the use of the well-known Kaplan-Meier (KM) survival estimator (see [1]). Further analysis might refer to the estimation of the underlying survival curves given a specific covariate profile such as patient age, gender, or stage of cancer; this is commonly accomplished through the use of the Cox model (see [2]). Various techniques have been proposed to improve survival estimation by providing smooth estimates. At the other extreme from the crude KM estimate lie common parametric models such as the Weibull or the Gamma model. However, to implement such models, one must be able to justify distributional assumptions that in practice may be unrealistic, or whose fitting may even be computationally infeasible. For example, the problem of fitting the generalized gamma distribution, a broad parametric model for survival settings (see [3]), is still under study due to numerical difficulties associated with non-convergence. Especially in exploratory studies where no prior information is available regarding parametric distributional assumptions, a nonparametric approach is a necessity, and that is the scenario we address in this paper.
Splines provide a smooth nonparametric-based strategy (see [4] and [5] for an overview). The main literature on splines is presented under a nonparametric regression framework, but there is relevant work under survival settings as well (see [6] and [7] among others). Splines should not be confused with common parametric models. Strictly speaking, splines involve parameters that need to be estimated from the data; however, the rationale for using a spline is that it belongs to a family of piecewise polynomials that are flexible enough to capture models that lie in super-families. As mentioned in [8], all nonparametric curve estimates involve underlying smoothing parameters. An example is the number of bins of a histogram, which is analogous to the bandwidth of a nonparametric kernel estimate. This is not to be considered equivalent to the case of common parametric models in which the modeling assumptions are strict.
Even though there have been significant advancements that provide better and more efficient survival estimators compared to traditional, crude nonparametric empirical-based methods, the consideration of such advanced approaches in clinical practice is very limited. An example is the well-known receiver operating characteristics (ROC) curve that is typically employed under a diagnostic test framework, which is nothing more than a function of two survival curves. It has been almost two decades since [9] published a paper providing formal proof that kernel estimators of the ROC curve are better than the empirical estimators. Yet, in spite of numerous readily available kernel-based software applications, most clinical researchers insist on using empirical-based estimates; see [10] as one indicative example among many. That research focuses on esophageal, gastric and colon cancers and the evaluation of two biomarkers, carcinoembryonic antigen and CA19-9. The situation is similar in studies that use survival endpoints. In simulations, [11] showed spline-based estimators to be more efficient than the regular KM estimators. Other more recent studies (see [12]) have discussed similar issues. However, we see that the KM estimator is the only one considered in the vast majority of clinical papers that refer to studies of pancreatic, breast, colon, kidney, lung and prostate cancer, among others (see [13], [14], and [15] as indicative examples).
The rationale for smoothing relies on a disadvantage of the KM estimator related to its stepwise form. By construction, the KM estimator implies that the survival rate is unchanged between two event times of arbitrary length. This assumption is quite unnatural for the true and unknown survival curve. Another drawback of the KM survival estimator is that it cannot provide estimation beyond the last event time, even if censored observations follow. As a result, construction of CIs near the tail of the survival curve might collapse. However, its smooth alternatives might also suffer from similar and other drawbacks, which we state below.
Kernel-based alternatives are based on the jumps of the KM estimator. Consider the data (Ti, Di), i = 1, …, n, where Ti = min(Xi, Ci), Xi is the time to an event of interest, Ci is the censoring variable, and Di is a binary indicator variable taking the value 1 for the event of interest and 0 for censoring, i.e., Di = I(Xi < Ci). Let T(i), D(i), i = 1, …, n be {Ti, Di} ordered with respect to the Ti's. The KM survival estimator is given by

$$\hat{S}_{KM}(t) = \prod_{i:\,T_{(i)} \leq t} \left(1 - \frac{D_{(i)}}{n-i+1}\right),$$

and the estimated kernel density is

$$\hat{f}(t) = \sum_{i=1}^{n} K_h(t - T_i)\,S_i,$$

with $K_h(u) = h^{-1}K(u/h)$, where K(·) is the kernel function, Si is the size of the jump of the KM estimator at Ti, and h is a positive number called the bandwidth. Note that Si = 0 if and only if Ti corresponds to a censored observation. This approach shares the same disadvantage as the KM estimator, that is, it is limited to estimating survival up to the last event time. For a review on kernel smoothing, see [8].
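To make this construction concrete, a minimal MATLAB sketch of the kernel density estimate built from the KM jumps follows. The Gaussian kernel and the fixed bandwidth h = 0.3 are illustrative choices of ours, not part of any method discussed here, and xcen/status follow the coding of the examples in Section 3 (0 = event):

% Minimal sketch: kernel density estimate built from the KM jumps.
% Assumptions: Gaussian kernel, ad hoc fixed bandwidth h.
[S,t] = ecdf(xcen,'Censoring',status,'Function','survivor'); % KM survival estimate
jumps = [1; S(1:end-1)] - S;       % jump sizes S_i (zero at censored times)
h = 0.3;                           % bandwidth (illustrative, not data-driven)
grid = linspace(0,max(t),200);
fhat = sum(jumps .* normpdf((grid - t)/h)/h, 1);  % Kh(u) = K(u/h)/h
plot(grid,fhat)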
Regarding a spline-based alternative, we refer to the log-spline approach, which was introduced for censored data by [6]. Let the integer K ≥ 3, and the knot sequence τ1, …, τK with −∞ ≤ L < τ1 < … < τK ≤ U ≤ ∞, where L and U are some numbers. The log-spline density model is stated as

$$f(x;\theta) = \exp\left(\sum_{j=1}^{p} \theta_j B_j(x) - C(\theta)\right), \quad L < x < U, \qquad (1)$$

where

$$C(\theta) = \log \int_{L}^{U} \exp\left(\sum_{j=1}^{p} \theta_j B_j(x)\right)\,dx$$

is the normalizing constant and the basis functions B1(x), B2(x), …, Bp(x) can be chosen to form a natural spline. It also provides an exponential extension of the survival estimate beyond the last knot. However, in the presence of censoring, the log-spline-based likelihood function is not convex, and convergence of any optimization routine is not guaranteed. Furthermore, it cannot accommodate covariates. For the accommodation of covariates, a spline-based approach was proposed by Herndon and Harrell [7]. Their method is based on the maximization of the underlying likelihood, which under the spline structure might lead to computational problems of convergence, especially for small sample sizes and/or high levels of censoring.
In this paper, we provide an improved version of the hazard constrained natural spline (HCNS) approach initially presented in [12] that overcomes all drawbacks of the aforementioned approaches. More specifically, the proposed methodology has the following attractive features. (i) It provides a smooth estimate of the survival function. (ii) Based on simulations, the proposed survival estimate outperforms the KM estimate. (iii) Its convergence is guaranteed since convex optimization is involved. (iv) It provides estimation beyond the last event time. Along with the details of the methodology, we discuss the underlying algorithm and also provide fully automated, user-friendly software to communicate this approach to practitioners and clinicians. This paper is organized as follows: In Section 2, we state the model specification and discuss its properties. We note that the fit of the model is based on minimizing its distance from the broadly used KM estimator (or the Cox model in the case of covariates); hence, there are no concerns of overfitting and its robustness is simultaneously not compromised. In Section 3, we describe the HCNS approach, and provide examples that can be straightforwardly reproduced by the reader using MATLAB. In Section 4, we present a simulation study and in Section 5, an application, showing the necessary code for analysis. We conclude with a discussion.
2. Background methodology
Here we describe the proposed method, which we will refer to as HCNS (Hazard Constrained Natural Spline). Consider again data of the form {Ti, Di}, where Ti = min(Xi, Ci), with Ci being the censoring variable, Xi the time-to-event variable, and Di the event indicator, taking the value 1 for an event and zero otherwise (note that in MATLAB all routines that accommodate censoring use the opposite coding for events/censorings; we therefore provide the option for the user to define the event value). Consider K distinct knots τ1 < τ2 < … < τK. Denote x+ = max(0, x). We consider the following model for the cumulative hazard function:
$$H(t) = \sum_{j=1}^{K-2} \beta_j\,\omega_j(t) \qquad (2)$$

where

$$\omega_j(t) = (t-\tau_j)_+^3 - (t-\tau_{K-1})_+^3\,\frac{\tau_K-\tau_j}{\tau_K-\tau_{K-1}} + (t-\tau_K)_+^3\,\frac{\tau_{K-1}-\tau_j}{\tau_K-\tau_{K-1}}, \quad j = 1, \ldots, K-2. \qquad (3)$$
See also [5] and [7] for an overview and a similar spline formulation. Model (2) has the following properties:
(i) It is linear beyond the last knot;
(ii) It equals zero before the first knot;
(iii) Its first and second derivatives are continuous;
(iv) Its first derivative is zero at the first knot; and
(v) It has K – 2 parameters to be estimated.
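For readers who wish to experiment outside the provided package, a minimal MATLAB sketch of a basis evaluation of the form (3) follows; the helper name cnsbasis is ours and is not part of the HCNS package:

% Minimal sketch (hypothetical helper): evaluate the K-2 basis functions of (3)
% at the times in the column vector t, for a sorted knot vector tau of length K.
% Returns a numel(t)-by-(K-2) design matrix, one column per basis function.
function W = cnsbasis(t, tau)
    K  = numel(tau);
    p3 = @(u) max(u,0).^3;              % truncated cubic (x)_+^3
    W  = zeros(numel(t), K-2);
    for j = 1:K-2
        W(:,j) = p3(t - tau(j)) ...
            - p3(t - tau(K-1)).*(tau(K)-tau(j))./(tau(K)-tau(K-1)) ...
            + p3(t - tau(K)).*(tau(K-1)-tau(j))./(tau(K)-tau(K-1));
    end
end

Each column of W is zero before its knot τj and linear beyond the last knot, which yields properties (i)-(v) above.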
Survival estimation based on the available data {Ti, Di} is obtained in two stages. In the first stage, the KM-based cumulative step hazard function is obtained. In the second stage, model (2) is fitted to the corners of the cumulative step hazard function. The steps of the KM estimator occur only at event times, that is, at {Ti : Di = 1}. However, the cumulative hazard function is monotone, and thus monotonicity restrictions must be imposed on the fitted spline.
Monotonicity constraints for a cubic polynomial, P, on an interval [τj, τj+1] with τj < τj+1 are given in [16]. Let Δj = (P(τj+1) − P(τj))/(τj+1 − τj), αj = P′(τj)/Δj, and bj = P′(τj+1)/Δj, and consider the region M = M1 ∪ M2, where M1 is the square defined by 0 ≤ α ≤ 3 and 0 ≤ b ≤ 3, and M2 is the region bounded by the ellipse φ(α, b) = (α − 1)² + (α − 1)(b − 1) + (b − 1)² − 3(α + b − 2) = 0. The ellipse φ(α, b) is tangent to the square at the points (0, 3) and (3, 0). The cubic polynomial P is ensured to be monotone on [τj, τj+1] if and only if the point (αj, bj) lies within region M; see Fig. 1 for a visualization. Outside M, P is nonmonotone on [τj, τj+1]. Under the further restriction that 0 ≤ min(P′(τj), P′(τj+1)), the polynomial is non-decreasing.
Fig. 1. Monotonicity region M of a cubic polynomial. M corresponds to the union M = M1 ∪ M2, where M1 is the square defined by 0 ≤ a ≤ 3 and 0 ≤ b ≤ 3, and M2 is the region bounded by the ellipse φ(a, b) = (a − 1)² + (a − 1)(b − 1) + (b − 1)² − 3(a + b − 2) = 0. The ellipse is tangent to the square at the points (0, 3) and (3, 0).
Note that region M is a nonlinear region and if one attempts to fit model (2) under the implied conditions of monotonicity based on region M, then computational problems may occur. If one considers a linearly defined subregion, A, of M, then monotonicity would be achieved but other candidate models would be excluded. On the other hand, a linearly defined region has the merit of reducing the problem to be a restricted least squares one, with linear restrictions on the parameters. Thus, convergence is guaranteed since the function to minimize is always convex. We explore a linear approximation A of the entire region of monotonicity M by using optimal (in terms of the inscribed area) polygons within M. Smith [17] presented an algorithm for deriving the optimal inscribed polygon within an ellipse. Using Smith’s algorithm, Bantis et al. [12] showed that any optimal inscribed (8k + 2)-gon, k = 1, 2, … within M can be exactly calculated. The MATLAB code given in the Appendix provides the optimal inscribed polygon along with the corresponding plot for region M. The user is only expected to provide the value of k in the first line.
In [12], we explored the use of an optimal 18-gon to approximate region M (k = 2). In Fig. 2, we show the approximations obtained from the optimal 10-gon (k = 1), 18-gon (k = 2), 26-gon (k = 3), and 34-gon (k = 4); these figures can be reproduced by the code given in the Appendix.
Fig. 2. The shaded region is the linear approximation of the monotonicity region for k = 1, 2, 3, 4. The circles refer to the approximation of region M, and the dots refer to the approximation of the ellipse φ(a, b).
Based on simulations, [12] recommended the use of six knots, due to the restrictions that force the model to be zero before the first knot. For the same reason, one may choose to place more knots near the early event times (i.e., for smaller Ti's), where additional flexibility is required to avoid underestimating the cumulative hazard function in that region. Regarding the knot placement, in this paper we consider a more sophisticated strategy: we derive 10 equally spaced points that span from min(event times) = min(Ti : Di = 1) up to max(event times) = max(Ti : Di = 1); each of these points is a candidate knot location. Using 6 knots, there are 10!/(6!4!) = 210 possible combinations, and thus 210 possible knot schemes. Next, we consider 10 points at the following percentiles, calculated only from the fully observed data: the 0th, 2.5th, 5th, 10th, 20th, 40th, 50th, 60th, 80th, and 100th percentiles. Exploring again all possible combinations yields 210 additional knot schemes. In a given application, and if requested by the user as we show in the next section, all 420 knot schemes are tested by fitting model (2) to the KM-based cumulative hazard function. Finally, the knot scheme that results in the smallest distance to the corners of the step function is chosen. That is, the knot scheme selection is based on the criterion
$$\sum_{i:\,D_i=1} \left\{ \hat{H}(T_i) - \hat{H}_{KM}(T_i) \right\}^2 \qquad (4)$$

where $\hat{H}$ is the fitted model defined in (2) under the appropriate constraints of monotonicity, and $\hat{H}_{KM}$ is the KM-based cumulative hazard estimator.
This may seem like a difficult task from the point of view of computing time. However, with current computer technology, this procedure requires only a matter of seconds. In effect, resampling methods are feasible for statistical inference on a given data set, as illustrated in our simulation study later on.
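The following MATLAB sketch illustrates the search over the 420 candidate schemes; hcnsfit is a hypothetical stand-in for a constrained fit of model (2) that returns criterion (4), and evt, xcen, and status follow the coding of the examples in Section 3 (0 = event):

% Sketch of the knot-scheme search (hypothetical helper hcnsfit).
evt   = xcen(status==0);                                  % observed event times
grid1 = linspace(min(evt), max(evt), 10);                 % equally spaced candidates
grid2 = prctile(evt, [0 2.5 5 10 20 40 50 60 80 100]);    % percentile candidates
best  = Inf;
for g = {grid1, grid2}
    schemes = nchoosek(g{1}, 6);       % 210 six-knot schemes per candidate grid
    for r = 1:size(schemes,1)
        d = hcnsfit(schemes(r,:));     % hypothetical: fit (2), return criterion (4)
        if d < best, best = d; knots = schemes(r,:); end
    end
end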
2.1. Constraints and optimization
We aim to minimize the function

$$\sum_{i:\,D_i=1} \left\{ H(T_i) - \hat{H}_{KM}(T_i) \right\}^2 \qquad (5)$$

where H is model (2), under the linear constraints of monotonicity. Denote by Q the number of linear segments that form the approximation of the entire region of monotonicity M, excluding those that lie on the horizontal and vertical axes (for example, Q = 16 for the optimal 18-gon). There are Q(K − 1) + K constraints, consisting of Q(K − 1) + K − 1 inequalities and one equality, given in [12]. Alternatively, one can consider Q(K − 1) + K + 1 inequality constraints, since the equality can be written as two inequalities, so that finally all constraints can be written in the form

$$A\boldsymbol{\beta} \leq \mathbf{0},$$

where β is the vector of spline coefficients and A is the constraint matrix implied by the polygonal approximation.
Thus, the problem stated is a restricted least squares one with linear restrictions. The function to minimize is always convex and convergence is guaranteed. A MATLAB built-in function for these kinds of optimization problems is lsqlin.
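A minimal sketch of such a fit follows, assuming W is the spline design matrix evaluated at the event times, y holds the corresponding corners of the KM-based cumulative hazard, and A is the constraint matrix (all three names are ours):

% Restricted least squares via lsqlin: minimize ||W*beta - y||^2 s.t. A*beta <= 0.
% W = cnsbasis(tevents, knots);   % e.g., the design matrix sketched in Section 2
% y = corners of the KM-based cumulative hazard at the event times
bhat = lsqlin(W, y, A, zeros(size(A,1),1));   % convex problem; convergence guaranteed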
2.2. Accommodating covariates
The generalization of the approach to accommodate covariates is done under the assumption of proportional hazards. The usual Cox model can be fitted to the data in order to obtain the baseline cumulative step hazard estimate. Then, model (2) is fitted to this crude estimate under the same constraints discussed above. Thus, the two-stage analysis required in a setting with p covariates Z1, Z2, …, Zp, which can be more compactly denoted as a matrix Z, is as follows:
Stage 1: Fit the Cox model of the form $h(t \mid Z) = h_0(t)\exp(\boldsymbol{\gamma}'Z)$ and derive the baseline cumulative step hazard estimate $\hat{H}_0(t)$.
Stage 2: Fit model (2) to the corners of $\hat{H}_0(t)$ (that is, where the events occur) under the constraints $A\boldsymbol{\beta} \leq \mathbf{0}$ and derive the corresponding smooth estimate of the baseline cumulative hazard.
Once model (2) is fitted, one can easily derive the survival estimate for any subject profile based on $\hat{S}(t \mid Z) = \exp\{-\hat{H}_0(t)\exp(\hat{\boldsymbol{\gamma}}'Z)\}$, where $\hat{H}_0$ is the fitted HCNS baseline cumulative hazard and $\hat{\boldsymbol{\gamma}}$ are simply the estimates provided by the usual Cox model fit. In order to examine the proportional hazards assumption, we introduce the HCNS Cox-Snell-type residuals. We define them as

$$r_i = \hat{H}_0(T_i)\exp(\hat{\boldsymbol{\gamma}}'Z_i) \qquad (6)$$

where $\hat{H}_0$ is the HCNS baseline cumulative hazard estimate. A graphical check of the proportional hazards assumption can be obtained by plotting the HCNS cumulative hazard estimate of the residuals ri versus the ri themselves. If the proportional hazards assumption is valid, this plot should produce a curve close to the reference diagonal line with slope 1 through the origin.
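A sketch of this graphical check, reusing the HCNS routine of Section 3 on the residuals (H0 and gcoxhat are assumed to be the function handle and Cox coefficients returned by a previous HCNS call, with 0 coding an event):

% Sketch of the graphical PH check based on the residuals in (6).
r = H0(xcen).*exp(Z*gcoxhat);                            % HCNS Cox-Snell-type residuals
[~, Hr] = HCNS(r, status, 0, [], 'auto', 2, [], 'none'); % cum. hazard of the residuals
gr = linspace(0, max(r), 100);
plot(gr, Hr(gr), 'k', gr, gr, 'k--')                     % compare to the 45-degree line
xlabel('r_i'); ylabel('H(r_i)')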
The HCNS package provides all the functions needed to apply the described methodology. A cell-by-cell illustration, along with the data used in our application, is provided in the file examples. The structure of the main routine of the HCNS can be summarized by the flowchart presented in Fig. 3.
Fig. 3. Flowchart of the underlying algorithm of the HCNS approach.
3. Software description
The routine has been developed using MATLAB R2018a and will be maintained at the first author's website www.leobantis.net. The functions included in the package are HCNS, HCNSboots, HCNSsup, HCNScox, HCNScoxsup, conlsqlin, cnsk, and approxM. We also provide cell-by-cell illustrations and examples in the file examples, along with the actual data (file applicationlarynx) that are analyzed in the Application section. Apart from the functions HCNS and HCNSboots, all others are interior functions that are called depending on the choices of the user. Next, we provide a description of these two functions, which are the ones of interest to the user. The inputs of the HCNS function are the following (listed in the order in which they are required from the user):
INPUT
time: an array that contains the time variable that is right censored.
status: a boolean array taking the value 0 if the corresponding element of time is an event time and 1 if it is a censoring time. (Note that MATLAB uses this coding in all its "survival"-related functions, which is the opposite of the common coding used in survival settings. We developed the code using the MATLAB coding.)
defineevent: Define the event (must be a scalar value equal to zero or one).
Z: a covariate matrix (each column corresponds to one covariate). If there are no covariates available then use “[]” instead.
knots: The knots provided by the user. There is also the option of setting this field to "auto", in which case 6 knots are used after checking all 420 combinations of knot schemes described in Section 2.
k: A positive integer greater than or equal to 1. Based on the value of k, the optimal, in terms of inscribed area, (8k + 2)-gon will be used for approximating the region of monotonicity.
options: As defined in the built-in MATLAB routine coxphfit. Refers only to the Cox model fitting procedure and deals with the maximum number of iterations as well as the convergence tolerance. See the coxphfit documentation.
OPTIONAL INPUT:
plots: Can be set to "cumulative hazard", "survivor", "cdf", or "none" to plot the corresponding function along with the corresponding empirical function. In the case where covariates (Z) are available, the corresponding baseline functions are plotted (i.e., at Z = 0 for all covariates). The "none" option does not create any graph and allows the user to proceed to the next optional input arguments.
profil: May contain a specific profile of covariate values; the estimates produced will then refer to that specific profile. It is an array with length equal to the number of columns of the covariate matrix Z. (The "plots" input argument must be given if the "profil" argument is to be used.)
profilplots: Can be set to "cumulative hazard", "survivor", or "cdf" to plot the corresponding function along with the corresponding empirical function for the specific profile given in "profil".
OUTPUT
bhat: the estimated spline coefficients.
Hx: a "function handle" that yields the value of the cumulative hazard estimate based on the presented method, for any value(s) of x.
Fx: a "function handle" that yields the value of the cumulative distribution estimate based on the presented method, for any value(s) of x.
Sx: a "function handle" that yields the value of the survival estimate based on the presented method, for any value(s) of x.
knots: The knots used. If the knots were provided by the user then these knots are simply returned. If the knots were set to auto then the selected knot scheme is returned.
KMdist: The sum of squared distances of the spline model from the corners of the step cumulative hazard function (that is, the quantity presented in (5)).
gcoxhat: the estimated Cox coefficients if covariates are available. If there are no covariates, then gcoxhat is set to NaN (i.e., "not a number").
stats: provides standard errors, Z statistics, and p-values, as well as the covariance matrix of the estimated Cox coefficients (from the built-in MATLAB function coxphfit).
Next, we provide some examples that can be straightforwardly reproduced by the reader in MATLAB so as to clarify the input/output arguments:
Example 1. Generate some data (n = 300) from the Weibull distribution with parameters 2 and 3 and then apply the presented method:
n=300;
x=wblrnd(2,3,n,1);   % Generate some data from the Weib(2,3)
c=wblrnd(2,3,n,1);   % Generate the censoring variable from the Weib(2,3)
xcen=min(x,c);       % Derive the censored data (expected censoring = 50%)
status=(x>c);        % Derive the censoring indicator
% The data for analysis are the variables xcen and status.
% The presented approach is carried out by the following line:
[bhat Hx Fx Sx KMdist knots]=HCNS(xcen,status,0,[],'auto',2,[],'survivor')
An optimal 18-gon is used to approximate the monotonicity region (since k is set to 2). The array bhat contains the spline coefficient estimates, and gcoxhat is returned as NaN since no covariates are available. The knot placement procedure is set to "auto"; thus 6 knots are used and 420 knot placement schemes are tested. The knots returned for this specific generated data set are knots=[0.2591 0.5761 0.8931 1.2101 2.4781 2.7951] (as can also be seen in Fig. 4), and the sum of the squared distances from the corners of the KM estimator is KMdist=0.1114. Alternatively, the user could manually provide the desired knots. The plot of the survival estimate is requested by the user, and the plot generated by the above code is given in Fig. 4. The survival estimate is plotted up to the last event time; however, the user can also plot the presented estimates beyond the last event time. The "function handles" Hx, Fx, and Sx can be used to evaluate the proposed estimates at any time value. Of course, caution is needed when extrapolating the curves. For example, to evaluate the estimates at 0.5, 1, 1.5, and 2 we request:
Fig. 4. Survival estimate of the presented method as generated by the code in Example 1.
Hx([0.5 1 1.5 2])
which yields 0.0087 0.1276 0.3855 1.1723. Similarly, we derive:
Fx([0.5 1 1.5 2])
which yields 0.0087 0.1198 0.3199 0.6903 and
Sx([0.5 1 1.5 2])
which yields 0.9913 0.8802 0.6801 0.3097.
Example 2. In this example we will generate some time values from a Cox model and then use the presented routine for estimation:
n=300;
u=rand(n,1);         % Generate n numbers from the Uniform(0,1)
a=2;b=3;g=2;         % Parameters of the Weibull baseline survival, and Cox coef g=2
z=exprnd(0.3,n,1);   % Generate an exponentially distributed covariate with mean 0.3
x=(-log(u)./(a.^(-b).*exp(g.*z))).^(1./b);  % Generate values from the Cox model
c=exprnd(4,n,1);     % Generate the censoring variable
status=(x>c);        % Derive the censoring indicator
xcen=min(x,c);       % Derive the censored time values
% The data for analysis now are the variables xcen, status, and z (the covariate).
% We apply the presented approach by using only the following line:
[bhat Hx Fx Sx KMdist knots gcoxhat stats]=HCNS(xcen,status,0,[z],'auto',2,[],'survivor')
The interpretation of the output is similar to that provided in Example 1. Here we derive a value of gcoxhat=2.041, which is the estimated coefficient of the covariate based on the usual Cox model. The plot requested now is the survival estimate for the baseline survival, that is, for Z = 0, and is presented in Fig. 5.
Fig. 5. Survival estimate of the baseline survival (Z=0) as generated by the code in Example 2.
We base inference on the percentile bootstrap technique. HCNSboots can be used to obtain 95% confidence intervals for the cumulative hazard, survival, or cumulative distribution functions. The input/output arguments of the HCNSboots are the following:
INPUT:
time: As defined in the HCNS routine.
status: As defined in the HCNS routine.
defineevent: Define the event (must be a scalar value equal to zero or one).
Z: As defined in the HCNS routine.
knots: As defined in the HCNS routine. The auto option is still available. If chosen, then all knot combinations will be explored for each bootstrap sample.
kgon: As defined in the HCNS routine.
options: As defined in the built in MATLAB routine coxphfit. Refers only to the Cox model fitting procedure, dealing with maximum number of iterations as well as the toleration of convergence. See the coxphfit documentation.
CIat: Time values at which the 95% confidence intervals are to be obtained.
boots: The number of the bootstrap samples.
clevel: The level of significance, for example 0.05 to obtain 95% confidence intervals.
discard: A positive number declaring the number of standard deviations based on which the intervals gcoxhat ± discard*SE(gcoxhat) will be built. Setting this to "[]" is equivalent to "Inf", that is, infinity. If, for a bootstrap sample, one or more of the Cox coefficients lie outside the corresponding interval(s), then that bootstrap sample is discarded and another one is drawn instead. This safeguard is needed because, in a bootstrap sample, all data might turn out to be censored, or, for two categories of one covariate, the last event of one group may occur before the first event of the other. In such cases one might end up trying to maximize a monotone likelihood when fitting the Cox model (see also [18]).
OPTIONAL INPUT:
profil: As defined in the HCNS routine. If given, the confidence intervals will be derived for the selected covariate profile. Obviously, the input argument Z must also be given if this optional input argument is to be used.
OUTPUT:
CIH: A two-column matrix that contains the derived 95% bootstrap-based confidence intervals for the cumulative hazard. Its left column contains the lower confidence limits and its right column the upper ones. These confidence intervals refer to the specific profile profil, if provided.
CIS: Confidence intervals for the survival function.
CIF: Confidence intervals for the cumulative distribution function.
countb: How many bootstrap samples were discarded.
gcoxboots: A matrix with the number of rows defined by "boots" and the number of columns equal to the number of covariates, that is, the number of columns of Z. It contains the bootstrap estimates of the underlying Cox model coefficients for each bootstrap sample.
In a traditional KM survival plot, one can see the time points at which events and censored observations occur. In our plots, we choose to mark the knots on the x-axis, as the knots are a crucial part of our developed approach. A practitioner could, if necessary, simply plot over our survival curves (on the time x-axis) the event times as dots of one color and the censored data points in a different color.
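For instance, continuing Example 1 of Section 3 (where 0 codes an event), such an overlay could be added as follows:

% Sketch: overlay event and censoring times on an existing HCNS survival plot.
hold on
te = xcen(status==0);                 % event times (0 = event in Example 1)
tc = xcen(status==1);                 % censoring times
plot(te, Sx(te), 'b.', tc, Sx(tc), 'r+')
hold off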
4. Simulation study
We present a simulation study that involves three underlying distributions for data generation: a Weibull distribution whose parameters were chosen to match those of a Weibull fitted to the larynx data of the Application section (Weib(7.5,1.015)); a Gamma (Gam(2,0.2)); and a mixture of two Weibulls that exhibits bimodality (0.5Weib(3,3)+0.5Weib(7,5)). We considered levels of random censoring of 30%, 50%, and 70% and sample sizes of 50, 100, and 300. The HCNS outperforms the KM with impressive differences throughout, especially close to the tail of the survival curve. We also provide a plot illustrating the coverage of the bootstrap-based confidence intervals for the HCNS method compared to those obtained from the Greenwood formula for the traditional Kaplan-Meier estimator (see Fig. 6). We explore the coverage for a setting where the true density is a Weibull distribution and investigate levels of censoring of 30%, 50%, and 70%. We observe that the aim of 95% coverage is attained for the proposed method, while the Greenwood-based CIs for the KM dramatically fail as the level of censoring increases near the tail of the distribution. For this setting, we extrapolate the KM estimator beyond the last event time at the value attained at the last event time. If one instead considers the KM to be 0 beyond the last event time, the results for the coverage of Greenwood's formula are even worse (not presented here for brevity).
Fig. 6. Coverage of the bootstrap-based HCNS confidence intervals versus the Greenwood-based confidence intervals for the traditional Kaplan-Meier. The coverage is explored at the 20th, 30th, …, 90th, and 95th percentiles. The data are generated from a Weibull(2,3) distribution with a sample size of 100 and censoring levels of 30%, 50%, and 70%. The targeted coverage is 95%.
The criterion against which we compare our results to the KM estimator is the mean integrated squared error (MISE). Since the KM estimator cannot provide estimation beyond the last event time, we take the KM estimate of the survival beyond the last event time to be equal to its value at tmax = max(event times). The values we report correspond to the integrated squared error from the 10th to the 90th, the 20th to the 80th, and the 30th to the 70th percentiles of the true survival functions. That is, we obtain ISE(10–90), ISE(20–80), and ISE(30–70) for 1000 repetitions and then average them to obtain the corresponding MISEs: MISE(10–90), MISE(20–80), and MISE(30–70). Finally, we compare the two methods based on the relative MISEs, i.e., rMISE(10–90) = MISE_KM(10–90)/MISE_HCNS(10–90); rMISE(20–80) and rMISE(30–70) are defined similarly. Namely, if the rMISE equals a > 1, then the KM exhibits an a-times-higher mean integrated squared error on that interval compared to the proposed method. We used the knot placement set to auto and k = 2; this means that an optimal 18-gon is used to approximate the region of monotonicity in each iteration. All relevant results are presented in Table 1. We observe that in all cases the presented approach outperforms the KM estimator. The largest differences appear for heavier censoring and/or smaller sample sizes, and the results that refer to the tail of the true underlying distribution reveal a dramatic outperformance of the KM. Results are analogous for the coverage performance of our method against the Greenwood-based CIs, as discussed above. Regarding the form of the polygon used to approximate the monotonicity region between two consecutive knots, we note that while the execution time is a matter of seconds for a given data set, it will increase significantly if we set k to high (integer) values. From a computational time point of view, we recommend using k = 2 or 3.
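For concreteness, a single Monte Carlo evaluation of the integrated squared error over the 10th to 90th percentile range could be sketched as follows, assuming a Weib(2,3) truth (as in Fig. 6) and the survival function handle Sx of Example 1:

% One Monte Carlo evaluation of ISE(10-90) for an assumed Weib(2,3) truth.
Strue = @(t) 1 - wblcdf(t,2,3);           % true survival function
q = wblinv([0.10 0.90],2,3);              % integration limits (10th/90th percentiles)
ise = integral(@(t) (Sx(t) - Strue(t)).^2, q(1), q(2));  % Sx: HCNS estimate handle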
Table 1.
Comparison of the KM and the HCNS approaches when the auto option is employed for knot specification. The rMISE10–90, rMISE20–80, rMISE30–70, and rMISE80–95 are considered for evaluation. These measures express how many times higher the mean integrated squared error of the KM is compared to that of the HCNS.
| Distribution | n | Censoring | 10–90 | 20–80 | 30–70 | 80–95 |
|---|---|---|---|---|---|---|
| Weibull | 50 | 30% | 1.3710 | 1.3668 | 1.5621 | 4.8686 |
| | | 50% | 2.1452 | 1.6310 | 1.7463 | 23.6453 |
| | | 70% | 7.7892 | 4.9934 | 4.0690 | 237.1140 |
| | 100 | 30% | 1.3508 | 1.4457 | 1.6580 | 4.4965 |
| | | 50% | 1.9527 | 1.6172 | 1.7440 | 15.8917 |
| | | 70% | 6.5821 | 3.5472 | 2.7664 | 100.3554 |
| | 300 | 30% | 1.4542 | 1.5497 | 1.8479 | 4.9877 |
| | | 50% | 1.7717 | 1.6728 | 2.0006 | 13.8555 |
| | | 70% | 6.6449 | 3.2972 | 2.5913 | 342.8861 |
| Gamma | 50 | 30% | 1.2587 | 1.3127 | 1.6120 | 5.7215 |
| | | 50% | 1.7024 | 1.5140 | 1.8172 | 11.4866 |
| | | 70% | 3.0868 | 2.2381 | 2.0758 | 60.1494 |
| | 100 | 30% | 1.1876 | 1.3016 | 1.5568 | 3.3873 |
| | | 50% | 1.5627 | 1.5493 | 2.0891 | 16.2365 |
| | | 70% | 2.5110 | 1.7635 | 1.8652 | 25.7414 |
| | 300 | 30% | 1.1915 | 1.3006 | 1.6398 | 5.3034 |
| | | 50% | 1.3498 | 1.4631 | 1.7888 | 4.5218 |
| | | 70% | 1.6875 | 1.6147 | 1.8513 | 39.4018 |
| Mixture | 50 | 30% | 1.2056 | 1.3380 | 1.5858 | 2.5047 |
| | | 50% | 1.2665 | 1.3631 | 1.7085 | 3.2311 |
| | | 70% | 1.6704 | 1.6946 | 1.8244 | 6.2275 |
| | 100 | 30% | 1.2077 | 1.3660 | 1.6163 | 2.4081 |
| | | 50% | 1.2271 | 1.3430 | 1.5578 | 2.7998 |
| | | 70% | 1.4012 | 1.4093 | 1.6835 | 4.1724 |
| | 300 | 30% | 1.1806 | 1.2540 | 1.3935 | 2.8066 |
| | | 50% | 1.2199 | 1.3505 | 1.4690 | 2.4241 |
| | | 70% | 1.3249 | 1.4295 | 1.7128 | 3.4491 |
We note that as long as there are some event times, one can always derive the KM estimate. Generally, a spline technique requires some fully observed data between consecutive pairs of knots. Regarding our approach, we point out that it converges even if there are no data between two consecutive knots. The reason lies in the family of restrictions that we impose. As opposed to maximum-likelihood-based splines, which are prone to convergence problems (as also illustrated in [12]), we operate in two stages: first, we derive the KM, and then we smooth out (under a family of restrictions) the KM stepwise function using restricted least squares. The nature of the optimization problem involved in our approach is far more robust for smaller samples compared to conventional spline approaches. A comparison of the convergence rates is also discussed in [12] and clearly illustrates our advantage. In addition, as a referee requested, we ran an additional simulation for the Weibull distribution with a sample size of 25 and 50% censoring. We used 1000 Monte Carlo iterations and had no convergence issues. Moreover, the obtained rMISE10–90, rMISE20–80, rMISE30–70, and rMISE80–95 are 3.2919, 2.6168, 2.6325, and 27.3536, respectively. This indicates that our method outperforms the KM even for such a small sample size.
5. Application
We now apply the proposed routine based on the HCNS approach to a real data set with one covariate. The data set, which is publicly available, was presented by Kardaun [19] and consists of 90 male patients with cancer of the larynx who were treated at a Dutch hospital during the period 1970–1978. The data contain the time to the event (death) or censoring, the age of the patient, and the stage of the disease. There are four stages, ordered from the least serious to the most serious (stage 1 is the least serious). The usual Cox analysis of this data set was presented in [20]. This data set was analyzed with the proposed method in [12], with a much simpler knot placement rule than the one provided by the auto option that we use here. In the file applicationlarynx, we present the necessary code to derive survival estimates for each stage at the mean age (64.6111 years) and the corresponding plots (Figs. 7–9). Derivation of the pointwise CIs for the HCNS survival curve is also provided in the same file.
Fig. 7. Survival estimates for the four stages of larynx cancer at the mean age (64.6111 years). The solid lines refer to the HCNS approach, and the step functions refer to the corresponding Cox estimates. For the HCNS approach, the auto procedure was used for the knot placement.
Fig. 9.
A graphical check of the proportional hazards assumption for the laryngeal cancer data based on the HCNS residuals. We observe that the HCNS estimate fitted to the residuals is close to the reference diagonal line, indicating that the assumption is valid. Top left: Covariates used are the cancer stage and patient age. Top right: Corresponding scatterplot of the HCNS residuals versus the Cox-Snell-type residuals of a regular Cox model. Bottom left: Covariates used are the cancer stage and patient age as well as the interaction of stage 2 and age. Bottom right: Corresponding scatterplot of the HCNS residuals versus the Cox-Snell-type residuals of a regular Cox model.
From the generated Fig. 7, we observe that at the mean age (64.6111 years), the survival curve of stage 1 provides higher survival probabilities. As we move to stage 4, we observe that the survival curve yields lower survival probabilities, which is what we expect.
In the previous example, we estimated the baseline survival times and used the Cox model formulation to obtain estimates for the desired profile. If, for example, we were interested in directly estimating the survival of an individual of age 64.6111 years, diagnosed with the most serious cancer stage, i.e., S(t∣age = 64.6111, stage = 4) for t = 1, 2, … , 6, along with the corresponding 95% CI for the survival estimate, based on 300 bootstrap samples, we could simply type the following input:
% Derive the desired estimates for the specific profile [0 0 1 mean(age)]:
[bhat Hx Fx Sx]=HCNS(time, status, 0, Z, 'auto', 2, [], 'none', [0 0 1 mean(age)]);
Shat=Sx(1:6)   % obtained survival estimate values for t=1,2,...,6
Shat =
    0.9266    0.8535    0.7945    0.7446    0.6885    0.6094
% Now derive the corresponding 95% CIs for these estimates:
[CIH CIS CIF]=HCNSboots(time, status, 0, Z, 'auto', 2, [], [1:6], 300, 0.05, [], [0 0 1 mean(age)])
CIS =
    1.0000    0.8861    0.9898
    2.0000    0.7912    0.9304
    3.0000    0.7224    0.8835
    4.0000    0.6409    0.8508
    5.0000    0.5572    0.8324
    6.0000    0.4739    0.7702
The leftmost column of CIS contains the times at which the CIs are computed. The second and third columns (left to right) are the respective lower and upper confidence limits. In the outputs CIH and CIF, the corresponding CIs for the respective cumulative hazard and cumulative distribution functions are also provided. Note that in this example we used the auto procedure for the knot placement. Hence, in each bootstrap sample, 420 minimizations are employed. In effect, even though the procedure is feasible, it can be time-consuming. It took about 20 minutes to derive the CIs mentioned above. In the case where the knots are supplied by the user, this time is significantly reduced: we ran HCNSboots with the knots set equal to the ones derived by the HCNS estimate under the auto option, and the time needed to perform all calculations (again using 300 bootstrap samples) was about 8 seconds. To obtain pointwise CIs over the whole range of the survival curve, one can simply define a fine grid of points instead of [1:6]. An analogous result is visualized in Fig. 8 for the mean age and cancer stage 1.
Fig. 8. Survival estimate for patients with stage 1 laryngeal cancer, along with pointwise 95% confidence intervals.
Below we present the code for reproduction of Fig. 7.
data=[…] % A matrix that contains the data. Each column is one variable.
% 1st column contains the cancer stage variable,
% 2nd:time variable, 3rd:age, 4th:status
stage=data(:,1);time=data(:,2); %derive variables from the data matrix.
age=data(:,3);status=data(:,4);
% In the original data set code 0 is for censoring and 1 for death.
% MATLAB needs the opposite coding for the Cox model (in the HCNS the user can define the event):
status=status.*(-1)+1;
Z=[(stage==2) (stage==3) (stage==4) age]; % build the covariate matrix
% Now the data are ready for analysis
% Fit Cox model to derive baseline S0 (for Z=0):
[bcox logL H stats] = coxphfit(Z,time, 'censoring', status, 'baseline',0);
S0cox=exp(-H(:,2)); % this is the Cox baseline survival.
% Apply the HCNS routine to estimate baseline functions:
[bhat H0 F0 S0 KMdist knots gcoxhat]=HCNS(time, status, 0, Z, 'auto', 2);
g2=gcoxhat(1);g3=gcoxhat(2);g4=gcoxhat(3); gage=gcoxhat(4); % These are the cox coefs.
gr=0:0.01:8.5; % construct a grid of points over which the survival estimates will be plotted.
figure
hold on
% Plot the HCNS survival estimates for each stage:
plot(gr,S0(gr).^(exp(gage.*mean(age))),'k','LineWidth',2)
plot(gr,S0(gr).^(exp(gage.*mean(age)+g2)),'k','LineWidth',2)
plot(gr,S0(gr).^(exp(gage.*mean(age)+g3)),'k','LineWidth',2)
plot(gr,S0(gr).^(exp(gage.*mean(age)+g4)),'k','LineWidth',2)
% Plot the Cox survival estimates for each stage:
stairs(H(:,1),S0cox.^(exp(gage.*mean(age))),'k')
stairs(H(:,1),S0cox.^(exp(gage.*mean(age)+g2)),'k')
stairs(H(:,1),S0cox.^(exp(gage.*mean(age)+g3)),'k')
stairs(H(:,1),S0cox.^(exp(gage.*mean(age)+g4)),'k')
xlabel('t');ylabel('S(t)')
6. Discussion
The KM and Cox models enjoy broad acceptance in clinical practice when it comes to studies that evaluate a survival time endpoint. However, there is a gap between the statistical and clinical fields when it comes to the employment of more modern and efficient methods. In this paper, we present improved spline-based survival estimates using convex optimization. We do not maximize a likelihood at any stage to smooth out a KM- or Cox-based survival estimate. Furthermore, we provide a robust knot selection scheme. We also provide a user-friendly, fully automated routine that needs no initial values or input for the locations of the knots. The notion we follow is to estimate a survival curve through the cumulative hazard function based on natural cubic splines that are forced to be monotone. Our approach uses a linear approximation of the region of monotonicity that reduces the problem to a restricted least squares one, with linear restrictions on the parameters. In effect, the function to minimize is always convex, and the numerical stability of the code is ensured (in MATLAB, this kind of optimization problem is addressed by the function lsqlin, which is also the function used by our proposed routine). This is not the case for other spline-based methods that attempt to maximize a likelihood in the presence of censoring. Our simulation studies show that our proposed technique outperforms the KM estimator in all cases considered, which involved 36 different scenarios. The coverage of the corresponding CIs that can be derived using the percentile bootstrap is satisfactory and yields much better results than the ones based on the Greenwood formula (or based on the approach taken in [21] in the case of covariates) near the right tail of the distribution, where few data are available (see [12]). The corresponding CIs based on the Greenwood formula for the KM estimator might completely collapse, by construction, when interest lies in the tail of the survival curve. This is not the case for the HCNS approach.
Our approach can also accommodate covariates through the usual Cox model. For knot placement, an automatic procedure with 6 knots is available as the default. The user can explore other knot placement schemes and different numbers of knots, but caution is needed since the model is zero before the first knot; more knots may be required at early time values. Note that the placement of the first knot at the first event time provides a natural imitation of the KM estimator. Apart from that, the HCNS approach is optimized based on its distance from the broadly used KM estimator (or the Cox model baseline estimate in the case of covariates); thus its robustness is not compromised and there is no concern of overfitting. This is because overfitting can only be defined (by construction) as over-capturing the KM estimator; hence, a limiting case of severe overfitting (say, by using a large number of knots) would result in the KM estimator itself. This is not the case for other spline-based approaches, in which an excessive number of knots will actually result in overfitting, by providing a noisy estimate and/or computational problems that may or may not depend on the optimization's initial values, which in our case are not needed at all. The code for our proposed approach will be maintained and updated through the first author's personal website.
Acknowledgement
The research of Dr. Leonidas E. Bantis is supported in part by the COBRE grant (NIH) P20GM130423 and by the NIH Clinical and Translational Science Award (UL1TR002366) to the University of Kansas.
Appendix: Illustration of the approximation of the monotonicity region
We note that what follows is not required in order for a practitioner to employ our routines. The following code provides, for a given k = 1, 2, … , the optimal in terms of inscribed area (8k + 2)-gon for approximating the region of monotonicity M. (If k is not set to be a positive integer then an error will appear.)
k=3; % This can be set to any positive integer for finer approximations
xc=2;yc=2; % center of the ellipse
theta=-pi/4; % tilt of the major axis
a=2.44949; % major semi-axis of the ellipse
b=sqrt(2); % minor semi-axis of the ellipse
N=12*k; % Points to approximate the ellipse
N=N+1;
% Apply Smith's algorithm to approximate the ellipse:
df=2*pi/(N-1);
CT=cos(theta);
ST=sin(theta);
CDP=cos(df);SDP=sin(df);CNDP=1;SNDP=0;
x=zeros(1,N);y=x; %preallocation
for n=1:N
xp=a*CNDP;
yp=b*SNDP;
x(n)=xc+xp*CT-yp*ST;
y(n)=yc+xp*ST+yp*CT;
TEMP=CNDP*CDP-SNDP*SDP;
SNDP=SNDP*CDP+CNDP*SDP;
CNDP=TEMP;
end
% Plot the approximation of the ellipse:
plot(x,y,'k');hold on
axis square;axis([0 4 0 4]);
% Rearrange the data so that the first point is (x,y)=(3,0):
M=[x' y'];M(max(size(M)),:)=[];M=circshift(M,k);
x=M(:,1);y=M(:,2);
%Plot the approximation of the region of monotonicity:
plot(x(1:(8*k+1)),y(1:(8*k+1)),'ok')
plot([3 0 3 0],[3 3 0 0],'ok')
xlabel('a');ylabel('b')
%Shade the approximated region:
fill([x(1:(8*k+1));0],[y(1:(8*k+1));0],[0.7,0.7,0.7]);alpha(0.5)
hold off
%Finally the points of approximation of the monotonicity region are:
xpoints=[x(1:(8*k+1));0];ypoints=[y(1:(8*k+1));0];
Declaration of Competing Interest
The authors have no conflict of interest to report.
References
- [1] Kaplan EL, Meier P, Nonparametric estimation from incomplete observations, J. Am. Stat. Assoc. 53 (1958) 457–481.
- [2] Cox DR, Regression models and life-tables, J. R. Stat. Soc. Ser. B 34 (1972) 187–220.
- [3] Cox C, Chu H, Schneider MF, Muñoz A, Parametric survival analysis and taxonomy of hazard functions for the generalized gamma distribution, Stat. Med. 26 (23) (2007) 4352–4374.
- [4] de Boor C, A Practical Guide to Splines, Springer-Verlag, New York, 2001.
- [5] Harrell FE, Regression Modeling Strategies (With Applications to Linear Models, Logistic Regression, and Survival Analysis), Springer Series in Statistics, 2001.
- [6] Kooperberg C, Stone CJ, Logspline density estimation for censored data, J. Comput. Graph. Stat. 1 (1992) 301–328.
- [7] Herndon JE, Harrell FE Jr., The restricted cubic spline as baseline hazard in the proportional hazards model with step function time-dependent covariables, Stat. Med. 14 (19) (1995) 2119–2129.
- [8] Wand MP, Jones MC, Kernel Smoothing, Monographs on Statistics and Applied Probability, Chapman and Hall, London, 1995.
- [9] Lloyd CJ, Yong Z, Kernel estimators of the ROC curve are better than empirical, Stat. Probab. Lett. 44 (3) (1999) 221–228.
- [10] Bagaria B, Sood S, Sharma R, Lalwani S, Comparative study of CEA and CA19-9 in esophageal, gastric and colon cancers individually and in combination (ROC curve analysis), Cancer Biol. Med. 10 (3) (2013).
- [11] Pan W, Smooth estimation of the survival function for interval censored data, Stat. Med. 19 (2000) 2611–2624.
- [12] Bantis LE, Tsimikas JV, Georgiou SD, Survival estimation through the cumulative hazard with constrained natural cubic splines, Lifetime Data Anal. (2013), doi:10.1007/s10985-012-9218-4.
- [13] Yuan C, Clish CB, Wu C, Mayers JR, Kraft P, Townsend MK, Zhang M, Tworoger SS, Bao Y, Qian ZR, Rubinson DA, Ng K, Giovannucci EL, Ogino S, Stampfer MJ, Gaziano JM, Ma J, Sesso HD, Anderson GL, Cochrane BB, Manson JE, Torrence ME, Kimmelman AC, Amundadottir LT, Vander Heiden MG, Fuchs CS, Wolpin BM, Circulating metabolites and survival among patients with pancreatic cancer, J. Natl. Cancer Inst. 108 (6) (2016) 1–8.
- [14] Solomon BM, Rabe KG, Slager SL, Brewer JD, Cerhan JR, Shanafelt TD, Overall and cancer-specific survival of patients with breast, colon, kidney, and lung cancers with and without chronic lymphocytic leukemia: a SEER population-based study, J. Clin. Oncol. 31 (2013) 930–937.
- [15] Scher HI, Fizazi K, Saad F, Taplin M-E, Sternberg CN, Miller K, de Wit R, Mulders P, Chi KN, Shore ND, Armstrong AJ, Flaig TW, Fléchon A, Mainwaring P, Fleming M, Hainsworth JD, Hirmand M, Selby B, Seely L, de Bono JS, Increased survival with enzalutamide in prostate cancer after chemotherapy, N. Engl. J. Med. 367 (2012) 1187–1197.
- [16] Fritsch FN, Carlson RE, Monotone piecewise cubic interpolation, SIAM J. Numer. Anal. 17 (1980) 238–246.
- [17] Smith LB, Drawing ellipses, hyperbolas or parabolas with a fixed number of points and maximum inscribed area, Comput. J. 14 (1971) 81–86.
- [18] Loughin TM, On the bootstrap and monotone likelihood in the Cox proportional hazards regression model, Lifetime Data Anal. 4 (1998) 393–403.
- [19] Kardaun O, Statistical analysis of male larynx-cancer patients – a case study, Stat. Neerlandica 37 (1983) 103–126.
- [20] Klein JP, Moeschberger ML, Survival Analysis: Techniques for Censored and Truncated Data, Springer-Verlag, 2003.
- [21] Link CL, Confidence intervals for the survival function using Cox's proportional hazards model with covariates, Biometrics 40 (1984) 601–610.