A semi-parametric generalization of the Cox proportional hazards regression model: Inference and Applications

Karthik Devarajan; Nader Ebrahimi

doi:10.1016/j.csda.2010.06.010

. Author manuscript; available in PMC: 2012 Jan 1.

Published in final edited form as: Comput Stat Data Anal. 2011 Jan 1;55(1):667–676. doi: 10.1016/j.csda.2010.06.010

A semi-parametric generalization of the Cox proportional hazards regression model: Inference and Applications

Karthik Devarajan ^a,^*, Nader Ebrahimi ^b

PMCID: PMC2976538 NIHMSID: NIHMS216277 PMID: 21076652

Abstract

The assumption of proportional hazards (PH) fundamental to the Cox PH model sometimes may not hold in practice. In this paper, we propose a generalization of the Cox PH model in terms of the cumulative hazard function taking a form similar to the Cox PH model, with the extension that the baseline cumulative hazard function is raised to a power function. Our model allows for interaction between covariates and the baseline hazard and it also includes, for the two sample problem, the case of two Weibull distributions and two extreme value distributions differing in both scale and shape parameters. The partial likelihood approach can not be applied here to estimate the model parameters. We use the full likelihood approach via a cubic B-spline approximation for the baseline hazard to estimate the model parameters. A semi-automatic procedure for knot selection based on Akaike’s Information Criterion is developed. We illustrate the applicability of our approach using real-life data.

Keywords: censored survival data analysis, crossing hazards, Frailty model, maximum likelihood, regression, spline function, Akaike information criterion, Weibull distribution, extreme value distribution

1 Introduction

The modeling and analysis of data in which the principal endpoint is the time until an event occurs is often of prime interest in medical and engineering studies. Typically, such an event is the onset of a disease or death itself as seen in clinical trials or failure of an item or a system as seen in industrial life testing. The time to an event is normally referred to as survival or failure time.

The primary goal in analyzing censored survival data is to assess the dependence of survival time on covariates. The secondary goal is the estimation of the underlying distribution of survival time. The Cox Proportional Hazards (PH) model (Cox, 1972) is a standard tool for exploring the association of covariates with survival time. An interesting feature of this model is that it is semi-parametric in the sense that it can be factored into a parametric part consisting of a regression parameter vector associated with the covariates and a non-parametric part that can be left completely unspecified.

In the Cox PH model, given a vector of possibly time-dependent covariates z, the hazard function at time t is assumed to be of the form

λ (t ∣ z) = λ_{0} (t) g (z) .

(1.1)

where λ₀(t) is the baseline hazard function, denoting the hazard under no covariate effect and g(z) is a non-negative function of the covariate vector z, referred to as the risk function, such that g(0) = 1. The most commonly used form of the Cox PH model is

λ (t ∣ z) = λ_{0} (t) e^{β^{'} z}

(1.2)

where β = (β₁,…,β_p)′ is a p vector of regression coefficients. The focus is on inference for β, with the baseline hazard function, λ₀(t), the non-parametric part, left completely unspecified.

In spite of its semi-parametric feature, the Cox PH model implicitly assumes that the hazard and survival curves corresponding to two different values of the covariates do not cross. Although this assumption may be valid in many experimental settings, it has been found to be suspect in others. For example, if the treatment effect decreases with time, then one might expect the hazard curves corresponding to the treatment and control groups to converge. Other examples that indicate the presence of non-proportional hazards are also given in Gore et al. (1984), and Tonak et al. (1979), among others.

In this paper, we describe a semi-parametric generalization of the Cox PH model which allows crossing of hazards as well as survival functions. In Section 2, we discuss its unique properties and place it within the context of censored survival data analysis. In Section 3, we describe an estimation procedure for this model using cubic B-spline approximations for the baseline hazard. We illustrate our method with real-life examples in Section 4, and in Section 5 we provide some concluding remarks.

2 A Semi-Parametric Generalization of the Cox PH Model

We describe a semi-parametric generalization of the Cox PH model in which the hazard functions corresponding to different values of the covariates can cross. The special case of this model was originally introduced by Quantin et al. (1996) for the purpose of goodness of fit testing of the Cox PH model. Devarajan (2000) outlined the unique properties of this non-proportional hazards regression model as well as inference for this model using maximum penalized likelihood estimation, and provided a theoretical justification for using spline approximations for the baseline hazard. In addition, Devarajan and Ebrahimi (2002, p.237) used this model and developed a goodness of fit procedure for testing the Cox PH model. In independent work, Hsieh (2001) and Wu & Hsieh (2009) discussed an estimating equations approach for this model by approximating the baseline hazard using piecewise constants.

In our model, the survival function corresponding to a covariate vector z is assumed to be of the form

S (t ∣ z) = g (z) {S_{0} (t)}^{h (z)},

(2.1)

where S₀(t) is an arbitrary baseline survival function, and g(z) and h(z) are nonnegative functions of the covariate vector z such that g(0) = h(0) = 1. This model includes, for the two sample problem, the case of two Weibull and two extreme value distributions differing in both scale and shape parameters. The Cox PH model is obtained as a special case of model (2.1) by setting h(z) = 1. In this paper, we will consider only the exponential function for g and h, i.e., g(z) = exp(β′z) and h(z) = exp(γ′z) where β and γ are unknown p vectors of parameters. In terms of cumulative hazard functions, our non-proportional hazards regression model takes the specific form

Λ (t ∣ z) = e^{β^{'} z} {Λ_{0} (t)}^{\exp (γ^{'} z)} .

(2.2)

The Cox PH model is obtained as a special case of model (2.2) by setting γ = 0. The conditional survival function is

S (t ∣ z) = \exp {- e^{β^{'} z} {[- \log S_{0} (t)]}^{\exp (γ^{'} z)}}

(2.3)

and the conditional hazard function is

λ (t ∣ z) = λ_{0} (t) \exp [{(β + γ)}^{'} z + {e^{γ^{'} z} - 1} \log {Λ_{0} (t)}] .

(2.4)

Applying a complementary log(− log) transformation in (2.3), we get

\log {- \log {S (t ∣ z)}} = β^{'} z + \exp (γ^{'} z) \log {- \log {S_{0} (t)}}

The above equation can be expressed as

ψ {S (t ∣ z)} = \exp (γ^{'} z) \cdot h (t) + β^{'} z

(2.5)

where ψ(x) = log(− log(x)) and h(t) = log{− log{S₀(t)}}. This can be shown to be a member of the family of models described in Cheng et al. (1997) (see Devarajan, 2000, pp.45-48 for details). An equivalent version of (2.5) is

\exp (γ^{'} z) \cdot h (T) = - β^{'} z + ε

(2.6)

where the error ε has distribution function F = 1 − ψ⁻¹. Since the distribution of the baseline h(T) = log{− log(S₀(T)}} is unit extreme value, equation (2.6) results in a scale and shape transformation of this unit extreme value distribution. The generalized model (2.3) can be interpreted as a transformation of the unit extreme value distribution in terms of reparametrizations of the scale and shape parameters. Similarly, applying a log transformation in (2.3) (with ψ(x) = log(x) and h(t) = log{S₀(t)}) and using similar arguments as above, one can interpret our generalized model as a transformation of the unit exponential distribution in terms of reparametrizations of the scale and shape parameters.

2.1 Some Useful Features of the Generalization

We describe several useful features of our generalized model and highlight its relationship to other survival models. These properties provide the framework for a generalized approach to censored survival data analysis.

2.1.1 Crossing hazards over time

The hazards ratio corresponding to two different covariate vectors z₁ and z₂ is

\frac{λ (t ∣ z_{1})}{λ (t ∣ z_{2})} = \exp {{(β + γ)}^{'} (z_{1} - z_{2}) + [e^{γ^{'} z_{1}} - e^{γ^{'} z_{2}}] \log [Λ_{0} (t)]} .

(2.7)

Since this ratio is a monotone function of t, the model allows the hazards ratio to invert over time. In other words, it allows crossing of hazard curves. For example, when treatment effect decreases or increases over time, model (2.2) can be applied.

2.1.2 Relation to the Time-Dependent Coefficient Cox PH Model

If γ is assumed to be close to zero, we can approximate the right hand side of equation (2.2) as follows:

Λ (t ∣ z) = e^{β^{'} z} {Λ_{0} (t)}^{1 + γ^{'} z} .

Thus, for two different covariate vectors z₁ and z₂, we have,

\begin{array}{l} \log {\frac{Λ (t ∣ z_{1})}{Λ (t ∣ z_{2})}} & = & {[β + \log {Λ_{0} (t)} \cdot γ]}^{'} (z_{1} - z_{2}) \\ = & η^{'} (t) (z_{1} - z_{2}), \end{array}

(2.8)

where η(t) = β + g(t) · γ with g(t) = log{Λ₀(t)}. This is a special case of the Cox PH model with time-dependent coefficients η(t) which allows for crossing of hazard curves. When the deviation from proportional hazards is small, our proposed model approximates the Cox PH model with time dependent coeffcients.

In terms of the hazard functions,

\frac{λ (t ∣ z_{1})}{λ (t ∣ z_{2})} = \exp {β + h (t) \cdot γ}^{'} (z_{1} - z_{2}),

(2.9)

where h(t) = 1+log{Λ₀(t)}. Cox (1972) considered the case where h(t) = t, a dummy time-dependent variable for goodness of fit testing of the Cox PH model. Therneau and Grambsch (1994) have considered the model (2.9) with an assumed form for h(t).

2.1.3 Proportionality of the Hazard-Cumulative Hazard Ratios

Using equations (2.2) and (2.4), we see that,

\frac{λ (t ∣ z)}{Λ (t ∣ z)} = \frac{λ_{0} (t)}{Λ_{0} (t)} \exp (γ^{'} z) .

(2.10)

Let k(t∣z) denote the conditional growth rate of the logarithm of the cumulative hazard function. Then $k (t ∣ z) = \frac{d}{dt} \log Λ (t ∣ z) = \frac{λ (t ∣ z)}{Λ (t ∣ z)} .$ Using (2.10), we see that

k (t ∣ z) = k_{0} (t) \exp (γ^{'} z),

(2.11)

where $k_{0} (t) = \frac{λ_{0} (t)}{Λ_{0} (t)}$ is the growth rate of the logarithm of the baseline cumulative hazard function.

Equation (2.10) has a form similar to that of (1.2), the Cox PH model. The difference between the two is that (1.2) models the growth rate of the cumulative hazard function while (2.10) models the growth rate of the logarithm of the cumulative hazard function. Equation (2.10) implies that the ratio of the hazard function to the cumulative hazard function, given a covariate vector z, is proportional to the ratio of the hazard function to the cumulative hazard function at baseline. It is worth mentioning here that this property is related to the constancy of the ratio of the hazard function to the cumulative hazard function of the extreme value distribution. Note that if we set γ = 0 in (2.10), the ratios of hazard to cumulative hazard in (2.10) are equal and it reduces to the Cox PH model.

2.1.4 Proportionality of the Logarithm of Cumulative Hazards

For β = 0, model (2.2) reduces to

Λ (t ∣ z) = {Λ_{0} (t)}^{\exp (γ^{'} z)} .

(2.12)

Model (2.12) is similar to (2.2) and has all its features with the exception that it does not include the Cox PH model. By taking logarithm on both sides of (2.12), it is easy to see that the logarithms of the cumulative hazards are proportional.

2.1.5 Relation to the Frailty Model

In survival analysis, it has been found that differences between hazards due to covariate effects tend to converge as follow-up time elapses. Such an effect can be accounted for by postulating the existence of unobserved random effects or frailties with prognostic value. In terms of hazard functions, we can write

λ (t ∣ z, u) = λ_{0} (t) e^{β^{'} z} u,

(2.13)

where u is an unobserved frailty or random effect (see Hougaard (2000) pp.215-245 for more details). Assuming that u has a positive stable distribution with parameter θ, one can easily show that

Λ (t ∣ z) = {Λ_{0} (t)}^{\exp (θ)} e^{u θ β^{'} z} .

(2.14)

It is clear from (2.14) that if we define γ = (θ, 0, ⋯, 0), then our proposed model and (2.14) will be the same. When frailty interacts with treatment, we can re-write (2.14) as

λ (t ∣ z, u) = λ_{0} (t) e^{u β^{'} z} .

(2.15)

This also leads to crossing or diverging hazards (Cuzick & Trejo, 1992, p.65; Hastie & Tibshirani, 1993) and can be shown to be equivalent to (2.14), and hence to our proposed model.

2.1.6 Interaction between covariates and baseline hazard

We can rewrite the conditional survival function given by (2.3) as

\begin{array}{l} S (t ∣ z) & = & \exp {- e^{β^{'} z} {[- \log S_{0} (t)]}^{\exp (γ^{'} z)}} \\ = & g_{0} {β^{'} z + e^{γ^{'} z} h_{0} (t)} \end{array}

(2.16)

where g₀(x) = exp{−exp(x)} and h₀(t) = log Λ₀(t). The generalized model (2.3) allows for interaction between covariates and the baseline hazard.

2.2 Interpretation of the model

For the two-sample problem, the non-proportional hazards model (2.2) can be written as

Λ (t ∣ z = 1) = {Λ_{0} (t)}^{\exp (γ)} e^{β}

(2.17)

where Λ₀(t) = Λ(t∣z = 0) and Λ(t∣z = 1) are the conditional cumulative hazard functions for the control and treatment groups respectively. From (2.17), we see that as γ → 0, the model becomes proportional in hazards. As ∣γ∣ → ∞, the model deviates from the proportionality assumption, i.e., large values of ∣γ∣ may be indicative of diverging or converging hazards over time. One way to interpret the parameter γ in this model is to consider the following equation obtained from (2.17) using (2.10),

\frac{λ (t ∣ z = 1)}{Λ (t ∣ z = 1)} = \frac{λ_{0} (t)}{Λ_{0} (t)} \exp (γ),

(2.18)

which implies that

\log {\frac{\frac{λ (t ∣ z = 1)}{Λ (t ∣ z = 1)}}{\frac{λ_{0} (t)}{Λ_{0} (t)}}} = γ,

a result analogous to that obtained (as the logarithm of the relative risk) for the regression parameter β in the Cox PH model in the two-sample setting.

3 Estimation for the Non-Proportional Hazards Model

The observed data consist of independent observations on the triple (X, δ, z), where X is the minimum of a failure and censoring time pair (T, C), δ = I(T ≤ C) is the indicator of the event that a failure has been observed and z = (z₁, … , z_p)′ is a p vector of covariates. The random variables T and C denote the survival and censoring times respectively which are assumed to be independent.

The fundamental assumption of proportionality of hazards in the Cox PH model (1.2) requires that the hazards ratio remains constant over time. The partial likelihood approach of Cox (1972) is the standard inferential method for this model. The baseline hazard, λ₀(t), drops out of this partial likelihood and can be left completely unspecified (Cox, 1972). On the other hand, it is evident from equations (2.4) and (2.7) that the generalized model (2.2) allows crossing of hazard curves over time. Due to the non-constant hazards ratio, a factorization of the likelihood into a “partial” likelihood and a nuisance part is not possible. The baseline hazard must be specified in the likelihood and needs to be explicitly estimated. There are several methods available for estimation in this model. For example, Bordes & Breuils (2006) discussed sequential estimation for semi-parametric models with application to the PH model, while Wu & Hsieh (2009) adopted a piecewise constant approach to estimating the baseline hazard in model (2.2). One could use a set of parametric basis functions, such as splines, to approximate the baseline hazard and then proceed with maximum likelihood (ML) methods to estimate the parameters of the model. Splines have been used to model the baseline hazard in the Cox PH model by Durrelman & Simon (1989), Gray (1992) and Rosenberg (1995).

Models based on spline approximations are intermediate in structure between completely parametric models and non-parametric methods, and provide flexibility in incorporating a variety of shapes. Under certain regularity conditions, Devarajan (2000, pp.53-58) showed that the maximum penalized likelihood estimate of the baseline hazard and baseline cumulative hazard in model (2.2) are exponential splines with knots at the unique failure times. This provides us with a theoretical justification for using spline functions to model the baseline hazard and cumulative hazard functions. de Boor (2001, Chapter IX, p.87) describes in detail the computational properties of B-spline basis functions. The B-spline basis function has the properties of a probability density function and a linear combination of such functions is like a mixture of densities that allows for various distributions. In this paper, we use a set of cubic B-spline basis functions to model the baseline hazard.

3.1 Modeling the Baseline Hazard using B-Splines

We assume that the baseline hazard function, λ₀(t), and its first two derivatives are continuous and model it as a linear combination of cubic B-spline basis functions. This model consists of K + 1 segments determined by K knots t₀ < t₁ < t₂ < … < t_K < t_K+1 over the interval [t₀, t_K+1] subject to the constraints that adjacent functions and their first two derivatives agree at the knots. It requires total of 4(K + 1) parameters and K + 4 basis functions (Devarajan (2000, p.72); de Boor (2001)).

We also define six additional ‘slack’ knots (Rosenberg (1995)) given by t₋₃ = t₀ − 3, t₋₂ = t₀ − 2, t₋₁ = t₀ − 1, t_K+2 = t_K+1 + 1, t_K+3 = t_K+1 + 2 and t_K+4 = t _K+1 + 3 where t₀ = min(x_i), t_K+1 = max(x_i) and (x_i, δ_i), i = 1, … , n are right-censored observations. Using the result from Schumaker (1981, p.46), the cubic B-Spline basis functions are given by

B_{k} (t) = (t_{k + 4} - t_{k}) \sum_{l = k}^{k + 4} {\frac{{(t_{ℓ} - t)}_{+}^{3}}{\prod_{\begin{matrix} m = k \\ m \neq l \end{matrix}}^{k + 4} t_{m} - t_{l}}}

(3.1)

where x₊ = x if x ≥ 0 and zero otherwise. Using (2.2) and (2.4), the log-likelihood for right-censored observations (x_i, δ_i), i = 1, … , n where δ_i is the censoring indicator, is given by

ℓ = \sum_{i = 1}^{n} {δ_{i} [\log λ_{0} (x_{i}) + {(β + γ)}^{'} z_{i} + (e^{γ^{'} z_{i}} - 1) \cdot \log Λ_{0} (x_{i})] - {[Λ_{0} (x_{i}) - Λ_{0} (t_{0})]}^{\exp (γ^{'} z_{i})} e^{β^{'} z_{i}}} .

(3.2)

Using a linear combination of B-spline basis functions, the baseline hazard is given by $λ_{0} (t ∣ α) = \sum_{k = - 3}^{K} η_{k} B_{k} (t)$ with B_k(t) defined as in (3.1). With the number of knots fixed at K, the order of approximation attainable is O(K⁻⁴) (Devarajan (2000, p.73); de Boor (2001)).We take η_k = e^αk to ensure positivity of the hazard function λ₀(t∣α). The cumulative baseline hazard function is $Λ_{0} (t ∣ α) = \sum_{k = - 3}^{K} e^{α k} I_{k} (t)$ where

I_{k} (t) = \frac{(t_{k + 4} - t_{k})}{4} {\sum_{l = k}^{k + 4} \frac{{t_{l}}_{+}^{4}}{\prod_{\begin{matrix} m = k \\ m \neq l \end{matrix}}^{k + 4} (t_{m} - t_{l})}} - \frac{(t_{k + 4} - t_{k})}{4} {\sum_{l = k}^{k + 4} \frac{{(t_{l} - t)}_{+}^{4}}{\prod_{\begin{matrix} m = k \\ m \neq l \end{matrix}}^{k + 4} (t_{m} - t_{l})}}

is a quartic B-spline basis function. Since I_k(t) is monotone increasing and positive and e^αk is positive for each k, monotonicity and positivity of the cumulative hazard function is ensured. Now, the log-likelihood given by (3.2) becomes,

ℓ = \sum_{i = 1}^{n} δ_{i} {\log (\sum_{k = - 3}^{K} e^{α_{k}} B_{k} (x_{i})) + {(β + γ)}^{'} z_{i} + (e^{γ^{'} z_{i}} - 1) \cdot \log (\sum_{k = - 3}^{K} e^{α_{k}} I_{k} (x_{i}))} - {\sum_{i = 1}^{n} {\sum_{k = - 3}^{K} e^{α_{k}} [I_{k} (x_{i}) - I_{k} (t_{0})]}}^{\exp^{(γ^{'} z_{i})}} e^{β^{'} z_{i}} .

(3.3)

This equation is maximized with respect to the model parameters α, β and γ to obtain the maximum likelihood (ML) estimates. Using the second partial derivatives of the likelihood function with respect to the model parameters, we obtain the Fisher information matrix and hence the variance-covariance matrix of the parameter estimates. If n → ∞ at a rate faster than the rate at which K → ∞, then one can show that our proposed estimators are consistent estimators and are normally distributed (Devarajan, 2000, pp.73-74).

3.2 Implementation

ML estimation of model parameters was implemented using the Broyden-Fletch-Goldfarb-Shanno (BFGS) method for the unconstrained optimization of a nonlinear multivariate function in double precision. Routines from the IMSL Fortran Library (DU2ING and DU2POL) were used for maximization of the likelihood function with respect to model parameters α, β and γ to obtain the ML estimates. Multiple restarts of the optimization procedure were performed with various starting values to check the consistency and validity of the results.

3.3 Knot Selection

An integral part of the estimation procedure using any spline approximations is the choice of the number and location of the knots. In general, it is desirable to have the knots reasonably spread out with roughly equal number of data points between successive knots. Knot selection has been discussed by many authors in the context of hazard estimation. We develop a semi-automatic knot selection procedure that involves initial knot placement based on quantiles equally spaced in percentile ranks of the uncensored observations followed by stepwise knot deletion and final model selection based on Akaike’s Information Criterion (AIC). During the stepwise deletion process, a sequence of models indexed by ν is obtained such that the ν^th model has n_ν parameters. In each step, only the significant parameters are retained. The significance of any parameter associated with a covariate or a knot, say β, is determined by its Wald z-score. All non-significant knots are removed from the likelihood function at each stage as they do not provide a good approximation to the baseline hazard. If l_ν denotes the log-likelihood of the ν^th model, then the AIC for this model, with penalty parameter a = 2 is given by

AIC (a, ν, K) = - 2 ℓ_{ν} + a n_{ν} = - 2 ℓ (α, β, γ; K) + 2 n_{ν} .

The model ν corresponding to the minimum value of AIC(a = 2, ν,K) is chosen as the final model where n_ν is the number of significant parameters in the final model. This method provides a balance between computational complexity and model flexibility.

4 Illustration of our Methods

We illustrate estimation in the non-proportional hazards model (2.2) using real-life examples. All figures presented were created using the R statistical language and environment (R Development Core Team (2009), www.R-project.org).

4.1 Example 1: Leukemia Data

This dataset consists of times to remission (in weeks) of 42 leukemia patients (Kalbfleisch and Prentice, 1980, pp.205-206). Patients are randomized into two groups of 21 each - a control and a treated group. Group membership is the only covariate of interest and is specified by z where z = 0, 1 denote the treatment and control groups, respectively. Figure 1(a) presents the Kaplan-Meier survival curves for these groups.

(a): Kaplan-Meier Survival Curves (Leukemia Data)

(b) Fitted Survival Curves (Leukemia Data)

First, we modeled this data using the non-proportional hazards regression model (2.2) with B-spline approximations for the baseline hazard. Slack knots were chosen as described in Section 3.1. The remaining knots were chosen based on quantiles equally spaced in percentile ranks of the uncensored observations. This model consists of two parameters, β and γ. The knot selection procedure described in Section 3.3 was implemented for this data set. The model with three knots based on quantiles of uncensored observations provided the best fit using the AIC criterion. Table 4.1 presents a summary of the knot selection procedure where K denotes the number of knots. Results of the final model fit are provided in Table 4.2. The log-likelihood corresponding to this final model is -106.084. The parameter estimates corresponding to the 4 slack knots are α̂₋₃ = −19.93, α̂₋₂ = −4.15, α̂₋₁ = −3.97 and α̂₀ = −3:45; and those corresponding to the 3 knots based on quantiles are α̂₁ = −3.68, α̂₂ = −2.88 and α̂₃ = −16.92. The parameter γ provides information about the non-proportionality in the hazards and its estimate is close to zero. From the results presented in Table 4.2, we observe that this parameter is highly non-significant while the parameter β is highly significant at the 5% level. Figure 1(b) presents the fitted survival curves for the two groups based on model (3.1) computed using the parameter estimates specified above.

Table 4.4.

Mice Data - Knot Sequence: 202, 244, 300, 428 Quantiles of Uncensored Observations based on Final Model

Parameter	Estimate	Std. Error	Wald z-score	p-value

β	-0.414	0.222	-1.865	0.0622
γ	-0.348	0.144	-2.417	0.0157

Open in a new tab

Next, we fit the Cox PH model to this data. We obtained an estimate of 1.57 (with standard error 0.412) for the regression coefficient β and a highly significant p-value of 0.00014. Our results for the fit based on the non-proportional hazards model (2.2) corroborate those based on a fit of the Cox PH model. In addition, the fitted parametric survival curves presented in Figure 1(b) lend evidence to the same conclusion as their non-parametric counterparts shown in Figure 1(a). The Cox PH model thus provides a good fit for this example and the estimated treatment effect is indeed 1.57 under both models.

Example 2: Mice Data

As a general rule, we recommend fitting the generalized model (2.2) to a dataset since the Cox PH model is embedded within this model. If the assumption of proportionality is indeed true, we expect it to be reflected in the model fit as seen in this example. On the other hand, any evidence of crossing hazards will likely be indicated by the fit of the more general model. We illustrate this in our next example.

In this example, we analyze data from an animal experiment presented in Wei (1984) in which a control group of 22 male mice are subject to 300 rads of radiation and followed for cancer incidence. The treatment group consists of 29 mice placed in a germ-free environment. The purpose of the study was to determine the effect of a germ-free environment in developing thymic lymphoma. All 51 observations in this dataset were uncensored. Figure 2(a) presents the survival curves for the control and treatment groups. The survival curves cross at various points until about 250 days after which the treatment group has higher survival probability than the control group.

(a) Kaplan-Meier Survival Curves (Mice Data)

(b) Fitted Survival Curves (Mice Data)

First, we modeled this data using the non-proportional hazards regression model (2.2) with B-spline approximations for the baseline hazard. Slack knots were chosen as described in Section 3.1, and the remaining knots were chosen based on quantiles equally spaced in percentile ranks of the uncensored observations. As in the previous example, this model contains two parameters, β and γ. Once again, the knot selection procedure described in Section 3.3 was implemented for this data. The best model, chosen using the AIC criterion, consisted of four knots based on quantiles of uncensored observations. Table 4.3 presents a summary of the knot selection procedure for this data. Results of the final model fit are provided in Table 4.4. The log-likelihood corresponding to this final model is -301.369. The parameter estimates corresponding to the 4 slack knots are α̂₋₃ = −5.93, α̂₋₂ = −169.71, α̂₋₁ = −4.73 and α̂₀ = −4.18; and those corresponding to the 4 knots based on quantiles are α̂₁ = −34:21, α̂₂ = −3.20, α̂₃ = −12.79 and α̂₄ = −2.54.

From the results of the final model fit presented in Table 4.4, we observe that the parameter β is marginally significant at the 5% level while the parameter γ is highly significant. This indicates that the hazards for the two treatment groups are non-proportional and corroborates the graphical evidence from Figure 2(a). We computed the fitted survival curves for the two groups based on (3.1) using the estimates of the model parameters specified above. These parametric curves, displayed in Figure 2(b), cross in a similar fashion to that seen in their non-parametric counterparts shown in Figure 2(a).

For exploratory purposes, we also analyzed this data set using the Cox PH model. The regression coefficient in this model, β, was only marginally significant at the 5% level (p-value = 0.0732). This suggests a possible non-proportionality of the hazards corresponding to the treatment and placebo groups. Using (2.18), we obtain

\log {\frac{\frac{\hat{λ (t ∣ z} = 1)}{Λ (t ∣ z = 1)}}{\frac{λ_{0} (t)}{Λ_{0} (t)}}} = \hat{γ} = - 0.348

for this example and thus we expect

\frac{λ (t ∣ z = 1)}{Λ (t ∣ z = 1)} < \frac{λ_{0} (t)}{Λ_{0} (t)} .

This indicates an interaction between the baseline hazard and treatment. Note that proportionality of hazards amounts to equality of the hazard-cumulative hazard ratios in (2.18). Using (2.17), we also obtain

\frac{\log \hat{Λ} (t ∣ z = 1)}{\log {\hat{Λ}}_{0} (t)} = \exp (- 0.348) = 0.71 .

That is, mice in the treatment group have significantly better survival rate than the control group.

These examples suggest that a few knots based on equally spaced quantiles of uncensored observations provide a reasonably good approximation to the unknown baseline hazard. Also, a fit of this generalized model provides a method for testing the assumption of proportional hazards.

Table 4.1.

Summary of Knot Selection Procedure for Leukemia Data

K	β̂	γ̂	log-likelihood	AIC

10	1.890	0.099	-104.772	239.544
9	1.616	-0.076	-104.556	223.112
8	1.692	0.070	-104.927	231.854
7	1.719	0.058	-105.735	227.470
6	1.672	0.007	-105.683	223.366
5	1.578	-0.066	-105.522	223.044
4	1.564	-0.088	-105.909	223.818
3	1.573	-0.111	-106.084	222.168
2	1.549	-0.129	-106.106	222.212
1	1.538	-0.138	-106.120	222.240

Open in a new tab

Table 4.2.

Leukemia Data - Knot Sequence: 5,8,12.75 Quantiles of Uncensored Observations based on Final Model

Parameter	Estimate	Std. Error	Wald z-score	p-value

β	1.573	0.5044	3.1187	0.0018
γ	-0.111	0.3113	-0.3568	0.7213

Open in a new tab

Table 4.3.

Summary of Knot Selection Procedure for Mice Data

K	β̂	γ̂	log-likelihood	AIC

6	-0.453	-0.411	-300.516	625.032
5	-0.434	-0.393	-301.369	624.154
4	-0.414	-0.348	-301.369	622.738
3	-0.438	-0.270	-302.743	623.486
2	-0.364	-0.587	-304.204	624.408
1	-0.438	-0.312	-305.732	625.464

Open in a new tab

Acknowledgments

The authors would like to thank the Associate Editor and referee for providing valuable comments that helped improve the presentation in this paper. The work of the first author was supported in part by NIH grant P30 CA 06927 and an appropriation from the Commonwealth of Pennsylvania.

Footnotes

Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

References

1.Bordes L, Breuils C. Sequential Estimation for Semiparametric Models with Application to the Proportional Hazards Model. Journal of Statistical Planning and Inference. 2006;136:3735–3759. [Google Scholar]
2.de Boor C. A Practical Guide to Splines. Springer-Verlag; New York: 2001. [Google Scholar]
3.Cheng SC, Wei LJ, Ying Z. Predicting Survival Probabilities with Semiparametric Transformation Models. Journal of the American Statistical Association. 1997;92:227–235. [Google Scholar]
4.Cox DR. Regression Models and Life Tables (with Discussion) Journal of the Royal Statistical Society-Series B. 1972;34:187–202. [Google Scholar]
5.Cuzick J, Trejo B. Analysis of Trials with Treatment-Individual Interactions. In: Klein J, Goel P, editors. Survival Analysis: State of the Art. Kluwer, Dordrecht; The Netherlands: 1992. pp. 65–71. [Google Scholar]
6.Devarajan K. PhD Dissertation. Division of Statistics, Northern Illinois University; 2000. Inference for a Non-proportional Hazards Regression Model and Applications. [Google Scholar]
7.Devarajan K, Ebrahimi N. Goodness-of-Fit Testing for the Cox Proportional Hazards Model. In: Huber C, Balakrishnan N, Nikulin MS, Mesbah M, editors. Goodness-of-Fit Tests and Model Validity. Birkhauser; Boston: 2002. pp. 237–254. [Google Scholar]
8.Devarajan K, Ebrahimi N. Testing for Covariate Effect in the Cox Proportional Hazards Regression Model. Communications in Statistics - Theory & Methods. 2009;38(14):2333–2347. doi: 10.1080/03610920802536958. [DOI] [PMC free article] [PubMed] [Google Scholar]
9.Durrelman S, Simon R. Flexible Regression Models with Cubic Splines. Statistics in Medicine. 1989;8:551–561. doi: 10.1002/sim.4780080504. [DOI] [PubMed] [Google Scholar]
10.Gore SM, Pocock SJ, Kerr GR. Regression Models and Non-proportional Hazards in the Analysis of Breast Cancer Survival. Applied Statistics. 1984;33:176–195. [Google Scholar]
11.Gray RJ. Spline-based Tests in Survival Analysis. Biometrics. 1994;50:640–652. [PubMed] [Google Scholar]
12.Hastie T, Tibshirani RJ. Varying-Coefficient Models. Journal of the Royal Statistical Society-Series B. 1993;55:757–796. [Google Scholar]
13.Hougaard P. Analysis of Multivariate Survival Data. Springer-Verlag; New York: 2000. [Google Scholar]
14.Hsieh F. On Heteroscedastic Hazards Regression Models: Theory and Application. Journal of the Royal Statistical Society-Series B. 2001;63(1):63–80. [Google Scholar]
15.IMSL Fortran Math Library. Fortran Subroutines for Mathematical Applications. 1991;3(version 2) [Google Scholar]
16.Kalbfleisch JD, Prentice RL. The Statistical Analysis of Failure Time Data. John Wiley; New York: 1980. [Google Scholar]
17.Quantin C, Moureau T, Asselain B, Maccario J, Lellouch J. A Regression Model for Testing the Proportional Hazards Hypotheses. Biometrics. 1996;52:874–885. [PubMed] [Google Scholar]
18.R Development Core Team. R: A language and environment for statistical computing. R Foundation for Statistical Computing; Vienna, Austria: 2009. URL http://www.R-project.org. [Google Scholar]
19.Rosenberg PS. Hazard Function Estimation Using B-Splines. Biometrics. 1995;51:874–887. [PubMed] [Google Scholar]
20.Schumaker LL. Spline Functions Basic Theory. John Wiley & Sons; New York: 1981. [Google Scholar]
21.Therneau TR, Grambsch PM. Proportional Hazards Tests and Diagnostics Based on Weighted Residuals. Biometrika. 1994;81:515–526. [Google Scholar]
22.Tonak V, Hermanek P, Banz H, Groitl H. Cytotoxic and Hyperthermic Perfusion: A Preliminary Study. Cancer Treatment Review. 1979;6:135–141. doi: 10.1016/s0305-7372(79)80033-x. [DOI] [PubMed] [Google Scholar]
23.Wei LJ. Testing Goodness-of-Fit for Proportional Hazards Model with Censored Observations. Journal of the American Statistical Association. 1984;79:649–652. [Google Scholar]
24.Wu H-DI, Hsieh F. Heterogeneity and Varying Effect in Hazards Regression. Journal of Statistical Planning and Inference. 2009;139:4213–4222. [Google Scholar]

[R1] 1.Bordes L, Breuils C. Sequential Estimation for Semiparametric Models with Application to the Proportional Hazards Model. Journal of Statistical Planning and Inference. 2006;136:3735–3759. [Google Scholar]

[R2] 2.de Boor C. A Practical Guide to Splines. Springer-Verlag; New York: 2001. [Google Scholar]

[R3] 3.Cheng SC, Wei LJ, Ying Z. Predicting Survival Probabilities with Semiparametric Transformation Models. Journal of the American Statistical Association. 1997;92:227–235. [Google Scholar]

[R4] 4.Cox DR. Regression Models and Life Tables (with Discussion) Journal of the Royal Statistical Society-Series B. 1972;34:187–202. [Google Scholar]

[R5] 5.Cuzick J, Trejo B. Analysis of Trials with Treatment-Individual Interactions. In: Klein J, Goel P, editors. Survival Analysis: State of the Art. Kluwer, Dordrecht; The Netherlands: 1992. pp. 65–71. [Google Scholar]

[R6] 6.Devarajan K. PhD Dissertation. Division of Statistics, Northern Illinois University; 2000. Inference for a Non-proportional Hazards Regression Model and Applications. [Google Scholar]

[R7] 7.Devarajan K, Ebrahimi N. Goodness-of-Fit Testing for the Cox Proportional Hazards Model. In: Huber C, Balakrishnan N, Nikulin MS, Mesbah M, editors. Goodness-of-Fit Tests and Model Validity. Birkhauser; Boston: 2002. pp. 237–254. [Google Scholar]

[R8] 8.Devarajan K, Ebrahimi N. Testing for Covariate Effect in the Cox Proportional Hazards Regression Model. Communications in Statistics - Theory & Methods. 2009;38(14):2333–2347. doi: 10.1080/03610920802536958. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R9] 9.Durrelman S, Simon R. Flexible Regression Models with Cubic Splines. Statistics in Medicine. 1989;8:551–561. doi: 10.1002/sim.4780080504. [DOI] [PubMed] [Google Scholar]

[R10] 10.Gore SM, Pocock SJ, Kerr GR. Regression Models and Non-proportional Hazards in the Analysis of Breast Cancer Survival. Applied Statistics. 1984;33:176–195. [Google Scholar]

[R11] 11.Gray RJ. Spline-based Tests in Survival Analysis. Biometrics. 1994;50:640–652. [PubMed] [Google Scholar]

[R12] 12.Hastie T, Tibshirani RJ. Varying-Coefficient Models. Journal of the Royal Statistical Society-Series B. 1993;55:757–796. [Google Scholar]

[R13] 13.Hougaard P. Analysis of Multivariate Survival Data. Springer-Verlag; New York: 2000. [Google Scholar]

[R14] 14.Hsieh F. On Heteroscedastic Hazards Regression Models: Theory and Application. Journal of the Royal Statistical Society-Series B. 2001;63(1):63–80. [Google Scholar]

[R15] 15.IMSL Fortran Math Library. Fortran Subroutines for Mathematical Applications. 1991;3(version 2) [Google Scholar]

[R16] 16.Kalbfleisch JD, Prentice RL. The Statistical Analysis of Failure Time Data. John Wiley; New York: 1980. [Google Scholar]

[R17] 17.Quantin C, Moureau T, Asselain B, Maccario J, Lellouch J. A Regression Model for Testing the Proportional Hazards Hypotheses. Biometrics. 1996;52:874–885. [PubMed] [Google Scholar]

[R18] 18.R Development Core Team. R: A language and environment for statistical computing. R Foundation for Statistical Computing; Vienna, Austria: 2009. URL http://www.R-project.org. [Google Scholar]

[R19] 19.Rosenberg PS. Hazard Function Estimation Using B-Splines. Biometrics. 1995;51:874–887. [PubMed] [Google Scholar]

[R20] 20.Schumaker LL. Spline Functions Basic Theory. John Wiley & Sons; New York: 1981. [Google Scholar]

[R21] 21.Therneau TR, Grambsch PM. Proportional Hazards Tests and Diagnostics Based on Weighted Residuals. Biometrika. 1994;81:515–526. [Google Scholar]

[R22] 22.Tonak V, Hermanek P, Banz H, Groitl H. Cytotoxic and Hyperthermic Perfusion: A Preliminary Study. Cancer Treatment Review. 1979;6:135–141. doi: 10.1016/s0305-7372(79)80033-x. [DOI] [PubMed] [Google Scholar]

[R23] 23.Wei LJ. Testing Goodness-of-Fit for Proportional Hazards Model with Censored Observations. Journal of the American Statistical Association. 1984;79:649–652. [Google Scholar]

[R24] 24.Wu H-DI, Hsieh F. Heterogeneity and Varying Effect in Hazards Regression. Journal of Statistical Planning and Inference. 2009;139:4213–4222. [Google Scholar]

PERMALINK

A semi-parametric generalization of the Cox proportional hazards regression model: Inference and Applications

Karthik Devarajan

Nader Ebrahimi

Abstract

1 Introduction

2 A Semi-Parametric Generalization of the Cox PH Model