Author manuscript; available in PMC: 2013 Dec 1.
Published in final edited form as: Biometrics. 2012 Sep 24;68(4):1126–1135. doi: 10.1111/j.1541-0420.2012.01784.x

A General Class of Semiparametric Transformation Frailty Models for Non-proportional Hazards Survival Data

Sangbum Choi 1, Xuelin Huang 1
PMCID: PMC3530665  NIHMSID: NIHMS378765  PMID: 23005582

Summary

We propose semiparametrically efficient estimation of a broad class of transformation regression models for non-proportional hazards data. Classical transformation models are viewed from a frailty model paradigm, and the proposed method provides a unified approach that is valid for both continuous and discrete frailty models. The proposed models are shown to be flexible enough to model long-term follow-up survival data when the treatment effect diminishes over time, a case for which the proportional hazards or proportional odds assumption is violated, or a situation in which a substantial proportion of patients remains cured after treatment. Estimation of the link parameter in the frailty distribution, considered to be unknown and possibly dependent on time-independent covariates, is automatically included in the proposed methods. The observed information matrix is computed to evaluate the variances of all the parameter estimates. Our likelihood-based approach provides a natural way to construct simple statistics for testing the proportional hazards and proportional odds assumptions for usual survival data or testing the short-term and long-term effects for survival data with a cure fraction. Simulation studies demonstrate that the proposed inference procedures perform well in realistic settings. Applications to two medical studies are provided.

Keywords: Transformation models, Nonparametric likelihood, Counting process, Discrete frailty, Compound Poisson frailty, Cure fraction, Survival analysis

1. Introduction

In randomized clinical cancer trials, patients are often followed long after treatment has been terminated. The Cox proportional hazards model is frequently used to analyze data from such clinical studies, but the proportional hazards (PH) assumption can be violated in practice when a treatment effect disappears as time progresses during long follow-up. For example, consider the study of a malignant melanoma cancer data set, in which patients were treated surgically at the Plastic Surgery Unit in Odense, Denmark, from 1964 to 1973 and followed until 1978. Using the Cox model, Andersen et al. (1993, Examples VII.2.5, VII.3.1, and VII.3.4) found that the presence of ulceration and tumor thickness were the main prognostic factors for patient survival, but they also noted the lack of PH for these variables. Figure 1 illustrates the Kaplan-Meier (KM) estimates for patients with small, medium, and large tumor size. A noticeable feature of this data set is a considerable fraction of long-term survivors: survival probabilities at 10 years are 82.8%, 42.0%, and 59.8% for subgroups with small, medium, and large tumors, respectively, and no more events were observed thereafter. Furthermore, the KM curves of patients with medium and large tumors crossed at approximately 7 years after surgery. Such phenomena are common in long-term follow-up survival data, making the PH assumption inadequate.

Figure 1. Malignant melanoma cancer example. The Kaplan-Meier estimates for survival times for tumor size, classified as small (≤2 cm), medium (2–5 cm) and large (>5 cm).

The proportionality assumption can and should be challenged, but the basic model is so well known and widely used that it makes sense to ensure that the standard Cox model is nested within more flexible alternative semiparametric models. A natural approach in this regard is to consider a class of semiparametric transformation models, which specifies the survival function of the survival time T given the covariates Z as

$$S_Z(t) = P(T > t \mid Z) = g\left\{\int_0^t e^{\beta' Z(s)}\, d\Lambda(s)\right\}, \tag{1}$$

where β is a vector of unknown regression coefficients and Λ(·) is an increasing but unspecified nonnegative function. The continuous, strictly decreasing link function g(·) is given or specified up to a finite-dimensional parameter. Note that when $g(x) = e^{-x}$, model (1) reduces to the proportional hazards (PH) model, and when $g(x) = 1/(1 + x)$, it is the proportional odds (PO) model (Bennett, 1983).
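To make the contrast between the two nested links concrete, the following is a minimal sketch (not part of the original analysis) of the hazard ratio between Z = 1 and Z = 0 implied by model (1) with a single time-invariant binary covariate. The function name `hazard_ratio` and its arguments are hypothetical. Under the PO link the hazard is $e^{\beta Z}\lambda(t)/\{1 + \Lambda(t)e^{\beta Z}\}$, so the ratio starts at $e^{\beta}$ and shrinks toward 1 as Λ(t) grows, which is the diminishing treatment effect described in the text.

```python
import math

def hazard_ratio(cum_baseline, beta, link):
    """Hazard ratio (Z = 1 versus Z = 0) under model (1) for a
    time-invariant binary covariate, as a function of Lambda(t).

    cum_baseline: value of Lambda(t) at the time of interest
    link: "PH" for g(x) = exp(-x), "PO" for g(x) = 1 / (1 + x)
    """
    if link == "PH":
        # Hazard is e^{beta Z} * lambda(t): the ratio is constant in time.
        return math.exp(beta)
    if link == "PO":
        # Hazard is e^{beta Z} * lambda(t) / (1 + Lambda(t) e^{beta Z}).
        e_b = math.exp(beta)
        return e_b * (1.0 + cum_baseline) / (1.0 + cum_baseline * e_b)
    raise ValueError("link must be 'PH' or 'PO'")

# Illustrative values only: beta = 1 and a grid of Lambda(t) values.
for cum in (0.0, 1.0, 10.0, 100.0):
    print(cum, hazard_ratio(cum, 1.0, "PH"), hazard_ratio(cum, 1.0, "PO"))
```

The PH column stays at $e^{\beta}$ throughout, whereas the PO column decays toward 1.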

A fundamental element of model (1) for characterizing non-proportionality is the baseline link function g(·). Since g(·) is the survival function of an underlying distribution, it is natural to think of a family of parametric distributions whose link functions are indexed by an extra parameter α. This extension may be based on a frailty consideration: under the PH frailty model, a random variable or so-called frailty W > 0 is introduced to act multiplicatively on the hazard; thereby, the link function is the Laplace transform of W,

$$g_\alpha(x) = E_\alpha[e^{-Wx}] = \int_0^\infty e^{-wx}\, dF(w;\alpha). \tag{2}$$

Since the frailty W is usually assumed to be unobservable, a family of stable distributions such as gamma, inverse Gaussian, or positive stable needs to be postulated (Hougaard, 2000). This implies that the population under study is a mixture of individuals with different risks, with the unmeasured heterogeneity being described by the frailty variable. Model (2) is particularly useful for modeling long-term survival data, such as the melanoma data described above: on average, more frail individuals die early on, whereas less frail individuals survive longer.

In long-term survival studies, it is of interest to estimate the sizes of the “frail” and “less frail” groups and how the covariates influence each group over time. In this case, the frailty model we seek is a mixture of a positive probability at zero frailty and a continuous distribution at positive frailties. This idea, motivated by (2), can be characterized by the PH model with a discrete frailty N ≥ 0 in place of a continuous frailty W > 0. It defines the link function as follows:

$$g_\alpha(x) = E_\alpha[e^{-Nx}] = \sum_{n=0}^{\infty} e^{-nx}\, p(n;\alpha). \tag{3}$$

A discrete frailty N may account for unobservable clusters in the population, and zero frailty (N = 0) indicates no risk of the event. With P(N = 0) > 0, model (3) facilitates modeling survival data with a large proportion of long-term survivors or cured patients. In fact, model (3) includes many existing models for survival data with cure: a binary frailty assumption leads to the well-known two-mixture components model (Kuk and Chen, 1992; Sy and Taylor, 2000; Peng and Dear, 2000; Lu and Ying, 2004), and a Poisson frailty gives the promotion time cure rate model (Chen et al., 1999; Tsodikov et al., 2003; Zeng and Lin, 2006).

In this article, we propose a unified approach to the estimation of model (1), where the link function is induced by the frailty model (2) or (3). In the statistical literature, considerable but separate efforts have been made to develop statistical inferences about the semiparametric models (2) and (3). In other words, survival cure rate models induced from (3) have not been developed using the transformation model perspective. Inference procedures under model (2) for right-censored data have been proposed by Dabrowska and Doksum (1988), Cheng et al. (1995), and Chen et al. (2002). Murphy (1994, 1995) provided a theoretical basis for a univariate gamma frailty model, which was further extended to a general class of transformation models by Kosorok et al. (2004). More recently, Zeng and Lin (2006) and Chen (2009) provided efficient and reliable methods based on maximum likelihood estimation. In all these approaches, the value of the link parameter α is assumed to be known and fixed, and different effects under various α’s are assessed by sensitivity analysis. The problem of estimating an unknown link parameter under the gamma frailty model was tackled by Cai and Cheng (2004) and Zucker and Yang (2006), but their methods are asymptotically inefficient, and they are applicable only when α is scalar. By contrast, under model (3) it is more desirable to allow the link parameter to be unknown and empirically estimated, because it determines the proportion of cured patients in sampled data and can depend on additional covariates to evaluate their long-term treatment effects. Parameters in this model can be estimated by maximizing a Monte Carlo approximation of a marginal likelihood (Kuk and Chen, 1992), by solving estimating equations (Lu and Ying, 2004), or by maximizing the full likelihood using an EM-type algorithm (Peng and Dear, 2000; Sy and Taylor, 2000). These methods are typically applicable only to the two-mixture cure model. Furthermore, their computational complexity prevents them from being widely used.

We consider nonparametric maximum likelihood estimation (NPMLE) to make inferences about all the parameters in the postulated models. Our approach covers a broad class of univariate transformation models for non-proportional hazards data. Guided by martingale theory, our method can be easily implemented and yields asymptotically efficient estimators (Bickel et al., 1998) along with their variance estimates in closed form. In particular, a simple way to estimate the link parameter α is presented. It is known that the estimation of α can suffer from an identifiability problem when univariate data are considered (e.g. Zucker and Yang, 2006; Sy and Taylor, 2000), unless there are enough observed events for model (2) or both observed and censored events for model (3). We consider a systematic approach to estimate α as well as (β, Λ). Because the PH and PO models are nested, our likelihood-based approach provides a natural way to construct test statistics for testing the PH or PO assumption for usual survival data or testing short-term and long-term effects for survival data with a cure fraction.

In Section 2, several examples of transformation models are described in view of frailty models. In Section 3, the inference procedures are proposed. In Section 4, the results of simulation experiments are presented. In Section 5, two medical datasets are analyzed using the proposed methods. In Section 6, we give some concluding discussion.

2. Semiparametric Transformation Frailty Models

The proposed models are equivalent to the PH frailty regression, which defines the hazard function conditional on Z and frailty ϖ as

$$\lambda_{\varpi,Z}(t) = \varpi\, e^{\beta' Z(t)}\, \lambda(t), \tag{4}$$

where λ(t) = dΛ(t)/dt is an unspecified hazard and ϖ = W > 0 or ϖ = N ≥ 0. The population model is then completed by assuming a parametric distribution for ϖ, leading to marginal expression (1). Under this assumption, the link function gα equals the Laplace transform of the frailty. Additional link functions may be obtained by altering the frailty assumption (Hougaard, 2000).

2.1 Continuous frailty PH models

Examples of continuous frailty PH models (i.e., ϖ = W) are presented in Kosorok et al. (2004). A popular choice for frailty modeling is the gamma distribution with mean 1 and variance α, which leads to

$$g_\alpha(x) = (1 + \alpha x)^{-1/\alpha}, \qquad \alpha \ge 0. \tag{5}$$

The gamma frailty (GF) model has also been discussed previously by Harrington and Fleming (1982), leading to their family of weighted log-rank statistics, and is equivalent to the generalized proportional odds model (Dabrowska and Doksum, 1988) in the two-sample setting. This model includes the PH model (α = 0) and the PO model (α = 1). The inverse Gaussian frailty (IGF) model (Hougaard, 2000) specifies

$$g_\alpha(x) = \exp\left[-\{(1 + 2\alpha x)^{1/2} - 1\}/\alpha\right], \qquad \alpha \ge 0, \tag{6}$$

and the positive stable frailty model has $g_\alpha(x) = \exp(-x^{1-\alpha})$, $0 \le \alpha < 1$. The power transform in the positive stable distribution can be modified to obtain a class of Box-Cox (BC) transformations (Zeng and Lin, 2006)

$$g_\alpha(x) = \exp\left[-\{(1 + x)^{1-\alpha} - 1\}/(1 - \alpha)\right], \tag{7}$$

so that it also includes PH (α = 0) and PO (α = 1). Note that the BC class accommodates any real value of α. The log-normal frailty has
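The nesting of PH and PO within these families can be checked numerically. The sketch below codes the links (5), (6), and (7), with the boundary cases filled in by continuity as the text describes; the function names are hypothetical and chosen only for this illustration.

```python
import math

def g_gamma(x, alpha):
    """Gamma frailty link (5); alpha = 0 is taken by continuity as exp(-x)."""
    if alpha == 0:
        return math.exp(-x)
    return (1.0 + alpha * x) ** (-1.0 / alpha)

def g_inverse_gaussian(x, alpha):
    """Inverse Gaussian frailty link (6); alpha = 0 by continuity."""
    if alpha == 0:
        return math.exp(-x)
    return math.exp(-(math.sqrt(1.0 + 2.0 * alpha * x) - 1.0) / alpha)

def g_box_cox(x, alpha):
    """Box-Cox link (7); alpha = 1 is taken by continuity as 1 / (1 + x)."""
    if alpha == 1:
        return 1.0 / (1.0 + x)
    return math.exp(-((1.0 + x) ** (1.0 - alpha) - 1.0) / (1.0 - alpha))

x = 0.8                      # arbitrary evaluation point
ph = math.exp(-x)            # PH link
po = 1.0 / (1.0 + x)         # PO link
print(g_gamma(x, 0.0) - ph, g_gamma(x, 1.0) - po)      # both differences are 0
print(g_box_cox(x, 0.0) - ph, g_box_cox(x, 1.0) - po)  # both differences are 0
```

Evaluating `g_gamma` at a very small positive α gives values close to the PH link, consistent with the continuity definition at α = 0.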

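$$g_\alpha(x) = \int_{-\infty}^{\infty} \exp\left(-x\, e^{\alpha^{1/2}\upsilon - \alpha/2}\right) \phi(\upsilon)\, d\upsilon, \qquad \alpha \ge 0, \tag{8}$$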

where ϕ is the standard normal density. In addition to accounting for unmeasured heterogeneity across individuals, these models can be used to test the PH or PO assumption. For instance, deviation from the PH and PO assumptions can be assessed by testing H0 : α = 0 and H0 : α = 1 in model (7), respectively.

The link functions introduced above are defined at α = 0 by continuity (α = 1 for (7)). In the above models, excepting the positive stable frailty, var(ϖ) = α. Theoretically, the derivation of marginal distributions under the frailty models restricts the range of α to be nonnegative. However, the value of α can also be negative as long as model (1) is well defined (Kosorok et al., 2004), although it no longer has a frailty interpretation.

Remark 1: It is interesting to note that the family of power variance frailties (Hougaard, 2000; Aalen, 1992), given by

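$$g_\alpha(x) = \exp\left[\frac{\nu}{\alpha(1 - \nu)}\left\{1 - \left(1 + \frac{\alpha x}{\nu}\right)^{1-\nu}\right\}\right], \qquad \alpha \ge 0,\ \nu > 0, \tag{9}$$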

can generalize the GF, IGF, and BC models with E(ϖ) = 1 and var(ϖ) = α. Specifically, for ν = 1/2 model (9) is the IGF model, for ν = α it corresponds to BC, and as ν → 1 it gives GF. The positive stable frailty can also be induced from this model. For 0 < ν ≤ 1, (9) is a proper survival function, whereas for ν > 1, it corresponds to the compound Poisson frailty model (Aalen, 1992) in which P(ϖ = 0) is positive and thus accommodates zero frailties.
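The special cases listed in Remark 1 can be verified numerically. The following sketch evaluates (9) and compares it with the IGF, BC, and GF links at arbitrary test values; the function name `g_power_variance` is hypothetical, and the case ν = 1 is approached with a value of ν slightly below 1 rather than coded as a separate limit.

```python
import math

def g_power_variance(x, alpha, nu):
    """Power variance family link (9), for alpha > 0, nu > 0, nu != 1."""
    c = nu / (alpha * (1.0 - nu))
    return math.exp(c * (1.0 - (1.0 + alpha * x / nu) ** (1.0 - nu)))

x, alpha = 0.8, 1.5          # arbitrary evaluation point and link parameter

# nu = 1/2 reproduces the inverse Gaussian frailty link (6).
igf = math.exp(-(math.sqrt(1.0 + 2.0 * alpha * x) - 1.0) / alpha)
print(g_power_variance(x, alpha, 0.5) - igf)

# nu = alpha reproduces the Box-Cox link (7) (here with alpha = 0.4).
bc = math.exp(-((1.0 + x) ** 0.6 - 1.0) / 0.6)
print(g_power_variance(x, 0.4, 0.4) - bc)

# nu close to 1 approaches the gamma frailty link (5).
gf = (1.0 + alpha * x) ** (-1.0 / alpha)
print(g_power_variance(x, alpha, 1.0 - 1e-6) - gf)

# For nu > 1 the link is improper: as x grows it levels off at
# exp(-nu / (alpha * (nu - 1))), the mass at zero frailty.
print(g_power_variance(1e12, 1.0, 2.0), math.exp(-2.0))
```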

2.2 Discrete frailty PH models

As we noted, the class of discrete frailty models (i.e., ϖ = N) is suitable for analyzing survival data with a substantial proportion of cured patients. A binary frailty with P(N = 0) = 1/(1 + α) and P(N = 1) = α/(1 + α) leads to the two-mixture cure model (e.g., Kuk and Chen, 1992) with the link function

$$g_\alpha(x) = \frac{1}{1 + \alpha} + \frac{\alpha}{1 + \alpha}\, e^{-x}, \qquad \alpha > 0. \tag{10}$$

If N has a Poisson distribution with mean α, model (3) implies the promotion time cure model (e.g., Chen et al., 1999)

$$g_\alpha(x) = \exp\{-\alpha F_0(x)\}, \qquad \alpha > 0, \tag{11}$$

where $F_0(x) = 1 - e^{-x}$, and if N has a geometric distribution with mean α, we further have

$$g_\alpha(x) = 1/\{1 + \alpha F_0(x)\}, \qquad \alpha > 0. \tag{12}$$

We note that models (10) and (11) have been independently developed in the literature, but both can be represented within the discrete frailty framework as above. The link functions (10), (11), and (12) are improper in that gα(∞) = P(N = 0) > 0, which may represent the proportion of long-term cured patients, P(cured): it is a function of α only and is given by 1/(1 + α) for models (10) and (12) and by $e^{-\alpha}$ for model (11).
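As a numerical check of the discrete frailty representation, the sketch below evaluates the series in (3) directly for binary, Poisson, and geometric frailties and compares the result with the closed forms (10), (11), and (12). All function names are hypothetical, and the series is truncated at an arbitrary `n_max` that is ample for the moderate α used here.

```python
import math

def laplace_discrete(x, pmf, n_max=200):
    """Evaluate (3): the sum over n of exp(-n x) p(n), truncated at n_max."""
    return sum(math.exp(-n * x) * pmf(n) for n in range(n_max + 1))

def binary_pmf(alpha):
    # P(N = 0) = 1 / (1 + alpha), P(N = 1) = alpha / (1 + alpha)
    return lambda n: {0: 1.0 / (1.0 + alpha), 1: alpha / (1.0 + alpha)}.get(n, 0.0)

def poisson_pmf(alpha):
    # Poisson with mean alpha, computed on the log scale for stability
    return lambda n: math.exp(-alpha + n * math.log(alpha) - math.lgamma(n + 1))

def geometric_pmf(alpha):
    # Geometric on {0, 1, ...} with mean alpha, so P(N = 0) = 1 / (1 + alpha)
    p = 1.0 / (1.0 + alpha)
    return lambda n: p * (1.0 - p) ** n

x, alpha = 0.7, 2.0                       # arbitrary test values
F0 = 1.0 - math.exp(-x)
closed_forms = {
    "binary (10)": 1.0 / (1.0 + alpha) + alpha / (1.0 + alpha) * math.exp(-x),
    "Poisson (11)": math.exp(-alpha * F0),
    "geometric (12)": 1.0 / (1.0 + alpha * F0),
}
series = {
    "binary (10)": laplace_discrete(x, binary_pmf(alpha)),
    "Poisson (11)": laplace_discrete(x, poisson_pmf(alpha)),
    "geometric (12)": laplace_discrete(x, geometric_pmf(alpha)),
}
for name in closed_forms:
    print(name, closed_forms[name], series[name])
```

In each case the cure fraction is simply the probability mass at zero, for example `poisson_pmf(alpha)(0)` for model (11).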

When the cured proportion is heterogeneous with regard to some known prognostic factors, we can consider the link parameter α as a function of covariates, such that

$$\alpha(X) = \exp(\gamma_0 + \gamma_1' X) \tag{13}$$

for some observed time-independent covariates X. A nice aspect of introducing this parametrization is the ability to separate the short-term and long-term covariate effects: P(cured|X) is given by $1/(1 + e^{\gamma_0 + \gamma_1' X})$ with (10) and (12) and by $\exp(-e^{\gamma_0 + \gamma_1' X})$ with (11). Here, γ0 determines the cured proportion when X = 0 and γ1 measures the long-term effect of X. Note that models (11) and (12) are analogous, respectively, to the PH and PO models for X. The null hypotheses of no short-term effects and no long-term effects can be defined by β = 0 and γ1 = 0. In our likelihood-based setting, tests for these hypotheses can be easily formulated by score tests or likelihood-ratio (LR) tests.
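A small sketch of the cured proportion as a function of a scalar covariate under parametrization (13) follows. The function name `p_cured` and the string labels for the frailty type are hypothetical; the values γ0 = 1 and γ1 = −1 are those used in the second simulation experiment of Section 4 and reproduce the cured proportions reported there.

```python
import math

def p_cured(x, gamma0, gamma1, frailty):
    """P(cured | X = x) with alpha(X) = exp(gamma0 + gamma1 * x), scalar x."""
    alpha = math.exp(gamma0 + gamma1 * x)
    if frailty in ("binary", "geometric"):   # links (10) and (12)
        return 1.0 / (1.0 + alpha)
    if frailty == "poisson":                 # link (11)
        return math.exp(-alpha)
    raise ValueError("unknown frailty type")

for frailty in ("binary", "poisson"):
    print(frailty,
          round(p_cured(0.0, 1.0, -1.0, frailty), 3),   # control group, X = 0
          round(p_cured(1.0, 1.0, -1.0, frailty), 3))   # treatment group, X = 1
```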

3. Methods

3.1 Notations

Let T and C denote the failure time and censoring time. Let Z(·) denote a vector of possibly time-dependent covariates. Assume that T and C are independent conditional on Z and that Z has a bounded total variation. In the counting process notation, the counting process N*(t) records the number of events up to time t and Y*(t) = I(T ≥ t), where I(·) is the indicator function. Let Y(t) = Y*(t)I(C ≥ t) and N(t) = N*(t ∧ C), where a ∧ b = min(a, b). The set of parameters is to be estimated from a random sample of n subjects, {Ni(t), Yi(t), Zi(t) : i = 1, …, n; t ∈ [0, τ]}, where τ is a maximum follow-up time. We assume α to be scalar, but extension to the vector-valued parameter, as in the case of (13), is straightforward. In such cases, we assume that X is bounded. For ease of notation, define

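$$H_\alpha \equiv -\log g_\alpha, \qquad h_\alpha \equiv \partial H_\alpha/\partial x, \qquad \psi_\alpha \equiv \partial \log h_\alpha/\partial x, \qquad \varphi_\alpha \equiv \partial \log h_\alpha/\partial \alpha.$$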
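For the gamma frailty link (5) these four functions have simple closed forms, which the sketch below writes out and checks against central finite differences. This is an illustration under the definitions above, not code from the original analysis; the helper `central_diff` and the test point are arbitrary choices.

```python
import math

def H(x, a):
    """H_alpha = -log g_alpha for the gamma frailty link (5), a > 0."""
    return math.log1p(a * x) / a

def h(x, a):
    """h_alpha = dH_alpha/dx."""
    return 1.0 / (1.0 + a * x)

def psi(x, a):
    """psi_alpha = d log h_alpha / dx."""
    return -a / (1.0 + a * x)

def phi(x, a):
    """phi_alpha = d log h_alpha / d alpha."""
    return -x / (1.0 + a * x)

def central_diff(f, z, eps=1e-6):
    """Central finite-difference approximation to f'(z)."""
    return (f(z + eps) - f(z - eps)) / (2.0 * eps)

x, a = 0.7, 1.3   # arbitrary test point
print(h(x, a), central_diff(lambda z: H(z, a), x))
print(psi(x, a), central_diff(lambda z: math.log(h(z, a)), x))
print(phi(x, a), central_diff(lambda z: math.log(h(x, z)), a))
```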

3.2 Maximum likelihood estimation

3.2.1 Direct maximization

A crucial issue in constructing a nonparametric likelihood in semiparametric problems is identification of the likelihood (Murphy, 1994). We turn to the NPMLE approach, in which Λ is nonparametrically estimated by step functions with jumps at distinct failure times. We denote by dΛ such a jump size of Λ. Efficient estimation can be made by maximizing the likelihood indexed by η ≡ (α, β, {dΛ}). The semiparametric problem is thus converted to a parametric problem in which the number of jumps equals the number of event times and is of order n, depending on the amount of censoring. Accordingly, we formulate the log-likelihood on [0, τ] as

$$l(\eta) = \sum_{i=1}^n \left\{\int_0^\tau \log dH_i(t;\eta)\, dN_i(t) - \int_0^\tau Y_i(t)\, dH_i(t;\eta)\right\}, \tag{14}$$

where $dH_i(t;\eta) = h_i(t-;\eta)\, e^{\beta' Z_i(t)}\, d\Lambda(t)$, i = 1, …, n, and $h_i(t;\eta) = h_\alpha\{\xi_i(t;\beta,\Lambda)\}$ with $\xi_i(t;\beta,\Lambda) = \int_0^t e^{\beta' Z_i(s)}\, d\Lambda(s)$. Note that (14) is essentially a standard parametric log-likelihood function for parameters (α, β, Λ) subject to right censoring (Chen, 2009; Kalbfleisch and Prentice, 2002, Ch. 5.8). Under (14), the intensity function for Ni(t) is Yi(t)dHi(t; η), i.e.,

$$E[dN_i(t) \mid \mathcal{F}_{t-}] = Y_i(t)\, h_i(t-;\eta)\, e^{\beta' Z_i(t)}\, d\Lambda(t), \qquad i = 1, \ldots, n,$$

where ℱt is the filtration before t. Thus the counting process Ni(t) can be uniquely decomposed so that $M_i(t;\eta) = N_i(t) - \int_0^t Y_i(s)\, dH_i(s;\eta)$, i = 1, …, n, are orthogonal martingales adapted to ℱt at η = η0.
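The discretized log-likelihood (14) can be sketched in a few lines. The version below is an illustration only, not the authors' Matlab implementation: it assumes a single time-invariant scalar covariate, so that $\xi_i(t) = e^{\beta Z_i}\Lambda(t)$, and the function and argument names are hypothetical.

```python
import math

def log_likelihood(alpha, beta, jumps, event_times, obs, h_alpha):
    """Discretized log-likelihood (14), one time-invariant scalar covariate.

    event_times: sorted distinct failure times t_1 < ... < t_m
    jumps:       jump sizes dLambda(t_k) of the step function Lambda
    obs:         list of (time, delta, z); delta = 1 marks an observed event
    h_alpha:     function (x, alpha) returning h_alpha(x)
    """
    total = 0.0
    for time, delta, z in obs:
        risk = math.exp(beta * z)
        xi = 0.0                      # xi_i(t-) = e^{beta z} * Lambda(t-)
        for t_k, d_k in zip(event_times, jumps):
            if t_k > time:            # Y_i(t_k) = 0 after the observed time
                break
            dH = h_alpha(xi, alpha) * risk * d_k   # dH_i(t_k; eta)
            if delta == 1 and t_k == time:
                total += math.log(dH)              # integral of log dH_i dN_i
            total -= dH                            # integral of Y_i dH_i
            xi += risk * d_k          # include the jump at t_k for later times
    return total

# Toy data (hypothetical): two events and one censored observation.
obs = [(1.0, 1, 0.0), (2.0, 0, 1.0), (3.0, 1, 1.0)]
event_times, jumps = [1.0, 3.0], [0.2, 0.5]

h_ph = lambda x, a: 1.0                    # PH: h_alpha is identically 1
h_gf = lambda x, a: 1.0 / (1.0 + a * x)    # gamma frailty link (5)
print(log_likelihood(0.0, 0.5, jumps, event_times, obs, h_ph))
print(log_likelihood(1.0, 0.5, jumps, event_times, obs, h_gf))
```

With `h_ph` the function reduces to the familiar PH log-likelihood with a step-function baseline, which provides a quick sanity check before a general link is plugged in.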

By differentiating (14) with respect to each component of η, we obtain the following likelihood score process:

$$U(\eta) = \sum_{i=1}^n \int_0^\tau \left\{\frac{\partial}{\partial \eta} \log dH_i(t;\eta)\right\} dM_i(t;\eta). \tag{15}$$

More precisely, the corresponding likelihood scores, $U \equiv (U_\alpha, U_\beta', U_{d\Lambda}')'$, are written as

$$U_\alpha(\eta) = \sum_{i=1}^n \int_0^\tau \varphi_i(t;\eta)\, dM_i(t;\eta);$$
$$U_\beta(\eta) = \sum_{i=1}^n \int_0^\tau Z_i(t;\eta)\, dM_i(t;\eta),$$

where $\varphi_i(t;\eta) = \varphi_\alpha\{\xi_i(t;\beta,\Lambda)\}$ and

$$Z_i(t;\eta) = Z_i(t) + \psi_i(t;\eta) \int_0^t Z_i(s)\, e^{\beta' Z_i(s)}\, d\Lambda(s),$$

with $\psi_i(t;\eta) = \psi_\alpha\{\xi_i(t;\beta,\Lambda)\}$, and

$$U_{d\Lambda}(\eta) = \sum_{i=1}^n \int_0^\tau \upsilon_i(t;\eta)\, dM_i(t;\eta),$$

where

$$\upsilon_i(t;\eta) = \upsilon(t) + \psi_i(t;\eta) \int_0^t \upsilon(s)\, e^{\beta' Z_i(s)}\, d\Lambda(s)$$

for any bounded function υ(t) over [0, τ]. It suffices to set υ(s) = I(s = t*) for an observed event time t*. It is immediately seen that U(η) is a martingale integral process, because

$$\frac{\partial}{\partial \eta} \log dH_i(t;\eta) = \left[\varphi_i(t;\eta),\ Z_i(t;\eta)',\ \{\upsilon_i(t;\eta)\}'\right]'$$

is an ℱt-predictable process. The NPMLE η̂ is then defined as the solution to the equation U(η) = 0. Note that α̂ is obtained from solving Uα = 0. It follows from martingale theory that a discrete observed information matrix can be calculated from

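$$I(\eta) = \sum_{i=1}^n \int_0^\tau \left\{\frac{\partial}{\partial \eta} \log dH_i(t;\eta)\right\}^{\otimes 2} Y_i(t)\, dH_i(t;\eta), \tag{16}$$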

where a⊗2 = aa′ for a vector a. The explicit expression for I(η) can be derived by following steps similar to those in Chen (2009), and it is given in the Appendix. At convergence, the large-sample variance-covariance matrix of the parameter estimates is obtained as the inverse of the information matrix (16).

Direct maximum likelihood estimation of the jumps in Λ has often been considered infeasible, but Zeng and Lin (2006) and others have shown this not to be so; if the gradients and the Hessian matrix of an objective function are provided, statistical optimization routines, such as fminunc in Matlab, can be utilized. This procedure is based on subspace trust-region-reflective Newton methods described in Coleman and Li (1994, 1996). Each iteration involves the approximate solution of a large linear system using the method of preconditioned conjugate gradients. The algorithm is terminated when the magnitude of gradients and the search step size are smaller than prespecified tolerances. In our analysis, we used the Matlab algorithm fminunc to maximize the log-likelihood (14), calculating the likelihood scores (15) and the information matrix (16). Even though this search algorithm may not guarantee the global maximum, it works very well with a reasonable initial guess. For the initial values, we can use $(\alpha^{(0)}, \beta^{(0)}) = (1, 0)$ and $d\Lambda^{(0)} = 1/n$. The variance-covariance matrix can be estimated from the observed information matrix $I_0(\eta) = -\partial^2 l(\eta)/\partial\eta\,\partial\eta'$ as in Zeng and Lin (2006). However, our experience is that when α is estimated, the use of I0(η) makes the procedure unstable and highly sensitive to initial values. Switching from I(η) to I0(η) near the end of optimization would be useful to gain more precision.

3.2.2 Profile likelihood

For comparison, we also consider the profile likelihood approach of Chen (2009) for our problem. Consider the following score process for θ = (α, β′)′:

$$U_\theta(\eta) = (U_\alpha, U_\beta')' = \sum_{i=1}^n \int_0^\tau \chi_i(t;\eta)\, dM_i(t;\eta), \tag{17}$$

where χi(t; η) = [φi(t; η), Zi(t; η)′]′. To complete the method, an estimator of Λ for fixed θ is necessary. This can be accomplished by solving UdΛ = 0, or equivalently, using the weighted Breslow estimator Λ̂θ of Chen (2009), which satisfies

$$d\Lambda(t;\eta) = \frac{\sum_{i=1}^n dN_i(t)}{\sum_{i=1}^n Y_i(t)\, \pi_i(t;\eta)\, h_i(t;\eta)\, e^{\beta' Z_i(t)}}, \tag{18}$$

where

$$\pi_i(t;\eta) = 1 - \frac{\int_{t+}^{\tau} \psi_i(u;\eta)\, dM_i(u;\eta)}{h_i(t;\eta)}.$$

By plugging (18) into (17), we obtain the following profile likelihood score equation:

$$\sum_{i=1}^n \int_0^\tau \left\{\chi_i(t;\hat\eta_\theta) - \hat{E}(t;\hat\eta_\theta)\right\} dN_i(t) = 0, \tag{19}$$

where η̂θ = (θ, Λ̂θ), and

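$$\hat{E}(t;\hat\eta_\theta) = \frac{\sum_{j=1}^n Y_j(t)\, e^{\beta' Z_j(t)}\, \chi_j(t;\hat\eta_\theta)\, h_j(t;\hat\eta_\theta)}{\sum_{j=1}^n Y_j(t)\, e^{\beta' Z_j(t)}\, \pi_j(t;\hat\eta_\theta)\, h_j(t;\hat\eta_\theta)}.$$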

The NPMLE for the profile likelihood is obtained by iteratively solving (18) and (19) via Newton-Raphson, employing a convergence criterion similar to that of the direct maximization. At convergence, the variance-covariance matrix can be calculated from (16). It can be checked that, under the frailty structure (4), EM steps are implicit in the profile likelihood: updating πi corresponds to the E-step, and solving (17) and (18) corresponds to the M-step. The procedure can therefore achieve the global maximum through the monotone convergence of the EM algorithm.
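In the PH special case, hα ≡ 1 and ψα ≡ 0, so every weight πi equals 1 and (18) reduces to the usual Breslow estimator. The sketch below covers only that special case, with a scalar time-invariant covariate; it is not the general weighted estimator, and the function name and toy data are hypothetical.

```python
import math

def breslow_jumps(beta, obs):
    """Jumps dLambda(t) from (18) in the PH special case (pi_i = h_i = 1).

    obs: list of (time, delta, z); delta = 1 marks an observed event.
    Returns the sorted distinct event times and the jump at each of them.
    """
    event_times = sorted({t for t, delta, _ in obs if delta == 1})
    jumps = []
    for t_k in event_times:
        # Numerator: number of events at t_k, the sum of dN_i(t_k).
        d_k = sum(1 for t, delta, _ in obs if delta == 1 and t == t_k)
        # Denominator: sum of Y_i(t_k) e^{beta z_i} over subjects at risk.
        at_risk = sum(math.exp(beta * z) for t, _, z in obs if t >= t_k)
        jumps.append(d_k / at_risk)
    return event_times, jumps

obs = [(1.0, 1, 0.0), (2.0, 0, 1.0), (3.0, 1, 1.0)]   # toy data
print(breslow_jumps(0.0, obs))
print(breslow_jumps(0.5, obs))
```

For a general link, the same loop would carry the extra factors πi and hi in the denominator, which in turn depend on the current Λ and therefore require iteration.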

Although the profile likelihood is efficient in its operation time, it is less stable than direct maximization and may fail to converge when the link parameter α is unknown and estimated. By contrast, direct maximization works well with a moderate-to-large sample. When α is fixed, the profile likelihood also performs well (Chen, 2009).

3.3 Population versus conditional hazards from frailty models

It is noted that transformation models are marginal models, in which the population hazard function with time-invariant Z is equivalent to

$$\lambda_Z(t) = E[\varpi \mid T > t, \mathcal{F}_t]\, e^{\beta' Z}\, \lambda(t), \tag{20}$$

where $E[\varpi \mid T > t, \mathcal{F}_t] = h_\alpha\{\Lambda(t)e^{\beta' Z}\} = -\dot{g}_\alpha\{\Lambda(t)e^{\beta' Z}\}/g_\alpha\{\Lambda(t)e^{\beta' Z}\}$, with $\dot{g}_\alpha$ the derivative of $g_\alpha$ with respect to its argument. Note that the function hα plays a key role in characterizing a time-varying hazard. For instance, $h_\alpha(x) = (1 + \alpha x)^{-1}$ for (5) and $h_\alpha(x) = (1 + x)^{-\alpha}$ for (7). In the case of the proportional hazards/intensity model, E[ϖ|T > t,ℱt] = 1. In general, E[ϖ|T > t,ℱt] starts off at E[ϖ|ℱ0] at time zero and decreases toward zero with time. For discrete frailty models, we may be interested in the probability of cured patients at time t, P(cured|T > t,ℱt) = P(ϖ = 0)/P(T > t|Z), because ϖ = 0 implies T > t for all t. This probability increases over time and will eventually become 1. These can be interpreted as follows: the frailty distribution changes over time because the more frail subjects drop out first, and thus the mean of the frailty becomes smaller as the less frail subjects remain with declining incidence. This partly explains why a plateau is reached in the curves after sufficient follow-up in Figure 1.
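The two quantities discussed here are easy to trace over time. The sketch below evaluates the conditional mean frailty for the gamma frailty link (5) and the conditional cure probability for the Poisson frailty link (11), with a time-invariant linear predictor. The function names and the baseline Λ(t) = t² used in the loop are illustrative assumptions.

```python
import math

def mean_frailty_gamma(t, alpha, lin_pred, Lambda):
    """E[frailty | T > t] = h_alpha{Lambda(t) e^{beta'Z}} for link (5)."""
    return 1.0 / (1.0 + alpha * Lambda(t) * math.exp(lin_pred))

def p_cured_given_survival_poisson(t, alpha, lin_pred, Lambda):
    """P(cured | T > t) = P(N = 0) / S_Z(t) for the Poisson frailty link (11)."""
    surv = math.exp(-alpha * (1.0 - math.exp(-Lambda(t) * math.exp(lin_pred))))
    return math.exp(-alpha) / surv

Lambda = lambda t: t * t       # illustrative baseline cumulative hazard
for t in (0.0, 0.5, 1.0, 2.0, 5.0):
    print(t,
          mean_frailty_gamma(t, 1.0, 0.0, Lambda),
          p_cured_given_survival_poisson(t, 1.0, 0.0, Lambda))
```

The first column of results falls from 1 toward 0, and the second rises from $e^{-\alpha}$ toward 1, matching the description in the text.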

4. Simulation studies

We carried out extensive simulation studies to examine the performance of the proposed methods under the postulated models in practical settings. In the first set of simulations, the survival times were simulated from the following model:

$$S_Z(t) = g_\alpha\{\Lambda(t)\, e^{\beta_1 Z_1 + \beta_2 Z_2}\},$$

where Λ(t) = t², (β1, β2) = (1, 2), and we set τ = 10. Two covariates, Z1 and Z2, were generated as εI(|ε| ≤ 2) and from the Bernoulli distribution with success probability 0.5, respectively, where ε follows the standard normal distribution. The censoring time C followed an exponential distribution with hazard rate $\lambda_0 e^{0.5 Z_2}$, where the constant λ0 was varied to yield approximately 30% censoring in each simulation. For the link function gα, we considered the following three continuous frailty models:

  • (i) gamma frailty (GF): $g_\alpha(x) = (1 + \alpha x)^{-1/\alpha}$, α = 1;

  • (ii) Box-Cox model (BC): $g_\alpha(x) = \exp[-\{(1 + x)^{1-\alpha} - 1\}/(1 - \alpha)]$, α = 0;

  • (iii) inverse Gaussian frailty (IGF): $g_\alpha(x) = \exp[-\{(1 + 2\alpha x)^{1/2} - 1\}/\alpha]$, α = 1.
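Data from this design can be generated by inverse-transform sampling: setting $S_Z(T) = U$ with U uniform gives $\Lambda(T) = g_\alpha^{-1}(U)\, e^{-(\beta_1 Z_1 + \beta_2 Z_2)}$. The sketch below does this for the gamma frailty case (i) with Λ(t) = t². It is an illustration only: the function names are hypothetical, and the default `lambda0 = 0.3` is an arbitrary placeholder, not the value tuned in the paper to reach about 30% censoring.

```python
import math
import random

def g_gamma_inverse(u, alpha):
    """Inverse of the gamma frailty link (5): solves (1 + alpha x)^(-1/alpha) = u."""
    return (u ** (-alpha) - 1.0) / alpha

def simulate_gf(n, alpha=1.0, beta=(1.0, 2.0), lambda0=0.3, tau=10.0, seed=1):
    """Simulate (time, delta, z1, z2) under model (i) with Lambda(t) = t^2.

    lambda0 is a placeholder censoring constant (an assumption of this sketch).
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        eps = rng.gauss(0.0, 1.0)
        z1 = eps if abs(eps) <= 2.0 else 0.0          # eps * I(|eps| <= 2)
        z2 = 1.0 if rng.random() < 0.5 else 0.0       # Bernoulli(0.5)
        lin = beta[0] * z1 + beta[1] * z2
        u = 1.0 - rng.random()                        # uniform on (0, 1]
        # Lambda(T) = g^{-1}(U) e^{-lin}, and Lambda(t) = t^2 gives T by a square root.
        t = math.sqrt(g_gamma_inverse(u, alpha) * math.exp(-lin))
        c = rng.expovariate(lambda0 * math.exp(0.5 * z2))   # censoring time
        time = min(t, c, tau)
        delta = 1 if t <= min(c, tau) else 0
        data.append((time, delta, z1, z2))
    return data

data = simulate_gf(200)
print(sum(1 - delta for _, delta, _, _ in data) / len(data))  # censoring fraction
```

In practice `lambda0` would be adjusted until the printed censoring fraction is near the target.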

Note that models (i) and (ii) correspond to the PO and PH models, respectively. Table 1 presents the results based on 1000 replicates for each sample of n = 200. We also conducted simulations with α = 0.5 and obtained similar results (in the Web Appendix). The proposed estimators appear to be unbiased, and the variance estimators reasonably reflect the true variances, although we observe slight discrepancies between them. This observation is expected to some extent, since computational instability may occur as the underlying link function is not fixed. The estimation of IGF with unknown α is more difficult because α is not well identified in this model, showing some degree of bias and relatively high variance. In fact, this is also true for GF and BC, though less clearly; in GF, for instance, $(1 + \alpha x)^{-1/\alpha} \approx e^{-x}$ for small x, thus a reasonably high event rate is needed to estimate α satisfactorily. Computation would be difficult if Λ(t) or $e^{\beta' Z}$ is too small. This observation agrees with that of Zucker and Yang (2006). In Table 1, we also compared the results from the direct maximization and the profile likelihood methods. We observed that the direct maximization converges in all cases, whereas the profile likelihood achieves convergence in around 85%–95% of cases, depending on the value of α. Once convergence is achieved, the estimates of the two procedures are similar. Thus we recommend direct maximization when the sample size is moderate.

Table 1. Summary statistics for the simulation studies under continuous frailty models. DM, direct maximization; PL, profile likelihood.

| Model | Par | True | DM Bias | DM SE | DM SEE | DM CP | PL Bias | PL SE | PL SEE | PL CP |
|---|---|---|---|---|---|---|---|---|---|---|
| GF | α | 1.0 | −0.029 | 0.319 | 0.346 | 0.94 | −0.016 | 0.298 | 0.305 | 0.94 |
| | β1 | 1.0 | 0.035 | 0.200 | 0.189 | 0.95 | 0.049 | 0.215 | 0.192 | 0.93 |
| | β2 | 2.0 | 0.083 | 0.412 | 0.376 | 0.93 | 0.080 | 0.412 | 0.360 | 0.92 |
| | Λ(0.3) | 0.158 | −0.002 | 0.037 | 0.037 | 0.95 | −0.002 | 0.037 | 0.036 | 0.93 |
| | Λ(0.7) | 0.858 | 0.062 | 0.304 | 0.244 | 0.93 | 0.074 | 0.282 | 0.231 | 0.90 |
| BC | α | 0.0 | −0.027 | 0.602 | 0.576 | 0.96 | −0.027 | 0.602 | 0.545 | 0.97 |
| | β1 | 1.0 | 0.010 | 0.182 | 0.179 | 0.95 | −0.018 | 0.182 | 0.182 | 0.95 |
| | β2 | 2.0 | −0.004 | 0.365 | 0.354 | 0.94 | −0.030 | 0.361 | 0.366 | 0.96 |
| | Λ(0.3) | 0.131 | −0.002 | 0.028 | 0.028 | 0.94 | −0.003 | 0.027 | 0.028 | 0.94 |
| | Λ(0.7) | 0.443 | 0.004 | 0.087 | 0.085 | 0.93 | −0.011 | 0.086 | 0.086 | 0.94 |
| IGF | α | 1.0 | 0.090 | 1.261 | 1.329 | 0.94 | 0.061 | 1.367 | 1.461 | 0.95 |
| | β1 | 1.0 | 0.025 | 0.202 | 0.197 | 0.92 | 0.048 | 0.195 | 0.205 | 0.92 |
| | β2 | 2.0 | 0.043 | 0.376 | 0.384 | 0.92 | 0.084 | 0.377 | 0.401 | 0.93 |
| | Λ(0.3) | 0.155 | 0.013 | 0.037 | 0.040 | 0.95 | 0.013 | 0.039 | 0.042 | 0.96 |
| | Λ(0.7) | 0.710 | 0.065 | 0.296 | 0.291 | 0.92 | 0.112 | 0.307 | 0.318 | 0.93 |

Note: GF, gamma frailty, BC, Box-Cox, and IGF, inverse-Gaussian frailty model. Bias and SE are the bias and standard error of the parameter (Par) estimator, SEE is the mean of the standard error estimators, and CP is the coverage probability of the 95% confidence interval. Λ(p) is the value of Λ(·) evaluated at the 100p-th percentile of T given the covariate value at its expectation. The estimates and confidence intervals for α and Λ(·) are constructed on the basis of the log transformation.

In the second experiment, we generated survival events from cure rate models:

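  • (iv) binary frailty cure rate model (BFC):
    $$S_Z(t) = \frac{1}{1 + e^{\gamma_0 + \gamma_1 Z_2}} + \frac{e^{\gamma_0 + \gamma_1 Z_2}}{1 + e^{\gamma_0 + \gamma_1 Z_2}} \exp\left(-\Lambda(t)\, e^{\beta_1 Z_1 + \beta_2 Z_2}\right);$$
  • (v) Poisson frailty cure rate model (PFC):
    $$S_Z(t) = \exp\left[-e^{\gamma_0 + \gamma_1 Z_2}\left\{1 - \exp\left(-\Lambda(t)\, e^{\beta_1 Z_1 + \beta_2 Z_2}\right)\right\}\right].$$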

Covariates Z1 and Z2 are defined as before. In this setting, the link parameter α in (10) and (11) is parametrized as $\alpha(Z_2) = e^{\gamma_0 + \gamma_1 Z_2}$ to give BFC and PFC. The parameters are then (β1, β2, γ0, γ1, Λ). We let Λ(t) = 4t, (β1, β2) = (1,−1), (γ0, γ1) = (1,−1), and τ = 8. Under this setup, the long-term cured proportions for the control group (Z2 = 0) and for the treatment group (Z2 = 1) are 26.9% and 50.0% with BFC, respectively, and 6.6% and 36.8% with PFC. In all simulations the censoring rates are 50%–55%; such censoring is needed to identify long-term treatment parameters. The results, based on 1000 replicates with n = 200, are summarized in Table 2. The proposed methods perform well. The parameter estimators are virtually unbiased, the standard errors reflect the true variations well, and confidence intervals have reasonable coverage probabilities. It is noted that in this simulation the PH model is used for the uncured subgroup. One may instead want to apply the PO model (Lu and Ying, 2004). This can be done simply by replacing $\exp(-\Lambda(t)e^{\beta_1 Z_1 + \beta_2 Z_2})$ in the preceding models (iv) and (v) with $1/(1 + \Lambda(t)e^{\beta_1 Z_1 + \beta_2 Z_2})$. The proposed methods also work well for this modification, as displayed in Table 2.
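Event times under the Poisson frailty cure model (v) can also be drawn by inverse transform, with the improper survival function handled explicitly: a uniform draw at or below the cure probability $\exp(-e^{\gamma_0 + \gamma_1 Z_2})$ yields an infinite (cured) time. The sketch below is a minimal illustration with hypothetical function names; the defaults follow the parameter values and the baseline Λ(t) = 4t stated in the text, and censoring is left out.

```python
import math

def survival_pfc(t, z1, z2, beta=(1.0, -1.0), gamma=(1.0, -1.0),
                 Lambda=lambda t: 4.0 * t):
    """Survival function of the Poisson frailty cure model (v)."""
    theta = math.exp(gamma[0] + gamma[1] * z2)
    lin = beta[0] * z1 + beta[1] * z2
    return math.exp(-theta * (1.0 - math.exp(-Lambda(t) * math.exp(lin))))

def sample_pfc_time(u, z1, z2, beta=(1.0, -1.0), gamma=(1.0, -1.0),
                    Lambda_inv=lambda v: v / 4.0):
    """Map a uniform draw u in (0, 1] to an event time by solving S_Z(T) = u.

    Returns math.inf when u does not exceed the cure probability exp(-theta).
    """
    theta = math.exp(gamma[0] + gamma[1] * z2)
    if u <= math.exp(-theta):
        return math.inf                      # cured: the event never occurs
    lin = beta[0] * z1 + beta[1] * z2
    # Solve exp(-theta * (1 - exp(-Lambda(T) e^{lin}))) = u for Lambda(T).
    cum = -math.log(1.0 + math.log(u) / theta) * math.exp(-lin)
    return Lambda_inv(cum)

t = sample_pfc_time(0.5, 0.0, 1.0)
print(t, survival_pfc(t, 0.0, 1.0))          # second value recovers u = 0.5
print(sample_pfc_time(0.05, 0.0, 0.0))       # below the cure probability: inf
```

Observed data would then follow by taking the minimum of the sampled time, a censoring time, and τ.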

Table 2. Summary statistics for the simulation results under discrete frailty models. BF, binary frailty; PF, Poisson frailty.

| Uncured model | Par | True | BF Bias | BF SE | BF SEE | BF CP | PF Bias | PF SE | PF SEE | PF CP |
|---|---|---|---|---|---|---|---|---|---|---|
| Proportional hazards | β1 | 1.0 | 0.023 | 0.132 | 0.131 | 0.94 | 0.015 | 0.130 | 0.129 | 0.95 |
| | β2 | −1.0 | −0.020 | 0.344 | 0.337 | 0.95 | −0.027 | 0.699 | 0.717 | 0.95 |
| | γ0 | 1.0 | 0.051 | 0.330 | 0.305 | 0.96 | 0.075 | 0.378 | 0.353 | 0.95 |
| | γ1 | −1.0 | 0.022 | 0.550 | 0.567 | 0.95 | 0.020 | 0.589 | 0.610 | 0.96 |
| | Λ(τ/4) | 0.5 | −0.003 | 0.089 | 0.084 | 0.93 | −0.006 | 0.152 | 0.131 | 0.94 |
| | Λ(τ/2) | 1.0 | 0.008 | 0.181 | 0.166 | 0.93 | 0.011 | 0.350 | 0.309 | 0.93 |
| Proportional odds | β1 | 1.0 | 0.045 | 0.199 | 0.193 | 0.94 | 0.007 | 0.131 | 0.131 | 0.95 |
| | β2 | −1.0 | 0.024 | 0.495 | 0.504 | 0.95 | −0.047 | 0.472 | 0.460 | 0.94 |
| | γ0 | 1.0 | 0.089 | 0.568 | 0.610 | 0.93 | 0.035 | 0.203 | 0.187 | 0.95 |
| | γ1 | −1.0 | 0.012 | 0.836 | 0.916 | 0.97 | 0.026 | 0.333 | 0.322 | 0.93 |
| | Λ(τ/4) | 0.5 | 0.009 | 0.134 | 0.132 | 0.94 | 0.012 | 0.152 | 0.141 | 0.93 |
| | Λ(τ/2) | 1.0 | 0.068 | 0.344 | 0.327 | 0.95 | 0.045 | 0.359 | 0.335 | 0.93 |

5. Examples

5.1 The Veterans’ Administration lung cancer data

We first apply the proposed methods to data from the Veterans’ Administration lung cancer trial. In this trial, 137 male patients with advanced inoperable lung cancer were randomized to receive either a standard treatment or chemotherapy. A subset of the data for the 97 patients without prior therapy was analyzed by Chen et al. (2002) and Zeng and Lin (2006) with a linear transformation model. To facilitate comparison, we consider the same data with the GF, BC, and IGF models and summarize the results in Table 3. In general, attenuation of the coefficients in PH is observed, although it does not seem to be uniform across all covariates. This may reflect the effects from unmeasured heterogeneity or omitted covariates under the PH assumption (Kosorok et al., 2004). The estimates of frailty parameter α are 0.824, 1.078, and 1.066 for GF, BC, and IGF, respectively. The global chi-square test for the PH assumption using the rescaled Schoenfeld residuals gives a P-value of 0.0023, indicating non-proportionality. In order to assess the PH and PO assumptions, we conducted the LR test for H0 : α = 0 and H0 : α = 1 in the BC family: we found $\chi^2_1 = 16.2$ (P < 0.001) for α = 0 and $\chi^2_1 = 0.85$ (P = 0.88) for α = 1, favoring the PO model.

Table 3. Regression analysis of the Veterans’ Administration lung cancer data, with standard error estimates in parentheses

| | PH | PO | GF | BC | IGF |
|---|---|---|---|---|---|
| Estimated α | | | 0.824 (0.264) | 1.078 (0.071) | 1.066 (1.451) |
| Performance score | −0.024 (0.007) | −0.053 (0.010) | −0.065 (0.007) | −0.061 (0.007) | −0.038 (0.008) |
| Adeno vs. large | 0.851 (0.348) | 1.307 (0.582) | 1.344 (0.659) | 1.446 (0.594) | 1.175 (0.574) |
| Small vs. large | 0.548 (0.321) | 1.379 (0.555) | 1.437 (0.606) | 1.576 (0.573) | 0.936 (0.517) |
| Squamous vs. large | −0.215 (0.347) | −0.183 (0.589) | −0.081 (0.621) | −0.125 (0.573) | −0.225 (0.472) |
| AICp | 632.71 | 617.31 | 618.41 | 618.45 | 628.50 |

Note: PH, proportional hazards, PO, proportional odds, GF, gamma frailty, BC, Box-Cox, and IGF, inverse Gaussian frailty model.

In the analysis, a model comparison may be instrumental, because the class of models under consideration is not nested. To this end, we also report the profile Akaike information criterion (AICp), defined as the penalized profile likelihood, AICp = −2lp(θ̂) + 2|Θ|, where lp(θ) = l(θ, Λ̂θ) = supΛ l(θ, Λ) is the profile likelihood and |Θ| denotes the dimension of θ = (α, β′)′. The reported values of AICp in Table 3 also confirm that PO would be the best choice among these classes, but the use of GF and BC equally improves the fit, relative to the PH model. Also, PO is better than IGF, even though IGF is more flexible.

5.2 Malignant melanoma cancer data

As a second example, we consider the melanoma cancer data, illustrated in Section 1. The data include survival times since surgery for 205 patients with the cancer. Andersen et al. (1993) found that patients with a thick and/or ulcerated tumor had an increased chance of death from melanoma and that the PH assumption does not hold with these variables. Because 66% of the patients were alive at the end of the study, the data set has a high censoring rate (72%), as depicted in Figure 1. In addition to PH, GF, and BC, we analyze the data with BFC and PFC of the form (10) and (11), respectively. Table 4 presents the regression results for sex, tumor size (≥2 or <2 cm), and the presence of ulceration. The estimated frailty variances are 3.309 with GF and 1.820 with BC, which suggests substantial heterogeneity in frailty across individuals. Under BFC and PFC, $e^{\beta}$ can represent the hazard ratio for uncured patients given a cure fraction, which is 3.89 (P < 0.001) for BFC and 4.24 (P < 0.001) for PFC for a large versus small tumor. The hazard ratio of ulcerated versus nonulcerated tumor is 3.48 (P = 0.001) for BFC and 3.80 (P = 0.001) for PFC. According to AICp, all transformation models appear to be effective in describing non-proportionality attributable to the presence of cure. Interestingly, the continuous frailty models such as GF and BC also provide good fits for the data, although they do not account for a cure fraction. This is because more frail individuals tend to experience an event early, so the cases remaining under long follow-up are the stronger individuals with smaller frailty. We further fitted a compound Poisson frailty model (9) in Remark 1 that is flexible enough to include GF (ν = 1), IGF (ν = 0.5), and BC as special cases. According to the plot of the likelihood over a range of ν (Web Appendix), the likelihood function achieves its maximum at around ν = 0.9, which suggests GF as the model of choice within this class.

Table 4.

Regression analysis of the melanoma cancer data, with standard error estimates in parentheses

| Parameter | PH | GF | BC | BFC | PFC |
|---|---|---|---|---|---|
| Estimated α | | 3.309 (1.247) | 1.820 (0.302) | 1.552 (0.536) | 0.950 (0.223) |
| Sex | 0.357 (0.270) | 0.960 (0.476) | 1.037 (0.412) | 0.878 (0.339) | 0.927 (0.362) |
| Tumor size | 1.117 (0.347) | 1.899 (0.555) | 1.705 (0.470) | 1.359 (0.400) | 1.445 (0.421) |
| Ulceration | 0.958 (0.323) | 1.726 (0.549) | 1.553 (0.462) | 1.247 (0.387) | 1.337 (0.409) |
| AICp | 528.63 | 525.37 | 525.50 | 525.51 | 524.69 |

Note: PH, proportional hazards; GF, gamma frailty; BC, Box-Cox; BFC, binary frailty; PFC, Poisson frailty model.
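The hazard ratios quoted in the text for uncured patients follow from exponentiating the BFC and PFC coefficients in Table 4. The sketch below reproduces that arithmetic; the dictionary keys are hypothetical labels. Because the tabulated coefficients are rounded to three decimals, a recomputed ratio can differ from the published one in the last digit.

```python
import math

# Coefficient estimates copied from Table 4 (cure models only).
coef = {
    "BFC": {"tumor_size": 1.359, "ulceration": 1.247},
    "PFC": {"tumor_size": 1.445, "ulceration": 1.337},
}

# Hazard ratio for uncured patients is exp(beta) under each cure model.
hazard_ratio = {
    model: {name: math.exp(beta) for name, beta in betas.items()}
    for model, betas in coef.items()
}

for model, ratios in hazard_ratio.items():
    for name, hr in ratios.items():
        print(f"{model} {name}: exp(beta) = {hr:.2f}")
```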

Note that the foregoing analysis accounts for short-term covariate effects only. To incorporate long-term effects, we consider discrete frailty models with the stratification for α as in (13). As an illustration, we fit the following BFC model:

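$$S_Z(t) = \frac{1}{1+e^{\gamma_0+\gamma' Z}} + \frac{e^{\gamma_0+\gamma' Z}}{1+e^{\gamma_0+\gamma' Z}}\,\{1+\Lambda(t)e^{\beta' Z}\}^{-1} \qquad (21)$$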

for covariates Z = [Z1 = tumor size, Z2 = ulceration]. The regression results can be found in the Web Appendix. Here, the PO model is employed for modeling short-term survival, $S_Z^*(t) \equiv P(T > t \mid Z, T < \infty) = 1/\{1+\Lambda(t)e^{\beta' Z}\}$. Thus, in a two-sample case of Z = 1 or 0, $e^{\beta}$ represents the constant odds ratio for the short-term effect of Z, i.e., $\dfrac{\{1-S_1^*(t)\}/S_1^*(t)}{\{1-S_0^*(t)\}/S_0^*(t)}$. Under model (21), the odds ratio of large versus small tumor ($e^{\hat\beta_1}$) is 4.29 (P = 0.025), and that of ulcerated versus non-ulcerated tumor ($e^{\hat\beta_2}$) is 2.17 (P = 0.21). Note that the cure probability has a logistic model formulation, −logit{P(cured|Z)} = γ0 + γ′Z, and thus the factor $e^{-\gamma}$ corresponds to the (conditional) odds ratio of being cured between the two groups Z = 1 and Z = 0, which is 0.41 (P = 0.10) for tumor size and 0.36 (P = 0.042) for ulceration. This shows that a large tumor has a strong effect on short-term survival but that its long-term effect lessens once patients survive the short-term risks. In contrast, we obtain a more significant long-term effect of ulceration. These effects can also be tested with the LR test; for example, when testing the short-term effect (β = 0) and the long-term effect (γ = 0) of ulceration, the LR test under model (21) gives $\chi^2_1 = 0.021$ (P = 0.652) and $\chi^2_1 = 3.088$ (P = 0.079), respectively, which supports the above conclusion.
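To make the two components of model (21) concrete, the following sketch evaluates the population survival function for a single scalar covariate. The function name `bfc_survival` and all parameter values are hypothetical and chosen only for illustration; they are not the fitted estimates, which are reported in the Web Appendix.

```python
import math


def bfc_survival(cum_hazard: float, z: float, gamma0: float, gamma: float, beta: float) -> float:
    """Population survival S_Z(t) under model (21) for a scalar covariate z.

    cum_hazard is Lambda(t) >= 0. The first term is the cure probability
    1 / (1 + exp(gamma0 + gamma * z)); the second is the uncured fraction
    times the proportional odds survival 1 / (1 + Lambda(t) * exp(beta * z)).
    """
    eta = math.exp(gamma0 + gamma * z)
    p_cured = 1.0 / (1.0 + eta)
    s_uncured = 1.0 / (1.0 + cum_hazard * math.exp(beta * z))
    return p_cured + (1.0 - p_cured) * s_uncured


# Hypothetical parameter values, not estimates from the paper.
g0, g, b = 0.2, 0.9, 1.4

for lam in (0.0, 1.0, 1e6):
    s0 = bfc_survival(lam, 0.0, g0, g, b)
    s1 = bfc_survival(lam, 1.0, g0, g, b)
    print(f"Lambda(t) = {lam:g}: S_0(t) = {s0:.4f}, S_1(t) = {s1:.4f}")
```

As Λ(t) grows, the survival curve levels off at the cure probability, so the long-term plateau depends on γ only while the speed of the early decline depends on β. This separation is what allows short-term and long-term effects to be tested separately.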

In Figure 2, we plot the logarithms of the cumulative hazards of the fitted Cox PH model, PFC model, and KM estimates for large (>5 cm), medium (2–5 cm), and small (≤2 cm) tumor sizes. It can be seen that the PH assumption forces the distance between these curves to be constant over time, which is clearly not valid with this variable. The estimates from the proposed model are closer to their nonparametric counterparts than those from PH, supporting the choice of the proposed methods for non-proportional hazards.
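The constant-distance property of PH on the log cumulative hazard scale can be checked numerically. The sketch below contrasts it with a proportional odds model without a cure fraction, which is swapped in here as a simpler stand-in for the PFC model shown in Figure 2. The function names and the value of β are hypothetical.

```python
import math


def log_cumhaz_gap_ph(base_cumhaz: float, beta: float) -> float:
    """log Lambda(t|Z=1) - log Lambda(t|Z=0) under PH; does not depend on t."""
    return math.log(base_cumhaz * math.exp(beta)) - math.log(base_cumhaz)


def log_cumhaz_gap_po(base_cumhaz: float, beta: float) -> float:
    """Same gap under PO, where Lambda(t|Z) = log(1 + Lambda(t) * exp(beta * Z))."""
    h1 = math.log1p(base_cumhaz * math.exp(beta))
    h0 = math.log1p(base_cumhaz)
    return math.log(h1) - math.log(h0)


beta = 1.0  # hypothetical log effect, for illustration only
for lam in (0.01, 0.1, 1.0, 10.0, 100.0):
    print(f"Lambda={lam:>6}: PH gap={log_cumhaz_gap_ph(lam, beta):.3f}, "
          f"PO gap={log_cumhaz_gap_po(lam, beta):.3f}")
```

Under PH the gap stays at β for every value of the baseline cumulative hazard, whereas under PO it starts near β and shrinks as follow-up lengthens, which is the diminishing-effect pattern that a PH fit cannot reproduce.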

Figure 2.

The melanoma cancer data. The logarithms of the estimated cumulative hazards functions for large, medium, and small tumor sizes from (a) the proportional hazards model and (b) the Poisson frailty cure model along with the Kaplan-Meier estimates. Three KM curves from top to bottom correspond to large, medium, and small tumor sizes.

6. Discussion

Several modeling techniques have been developed to deal with non-proportional hazards. A class of transformation models can naturally accommodate violations of proportionality stemming from unmeasured variables. Although asymptotic theory has been well established for transformation models, these theoretical results have not been translated into practical estimation procedures, owing to the complexity of simultaneously estimating all model parameters via maximum likelihood. In this article, we provide a unified approach for a general class of semiparametric transformation models with various link functions. In addition to the regression and nonparametric parameters, our approach allows a link parameter, α, to be estimated, where the link parameter may be scalar or vector-valued through dependence on additional covariates. The problem here is closely related to the question of whether the transformation parameter, as in the Box-Cox transformation, should be regarded as fixed or estimated (Bickel and Doksum, 1981); classical methods for transformation models have often conducted statistical inference as if α̂ were preassigned, but this does not reflect the extra variation from searching for the optimal link parameter. Our method handles these problems systematically and requires only a moderate sample size.

Acknowledgements

This research was supported by the NIH grants 5-P01-CA055164 and 5-P50-CA100632. The authors would like to thank the editor, Dr. Louis, as well as an anonymous associate editor and two anonymous referees for insightful comments that improved the presentation of the paper. The authors also thank Professor I. Ha for some helpful suggestions.

Appendix

Information matrix

Here, we provide explicit expressions for the observed information matrix (16), ignoring the op(n) terms. The derivation follows the basic idea of Chen (2009). Let tj denote the j-th distinct event time and dΛj = dΛ(tj). Recall that θ = (α, β′)′ and χi(t; η) = [φi(t; η), Zi(t; η)′]′. The information matrix I consists of the following submatrices:

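$$I_{d\Lambda_j d\Lambda_j} = -\frac{\partial U_{d\Lambda_j}}{\partial (d\Lambda_j)} = \sum_{i=1}^{n} dN_i(t_j) + \left\{\sum_{i=1}^{n} Y_i(t_j)\, e^{2\beta' Z_i(t_j)}\, m_i(t_j;\eta)\right\}(d\Lambda_j)^2,$$

where

$$m_i(t_j;\eta) = \int_{t_j+}^{\tau} Y_i(u)\, e^{\beta' Z_i(u)}\, h_i(u;\eta)\, \psi_i^2(u;\eta)\, d\Lambda(u).$$

For j < k,

$$I_{d\Lambda_j d\Lambda_k} = -\frac{\partial U_{d\Lambda_j}}{\partial (d\Lambda_k)} = \left[\sum_{i=1}^{n} Y_i(t_j)\, e^{\beta'\{Z_i(t_k)+Z_i(t_j)\}} \left\{h_i(t_j;\eta)\,\psi_i(t_j;\eta) + m_i(t_j;\eta)\right\}\right](d\Lambda_j\, d\Lambda_k).$$

$$I_{\theta\, d\Lambda_j} = -\frac{\partial U_{\theta}}{\partial (d\Lambda_j)} = -\frac{\partial U_{d\Lambda_j}}{\partial \theta} = I_{d\Lambda_j \theta} = \left[\sum_{i=1}^{n} Y_i(t_j)\, e^{\beta' Z_i(t_j)} \left\{\chi_i(t_j;\eta)\, h_i(t_j;\eta) + w_i(t_j;\eta)\right\}\right](d\Lambda_j),$$

where

$$w_i(t_j;\eta) = \int_{t_j+}^{\tau} Y_i(u)\, \chi_i(u;\eta)\, e^{\beta' Z_i(u)}\, h_i(u;\eta)\, \psi_i(u;\eta)\, d\Lambda(u).$$

$$I_{\theta\theta} = -\frac{\partial U_{\theta}}{\partial \theta} = \sum_{i=1}^{n} \int_{0}^{\tau} Y_i(t)\, \chi_i(t;\eta)^{\otimes 2}\, e^{\beta' Z_i(t)}\, h_i(t;\eta)\, d\Lambda(t).$$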

Footnotes

Supplementary Materials

Web Appendices referenced in Sections 4 and 5.2, and simulation codes, are available with this paper at the Biometrics website on Wiley Online Library.

References

  1. Aalen OO. Modeling heterogeneity in survival analysis by the compound Poisson distribution. Annals of Applied Probability. 1992;2:951–972.
  2. Andersen PK, Borgan O, Gill RD, Keiding N. Statistical Models Based on Counting Processes. New York: Springer; 1993.
  3. Bennett S. Analysis of survival data by the proportional odds model. Statistics in Medicine. 1983;2:273–277. doi: 10.1002/sim.4780020223.
  4. Bickel PJ, Doksum KA. An analysis of transformations revisited. Journal of the American Statistical Association. 1981;76:296–311.
  5. Bickel PJ, Klaassen CA, Ritov Y, Wellner JA. Efficient and Adaptive Estimation for Semiparametric Models. 2nd edition. Baltimore: Johns Hopkins University Press; 1998.
  6. Cai T, Cheng S. Semiparametric regression analysis for doubly censored data. Biometrika. 2004;91:277–290.
  7. Chen K, Jin Z, Ying Z. Semiparametric analysis of transformation models with censored data. Biometrika. 2002;89:659–668.
  8. Chen M-H, Ibrahim JG, Sinha D. A new Bayesian model for survival data with a surviving fraction. Journal of the American Statistical Association. 1999;94:909–919.
  9. Chen Y-H. Weighted Breslow-type estimator and maximum likelihood estimation in semiparametric transformation models. Biometrika. 2009;96:591–600.
  10. Cheng SC, Wei LJ, Ying Z. Analysis of transformation models with censored data. Biometrika. 1995;82:835–845.
  11. Coleman TF, Li Y. On the convergence of reflective Newton methods for large-scale nonlinear minimization subject to bounds. Mathematical Programming. 1994;67:189–224.
  12. Coleman TF, Li Y. An interior, trust region approach for nonlinear minimization subject to bounds. SIAM Journal on Optimization. 1996;6:418–445.
  13. Dabrowska DM, Doksum KA. Estimation and testing in the two-sample generalized odds-rate model. Journal of the American Statistical Association. 1988;83:744–749.
  14. Harrington DP, Fleming TR. A class of rank test procedures for censored data. Biometrika. 1982;69:133–143.
  15. Hougaard P. Analysis of Multivariate Survival Data. New York: Springer; 2000.
  16. Kalbfleisch JD, Prentice RL. The Statistical Analysis of Failure Time Data. 2nd edition. New York: Wiley; 2002.
  17. Kosorok MR, Lee BL, Fine JP. Robust inference for univariate proportional hazards frailty regression models. The Annals of Statistics. 2004;32:1448–1491.
  18. Kuk AYC, Chen C-W. A mixture model combining logistic regression with proportional hazards regression. Biometrika. 1992;79:531–541.
  19. Lu W, Ying Z. On semiparametric transformation cure models. Biometrika. 2004;91:331–343.
  20. Murphy SA. Consistency in a proportional hazards model incorporating a random effect. The Annals of Statistics. 1994;22:712–731.
  21. Murphy SA. Asymptotic theory for the frailty model. The Annals of Statistics. 1995;23:182–198.
  22. Peng Y, Dear KBG. A nonparametric mixture model for cure rate estimation. Biometrics. 2000;56:237–243. doi: 10.1111/j.0006-341x.2000.00237.x.
  23. Sy JP, Taylor JMG. Estimation in a Cox proportional hazards cure model. Biometrics. 2000;56:227–236. doi: 10.1111/j.0006-341x.2000.00227.x.
  24. Tsodikov AD, Ibrahim JG, Yakovlev AY. Estimating cure rates from survival data: an alternative to two-component mixture models. Journal of the American Statistical Association. 2003;98:1063–1079. doi: 10.1198/01622145030000001007.
  25. Zeng D, Lin DY. Efficient estimation of semiparametric transformation models for counting processes. Biometrika. 2006;93:627–640.
  26. Zucker DM, Yang S. Inference for a family of survival models encompassing the proportional hazards and proportional odds models. Statistics in Medicine. 2006;25:995–1014. doi: 10.1002/sim.2255.
