Author manuscript; available in PMC: 2012 Jun 1.
Published in final edited form as: Biometrics. 2010 Jul 9;67(2):427–435. doi: 10.1111/j.1541-0420.2010.01456.x

ESTIMATING SUBJECT-SPECIFIC DEPENDENT COMPETING RISK PROFILE WITH CENSORED EVENT TIME OBSERVATIONS

Yi Li 1,*, Lu Tian 2,**, LJ Wei 1,***
PMCID: PMC2970653  NIHMSID: NIHMS210400  PMID: 20618311

SUMMARY

In a longitudinal study, suppose that the primary endpoint is the time to a specific event. This response variable, however, may be censored by an independent censoring variable or by the occurrence of one of several dependent competing events. For each study subject, a set of baseline covariates is collected. The question is how to construct a reliable prediction rule for a future subject’s profile of all competing risks of interest at a specific time point, to support risk-benefit decision making. In this paper, we propose a two-stage procedure for making inferences about such subject-specific profiles. In the first stage, we use a parametric model to obtain a univariate risk index score system. In the second stage, we consistently estimate the average competing risks for subjects who have the same parametric index score via a nonparametric function estimation procedure. We illustrate this new proposal with data from a randomized clinical trial evaluating the efficacy of a treatment for prostate cancer. The primary endpoint of this study was the time to prostate cancer death, and there were two types of dependent competing events: cardiovascular death and death from other causes.

Keywords: Local likelihood function, Nonparametric function estimation, Perturbation-resampling method, Risk index score

1. INTRODUCTION

Consider a longitudinal clinical study whose primary endpoint is the time to a specific clinical event. However, this event time is possibly censored by an independent censoring variable or by the occurrence of one of several dependent competing events. For example, in a randomized clinical trial to evaluate the efficacy of estrogen diethylstilbestrol (DES) for treating stage 3 or 4 prostate cancer, 242 patients were randomly assigned to two high dose groups (≥ 1 mg/day) and 241 subjects were assigned to two low dose groups (≤ 0.2 mg/day) (Byar and Green, 1980; Cheng et al., 1998). The primary endpoint for the study is the time to prostate cancer death. At the end of the study, there were 48, 78 and 34 deaths due to prostate cancer, cardiovascular diseases and other causes, respectively, in the high dose groups. For the low dose groups, the corresponding numbers of deaths were 77, 61 and 46. With respect to overall survival, the high dose groups appeared to be superior to the low dose groups. Furthermore, treatment with high doses of DES reduced prostate cancer deaths. However, there was a serious concern about its potentially fatal cardiovascular-related toxicity.

Quantifying the “pure” treatment effect for prostate cancer in the presence of possibly dependent competing risks is a rather challenging task, if not an impossible one (Tsiatis, 1975). Risk-benefit decision making on the proper usage of DES should depend on the entire profile of all competing risks, not solely on the prostate cancer mortality. Moreover, since the balance between risk and benefit is rather subject-specific, it is important to know how to utilize a future patient’s “baseline” characteristics to predict such an individual-level competing risk profile.

A classical method of handling dependent competing risk problem is to model the so-called cause-specific hazard function for the primary endpoint via the Cox proportional hazards model (Cox, 1972). However, it is not clear how to utilize this technique to make survival predictions (Kalbfleisch and Prentice, 1980; Pepe and Mori, 1993). A useful alternative to deal with competing risks is to consider the cumulative incidence functions (Benichou and Gail, 1990; Gaynor et al., 1993; Gelman et al., 1990; Korn and Dorey, 1992; Goldhirsch et al., 1994). Recently Cheng et al. (1998) and Fine and Gray (1999) modeled the cumulative incidence function with the subject’s covariates, for example, via a Cox-type model. Further novel procedures along this line have been studied, for example, by Fine (2001), Klein and Andersen (2005), Klein (2006) and Scheike et al. (2008). Another fruitful class of parametric or semi-parametric methods is to consider latent failure time modeling (Kalbfleisch and Prentice, 2002; Lawless, 2003; Andersen et al., 2002; Li et al., 2007) to analyze the competing risk data. The validity of predicting the competing risk profiles based on a parametric or semi-parametric model is heavily dependent on the adequacy of the fitted model.

In this paper, we are interested in constructing subject-level predictions of all dependent competing risks of interest at a specific time point, or a set of time points. When, for each subject, more than one baseline covariate is involved, a purely nonparametric function estimation procedure for the above event rates may not perform well even with relatively large samples. Here, we consider the case that there is a primary event of interest and construct a two-stage procedure. In the first stage, we use a parametric or semi-parametric model to create a univariate risk index score that is predictive of the event rate of primary interest. In the second stage, we use a nonparametric function estimation method to make joint inferences about the average competing risks for subjects with the same index score. The new proposal is illustrated with the data from the above DES study.

For the case with only one risk category involved, Cai et al. (2010) utilized a similar two-stage procedure for predicting the mean risk of subjects who have the same parametric risk score. Other novel semi-parametric methods for predicting the risk of a single event with a high dimensional covariate vector have been proposed, for example, by Bair and Tibshirani (2004) using the supervised principal component approach. In the present paper, we use a local multinomial likelihood as the nonparametric smoothing technique at the second stage, which is a non-trivial generalization of Cai et al. (2010). We conducted an extensive numerical study to examine the performance of the new procedure compared with a one-step semi-parametric method, for example, one using generalized additive models. The new proposal appears to be superior to its one-step counterparts with respect to the mean squared error criterion.

2. CONSISTENT ESTIMATION FOR MEAN COMPETING RISKS OF SUBJECTS WITH THE SAME PARAMETRIC RISK SCORE

Suppose that there are K distinct types of possibly dependent competing events. For a random subject in the study, let T̃ be the time from study entry to the first time point at which one of these K events occurs. Let ε be a random variable whose possible values are {1, … , K}. If ε = k, a Type k event is observed at T̃. Also, let U be the subject’s “baseline” covariate vector. Furthermore, suppose that we are interested in the K conditional event rates at a specific time point t0, that is,

$$\pi_k(U) = \mathrm{pr}(\tilde{T} \le t_0,\ \varepsilon = k \mid U), \quad k = 1, \ldots, K. \tag{2.1}$$

In practice, T̃ is often censored by an independent continuous variable C with an unknown survival distribution G(·). Assume that C is independent of T̃ and U. Let T = min(T̃, C) and Δ = I(T̃ = T), where I(·) is the indicator function. Also, let {(T̃i, Ci, εi, Ui), i = 1, … , n} be n independent copies of (T̃, C, ε, U). The problem is how to make inference about (2.1) based on the incomplete event time observations {(Ti, Δiεi, Ui), i = 1, … , n}. Unfortunately, if the dimension of U is greater than one, any existing nonparametric regression estimator for (2.1) may not perform well even when the sample size n is large and the event rates are not extremely low or high. Instead of estimating such fine subject-level event rates (2.1), a feasible, practical alternative is to construct a univariate parametric risk index system based on U and group the study subjects with respect to this scoring system. Then, using a univariate nonparametric function estimation procedure, one may consistently estimate these K average competing event rates for each stratum whose subjects have the same index score.

To construct a univariate scoring system, we consider the case that there is a primary event of interest for the study, say, the event corresponding to ε = 1. Let X, a p × 1 vector, be a function of U and the first component of X is one. Let Xi be the counterpart of X from Ui, i = 1, … , n. Consider a parametric working model for the primary event rate:

$$\pi_1(U) = g(\beta' X), \tag{2.2}$$

where g is a known strictly increasing, smooth function, for example, the anti-logit function, and β is a p × 1 vector of unknown parameters. Without censoring, one may use the maximum likelihood estimator or a simple estimating function such as

$$n^{-1}\sum_{i=1}^{n} X_i\{I(T_i \le t_0,\ \varepsilon_i = 1) - g(\beta' X_i)\} \tag{2.3}$$

to estimate β.
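As an illustration of how an estimating function such as (2.3) might be solved when g is the anti-logit function, the following is a minimal Newton-Raphson sketch. The function names `expit` and `solve_score_equation` and the optional `weights` argument are assumptions introduced for this example; they are not part of the paper.

```python
import numpy as np

def expit(z):
    """Anti-logit function g(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def solve_score_equation(X, y, weights=None, n_iter=50, tol=1e-10):
    """Find beta solving sum_i weights_i * X_i * {y_i - expit(beta'X_i)} = 0.

    X is n x p with a leading column of ones, y is the 0/1 indicator
    I(T_i <= t0, eps_i = 1). With weights=None all weights equal one.
    (Hypothetical helper for illustration only.)
    """
    n, p = X.shape
    w = np.ones(n) if weights is None else np.asarray(weights, dtype=float)
    beta = np.zeros(p)
    for _ in range(n_iter):
        mu = expit(X @ beta)
        score = X.T @ (w * (y - mu))                     # estimating function
        info = (X * (w * mu * (1.0 - mu))[:, None]).T @ X  # minus its derivative
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta
```

Passing nonnegative subject-level weights through `weights` gives a weighted version of the same equation, which is the form needed once censoring is taken into account.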

In the presence of independent right censoring, one may modify (2.3) by adjusting for censoring. One possible modification is

$$R(\beta) = n^{-1}\sum_{i=1}^{n} \frac{w_i}{\hat{G}(T_i \wedge t_0)}\, X_i\{I(T_i \le t_0,\ \varepsilon_i = 1) - g(\beta' X_i)\}, \tag{2.4}$$

where wi = I(T̃i ∧ t0 ≤ Ci) = I(Ti ≤ t0)Δi + I(Ti > t0) and Ĝ(·) is the Kaplan-Meier estimator for G(·). This generalization has been studied, for example, by Zheng et al. (2007) and Uno et al. (2008) in different settings. Heuristically, for a large sample size n, conditional on T̃i and Ui, the expected value of wi/Ĝ(Ti ∧ t0) is one, because pr(Ci ≥ T̃i ∧ t0 | T̃i, Ui) = G(T̃i ∧ t0) and Ti ∧ t0 = T̃i ∧ t0 whenever wi = 1. This implies that for large n, R(β) ≈ (2.3). Therefore, asymptotically one would expect that a root β̂ to R(β) = 0 is free of the study-specific censoring distribution G(·). It is important to note that under rather mild conditions, β̂ converges to a finite value β0 even when the model (2.2) is not correctly specified (Uno et al., 2008). This stability property, coupled with the fact that β0 is free of the study-specific censoring distribution, is essential for developing our inference procedures. Note that if the model (2.2) is correctly specified, g(β̂′X) would be a consistent estimator for π1(U) in (2.1).
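The censoring weights wi/Ĝ(Ti ∧ t0) can be sketched as follows. This is a minimal illustration that assumes no ties between event and censoring times (consistent with a continuous C); the function names `km_censoring_survival` and `ipcw_weights` are hypothetical.

```python
import numpy as np

def km_censoring_survival(times, delta):
    """Return t -> G_hat(t), a Kaplan-Meier estimate of pr(C > t).

    Censored observations (delta == 0) play the role of 'events' here.
    """
    times = np.asarray(times, dtype=float)
    delta = np.asarray(delta, dtype=int)
    cens_times = np.unique(times[delta == 0])
    if cens_times.size == 0:          # no censoring observed: G_hat is 1
        return lambda t: np.ones_like(np.asarray(t, dtype=float))
    at_risk = np.array([(times >= s).sum() for s in cens_times])
    n_cens = np.array([((times == s) & (delta == 0)).sum() for s in cens_times])
    surv = np.cumprod(1.0 - n_cens / at_risk)

    def G_hat(t):
        # number of distinct censoring times <= t
        idx = np.searchsorted(cens_times, np.asarray(t, dtype=float), side="right")
        return np.where(idx == 0, 1.0, surv[np.maximum(idx - 1, 0)])

    return G_hat

def ipcw_weights(times, delta, t0):
    """Weights w_i / G_hat(min(T_i, t0)), with w_i = I(T_i<=t0)*delta_i + I(T_i>t0)."""
    times = np.asarray(times, dtype=float)
    delta = np.asarray(delta, dtype=int)
    G_hat = km_censoring_survival(times, delta)
    w = ((times <= t0) & (delta == 1)) | (times > t0)
    out = np.zeros(len(times))
    out[w] = 1.0 / G_hat(np.minimum(times, t0))[w]   # only divide where w_i = 1
    return out
```

Subjects censored before t0 receive weight zero, and the remaining subjects are up-weighted so that the weights sum to roughly n.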

Now, consider a future subject from the same study population, whose U and X are U0 and X0, with potential, but unobservable, (T̃, ε)′ = (T̃0, ε0)′. Let β̂′X0 = v, a given constant. We are interested in estimating the following (K − 1) average event rates at time t0:

$$\mathrm{pr}(\tilde{T}^0 \le t_0,\ \varepsilon^0 = k \mid \hat{\beta}' X^0 = v), \quad k = 1, \ldots, K-1, \tag{2.5}$$

where the probability is with respect to the future observation (U0, T̃0, ε0) as well as the observed data {(Ti, Δiεi, Ui), i = 1, … , n}, from which β̂ is estimated. Note that the probabilities in (2.5) depend on the sample size n and converge to the following conditional probabilities

$$\eta_k(v) = \mathrm{pr}(\tilde{T}^0 \le t_0,\ \varepsilon^0 = k \mid \beta_0' X^0 = v), \quad k = 1, \ldots, K-1, \tag{2.6}$$

as n → ∞. Also note that (2.6) is the set of the multinomial cell probabilities for future subjects whose limiting risk score is v. For the non-censored case, let us consider a nonparametric estimation procedure for η(v) = {η1(v), … , ηK−1(v)}′ via a localized multinomial likelihood function. Specifically, let Yik = I(Ti ≤ t0, εi = k) for k = 1, … , K − 1, and V̂i = β̂′Xi. For notational ease, write pk = ηk(v), the probability of failing from cause k prior to time t0 given score v, for k = 1, … , K − 1. Then, a kernelized log-likelihood function for η(v), expressed with the unknown parameter vector p = (p1, … , pK−1)′, is

$$\sum_{i=1}^{n} K_h(\hat{V}_i - v)\,\log\Big\{\prod_{k=1}^{K-1} p_k^{Y_{ik}}\Big(1 - \sum_{k=1}^{K-1} p_k\Big)^{1 - \sum_{k=1}^{K-1} Y_{ik}}\Big\}, \tag{2.7}$$

where $p_k \ge 0$, $\sum_{k=1}^{K-1} p_k \le 1$, and $K_h(s) = K(s/h)/h$ for a symmetric standard kernel function K(·) with a finite support, and h is the smoothing parameter.

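In the presence of censoring, we add a weight function wi/Ĝ(Ti ∧ t0) in front of Kh(·) in (2.7). The resulting log-likelihood is

$$\sum_{i=1}^{n} \frac{w_i}{\hat{G}(T_i \wedge t_0)} K_h(\hat{V}_i - v)\,\log\Big\{\prod_{k=1}^{K-1} p_k^{Y_{ik}}\Big(1 - \sum_{k=1}^{K-1} p_k\Big)^{1 - \sum_{k=1}^{K-1} Y_{ik}}\Big\}. \tag{2.8}$$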

An estimator for η(v) = (η1(v), … , ηK−1(v))′ can be obtained by maximizing (2.8) with respect to the pk’s under the above constraints.
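For a kernel-weighted multinomial log-likelihood of this kind, the constrained maximizer has a closed form: each cell probability is estimated by the weighted proportion of subjects in that cell, with weights given by the censoring weight times the kernel. The sketch below illustrates this with an Epanechnikov kernel; the function names and the argument `ipcw` (the precomputed censoring weights) are assumptions made for the example.

```python
import numpy as np

def epanechnikov(u):
    """Epanechnikov kernel K(u) = 0.75 * (1 - u^2) on [-1, 1], zero elsewhere."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u ** 2), 0.0)

def local_constant_rates(v, scores, Y, ipcw, h):
    """Kernel-weighted cell proportions at score v.

    scores : length-n array of index scores V_hat_i
    Y      : n x (K-1) matrix of indicators Y_ik
    ipcw   : length-n array of censoring weights w_i / G_hat(min(T_i, t0))
    h      : bandwidth
    """
    omega = np.asarray(ipcw, dtype=float) * epanechnikov((scores - v) / h) / h
    total = omega.sum()
    if total <= 0:
        raise ValueError("no observations with positive weight near v")
    return omega @ Y / total
```

For instance, when every subject has score v and unit censoring weight, the function simply returns the column means of `Y`.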

The performance of this nonparametric local estimator may be improved by replacing pk in each summand of (2.8) by

$$\frac{\exp\{a_k + b_k(\hat{V}_i - v)\}}{1 + \sum_{l=1}^{K-1} \exp\{a_l + b_l(\hat{V}_i - v)\}},$$

where a = (a1, … , aK−1)′ and b = (b1, … , bK−1)′ are unknown vectors of parameters. Here, the rationale is to use a linear function ak + bk(V − v) to approximate log{ηk(V)/ηK(V)} in a small neighborhood of v (Fan and Gijbels, 1996). The resulting log-likelihood function is

$$\ell(a, b; v) = \sum_{i=1}^{n} \frac{w_i K_h(\hat{V}_i - v)}{\hat{G}(T_i \wedge t_0)} \Big(\sum_{k=1}^{K-1} Y_{ik}\{a_k + b_k(\hat{V}_i - v)\} - \log\Big[1 + \sum_{k=1}^{K-1} \exp\{a_k + b_k(\hat{V}_i - v)\}\Big]\Big). \tag{2.9}$$

Let â and b̂ be the maximizers of ℓ(a, b; v) with respect to a and b. Also, let η̂k(v) be

$$\exp(\hat{a}_k) \Big/ \Big[1 + \sum_{l=1}^{K-1} \exp(\hat{a}_l)\Big], \quad k = 1, \ldots, K-1. \tag{2.10}$$
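One way to carry out this local linear multinomial-logit fit numerically is a plain Newton-Raphson iteration on the weighted log-likelihood. The sketch below is an illustration under stated assumptions, not the authors' implementation: it uses an Epanechnikov kernel, takes precomputed censoring weights `ipcw`, adds a tiny ridge term for numerical stability, and uses no step-halving safeguards. The function name is hypothetical.

```python
import numpy as np

def local_linear_multinomial(v, scores, Y, ipcw, h, n_iter=100, tol=1e-9):
    """Maximize a kernel-weighted multinomial-logit log-likelihood at score v.

    Y is n x (K-1); returns the fitted cell probabilities at v,
    exp(a_k) / (1 + sum_l exp(a_l)), k = 1, ..., K-1.
    """
    n, m = Y.shape                                   # m = K - 1
    d = np.asarray(scores, dtype=float) - v
    u = d / h
    kern = np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u ** 2), 0.0) / h
    omega = np.asarray(ipcw, dtype=float) * kern     # total weight per subject
    Z = np.column_stack([np.ones(n), d])             # local design (1, V_i - v)
    theta = np.zeros((m, 2))                         # row k holds (a_k, b_k)
    for _ in range(n_iter):
        expo = np.exp(Z @ theta.T)                   # n x m
        P = expo / (1.0 + expo.sum(axis=1, keepdims=True))
        grad = ((omega[:, None] * (Y - P)).T @ Z).ravel()
        neg_hess = np.zeros((2 * m, 2 * m))
        for k in range(m):
            for l in range(m):
                c = omega * P[:, k] * (float(k == l) - P[:, l])
                neg_hess[2 * k:2 * k + 2, 2 * l:2 * l + 2] = (Z * c[:, None]).T @ Z
        step = np.linalg.solve(neg_hess + 1e-10 * np.eye(2 * m), grad)
        theta = theta + step.reshape(m, 2)
        if np.max(np.abs(step)) < tol:
            break
    a_hat = theta[:, 0]
    return np.exp(a_hat) / (1.0 + np.exp(a_hat).sum())
```

Only the intercepts enter the final estimate; the slopes serve to reduce the bias of the local fit.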

3. CONSTRUCTING POINTWISE AND SIMULTANEOUS CONFIDENCE INTERVALS FOR ηk(·) OVER THE RISK SCORE

We can show that when h = O(n^{−ν}), 1/5 < ν < 1/2, η̂k(v) is a consistent estimator for ηk(v), k = 1, … , K − 1 (see web-based Appendix A). Moreover, the joint distribution of

$$\big\{(nh)^{1/2}\big[f\{\hat{\eta}_k(v)\} - f\{\eta_k(v)\}\big],\ k = 1, \ldots, K-1\big\} \tag{3.1}$$

can be approximated by a multivariate normal with mean 0 and covariance matrix Σ(v), where f(·) : [0, 1] → [−∞, ∞] is a given smooth, strictly increasing function. In this paper, we let f(·) be the logit function.

To estimate the covariance matrix Σ(v) associated with (3.1), we utilize a perturbation-resampling procedure, which is similar to a wild bootstrapping method (Mammen, 1993) and has been successfully applied to many interesting inference problems, especially in survival analysis (Gilbert et al., 2004; Tian et al., 2005). Specifically, let {Bi, i = 1, … , n} be a random sample from the unit exponential distribution. Let $a^* = (a_1^*, \ldots, a_{K-1}^*)'$ be the maximizer of ℓ*(a, b; v), a perturbed version of (2.9), where

$$\ell^*(a, b; v) = \sum_{i=1}^{n} \frac{B_i w_i K_h(V_i^* - v)}{G^*(T_i \wedge t_0)} \Big(\sum_{k=1}^{K-1} Y_{ik}\{a_k + b_k(V_i^* - v)\} - \log\Big[1 + \sum_{k=1}^{K-1} \exp\{a_k + b_k(V_i^* - v)\}\Big]\Big).$$

Here, G*(·) and Vi* are the perturbed counterparts of Ĝ(·) and V̂i, respectively, i.e.,

$$G^*(t) = \exp\Big[-\sum_{i=1}^{n} \int_0^t \frac{B_i\, d\{I(T_i \le s,\ \Delta_i = 0)\}}{\sum_{j=1}^{n} B_j I(T_j \ge s)}\Big],$$

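$V_i^* = X_i'\beta^*$ and β* is the solution to the perturbed version of the estimating equation (2.4):

$$\sum_{i=1}^{n} \frac{B_i w_i}{G^*(T_i \wedge t_0)}\, X_i\{I(T_i \le t_0,\ \varepsilon_i = 1) - g(X_i'\beta)\} = 0.$$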

Furthermore, let the corresponding perturbed ηk*(v) be

$$\exp(a_k^*) \Big/ \Big\{1 + \sum_{l=1}^{K-1} \exp(a_l^*)\Big\}, \quad k = 1, \ldots, K-1.$$

In web-based Appendix B, we show that the covariance matrix Σ(v) can be consistently estimated by Σ̂(v), the expectation of (nh)[f{η*(v)} − f{η̂(v)}][f{η*(v)} − f{η̂(v)}]′ conditional on the observed data, where f{η*(v)} = (f{η1*(v)}, … , f{ηK−1*(v)})′ and f{η̂(v)} = (f{η̂1(v)}, … , f{η̂K−1(v)})′. Note that {G*(·), V1*, … , Vn*} can be replaced by {Ĝ(·), V̂1, … , V̂n} in the perturbation without affecting the asymptotic distribution of (nh)^{1/2}[f{η*(v)} − f{η̂(v)}], since the differences between G*(·) and Ĝ(·), as well as between Vi* and V̂i, are of order Op(n^{−1/2}), which is smaller than f{η*(v)} − f{η̂(v)} = Op{(nh)^{−1/2}}. However, our experience suggests that simultaneously perturbing Ĝ(·) and V̂i often improves the finite-sample performance of the proposed resampling method.

To obtain an approximation to Σ̂(v) for a given data set, we generate a large number, M, of independent realizations from {Bi, i = 1, … , n}. For each realization, we obtain a realization of f{η*(v)}. With M such independent realizations, one may use the standard sample covariance matrix estimate Σ̂(v), or a robust version thereof, to estimate Σ(v). This, coupled with the normal approximation to the distribution of f{η̂(v)}, provides confidence intervals for f{ηk(v)}. A two-sided (1 − α) confidence interval for ηk(v) is
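The resampling loop can be sketched in a deliberately simplified setting. The example below assumes a single cause of interest, uses the closed-form kernel-weighted proportion rather than the local linear fit, and holds the censoring weights and index scores fixed across perturbations (perturbing only through the multipliers Bi). The function name `perturbation_se` and its arguments are hypothetical.

```python
import numpy as np

def logit(p):
    return np.log(p / (1.0 - p))

def perturbation_se(v, scores, y, ipcw, h, M=500, seed=0):
    """Return (eta_hat, sd), where sd approximates the standard error of
    logit(eta_hat) at score v via unit-exponential perturbation weights.

    y is a length-n 0/1 indicator for one cause; ipcw holds the
    precomputed censoring weights.
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    u = (np.asarray(scores, dtype=float) - v) / h
    kern = np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u ** 2), 0.0) / h
    omega = np.asarray(ipcw, dtype=float) * kern
    eta_hat = omega @ y / omega.sum()
    draws = np.empty(M)
    for m in range(M):
        B = rng.exponential(1.0, size=len(y))        # perturbation multipliers
        draws[m] = (B * omega) @ y / (B * omega).sum()
    draws = np.clip(draws, 1e-12, 1.0 - 1e-12)       # keep logit finite
    # second moment of the perturbed estimates around the point estimate
    sd = np.sqrt(np.mean((logit(draws) - logit(eta_hat)) ** 2))
    return eta_hat, sd
```

The returned `sd` plays the role of (nh)^{−1/2} times the standard error estimate on the logit scale, so it can be used directly to form an interval on that scale.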

$$f^{-1}\big[f\{\hat{\eta}_k(v)\} \pm z_{(1-\alpha/2)}\,(nh)^{-1/2}\,\tilde{\sigma}_k(v)\big], \tag{3.2}$$

where f(·) is the logit function, z(1−α/2) is the (1 − α/2) quantile of the standard normal distribution and σ̃k(v) is the standard error estimate from the kth diagonal element of Σ̂(v). Note that joint confidence regions for {ηk(v), k = 1, … , K − 1} can also be obtained by considering a sup-type statistic, $\sup_{k=1,\ldots,K-1} |\hat{\eta}_k(v) - \eta_k(v)|$, to choose the cutoff point for the confidence intervals (3.2).
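As a small worked illustration of an interval of the form (3.2) with f the logit function, the sketch below maps a point estimate and a logit-scale standard error back to the probability scale. The function name and the argument `se_logit`, which stands for (nh)^{−1/2}σ̃k(v), are assumptions for this example.

```python
from math import exp, log
from statistics import NormalDist

def logit_confidence_interval(eta_hat, se_logit, alpha=0.05):
    """Two-sided (1 - alpha) interval for a probability, built on the logit scale."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)      # standard normal quantile
    centre = log(eta_hat / (1.0 - eta_hat))          # f(eta_hat)
    inv_logit = lambda x: 1.0 / (1.0 + exp(-x))      # f^{-1}
    return inv_logit(centre - z * se_logit), inv_logit(centre + z * se_logit)
```

Working on the logit scale guarantees that both endpoints stay inside (0, 1), which is the practical reason for transforming before applying the normal approximation.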

To construct a (1 − α) simultaneous confidence band for ηk(v) over a pre-specified interval ℐ of v, we cannot use the conventional method based on a sup-statistic, $\sup_{v \in \mathcal{I}} \tilde{\sigma}_k^{-1}(v)\,|(nh)^{1/2}\{\hat{\eta}_k(v) - \eta_k(v)\}|$, because, as a process in v, (nh)^{1/2}{η̂k(v) − ηk(v)} does not have a limiting distribution. On the other hand, one may utilize the strong approximation argument given in Bickel and Rosenblatt (1973) to show that an appropriately scaled supremum of a specific transformation of η̂k(v) converges in distribution to a proper random variable. In practice, a (1 − α) simultaneous confidence band for {ηk(v), v ∈ ℐ} is

$$f^{-1}\big[f\{\hat{\eta}_k(v)\} \pm c_\alpha\,(nh)^{-1/2}\,\tilde{\sigma}_k(v)\big], \tag{3.3}$$

where cα is obtained via the following equation:

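$$\mathrm{pr}\Big(\sup_{v \in \mathcal{I}} \tilde{\sigma}_k^{-1}(v)\,\big|(nh)^{1/2}\big[f\{\eta_k^*(v)\} - f\{\hat{\eta}_k(v)\}\big]\big| < c_\alpha\Big) = 1 - \alpha,$$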

and {ηk*(v), v ∈ ℐ} is obtained by the above perturbation-resampling method with the same set of {Bi, i = 1, … , n}. The justification of the adequacy of this approximation is given in web-based Appendix B. Note that, unlike the pointwise confidence interval estimation for ηk(v), it does not seem trivial to generalize the above simultaneous confidence interval estimation to all k = 1, … , K − 1 jointly.
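In practice the cutoff for such a band can be approximated from the perturbation draws evaluated on a finite grid of v values in ℐ. The sketch below assumes the perturbed logit-scale estimates have already been computed with common multipliers across the grid; the function name and array layout are hypothetical.

```python
import numpy as np

def simultaneous_cutoff(f_star, f_hat, alpha=0.05):
    """Approximate the band cutoff from perturbation draws on a grid.

    f_star : M x G array; row m holds f{eta*_k(v_g)} for realization m,
             computed with the same multipliers B_i at every grid point
    f_hat  : length-G array of f{eta_hat_k(v_g)}
    Returns (c_alpha, sd), where sd[g] approximates the standard error of
    f{eta_hat_k(v_g)}; the band is f_hat +/- c_alpha * sd on the f scale.
    """
    dev = np.asarray(f_star, dtype=float) - np.asarray(f_hat, dtype=float)
    sd = np.sqrt(np.mean(dev ** 2, axis=0))          # pointwise standard errors
    sup_stat = np.max(np.abs(dev) / sd, axis=1)      # sup over the grid, per draw
    c_alpha = np.quantile(sup_stat, 1.0 - alpha)
    return c_alpha, sd
```

Because the supremum is taken over the whole grid, the resulting cutoff is larger than the pointwise normal quantile, which is why the simultaneous band is wider than the pointwise intervals.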

As in any typical nonparametric function estimation problem, it is important to know how to choose the smoothing parameter h in practice. Here, we propose a J-fold cross-validation method to choose an optimal h value. To this end, we first randomly divide the entire data set D into J mutually exclusive, roughly equally sized subsets, say, D1, … , DJ. Let the set of observations in D, but not in Dj, be denoted by D(−j), j = 1, … , J. We construct the scoring system using β̂(−j) estimated with the observations in D(−j). Next, for a fixed h value, let the corresponding nonparametric estimator for ηk(v) be η̂(k,−j)(v). With these subject-specific risk estimates, we compute the log-likelihood function with the observations in Dj:

$$\ell_j(h) = \sum_{l \in D_j} \frac{w_l}{\hat{G}(T_l \wedge t_0)} \Big[\sum_{k=1}^{K} Y_{lk} \log\{\hat{\eta}_{(k,-j)}(\hat{V}_{(l,-j)})\}\Big], \tag{3.4}$$

where $\hat{\eta}_{(K,-j)}(v) = 1 - \sum_{k=1}^{K-1} \hat{\eta}_{(k,-j)}(v)$ and $\hat{V}_{(l,-j)} = \hat{\beta}_{(-j)}' X_l$, $l \in D_j$. Now, let $\ell_{cv}(h) = \sum_{j=1}^{J} \ell_j(h)$. We choose the maximizer hop of ℓcv(h) as an “optimal” choice of the smoothing parameter h.

From the argument in Härdle et al. (1988), we expect that the above hop is of order Op(n^{−1/5}). To ensure the validity of the aforementioned large sample properties for η̂k(v), one may choose a smoothing parameter h = hop × n^{−ξ}, where 0 < ξ < 3/10. In practice, we find that the resulting nonparametric estimator performs well with ξ = 0.1.
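The bookkeeping for this bandwidth choice can be sketched as follows. The helper names are hypothetical, and `cv_loglik` stands for a user-supplied function that returns the cross-validated log-likelihood for a given h; it is not defined here.

```python
import random

def make_folds(n, J, seed=0):
    """Randomly split indices 0..n-1 into J roughly equal, disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [sorted(idx[j::J]) for j in range(J)]

def select_bandwidth(cv_loglik, h_grid):
    """Return the grid value maximizing the cross-validated log-likelihood."""
    return max(h_grid, key=cv_loglik)

def undersmoothed_bandwidth(h_op, n, xi=0.1):
    """Shrink the cross-validated bandwidth: h = h_op * n^(-xi), 0 < xi < 3/10."""
    if not 0.0 < xi < 0.3:
        raise ValueError("xi must lie strictly between 0 and 3/10")
    return h_op * n ** (-xi)
```

As a rough sense of scale, with n = 242 and ξ = 0.1 the shrinkage factor n^{−ξ} is about 0.58, so the final bandwidth is a little over half of the cross-validated one.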

4. AN ILLUSTRATIVE EXAMPLE

We use the new proposal to study a subset of the data from the DES trial discussed in the Introduction. This data set consists of patient-level observations from the high DES dose groups. There were 242 patients in these groups, with a median follow-up time of 63 months. Here, T̃ is the time from randomization to one of K = 4 competing events, with ε = 1 for prostate cancer death, ε = 2 for cardiovascular-related death, ε = 3 for death from other causes, and ε = 4 for surviving beyond t0. At the end of the study, 48, 78 and 34 patients had died from prostate cancer, heart disease and other causes, respectively. The baseline covariate vector U includes age (AG), weight index (WT), performance rating (PF), history of cardiovascular disease (HX), serum hemoglobin (HG), size of primary lesion (SZ) and Gleason score (SG). Since this data set was analyzed in the past using a discretized coding system for the covariates, chosen for ease of clinical interpretation (Byar and Green, 1980; Cheng et al., 1998), we followed the same system in our analysis. For the convenience of readers, the coding for the covariates is summarized in Table 1.

Table 1.

Coding of the covariates for the prostate cancer data

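| Variable | Value 0 | Value 1 | Value 2 |
|---|---|---|---|
| AG | < 75 years | 75–79 years | ≥ 80 years |
| WT | ≥ 100 | 80–99 | < 80 |
| PF | Normal | Limited | |
| HX | No | Yes | |
| HG | ≥ 12 g/100 ml | 9–11.9 g/100 ml | < 9 g/100 ml |
| SZ | < 30 cm² | ≥ 30 cm² | |
| SG | ≤ 10 | > 10 | |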

First, we consider the case of predicting subject-level, relatively long-term competing risks. To this end, let t0 = 5 (years). Since the primary endpoint of the study is the time to prostate cancer death, we fitted the data with a working model (2.2) by letting X = (1, U′)′ and g be the anti-logit function. The point estimate β̂ for β via (2.4) is given in Table 2(a). Although only WT, SZ and SG are statistically significant with this working model, we used the entire covariate vector U to build the risk score V̂ = β̂′X. In Figure 1(a), we present a smoothed density estimate of V̂, which is bimodal. Most study subjects are clustered around V̂ = −4.5 and −0.9.

Table 2.

Regression Coefficient Estimates for Model (2.2) with the data from the high dose groups

(a) Time point t0 = 5 years

| Coefficient | Estimate | Std. Error | p-value |
|---|---|---|---|
| Intercept | −4.64 | 0.79 | < 0.01 |
| AG | −0.07 | 0.31 | 0.81 |
| WT | 0.66 | 0.37 | 0.07 |
| PF | 0.56 | 0.61 | 0.35 |
| HX | −0.56 | 0.46 | 0.23 |
| HG | 0.46 | 0.42 | 0.27 |
| SZ | 1.76 | 0.50 | < 0.01 |
| SG | 3.37 | 0.75 | < 0.01 |

(b) Time point t0 = 2 years

| Coefficient | Estimate | Std. Error | p-value |
|---|---|---|---|
| Intercept | −5.87 | 1.12 | < 0.01 |
| AG | −0.18 | 0.39 | 0.63 |
| WT | 0.74 | 0.40 | 0.06 |
| PF | −0.15 | 0.69 | 0.82 |
| HX | 0.29 | 0.54 | 0.58 |
| HG | 1.19 | 0.45 | < 0.01 |
| SZ | 1.12 | 0.55 | 0.045 |
| SG | 3.25 | 1.05 | < 0.01 |
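To make the scoring system concrete, the sketch below computes the index score β̂′X from the coefficients reported in Table 2(a) for two hypothetical covariate patterns coded as in Table 1. The patient profiles and the helper names are invented for illustration; only the coefficients come from the table.

```python
from math import exp

# Coefficient estimates from Table 2(a), t0 = 5 years
BETA_5Y = {
    "Intercept": -4.64, "AG": -0.07, "WT": 0.66, "PF": 0.56,
    "HX": -0.56, "HG": 0.46, "SZ": 1.76, "SG": 3.37,
}

def risk_score(covariates, beta=BETA_5Y):
    """Index score beta'X; covariates omitted from the dict are taken as 0."""
    return beta["Intercept"] + sum(
        beta[name] * covariates.get(name, 0) for name in beta if name != "Intercept"
    )

def anti_logit(v):
    """Working-model probability g(v) = 1 / (1 + exp(-v))."""
    return 1.0 / (1.0 + exp(-v))

# Hypothetical patients: all covariates in the reference category, and the
# same profile except for a Gleason score coded as 1.
baseline_patient = {}
high_gleason_patient = {"SG": 1}
scores = [risk_score(baseline_patient), risk_score(high_gleason_patient)]
```

The Gleason score coefficient is large enough that it alone shifts the score by more than three units, which suggests a reading of the bimodal score distribution noted in the text. The value `anti_logit(score)` is only the working-model probability; the calibrated risk for a given score is the nonparametric estimate described in Section 2.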

Figure 1.


Consistent estimates (solid curve), pointwise 0.95 confidence intervals (enclosed by dotted curves) and simultaneous intervals (gray area) for various risks ηk(v) at t0 =5 years: (a) The density function for the index score; (b) Inference for η1(v); (c) inference for η2(v) ; (d) Inference for η3(v).

To estimate ηk(v), k = 1, 2, 3, we let the kernel K(·) for η̂k(v) be the standard Epanechnikov function. The smoothing parameter h was chosen by maximizing ℓcv(h) defined in Section 3 with a 10-fold cross-validation procedure. This results in h = 0.97. Lastly, we let the 2nd and 98th percentiles of the empirical distribution of V̂ be the boundary points of ℐ. We then constructed pointwise and simultaneous confidence intervals for {ηk(v), v ∈ ℐ} with M = 1000 realizations of the random sample from the unit exponential distribution for the perturbation-resampling procedure. In Figure 1(b), for the prostate cancer 5-year mortality rate estimation, we present the point estimates {η̂1(v), v ∈ ℐ} with the solid curve, the 0.95 pointwise intervals (enclosed by dotted curves) and the simultaneous band (gray shaded zone). For example, the estimated average prostate cancer mortality rate for patients with an index score of −4.5 was 0.012, with a 95% simultaneous confidence interval of (0.0006, 0.17) and a 95% pointwise confidence interval of (0.002, 0.05), while the estimated average prostate cancer mortality rate for patients with an index score of −0.9 was 0.35, with a 95% simultaneous confidence interval of (0.27, 0.41) and a 95% pointwise confidence interval of (0.30, 0.38). In Figure 1(c) and (d), we present their counterparts for cardiovascular disease related death and death from other causes.

Note that the 5-year rate from “other causes” is rather flat over v. On the other hand, for patients with low risk scores (< −2), the prostate cancer death rates are low, but the CV mortality rates are high. Therefore, for this group of future patients treated with high doses of DES, one would closely monitor the patients’ CV function. For patients with high risk scores (> −2), it seems that high-dose DES may not be a good choice for treating prostate cancer.

Now, suppose that we are also interested in predicting a short-term competing risk profile. To this end, we let t0 = 2 (years). We fitted the data with a parametric working model (2.2). Here, X = (1, U′)′. The estimated regression coefficients are given in Table 2(b). Note that these estimates appear to be markedly different from those for the case with t0 = 5 (years), suggesting that the risk score system may depend on the time point of interest. Using the same setting as that for the above long-term competing risk prediction problem, the resulting smoothing parameter value h is 1.29. The corresponding profiles for the dependent risks are given in Figure 2. For the present case, the mortality rates for CV death and “other causes” death are relatively flat over the entire range of the index score. On the other hand, it appears that high-dose DES works well for patients whose risk scores are lower than −2.

Figure 2.


Consistent estimates (solid curve), pointwise 0.95 confidence intervals (enclosed by dotted curves) and simultaneous intervals (gray area) for various risks ηk(v) at t0 =2 years: (a) The density function for the index score; (b) Inference for η1(v); (c) inference for η2(v) ; (d) Inference for η3(v).

5. A NUMERICAL STUDY FOR EXAMINING THE PERFORMANCE OF THE NEW PROCEDURE

We conducted an extensive numerical study to examine the performance of our proposal under practical settings. Instead of using the proposed two-stage procedure, one may adapt a generalized additive model (GAM) approach for multinomial outcome data (Yee and Wild, 1996; Yee, 2010) to estimate the cell probability πk(U) in the presence of censoring. We found that our procedure generally performs better than such a semi-parametric method. For example, in one of the simulation settings, we mimicked the prostate cancer example and considered the case with K = 4 and a 3 × 1 vector of covariates, U = (U(1), U(2), U(3))′. For each subject, we first generated the correlated “latent” times D1, D2 and D3 to three distinct types of death (the fourth cell is for survivors beyond t0) via the following log-linear model:

$$\begin{aligned} \log D_1 &= \alpha_1 U^{(1)} + \alpha_2 U^{(2)} + e_1, \\ \log D_2 &= \alpha_1 U^{(2)} + \alpha_2 U^{(3)} + e_2, \\ \log D_3 &= \alpha_1 U^{(3)} + \alpha_2 U^{(1)} + e_3, \end{aligned}$$

where the random error (e1, e2, e3) follows a trivariate normal distribution with means 0, variances 1 and covariances ρ = 0.5, and the baseline covariates U(1), U(2) and U(3) are independent standard normal variables. Then T̃ is the minimum of D1, D2 and D3, and the event indicator ε is defined accordingly.

The censoring time C was assumed to be Unif [0, ξ], where ξ was chosen to yield a certain pre-specified censoring level with the corresponding parameter values α’s in the above model. For each simulation setting, 10,000 simulated data sets were generated to examine the performance of our procedure.
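A data-generating step of this kind might be sketched as follows. The function name, argument names and return layout are assumptions made for the example, and the classification of subjects surviving beyond t0 into the fourth cell is left to the caller.

```python
import numpy as np

def simulate_competing_risks(n, alpha1, alpha2, xi, rho=0.5, seed=0):
    """Generate one simulated data set from the log-linear latent-time model.

    Returns (U, T, delta, cause): covariates, observed time min(T_tilde, C),
    event indicator, and cause label in {1, 2, 3} (0 when censored).
    """
    rng = np.random.default_rng(seed)
    U = rng.standard_normal((n, 3))                  # independent N(0, 1) covariates
    cov = np.full((3, 3), rho)
    np.fill_diagonal(cov, 1.0)                       # unit variances, covariance rho
    e = rng.multivariate_normal(np.zeros(3), cov, size=n)
    log_D = np.column_stack([
        alpha1 * U[:, 0] + alpha2 * U[:, 1] + e[:, 0],
        alpha1 * U[:, 1] + alpha2 * U[:, 2] + e[:, 1],
        alpha1 * U[:, 2] + alpha2 * U[:, 0] + e[:, 2],
    ])
    D = np.exp(log_D)
    T_tilde = D.min(axis=1)                          # time to the first event
    eps = D.argmin(axis=1) + 1                       # which cause came first
    C = rng.uniform(0.0, xi, size=n)                 # Unif[0, xi] censoring
    T = np.minimum(T_tilde, C)
    delta = (T_tilde <= C).astype(int)
    return U, T, delta, delta * eps
```

For example, `simulate_competing_risks(200, 1.0, 1.0, 15.0)` produces one data set under the first parameter setting with ξ = 15; repeating the call with different seeds gives independent replicates.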

With each simulated data set and a fixed t0, the estimate of πk(Ui) for the ith subject using our procedure is η̂k(β̂′Xi), where Xi = (1, Ui)′, and β̂ and η̂k(·) were obtained via (2.4) and (2.10), respectively. We calculated 10,000 mean squared errors

$$\frac{1}{n}\sum_{i=1}^{n}\sum_{k=1}^{K-1} \{\hat{\eta}_k(\hat{\beta}' X_i) - \pi_k(U_i)\}^2$$

and then used the resulting sample mean, the empirical MSE (EMSE), to measure the performance of our procedure.
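The performance measure can be written compactly in code. The sketch below assumes the estimated and true cell probabilities are stored as n × (K − 1) arrays; both function names are hypothetical.

```python
import numpy as np

def mean_squared_error(eta_hat, pi_true):
    """Average over subjects of the summed squared errors across the K-1 cells."""
    diff = np.asarray(eta_hat, dtype=float) - np.asarray(pi_true, dtype=float)
    return float(np.mean(np.sum(diff ** 2, axis=1)))

def empirical_mse(per_dataset_mse):
    """EMSE: the sample mean of the per-data-set mean squared errors."""
    return float(np.mean(per_dataset_mse))
```

Note that the true πk(Ui) are available only in a simulation, where the data-generating model is known, so this measure cannot be computed for the DES data.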

For the GAM approach, we used the following conventional working model:

$$\log\{\pi_k(U)/\pi_4(U)\} = g_{0k} + g_{1k}(U^{(1)}) + g_{2k}(U^{(2)}) + g_{3k}(U^{(3)}), \quad k = 1, 2, 3,$$

where g0k is an unknown constant, and the functions glk are completely unspecified, but E{glk(U(l))} = 0, l = 1, 2, 3. To fit the censored data, we adapted the GAM methodology for multinomial data developed by Yee and Wild (1996). Specifically, we multiplied the weight matrix in their objective function [denoted by Wi in eq. (6) of Yee and Wild (1996)] by a factor of wi/Ĝ(Ti ∧ t0) [defined below (2.4)] to adjust for right censoring. Then, we used a B-spline with 3 degrees of freedom for each function glk(·) via the VGAM algorithm (Yee, 2010). With 10,000 simulated data sets, we calculated the corresponding EMSE with the GAM procedure.

In Table 3, we report the EMSEs under various sample sizes n, α’s in modeling the latent times, and ξ for censoring. For all the cases, our procedure has smaller EMSEs than the GAM-based method. Note that the GAM, though flexible, is only a working model. Therefore, we do not expect the resulting estimators for the cell probabilities to be consistent.

Table 3.

Empirical mean squared errors for the new method and a procedure based on generalized additive models, with parameters α’s, for estimating {πk(·)} at time point t0 with sample size n and censoring distribution U(0, ξ). Column headings give (t0, ξ).

| Setting | Method | (2.7, 15) | (5, 15) | (1.25, 15) | (2.7, 25) | (5, 25) | (1.25, 25) |
|---|---|---|---|---|---|---|---|
| α1 = α2 = 1 (n = 100) | Proposed Method | 0.125 | 0.204 | 0.052 | 0.128 | 0.202 | 0.064 |
| | GAM glogit | 0.191 | 0.344 | 0.089 | 0.219 | 0.276 | 0.106 |
| α1 = α2 = 1 (n = 200) | Proposed Method | 0.116 | 0.181 | 0.050 | 0.118 | 0.195 | 0.055 |
| | GAM glogit | 0.222 | 0.267 | 0.073 | 0.192 | 0.277 | 0.084 |
| α1 = α2 = 1 (n = 400) | Proposed Method | 0.110 | 0.161 | 0.048 | 0.116 | 0.179 | 0.046 |
| | GAM glogit | 0.185 | 0.250 | 0.074 | 0.164 | 0.258 | 0.068 |
| α1 = 0.5, α2 = 2.5 (n = 100) | Proposed Method | 0.130 | 0.186 | 0.059 | 0.119 | 0.203 | 0.055 |
| | GAM glogit | 0.228 | 0.342 | 0.093 | 0.181 | 0.303 | 0.102 |
| α1 = 0.5, α2 = 2.5 (n = 200) | Proposed Method | 0.119 | 0.174 | 0.051 | 0.112 | 0.201 | 0.050 |
| | GAM glogit | 0.183 | 0.254 | 0.076 | 0.179 | 0.278 | 0.088 |
| α1 = 0.5, α2 = 2.5 (n = 400) | Proposed Method | 0.117 | 0.166 | 0.046 | 0.112 | 0.180 | 0.044 |
| | GAM glogit | 0.173 | 0.242 | 0.077 | 0.176 | 0.242 | 0.076 |

6. REMARKS

It is important to note that in this paper the index scoring system is constructed based on the contrast between the primary event rate and the average of all other competing risks at a specific time point via (2.2). In general, it is difficult, if not impossible, to create a univariate scoring system for grouping the subjects, which is sensitive to differentiating subject-level risks of all causes. On the other hand, for some specific situations, one may be able to construct a “sharper” index score. For example, in the DES study, since we are particularly concerned about the fatal cardiovascular risks with the high DES dose treatment, for each subject a modified score may be defined as a contrast of two univariate scores, one is β̂′X utilized in this article, and the other one is derived by modeling the CV death rate π2(U) via (2.2).

In this paper, we are interested in estimating the competing risks at a fixed time point (or a set of time points). We find that in general, for a subject with a covariate vector U, its score index for predicting long term risks can be quite different from that for short term risks. If a single score system is needed without a specific set of time points of interest, one may fit the data with a Cox-type model for the conditional cumulative incidence function (Fine and Gray, 1999; Cheng et al., 1998), say, for example, of the time to prostate cancer death in the DES example. The resulting risk estimates η̂ k(v), k = 1, … ,K − 1, in (2.5) are functions of time t. It would be interesting to examine the properties of these estimates as processes of t for a fixed risk index v. Cheng et al. (1998) proposed parametric counterparts of such estimators, but their estimators are likely biased when the models are not correctly specified.

ACKNOWLEDGEMENTS

The authors thank the Editor (Professor Thomas A. Louis), the AE and one anonymous referee for their insightful suggestions. The work was partially supported by the NIH grants.

Footnotes

Supplementary Materials

Web Appendices A and B, referenced in Section 3, are available under the Paper Information link at the Biometrics website http://www.tibs.org/biometrics.

REFERENCES

  1. Andersen PK. Competing risks as a multi-state model. Statistical Methods in Medical Research. 2002;11:203–215. doi: 10.1191/0962280202sm281ra.
  2. Bair E, Tibshirani R. Semi-Supervised Methods to Predict Patient Survival from Gene Expression Data. PLoS Biology. 2004;2(4):e108. doi: 10.1371/journal.pbio.0020108.
  3. Bair E, Hastie T, Paul D, Tibshirani R. Prediction by supervised principal components. Journal of the American Statistical Association. 2006;101:119–137.
  4. Benichou J, Gail M. Variance calculations and confidence intervals for estimates of the attributable risk based on logistic models. Biometrics. 1990;46:991–1003.
  5. Bickel P, Rosenblatt M. On Some Global Measures of the Deviations of Density Function Estimates. Annals of Statistics. 1973;1:1071–1095.
  6. Byar DP, Green SB. The choice of treatment for cancer patients based on covariate information. Bulletin du Cancer. 1980;67:477–490.
  7. Cai T, Tian L, Uno H, Solomon S, Wei LJ. Calibrating Parametric Subject-specific Risk Estimation. Biometrika. 2010; in press. doi: 10.1093/biomet/asq012.
  8. Cheng SC, Fine J, Wei LJ. Prediction of Cumulative Incidence Function under the Proportional Hazards Model. Biometrics. 1998;54:219–228.
  9. Cox D. Regression Models and Life-Tables. Journal of the Royal Statistical Society, Series B (Methodological). 1972;34:187–220.
  10. Fan J, Gijbels I. Local polynomial modelling and its applications. Boca Raton, Florida: CRC Press; 2006.
  11. Fine J. Regression modeling of competing crude failure probabilities. Biostatistics. 2001;2:85–97. doi: 10.1093/biostatistics/2.1.85.
  12. Fine J, Gray R. A Proportional Hazards Model for the Subdistribution of a Competing Risk. Journal of the American Statistical Association. 1999;94:496–514.
  13. Gaynor JJ, Feuer E, Tan C, Wu D, Little C, Straus D. On the Use of Cause-Specific Failure and Conditional Failure Probabilities: Examples From Clinical Oncology Data. Journal of the American Statistical Association. 88:400–409.
  14. Gelman R, Gelber R, Henderson IC, Coleman CN, Harris JR. Improved methodology for analyzing local and distant recurrence. Journal of Clinical Oncology. 1990;8:548–555. doi: 10.1200/JCO.1990.8.3.548.
  15. Gilbert PB, Wei LJ, Kosorok MR, Clemens JD. Simultaneous Inferences on the Contrast of Two Hazard Functions with Censored Observations. Biometrics. 2004;58:773–780. doi: 10.1111/j.0006-341x.2002.00773.x.
  16. Goldhirsch A, Gelber RD, Price KN, et al. Effect of systemic adjuvant treatment of first sites of breast cancer relapse. Lancet. 1994;343:377–381. doi: 10.1016/s0140-6736(94)91221-1.
  17. Härdle W, Hall P, Marron JS. How far are automatically chosen regression smoothing parameters from their optimum? (with discussion). Journal of the American Statistical Association. 1988;83:86–101.
  18. Kalbfleisch JD, Prentice RL. The Statistical Analysis of Failure Time Data. John Wiley & Sons; 2002.
  19. Klein JP, Andersen P. Regression modeling of competing risks data based on pseudovalues of the cumulative incidence. Biometrics. 2005;61:223–229. doi: 10.1111/j.0006-341X.2005.031209.x.
  20. Klein J. Modelling competing risks in cancer studies. Statistics in Medicine. 2006;25:1015–1034. doi: 10.1002/sim.2246.
  21. Korn E, Dorey F. Applications of crude incidence curves. Statistics in Medicine. 1992;11:813–829. doi: 10.1002/sim.4780110611.
  22. Lawless J. Statistical models and methods for lifetime data. New Jersey: John Wiley & Sons, Inc.; 2003.
  23. Li Y, Tiwari R, Guha S. Mixture Cure Survival Models with Dependent Censoring. Journal of the Royal Statistical Society, Series B. 2007;69:285–306.
  24. Mammen E. Bootstrap and Wild Bootstrap for High Dimensional Linear Models. Annals of Statistics. 1993;21:255–285.
  25. Pepe M, Mori M. Kaplan-Meier, marginal or conditional probability curves in summarizing competing risks failure time data? Statistics in Medicine. 1993;12:737–751. doi: 10.1002/sim.4780120803.
  26. Scheike T, Zhang M, Gerds T. Predicting cumulative incidence probability by direct binomial regression. Biometrika. 2008;95:205–220.
  27. Tian L, Zucker D, Wei LJ. On the Cox Model With Time-Varying Regression Coefficients. Journal of the American Statistical Association. 2005;100:172–183.
  28. Tian L, Cai T, Goetghebeur E, Wei LJ. Model evaluation based on the sampling distribution of estimated absolute prediction error. Biometrika. 2007;94:297–311.
  29. Tusnady G. A remark on the approximation of the sample distribution function in the multidimensional case. Periodica Mathematica Hungarica. 1977;8:53–55.
  30. Tsiatis A. A nonidentifiability aspect of the problem of competing risks. Proceedings of the National Academy of Sciences. 1975;72:20–22. doi: 10.1073/pnas.72.1.20.
  31. Uno H, Cai T, Tian L, Wei LJ. Evaluating Prediction Rules for t-Year Survivors With Censored Regression Models. Journal of the American Statistical Association. 2007;102:527–537.
  32. Yee TW, Wild CJ. Vector generalized additive models. Journal of the Royal Statistical Society, Series B. 1996;58:481–493.
  33. Yee TW. The VGAM package for categorical data analysis. Journal of Statistical Software. 2010;32:1–34.
  34. Zheng Y, Heagerty P. Prospective Accuracy for Longitudinal Markers. Biometrics. 2007;63:332–341. doi: 10.1111/j.1541-0420.2006.00726.x.
