Addressing issues associated with evaluating prediction models for survival endpoints based on the concordance statistic

Ming Wang; Qi Long

doi:10.1111/biom.12470

. Author manuscript; available in PMC: 2016 Sep 7.

Published in final edited form as: Biometrics. 2016 Jan 12;72(3):897–906. doi: 10.1111/biom.12470

Addressing issues associated with evaluating prediction models for survival endpoints based on the concordance statistic

Ming Wang ¹, Qi Long ²

PMCID: PMC4940324 NIHMSID: NIHMS757525 PMID: 26756274

Summary

Prediction models for disease risk and prognosis play an important role in biomedical research, and evaluating their predictive accuracy in the presence of censored data is of substantial interest. The standard concordance (c) statistic has been extended to provide a summary measure of predictive accuracy for survival models. Motivated by a prostate cancer study, we address several issues associated with evaluating survival prediction models based on c–statistic with a focus on estimators using the technique of inverse probability of censoring weighting (IPCW). Compared to the existing work, we provide complete results on the asymptotic properties of the IPCW estimators under the assumption of coarsening at random (CAR), and propose a sensitivity analysis under the mechanism of noncoarsening at random (NCAR). In addition, we extend the IPCW approach as well as the sensitivity analysis to high-dimensional settings. The predictive accuracy of prediction models for cancer recurrence after prostatectomy are assessed by applying the proposed approaches. We find that the estimated predictive accuracy for the models in consideration is sensitive to NCAR assumption, and thus identify the best predictive model. Finally, we further evaluate the performance of the proposed methods in both settings of low-dimensional and high-dimensional data under CAR and NCAR through simulations.

Keywords: Predictive accuracy, Survival endpoint, Concordance statistic, Inverse probability of censoring weighting, Coarsening at random, Noncoarsening at random, Sensitivity analysis

1. Introduction

Prediction models play an important role for assessing disease risk and prognosis (Freedman et al., 2005). They can be used to identify individuals at high risk, plan interventional trials, and improve personalized prevention and treatment strategies. Furthermore, they can also estimate the population burden, the cost of disease, and the impact of potential interventions or treatments. As a result, research in prediction models for disease risk and prognosis has drawn substantial interest in recent years (Freedman et al., 2009; Optenberg et al., 1997).

1.1 A Prostate Cancer Study

This work is motivated by a prostate cancer study where the outcome of interest is the time (months) to prostate cancer recurrence after prostatectomy. One of the main objectives is to find out an effective prediction model of future tumor recurrence after surgery to determine whether an immediate adjuvant therapy is warranted, thus a valid predictive accuracy evaluation is of the utmost importance. In this study, patients diagnosed with prostate cancer underwent radical prostatectomy, and their prostatectomy specimens were immediately collected after the surgery to be formalin-fixed and paraffin-embedded (FFPE). Subsequently, the RNA samples were isolated from these specimens with known long-term outcomes, and then mRNA and microRNA expression profiling were performed to obtain gene expression data by using a custom-designed panel of 1,536 probes (called biomarkers thereafter) of 522 prostate cancer relevant genes. In particular, a novel expression profiling platform, the DASL (cDNA-mediated Annealing, Selection, extension and Ligation) assay, was applied to allow a quantitative analysis of RNA from FFPE samples. In addition, the relevant clinical biomarkers were also collected, among which prostate-specific antigen (PSA) and Gleason scores were taken as key consideration due to their known association with prostate cancer risk. Using an earlier version of the prostate cancer data, Long et al. (2011) developed a robust partial linear accelerated failure time model relating recurrence risk to genomic biomarkers and the above two clinical risk factors. Later on, Long et al. (2011) identified a panel of protein-coding and microRNA biomarker signatures to predict cancer recurrence following prostatectomy. However, in these work, the predictive accuracy of the models was evaluated under a strong, likely an unrealistic assumption of independent censoring.

Since censoring is a special case of coarsened data, throughout we adopt the terminology from coarsened data to define the censoring mechanism (Heitjan and Rubin, 1991): 1) coarsening complete at random (CCAR), i.e., censoring is independent of the failure times; 2) coarsening at random (CAR), i.e., censoring is independent of the failure times conditional on the observed covariates; and 3) non-coarsening at random (NCAR), i.e., censoring is dependent on the failure times conditional on the observed covariates. The goal of our work is to evaluate predictive accuracy of the models adjusted for the censoring mechanisms of CAR and NCAR in the settings of low-dimensional and high-dimensional data.

1.2 Assessing Predictive Accuracy

In order to identify the optimal prediction model for subsequent applications in practice, it is crucial to develop robust predictive accuracy metrics for assessing and comparing different prediction models (Steyerberg et al., 2010). While it is fairly straightforward to assess prediction performance when all data are fully observed, it tends to be considerably challenging in the presence of censored data, which are ubiquitous in biomedical and epidemiologic research. In addition, it is common to assess prediction models for the censored data in observational studies where the censoring mechanism is likely to be CAR or NCAR (Siannis et al., 2005). Even if assessment is performed using the data from clinical trials, the censoring mechanism could still be CAR or NCAR due to informative dropouts or missing (Scharfstein et al., 1999). For such situations, the prediction assessment can be biased due to the ignorance of adjustment on censoring mechanisms.

While a considerable amount of work has been reported on development of predictive accuracy metrics (D’Agostino et al., 1997), there has been limited work on predictive accuracy metrics for censored data. The standard concordance (c) statistic as one of predictive accuracy measures has been extended to censored data (Harrell et al., 1996). For instance, Pencina and D’Agostino (2004) provided some insight on the relationship between the c–statistic and Kendall’s τ, and investigated its asymptotic normality; Gonen and Heller (2005) derived an analytical expression for the concordance probability under the framework of Cox Proportional Hazards (PH) models (Cox, 1972), which was robust to censoring but not invariant to monotone transformations of survival times. However, all of those have been developed under CCAR and/or limited to Cox PH models. As mentioned before, under CAR or NCAR, the standard c–statistic potentially leads to biased assessment of predictive accuracy or an estimate with a large variability (Uno et al., 2011; Gerds et al., 2013). Some recent work has attempted to modify the standard c–statistic. Uno et al. (2011) proposed a modified c–statistic by using the technique of inverse probability of censoring weighting (IPCW) that is valid under CCAR or CAR, where censoring weights were obtained by Kaplan-Meier (KM) estimates. Along similar lines, Gerds et al. (2013) proposed an alternative time-dependent IPCW c–statistic that is valid under CAR, which is closely related to our current work; though, they only provided the theoretical results on the consistency of the proposed estimator, and did not consider several complicated issues, such as NCAR and high-dimensional data.

As shown in the motivating study in Section 1.1, high-dimensional data such as genomic or proteomic data are commonly collected in biomedical research studies, presenting a challenge for evaluating prediction models because these data may provide additional information that is not captured by clinical variables. In particular, it is not trivial to incorporate the survival probabilities of censoring times particularly in the presence of high-dimensional data when estimating the IPCW c–statistic. While modern regularization methods have been developed including lasso (Tibshirani, 1997), ridge (Hoerl and Kennard, 1970), adaptive lasso (Zou, 2006), elastic net (Zou and Hastie, 2005) and so on, there is still not clear whether these techniques can improve the efficiency of the IPCW c–statistic. In the current work, we focus on lasso and elastic net, and figure out their applicability for obtaining the censoring weights when assessing prediction models in the high-dimensional setting. Furthermore, considering that censoring is likely to be CAR or even NCAR in the motivating study, we make the following contributions in addressing several issues associated with evaluation of prediction models with survival endpoints based on the IPCW c–statistic. First, we provide more complete theoretical results on the asymptotic properties of the IPCW c–statistic under CAR. Second, we propose a sensitivity analysis approach under NCAR. Third, we extend the proposed approaches to the case of high-dimensional data.

The remainder of the article is organized as follows. In Section 2, we formulate the problem, and introduce several types of c–statistics for censored data. We provide the asymptotic results for the extended IPCW c–statistics under CAR, propose a sensitivity analysis under NCAR, and investigate extensions of the proposed approaches to the setting of high-dimensional data. In Section 3, we apply the proposed approaches to the motivating study on prostate cancer. In Section 4, simulation results for comparing different estimators are provided, and the advantages of the IPCW c–statistic are illustrated afterwards. Note that our simulation studies are more extensive compared to the existing work by considering both of Cox PH models and accelerated failure time (AFT) models as well as high-dimensional settings. The conclusion with a brief discussion is provided in Section 5.

2. Methodology

Let 𝓓 _train and 𝓓 _test denote a training data set and a testing data set, which are independent with each other. 𝓓 _train containing n observations is used to build up a prediction model and 𝓓 _test including m observations is utilized to evaluate its prediction performance, where m/n is bounded away from 0 and ∞. 𝓓 _train and 𝓓 _test are obtained through either randomly splitting one data set into two or using data sets collected in two independent studies. The censoring mechanisms in the former case are expected to be the same in both 𝓓 _train and 𝓓 _test, whereas they may be different in the latter case. Without loss of generality, we assume that m = n, simplifying the derivation of the asymptotic results.

For observation i in 𝓓 = (𝓓 _train, 𝓓 _test), we denote the true failure time by T_i which is subject to censoring, the censoring time by C_i, the observed survival time by Y_i = min(T_i, C_i), and a set of p covariates by Z_i that are potentially associated with T_i and/or C_i. We also define δ_i = I(T_i ≤ C_i) as the event indicator. The observed data in 𝓓 are ${Y_{i}, δ_{i}, Z_{i}}_{i = 1}^{2 n}$ , where (Y_i, δ_i, Z_i)’s are assumed to be independent.

2.1 Predictive Accuracy Metric: Truncated c–statistic

Suppose that we are interested in assessing the predictive accuracy of any particular model for time-to-event data, denoted by ℳ₁, which is parameterized by β. ℳ₁ may be correctly-specified or mis-specified, a parametric or semi-parametric model, and a linear, partly linear, nonlinear, or varying coefficient regression model. For observation i, let S_i denote the predictive score under ℳ₁ with a higher value representing a higher survival probability. Thus, given any two independent observations 1 and 2, the predictive accuracy metric of interest for a pre-specified time point t is defined by the truncated c–statistic as

c (t) = P r (S_{1} < S_{2} ∣ T_{1} < T_{2}, T_{1} \leq t) .

(1)

where c(t) = 1 indicates the perfect discrimination. Of note, the truncated c–statistic is preferred for right censored data primarily because the tail part of the estimated survival function of the failure time is known to be unstable (Heagerty and Zheng, 2005; Uno et al., 2011; Gerds et al., 2013).

2.2 Estimation of Truncated c–statistic

The parameter estimates in ℳ₁, denoted by β̂, are obtained using 𝓓 _train through an estimation procedure, e.g., the partially likelihood approach for a Cox PH model. The corresponding estimated model is denoted by ℳ̂₁. For observation i in 𝓓 _test, the estimated predictive score under ℳ₁ is defined as S̃_i = ξ(β̂; Z_i), where ξ(·) is a smooth and monotone function determined by the model specification. For instance, if ℳ₁ is a Cox PH model, h(t; β, Z) = h₀(t) exp(β^TZ) with h₀(·) as the baseline hazard function, we can define S̃_i = −β̂^TZ_i; if ℳ₁ is a parametric AFT model, log(t; β, Z) = β^TZ + ε with the random error ε following a given parametric distribution, we can define S̃_i = β̂^TZ_i. Assume that β̂ converges in probability to a constant β₀, even if ℳ₁ is incorrectly specified; this holds under a mild non-separable condition when ℳ₁ is a Cox PH model (Uno et al., 2011; Zeng et al., 2004). It follows that the true predictive score under ℳ₁ is S_i = ξ(β₀; Z_i).

Under CCAR, one estimator of truncated c(t) is defined as

{\hat{c}}_{s}^{1} (\hat{β}; t) = \frac{\sum_{i \neq j} δ_{i} δ_{j} I ({\tilde{S}}_{i} < {\tilde{S}}_{j}, Y_{i} < Y_{j}, Y_{i} \leq t)}{\sum_{i \neq j} δ_{i} δ_{j} I (Y_{i} < Y_{j}, Y_{i} \leq t)},

(2)

where only the pairs with both observed survival times are used. If the pairs with only one censored outcome are also considered, it leads to the standard estimator of truncated c(t)

{\hat{c}}_{s}^{2} (\hat{β}; t) = \frac{\sum_{i \neq j} δ_{i} I ({\tilde{S}}_{i} < {\tilde{S}}_{j}, Y_{i} < Y_{j}, Y_{i} \leq t)}{\sum_{i \neq j} δ_{i} I (Y_{i} < Y_{j}, Y_{i} \leq t)} .

(3)

Note that both of these two estimators are consistent under a set of restrictive conditions including CCAR as described in Pencina and D’Agostino (2004), Uno et al. (2011), and Gerds et al. (2013).

To account for the censoring mechanism in evaluation of predictive accuracy, we consider a working model denoted by ℳ₂ relating C to Z, which is parameterized by η and represents the censoring mechanism. For example, we can use a Cox PH model for ℳ₂ with the conditional survival function for C as $π (c ∣ Z; η) = exp {- exp (η^{T} Z) \int_{0}^{c} h_{0} (u) d u}$ . The estimator of η, denoted by η̂, can be obtained using 𝓓 _test alone or using the whole data 𝓓 if the same censoring mechanism is anticipated in both 𝓓 _train and 𝓓 _test. Based on ℳ₂, we define the condition survival function for C by π(c|Z; η) = Pr(C > c|Z; η). Under CCAR, π(c|Z; η) = π(c). Under CAR, one estimator of truncated IPCW c(t) using only the pairs with both observed survival times is defined as

{\hat{c}}_{g}^{1} (\hat{β}; t) = \frac{\sum_{i \neq j} \frac{δ_{i} δ_{j}}{{\hat{π}}_{i} {\hat{π}}_{j}} I ({\tilde{S}}_{i} < {\tilde{S}}_{j}, Y_{i} < Y_{j}, Y_{i} \leq t)}{\sum_{i \neq j} \frac{δ_{i} δ_{j}}{{\hat{π}}_{i} {\hat{π}}_{j}} I (Y_{i} < Y_{j}, Y_{i} \leq t)},

(4)

where π̂_i = π(Y_i|Z_i; η̂) and π̂_j = π(Y_j|Z_j, η̂) based on the observed data from 𝓓 _test or 𝓓 . Similarly, the estimator of truncated IPCWc(t) as the extension of ${\hat{c}}_{s}^{2}$ is defined as

{\hat{c}}_{g}^{2} (\hat{β}; t) = \frac{\sum_{i \neq j} \frac{δ_{i}}{{\hat{π}}_{i} {\hat{π}}_{j}^{*}} I ({\tilde{S}}_{i} < {\tilde{S}}_{j}, Y_{i} < Y_{j}, Y_{i} \leq t)}{\sum_{i \neq j} \frac{δ_{i}}{{\hat{π}}_{i} {\hat{π}}_{j}^{*}} I (Y_{i} < Y_{j}, Y_{i} \leq t)},

(5)

with ${\hat{π}}_{j}^{*} = π (Y_{i} ∣ Z_{j}; \hat{η})$ . As alluded to previously, ${\hat{c}}_{g}^{2} (\hat{β}; t)$ has a similar format to the estimators investigated in Uno et al. (2011) and Gerds et al. (2013) but with different models for estimating censoring weights. In the following sections, we investigate several issues associated with these estimators that have not been investigated yet.

2.3 Asymptotic Properties of ${\hat{c}}_{g}^{2} (\hat{β}; t)$ under CAR

Theorem 2.1

Under the regularity conditions of (C1)–(C3) in Web Appendix, ${\hat{c}}_{g}^{2} (\hat{β}; t)$ is a consistent estimator of c(t), i.e., ${\hat{c}}_{g}^{2} (\hat{β}; t) \overset{p}{\to} c (t)$ . Furthermore, $\sqrt{n} ({\hat{c}}_{g}^{2} (\hat{β}; t) - c (t))$ converges weakly to a zero-mean Gaussian process.

The asymptotic properties of ${\hat{c}}_{g}^{2} (\hat{β}; t)$ under CAR are provided in Theorems 2.1 above. Gerds et al. (2013) reported similar results on consistency but our results on asymptotic normality are new. A sketch of the proof for Theorems 2.1 is provided in the Supplementary Materials. The asymptotic results for ${\hat{c}}_{g}^{1} (\hat{β}; t)$ can be derived along similar lines.

2.4 Sensitivity Analysis under NCAR

In the case of NCAR, we propose a sensitivity analysis approach, similar in spirit to the work by Siannis et al. (2005) and by Long et al. (2011). Specifically, we extend ℳ₂ for C to include T as a predictor, and it follows that the conditional survival function for C becomes π(C|Z, T; η, α) with the sensitivity parameter α quantifying the association between C and T conditional on Z. Since T is not observed for all subjects, (η, α) cannot be jointly estimated from the observed data. As a result, α is fixed to a set of values in sensitivity analysis. Based on the extended ℳ₂, we obtain censoring weights as π̂_i = π(C_i|Z_i, T_i; η̂, α), and then compute ${\hat{c}}_{g}^{1} (\hat{β}; t)$ and ${\hat{c}}_{g}^{2} (\hat{β}; t)$ for a range of pre-specified α values.

To illustrate the idea more clearly, we consider the case where ℳ₂ is a Cox PH model with the conditional hazard function h(c|Z, T) = h₀(c) exp (η^TZ + αT). Then, α = 0 implies CAR (i.e., C ⊥ T|Z) and α ≠ 0 implies NCAR (i.e., C ⊥̸ T|Z). The resulting survival function for C is π(c|Z, T; η, α) = exp {−Λ₀(c) exp(η^TZ + αT)} with the baseline cumulative hazard $Λ_{0} (c) = \int_{0}^{c} h_{0} (u) d u$ . We propose the following estimating equation to estimate η for each pre-specified value α₀:

U (η; α_{0}, Z_{i}, Y_{i}) = \sum_{i = 1}^{n} (\frac{δ_{i}}{{\hat{π}}_{i}} - 1) g (Z_{i})

(6)

where π̂_i = π̂(Y_i|Z_i, T_i; η, α₀) = exp {−Λ̂₀(Y_i) exp(η^TZ_i+α₀Y_i)} and g(Z_i) is a known function that maps Z_i to ℛ^p with p as the dimension of η, e.g., g(Z_i) = Z_i, which is used in the current work. It can be shown that the set of estimation equations (6) is unbiased for any g(Z_i), i.e., E{U(η; α, Z_i, Y_i)} = 0, if ℳ₂ is correctly specified. The estimate η̂ can be obtained using a Newton-Raphson algorithm; however, the challenge is to estimate the cumulative baseline hazard Λ̂₀(C_i). Along the lines of Scharfstein et al. (1999), we derive the modified non-parametric profile estimator adjusted to our current set-up based on the fact that E[δπ(u|Z,η)/π(Y|Z,η)|Z, Y, C ≥ u] = 1. The estimation of η and the estimation of Λ₀(C_i) are conducted iteratively until convergence and the algorithm is provided in Web Appendix B. We then compute π̂ using η̂ obtained from the estimating equation (6). Subsequently, the predictive accuracy estimators, ${\hat{c}}_{g}^{1}$ and ${\hat{c}}_{g}^{2}$ , are obtained based on (4) and (5) respectively by plugging in π̂.

2.5 High-dimensional Data

We investigate the approaches for obtaining ${\hat{c}}_{g}^{1} (\hat{β}; t)$ and ${\hat{c}}_{g}^{2} (\hat{β}; t)$ for ℳ₁ in the presence of high-dimensional data Z_i = {Z_i₁, Z_i₂, ···, Z_ip}^T with a relatively large p or p >n. For the purpose of exposition, we consider the case where the censoring model ℳ₂ is a Cox PH model with π(c|Z,η) = exp {−Λ₀(c) exp(η^TZ)}, and η can be estimated by maximizing the partial likelihood function denoted by 𝓟 ℒ(η; c) in the presence of low-dimensional data. For the high-dimensional setting, we propose to fit ℳ₂ using the penalization approach.

2.5.1 The Case of CAR

In the presence of high-dimensional covariates, we consider the use of the lasso and the elastic net penalties to obtain censoring weights. The penalized parameter estimates for the censoring model ℳ₂ are defined by

\hat{η} = {argmin}_{η} {- P L (η; c) + λ P (η; k)}

(7)

where $P (η; k) = P_{L_{1}} (η; k) + P_{L_{2}} (η; k) = k \sum_{i = 1}^{p} ∣ η_{i} ∣ + (1 - k) \sum_{i = 1}^{p} η_{i}^{2}$ , tuning parameter λ >0 indicates the amount of penalization, and k ∈(0, 1] quantifies the control between sparsity and group effect. When k = 1, P(η; k) becomes the lasso penalty (ℒ₁). We use the R functions, cv.glmnet and glmnet, to fit the penalized Cox PH model for censoring time, where the optimal λ is selected based on cross-validation. In particular, two approaches are considered for η̂ in ℳ₂. In the first approach, we obtain η̂ directly from the penalized Cox PH model; in the second approach, we use the penalized Cox PH model to conduct variable selection among Z and then fit a standard Cox PH model using the selected predictors. Using η̂, we first compute π̂(c|Z, η̂), and subsequently ${\hat{c}}_{g}^{1} (\hat{β}; t)$ and ${\hat{c}}_{g}^{2} (\hat{β}; t)$ .

2.5.2 The Case of NCAR

In the case of NCAR, it is not trivial to extend the sensitivity analysis in Section 2.4 to handle high-dimensional covariates. One intuitive two-step approach is as follows. The first step is to fit a regularized regression for the censoring model as described in Section 2.5.1 to select a subset of predictors that are potentially associated with censoring time. The second step is to conduct the sensitivity analysis proposed in Section 2.4 by utilizing only the variables selected from the first step, that is, compute ${\hat{c}}_{g}^{1} (\hat{β}; t)$ and ${\hat{c}}_{g}^{2} (\hat{β}; t)$ for a range of specified α values where the censoring weight π̂ is obtained using the subset of predictors selected from the first step. This approach is straightforward and can be implemented using existing software, which is adopted for the motivating data analysis.

An alternative approach is based on penalized estimating equations (Fu, 2003; Johnson et al., 2008; Johnson, 2008), allowing for simultaneous variable selection and parameter estimation. Given the proposed estimating equation (7) with a pre-specified value of α as α₀, we can have $U^{P} (η; α_{0}, Z_{i}, Y_{i}) = \sum_{i = 1}^{n} (\frac{δ_{i}}{{\hat{π}}_{i}} - 1) g (Z_{i}) - n λ \dot{P} (η; α_{0})$ , where $\dot{P} (η; α_{0}) = \frac{\partial P (η; α_{0})}{\partial η}$ is the derivative of the penalized estimating equations with respect to η. The validity of this approach in the settings of our interest will be pursued for future research.

3. Data Analysis

The prostate cancer study includes 108 patients with a censoring rate of 61.1%. The observed failure times with the range of 2–145 months are shown in Figure 1. PSA and all gene expression values are standardized to have mean 0 and unit standard derivation, and Gleason scores are dichotomized into binary categories as less than 7 or not (33 patients with Gleason scores <7 and 75 patients with Gleason scores ≥7). In the exploratory analysis, the genomic biomarkers are ranked and labeled according to the score-test p–values based on univariate Cox PH models for the failure time from the lowest to the highest, which is used as a pre-screening step in our analysis (Fan and Lv, 2008; Zhao and Li, 2012). In addition, the score-test p–values for PSA and Gleason scores for the failure time are 0.13 and 0.01 respectively, and those for the censoring time are 0.87 and 0.32 respectively. Here, we consider both possible censoring mechanisms, CAR and NCAR, for assessing predictive accuracy of prediction models of interest. Of note, the Lasso Cox PH model for the censoring time select 25 genomic biomarkers with non-zero parameter estimates.

The KM curves for the failure time and censoring time in the prostate cancer study

Under CAR, we evaluate five prediction models: 1) Cox PH model including PSA and Gleason (Model I); 2) Cox PH model including PSA, Gleason and biomarkers 2, 4, 8, 11, 14, 16, 22, 31, 46, 52, 63 selected in Long et al. (2011) using an earlier version of the data (Model II); 3) Cox PH model including PSA, Gleason and biomarkers 2, 3, 5, 6, 7, 13, 15, 26, 32, 43, 56, 83 selected from all 1,536 biomarkers based on Lasso Cox PH (Model III); 4) Cox PH model including PSA, Gleason and biomarkers 2, 5, 6, 7, 10, 13, 15 selected from the top 25 biomarkers based on Lasso Cox PH (Model IV); 5) Cox PH model including PSA, Gleason and biomarkers 2, 5, 6, 7, 10 selected from the top 10 biomarkers based on Lasso Cox PH (Model V). We adopt the following procedures for assessing predictive accuracy.

The observed data are randomly split into two datasets, a training data 𝓓 _train (50%) and a testing data 𝓓 _test (50%).
The prediction model of interest is fitted using the 𝓓 _train, and then the parameter estimates are used to compute the predictive risk scores for 𝓓 _test.
The censoring model is fitted using the whole data, and the estimated parameters are used for 𝓓 _test to obtain censoring weights.
The predictive accuracy estimated by ${\hat{c}}_{s}^{2}$ and ${\hat{c}}_{g}^{2}$ for each model are calculated using 𝓓 _test.

We repeat the above procedures 1,000 times, and report the average and standard deviation (SD) of the estimates for c(t) with t as the 25% percentile (16-month), the 75% percentile (68-month) and the maximum (164-month) of the follow-up time. The results under the assumption of CAR are summarized in Table 1. Two approaches of estimation of censoring weights are considered, Lasso-Cox1 and Lasso-Cox2, where Lasso-Cox1 estimates the censoring weights from a Cox PH model using the selected variables based on a Lasso Cox PH model (Tibshirani, 1997) for censoring time; Lasso-Cox2 utilizes censoring weights estimated directly from a Lasso Cox PH model (Zou, 2006; Yang and Zou, 2013). For all three truncated time points (16-month, 68-month and 164-month), Model III always has the highest estimates of predictive accuracy and is followed by Model IV, Model V, Model I and Model II across different methods for censoring weights. Also, the standard deviations from Model II are strictly the largest, representing the worst performance among all models. Of note, the predictive biomarkers in Model II (Long et al., 2011) were initially selected from an early version of the prostate cancer data and our estimates accounting for CAR are higher than what was reported in the previous work (Long et al., 2011). On the other hand, there still exist various degrees of differences between ${\hat{c}}_{s}^{2}$ and ${\hat{c}}_{g}^{2}$ , showing the necessity to adjust for the censoring mechanism. In addition, the biomarkers are correlated with mean pairwise correlation 0.50 and median pairwise correlation 0.53; we repeat the assessment using the elastic net penalty for obtaining censoring weights, and find that Model III remains the best prediction model with the highest predicted accuracy estimates (the results are not shown).

Table 1.

Summary of predictive accuracy estimates for five models of interest. Lasso-Cox1: censoring weights are estimated from a Cox PH model using the selected variables based on a Lasso-Coxph model. Lasso-Cox2: censoring weights are estimated directly from a Lasso-Cox model.

Model

{\hat{c}}_{s}^{2} (S D)

{\hat{c}}_{g}^{2} (S D)

{\hat{c}}_{s}^{2} (S D)

{\hat{c}}_{g}^{2} (S D)

{\hat{c}}_{s}^{2} (S D)

{\hat{c}}_{g}^{2} (S D)

t_Q₂₅=16(month)

t_Q₇₅=68(month)

t_MAX=164(month)

Lasso-Cox1

Model I

0.599(0.053)

0.599(0.054)

0.587(0.049)

0.590(0.052)

0.588(0.050)

0.589(0.056)

Model II

0.561(0.097)

0.540(0.068)

0.541(0.068)

0.539(0.065)

0.543(0.068)

Model III

0.723(0.072)

0.723(0.071)

0.705(0.056)

0.707(0.057)

0.704(0.055)

0.710(0.058)

Model IV

0.683(0.070)

0.680(0.071)

0.666(0.055)

0.672(0.057)

0.666(0.054)

0.668(0.059)

Model V

0.670(0.075)

0.665(0.076)

0.638(0.058)

0.638(0.060)

0.641(0.057)

0.638(0.061)

Lasso-Cox2

Model I

0.599(0.053)

0.577(0.103)

0.587(0.049)

0.545(0.084)

0.588(0.050)

0.546(0.084)

Model II

0.561(0.097)

0.550(0.124)

0.540(0.068)

0.528(0.098)

0.539(0.065)

0.528(0.098)

Model III

0.723(0.072)

0.726(0.095)

0.705(0.056)

0.715(0.081)

0.704(0.055)

0.715(0.081)

Model IV

0.683(0.070)

0.669(0.101)

0.666(0.055)

0.663(0.080)

0.666(0.054)

0.663(0.080)

Model V

0.670(0.075)

0.600(0.116)

0.638(0.058)

0.579(0.089)

0.641(0.057)

0.580(0.088)

Open in a new tab

Model I: only PSA and Gleason;

Model II: PSA, Gleason and biomarkers 2,4,8,11,14,16,22,31,46,52,63 based on Long et al. (2011);

Model III: PSA, Gleason and biomarkers 2,3,5,6,7,13,15,26,32,43,56,83 from all 1536 biomarkers;

Model IV: PSA, Gleason and biomarkers 2,5,6,7,10,13,15 from the top 25 biomarkers;

Model V: PSA, Gleason and biomarkers 2,5,6,7,10 from the top 10 biomarkers;

To further assess predictive accuracy for Models I–V under NCAR, we conduct a sensitivity analysis using the first approach described in Section 2.4 to compute ${\hat{c}}_{g}^{2}$ with different values of the sensitivity parameter α. Based on earlier investigation, the top 100 biomarkers are included in estimation of censoring weights for sensitivity analysis, and three biomarkers are initially selected based on lasso Cox PH model. Figure 2 presents the result of sensitivity analysis with α₀ = −0.02, −0.015, −0.01, −0.005, 0, 0.005, 0.01, 0.015, 0.02; ${\hat{c}}_{g}^{2}$ ranges from 0.47–0.74 for Model I, 0.41–0.58 for Model II, 0.66–0.75 for Model III, 0.47–0.72 for Model IV, and 0.41–0.65 for Model V. There is a roughly decreasing trend for all estimators as the specified sensitivity parameter α₀ increases. While the impact of α on predictive accuracy measures is not negligible, the estimated predictive accuracy for Model III is always the highest except for α₀ = 0.005 but still close to the highest value, further lending support to the superior prediction performance of Model III over the other alternative models.

Sensitivity analysis for the Prostate Cancer study.

4. Simulation Studies

We conduct extensive simulations to evaluate the four estimators of the predictive metric c(t), namely, ${\hat{c}}_{s}^{1}, {\hat{c}}_{s}^{2}, {\hat{c}}_{g}^{1}$ and ${\hat{c}}_{g}^{2}$ , in settings with low-dimensional and high-dimensional covariates under CAR and NCAR. To benchmark the bias of the four estimators, we compute the “true” predictive accuracy metric with a large sample size of 1, 000, 000 given by

c_{true} (β_{0}; t) = \frac{\sum_{i \neq j} I (S_{i} < S_{j}, T_{i} < T_{j}, T_{i} \leq t)}{\sum_{i \neq j} I (T_{i} < T_{j}, T_{i} \leq t)}

(8)

where the underlying true failure times and the calculated predicted scores S using the true parameter β₀ are used. Note that c_true(β₀; t) cannot be calculated in real data. The failure time T and the censoring time C are generated from Cox-exponential models (Bender et al., 2005). Next, we will focus on the cases of CAR, and the simulation settings and corresponding results under NCAR are provided in the Supplementary Materials.

First, in the settings with low-dimensional covariates, Z = (Z₁, Z₂, Z₃)^T, T and C are assumed to follow exponential distributions with hazard functions given by

\begin{array}{l} h (T ∣ Z) = λ_{0 T} exp (β^{T} Z) = λ_{0 T} exp (β_{1} Z_{1} + β_{2} Z_{2} + β_{3} Z_{3}) \\ h (C ∣ Z) = λ_{0 C} exp (η^{T} Z) = λ_{0 C} exp (η_{1} Z_{1} + η_{2} Z_{2} + η_{3} Z_{3}) \end{array}

(9)

where the constant baseline hazards λ₀_T set as 0.1 or 0.6 and λ₀_C = 0.1, the parameters β = (β₁, β₂, β₃)^T = (−1, 0.5, 2)^T and η = (η₁, η₂, η₃)^T = (1, 1, 1)^T, and the covariates Z₁ ~ Unif[0, 1], Z₂ ~ Unif [−1, 1] and Z₃ ~ N (0, 1). The event rates denoted by ζ are around 0.3 (λ₀_T = 0.1) and 0.6 (λ₀_T = 0.6). We evaluate two prediction models, the correctly specified Cox PH model including all three covariates of Z₁, Z₂ and Z₃ and an incorrectly specified AFT model with the log-logistic distribution including all three covariates of Z₁, Z₂ and Z₃. Hence, there are four combinations of the prediction model for survival time (ℳ₁) and the working model for censoring time (ℳ₂), namely, Cox(ℳ₁)-Cox(ℳ₂), Cox(ℳ₁)-AFT(ℳ₂), AFT(ℳ₁)-Cox(ℳ₂), and AFT(ℳ₁)-AFT(ℳ₂). The impact of mis-specification of the censoring models on predictive accuracy estimation will be investigated.

Second, in the settings with p = 10 or 100 covariates, Z, T and C are also generated from exponential distributions with the following hazard functions

\begin{array}{l} h (T ∣ Z) = λ_{0 T} exp (β^{T} Z) = λ_{0 T} exp (β_{1} Z_{1} + 0 \times Z_{2} + β_{3} Z_{3} + 0 \times Z_{4} + β_{5} Z_{5} + \dots) \\ h (C ∣ Z) = λ_{0 C} exp (η^{T} Z) = λ_{0 C} exp (η_{1} Z_{1} + 0 \times Z_{2} + η_{3} Z_{3} + 0 \times Z_{4} + η_{5} Z_{5} + \dots) \end{array}

(10)

where the constant baseline hazards λ₀_T as 0.05 or 2 and λ₀_C = 0.2, and the parameters β = (−1, 0, 0.5, 0, 2, 0, ···, 0)^T and η = (1, 0, 1, 0, 1, 0, ···, 0)^T. The covariates Z are generated from a multivariate standard normal distribution MV N(0, Σ), where Σ is an identity matrix (independence among predictors) or an exchangeable matrix with a correlation coefficient ρ = 0.8 (high pairwise-correlation among predictors) only in the setting with p = 100. The event rates ζ are about 0.3 (λ₀_T = 0.05) and 0.8 (λ₀_T = 2). In these settings, we evaluate the prediction performance of the correctly specified Cox PH model including Z₁, Z₃ and Z₅.

Each simulated data set is equally divided into two subsets 𝓓 _train and 𝓓 _test with sample size n = 50, 100, 500. The observed survival time is defined as Y_i = min(T_i, C_i) and the event indicator is defined as δ_i = I(T_i ≤ C_i) (i = 1, 2, …, n). The estimated prediction models are obtained from 𝓓 _train, and the predictive scores are calculated from 𝓓 _test. The censoring weights are estimated using the whole data 𝓓 because the censoring mechanism is the same in 𝓓 _train and 𝓓 _test which are obtained by random splitting. In particular, in the first scenario, all three covariates Z = (Z₁, Z₂, Z₃)^T are included in the Cox PH model or AFT model for the censoring times; while for the second scenario, three approaches are used to obtain the censoring weights, Cox representing that the censoring weights are obtained from a Cox PH model, and Lasso-Cox1 and Lasso-Cox2 are the same as describe in Section 3. Thereafter, we calculate the four predictive accuracy estimators defined in Section 2 and their differences compared with c_true(β₀; t). Note that the truncated time t is chosen as 20, which is above 75% percentile of the failure times. To summarize simulation results, 1,000 Monte Carlo data sets are generated for each setting. For each estimator, we report the absolute value of Monte Carlo mean bias denoted by $Δ {\hat{c}}_{s}^{1}, Δ {\hat{c}}_{s}^{2}, Δ {\hat{c}}_{g}^{1}$ and $Δ {\hat{c}}_{g}^{2}$ with Monte Carlo SD in parentheses.

Table 2 presents the simulation results for the setting with p = 3. In all cases, ${\hat{c}}_{g}^{2}$ performs the best due to the smallest bias among all the estimators. In particular, when the censoring model is correctly specified, $Δ {\hat{c}}_{g}^{2}$ is negligible. ${\hat{c}}_{g}^{2}$ is fairly robust against the mis-specification of the censoring model as an AFT model when the censoring rate is not high. $Δ {\hat{c}}_{g}^{2}$ becomes slightly larger as the censoring rate and sample size increase. In addition, ${\hat{c}}_{s}^{1}$ and ${\hat{c}}_{g}^{1}$ are less efficient than ${\hat{c}}_{s}^{2}$ and ${\hat{c}}_{g}^{2}$ , likely as a result of ${\hat{c}}_{s}^{1}$ and ${\hat{c}}_{g}^{1}$ using less information compared to ${\hat{c}}_{s}^{2}$ and ${\hat{c}}_{g}^{2}; Δ {\hat{c}}_{g}^{1}$ and $Δ {\hat{c}}_{g}^{2}$ are smaller than $Δ {\hat{c}}_{s}^{1}$ and $Δ {\hat{c}}_{s}^{2}$ respectively, likely as a result of ${\hat{c}}_{g}^{1}$ and ${\hat{c}}_{g}^{2}$ using information from the censoring model in estimation.

Table 2.

Simulation results for the settings with p = 3 and t = 50.

ℳ₁

ℳ₂

Δ {\hat{c}}_{s}^{1} (S D)

Δ {\hat{c}}_{s}^{2} (S D)

Δ {\hat{c}}_{g}^{1} (S D)

Δ {\hat{c}}_{g}^{2} (S D)

c_true

Cox

0.3

Cox

0.076(0.072)

0.020(0.049)

0.062(0.092)

0.000(0.062)

0.834

AFT^‡

0.069(0.075)

0.003(0.050)

500

Cox

0.068(0.020)

0.027(0.014)

0.040(0.040)

0.003(0.015)

AFT^‡

0.054(0.019)

0.011(0.013)

0.6

Cox

0.035(0.042)

0.010(0.035)

0.023(0.045)

0.001(0.034)

0.830

AFT^‡

0.029(0.042)

0.002(0.034)

500

Cox

0.031(0.012)

0.013(0.010)

0.010(0.016)

0.001(0.010)

AFT^‡

0.020(0.012)

0.000(0.010)

AFT^†

0.3

Cox

0.076(0.072)

0.020(0.049)

0.063(0.092)

0.000(0.061)

0.834

AFT^‡

0.069(0.076)

0.003(0.050)

500

Cox

0.068(0.020)

0.027(0.014)

0.041(0.040)

0.003(0.027)

AFT^‡

0.055(0.020)

0.010(0.009)

0.6

Cox

0.035(0.042)

0.010(0.034)

0.024(0.046)

0.001(0.034)

0.830

AFT^‡

0.029(0.042)

0.002(0.033)

500

Cox

0.031(0.012)

0.013(0.011)

0.010(0.016)

0.001(0.010)

AFT^‡

0.020(0.012)

0.000(0.010)

Open in a new tab

Note that both true underlying models for survival time and censoring time are Cox PH models; $Δ {\hat{c}}_{s}^{1} (S D)$ and $Δ {\hat{c}}_{s}^{2} (S D)$

are the same when ℳ₂ is Cox or AFT because they do not require the estimation of censoring weights.

^†

Mis-specifying the model for survival time as an AFT model with the log-logistic distribution.

^‡

Mis-specifying the model for censoring time as an AFT model with the log-logistic distribution.

Table 3 shows the simulation results for the settings with p = 10. When the approaches of Cox and Lasso-Cox1 are used to obtain the censoring weights, ${\hat{c}}_{g}^{2}$ performs better than the other estimators, followed by ${\hat{c}}_{s}^{2}, {\hat{c}}_{g}^{1}$ and ${\hat{c}}_{s}^{1}$ , and its bias is negligible. The standard deviation of $Δ {\hat{c}}_{g}^{2}$ is slightly larger than $Δ {\hat{c}}_{s}^{2}$ when the event rate is 0.3, but the difference attenuates towards 0 as the event rate increases to 0.8; unsurprisingly, the standard deviation of $Δ {\hat{c}}_{s}^{2}$ and $Δ {\hat{c}}_{g}^{2}$ are lower than $Δ {\hat{c}}_{s}^{1}$ and $Δ {\hat{c}}_{g}^{1}$ respectively. In addition, ${\hat{c}}_{g}^{2}$ achieves similar performance when the approaches of Cox and Lasso-Cox1 are used to obtain the censoring weights but its performance deteriorates when the approach of Lasso-Cox2 is used, suggesting that the approaches of Cox and Lasso-Cox1 are preferred in these settings.

Table 3.

Simulation results for the settings with p = 10 and t = 20. Cox: censoring weights are estimated from a Cox PH model. Lasso-Cox1: censoring weights are estimated from a Cox PH model using the selected variables based on a Lasso Cox PH model. Lasso-Cox2: censoring weights are estimated directly from a Lasso Cox PH model.

ℳ₂

Δ {\hat{c}}_{s}^{1} (S D)

Δ {\hat{c}}_{s}^{2} (S D)

Δ {\hat{c}}_{g}^{1} (S D)

Δ {\hat{c}}_{g}^{2} (S D)

c_true

0.3

Cox

0.088(0.074)

0.021(0.046)

0.055(0.095)

0.002(0.056)

0.858

Lasso-Cox1

0.051(0.093)

0.002(0.054)

Lasso-Cox2

0.102(0.082)

0.029(0.050)

100

Cox

0.082(0.046)

0.024(0.029)

0.034(0.064)

0.005(0.035)

Lasso-Cox1

0.029(0.062)

0.006(0.032)

Lasso-Cox2

0.097(0.052)

0.032(0.033)

500

Cox

0.077(0.020)

0.027(0.013)

0.016(0.031)

0.006(0.015)

Lasso-Cox1

0.013(0.031)

0.008(0.014)

Lasso-Cox2

0.092(0.022)

0.036(0.014)

0.8

Cox

0.026(0.034)

0.005(0.029)

0.013(0.036)

0.002(0.029)

0.846

Lasso-Cox1

0.009(0.033)

0.002(0.028)

Lasso-Cox2

0.040(0.048)

0.010(0.049)

100

Cox

0.024(0.023)

0.007(0.021)

0.004(0.022)

0.000(0.019)

Lasso-Cox1

0.004(0.023)

0.001(0.020)

Lasso-Cox2

0.036(0.028)

0.011(0.036)

500

Cox

0.022(0.010)

0.008(0.009)

0.002(0.010)

0.000(0.008)

Lasso-Cox1

0.003(0.010)

0.001(0.008)

Lasso-Cox2

0.033(0.011)

0.013(0.009)

Open in a new tab

Note that $Δ {\hat{c}}_{s}^{1} (S D)$ and $Δ {\hat{c}}_{s}^{2} (S D)$ are the same for Cox, Lasso-Cox1 and Lasso-Cox2 becasue they do not require the estimation of censoring weights.

Table 4 presents the simulation results with focus on IPCW c-statistic for the moderately high-dimensional setting with p = 100 covariates and n = 50 or 100. Two correlation structures are considered for the covariates, independence with ρ = 0 and pairwise correlation with ρ = 0.8. In these settings, the approach of Cox for estimating censoring weights is no longer applicable, thus the approaches of Lasso-Cox1 and Lasso-Cox2 are used. For the cases with ρ = 0.8, the elastic net is also considered for obtaining censoring weights to deal with high correlation among high-dimensional predictors, where Elastic-Net-Cox1 uses a Cox PH model including only covariates selected by an Elastic-net Cox PH model for censoring time with the weight of lasso versus ridge as k = 0.2 and Elastic-Net-Cox2 estimates censoring weights directly using an Elastic-net Cox PH model for censoring time. Similar to the findings in Table 3, for the scenarios with ρ = 0, $Δ {\hat{c}}_{g}^{2}$ using the approach of Lasso-Cox2 is larger compared to the approach of Lasso-Cox1, and the difference becomes less pronounced as the event rate increases. Still, ${\hat{c}}_{g}^{2}$ based on Lasso-Cox1 shows negligible bias and achieves the best performance. In the presence of high correlation (i.e., ρ = 0.8) among predictors, Elastic-Net-Cox1 performs the best evidenced by the smallest values of $Δ {\hat{c}}_{g}^{1}$ and $Δ {\hat{c}}_{g}^{2}$ compared with the other approaches for censoring weights, but Lasso-Cox1 also performs satisfactory because of its small or negligible bias when the sample size is as large as 100. Interestingly, when the event rate is high (i.e., ζ = 0.8) and the sample size is 100, all estimates, based on lasso or elastic net, are comparable in terms of bias. The results in Table 4 show that Lasso-Cox1 achieves the best or close to the best performance for n = 100 even when the correlation among variables is high. The simulation results for the cases with p = 1, 000 (not shown here) are similar to Table 4.

Table 4.

Simulation results for settings with p = 100 and t = 20. Z has an exchangeable correlation structure with ρ. Lasso-Cox1: censoring weights are estimated from a Cox PH model using the selected variables based on a Lasso Cox PH model; Lasso-Cox2: censoring weights are estimated directly from a Lasso Cox PH model; Elastic-Net-Cox1: censoring weights are estimated from a Cox PH model using the selected variables based on an Elastic-net Cox PH model; Elastic-Net-Cox2: censoring weights are estimated directly from an Elastic-net Cox PH model.

ℳ₂

Δ {\hat{c}}_{g}^{1} (S D)

Δ {\hat{c}}_{g}^{2} (S D)

c_true

0.3

Lasso-Cox1

0.059(0.092)

0.003(0.062)

0.860

Lasso-Cox2

0.107(0.085)

0.024(0.053)

100

Lasso-Cox1

0.037(0.064)

0.003(0.041)

Lasso-Cox2

0.099(0.054)

0.031(0.034)

0.8

Lasso-Cox1

0.014(0.035)

0.002(0.040)

0.847

Lasso-Cox2

0.046(0.090)

0.017(0.086)

100

Lasso-Cox1

0.007(0.023)

0.001(0.020)

Lasso-Cox2

0.042(0.063)

0.013(0.057)

0.8

0.3

Lasso-Cox1

0.009(0.049)

0.008(0.052)

0.915

Lasso-Cox2

0.018(0.054)

0.006(0.050)

Elastic-Net-Cox1

0.006(0.050)

0.003(0.053)

Elastic-Net-Cox2

0.018(0.054)

0.007(0.047)

100

Lasso-Cox1

0.012(0.033)

0.006(0.035)

Lasso-Cox2

0.016(0.034)

0.012(0.028)

Elastic-Net-Cox1

0.009(0.035)

0.002(0.037)

Elastic-Net-Cox2

0.016(0.034)

0.011(0.028)

0.8

Lasso-Cox1

0.004(0.027)

0.012(0.027)

0.903

Lasso-Cox2

0.010(0.033)

0.002(0.059)

Elastic-Net-Cox1

0.004(0.026)

0.002(0.030)

Elastic-Net-Cox2

0.010(0.028)

0.002(0.050)

100

Lasso-Cox1

0.003(0.018)

0.009(0.019)

Lasso-Cox2

0.009(0.019)

0.004(0.020)

Elastic-Net-Cox1

0.003(0.018)

0.003(0.019)

Elastic-Net-Cox2

0.009(0.019)

0.004(0.021)

Open in a new tab

Additional simulations are conducted to investigate the impacts of NCAR on predictive accuracy metric estimation for the settings with low- and high-dimensional covariates, focusing on ${\hat{c}}_{g}^{2}$ ; the detailed information is provided in Appendix C of the Supplementary Materials. Our simulation results in these settings show that the bias $Δ {\hat{c}}_{g}^{2}$ increases as the sensitivity parameter α moves away from its true value, particularly, when α = 0 (i.e. CAR). For the same α value, the magnitude of biases seems to be comparable in low-dimensional and high-dimensional scenarios and higher event rates (i.e. ζ = 0.6) lead to decreased biases.

5. Discussion

Based on our results, we recommend using ${\hat{c}}_{g}^{2}$ and conducting sensitivity analyses to assess the potential impacts of the NCAR assumption for both low-dimensional and high-dimensional data. In the data example, the sensitivity analysis shows that the impact of the sensitivity parameter α is not negligible. In addition, our simulations show that the assumption of CAR can lead to biased estimation of predictive accuracy if CAR does not hold; the magnitude of biases, though, seems to be comparable under the settings of low-and high-dimensionality. Under CAR, ${\hat{c}}_{g}^{2}$ achieves the best performance among the estimators considered in settings with low-dimensional or high-dimensional data. Also, ${\hat{c}}_{g}^{2}$ is fairly robust to the mis-specification of the censoring model when the censoring rate is not high. In the presence of high-dimensional data, the approach of Lasso-Cox1, which obtains censoring weights from a Cox PH model including covariates selected by a Lasso Cox PH model for the censoring time, achieves satisfactory performance with small to negligible bias even in the presence of high correlation.

There are several topics for future research. First, in the presence of high-dimensional data under the NCAR assumption, the proposed sensitivity analysis based on regularized estimating equations for survival endpoints needs further investigation. Second, it is of potential interest to investigate time-dependent IPCW c–statistic under the NCAR assumption. Third, the proposed approaches for assessing prediction models with survival outcomes can be extended to other predictive accuracy metrics such as net reclassification index and integrated discrimination index.

Supplementary Material

Supp Info

NIHMS757525-supplement-Supp_Info.pdf^{(280.8KB, pdf)}

Acknowledgments

Wang’s research was partially supported by the start-up funding from the Department of Public Health Sciences at Pennsylvania State Hereshey Medical Center and a KL2 career grant supported by the National Center for Advancing Translational Sciences, Grant KL2 TR000126. Long’s research was partially supported by NIH/NCI grants R03CA173770 and R03CA183006.

Footnotes

Supplementary Materials

Web Appendices A, B and C referenced in Sections 2 and 4 are available with this paper at the Biometrics website on Wiley Online Library.

The content is solely the responsibility of the authors and does not represent the views of the NIH.

Contributor Information

Ming Wang, Email: mwang@phs.psu.edu, Department of Public Health Sciences, College of Medicine, Pennsylvania State University, Hershey, PA, U.S.A.

Qi Long, Email: qlong@emory.edu, Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, Atlanta, GA, U.S.A.

References

Bender R, Augustin T, Blettner M. Generating survival times to simulate cox proportional hazard models. Statistics in Medicine. 2005;24:1713–1723. doi: 10.1002/sim.2059. [DOI] [PubMed] [Google Scholar]
Cox D. Regression models and life-tables. Journal of the Royal Statistical Society Series B. 1972;34:187–202. [Google Scholar]
D’Agostino R, Griffith J, Schmidt C, Terrin N. Measures of evaluating model performance. Proceedings of the Biometrics Section, American Statistical Association. 1997:253–258. [Google Scholar]
Fan J, Lv J. Sure independence screening for ultrahigh dimensional feature space (with discussion) Journal of the Royal Statistical Society, Series B. 2008:849–911. doi: 10.1111/j.1467-9868.2008.00674.x. [DOI] [PMC free article] [PubMed] [Google Scholar]
Freedman A, Seminara D, Gail M, Hartge P, Colditz G, Ballard-Barbash R, Pfeiffer R. Cancer risk prediction models: a workshop on development, evaluation, and application. Journal of the National Cancer Institute. 2005;97:715–723. doi: 10.1093/jnci/dji128. [DOI] [PubMed] [Google Scholar]
Freedman A, Slattery M, Ballard-Barbash R, Willis G, Cann B, Pee D, Gail M, Pfeiffer R. Colorectal cancer risk prediction tool for white men and women without known susceptibility. Journal of Clinical Oncology. 2009;27:686–693. doi: 10.1200/JCO.2008.17.4797. [DOI] [PMC free article] [PubMed] [Google Scholar]
Fu W. Penalized estimating equations. Biometrics. 2003;59:126–132. doi: 10.1111/1541-0420.00015. [DOI] [PubMed] [Google Scholar]
Gerds T, Kattan M, Schumacher M, Yu C. Estimating a time-dependent concordance index for survival prediction models with covariate dependent censoring. Statistics in Medicine. 2013;32:2173–2184. doi: 10.1002/sim.5681. [DOI] [PubMed] [Google Scholar]
Gonen M, Heller G. Concordance probability and discriminatory power in proportional hazards regression. Biometrika. 2005;92:965–970. [Google Scholar]
Harrell F, Lee K, Mark D. Tutorial in biostatistics: multivariable prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors. Statistics in Medicine. 1996;15:361–387. doi: 10.1002/(SICI)1097-0258(19960229)15:4<361::AID-SIM168>3.0.CO;2-4. [DOI] [PubMed] [Google Scholar]
Heagerty P, Zheng Y. Survival model predictive accuracy and roc curves. Biometrics. 2005:92–105. doi: 10.1111/j.0006-341X.2005.030814.x. [DOI] [PubMed] [Google Scholar]
Heitjan D, Rubin D. Ignorability and coarse data. The Annals of Statistics. 1991;19:1681–2284. [Google Scholar]
Hoerl A, Kennard R. Ridge regression: Applications to nonorthogonal problems. Technometrics. 1970;12:69–82. [Google Scholar]
Johnson BA. Variable selection in semiparametric linear regression with censored data. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 2008;70:351–370. [Google Scholar]
Johnson BA, Lin D, Zeng D. Penalized estimating functions and variable selection in semiparametric regression models. Journal of the American Statistical Association. 2008;103:672–680. doi: 10.1198/016214508000000184. [DOI] [PMC free article] [PubMed] [Google Scholar]
Long Q, Chung M, Moreno C, Johnson B. Risk prediction for prostate cancer recurrence through regularized estimation with simultaneous adjustment for nonlinear clinical effects. Annals of Applied Statistics. 2011;5:2003–2023. doi: 10.1214/11-aoas458. [DOI] [PMC free article] [PubMed] [Google Scholar]
Long Q, Johnson B, Osunkoya A, Lai Y, Zhou W, Abramovitz M, Xia M, Bouzyk M, Nam R, Sugar L, Stanimirovi A, Leyland-Jones B, Seth A, Petros J, Moreno C. Protein-coding and microrna biomarkers of recurrence of prostate cancer following radical prostatectomy. American Journal of Pathology. 2011;179:46–54. doi: 10.1016/j.ajpath.2011.03.008. [DOI] [PMC free article] [PubMed] [Google Scholar]
Long Q, Zhang X, Johnson BA. Robust estimation of area under roc curve using auxiliary variables in the presence of missing biomarker values. Biometrics. 2011;67:559–567. doi: 10.1111/j.1541-0420.2010.01487.x. [DOI] [PMC free article] [PubMed] [Google Scholar]
Optenberg S, Clark J, Brawer M, Thompson I, Stein C, Friedrichs P. Development of a decision-making tool to predict risk of prostate cancer: the cancer of the prostate risk index (capri) test. Urology. 1997;50:665–672. doi: 10.1016/S0090-4295(97)00451-2. [DOI] [PubMed] [Google Scholar]
Pencina MJ, D’Agostino RB. Overall c as a measure of discrimination in survival analysis: model specific population value and confidence interval estimation. Statistics in medicine. 2004;23:2109–2123. doi: 10.1002/sim.1802. [DOI] [PubMed] [Google Scholar]
Scharfstein D, Rotnitzky A, Robins J. Adjusting for nonignorable dropout using semiparametric nonresponse models. Journal of the American Statistical Association. 1999;94:1135–1146. [Google Scholar]
Siannis F, Copas J, Lu G. Sensitivity analysis for informative censoring in parametric survival models. Biostatistics. 2005;6:77–91. doi: 10.1093/biostatistics/kxh019. [DOI] [PubMed] [Google Scholar]
Steyerberg E, Vickers A, Cook N, Gerds T, Gonen M, Obuchowski N, Pencina M, Kattan M. Assessing the performance of prediction models: a framework for traditional and novel measures. Epidemiology. 2010;21:128–138. doi: 10.1097/EDE.0b013e3181c30fb2. [DOI] [PMC free article] [PubMed] [Google Scholar]
Tibshirani R. The lasso method for variable selection in the cox model. Statistics in Medicine. 1997;16:385–395. doi: 10.1002/(sici)1097-0258(19970228)16:4<385::aid-sim380>3.0.co;2-3. [DOI] [PubMed] [Google Scholar]
Uno H, Cai T, Pencina M, D’Agostino R, Wei J. On the c-statistics for evaluating overall adequacy of risk prediction procedures with censored survival data. Statistics in Medicine. 2011;30:1105–1117. doi: 10.1002/sim.4154. [DOI] [PMC free article] [PubMed] [Google Scholar]
Yang Y, Zou H. A cocktail algorithm for solving the elastic net penalized cox’s regression in high dimensions statistics and its interface. Statistics and Its Interface. 2013:167–173. [Google Scholar]
Zeng D, et al. Estimating marginal survival function by adjusting for dependent censoring using many covariates. The Annals of Statistics. 2004;32:1533–1555. [Google Scholar]
Zhao SD, Li Y. Principled sure independence screening for cox models with ultra-high-dimensional covariates. Journal of Multivariate Analysis. 2012:397–411. doi: 10.1016/j.jmva.2011.08.002. [DOI] [PMC free article] [PubMed] [Google Scholar]
Zou H. The adaptive lasso and its oracle properties. Journal of the American Statistical Association. 2006;101:1418–1429. [Google Scholar]
Zou H, Hastie T. Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society Series B. 2005;67:301–320. [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

Supp Info

NIHMS757525-supplement-Supp_Info.pdf^{(280.8KB, pdf)}

[R1] Bender R, Augustin T, Blettner M. Generating survival times to simulate cox proportional hazard models. Statistics in Medicine. 2005;24:1713–1723. doi: 10.1002/sim.2059. [DOI] [PubMed] [Google Scholar]

[R2] Cox D. Regression models and life-tables. Journal of the Royal Statistical Society Series B. 1972;34:187–202. [Google Scholar]

[R3] D’Agostino R, Griffith J, Schmidt C, Terrin N. Measures of evaluating model performance. Proceedings of the Biometrics Section, American Statistical Association. 1997:253–258. [Google Scholar]

[R4] Fan J, Lv J. Sure independence screening for ultrahigh dimensional feature space (with discussion) Journal of the Royal Statistical Society, Series B. 2008:849–911. doi: 10.1111/j.1467-9868.2008.00674.x. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R5] Freedman A, Seminara D, Gail M, Hartge P, Colditz G, Ballard-Barbash R, Pfeiffer R. Cancer risk prediction models: a workshop on development, evaluation, and application. Journal of the National Cancer Institute. 2005;97:715–723. doi: 10.1093/jnci/dji128. [DOI] [PubMed] [Google Scholar]

[R6] Freedman A, Slattery M, Ballard-Barbash R, Willis G, Cann B, Pee D, Gail M, Pfeiffer R. Colorectal cancer risk prediction tool for white men and women without known susceptibility. Journal of Clinical Oncology. 2009;27:686–693. doi: 10.1200/JCO.2008.17.4797. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R7] Fu W. Penalized estimating equations. Biometrics. 2003;59:126–132. doi: 10.1111/1541-0420.00015. [DOI] [PubMed] [Google Scholar]

[R8] Gerds T, Kattan M, Schumacher M, Yu C. Estimating a time-dependent concordance index for survival prediction models with covariate dependent censoring. Statistics in Medicine. 2013;32:2173–2184. doi: 10.1002/sim.5681. [DOI] [PubMed] [Google Scholar]

[R9] Gonen M, Heller G. Concordance probability and discriminatory power in proportional hazards regression. Biometrika. 2005;92:965–970. [Google Scholar]

[R10] Harrell F, Lee K, Mark D. Tutorial in biostatistics: multivariable prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors. Statistics in Medicine. 1996;15:361–387. doi: 10.1002/(SICI)1097-0258(19960229)15:4<361::AID-SIM168>3.0.CO;2-4. [DOI] [PubMed] [Google Scholar]

[R11] Heagerty P, Zheng Y. Survival model predictive accuracy and roc curves. Biometrics. 2005:92–105. doi: 10.1111/j.0006-341X.2005.030814.x. [DOI] [PubMed] [Google Scholar]

[R12] Heitjan D, Rubin D. Ignorability and coarse data. The Annals of Statistics. 1991;19:1681–2284. [Google Scholar]

[R13] Hoerl A, Kennard R. Ridge regression: Applications to nonorthogonal problems. Technometrics. 1970;12:69–82. [Google Scholar]

[R14] Johnson BA. Variable selection in semiparametric linear regression with censored data. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 2008;70:351–370. [Google Scholar]

[R15] Johnson BA, Lin D, Zeng D. Penalized estimating functions and variable selection in semiparametric regression models. Journal of the American Statistical Association. 2008;103:672–680. doi: 10.1198/016214508000000184. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R16] Long Q, Chung M, Moreno C, Johnson B. Risk prediction for prostate cancer recurrence through regularized estimation with simultaneous adjustment for nonlinear clinical effects. Annals of Applied Statistics. 2011;5:2003–2023. doi: 10.1214/11-aoas458. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R17] Long Q, Johnson B, Osunkoya A, Lai Y, Zhou W, Abramovitz M, Xia M, Bouzyk M, Nam R, Sugar L, Stanimirovi A, Leyland-Jones B, Seth A, Petros J, Moreno C. Protein-coding and microrna biomarkers of recurrence of prostate cancer following radical prostatectomy. American Journal of Pathology. 2011;179:46–54. doi: 10.1016/j.ajpath.2011.03.008. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R18] Long Q, Zhang X, Johnson BA. Robust estimation of area under roc curve using auxiliary variables in the presence of missing biomarker values. Biometrics. 2011;67:559–567. doi: 10.1111/j.1541-0420.2010.01487.x. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R19] Optenberg S, Clark J, Brawer M, Thompson I, Stein C, Friedrichs P. Development of a decision-making tool to predict risk of prostate cancer: the cancer of the prostate risk index (capri) test. Urology. 1997;50:665–672. doi: 10.1016/S0090-4295(97)00451-2. [DOI] [PubMed] [Google Scholar]

[R20] Pencina MJ, D’Agostino RB. Overall c as a measure of discrimination in survival analysis: model specific population value and confidence interval estimation. Statistics in medicine. 2004;23:2109–2123. doi: 10.1002/sim.1802. [DOI] [PubMed] [Google Scholar]

[R21] Scharfstein D, Rotnitzky A, Robins J. Adjusting for nonignorable dropout using semiparametric nonresponse models. Journal of the American Statistical Association. 1999;94:1135–1146. [Google Scholar]

[R22] Siannis F, Copas J, Lu G. Sensitivity analysis for informative censoring in parametric survival models. Biostatistics. 2005;6:77–91. doi: 10.1093/biostatistics/kxh019. [DOI] [PubMed] [Google Scholar]

[R23] Steyerberg E, Vickers A, Cook N, Gerds T, Gonen M, Obuchowski N, Pencina M, Kattan M. Assessing the performance of prediction models: a framework for traditional and novel measures. Epidemiology. 2010;21:128–138. doi: 10.1097/EDE.0b013e3181c30fb2. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R24] Tibshirani R. The lasso method for variable selection in the cox model. Statistics in Medicine. 1997;16:385–395. doi: 10.1002/(sici)1097-0258(19970228)16:4<385::aid-sim380>3.0.co;2-3. [DOI] [PubMed] [Google Scholar]

[R25] Uno H, Cai T, Pencina M, D’Agostino R, Wei J. On the c-statistics for evaluating overall adequacy of risk prediction procedures with censored survival data. Statistics in Medicine. 2011;30:1105–1117. doi: 10.1002/sim.4154. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R26] Yang Y, Zou H. A cocktail algorithm for solving the elastic net penalized cox’s regression in high dimensions statistics and its interface. Statistics and Its Interface. 2013:167–173. [Google Scholar]

[R27] Zeng D, et al. Estimating marginal survival function by adjusting for dependent censoring using many covariates. The Annals of Statistics. 2004;32:1533–1555. [Google Scholar]

[R28] Zhao SD, Li Y. Principled sure independence screening for cox models with ultra-high-dimensional covariates. Journal of Multivariate Analysis. 2012:397–411. doi: 10.1016/j.jmva.2011.08.002. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R29] Zou H. The adaptive lasso and its oracle properties. Journal of the American Statistical Association. 2006;101:1418–1429. [Google Scholar]

[R30] Zou H, Hastie T. Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society Series B. 2005;67:301–320. [Google Scholar]

PERMALINK

Addressing issues associated with evaluating prediction models for survival endpoints based on the concordance statistic

Ming Wang

Qi Long

Summary

1. Introduction

1.1 A Prostate Cancer Study

1.2 Assessing Predictive Accuracy

2. Methodology

2.1 Predictive Accuracy Metric: Truncated c–statistic

2.2 Estimation of Truncated c–statistic

2.3 Asymptotic Properties of ${\hat{c}}_{g}^{2} (\hat{β}; t)$ under CAR

Theorem 2.1

2.4 Sensitivity Analysis under NCAR

2.5 High-dimensional Data

2.5.1 The Case of CAR

2.5.2 The Case of NCAR

3. Data Analysis

Figure 1.

Table 1.

Figure 2.

4. Simulation Studies

Table 2.

Table 3.

Table 4.

5. Discussion

Supplementary Material

Acknowledgments

Footnotes

Contributor Information

References

Associated Data

Supplementary Materials

ACTIONS

PERMALINK

RESOURCES

Cite

Add to Collections

PERMALINK

Addressing issues associated with evaluating prediction models for survival endpoints based on the concordance statistic

Ming Wang

Qi Long

Summary

1. Introduction

1.1 A Prostate Cancer Study

1.2 Assessing Predictive Accuracy

2. Methodology

2.1 Predictive Accuracy Metric: Truncated c–statistic

2.2 Estimation of Truncated c–statistic

2.3 Asymptotic Properties of c^g2(β^;t) under CAR

Theorem 2.1

2.4 Sensitivity Analysis under NCAR

2.5 High-dimensional Data

2.5.1 The Case of CAR

2.5.2 The Case of NCAR

3. Data Analysis

Figure 1.

Table 1.

Figure 2.

4. Simulation Studies

Table 2.

Table 3.

Table 4.

5. Discussion

Supplementary Material

Acknowledgments

Footnotes

Contributor Information

References

Associated Data

Supplementary Materials

ACTIONS

PERMALINK

RESOURCES

Similar articles

Cited by other articles

Links to NCBI Databases

2.3 Asymptotic Properties of ${\hat{c}}_{g}^{2} (\hat{β}; t)$ under CAR