Biostatistics (Oxford, England). 2014 Apr 11;15(4):757–773. doi: 10.1093/biostatistics/kxu010

Random survival forests for competing risks

Hemant Ishwaran 1,*, Thomas A Gerds 2, Udaya B Kogalur 3, Richard D Moore 4, Stephen J Gange 5, Bryan M Lau 5
PMCID: PMC4173102  PMID: 24728979

Abstract

We introduce a new approach to competing risks using random forests. Our method is fully non-parametric and can be used for selecting event-specific variables and for estimating the cumulative incidence function. We show that the method is highly effective for both prediction and variable selection in high-dimensional problems and in settings such as HIV/AIDS that involve many competing risks.

Keywords: AIDS, Brier score, Competing risks, C-index, Cumulative incidence function, Ensemble

1. Introduction

Individuals subject to competing risks are observed from study entry until the event of interest occurs, a competing event occurs, or, as often happens, the individual is right censored before experiencing any of the events. Formally, let $T_i^o$ be the event time for the $i$th subject, $i = 1, \ldots, n$, and let $\delta_i^o$ be the event type, $\delta_i^o \in \{1, \ldots, J\}$, where $J \ge 1$. Let $C_i^o$ denote the censoring time for individual $i$ such that the actual time of event $T_i^o$ is unobserved and one only observes $T_i = \min(T_i^o, C_i^o)$ and the event indicator $\delta_i = \delta_i^o I(T_i^o \le C_i^o)$. When $\delta_i = 0$, the individual is said to be censored at $T_i$; otherwise if $\delta_i = j > 0$, the individual is said to have an event of type $j$ at time $T_i$. The observed data are $(T_i, \delta_i, \mathbf{x}_i)_{1 \le i \le n}$ where $\mathbf{x}_i$ is a $p$-dimensional vector of covariates.

We are interested in predicting events and in the discovery of risk factors. For the latter, we shall distinguish between risk factors for the cause-specific hazard and risk factors for the cumulative incidence. The cause-specific hazard function for event $j$ given covariates $\mathbf{x}$ is

$$\alpha_j(t \mid \mathbf{x}) = \lim_{\Delta t \to 0} \frac{P\{t \le T^o \le t + \Delta t,\ \delta^o = j \mid T^o \ge t,\ \mathbf{x}\}}{\Delta t} = \lim_{\Delta t \to 0} \frac{P\{t \le T^o \le t + \Delta t,\ \delta^o = j \mid \mathbf{x}\}}{\Delta t \, S(t- \mid \mathbf{x})}.$$

Here, $S(t \mid \mathbf{x}) = P\{T^o > t \mid \mathbf{x}\}$ is the event-free survival probability function given $\mathbf{x}$. The cause-specific hazard function describes the instantaneous risk of event $j$ for subjects that currently are event-free. Factors found to change the instantaneous event risk are associated with the biological mechanism behind event $j$. On the other hand, the probability that an event occurs in a specific time period, say $[0, t]$, depends on the cause-specific hazards of the other events (Gray, 1988). The probability of an event is determined using the cumulative incidence function (CIF), defined as the probability of experiencing an event of type $j$ by time $t$; i.e. $F_j(t \mid \mathbf{x}) = P\{T^o \le t,\ \delta^o = j \mid \mathbf{x}\}$. The CIF and cause-specific hazard function are related according to

$$F_j(t \mid \mathbf{x}) = \int_0^t S(s- \mid \mathbf{x})\, \alpha_j(s \mid \mathbf{x})\, ds = \int_0^t \exp\Big(-\int_0^s \sum_{l=1}^{J} \alpha_l(u \mid \mathbf{x})\, du\Big)\, \alpha_j(s \mid \mathbf{x})\, ds. \qquad (1.1)$$

Informally speaking, event $j$ can only occur for those surviving other risks. A covariate that reduces the cause-specific hazard of a competing risk increases the event-free survival probability and thereby indirectly increases the cumulative incidence of event $j$. Thus, covariates found to change the $t$-year risk of event $j$ (i.e. the cumulative incidence) are those that change the cause-specific hazard function of event $j$ and those that change the cause-specific hazard functions of the competing risks.
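The indirect effect described above can be checked numerically in the simplest setting. The following is a minimal sketch, assuming two competing events with constant cause-specific hazards (an assumption made only for illustration; the hazard values and function names are hypothetical). Under constant hazards the CIF has a closed form, which the code compares with a direct numerical integration of event-free survival times the cause-specific hazard.

```python
import math

def cif_constant_hazards(t, alpha_j, alpha_other):
    """Closed-form CIF of event j when both cause-specific hazards are constant."""
    total = alpha_j + alpha_other
    return alpha_j / total * (1.0 - math.exp(-total * t))

def cif_numeric(t, alpha_j, alpha_other, steps=100_000):
    """Midpoint-rule integral of S(s) * alpha_j(s) over [0, t]."""
    total = alpha_j + alpha_other
    h = t / steps
    return sum(math.exp(-total * (k + 0.5) * h) * alpha_j * h for k in range(steps))

t = 5.0
# Same hazard for the event of interest, two different competing hazards.
cif_base = cif_constant_hazards(t, alpha_j=0.10, alpha_other=0.20)
cif_lower_competing = cif_constant_hazards(t, alpha_j=0.10, alpha_other=0.05)
```

Lowering the competing hazard leaves the hazard of the event of interest untouched, yet `cif_lower_competing` exceeds `cif_base`, because more subjects remain event-free and therefore exposed to that event.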

When the aim is to assist decision-making and patient counseling, we are interested in $t$-year predictions and in finding covariates that affect the cumulative incidence. On the other hand, to understand and discuss treatment options for the biological mechanism that drives the risk of a specific event, we focus on the cause-specific hazard function.

In this paper, we propose a new approach to competing risks that builds on the framework of random survival forests (RSF) (Ishwaran and others, 2008), an extension of Breiman's random forests (Breiman, 2001) to right-censored survival settings. Our novel approach benefits from the many useful properties of forests and has the following important features: (a) it directly estimates the CIF; (b) it provides accurate prediction performance; (c) it models non-linear effects and interactions; (d) it can be used for event-specific selection of risk factors; (e) it can be used effectively in high-dimensional settings; and (f) it is free of model assumptions.

Section 2 describes the main parameters which we estimate by using ensembles. Section 3 describes the competing risks forest algorithm, introduces terminal node estimators used for constructing ensembles, and describes splitting rules for growing competing risk trees suitable for either cause-specific hazard or CIF inference. The prediction error for the proposed ensemble estimators and variable selection are discussed in Sections 4 and 5. Section 6 studies the performance of our method using synthetic data. In Section A of supplementary material available at Biostatistics online (http://www.biostatistics.oxfordjournals.org), we consider performance over a collection of well-known data sets. Section 7 utilizes RSF to identify event-specific variables using the Johns Hopkins HIV Clinical Cohort, a large database involving over 6000 HIV patients.

2. Parameters of interest

2.1. Expected number of life years lost and cause-$j$ mortality

In addition to estimating the CIF, we propose a one-dimensional summary of the cumulative incidence referred to as the expected number of life years lost due to cause $j$ (Andersen, 2012). In right-censored data, it is not feasible to get a reliable estimate of the expected lifetime. Therefore, for a fixed time point $\tau$ we consider the restricted mean lifetime conditional on $\mathbf{x}$: $E[\min(T^o, \tau) \mid \mathbf{x}] = \int_0^\tau S(t \mid \mathbf{x})\, dt$. The truncation time point $\tau$ is chosen such that the probability of being uncensored at $\tau$ is bounded away from zero: $P\{C^o > \tau \mid \mathbf{x}\} > \epsilon > 0$. In practice, we will typically set $\tau$ in accordance with the observed follow-up period (see Section 3). We extend the notation of Andersen (2012) to the case with covariates and note the relation $S(t \mid \mathbf{x}) = 1 - \sum_{j=1}^{J} F_j(t \mid \mathbf{x})$, which holds for all values $t$ and all $\mathbf{x}$. The expected number of years lost before time $\tau$ is

$$\tau - \int_0^\tau S(t \mid \mathbf{x})\, dt = \sum_{j=1}^{J} \int_0^\tau F_j(t \mid \mathbf{x})\, dt.$$

Our summary value is $M_j(\tau \mid \mathbf{x}) = \int_0^\tau F_j(t \mid \mathbf{x})\, dt$, which the above shows equals the expected number of life years lost due to cause $j$ before time $\tau$. We shall also call $M_j(\tau \mid \mathbf{x})$ the cause-$j$ mortality.
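As a worked illustration of the life-years-lost decomposition, consider the special case of constant cause-specific hazards. This is an assumption made only for the example, not part of the method. Write $a_j$ for the constant hazard of cause $j$, $a = \sum_l a_l$ for the total hazard, $S$ for event-free survival, $F_j$ for the CIF of cause $j$, and $\tau$ for the truncation time (covariates suppressed):

```latex
\begin{align*}
S(t) &= e^{-a t}, \qquad F_j(t) = \frac{a_j}{a}\bigl(1 - e^{-a t}\bigr),\\
\int_0^\tau F_j(t)\,dt &= \frac{a_j}{a}\Bigl(\tau - \frac{1 - e^{-a\tau}}{a}\Bigr),\\
\sum_{j} \int_0^\tau F_j(t)\,dt &= \tau - \frac{1 - e^{-a\tau}}{a} = \tau - \int_0^\tau S(t)\,dt .
\end{align*}
```

In this special case each cause accounts for the fraction $a_j / a$ of the total years lost before $\tau$.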

2.2. Terminal node estimators

We describe non-parametric estimators of the event-free survival function, the cause-specific CIF, and mortality. The estimators are described here using the entire learning data set, but in implementation they are calculated within the terminal node of a RSF tree and then aggregated to form the ensemble (see Section 3.1).

Let $t_1 < t_2 < \cdots < t_m$ denote the $m$ distinct and ordered event times from $(T_i)_{1 \le i \le n}$. Let $d_j(t_k) = \sum_{i=1}^{n} I(T_i = t_k, \delta_i = j)$ be the number of type $j$ events at $t_k$, and $N_j(t) = \sum_{i=1}^{n} I(T_i \le t, \delta_i = j)$ be the number of type $j$ events in $[0, t]$. Define also $d(t_k) = \sum_j d_j(t_k)$, the total number of events occurring at time $t_k$, $N(t) = \sum_j N_j(t)$, the total number of events occurring in $[0, t]$, and $Y(t) = \sum_{i=1}^{n} I(T_i \ge t)$, the number of individuals at risk (event-free and uncensored) just prior to $t$. The Nelson–Aalen estimator for the cumulative event-specific hazard function $H_j(t) = \int_0^t \alpha_j(s)\, ds$ is given by

$$\hat{H}_j(t) = \int_0^t \frac{N_j(ds)}{Y(s)} = \sum_{k=1}^{m(t)} \frac{d_j(t_k)}{Y(t_k)},$$

where $m(t) = \max\{k : t_k \le t\}$. The Kaplan–Meier estimator for the event-free survival function is given by

$$\hat{S}(t) = \prod_{k=1}^{m(t)} \Big(1 - \frac{d(t_k)}{Y(t_k)}\Big).$$

We use the Aalen–Johansen estimator (Aalen and Johansen, 1978) to estimate $F_j(t)$:

$$\hat{F}_j(t) = \int_0^t \hat{S}(u-)\, \hat{H}_j(du) = \sum_{k=1}^{m(t)} \hat{S}(t_{k-1})\, \frac{d_j(t_k)}{Y(t_k)}, \qquad \hat{S}(t_0) = 1.$$

The cause-$j$ mortality is estimated by $\hat{M}_j(\tau) = \int_0^\tau \hat{F}_j(t)\, dt$. We set $\tau$ to be the largest observed time $\max_{1 \le i \le n} T_i$.
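The estimators of this section are simple to compute by hand on a small sample. The following is a minimal sketch in plain Python, assuming event codes of 0 for censoring and 1, 2, ... for causes. The function names and the toy data are hypothetical, and the quadratic-time loops favor readability over speed.

```python
def terminal_node_estimates(times, events, cause):
    """Nelson-Aalen, Kaplan-Meier and Aalen-Johansen estimates at each distinct event time."""
    event_times = sorted({t for t, e in zip(times, events) if e != 0})
    cum_hazard, survival, cif = [], [], []
    h_cum, s_prev, f_cum = 0.0, 1.0, 0.0
    for tk in event_times:
        at_risk = sum(1 for t in times if t >= tk)
        d_all = sum(1 for t, e in zip(times, events) if t == tk and e != 0)
        d_j = sum(1 for t, e in zip(times, events) if t == tk and e == cause)
        h_cum += d_j / at_risk            # Nelson-Aalen increment
        f_cum += s_prev * d_j / at_risk   # Aalen-Johansen uses survival just before tk
        s_prev *= 1.0 - d_all / at_risk   # Kaplan-Meier update with all event types
        cum_hazard.append(h_cum)
        survival.append(s_prev)
        cif.append(f_cum)
    return event_times, cum_hazard, survival, cif

def integrate_step(event_times, values, tau):
    """Integrate a right-continuous step function (zero before the first event time) over [0, tau]."""
    total = 0.0
    for k, tk in enumerate(event_times):
        if tk >= tau:
            break
        nxt = event_times[k + 1] if k + 1 < len(event_times) else tau
        total += values[k] * (min(nxt, tau) - tk)
    return total

# Toy sample: 0 = censored, 1 and 2 = competing causes.
times = [1, 2, 2, 3, 4, 5]
events = [1, 2, 0, 1, 0, 2]
tau = max(times)

grid, H1, S, F1 = terminal_node_estimates(times, events, cause=1)
_, H2, _, F2 = terminal_node_estimates(times, events, cause=2)
M1 = integrate_step(grid, F1, tau)   # cause-1 mortality
M2 = integrate_step(grid, F2, tau)   # cause-2 mortality
# Restricted mean lifetime: survival equals 1 before the first event time.
restricted_mean = grid[0] + integrate_step(grid, S, tau)
```

On this toy sample the estimated CIFs and event-free survival sum to one at every event time, and `M1 + M2` equals `tau - restricted_mean`, the estimated number of years lost before `tau`.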

3. Competing risk forests

A RSF (Ishwaran and others, 2008) is a collection of randomly grown survival trees. Each tree is grown using an independent bootstrap sample of the learning data using random feature selection at each node. RSF trees are generally grown very deeply with many terminal nodes (the ends of the tree). Trees in competing risk forests are grown similarly. What differs are the splitting rules used to grow the tree (Section 3.3) and the estimated values calculated within the terminal nodes used to define the ensemble (Section 3.1).

To grow a competing risk forest, we highlight two conceptually different approaches:

  1. Separate competing risk trees are grown for each of the $J$ events in each bootstrap sample. The splitting rules used to grow the trees are event-specific.

  2. A single competing risk tree is grown in each bootstrap sample. The splitting rules are either event-specific, or combine event-specific splitting rules across the $J$ events.

The second approach is more efficient (especially for high-dimensional problems and large data settings), sufficient for most tasks, and what we do in this article. In the next subsections, we describe how to calculate various ensembles useful for competing risks and provide details of competing risk trees. The forest algorithm is then summarized in Section 3.4.

3.1. Event-specific ensembles

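Let $\mathcal{L} = \{(T_i, \delta_i, \mathbf{x}_i)\}_{1 \le i \le n}$ denote the learning data. As stated earlier, a RSF tree is grown using an independent bootstrap sample of the learning data. Let $c_{i,b}$ be the number of times case $i$ occurs in bootstrap sample $b$, where $b = 1, \ldots, B$ indexes the trees. To define the CIF for the $b$th tree, take a case's covariate $\mathbf{x}$ and drop it down the tree. Let $\mathcal{M}_b(\mathbf{x})$ denote the indices for cases from the learning data whose covariates share the terminal node with $\mathbf{x}$. Denoting node-specific event counts by $d_{j,b}(t_k \mid \mathbf{x}) = \sum_{i \in \mathcal{M}_b(\mathbf{x})} c_{i,b}\, I(T_i = t_k, \delta_i = j)$ and the number at risk by $Y_b(t_k \mid \mathbf{x}) = \sum_{i \in \mathcal{M}_b(\mathbf{x})} c_{i,b}\, I(T_i \ge t_k)$, with $t_k$ ranging over the distinct event times, we define $\mathbf{x}$'s CIF as

$$\hat{F}_{j,b}(t \mid \mathbf{x}) = \sum_{k:\, t_k \le t} \hat{S}_b(t_{k-1} \mid \mathbf{x})\, \frac{d_{j,b}(t_k \mid \mathbf{x})}{Y_b(t_k \mid \mathbf{x})},$$

where $\hat{S}_b(t \mid \mathbf{x})$ is $\mathbf{x}$'s Kaplan–Meier estimate of event-free survival. The ensemble estimates of the CIF and the cause-$j$ mortality, respectively, equal

$$\bar{F}_j(t \mid \mathbf{x}) = \frac{1}{B} \sum_{b=1}^{B} \hat{F}_{j,b}(t \mid \mathbf{x}), \qquad \bar{M}_j(\tau \mid \mathbf{x}) = \int_0^\tau \bar{F}_j(t \mid \mathbf{x})\, dt.$$

For reporting an internal error rate, we use out-of-bag (OOB) ensembles. By standard bootstrap theory, each bootstrap sample leaves out approximately 37% of the data. The OOB data are used to construct the OOB ensemble. Let $O_i$ be the index set of trees where $c_{i,b} = 0$; i.e. $O_i$ records trees where case $i$ is OOB. The OOB ensemble estimates of the CIF and the cause-$j$ mortality are, respectively, given by

$$\bar{F}_j^{\mathrm{oob}}(t \mid \mathbf{x}_i) = \frac{1}{|O_i|} \sum_{b \in O_i} \hat{F}_{j,b}(t \mid \mathbf{x}_i), \qquad \bar{M}_j^{\mathrm{oob}}(\tau \mid \mathbf{x}_i) = \int_0^\tau \bar{F}_j^{\mathrm{oob}}(t \mid \mathbf{x}_i)\, dt.$$

The OOB predicted value for a case does not use the event time outcome of that case. It is therefore a cross-validation-type estimator and can be used to estimate the prediction error.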
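The OOB construction can be sketched in a few lines. The following is a minimal illustration, not the software's implementation. The per-tree predictions are placeholder numbers supplied by the caller, and the function names are hypothetical. It shows how bootstrap counts determine which trees are OOB for a case, and that the empirical OOB fraction is close to $(1 - 1/n)^n \approx e^{-1} \approx 0.37$.

```python
import random

def bootstrap_counts(n, rng):
    """Number of times each of n cases is drawn in one bootstrap sample of size n."""
    counts = [0] * n
    for _ in range(n):
        counts[rng.randrange(n)] += 1
    return counts

def oob_ensemble(tree_predictions, counts_per_tree, i):
    """Average the predictions for case i over the trees whose bootstrap sample omitted it."""
    oob_trees = [b for b, counts in enumerate(counts_per_tree) if counts[i] == 0]
    if not oob_trees:
        return None  # case i was in-bag for every tree
    return sum(tree_predictions[b][i] for b in oob_trees) / len(oob_trees)

rng = random.Random(0)
n_cases, n_trees = 500, 200
counts_per_tree = [bootstrap_counts(n_cases, rng) for _ in range(n_trees)]
oob_fraction = sum(c.count(0) for c in counts_per_tree) / (n_cases * n_trees)

# Tiny hand-made example: 3 trees, 2 cases, predictions indexed [tree][case].
toy_predictions = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
toy_counts = [[0, 2], [1, 1], [0, 2]]
toy_oob_case0 = oob_ensemble(toy_predictions, toy_counts, 0)  # trees 0 and 2 are OOB
toy_oob_case1 = oob_ensemble(toy_predictions, toy_counts, 1)  # never OOB
```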

3.2. Event-free survival ensembles

An efficient method to analyze event-free survival probability is to simply use the tree-specific estimators already computed from the competing risks forests, which saves the computation time needed to grow a separate forest. Thus, we estimate the forest event-free survival using the ensemble $\bar{S}(t \mid \mathbf{x}) = B^{-1} \sum_{b=1}^{B} \hat{S}_b(t \mid \mathbf{x})$, the average over the $B$ trees of the tree-specific Kaplan–Meier estimates.

3.3. Splitting rules

Here, we describe two splitting rules that can be used to grow competing risk trees. For notational convenience, we describe these rules for the root node using the entire learning data, but the idea extends obviously to any tree node and to bootstrap data.

As before, let Inline graphic denote the survival times and event indicators, and let Inline graphic be the distinct event times. Suppose that the proposed split for the root node is of the form Inline graphic and Inline graphic for a continuous predictor Inline graphic (this can be obviously generalized to categorical variables). Such a split forms two daughter nodes containing two new sets of competing risk data. To indicate these data, we use a subscript of Inline graphic and Inline graphic for the left and right daughter nodes, and denote by Inline graphic and Inline graphic the cause-Inline graphic specific hazard rates in the left and the right daughter nodes, respectively. Similarly, define Inline graphic and Inline graphic to be the CIF for the left and the right daughter nodes, respectively.

The number of individuals at risk at time Inline graphic in the left and right daughter nodes are, respectively, Inline graphic and Inline graphic, where Inline graphic, Inline graphic, and Inline graphic is the Inline graphic-predictor for individual Inline graphic. The number of individuals who are at risk at time Inline graphic is Inline graphic. The number of type Inline graphic events at time Inline graphic for the left and right daughters is, respectively,

3.3.

and Inline graphic is the total number of type Inline graphic events at Inline graphic. Define also Inline graphic to be the largest time on study in the parent node and the two daughters, respectively.

3.3.1. Generalized log-rank test

Our first splitting rule is the log-rank test. In the setting with competing risk, this is a test of the null hypothesis Inline graphic for all Inline graphic. The test is based on the weighted difference of the cause-specific Nelson–Aalen estimates in the two daughter nodes. Specifically, for a split at the value Inline graphic for variable Inline graphic, the splitting rule is

3.3.1. (3.1)

where the variance estimate is given by

3.3.1.

Time-dependent weights Inline graphic are used to make the test more sensitive to early or late differences between the cause-specific hazards. The choice Inline graphic corresponds to the standard log-rank test which has optimal power for detecting alternatives where the cause-specific hazards are proportional. The best split is found by maximizing Inline graphic over Inline graphic and Inline graphic.
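To make the log-rank splitting idea concrete, the following is a minimal sketch of the textbook two-sample log-rank statistic for one cause, with unit weights and with competing events and censoring both simply leaving the risk set. The variance uses the usual hypergeometric form, which is an assumption here and may differ in detail from the variance estimate in (3.1). The function names and toy data are hypothetical.

```python
import math

def cause_specific_logrank(times, events, in_left, cause):
    """Standardized log-rank statistic for `cause`, comparing the left daughter with the parent."""
    event_times = sorted({t for t, e in zip(times, events) if e == cause})
    numerator, variance = 0.0, 0.0
    for tk in event_times:
        y = sum(1 for t in times if t >= tk)
        y_left = sum(1 for t, l in zip(times, in_left) if t >= tk and l)
        d = sum(1 for t, e in zip(times, events) if t == tk and e == cause)
        d_left = sum(1 for t, e, l in zip(times, events, in_left)
                     if t == tk and e == cause and l)
        numerator += d_left - d * y_left / y       # observed minus expected in left node
        if y > 1:
            p = y_left / y
            variance += d * p * (1.0 - p) * (y - d) / (y - 1)
    return numerator / math.sqrt(variance) if variance > 0 else 0.0

def best_split(times, events, x, cause):
    """Scan all split points c (left node: x <= c) and keep the largest absolute statistic."""
    best_stat, best_c = 0.0, None
    for c in sorted(set(x))[:-1]:
        stat = abs(cause_specific_logrank(times, events, [v <= c for v in x], cause))
        if stat > best_stat:
            best_stat, best_c = stat, c
    return best_stat, best_c

# Toy node: low x values fail early from cause 1.
times = [1, 2, 3, 4, 5, 6, 7, 8]
events = [1, 1, 1, 2, 0, 1, 2, 0]
x = [0.1, 0.2, 0.3, 0.4, 0.9, 1.0, 1.1, 1.2]

left = [v <= 0.35 for v in x]
stat_left = cause_specific_logrank(times, events, left, cause=1)
stat_right = cause_specific_logrank(times, events, [not l for l in left], cause=1)
split_stat, split_point = best_split(times, events, x, cause=1)
```

Swapping the roles of the two daughters flips the sign of the statistic, which is why the split search maximizes its absolute value.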

3.3.2. Gray's test

The cause-Inline graphic specific log-rank splitting rule (3.1) is useful if the main purpose is to detect variables that affect the cause-Inline graphic specific hazard. It may not be optimal if the purpose is also prediction of cumulative event probabilities. In this case, better results may be obtained with splitting rules that select variables based on their direct effect on the cumulative incidence. For this reason, we model our second splitting rule after Gray's test (Gray, 1988), which tests the null hypothesis Inline graphic for all Inline graphic. For notational simplicity, consider analysis of event Inline graphic and assume Inline graphic; that is, we pool all events not equal to event Inline graphic. Gray's statistic for testing the null is

3.3.2.

where Inline graphic. Here, the variance is estimated based on the asymptotic normal representation under the null hypothesis; see Gray (1988) for details.

In the special case where the censoring time is known for those cases that have an event before the end of follow-up, it is possible to obtain the score statistic of Gray's test by a simple modification of the log-rank test statistic. This is achieved by substituting in (3.1) for Inline graphic the modified risk set:

3.3.2.

This motivates our modified splitting rule. The splitting rule based on the score statistic that uses the modified risk sets is denoted by Inline graphic and given by substituting Inline graphic for Inline graphic and Inline graphic for Inline graphic in (3.1). Note that if the censoring time is not known for those cases that have an event before the end of follow-up, the largest observed time is used, and the statistic Inline graphic is still a good (and computationally efficient) approximation of Gray's test statistic; see Fine and Gray (1999, Section 3.2).

3.3.3. Composite splitting rules

If the aim is to predict the CIF of all causes simultaneously, or if interest is in identifying variables that are important for any cause, it can be useful to combine the cause-specific splitting rules across the event types:

3.3.3. (3.2)
3.3.3. (3.3)

The best split is found by maximizing over Inline graphic and Inline graphic. Note that we have ignored the dependence in the test statistics in defining the variance. We do so because these types of calculations are not suitable for random forest trees. As these trees are grown deeply, tree nodes typically have few observations, which makes estimation of a covariance matrix problematic due to the limited data and will result in a poorly performing split-statistic. We should remark that a bias may occur with (3.3) if the censoring times remain unknown for cases that have an event before the end of the follow-up. However, our empirical results indicate that, for the purpose of building competing risk forests, the modified Gray splitting rule performs very well.

3.4. Competing risks forest algorithm

The steps required to construct a competing risks forest can be summarized as follows.

  1. Draw Inline graphic bootstrap samples from the learning data.

  2. Grow a competing risk tree for each bootstrap sample. At each node of the tree, randomly select Inline graphic candidate variables. The node is split using the candidate variable that maximizes a competing risk splitting rule.

  3. Grow the tree to full size under the constraint that a terminal node should have no less than Inline graphic unique cases.

  4. Calculate Inline graphic and Inline graphic for each tree, Inline graphic.

  5. Take the average of each estimator over the Inline graphic trees to obtain its ensemble.

4. Prediction performance

4.1. Performance metrics

To assess prediction performance, we use the concordance index and the prediction error defined by the integrated Brier score (BS). The concordance index (C-index) is related to the area under the receiver operating characteristic curve and estimates the probability that, in a randomly selected pair of cases, the case that fails first had a worse predicted outcome. The BS is the squared difference between actual and predicted outcome.

Individuals are ranked by ensemble cause-Inline graphic mortality. We say that case Inline graphic has a higher risk of event Inline graphic than case Inline graphic if Inline graphic. Wolbers and others (2013) described a time-truncated concordance index for competing risks, which in our setting is

4.1.

Thus, the ensemble prediction of the cumulative incidence is concordant with the outcome if either the case with the higher cause-Inline graphic mortality has event Inline graphic before the other case has an event of cause Inline graphic or if the other case has a competing event. We also consider the time-dependent BS (Graf and others, 1999; Gerds and Schumacher, 2006) and its integral (IBS) to assess the performance of the ensemble CIF:

4.1.
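The two metrics can be sketched for the simplified case of fully observed data. The code below is a minimal illustration that assumes no censoring, so it omits the inverse probability of censoring weights introduced in Section 4.2. Counting tied predictions as one half is also an assumption, and all function names and data are hypothetical. A pair is treated as usable when the first case has the event of interest by `tau` and the second case either fails later or has a competing event, following the verbal description above.

```python
def truncated_c_index(times, events, risk, cause, tau):
    """Concordance for `cause`, truncated at tau, assuming uncensored data."""
    concordant, usable = 0.0, 0
    n = len(times)
    for i in range(n):
        if events[i] != cause or times[i] > tau:
            continue
        for k in range(n):
            if k == i:
                continue
            competing = events[k] not in (0, cause)
            if times[i] < times[k] or competing:
                usable += 1
                if risk[i] > risk[k]:
                    concordant += 1.0
                elif risk[i] == risk[k]:
                    concordant += 0.5
    return concordant / usable if usable else float("nan")

def brier_score(times, events, predicted_cif, cause, t):
    """Mean squared difference between the event status at t and the predicted CIF at t."""
    total = 0.0
    for ti, ei, fi in zip(times, events, predicted_cif):
        outcome = 1.0 if (ti <= t and ei == cause) else 0.0
        total += (outcome - fi) ** 2
    return total / len(times)

times = [1, 2, 3, 4]
events = [1, 2, 1, 2]
good_risk = [0.9, 0.1, 0.6, 0.2]      # cause-1 cases ranked highest
bad_risk = [-r for r in good_risk]    # ranking reversed

c_good = truncated_c_index(times, events, good_risk, cause=1, tau=10)
c_bad = truncated_c_index(times, events, bad_risk, cause=1, tau=10)
bs_perfect = brier_score(times, events, [1.0, 0.0, 0.0, 0.0], cause=1, t=1.5)
bs_uninformative = brier_score(times, events, [0.5] * 4, cause=1, t=1.5)
```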

4.2. OOB estimate of prediction error

Denote Inline graphic for the right-censored observations in a validation data set of size Inline graphic. Based on these data, the prediction error can be estimated using inverse probability of censoring weights (IPCWs) (Gerds and Schumacher, 2006; Wolbers and others, 2013). This technique requires an estimate of the censoring distribution. Let Inline graphic denote the so-called reverse Kaplan–Meier estimate of the censoring distribution. We shall assume that the censoring times are independent of the covariates and the event times and the event type. Thus, Inline graphic provides an unbiased estimate of the probability of being uncensored at time Inline graphic. To estimate Inline graphic we define weights Inline graphic and Inline graphic. The OOB-IPCW estimate at the largest observation time Inline graphic is

4.2.

where Inline graphic, Inline graphic and Inline graphic. Using weights Inline graphic (Binder and others, 2009), the OOB estimate of the integrated BS for event Inline graphic is given by

4.2.

Note that extremely large weights may occur, but can be avoided by evaluating the IPCW statistics at an earlier time point Inline graphic.
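The reverse Kaplan–Meier estimate is the ordinary Kaplan–Meier estimator with the roles of event and censoring swapped. The following is a minimal sketch with hypothetical function names. It assumes that, when an event and a censoring share the same time, the event is taken to occur first, so cases with an event at that time are not at risk of being censored there.

```python
def reverse_kaplan_meier(times, events):
    """Step function G(t) = P(C > t); censoring (event code 0) plays the role of the event."""
    censor_times = sorted({t for t, e in zip(times, events) if e == 0})
    steps, g = [], 1.0
    for tk in censor_times:
        n_censored = sum(1 for t, e in zip(times, events) if t == tk and e == 0)
        # At risk of censoring at tk: still under observation after tk, or censored at tk.
        at_risk = sum(1 for t in times if t > tk) + n_censored
        g *= 1.0 - n_censored / at_risk
        steps.append((tk, g))
    return steps

def censoring_survival(steps, t):
    """Evaluate the right-continuous step function at time t."""
    g = 1.0
    for tk, gk in steps:
        if tk <= t:
            g = gk
        else:
            break
    return g

times = [1, 2, 3, 4]
events = [1, 0, 1, 0]
steps = reverse_kaplan_meier(times, events)
g_at_3 = censoring_survival(steps, 3)
ipcw_weight_at_3 = 1.0 / g_at_3   # weights grow as G(t) approaches zero
```

Because the weights are reciprocals of the estimated probability of being uncensored, they become very large where that probability is near zero, which is the motivation for evaluating the statistics at an earlier time point.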

5. Variable selection

5.1. Variable importance

RSF variable selection typically involves filtering variables on the basis of variable importance (VIMP). VIMP measures the increase (or decrease) in prediction error for the forest ensemble when a variable is randomly “noised-up” (Breiman, 2001). A large positive VIMP shows that the prediction accuracy of the forest is substantially degraded when a variable is noised-up; thus a large VIMP indicates a potentially predictive variable.

In Breiman's original definition, VIMP is calculated by noising up a variable by permuting its value randomly. A more effective noising-up method, and one used throughout this paper, is random node assignment (Ishwaran and others, 2008). In random node assignment, cases are dropped down a tree and randomly assigned to a daughter node whenever the parent node splits on the target variable. This is more effective than permutation since it leads to a random assignment regardless of the type of variable. For example, permuting a discrete variable with, say, two values may not lead to a sufficiently noised-up feature.
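The weakness of permutation for discrete variables is easy to demonstrate. The following is a minimal sketch, with hypothetical names and simulated values, that measures the fraction of cases whose value is unchanged after a random permutation. For a balanced binary variable about half of the cases keep their original value, whereas for a continuous variable almost none do.

```python
import random

def fraction_unchanged_after_permutation(values, rng):
    """Share of positions whose value is the same before and after shuffling."""
    shuffled = values[:]
    rng.shuffle(shuffled)
    return sum(a == b for a, b in zip(values, shuffled)) / len(values)

rng = random.Random(1)
binary = [0, 1] * 500                              # balanced two-valued variable
continuous = [rng.random() for _ in range(1000)]   # distinct values almost surely

unchanged_binary = fraction_unchanged_after_permutation(binary, rng)
unchanged_continuous = fraction_unchanged_after_permutation(continuous, rng)
```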

Both non-event-specific and event-specific VIMP can be readily calculated for competing risks. To compute event-specific VIMP, we estimate the prediction error as described in Section 4.2. Then we noise up the data by random node assignment, and recompute the prediction error. The difference in these two values gives the VIMP for each variable for each event Inline graphic.

5.2. Minimal depth

Minimal depth assesses the predictiveness of a variable by the depth of the first split of a variable relative to the root node of a tree (Ishwaran, Kogalur, Gorodeski and others, 2010). The smaller this value, the more predictive is the variable. There are unique advantages to using minimal depth in a competing risk setting. First, unlike VIMP, there is an easily derived minimal depth threshold that can be used for selecting variables. Secondly, minimal depth is non-event-specific, and therefore by fitting a single forest it can be used to identify all variables that affect long-term event probabilities. On the other hand, while non-event-specific analyses are useful, it may also be important to identify variables that are event-specific. In Section 6.2, we describe a simple way to combine minimal depth with event-specific VIMP.
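The definition of minimal depth can be illustrated on a toy tree. The sketch below represents a tree as nested dictionaries, with an empty dictionary for a terminal node. This representation, the variable names, and the choice to skip trees that never split on the variable when averaging are assumptions made for illustration only.

```python
def minimal_depth(node, variable, depth=0):
    """Depth of the split on `variable` closest to the root (root = 0), or None if never used."""
    if not node:                      # empty dict marks a terminal node
        return None
    if node["var"] == variable:
        return depth
    depths = [minimal_depth(node[side], variable, depth + 1) for side in ("left", "right")]
    depths = [d for d in depths if d is not None]
    return min(depths) if depths else None

def forest_minimal_depth(trees, variable):
    """Average minimal depth over the trees that split on `variable` (simplified averaging)."""
    depths = [minimal_depth(tree, variable) for tree in trees]
    depths = [d for d in depths if d is not None]
    return sum(depths) / len(depths) if depths else None

# Hypothetical trees with made-up variable names.
tree_a = {
    "var": "age",
    "left": {"var": "cd4", "left": {}, "right": {}},
    "right": {"var": "sex", "left": {}, "right": {"var": "cd4", "left": {}, "right": {}}},
}
tree_b = {"var": "cd4", "left": {}, "right": {"var": "age", "left": {}, "right": {}}}

depth_age = forest_minimal_depth([tree_a, tree_b], "age")   # (0 + 1) / 2
depth_cd4 = forest_minimal_depth([tree_a, tree_b], "cd4")   # (1 + 0) / 2
```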

6. Empirical results

6.1. Simulations

We used simulations based on Cox-exponential hazard models Inline graphic of two competing events (Inline graphic) given a vector of covariates Inline graphic. In all simulations, we set Inline graphic. Six continuous predictors Inline graphic were drawn independently from a standard normal distribution and six binary predictors Inline graphic from a binomial distribution with success probability of 50%. We set

6.1.

such that variables Inline graphic, Inline graphic, Inline graphic, Inline graphic have an effect on the hazard of event 1 only, variables Inline graphic, Inline graphic, Inline graphic, Inline graphic have an effect on the hazard of event 2 only, and variables Inline graphic, Inline graphic, Inline graphic, Inline graphic have an effect on both hazards. The effect size Inline graphic for the continuous variables was set to Inline graphic and for the discrete variables the effect size Inline graphic was set to 1.5. This was our “linear model”. The additive structure of the linear model was changed in our “quadratic model”. Here, the squared variables, Inline graphic, have an additional effect Inline graphic on the event-specific hazards where

6.1.

Finally, we consider an “interaction model”. Additional interaction effects were added to the linear model of the form

6.1.

We set the effect sizes of the interaction terms to

6.1.

In all three simulation models, Inline graphic independent noise variables were drawn independently from a standard normal distribution and added to the simulated data sets. We set Inline graphic in our low-dimensional scenarios and Inline graphic in our high-dimensional scenarios. In all settings, independent right censoring was induced by drawing censoring times from an exponential distribution with rate Inline graphic. This yielded approximately 33% censored observations.
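A data-generating mechanism of this general type can be sketched as follows. All numerical settings in the code (baseline hazards, coefficients, censoring rate, numbers of predictors, sample size) are hypothetical placeholders chosen for illustration and are not the values used in the simulations. Each latent event time is exponential with rate equal to a baseline hazard times the exponential of a linear predictor, the observed time is the minimum of the two latent times and an independent exponential censoring time, and the event indicator records which came first.

```python
import math
import random

def simulate_case(beta1, beta2, base1, base2, censor_rate, n_continuous, rng):
    """One competing risks observation (time, delta, covariates); delta = 0 means censored."""
    n_binary = len(beta1) - n_continuous
    x = [rng.gauss(0.0, 1.0) for _ in range(n_continuous)]
    x += [1.0 if rng.random() < 0.5 else 0.0 for _ in range(n_binary)]
    rate1 = base1 * math.exp(sum(b * v for b, v in zip(beta1, x)))
    rate2 = base2 * math.exp(sum(b * v for b, v in zip(beta2, x)))
    t1 = rng.expovariate(rate1)          # latent time of event 1
    t2 = rng.expovariate(rate2)          # latent time of event 2
    c = rng.expovariate(censor_rate)     # independent censoring time
    t = min(t1, t2, c)
    delta = 0 if t == c else (1 if t == t1 else 2)
    return t, delta, x

rng = random.Random(42)
# Hypothetical setting: two continuous and two binary predictors.
beta1 = [0.5, 0.0, 1.0, 0.0]   # effects on the hazard of event 1
beta2 = [0.0, 0.5, 0.0, 1.0]   # effects on the hazard of event 2
data = [simulate_case(beta1, beta2, 1.0, 1.0, 1.0, 2, rng) for _ in range(3000)]

# Null setting with equal rates: each of the three outcomes is equally likely.
null_data = [simulate_case([0.0] * 4, [0.0] * 4, 1.0, 1.0, 1.0, 2, rng) for _ in range(3000)]
null_censored_fraction = sum(1 for _, d, _ in null_data if d == 0) / len(null_data)
```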

6.2. Forest models

The R-package randomForestSRC (Ishwaran and Kogalur, 2013) was used for computations. For each simulation experiment, 1000 trees were grown using the log-rank splitting rule (3.2) and the modified Gray's splitting rule (3.3). Terminal node size was set at Inline graphic (the default software setting). Randomized splitting was used. Within each parent node, for each of the randomly selected candidate variables, “nsplit” randomly selected split points were chosen (this is in contrast to non-random splitting where all possible split points for each of the candidate variables are considered). The tree node was split on that variable and random split point maximizing the absolute value of the split-statistic. We set nsplit Inline graphic. A small nsplit value is necessary in settings involving a mixture of discrete and continuous variables to avoid biasing splits toward continuous variables (Loh and Shih, 1997; Ishwaran, Kogalur, Gorodeski and others, 2010).

We fit RSF using log-rank splitting (3.2) for each event using weights Inline graphic and Inline graphic. We denote the resulting forests as Inline graphic and Inline graphic. Because we focus on performance over event 1 only (for ease of interpretation), we only report the results for Inline graphic. Additionally, three forests were fit using Gray's modified splitting rule (3.3): Inline graphic used Inline graphic; Inline graphic used Inline graphic; Inline graphic used Inline graphic. Only Inline graphic and Inline graphic are reported.

Variables were selected using minimal depth variable selection (Ishwaran, Kogalur, Gorodeski and others, 2010). Those variables whose event-specific VIMP was positive, and that met a minimal depth threshold (estimated from the forest), represented the final selected set of variables. As noted in Ishwaran, Kogalur and others (2010), the number of variables selected at each node, Inline graphic, referred to as “mtry”, should be set high when using minimal depth in high-dimensional applications. In our high-dimensional simulations (Inline graphic), we used Inline graphic. The default setting Inline graphic was used in low-dimensions (Inline graphic). See Ishwaran, Kogalur and others (2010) for further discussion on setting tuning parameters in high dimensions.

For comparison, we used four alternative methods. For the first, we used the proportional subhazard method of Fine and Gray (1999), abbreviated as FGInline graphic and FGInline graphic. Computations were implemented using the R-software package cmprsk (Gray, 2006). For the second method, we used cause-specific Cox regression (abbreviated as Inline graphic and Inline graphic) for each of the competing events. Predictions of the CIF were obtained by combining the Cox models using (1.1). Computations were implemented using the R-software riskRegression (Gerds and others, 2012). For both approaches, we specified additive effects of the predictor variables and resorted to selecting variables by using Inline graphic-values. A cutoff of 5% was used. For the third method, we applied a stepwise selection algorithm (CRRstep) to the Fine-Gray regression models as proposed in Kuk and Varadhan (2013). We used backward elimination as implemented in the R-package crrstep with an Akaike information criterion selection criterion (Varadhan and Kuk, 2013). For the fourth method, we used Cox-likelihood based boosting (Binder and Schumacher, 2008), abbreviated as CoxBoostInline graphic and CoxBoostInline graphic. This uses boosting to fit proportional subhazards as in Fine and Gray (1999). Computations were implemented using the CoxBoost R-software package (Binder, 2009). The optimal number of boosting iterations was estimated using 10-fold cross-validation. A boosting penalty of 100 was used.

6.3. Simulation results

The simulations were repeated 1000 times independently and results were averaged over the runs. To estimate prediction performance (Table 1), in each simulation run we generated a training set with Inline graphic independent observations and a test set with Inline graphic independent observations. The C-index and the integrated BS were truncated at a sufficiently low quantile of the observed event time distribution. A lower benchmark for prediction performance was obtained in each simulation study by fitting a null model which ignores all covariates. An upper benchmark was obtained by fitting the data generating model to each training data set, i.e. the combination of two cause-specific Cox regression models that were given the correct linear predictor (including quadratic and interaction terms) and no noise variables.

Table 1.

Cox-exponential simulations (1000 replications)

Low-dimensional simulations
Inline graphic, Inline graphic, Inline graphic

                        Linear model                Quadratic model             Interaction model
Method                  IBS           C-index       IBS           C-index       IBS           C-index
NMInline graphic        15.4 (0.7)    -             15.1 (0.6)    -             15.5 (0.6)    -
Inline graphic          10.1 (0.6)    82.1 (1.4)    9.5 (0.7)     79.4 (2.2)    10.8 (0.7)    80.3 (1.5)
Inline graphic          13.6 (0.7)    75.9 (2.3)    13.9 (0.7)    72.1 (2.6)    15.1 (0.6)    61.6 (2.5)
Inline graphic          13.2 (0.7)    76.5 (2.1)    13.6 (0.6)    72.0 (2.4)    15.0 (0.6)    61.3 (2.6)
Inline graphic          13.1 (0.6)    77.2 (2)      13.5 (0.6)    72.9 (2.3)    15.0 (0.6)    61.9 (2.5)
CoxBoostInline graphic  11.4 (0.7)    79.6 (1.9)    13.9 (0.8)    65.7 (4)      15.4 (0.7)    55.9 (4.1)
FGInline graphic        11.7 (0.9)    79.5 (1.8)    14.6 (1.0)    66.4 (2.7)    16.5 (0.9)    57.5 (2.5)
Inline graphic          10.9 (0.7)    80.1 (1.7)    14.4 (1.0)    66.6 (2.6)    16.5 (0.9)    57.6 (2.5)
CRRstepInline graphic   13.6 (2.1)    62.7 (18.4)   14.8 (0.9)    55.4 (12.1)   15.7 (0.7)    49.7 (9.1)

High-dimensional simulations
Inline graphic, Inline graphic, Inline graphic

                        Linear model                Quadratic model             Interaction model
Method                  IBS           C-index       IBS           C-index       IBS           C-index
NMInline graphic        15.4 (0.7)    -             15.1 (0.6)    -             15.5 (0.6)    -
DGMInline graphic       10.1 (0.6)    82.0 (1.4)    9.8 (0.9)     79.3 (2.2)    10.8 (0.7)    80.3 (1.5)
Inline graphic          14.9 (0.7)    67.4 (2.9)    14.8 (0.7)    65.0 (3.6)    15.5 (0.7)    53.3 (2.5)
Inline graphic          14.9 (0.7)    65.8 (3.5)    14.6 (0.7)    64.5 (3.4)    15.6 (0.6)    52.2 (2.4)
Inline graphic          14.8 (0.7)    68.3 (3.1)    14.4 (0.6)    66.7 (3.2)    15.6 (0.6)    52.7 (2.5)
CoxBoostInline graphic  13.5 (1.0)    72.0 (4)      14.8 (0.8)    58.4 (5.7)    15.7 (0.7)    51.5 (2.9)

Performance measures are average (standard deviation) of test set C-index Inline graphic and integrated BS Inline graphic. NM is the null model which assigns the same predicted CIF to each observation and Inline graphic is the data generating model fitted to the training set.

Based on Table 1, we draw the following conclusions:

  1. In the low-dimensional linear simulations, Fine–Gray, Cox, and CoxBoost are better than RSF.

  2. For the low-dimensional quadratic and interaction model, RSF outperforms the other methods.

  3. RSF is better than CoxBoost in the high-dimensional quadratic and interaction model simulations, but CoxBoost is better in the linear model.

  4. The event-specific RSF models Inline graphic and Inline graphic tended to be slightly better than the composite model Inline graphic in the low-dimensional simulations, but this trend was less pronounced in the high-dimensional simulations, and in some cases it was reversed (high-dimensional linear model).

  5. We were not able to calculate Fine–Gray or Cox in the high-dimensional simulations. This is expected of unregularized methods which perform poorly in high-dimensional problems.

To assess VIMP, we calculated selection rates across the 1000 runs (Tables 2 and 3). True positive rates were summarized separately for each of the predictors Inline graphic. False positive rates were averaged across noise variables. Based on Tables 2 and 3, we draw the following conclusions:

  1. The log-rank splitting forest Inline graphic performs best in identifying only those variables affecting the event 1 cause-specific hazard (i.e. Inline graphic and Inline graphic). Recall that log-rank splitting is designed to test for differences in the cause-specific hazard: the results confirm the efficacy of the approach.

  2. The Gray composite splitting forest Inline graphic is designed to discover all variables affecting the event 1 CIF, which are Inline graphic. We find it does a good job doing so. Furthermore, in nearly all simulations, it achieves the smallest false positive rate over the noise variables.
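The selection rates described above can be sketched as follows. This is an illustrative calculation only: it assumes each simulation run yields a set of selected variable names, and the variable names and toy runs are hypothetical rather than taken from the study.

```python
def selection_rates(selected_per_run, signal_vars, noise_vars):
    """Return (true positive rate per signal variable, averaged false positive rate).

    Rates are percentages over runs; the false positive rate is the
    selection frequency averaged across the noise variables.
    """
    n_runs = len(selected_per_run)

    def frequency(var):
        return 100.0 * sum(1 for run in selected_per_run if var in run) / n_runs

    true_positive = {var: frequency(var) for var in signal_vars}
    false_positive = sum(frequency(var) for var in noise_vars) / len(noise_vars)
    return true_positive, false_positive


# Four hypothetical runs, two signal variables and two noise variables.
runs = [
    {"x1", "x3"},
    {"x1", "n1"},
    {"x1", "x3", "n2"},
    {"x1"},
]
tpr, fpr = selection_rates(runs, ["x1", "x3"], ["n1", "n2"])
```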

Table 2.

Variable selection frequencies Inline graphic from low-dimensional simulation study Inline graphic

Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic Noise
Linear model
Inline graphic 34.4 37 17.1 17.6 85.3 85.4 42.4 42.2 16.7 18 94.1 94.1 2.0
Inline graphic 90.3 91.2 22 19.7 39.5 42.4 94.1 94.5 17.5 18.8 39.1 39.2 7.5
Inline graphic 88.4 88.9 6.5 6 81.8 81.6 93.2 94.4 2 2.6 88.4 86.6 5.2
CoxBoostInline graphic 99.9 99.7 78.2 75.8 91.4 91.6 99.9 100 82.5 80.5 95 94.4 37.6
Inline graphic 99.7 100 7.1 7.1 99.9 99.5 99.9 99.9 7 5.5 100 99.9 0
Inline graphic 99.6 99.6 9.5 8.2 99.4 99.6 99.9 99.8 8.9 8.5 99.8 99.6 8.9
FGInline graphic 97.8 97.3 41 42 70 72.2 99.2 99.3 48.7 46.9 79.8 77.4 8.9
CRRstepInline graphic 48.8 48.9 31.2 31.5 42.7 42.6 49 49.1 35.2 33.5 44.9 45.1 11.5
Quadratic model
Inline graphic 79.1 42.9 35.9 14.1 91.8 97.9 8.2 10 6.1 3.7 31.1 31.9 3.8
Inline graphic 99.9 84.1 23.3 18.9 59.2 42 31.1 32.9 7.4 7.5 10.6 9.6 7.6
Inline graphic 100 80.7 5.5 6.2 96.4 83.6 27.5 30.9 2.5 2.4 23.8 22.5 6.4
CoxBoostInline graphic 92 83.5 60.9 46.3 57.9 70.1 68.3 66.4 37.5 38.6 46.2 43.4 28.7
Inline graphic 100 97.2 7.1 6.7 99.6 97.6 72.1 74.2 7.5 7.2 72.8 71.4 0
Inline graphic 92.9 66.7 10.5 9.3 76.6 69.9 45.3 46.3 8.2 9.6 41.1 37.8 8.4
FGInline graphic 80.3 66.1 34.8 18.9 30.4 44.5 39.3 40.7 12.1 12.3 18.4 16.5 7.2
CRRstepInline graphic 43.6 35.9 23.3 18 26.1 28.1 27.8 28.7 13.6 13 16.5 15.5 9.7
Interaction model
Inline graphic 28.3 27.2 18.1 18.5 65.2 59.5 20.5 27.7 8.9 11 66.5 57 14.2
Inline graphic 50.1 57.2 20.2 18.9 27.5 26.6 45.5 61.1 10.4 13.9 19.8 17.4 17.1
Inline graphic 48.1 57.4 17.4 16.6 43.3 44.6 46.2 62.1 5.8 5.2 45.4 42.3 17.7
CoxBoostInline graphic 23.3 17 16.6 16.4 16.7 15 50.2 59.8 22.4 28.9 31.5 27.6 14.9
Inline graphic 99.1 96.9 7 6.4 97.4 97.1 56 45 6.1 6.1 54.1 50.3 0
Inline graphic 19 8.3 8.7 7.6 14.1 7.9 46.8 59.1 7.9 6.7 47.4 40.4 8.0
FGInline graphic 15.2 7 8.5 6.5 7.2 6.7 40.2 54.5 12 14.4 22.5 19.5 6.5
CRRstepInline graphic 15.8 5.7 6 5.8 5.8 5.9 19.8 22 9.1 5 7.4 6.2 5.7

Variables Inline graphic have an effect on the hazard of event Inline graphic only Inline graphic variables Inline graphic Inline graphic Inline graphic Inline graphic have an effect on the hazard of event Inline graphic only Inline graphic and variables Inline graphic Inline graphic Inline graphic Inline graphic have an effect on both hazards. Shown are the true positive rates separately for variables Inline graphic Inline graphic. False positive rates over the noise variables are given in the column labeled “Noise”.

Table 3.

Variable selection frequencies Inline graphic from high-dimensional simulation study Inline graphic, Inline graphic, Inline graphic

Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic Noise
Linear model
Inline graphic 15.3 15 7.7 7 54.8 53.9 21.4 23.3 9.5 9.9 83.4 81.5 1.6
Inline graphic 53.5 51.7 5.5 6.6 16.2 17 64 67.1 6.4 4.9 11.6 10 3.4
Inline graphic 44.5 44.8 1.4 1.4 35.7 35.3 56.6 59.8 0.3 0.1 48.2 44.6 1.4
CoxBoostInline graphic 89.9 91.2 26.8 27.3 47.2 46.5 93.6 93.6 33.6 31.3 57.8 53.6 3.9
Inline graphic 99.9 99.8 8.4 6.3 99.8 99.7 100 100 6.2 4.7 99.8 100 0
Quadratic model
Inline graphic 60.8 22.2 24.5 11.9 82.7 95.4 4.8 4.5 3.6 4.1 16.2 18.4 1.5
Inline graphic 99.4 30.7 6 12.4 48.4 8.2 12.3 11 1.4 2.4 2 2.7 3.7
Inline graphic 99.1 31.1 3 5.8 93.5 40.4 13.4 12.7 0.6 1.2 8.6 9.5 4.5
CoxBoostInline graphic 68.7 32.7 13.4 8 16.7 18.3 18.1 18.7 3.2 4.7 5.6 5.1 1.8
DGMInline graphic 100 96.6 7.5 7.7 99.8 97.9 69.8 72.8 6.1 8.2 71.2 70.9 0
Interaction model
Inline graphic 3.8 4.4 4.5 3.5 10 9.8 7.7 13.4 6.6 7.3 38.1 33 2.6
Inline graphic 10.2 9.7 6.2 4.2 5.3 6.4 15.1 25.7 2.7 3.3 4.3 3.9 4.4
Inline graphic 10.8 12.3 7.3 4.4 10.5 11.6 15.8 25.8 1.6 1.2 17.7 15.9 5.4
CoxBoostInline graphic 2.6 1.8 1.4 1.2 1.3 1.6 15.5 25.7 3.9 5.2 6 6.1 1.2
Inline graphic 97.9 97 6.8 7.7 98.5 97.3 56.8 46.8 8.5 7.7 54 47.9 0

Variables Inline graphic, Inline graphic, Inline graphic, Inline graphic have an effect on the hazard of event 1 only, variables Inline graphic, Inline graphic, Inline graphic, Inline graphic have an effect on the hazard of event 2 only, and variables Inline graphic, Inline graphic, Inline graphic, Inline graphic have an effect on both hazards. Shown are the true positive rates separately for variables Inline graphic Inline graphic. False positive rates over the noise variables are given in the column labeled “Noise”.

7. Highly active antiretroviral therapy for HIV infection

The Johns Hopkins HIV Clinical Cohort is a longitudinal, dynamic, clinical cohort of HIV-infected patients receiving primary care through the Johns Hopkins AIDS Service, which provides care to a large proportion of HIV-infected patients in the Baltimore metropolitan area (Moore, 1998). From this cohort, we identified 2960 individuals initiating effective antiretroviral therapy between 1996 and 2005, and for these we wished to predict time to all-cause mortality and time to AIDS-defining illnesses after the initiation of effective treatment. Variables included laboratory measurements (CD4 nadir, HIV-RNA levels, total lymphocyte counts, and hemoglobin, albumin, and creatinine levels) as well as non-laboratory measurements (prior diagnosis of an AIDS-defining illness, prophylaxis for Pneumocystis jiroveci pneumonia, sex, race, history of injection drug use, history of heavy alcohol use, heroin use, cocaine use, and a medical history of personality disorder, anxiety, depression, schizophrenia, or suicide attempt). With the exception of nadir CD4 count, all marker measurements were restricted to the measurement closest to the time of initiation of highly active antiretroviral therapy (HAART) within a window starting 1 year prior to treatment. Nadir CD4 count was the lowest CD4 count measured prior to the initiation of effective treatment.

Including death, there were 15 competing risk outcomes. Forests were fit using the modified Gray splitting rule (3.3) with weights Inline graphic for all Inline graphic. The same tuning parameters were used as before except for terminal node size, which was set to Inline graphic. This value was determined by optimizing the OOB event averaged C-index using a small subset of the original data (Inline graphic; the remaining data were used for the analysis). In addition, in order to determine cause-specific hazard risk factors, we fit a separate RSF to each event Inline graphic utilizing log-rank splitting (3.2) with weights Inline graphic for Inline graphic.

Figure 1 displays the ensemble CIF for each of the 15 outcomes from the RSF analysis using the composite Gray splitting rule. The CIFs have been grouped by similar ranges for better visualization. Most apparent is that death has a nearly uniformly higher incidence than all other events. Some AIDS illnesses have incidence rates that peak rapidly. For example, incidence for non-Hodgkin's lymphoma increases rapidly and then begins to flatten after 4 years.

Fig. 1.

Averaged ensemble CIF for all 15 events from the HAART study using RSF. CIFs have been grouped by similar vertical ranges for better visualization.

Table 4 lists the minimal depth and event-specific VIMP for each variable for the top five most frequent outcomes, which include death, the most frequently occurring event. Minimal depth values were obtained using Gray's splitting; event-specific VIMP were obtained using log-rank splitting. Variables selected by minimal depth (a total of 11) represent factors affecting Inline graphic-year predictions for all events. The top three variables are nadir CD4 count, albumin level, and total lymphocyte count. These three factors, however, have different cause-specific hazard effects as seen by their event-specific VIMP. For death, albumin level is the most influential factor; nadir CD4 count is influential for the four other events; and total lymphocyte counts are influential for HIV encephalopathy. The importance of albumin for death is not surprising as it is a marker for many general health issues such as liver disease, malnutrition, renal disease, and dehydration. However, it is not a marker for immune function, whereas nadir CD4 is a marker for immune system damage. Candidiasis, pneumocystis pneumonia, HIV encephalopathy, and mycobacterium avium complex are all infection related; thus it is not surprising that nadir CD4 is influential for these outcomes.

Table 4.

Minimal depth and event-specific VIMP for risk factors from HAART analysis for the top five most frequent outcomes

VIMP
Minimal depth (all events) Death Candidiasis PCP HIV encephalopathy MAC
Nadir CD4 prior to HAART 1.14 0.45 4.81 8.88 5.28 6.89
Albumin level 1.81 5.51 0.60 −0.08 2.87 −0.01
Total lymphocyte counts 2.21 1.36 1.02 1.89 2.49 1.85
Hemoglobin level 2.48 0.98 0.65 0.77 1.48 2.58
Creatinine level 3.24 2.40 0.10 0.51 0.91 1.58
Injected drug use 3.32 0.64 0.20 0.17 0.23 0.10
HIV-RNA levels 3.53 0.12 0.51 2.38 0.24 3.50
Age 3.63 1.67 0.00 0.09 0.39 1.79
Pre-2000 3.93 0.00 0.26 0.06 0.15 −0.06
AIDS prior to HAART 4.14 0.32 0.46 1.23 0.42 2.46
PCP prophylaxis 4.69 0.04 1.58 0.85 0.16 0.99
History of hepatitis C 5.06 0.54 −0.02 0.26 1.10 0.09
Race 5.64 0.06 1.12 0.08 0.33 0.21
Heterosexual 5.74 0.05 −0.05 −0.02 −0.14 −0.05
History of mental illness 5.90 0.18 0.01 0.05 −0.10 −0.02
Sex 6.26 0.20 0.08 0.04 −0.08 −0.03
History of hepatitis B 6.36 0.18 −0.03 −0.08 −0.03 −0.03
Cocaine 6.43 0.08 0.00 −0.01 −0.05 −0.03
Men sex with men 6.49 0.02 0.02 0.11 −0.02 −0.10
Depression 6.54 0.01 −0.04 −0.05 −0.04 0.03
Heroin 6.68 0.07 0.04 0.01 −0.02 0.05
Current smoker 7.04 0.01 −0.06 −0.11 0.03 −0.03
Suicide attempt 7.51 0.01 −0.02 0.01 −0.04 0.00
Alcohol 7.53 0.01 −0.02 −0.01 −0.03 −0.04
Anxiety 7.60 0.00 0.00 0.00 −0.00 0.00
Personality disorder 7.60 0.00 0.00 0.00 0.00 0.00
Schizophrenia 7.60 0.00 0.00 0.00 0.00 0.00
Smoking history 7.60 0.00 0.00 −0.00 0.00 0.00
Inline graphic 73.9 70.5 78.1 80.0 87.7

PCP, pneumocystis pneumonia; MAC, mycobacterium avium complex.

The minimal depth threshold for selecting variables is Inline graphic (indicated by a horizontal line separating significant variables from non-significant variables). The event-specific C-index is listed in the last row under the entry Inline graphic.
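As a small illustration of how the event-specific VIMP columns of Table 4 can be read, the sketch below picks, for each event, the variable with the largest VIMP among the five variables with the smallest minimal depth. The values are copied from Table 4; the two negative albumin entries assume that the graphic prefix in the extracted table is a minus sign. The function name is hypothetical.

```python
EVENTS = ["Death", "Candidiasis", "PCP", "HIV encephalopathy", "MAC"]

# Event-specific VIMP from Table 4, in the column order of EVENTS.
# Negative albumin entries assume the extracted graphic prefix is a minus sign.
VIMP = {
    "Nadir CD4 prior to HAART": [0.45, 4.81, 8.88, 5.28, 6.89],
    "Albumin level": [5.51, 0.60, -0.08, 2.87, -0.01],
    "Total lymphocyte counts": [1.36, 1.02, 1.89, 2.49, 1.85],
    "Hemoglobin level": [0.98, 0.65, 0.77, 1.48, 2.58],
    "Creatinine level": [2.40, 0.10, 0.51, 0.91, 1.58],
}


def top_variable_per_event(vimp, events):
    """Map each event to the variable with the largest event-specific VIMP."""
    top = {}
    for j, event in enumerate(events):
        top[event] = max(vimp, key=lambda name: vimp[name][j])
    return top


top = top_variable_per_event(VIMP, EVENTS)
```

On these rows, albumin level ranks first for death and nadir CD4 count ranks first for the other four events, in line with the discussion of Table 4.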

8. Discussion

In this paper, we described a novel extension of RSF to competing risk settings. We introduced new splitting rules for growing competing risk trees and described several ensemble estimators useful for competing risks. These included ensembles for the CIF as well as event-specific estimates of mortality. We described a novel non-parametric method for event-specific variable selection and showed how minimal depth, a new variable selection method for RSF, could be used for identifying non-event-specific variables. Our two splitting rules, log-rank splitting and the modified Gray's splitting rule, are designed to test different null hypotheses. Log-rank splitting tests for equality of the cause-specific hazard, while the modified Gray's splitting rule tests for equality of the CIF. We showed how event-specific VIMP and minimal depth variable selection could be used individually or simultaneously with these rules to identify variables specific to certain events or common to all events.

RSF computations were implemented using the randomForestSRC package. In the future, we plan to complement the package with a Java application that will allow users to restore an RSF analysis for prediction on new data. This would make it possible to apply competing risk prediction in clinical settings (see Section B of supplementary material available at Biostatistics online (http://www.biostatistics.oxfordjournals.org) for further discussion). Computational load is always an issue in large-scale problems, and we mention two strategies to combat this. One is to utilize randomized splitting via “nsplit”. This not only mitigates bias but also greatly reduces computational times. A second strategy is to utilize the OpenMP-enabled version of randomForestSRC (http://www.ccs.miami.edu/~hishwaran/rfsrc.html), which implements parallel processing. This reduces computational time approximately linearly with the number of CPUs, which can translate into substantial computational gains.
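The idea behind randomized splitting can be sketched as follows: rather than evaluating every possible split point of a variable, only a small random subset is scored. This is a simplified illustration and not the randomForestSRC implementation. The function names, the convention that `nsplit = 0` means all points are tried, and the scoring function (a difference in event 1 proportions standing in for the log-rank or Gray statistic) are assumptions of this sketch.

```python
import random


def candidate_split_points(values, nsplit, rng):
    """Return the split points to evaluate for one variable.

    In this sketch, nsplit = 0 means every distinct value except the largest
    is tried; otherwise at most nsplit points are drawn at random.
    """
    points = sorted(set(values))[:-1]
    if nsplit == 0 or nsplit >= len(points):
        return points
    return sorted(rng.sample(points, nsplit))


def event_rate_gap(left, right):
    """Stand-in split score: absolute difference in event 1 proportions.

    This is NOT the log-rank or Gray statistic; it only keeps the sketch short.
    """
    def rate(group):
        return sum(1 for e in group if e == 1) / len(group)

    return abs(rate(left) - rate(right))


def best_split(values, responses, nsplit, score, rng):
    """Score each candidate point and return (best point, best score)."""
    best_point, best_score = None, float("-inf")
    for point in candidate_split_points(values, nsplit, rng):
        left = [r for v, r in zip(values, responses) if v <= point]
        right = [r for v, r in zip(values, responses) if v > point]
        current = score(left, right)
        if current > best_score:
            best_point, best_score = point, current
    return best_point, best_score


# Hypothetical data: event 1 below 50, event 2 otherwise.
rng = random.Random(0)
values = list(range(100))
responses = [1 if v < 50 else 2 for v in values]

all_points = candidate_split_points(values, 0, rng)      # 99 candidates
few_points = candidate_split_points(values, 10, rng)     # 10 candidates
exact_best = best_split(values, responses, 0, event_rate_gap, rng)
```

With 10 candidate points instead of 99, each variable costs roughly a tenth of the split evaluations, which is where the computational saving comes from.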

Supplementary material

Supplementary material is available at http://biostatistics.oxfordjournals.org.

Funding

This work was supported by the National Institutes of Health [R01CA16373 to H.I., K01-AI071754, U01-AI42590, and R01-DA011602 to R.D.M., S.J.G., and B.M.L.]; and the National Science Foundation [DMS 114899 to H.I.].

Acknowledgements

We gratefully acknowledge the help of two referees whose reviews improved the manuscript.

Conflict of Interest: None declared.

References

  1. Aalen O., Johansen S. An empirical transition matrix for non-homogeneous Markov chains based on censored observations. Scandinavian Journal of Statistics. 1978;5:141–150.
  2. Andersen P. K. A note on the decomposition of number of life years lost according to causes of death. Research Report 2, University of Copenhagen, Department of Biostatistics. 2012. doi: 10.1002/sim.5903.
  3. Binder H. CoxBoost: Cox Models by Likelihood Based Boosting for a Single Survival Endpoint or Competing Risks. R package version 1.1. http://cran.r-project.org.
  4. Binder H., Allignol A., Schumacher M., Beyersmann J. Boosting for high-dimensional time-to-event data with competing risks. Bioinformatics. 2009;25:890–896. doi: 10.1093/bioinformatics/btp088.
  5. Binder H., Schumacher M. Allowing for mandatory covariates in boosting estimation of sparse high-dimensional survival models. BMC Bioinformatics. 2008;9:14. doi: 10.1186/1471-2105-9-14.
  6. Breiman L. Random forests. Machine Learning. 2001;45:5–32.
  7. Fine J. P., Gray R. J. A proportional hazards model for the subdistribution of a competing risk. Journal of the American Statistical Association. 1999;94(446):496–509.
  8. Gerds T. A., Schumacher M. Consistent estimation of the expected Brier score in general survival models with right-censored event times. Biometrical Journal. 2006;48(6):1029–1040. doi: 10.1002/bimj.200610301.
  9. Gerds T. A., Scheike T. H., Andersen P. K. Absolute risk regression for competing risks: interpretation, link functions, and prediction. Statistics in Medicine. 2012;31:3921–3930. doi: 10.1002/sim.5459.
  10. Graf E., Schmoor C., Sauerbrei W. F., Schumacher M. Assessment and comparison of prognostic classification schemes for survival data. Statistics in Medicine. 1999;18:2529–2545. doi: 10.1002/(sici)1097-0258(19990915/30)18:17/18<2529::aid-sim274>3.0.co;2-5.
  11. Gray R. J. A class of K-sample tests for comparing the cumulative incidence of a competing risk. The Annals of Statistics. 1988;16:1141–1154.
  12. Gray R. J. cmprsk: Subdistribution Analysis of Competing Risks. R package version 2.1-7. http://cran.r-project.org.
  13. Ishwaran H., Kogalur U. B. randomForestSRC: Random Forests for Survival, Regression and Classification (RF-SRC). R package version 1.4.0. http://cran.r-project.org.
  14. Ishwaran H., Kogalur U. B., Blackstone E. H., Lauer M. S. Random survival forests. The Annals of Applied Statistics. 2008;2:841–860.
  15. Ishwaran H., Kogalur U. B., Chen X., Minn A. J. Random survival forests for high-dimensional data. Statistical Analysis and Data Mining. 2010;4:115–132.
  16. Ishwaran H., Kogalur U. B., Gorodeski E. Z., Minn A. J., Lauer M. S. High-dimensional variable selection for survival data. Journal of the American Statistical Association. 2010;105(489):205–217.
  17. Kuk D., Varadhan R. Model selection in competing risks regression. Statistics in Medicine. 2013;32(18):3077–3088. doi: 10.1002/sim.5762.
  18. Loh W.-Y., Shih Y.-S. Split selection methods for classification trees. Statistica Sinica. 1997;7:815–840.
  19. Moore R. D. Understanding the clinical and economic outcomes of HIV therapy: the Johns Hopkins HIV clinical practice cohort. Journal of Acquired Immune Deficiency Syndrome and Human Retrovirology. 1998;17(Suppl. 1):38. doi: 10.1097/00042560-199801001-00011.
  20. Varadhan R., Kuk D. crrstep: Stepwise Covariate Selection for the Fine & Gray Competing Risks Regression Model. R package version 2013-02.12. http://cran.r-project.org.
  21. Wolbers M., Koller M. T., Witteman J. C. Concordance for prognostic models with competing risks. Research Report 3, University of Copenhagen, Department of Biostatistics. 2013.

Articles from Biostatistics (Oxford, England) are provided here courtesy of Oxford University Press
