Abstract
The cumulative incidence function is commonly reported in studies with competing risks. The aim of this paper is to compute the treatment-specific cumulative incidence functions, adjusting for potentially imbalanced prognostic factors among treatment groups. The underlying regression model considered in this study is the proportional hazards model for a subdistribution function [1]. We propose estimating the direct adjusted cumulative incidences for each treatment using the pooled samples as the reference population. We develop two SAS macros for estimating the direct adjusted cumulative incidence function for each treatment based on two regression models. One model assumes constant subdistribution hazard ratios between the treatments; the alternative model allows each treatment to have its own baseline subdistribution hazard function. The macros compute the standard errors for the direct adjusted cumulative incidence estimates, as well as the standard errors for the differences of adjusted cumulative incidence functions between any two treatments. Based on the macros’ output, one can assess treatment effects at predetermined time points. A real bone marrow transplant data example illustrates the practical utility of the SAS macros.
Keywords: Competing risks, Cumulative incidence function, Cox model, Subdistribution
1. Introduction
In biomedical studies, problems often arise in analyzing competing risks data, in which each subject is at risk of failure from K different causes and failure due to one risk precludes the occurrence of any other risk. With competing risks data, one practical problem is to evaluate treatment efficacy as well as assess the effects of other predictors on the cumulative incidence of a specific risk. Researchers are often interested in comparing the cumulative incidence functions for patients receiving different treatments. The standard statistical approach is to construct the cause-specific hazard functions for all risks and model the cumulative incidence function as a function of all cause-specific hazards [2, 3, 4, 5]. This approach has been criticized for the following drawbacks: first, the hazards of all the risks need to be correctly modeled even when only one risk is of study interest; second, the cumulative incidence does not possess a monotonic relation to the covariate effect and it may be difficult to conclude the effect of a specific covariate on the cumulative incidence curve; third, it may be difficult to identify which specific covariates have time-varying effects on the cumulative incidence function.
Recently, several regression approaches have been proposed to model the cumulative incidence function directly. Fine and Gray [1] proposed a Cox-type regression model for the subdistribution hazard function, λ1(t; z) = − d log{1 − F1(t; z)}/dt = λ10(t) exp(βᵀz), where F1(t; z) and λ10(t) are the cumulative incidence function for given covariates z and the baseline subdistribution hazard function for risk 1, respectively. Sun et al. [6] considered an alternative flexible model for the subdistribution hazard. Klein and Andersen [7] proposed a pseudo-value approach to conduct regression analysis on the cumulative incidence function at pre-fixed time points. Scheike et al. [8] constructed a linear transformation model on the cumulative incidence function based on binomial regression models. Direct regression modeling of the cumulative incidence function has been frequently adopted in biomedical research; however, its interpretation is somewhat awkward. Recently, Zhang and Fine [9] proposed inferences for the difference, ratio and odds ratio between cumulative incidence functions based on nonparametric estimates.
Some statistical packages have been developed to implement the aforementioned approaches. Fine and Gray’s (FG) model has been implemented in the cmprsk package in R. This package includes several useful features such as estimation of the cumulative incidence function and time-dependent covariates. Klein et al. [10] developed a SAS macro and an R function for the pseudo-value approach. The direct binomial regression modeling approach has been implemented in the R package timereg, which can be used to fit a class of flexible models and develop model identification procedures [11].
Direct adjustment is a common method to tackle the issue of imbalanced distributions of the risk factors (covariates) between treatment groups, when the treatment-specific survival quantities need to be reported. Chen and Tsiatis [12] proposed to compare the direct adjusted restricted mean lifetimes of different treatments for time to event data. The direct adjustment method has also been adopted for univariate survival data to generate the overall survival curves for different treatments [13, 14]. For this approach, each subject’s predicted survival probability is estimated with all possible treatment assignments, regardless of the actual treatment received. The direct adjusted survival probability of one treatment is obtained by averaging these predicted survival probabilities of this treatment over the pooled samples. The objective of this paper is to study direct adjusted cumulative incidence estimates for treatments.
We adopted the FG model [1] as the underlying regression model and developed a SAS macro to compute the direct adjusted cumulative incidence curves for treatments. In biomedical studies, often the treatment effect varies over time. To allow the treatments to have time-varying effects, we considered a stratified proportional subdistribution hazards model and developed an alternative SAS macro implementing the proposed stratified FG model. The macros also compute the standard errors for the adjusted cumulative incidence functions and for the difference between two cumulative incidence estimates. We demonstrate our SAS macros with bone marrow transplant data from the Center for International Blood and Marrow Transplant Research (CIBMTR).
The remainder of the paper is organized as follows. In Section 2, we briefly describe the Cox-type model on a subdistribution hazard function and present the direct adjusted cumulative incidence curves for treatment groups. The SAS macros are introduced in Section 3. Bone marrow transplant data are analyzed in Section 4 to demonstrate our macros. Concluding remarks are given in Section 5.
2. Direct adjusted cumulative incidence curve
Let Tj and Cj be the event time and right censoring time of the jth individual and let εj denote the failure type. We assume that we observe n independent identically distributed (i.i.d.) replications of {(Xj, Δj, Zj), j = 1, …, n}, where Xj = min(Tj, Cj), Δj = εjℐ(Tj ≤ Cj), and Zj = (Zj,1, …, Zj,p)ᵀ is the vector of covariates. We assume that (Tj, εj) is independent of Cj given the covariates Zj.
To directly assess covariate effects on the cumulative incidence of a specific risk, Fine and Gray [1] proposed a Cox-type model for the subdistribution hazard function,
$$\lambda_{1}(t; z) = \lambda_{10}(t)\exp(\beta^{T} z), \tag{2.1}$$
where λ10(·) is an arbitrary nonnegative baseline subdistribution hazard function.
Fine and Gray proposed a weighted score equation that uses the inverse probability of censoring weighting (IPCW) technique to adjust for right censoring; the solution of this equation, denoted β̂, estimates β. Let Λ̂10(t) be the Breslow-type estimator of the cumulative baseline subdistribution hazard function $\Lambda_{10}(t) = \int_0^t \lambda_{10}(u)\,du$. Then, for given z, the predicted cumulative incidence F1(t; z) = P(T ≤ t, ε = 1 | z) can be estimated by F̂1(t; z) = 1 − exp{−Λ̂10(t) exp(β̂ᵀz)}. Fine and Gray studied the large sample properties and derived consistent variance estimates for β̂, Λ̂10(t) and F̂1(t; z).
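As a rough illustration of the prediction formula F̂1(t; z) = 1 − exp{−Λ̂10(t) exp(β̂ᵀz)}, the following Python sketch evaluates it at a single time point. The function name `predicted_cif` and all numeric inputs are hypothetical and chosen only for illustration. They are not output of the SAS macros or estimates from the data analyzed in this paper.

```python
import math


def predicted_cif(cum_base_hazard, beta, z):
    """Evaluate 1 - exp(-Lambda10(t) * exp(beta' z)) at one time point.

    cum_base_hazard: value of the cumulative baseline subdistribution hazard at t
    beta:            regression coefficients
    z:               covariate values, in the same order as beta
    """
    linear_predictor = sum(b * x for b, x in zip(beta, z))
    return 1.0 - math.exp(-cum_base_hazard * math.exp(linear_predictor))


# Hypothetical inputs (assumptions for illustration only)
lambda10_t = 0.30          # assumed cumulative baseline subdistribution hazard at time t
beta_hat = [-0.06, -1.04]  # assumed coefficients for two binary covariates

cif_reference = predicted_cif(lambda10_t, beta_hat, [0, 0])    # both covariates equal to 0
cif_second_cov = predicted_cif(lambda10_t, beta_hat, [0, 1])   # second covariate equal to 1
print(cif_reference, cif_second_cov)
```

Because the second coefficient is negative in this toy setting, the predicted cumulative incidence is lower when the second covariate equals 1, which mirrors the monotone relation between a covariate and the cumulative incidence under this model.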
For the K sample competing risks data, the data can be summarized as (Xij, Δij, Zij) for i = 1, ⋯, K and j = 1, ⋯, ni. First, we consider fitting a regular FG model,
$$\lambda_{1k}(t; z) = \lambda_{10}(t)\exp(\gamma_k + \beta^{T} z), \tag{2.2}$$
where exp(γk) is the subdistribution hazards ratio between treatment k and treatment 1, for k = 2, ⋯, K and γ1 = 0.
In many cancer studies, the treatments often have time-varying effects. To allow the time-varying effects, we consider a stratified FG model which extends model (2.2) to
$$\lambda_{1k}(t; z) = \lambda_{10,k}(t)\exp(\beta^{T} z), \tag{2.3}$$
where each stratum has its own baseline subdistribution hazard function λ10,k(t), for k = 1, …, K. In general, any categorical covariate can be considered as a stratifying covariate. In this paper and the developed SAS macros, we only consider the treatment groups as stratifying covariates. The stratified weighted score equations can be used to estimate β. The predicted cumulative incidence for the kth treatment with covariate value z can be estimated by F̂1k(t, z) = 1 − exp{−Λ̂10,k(t) exp(β̂ᵀz)}, where Λ̂10,k(t) is the Breslow-type estimator of $\Lambda_{10,k}(t) = \int_0^t \lambda_{10,k}(u)\,du$.
The direct adjustment method has been widely adopted in biomedical research to compute the average survival probabilities for each treatment using the pooled samples as the reference population [12, 13, 14]. Applying this approach to competing risks data, each subject’s predicted cumulative incidence probability is estimated under every possible treatment assignment, regardless of the actual treatment received. The direct adjusted cumulative incidence probability of one treatment is obtained by averaging the predicted cumulative incidence probabilities for this treatment over the pooled samples. The direct adjusted cumulative incidences of different treatments are computed for populations with the same prognostic factors, and hence can be directly compared to reveal the treatment effect.
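When the proportional subdistribution hazards assumption is valid for both treatments and covariates, model (2.2) can be considered. The adjusted cumulative incidence function of the kth treatment is given by

$$F_{1k}(t) = \frac{1}{n}\sum_{i=1}^{K}\sum_{j=1}^{n_i}\left[1 - \exp\left\{-\Lambda_{10}(t)\exp(\gamma_k + \beta^{T} Z_{ij})\right\}\right],$$

where $n = \sum_{i=1}^{K} n_i$. When the treatment effects change over time, model (2.3) can be utilized and, similarly, the direct adjusted cumulative incidence for treatment k is given by averaging the predicted cumulative incidence functions over the subjects in all strata,

$$F_{1k}(t) = \frac{1}{n}\sum_{i=1}^{K}\sum_{j=1}^{n_i}\left[1 - \exp\left\{-\Lambda_{10,k}(t)\exp(\beta^{T} Z_{ij})\right\}\right].$$

Under either model, F1k(t) can be estimated by the plug-in estimator F̂1k(t), obtained by replacing the regression parameters and the cumulative baseline subdistribution hazards with their estimates. The variance estimation of F̂1k(t) under model (2.2) directly follows Fine and Gray’s result and is omitted in this paper. Following Fine and Gray’s approach, we derived the variance estimation for F̂1k(t) under model (2.3), which is given in the appendix.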
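The averaging step of the direct adjustment method can be sketched in Python as follows. Every subject in the pooled sample is assigned to treatment k in turn, the predicted cumulative incidence 1 − exp{−Λ exp(γk + βᵀz)} is computed, and the predictions are averaged. Under the unstratified model the treatment enters through γk with a shared baseline. Under the stratified model γk is dropped and a treatment-specific cumulative baseline hazard is used. The function name `direct_adjusted_cif`, the toy covariates, and all numeric values are hypothetical; this is a sketch of the idea and not the macros' implementation.

```python
import math


def direct_adjusted_cif(cum_base_hazard, beta, covariates, gamma_k=0.0):
    """Average the predicted cumulative incidence over the pooled sample.

    cum_base_hazard: cumulative baseline subdistribution hazard at time t
                     (shared baseline, or the baseline of stratum k)
    beta:            covariate coefficients
    covariates:      covariate vectors of ALL subjects, whatever treatment they received
    gamma_k:         treatment coefficient (0 for the reference treatment or a stratified model)
    """
    total = 0.0
    for z in covariates:
        linear_predictor = gamma_k + sum(b * x for b, x in zip(beta, z))
        total += 1.0 - math.exp(-cum_base_hazard * math.exp(linear_predictor))
    return total / len(covariates)


# Toy pooled sample with two binary covariates (assumed values)
pooled_z = [[0, 0], [1, 0], [0, 1], [0, 1]]
beta_hat = [-0.06, -1.04]

# Unstratified model: one baseline, treatments differ through gamma_k
unstratified_trt1 = direct_adjusted_cif(0.30, beta_hat, pooled_z, gamma_k=0.0)
unstratified_trt2 = direct_adjusted_cif(0.30, beta_hat, pooled_z, gamma_k=-0.5)

# Stratified model: each treatment has its own cumulative baseline hazard
stratified_trt1 = direct_adjusted_cif(0.30, beta_hat, pooled_z)
stratified_trt2 = direct_adjusted_cif(0.22, beta_hat, pooled_z)

print(unstratified_trt1, unstratified_trt2, stratified_trt1, stratified_trt2)
```

Because both treatments are evaluated on the same pooled covariate distribution, any difference between the adjusted values reflects the treatment term (γk or the stratum-specific baseline) rather than covariate imbalance.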
It is important to evaluate the significance of the differences between two direct adjusted cumulative incidence curves. For treatments k and l, it is desirable to construct confidence intervals as well as a confidence band for F1k(t) − F1l(t) [12]. Based on the confidence interval at a given time point, one can conclude whether two treatments yield significantly different cumulative incidences of a particular cause of failure up to the given time point. The explicit variance estimation for the difference is given in the appendix as well.
3. The SAS macros
We have developed two SAS macros, %CIFCOX and %CIFSTRATA, for computing the direct adjusted cumulative incidence curves and relevant standard errors based on models (2.2) and (2.3), respectively. The macros have been saved in two text files, “CIFCOX.txt” and “CIFSTRATA.txt”. The input data set structure and the macro parameters are the same for these two macros. In this section, we use %CIFSTRATA to illustrate data preparation and how to invoke the macros.
The macro should be loaded to the current session by using the %INCLUDE statement. Suppose that the text file is saved in the root directory of a flash drive (E:). Write the following statement in the program to load the macro.
%INCLUDE "E:\CIFSTRATA.txt";
To use the macro, one has to prepare a SAS data set with the variables for the failure time, failure causes, treatment group index, and the covariates. The data set name and the variables need to be explicitly specified when invoking the macro. Below is the statement that invokes the macro, followed by an explanation of the macro parameters.
%CIFSTRATA(inputdata, time, cause, group, covlist, outdata);
where
| inputdata | the input SAS data set name; |
| time | the failure time variable; |
| cause | the variable indicating censoring or type of failure with 0 for censoring, 1 for risk of interest and 2 for all other causes of failure; |
| group | the treatment group index and coded as {1, 2, …, K}; |
| covlist | the list of all covariates; |
| outdata | the SAS output data set name. |
The macro prints the regression parameter estimates, the estimated variance-covariance matrix, and the estimated baseline cumulative subdistribution hazard in the output window. The results for the direct adjusted cumulative incidences are both printed in the output window and saved in the output data set specified by the macro parameter. The output data set includes the time variable, the direct adjusted cumulative incidences (cif1, ⋯, cifK) and their standard errors (se1, ⋯, seK), as well as the standard errors of the differences between any two direct adjusted cumulative incidences (se1_2, ⋯, se(K−1)_K).
4. The example
For illustration purposes, we consider a transplant outcome data set from the Center for International Blood and Marrow Transplant Research (CIBMTR) [15, 16]. The CIBMTR is comprised of clinical and basic scientists who confidentially share data on their blood and bone marrow transplant patients with the CIBMTR Data Collection Center located at the Medical College of Wisconsin. The CIBMTR is a repository of information about results of transplants at more than 450 transplant centers worldwide. The objective of the study was to compare the outcomes of bone marrow transplants (BMT) from HLA-identical sibling donors (n=1224), HLA-matched unrelated donors (n=383) and HLA-mismatched unrelated donors (n=108). For BMT studies, cancer relapse and death in complete remission (without relapse) are two competing events. In real medical studies, researchers need to report the results from all competing risks. In this paper, we only report the relapse results for the three treatment groups to illustrate the macros.
For illustration purposes, we adjust only for disease type, which is significantly imbalanced among the three treatment groups (see Table 1). The proportionality assumption was tested by adding time-dependent covariates of the form log(t) × covariate using the crr function in the cmprsk package. The test indicated that the treatments have time-varying effects and the disease types have constant effects on relapse (see Table 2).
Table 1.
Distribution of patients’ disease types in different donor groups.
| Type of leukemia | HLA-identical sibling N (%) | Matched unrelated N (%) | Mismatched unrelated N (%) | p-value* |
|---|---|---|---|---|
| ALL | 248 (20%) | 62 (16%) | 30 (28%) | < 0.001 |
| AML | 449 (37%) | 70 (18%) | 18 (17%) | |
| CML | 527 (43%) | 251 (66%) | 60 (55%) | |

*The p-value is based on the Pearson chi-square test.
Table 2.
Test for proportionality for relapse.
| Covariate | β̂ | SE | p-value |
|---|---|---|---|
| log(t)×(Matched unrelated) | −0.316 | 0.080 | < 0.001 |
| log(t)×(Mismatched unrelated) | −0.330 | 0.163 | 0.042 |
| log(t)×AML | −0.085 | 0.073 | 0.246 |
| log(t)×CML | 0.143 | 0.111 | 0.197 |
We use our developed macros to fit an FG model (2.2) and a stratified FG model (2.3) for the subdistribution hazard of relapse to assess the effects of different donor types. For model (2.2), the donor groups are treated as the covariates in the regression. For model (2.3), the donor groups are treated as the strata. The input data set, indata, should be prepared to include these variables: time, cause and group. In this example, the variable cause takes value 1 for relapse, 2 for transplant-related mortality, defined as death in complete remission, and 0 for censored observations. The variable group takes value 1 for HLA-identical sibling donor, 2 for HLA-matched unrelated donor, and 3 for HLA-mismatched unrelated donor. The input data set also includes covariates of disease type: AML (1 if disease is AML, 0 otherwise), CML (1 if disease is CML, 0 otherwise). The unstratified and stratified FG models are invoked by the statements %CIFCOX(indata, time, cause, group, AML CML, outdata) and %CIFSTRATA(indata, time, cause, group, AML CML, outdata), respectively.
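The coding conventions described above can be illustrated with a small Python sketch that recodes raw records into the variables the macros expect (time, cause, group, AML, CML) and writes them as CSV text that could then be imported into SAS. The raw field names (`years`, `status`, `donor`, `disease`), the label strings, and the three example records are hypothetical and are not taken from the CIBMTR data.

```python
import csv
import io

# Coding required by the macros: group in {1, ..., K}; cause 0 = censored,
# 1 = event of interest (relapse), 2 = competing event (transplant-related mortality).
GROUP_CODES = {"sibling": 1, "matched_unrelated": 2, "mismatched_unrelated": 3}
CAUSE_CODES = {"censored": 0, "relapse": 1, "trm": 2}


def recode(record):
    """Map one raw record (hypothetical layout) to the macro input variables."""
    return {
        "time": record["years"],
        "cause": CAUSE_CODES[record["status"]],
        "group": GROUP_CODES[record["donor"]],
        "AML": int(record["disease"] == "AML"),  # ALL is the reference category
        "CML": int(record["disease"] == "CML"),
    }


# Made-up records for illustration only
raw = [
    {"years": 1.2, "status": "relapse", "donor": "sibling", "disease": "AML"},
    {"years": 5.0, "status": "censored", "donor": "matched_unrelated", "disease": "CML"},
    {"years": 0.4, "status": "trm", "donor": "mismatched_unrelated", "disease": "ALL"},
]
rows = [recode(r) for r in raw]

# Write the recoded rows as CSV text
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["time", "cause", "group", "AML", "CML"])
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

Using explicit lookup tables makes an unexpected label fail loudly with a `KeyError` instead of silently producing a miscoded group or cause.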
The output data set includes the direct adjusted cumulative incidence estimates and estimated standard errors. We generate the direct adjusted cumulative incidence curves of relapse for different types of donor (see Figures 1–2). Figure 2 shows that the adjusted cumulative incidence functions of relapse for HLA-identical sibling and matched unrelated transplants cross at about 2 years after transplant. This indicates that the ratio of the subdistribution hazards between these two donor types is not constant over time, which matches the test results given in Table 2. The test for proportionality and Figure 2 indicate that the stratified FG model is more appropriate for this data set. In practice, we recommend fitting a stratified FG model, which is more flexible and allows the treatment groups to have time-varying effects. For illustration, we report predicted cumulative incidence rates for relapse from both a regular FG model and a stratified FG model.
Fig.1.
Direct adjusted curves of relapse based on a Cox model of the subdistribution
Fig.2.
Direct adjusted curves of relapse based on a stratified Cox model of the subdistribution
The results from fitting an FG model and a stratified FG model are given in Table 3. Both models indicate that, compared to patients with ALL, patients with CML have a significantly lower risk of relapse. From the regular FG model (2.2), mismatched unrelated transplants have a lower relapse rate than HLA-identical sibling transplants, but this is because mismatched unrelated transplants have a much higher treatment-related death rate. We ran the crr function in the cmprsk package to fit the FG model for this data set, and the results agreed with the output of the macro %CIFCOX. The cmprsk package cannot be used for a stratified FG model. The stratified FG model was implemented in the macro %CIFSTRATA.
Table 3.
Result of the FG model and Stratified FG model on the subdistribution of relapse
| Covariate | FG model: β̂ | FG model: SE | FG model: p-value | Stratified FG model: β̂ | Stratified FG model: SE | Stratified FG model: p-value |
|---|---|---|---|---|---|---|
| Matched unrelated | −0.043 | 0.154 | 0.779 | | | |
| Mismatched unrelated | −0.966 | 0.363 | 0.008 | | | |
| AML | −0.062 | 0.139 | 0.653 | −0.073 | 0.139 | 0.598 |
| CML | −1.036 | 0.150 | < 0.001 | −1.039 | 0.150 | < 0.001 |
The SAS macros report the direct adjusted cumulative incidences for each treatment group. The explicit estimates at year 5 after transplant are given in Table 4. The output data set also includes the standard errors for the differences between every two adjusted cumulative incidence curves, which can be used in pairwise comparison between treatments. The results of the treatment comparisons at year 5 are given in Table 5. The results indicate that the sibling donors and matched unrelated donors have higher cumulative incidences of relapse than the mismatched unrelated donors.
Table 4.
Estimated cumulative incidence probability of relapse at year 5 after transplant
| Donor type | FG model: estimate | FG model: SE | FG model: 95% CI | Stratified FG model: estimate | Stratified FG model: SE | Stratified FG model: 95% CI |
|---|---|---|---|---|---|---|
| HLA-identical sibling | 0.230 | 0.014 | (0.202, 0.260) | 0.242 | 0.015 | (0.211, 0.272) |
| Matched unrelated | 0.221 | 0.026 | (0.169, 0.273) | 0.206 | 0.028 | (0.151, 0.261) |
| Mismatched unrelated | 0.096 | 0.032 | (0.033, 0.159) | 0.084 | 0.028 | (0.029, 0.139) |
Table 5.
Estimated difference in cumulative incidence of relapse and 95% confidence interval at year 5 after transplant
| Model | Difference | Estimate | SE | 95% CI | p-value |
|---|---|---|---|---|---|
| FG model | Sibling − Matched unrelated | 0.008 | 0.029 | (−0.049, 0.066) | 0.777 |
| FG model | Sibling − Mismatched unrelated | 0.133 | 0.035 | (0.064, 0.202) | < 0.001 |
| FG model | Matched − Mismatched | 0.125 | 0.041 | (0.044, 0.206) | 0.003 |
| Stratified FG model | Sibling − Matched unrelated | 0.035 | 0.032 | (−0.028, 0.098) | 0.271 |
| Stratified FG model | Sibling − Mismatched unrelated | 0.158 | 0.032 | (0.094, 0.221) | < 0.001 |
| Stratified FG model | Matched − Mismatched | 0.122 | 0.039 | (0.045, 0.199) | 0.002 |
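The intervals and p-values in Table 5 appear consistent with a normal-approximation (Wald) calculation from the estimated difference and its standard error, which is how the se columns of the output data set can be used for pairwise comparisons. The Python sketch below shows that calculation under this assumption. The function name `wald_difference` is hypothetical, and the inputs are the rounded stratified-model values for Sibling − Matched unrelated, so the result agrees with the table only up to rounding.

```python
from statistics import NormalDist


def wald_difference(diff, se, level=0.95):
    """Normal-approximation confidence interval and two-sided p-value for a difference."""
    z_crit = NormalDist().inv_cdf(0.5 + level / 2.0)   # about 1.96 for a 95% interval
    z_stat = diff / se
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z_stat)))
    return diff - z_crit * se, diff + z_crit * se, p_value


# Rounded values from Table 5 (stratified FG model, Sibling - Matched unrelated)
lower, upper, p_value = wald_difference(0.035, 0.032)
print(round(lower, 3), round(upper, 3), round(p_value, 3))
```

An interval that contains zero, as here, corresponds to a difference that is not statistically significant at the chosen level for that single time point.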
Estimation of the variances of the direct adjusted cumulative incidence curves based on the FG model is computationally expensive. We optimized the programs to make the macros efficient in terms of running time. One technique was to use close initial values in the numerical algorithm. In the macros, we obtained the initial values from the regression coefficient estimates of a Cox model on the cause-specific hazard. For this example, with a relatively large sample size of 1715 observations, our macros required a tolerable amount of running time to generate the results. To give readers some idea of the amount of time needed for various sample sizes, we tested our macros on some subsets of the data and reported the times in Table 6. The computer used for the testing was a Dell desktop PC with a 3.0 GHz processor and 3.25 GB of RAM. The run time for a medium-sized sample of 1000 observations is about half a minute, which is practical for most purposes.
Table 6.
The run times for various sample sizes
| Sample size | %CIFCOX | %CIFSTRATA |
|---|---|---|
| Entire data set (n=1715) | 1 min 38 sec | 2 min 00 sec |
| First 1000 observations | 30 sec | 32 sec |
| First 500 observations | 4 sec | 4 sec |
5. Concluding remarks
We have presented two SAS macros to compute the direct adjusted cumulative incidence functions for different treatments based on the unstratified and stratified FG models for a subdistribution hazard function. The treatment effect on a specific cause of failure can be assessed by constructing a confidence interval for the difference between two direct adjusted estimates. An inference based on a confidence interval, however, is only valid for assessing the effect at a given time point. For treatment comparisons at several time points or among several treatments, one needs to adjust for multiple comparisons by adopting an appropriate method such as the Bonferroni method. The more general problem of assessing treatment effect over a period of time can be solved by constructing a confidence band for the differences between cumulative incidence functions. A simulation approach can be utilized to generate the confidence band [1], but it is not available in our SAS macros. A computationally efficient method will be needed in developing such a SAS macro.
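To illustrate the Bonferroni adjustment mentioned above, the Python sketch below computes the normal critical value for m simultaneous two-sided comparisons at overall level α, namely the 1 − α/(2m) quantile, and the resulting half-width of an adjusted interval. The function name `bonferroni_critical_value` and the standard error used are illustrative assumptions; the macros themselves report only unadjusted standard errors.

```python
from statistics import NormalDist


def bonferroni_critical_value(n_comparisons, alpha=0.05):
    """Two-sided normal critical value with a Bonferroni correction."""
    return NormalDist().inv_cdf(1.0 - alpha / (2.0 * n_comparisons))


z_single = bonferroni_critical_value(1)  # unadjusted, about 1.96
z_three = bonferroni_critical_value(3)   # three pairwise comparisons among three treatments

# Half-width of an adjusted interval for a difference with an assumed standard error
assumed_se = 0.032
adjusted_half_width = z_three * assumed_se
print(z_single, z_three, adjusted_half_width)
```

With three pairwise comparisons the critical value is larger than the unadjusted one, so each adjusted interval is wider than its unadjusted counterpart.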
The developed macros are available to the public and can be found at our website: “http://www.biostat.mcw.edu/software/SoftMenu.html”.
Acknowledgments
Mei-Jie Zhang was supported by National Cancer Institute grant 5R01CA054706.
Appendix A
The sample is given by (Xij, Δij, Zij), for i = 1, ⋯, K; j = 1, ⋯, ni; $n = \sum_{i=1}^{K} n_i$, where Xij = min(Tij, Cij) is the observed time with Tij and Cij the failure time and censoring time, respectively. Δij ∈ {0, 1, 2} indicates censoring (0) or the cause of failure (1 or 2). Zij is the vector of covariates. Define the following counting and at-risk processes, Nij(t) = I(Tij ≤ t, Δij = 1) and Yij(t) = I{(Tij ≥ t) ∪ (Tij < t, Δij = 2)}. The weight is given by ŵij(t) = rij(t)Ĝ(t)/Ĝ(Xij∧t), where rij(t) = I(Cij ≥ Tij ∧ t) and Ĝ is the Kaplan-Meier estimator of the survival function for the censoring time. Also define
where a⊗2 = aaT for a column vector a.
For the stratified FG model (2.3), the variance of can be estimated by . The explicit expression for Ĵ1,k, ij(t) is
For the difference between the kth and lth treatment, The variance of can be estimated by , where
where
Conflicts of interest
There are no conflicts of interest.
References
- 1. Fine JP, Gray RJ. A proportional hazards model for the subdistribution of a competing risk. J. Amer. Statist. Assoc. 1999;94:496–509.
- 2. Andersen PK, Borgan O, Gill RD, Keiding N. Statistical Models Based on Counting Processes. New York: Springer-Verlag; 1993.
- 3. Cheng SC, Fine JP, Wei LJ. Prediction of cumulative incidence function under the proportional hazards model. Biometrics. 1998;54:219–228.
- 4. Shen Y, Cheng SC. Confidence bands for cumulative incidence curves under the additive risk model. Biometrics. 1999;55:1093–1100. doi: 10.1111/j.0006-341x.1999.01093.x.
- 5. Scheike TH, Zhang MJ. Extensions and applications of the Cox-Aalen survival model. Biometrics. 2003;59:1036–1045. doi: 10.1111/j.0006-341x.2003.00119.x.
- 6. Sun LQ, Liu JX, Sun JG, Zhang MJ. Modelling the subdistribution of a competing risk. Statistica Sinica. 2006;16(4):1367–1385.
- 7. Klein JP, Andersen PK. Regression modeling of competing risks data based on pseudovalues of the cumulative incidence function. Biometrics. 2005;61:223–229. doi: 10.1111/j.0006-341X.2005.031209.x.
- 8. Scheike TH, Zhang MJ, Gerds TA. Predicting cumulative incidence probability by direct binomial regression. Biometrika. 2008;95:205–220.
- 9. Zhang MJ, Fine JP. Summarizing differences in cumulative incidence functions. Statistics in Medicine. 2008;27:4939–4949. doi: 10.1002/sim.3339.
- 10. Klein JP, Gerster M, Andersen PK, Tarima S, Perme MP. SAS and R functions to compute pseudo-values for censored data regression. Computer Methods and Programs in Biomedicine. 2008;89:289–300. doi: 10.1016/j.cmpb.2007.11.017.
- 11. Scheike TH, Zhang MJ. Flexible competing risks regression modeling and goodness-of-fit. Lifetime Data Analysis. 2008;14:464–483. doi: 10.1007/s10985-008-9094-0.
- 12. Chen PY, Tsiatis AA. Causal inference on the difference of the restricted mean lifetime between two groups. Biometrics. 2001;57:1030–1038. doi: 10.1111/j.0006-341x.2001.01030.x.
- 13. Gail MH, Byar DP. Variance calculations for direct adjusted survival curves, with applications to testing for no treatment effect. Biometrical Journal. 1986;28:587–599.
- 14. Zhang X, Loberiza FR, Klein JP, Zhang MJ. A SAS macro for estimation of direct adjusted survival curves based on a stratified Cox regression model. Computer Methods and Programs in Biomedicine. 2007;88:95–101. doi: 10.1016/j.cmpb.2007.07.010.
- 15. Szydlo R, Goldman JM, Klein JP, Gale RP, Ash RC, Bach FH, Bradley BA, Casper JT, Flomenberg N, Gajewski JL, Gluckman E, Henslee-Downey PJ, Hows JM, Jacobsen N, Kolb HJ, Lowenberg B, Masaoka T, Rowlings PA, Sondel PM, van Bekkum DW, van Rood JJ, Vowels MR, Zhang MJ, Horowitz MM. Results of allogeneic bone marrow transplants for leukemia using donors other than HLA-identical siblings. J Clin Oncol. 1997;15(5):1767–1777. doi: 10.1200/JCO.1997.15.5.1767.
- 16. Logan BR, Zhang MJ, Klein JP. Regression models for hazard rates versus cumulative incidence probabilities in hematopoietic cell transplantation data. Biology of Blood and Marrow Transplantation. 2006;12:107–112. doi: 10.1016/j.bbmt.2005.09.005.


