This novel adjustment method converts inappropriate Cox hazard ratios for more accurate interpretation of published immune checkpoint inhibitor trials.
Key Points
Question
How can inappropriate Cox hazard ratios (HRs) in immune checkpoint inhibitor trials be converted to appropriate proportional hazards (PH) cure model treatment-effect estimates (HR for short-term survivors and difference in proportions [DP] for long-term survivors) to provide better guidance for clinical decision-making?
Findings
The proposed Cox-TEL (Cox PH–Taylor expansion adjustment for long-term survival data) method was applied to simulated data, which showed that Cox-TEL–converted values (defined as Cox-TEL HR and Cox-TEL DP) were close to PH cure model estimates. The accuracy of Cox-TEL was further verified in a real-world melanoma data set.
Meaning
Cox HRs may lead to misinterpretation of drug efficacy in immune checkpoint inhibitor trials; the Cox-TEL method can convert inappropriate Cox HRs to appropriate treatment-effect estimates using published trial results as inputs, which may substantially influence clinical decision-making.
Abstract
Importance
In immune checkpoint inhibitor (ICI) trials, long tails and crossovers in survival curves—which violate the proportional hazards (PH) assumption—are commonly observed, making cure or restricted mean survival time models preferable for analysis of ICI survival data. Cox PH analysis, however, still appears in major medical journals, leading to potential misinterpretation of clinical significance.
Objective
To convert inappropriate Cox hazard ratios (HRs) to appropriate PH cure model treatment-effect estimates (HR for short-term survivors and difference in proportions [DP] for long-term survivors) for more accurate interpretation of published ICI trials.
Design and Setting
This study uses the Taylor expansion technique to demonstrate the mathematical relationship between Cox PH and PH cure models for data with long-term survival, and based on this relationship, proposes the Cox-TEL (Cox PH–Taylor expansion adjustment for long-term survival data) adjustment method. The proposed Cox-TEL method requires only 2 inputs: the reported Cox HRs and Kaplan-Meier–estimated survival probabilities.
Results
Comprehensive simulations show the strength of the proposed method in terms of power, bias, and type I error rate; these results, which are close to PH cure model estimates, were further verified in a melanoma data set (N = 285; Cox HR = 0.71; 95% CI, 0.51-0.91; Cox-TEL HR = 0.83; 95% CI, 0.60-1.07; PH cure HR = 0.86; 95% CI, 0.61-1.11; Cox-TEL DP = 0.10; 95% CI, 0.01-0.23; PH cure DP = 0.10; 95% CI, 0.00-0.21). The magnitude of potential difference between reported and adjusted HRs using real-world ICI trial results is demonstrated. For example, in the CheckMate 067 trial (nivolumab/ipilimumab combination therapy vs ipilimumab), the Cox HR was 0.54 (95% CI, 0.44-0.67), and the Cox-TEL HR was 0.90 (95% CI, 0.73-1.11).
Conclusions and Relevance
The findings of this study suggest the need to revisit published ICI survival data analysis to address potential misinterpretation. The Cox-TEL method not only is designed for this purpose, but also is user friendly and easy to implement using published clinical trial data and a freely available R software package.
Introduction
The Kaplan-Meier (KM) estimator and the Cox proportional hazards (PH) model are standard methods for survival analysis in oncology drug development. Prior to the introduction of immune checkpoint inhibitor (ICI) therapy, with achievable long-term survival among patients with advanced-stage cancers, these 2 methods together worked well for clinical trial outcome interpretation. In ICI trials, however, long tails and crossovers in survival curves may violate the PH assumption, making Cox PH less appropriate in the context of comparing ICIs with other therapies.
Several statistical models have been proposed to better analyze data with a long tail in the KM survival curve, for example, restricted mean survival time1,2,3 and cure models.4 Restricted mean survival time is the mean survival time within a restricted window; its difference between study arms provides an alternative to the hazard ratio (HR) when the PH assumption is violated. On the other hand, cure models, including the PH cure model,5,6,7,8,9 consider population survival as a mixture of patients without long-term survival (short-term survivors [STS]) and patients in the long-tail segment of the survival curve (long-term survivors [LTS]). Cure models not only consider survival probabilities among STS (ie, HR), but also evaluate and compare proportions of LTS between arms (ie, difference in proportions [DP]). Therefore, cure models seem ideal for analyzing ICI trial data but have not been widely adopted, resulting in a critical need for an adjustment method able to convert inappropriate Cox HRs to approximate PH cure model treatment-effect estimates (ie, HR and DP) for better interpretation of ICI trial results.
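For reference, the restricted mean survival time can be written as the area under the survival curve up to a truncation time, with the treatment effect taken as the between-arm difference. The symbols τ (truncation time), S_k (survival function in arm k), and Δ are notation introduced here for illustration and are not taken from the article.

```latex
\mathrm{RMST}_k(\tau) = \int_0^{\tau} S_k(t)\,dt, \quad k \in \{0, 1\},
\qquad
\Delta(\tau) = \mathrm{RMST}_1(\tau) - \mathrm{RMST}_0(\tau)
```

Because Δ(τ) compares areas rather than hazards, it remains interpretable when survival curves cross or flatten into long tails.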
Methods
We propose an adjustment method, Cox-TEL (Cox PH–Taylor expansion adjustment for long-term survival data), which converts inappropriate Cox HRs to appropriate treatment-effect estimates based on the mathematical relationship between Cox PH and PH cure models. The only data required to perform the adjustment are Cox HRs with CIs and survival probabilities excerpted from KM curves, which are often made available in the published literature. The Vanderbilt University Medical Center Institutional Review Board determined that review was not required because the study does not qualify as human subject research per 45 CFR §46.102(e)(1). Informed consent was not needed because the study used publicly available, deidentified data from published articles.
The PH cure model assumes that the study population has 2 patient groups, namely, STS and LTS; STS eventually experience an event, while LTS do not. Based on this assumption, STS and LTS are evaluated separately, with HRs calculated for STS and DPs for LTS. If LTS are not observed in a study population, the PH cure model collapses to Cox PH. A long plateau in the KM curve suggests the existence of an LTS group; thus, visual assessment provides a simple but informal method for considering LTS. More formal methodology is described in eDiscussion in the Supplement.
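The two-group assumption can be made concrete with a small numerical sketch. It assumes exponential survival among STS, which the PH cure model itself does not require, and every parameter value below is hypothetical rather than estimated from any trial; the function names are illustrative as well. Under those assumptions, the sketch shows that a constant HR among STS still produces a population-level HR that drifts over time once an LTS group is present.

```python
import math

def pop_survival(t, lts_prop, sts_rate):
    """Population survival: LTS never have the event, STS are exponential."""
    return lts_prop + (1.0 - lts_prop) * math.exp(-sts_rate * t)

def pop_hazard(t, lts_prop, sts_rate):
    """Population hazard: event density divided by population survival."""
    density = (1.0 - lts_prop) * sts_rate * math.exp(-sts_rate * t)
    return density / pop_survival(t, lts_prop, sts_rate)

# Hypothetical parameters, chosen only for illustration.
STS_HR = 0.85  # constant hazard ratio among short-term survivors
control = {"lts_prop": 0.25, "sts_rate": 0.60}
treated = {"lts_prop": 0.35, "sts_rate": 0.60 * STS_HR}

def population_hr(t):
    """Ratio of population hazards (treated vs control) at time t."""
    return pop_hazard(t, **treated) / pop_hazard(t, **control)

for t in (0.5, 2.0, 4.0, 8.0):
    print(
        f"t={t:>4}: S_control={pop_survival(t, **control):.3f}  "
        f"S_treated={pop_survival(t, **treated):.3f}  "
        f"population HR={population_hr(t):.2f}"
    )
```

Setting both `lts_prop` values to 0 makes the population hazard equal to the STS rate at every time point, so the population HR stays at `STS_HR`; this mirrors the statement that the PH cure model collapses to Cox PH when no LTS are present.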
The proposed Cox-TEL method generates an adjustment factor, determined by Taylor polynomials, to convert Cox HR to Cox-TEL HR and DP, with corresponding CIs; these metrics are approximations of PH cure HR and DP, respectively. Figure 1 illustrates the Cox-TEL schema, with computation details in eMethods in the Supplement. The statistical software package described in this article was developed in R, version 3.6.1 (R Foundation).
Figure 1. Schema of the Proposed Cox-TEL Adjustment Method.
The only data required to perform the adjustment are Cox HRs with CIs and survival probabilities excerpted from Kaplan-Meier survival curves. Cox-TEL indicates Cox proportional hazards–Taylor expansion adjustment for long-term survival data.
Results
Data Examples
We used simulated data and real-world data from published ICI trials10,11,12 to evaluate the proposed method. In 4 simulation scenarios (eFigure 1 and eTable 1 with details in the eSimulations in the Supplement), Cox-TEL HRs and DPs approximate those computed with the PH cure model (eFigures 2 through 7 and eTables 2 and 3 in the Supplement). The following are real-world data illustrations.
Adjuvant Interferon Alfa-2b vs Best Supportive Care in a Cohort of Patients With Surgically Resected Melanoma
To examine performance of Cox-TEL in real-world data, we first considered relapse-free survival data13 from 285 patients with surgically resected melanoma randomized to treatment with adjuvant high-dose interferon alfa-2b or best supportive care (Eastern Cooperative Oncology Group trial EST 1684).10 The KM survival curve suggested better relapse-free survival in the treatment arm with a Cox HR of 0.71 (95% CI, 0.51-0.91) within an 8-year follow-up period (Figure 2). However, the long survival curve tails suggested violation of the PH assumption.
Figure 2. Cox Hazard Ratio (HR) and Kaplan-Meier Curves of Relapse-Free Survival in a Melanoma Cohort.
Data are from 285 patients with surgically resected melanoma within an 8-year follow-up period. IFN indicates interferon alfa-2b.
To address this issue, we applied the Cox-TEL adjustment with the following survival probabilities excerpted from KM survival curves: at t = 2, 4, 6, 8 (years), control arm, 0.36, 0.28, 0.26, 0.25; treatment arm, 0.48, 0.39, 0.35, 0.35; with height of plateau, 0.25 and 0.35, respectively. Details for time point (t′js) selection are included in eDiscussion in the Supplement.
The Cox-TEL HR for STS was 0.83 (95% CI, 0.60-1.07), and DP for LTS was 0.10 (95% CI, 0.01-0.23). The 95% CI of the Cox-TEL HR crossed 1; thus, the STS survival difference between arms would not be considered statistically significant. On the other hand, interferon alfa-2b treatment showed a higher proportion of LTS (35%) compared with control (25%). To assess performance of Cox-TEL on these real-world data, we applied the PH cure model (smcure13), which computed an STS HR of 0.86 (95% CI, 0.61-1.11) and an LTS DP of 0.10 (95% CI, 0.00-0.21). These estimates are very close to the Cox-TEL results, suggesting the reliability of the proposed method.
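The arithmetic behind this interpretation can be sketched in a few lines. The snippet takes the DP point estimate as the difference between the reported plateau heights, which is an assumption consistent with the reported numbers rather than the documented Cox-TEL computation (see eMethods in the Supplement), and it checks whether each reported 95% CI contains its null value. The helper names are hypothetical; the CIs are copied from the text, not recomputed.

```python
def difference_in_proportions(lts_treated, lts_control):
    """DP point estimate as treated minus control LTS proportion (assumed reading)."""
    return round(lts_treated - lts_control, 2)

def ci_excludes(null_value, lower, upper):
    """True when the interval [lower, upper] does not contain the null value."""
    return not (lower <= null_value <= upper)

# Plateau heights reported for the melanoma cohort.
dp = difference_in_proportions(lts_treated=0.35, lts_control=0.25)

# Reported Cox-TEL intervals: null value is 1 for an HR and 0 for a DP.
hr_significant = ci_excludes(1.0, lower=0.60, upper=1.07)
dp_significant = ci_excludes(0.0, lower=0.01, upper=0.23)

print(f"DP point estimate: {dp:.2f}")
print(f"Cox-TEL HR interval excludes 1: {hr_significant}")
print(f"Cox-TEL DP interval excludes 0: {dp_significant}")
```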
ICIs in Patients With Advanced Melanoma or Non–Small Cell Lung Cancers
For real-world performance as well as illustration of potential misinterpretation with Cox HR in ICI studies, we next considered CheckMate 017/05711 and CheckMate 06712 findings before and after Cox-TEL adjustment (Table). In CheckMate 017/057,11 the Cox HR for overall survival was 0.70 (95% CI, 0.61-0.81), indicating superiority of nivolumab over docetaxel. In the KM curves, however, an early crossover was seen, suggesting that some patients did not benefit from nivolumab treatment. Consistent with this visual assessment, we computed a Cox-TEL HR of 0.89 (95% CI, 0.77-1.03) and DP of 0.10 (95% CI, 0.05-0.15) (Table), findings that indicated no statistically significant survival advantage in STS and a small (though statistically significant) effect size in LTS.
Table. Changes in Treatment-Effect Estimates Before and After Cox-TEL Adjustment.
| ICI trial | Treatments (n1, n0) [a] | Cox HR (95% CI) | Cox-TEL HR (95% CI) | Cox-TEL DP (95% CI) | Proportion of LTS, arm 1 vs arm 0 | t′js, mo [b] |
|---|---|---|---|---|---|---|
| CheckMate 017/05711 3-y updated | Nivolumab (427) vs docetaxel (427) | 0.70 (0.61-0.81) | 0.89 (0.77-1.03) | 0.10 (0.05-0.15) | 0.14 vs 0.04 | 6, 12, …, 48 |
| CheckMate 06712 | Nivolumab + ipilimumab (314) vs ipilimumab (315) | 0.54 (0.44-0.67) | 0.90 (0.73-1.11) | 0.25 (0.15-0.35) | 0.52 vs 0.27 | 6, 12, …, 54 |
| CheckMate 06712 | Nivolumab (316) vs ipilimumab (315) | 0.65 (0.53-0.79) | 0.94 (0.77-1.15) | 0.19 (0.09-0.29) | 0.46 vs 0.27 | 6, 12, …, 54 |
Abbreviations: Cox-TEL, Cox proportional hazards–Taylor expansion adjustment for long-term survival data; DP, difference in proportions; HR, hazard ratio; ICI, immune checkpoint inhibitor; LTS, long-term survivors.
[a] Denotes the sample sizes in both study arms.
[b] Denotes the chosen time points in the extraction of survival probabilities.
The CheckMate 067 trial12 compared nivolumab/ipilimumab combination therapy vs ipilimumab, or nivolumab alone vs ipilimumab, in advanced melanoma. Cox HRs were 0.54 (95% CI, 0.44-0.67) and 0.65 (95% CI, 0.53-0.79), respectively. These results suggested that either combination therapy or nivolumab alone is superior to ipilimumab alone. After adjustment, however, neither nivolumab (Cox-TEL HR = 0.94; 95% CI, 0.77-1.15) nor nivolumab/ipilimumab combination (Cox-TEL HR = 0.90; 95% CI, 0.73-1.11) showed superiority to ipilimumab monotherapy for STS. For LTS, Cox-TEL DP was larger in the combination setting (0.25; 95% CI, 0.15-0.35) than with monotherapy (0.19; 95% CI, 0.09-0.29).
In both CheckMate data sets,11,12 the larger the magnitude of Cox-TEL DP, the smaller the Cox HR. This inverse association may reflect contribution of this specific LTS population to Cox HR, which in turn leads to result misinterpretation.
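The inverse association can be checked directly against the three comparisons in the Table. The sketch below copies the reported point estimates into a list (the field names are hypothetical) and verifies that ordering the comparisons by increasing Cox-TEL DP puts the Cox HRs in decreasing order, while the Cox-TEL HRs all stay near 1. With only three comparisons this is a descriptive check, not evidence of a general relationship.

```python
# Point estimates transcribed from the Table (field names are illustrative).
comparisons = [
    {"label": "CheckMate 017/057: nivolumab vs docetaxel",
     "cox_hr": 0.70, "coxtel_hr": 0.89, "coxtel_dp": 0.10},
    {"label": "CheckMate 067: nivolumab + ipilimumab vs ipilimumab",
     "cox_hr": 0.54, "coxtel_hr": 0.90, "coxtel_dp": 0.25},
    {"label": "CheckMate 067: nivolumab vs ipilimumab",
     "cox_hr": 0.65, "coxtel_hr": 0.94, "coxtel_dp": 0.19},
]

# Order by increasing DP and check that the Cox HR strictly decreases.
by_dp = sorted(comparisons, key=lambda row: row["coxtel_dp"])
cox_hrs = [row["cox_hr"] for row in by_dp]
inverse_association = all(a > b for a, b in zip(cox_hrs, cox_hrs[1:]))

for row in by_dp:
    gap = row["coxtel_hr"] - row["cox_hr"]
    print(f'{row["label"]}: DP={row["coxtel_dp"]:.2f}, '
          f'Cox HR={row["cox_hr"]:.2f}, Cox-TEL HR={row["coxtel_hr"]:.2f}, '
          f'gap={gap:.2f}')

print(f"Cox HR decreases as DP increases: {inverse_association}")
```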
Discussion
This study proposes a useful adjustment method, Cox-TEL, which converts inappropriate Cox HRs to Cox-TEL HRs and Cox-TEL DPs—treatment-effect estimates that approximate the PH cure model and are better suited to interpret ICI trial results. When LTS proportion exceeds 0.5, Cox-TEL adjustment may produce biased approximations; moreover, when working with raw data, PH cure models may be considered before Cox-TEL. Without raw data, however, we provide a simple way to check the PH assumption for STS survival (see eDiscussion in the Supplement). Although trials powered for Cox PH can be underpowered for the PH cure model,14 and therefore for Cox-TEL adjustment, the proposed adjustment method nevertheless offers insight into appropriate model selection and data interpretation for clinicians, filling an important gap for oncologists before cure models are adopted in ICI study design and data analysis.
Limitations
Performance of Cox-TEL depends on the proportion of LTS. Cox-TEL HR increasingly underestimates the true HR as this proportion increases, and this bias in turn affects the CI estimates for HR and DP (eTable 4 in the Supplement). Large bias results when (1) Cox HR is far from the true HR, particularly when this is due to a large proportion of LTS, or (2) use of low-order Taylor polynomials creates a poor approximation. Using higher-order Taylor polynomials can reduce bias of Cox-TEL HR relative to true HR, and the algorithm provides automatic selection of polynomial order to minimize bias. See eDiscussion in the Supplement for details.
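The general point about polynomial order can be illustrated with a generic example. The sketch below is not the Cox-TEL adjustment factor, whose form is given in eMethods in the Supplement; it simply approximates the exponential function with Taylor polynomials of increasing order around 0 and shows the truncation error shrinking. The function name and the evaluation point are arbitrary choices made for illustration.

```python
import math

def taylor_exp(x, order):
    """Taylor polynomial of exp(x) around 0, truncated at the given order."""
    return sum(x**k / math.factorial(k) for k in range(order + 1))

x = 0.5  # arbitrary evaluation point for illustration
errors = {
    order: abs(math.exp(x) - taylor_exp(x, order))
    for order in (1, 2, 3, 4)
}

for order, err in errors.items():
    print(f"order {order}: absolute error = {err:.6f}")
```

The same trade-off applies to any Taylor-based approximation: a low-order polynomial is adequate close to the expansion point and degrades as the argument moves away from it.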
Conclusions
As presented in this study, when long-tail segments were observed in KM survival curves (see eTable 5 in the Supplement for judging long tails), the data structure violated the PH assumption; thus, a cure model may be a better method for data analysis. Cox-TEL converts Cox HRs to approximated PH cure model treatment-effect estimates using a ready-to-use R package and 2 required inputs: survival probabilities excerpted from KM curves and Cox HRs. Information obtained from Cox-TEL adjustment provides oncologists an opportunity to rethink ICI trial results, which may have clinical influence before cure models are widely used in ICI trial analyses.
eMethods. Details for the proposed Cox-TEL adjustment method.
eSimulations. Detailed simulation settings and results.
eDiscussion. Selection of t′js for survival probabilities for Cox-TEL and detailed discussion for limitations of Cox-TEL.
eFigure 1. Survival curves of the two treatment arms in the four scenarios.
eFigure 2. Box plots for estimated HRs with sample sizes n0 and n1 in the four scenarios.
eFigure 3. Box plots for estimated HRs with sample sizes 2n0 and 2n1 in the four scenarios.
eFigure 4. Box plots for estimated DPs with sample sizes n0 and n1 in the four scenarios.
eFigure 5. Box plots for estimated DPs with sample sizes 2n0 and 2n1 in the four scenarios.
eFigure 6. Probability of rejecting HR = 1 in the four scenarios.
eFigure 7. Probability of rejecting DP = 0 in the four scenarios.
eTable 1. Treatment effects in the four scenarios for short-term survivors and long-term survivors in both arms.
eTable 2. Hazard ratios between short-term survivors and differences in proportions of long-term survivors with sample sizes n0 and n1.
eTable 3. Hazard ratios between short-term survivors and differences in proportions of long-term survivors with sample sizes 2n0 and 2n1.
eTable 4. Performance of the proposed method when π1 = 0.6, 0.5, 0.4, 0.3 (the proportion of long-term survivors in arm 1 increases) and π0 = 0.9 with all other parameters as in scenario 1 and sample sizes n0 and n1.
eTable 5. Ratios of the distance between the largest observation time and the largest uncensored observation time to the largest uncensored observation time across different median survival settings.
References
- 1. Royston P, Parmar MK. Restricted mean survival time: an alternative to the hazard ratio for the design and analysis of randomized trials with a time-to-event outcome. BMC Med Res Methodol. 2013;13:152. doi: 10.1186/1471-2288-13-152
- 2. Uno H, Claggett B, Tian L, et al. Moving beyond the hazard ratio in quantifying the between-group difference in survival analysis. J Clin Oncol. 2014;32(22):2380-2385. doi: 10.1200/JCO.2014.55.2208
- 3. Kim DH, Uno H, Wei LJ. Restricted mean survival time as a measure to interpret clinical trial results. JAMA Cardiol. 2017;2(11):1179-1180. doi: 10.1001/jamacardio.2017.2922
- 4. Farewell VT. The use of mixture models for the analysis of survival data with long-term survivors. Biometrics. 1982;38(4):1041-1046. doi: 10.2307/2529885
- 5. Kuk AYC, Chen C-H. A mixture model combining logistic regression with proportional hazards regression. Biometrika. 1992;79(3):531-541. doi: 10.1093/biomet/79.3.531
- 6. Peng Y, Dear KBG. A nonparametric mixture model for cure rate estimation. Biometrics. 2000;56(1):237-243. doi: 10.1111/j.0006-341X.2000.00237.x
- 7. Sy JP, Taylor JM. Estimation in a Cox proportional hazards cure model. Biometrics. 2000;56(1):227-236. doi: 10.1111/j.0006-341X.2000.00227.x
- 8. Lu W. Maximum likelihood estimation in the proportional hazards cure model. Ann Inst Stat Math. 2008;60(3):545-574. doi: 10.1007/s10463-007-0120-x
- 9. Corbière F, Commenges D, Taylor JM, Joly P. A penalized likelihood approach for mixture cure models. Stat Med. 2009;28(3):510-524. doi: 10.1002/sim.3481
- 10. Kirkwood JM, Strawderman MH, Ernstoff MS, Smith TJ, Borden EC, Blum RH. Interferon alfa-2b adjuvant therapy of high-risk resected cutaneous melanoma: the Eastern Cooperative Oncology Group Trial EST 1684. J Clin Oncol. 1996;14(1):7-17. doi: 10.1200/JCO.1996.14.1.7
- 11. Vokes EE, Ready N, Felip E, et al. Nivolumab versus docetaxel in previously treated advanced non-small-cell lung cancer (CheckMate 017 and CheckMate 057): 3-year update and outcomes in patients with liver metastases. Ann Oncol. 2018;29(4):959-965. doi: 10.1093/annonc/mdy041
- 12. Hodi FS, Chiarion-Sileni V, Gonzalez R, et al. Nivolumab plus ipilimumab or nivolumab alone versus ipilimumab alone in advanced melanoma (CheckMate 067): 4-year outcomes of a multicentre, randomised, phase 3 trial. Lancet Oncol. 2018;19(11):1480-1492. doi: 10.1016/S1470-2045(18)30700-9
- 13. Cai C, Zou Y, Peng Y, Zhang J. smcure: an R-package for estimating semiparametric mixture cure models. Comput Methods Programs Biomed. 2012;108(3):1255-1260. doi: 10.1016/j.cmpb.2012.08.013
- 14. Wang S, Zhang J, Lu W. Sample size calculation for the proportional hazards cure model. Stat Med. 2012;31(29):3959-3971. doi: 10.1002/sim.5465


