JNCI: Journal of the National Cancer Institute
J Natl Cancer Inst. 2015 May 8;107(8):djv133. doi: 10.1093/jnci/djv133

Detecting Overall Survival Benefit Derived From Survival Postprogression Rather Than Progression-Free Survival

Satoshi Morita, Kentaro Sakamaki, Guosheng Yin
PMCID: PMC4609552  PMID: 25956357

Abstract

Broglio and Berry (2009) examined the impact of survival postprogression (SPP) on overall survival (OS) when progression-free survival (PFS) was used to assess treatment effect in metastatic cancer. Their simulation studies found no statistically significant difference in OS because of a dilution effect from SPP, even though PFS differed statistically significantly between treatment arms. Recently, two phase III clinical trials showed efficacy of experimental treatments in OS but not in PFS. These results seem counterintuitive, because it seems reasonable to expect that a treatment effect that prolongs PFS would also prolong OS. We conducted simulations to examine the role of SPP in OS under the assumption that only SPP, and not PFS, differed between treatment arms. We also explored the impact of patient heterogeneity on the OS analysis. Our study offers a reasonable explanation for the two phase III trials and calls for further discussion of whether PFS is an adequate endpoint for evaluating current treatment regimens and of what role SPP might play in OS.


Broglio and Berry (1) examined the impact of survival postprogression (SPP) on detecting the treatment effect on overall survival (OS) in metastatic cancer. By partitioning OS into progression-free survival (PFS) and SPP (ie, OS = PFS + SPP), they carried out statistical simulation studies under the assumption that PFS differed between treatment arms while SPP was the same. The simulation studies found that there was no statistically significant difference in OS because of the dilution effect from SPP, although there was a statistically significant difference in PFS between treatment arms. The longer the SPP, the more dilution was induced, because SPP represents a larger proportion of OS. As a result, SPP may provide an explanation for the conflicting results between PFS and OS, which also casts doubt on the appropriateness of using PFS as a surrogate endpoint for OS (2–4).
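As a rough illustration of the dilution effect, the sketch below compares mean OS between arms when only PFS differs. It assumes exponential PFS and SPP (the distributional assumption used in the simulations described later), so each mean equals the median divided by ln 2 and mean OS is the sum of the two means. The function name and the PFS medians of 8 and 6 months are hypothetical, and the quantity shown is a ratio of mean OS, not the hazard ratio reported in Table 1.

```python
def mean_os_ratio(mpfs_e, mpfs_c, mspp):
    """Ratio of mean OS in arm E to arm C when only PFS differs between arms.

    Assumes exponential PFS and SPP with the given medians (months). An
    exponential mean is median / ln 2, so the ln 2 factors cancel and the
    ratio of mean OS (OS = PFS + SPP) reduces to a ratio of summed medians.
    """
    return (mpfs_e + mspp) / (mpfs_c + mspp)


# Hypothetical median PFS of 8 vs 6 months with a common median SPP:
# the relative OS gain shrinks as SPP makes up more of OS.
for mspp in (0, 12, 24):
    print(f"median SPP {mspp:>2} mo: mean OS ratio {mean_os_ratio(8, 6, mspp):.3f}")
```

The printed ratios fall from 1.333 with no SPP to 1.111 and 1.067 as median SPP grows to 12 and 24 months, which is the dilution described above.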

Recently, an interesting pattern in the relationship between PFS and OS was observed in two phase III clinical trials, one in metastatic breast cancer (mBC) (5) and one in KRAS wild-type metastatic colorectal cancer (6). In contrast to the situation studied by Broglio and Berry (1), these trials showed statistically significant efficacy of the experimental treatment in OS but not in PFS. These results appear counterintuitive, because it seems reasonable to expect that a treatment effect that prolongs PFS would be reflected in prolonged OS. We therefore sought to characterize situations, like those in the two phase III trials, in which there is no difference in PFS but a statistically significant difference in OS between treatment arms, and to determine what role SPP plays in such cases. Furthermore, we investigated what would happen if the difference in SPP between treatment arms depended on patient subgroup. This question was motivated by an exploratory subgroup analysis suggesting a much larger between-arm difference in OS in patients with triple-negative mBC than in patients whose cancers were not triple negative (5,7). To answer these questions, we conducted statistical simulations following the framework of Broglio and Berry (1).

We generated hypothetical clinical trials with specific PFS and SPP characteristics under several clinical scenarios designed to mimic the two phase III trials mentioned above. We assumed that SPP differed between treatment arms in several ways, as summarized in Supplementary Table 1 (available online), while PFS did not differ between arms. We first assumed that SPP was, on average, longer in the experimental arm (arm E) than in the control arm (arm C). Next, we considered a situation in which SPP was longer in arm E than in arm C in one patient subgroup (subgroup 1), with no difference in SPP between arms in the complementary subgroup (subgroup 2). We used four median PFS (mPFS) values (3, 6, 9, and 12 months) to examine the impact of PFS duration on the OS analysis. We considered a two-arm comparison (arm E vs arm C) with equal allocation and three total sample sizes: 600, 800, and 1000 patients. Patients were accrued at a rate of 30 per month, so the accrual periods were 20.0, 26.7, and 33.3 months for 600, 800, and 1000 patients, respectively. We assumed an additional nine-month follow-up period after enrollment was completed.
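The accrual periods follow directly from the sample size and the accrual rate. The short sketch below reproduces them and also adds the nine-month follow-up to give the total study duration; the function name is hypothetical, and the total durations are simple arithmetic rather than figures stated in the text.

```python
def trial_timeline(n_total, accrual_rate=30.0, follow_up=9.0):
    """Return (accrual period, total study duration) in months.

    Assumes constant accrual at `accrual_rate` patients per month and a
    fixed additional follow-up period after enrollment ends.
    """
    accrual = n_total / accrual_rate
    return accrual, accrual + follow_up


for n in (600, 800, 1000):
    accrual, total = trial_timeline(n)
    print(f"n = {n}: accrual {accrual:.1f} mo, study ends at {total:.1f} mo")
```

Under these assumptions the last patient enrolled is followed for nine months, whereas the first is followed for the full study duration (29.0, 35.7, or 42.3 months).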

Table 1.

Summary of hazard ratios, P values, and power values for the overall survival time from 50 000 simulated trials*

| ϕ | Median PFS, mo | n = 600: median HR† (95% CI) | n = 600: P‡ | n = 600: power§ | n = 800: median HR† (95% CI) | n = 800: P‡ | n = 800: power§ | n = 1000: median HR† (95% CI) | n = 1000: P‡ | n = 1000: power§ |
|---|---|---|---|---|---|---|---|---|---|---|
| 100% | 3 | 0.69 (0.57 to 0.84) | <.001 | 0.965 | 0.69 (0.56 to 0.81) | <.001 | 0.995 | 0.69 (0.60 to 0.79) | <.001 | 1.000 |
| 100% | 6 | 0.72 (0.58 to 0.90) | .003 | 0.844 | 0.73 (0.61 to 0.86) | <.001 | 0.951 | 0.73 (0.62 to 0.84) | <.001 | 0.987 |
| 100% | 9 | 0.74 (0.59 to 0.94) | .015 | 0.691 | 0.75 (0.62 to 0.91) | .003 | 0.842 | 0.76 (0.64 to 0.89) | .001 | 0.930 |
| 100% | 12 | 0.76 (0.58 to 0.98) | .029 | 0.566 | 0.77 (0.62 to 0.94) | .010 | 0.722 | 0.77 (0.65 to 0.92) | .003 | 0.835 |
| 80% | 3 | 0.74 (0.61 to 0.90) | .002 | 0.859 | 0.74 (0.63 to 0.87) | <.001 | 0.955 | 0.74 (0.64 to 0.85) | <.001 | 0.989 |
| 80% | 6 | 0.77 (0.62 to 0.96) | .018 | 0.663 | 0.78 (0.65 to 0.92) | .004 | 0.824 | 0.78 (0.67 to 0.90) | .001 | 0.921 |
| 80% | 9 | 0.79 (0.63 to 1.00) | .049 | 0.500 | 0.80 (0.66 to 0.96) | .022 | 0.662 | 0.80 (0.68 to 0.94) | .005 | 0.787 |
| 80% | 12 | 0.80 (0.62 to 1.03) | .093 | 0.396 | 0.81 (0.66 to 0.99) | .041 | 0.534 | 0.82 (0.69 to 0.97) | .019 | 0.654 |
| 50% | 3 | 0.83 (0.69 to 1.00) | .054 | 0.487 | 0.83 (0.71 to 0.97) | .018 | 0.648 | 0.83 (0.72 to 0.95) | .006 | 0.772 |
| 50% | 6 | 0.85 (0.69 to 1.05) | .143 | 0.322 | 0.85 (0.72 to 1.01) | .070 | 0.448 | 0.85 (0.73 to 0.99) | .031 | 0.568 |
| 50% | 9 | 0.87 (0.69 to 1.09) | .220 | 0.234 | 0.87 (0.72 to 1.05) | .134 | 0.320 | 0.87 (0.74 to 1.02) | .080 | 0.410 |
| 50% | 12 | 0.87 (0.68 to 1.12) | .276 | 0.185 | 0.88 (0.72 to 1.07) | .198 | 0.244 | 0.88 (0.74 to 1.04) | .141 | 0.313 |
| 20% | 3 | 0.93 (0.77 to 1.12) | .427 | 0.124 | 0.93 (0.79 to 1.08) | .334 | 0.162 | 0.92 (0.81 to 1.06) | .262 | 0.205 |
| 20% | 6 | 0.94 (0.76 to 1.16) | .545 | 0.095 | 0.94 (0.79 to 1.11) | .458 | 0.115 | 0.94 (0.81 to 1.09) | .377 | 0.143 |
| 20% | 9 | 0.94 (0.75 to 1.19) | .620 | 0.079 | 0.95 (0.79 to 1.14) | .543 | 0.093 | 0.95 (0.81 to 1.11) | .481 | 0.109 |
| 20% | 12 | 0.95 (0.74 to 1.22) | .672 | 0.072 | 0.95 (0.78 to 1.16) | .598 | 0.079 | 0.95 (0.80 to 1.12) | .557 | 0.092 |

* The proportion of subgroup 1 is denoted by ϕ, with ϕ = 100% corresponding to the case with no subgroups. Four median progression-free survival (PFS) values (3, 6, 9, and 12 months) are used to examine the impact of PFS time on the overall survival analysis. The hazard ratio (HR) compares experimental and control arms for three sample sizes, n = 600, 800, and 1000, with an equal number of patients allocated to the two arms. An HR value smaller than 1.0 indicates a beneficial effect of the experimental treatment. CI = confidence interval; HR = hazard ratio.

† The median HR is the 50th percentile, and the 95% interval spans the 2.5th to 97.5th percentiles, of the HRs from the 50 000 simulated trials.

‡ The P value is from the log-rank test for the simulated trial whose HR matched the median HR.

§ The statistical power (the probability of a statistically significant result) for the overall survival analysis is computed as the number of simulated trials with P < .05 divided by the number of simulation repetitions (50 000).
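The three table footnotes describe how each cell is summarized from the simulated trials. A minimal sketch of that summary is shown below, assuming each trial contributes one (HR, P value) pair. The percentile rule (linear interpolation between order statistics, as in R's default quantile type 7) and the function names are assumptions; the authors' exact percentile definition is not stated.

```python
import math


def percentile(sorted_values, q):
    """Linear-interpolation percentile (0 <= q <= 100) of pre-sorted data."""
    pos = (len(sorted_values) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (pos - lo)


def summarize(results, alpha=0.05):
    """Summarize simulated trials given as (hazard_ratio, p_value) pairs."""
    hrs = sorted(hr for hr, _ in results)
    median_hr = percentile(hrs, 50)                            # footnote †
    interval = (percentile(hrs, 2.5), percentile(hrs, 97.5))   # footnote †
    # P value of the trial whose HR is closest to the median HR (footnote ‡)
    _, p_at_median = min(results, key=lambda r: abs(r[0] - median_hr))
    # Power: share of trials with P < alpha (footnote §)
    power = sum(p < alpha for _, p in results) / len(results)
    return median_hr, interval, p_at_median, power
```

With 50 000 trials, the trial "matched with the median HR" may not hit the median exactly; taking the closest HR is one reasonable reading of footnote ‡.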

Assuming that PFS and SPP were exponentially distributed, we generated each patient’s PFS and SPP times and summed them to obtain that patient’s OS time. PFS was measured from the start of treatment to disease progression, and SPP from disease progression to death; OS was therefore measured from the start of treatment to death. For simplicity, patients were censored only at the end of the nine-month follow-up period. We simulated 50 000 trials under each scenario. For each simulated trial, we estimated OS curves for arms E and C using the Kaplan-Meier method and compared them using the hazard ratio (HR) and the log-rank test. We computed the statistical power of the OS analysis as the number of statistically significant results (P < .05) divided by the number of simulation repetitions. All programs were written in R (version 3.1.0 for Windows) (8).
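A minimal Python sketch of one simulation replicate and the resulting power estimate is shown below; the authors used R, and this is not their code. Beyond what the text states (exponential PFS and SPP, OS = PFS + SPP, accrual of 30 patients per month, nine-month follow-up, censoring only at study end, log-rank test at the .05 level), it assumes uniformly distributed entry times, alternating 1:1 allocation, median SPP of 9 and 6 months in arms E and C for subgroup 1 (from the Figure 1 caption), a median SPP of 6 months in both arms for subgroup 2 (Supplementary Table 1 is not reproduced here), and the Peto one-step HR estimate in place of whatever HR estimator the authors used. All function names are hypothetical.

```python
import math
import random


def rate(median):
    """Exponential hazard rate (per month) for a given median (months)."""
    return math.log(2) / median


def simulate_trial(n, mpfs, phi, rng, mspp_e=9.0, mspp_c=6.0,
                   accrual_rate=30.0, follow_up=9.0):
    """Simulate one trial; returns a list of (time, event, arm), arm 1 = E, 0 = C."""
    accrual = n / accrual_rate
    study_end = accrual + follow_up
    trial = []
    for i in range(n):
        arm = i % 2                                  # assumed 1:1 alternating allocation
        entry = rng.uniform(0.0, accrual)            # assumed uniform entry times
        in_subgroup1 = rng.random() < phi            # SPP benefit only in subgroup 1
        pfs = rng.expovariate(rate(mpfs))            # same PFS law in both arms
        mspp = mspp_e if (arm == 1 and in_subgroup1) else mspp_c
        os_time = pfs + rng.expovariate(rate(mspp))  # OS = PFS + SPP
        max_follow = study_end - entry               # censored only at study end
        died = os_time <= max_follow
        trial.append((min(os_time, max_follow), int(died), arm))
    return trial


def logrank(trial):
    """Two-sided log-rank P value and Peto one-step HR (arm E vs arm C)."""
    trial = sorted(trial)
    at_risk = [sum(1 for rec in trial if rec[2] == arm) for arm in (0, 1)]
    o_minus_e = 0.0
    var = 0.0
    i = 0
    while i < len(trial):
        t = trial[i][0]
        deaths, leaving = [0, 0], [0, 0]
        while i < len(trial) and trial[i][0] == t:   # pool tied times
            _, event, arm = trial[i]
            deaths[arm] += event
            leaving[arm] += 1
            i += 1
        d, n = deaths[0] + deaths[1], at_risk[0] + at_risk[1]
        if d > 0 and n > 1:
            o_minus_e += deaths[1] - d * at_risk[1] / n   # observed minus expected in E
            var += d * at_risk[0] * at_risk[1] * (n - d) / (n * n * (n - 1))
        at_risk[0] -= leaving[0]
        at_risk[1] -= leaving[1]
    chi2 = o_minus_e ** 2 / var
    p_value = math.erfc(math.sqrt(chi2 / 2.0))   # upper tail of chi-square, 1 df
    return p_value, math.exp(o_minus_e / var)


def power(n, mpfs, phi, reps=200, seed=2015):
    """Share of simulated trials with log-rank P < .05 (far fewer reps than 50 000)."""
    rng = random.Random(seed)
    hits = sum(logrank(simulate_trial(n, mpfs, phi, rng))[0] < 0.05
               for _ in range(reps))
    return hits / reps
```

With only 200 replicates, `power(600, 3, 1.0)` should land roughly near the 0.965 reported in Table 1 for ϕ = 100% and mPFS = 3 months, and `power(600, 3, 0.2)` near the much lower 20% row, but the Monte Carlo error is several percentage points.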

Figure 1 presents typical examples of OS curves obtained with a sample size of 600. The first row represents the case in which all patients belong to subgroup 1. A statistically significant difference (P < .05) between the two survival curves was seen in all four panels. As mPFS became shorter, the OS benefit of the experimental treatment became more statistically significant; that is, the difference in SPP between the two arms had a greater influence on the OS curves. As shown in the next three rows of Figure 1, the OS curves grew closer as the proportion of patients in subgroup 1, ϕ, became smaller. For ϕ = 20%, the survival benefit of arm E disappeared completely, even when mPFS was three months, that is, even when SPP accounted for more than 67% of OS.
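The 67% figure can be checked with a one-line calculation. Under the exponential assumption, mean SPP and mean OS are both proportional to their medians, so the share of mean OS contributed by SPP is mSPP / (mPFS + mSPP). The sketch below uses the median SPP values of 6 and 9 months from the Figure 1 caption; the function name is hypothetical, and the share refers to mean OS.

```python
def spp_share(mpfs, mspp):
    """Fraction of mean OS contributed by SPP, assuming exponential PFS and SPP.

    Exponential means are median / ln 2, so the ln 2 factors cancel.
    """
    return mspp / (mpfs + mspp)


for mpfs in (3, 6, 9, 12):
    print(f"mPFS {mpfs:>2} mo: SPP share {spp_share(mpfs, 6):.2f} (arm C), "
          f"{spp_share(mpfs, 9):.2f} (arm E, subgroup 1)")
```

For mPFS = 3 months the shares are 0.67 and 0.75, consistent with SPP accounting for more than 67% of OS in that column of Figure 1.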

Figure 1.


Typical examples of overall survival (OS) curves estimated using the Kaplan-Meier method from the simulation studies under the total sample size n = 600. Each panel of plots is from a single simulated trial for which the hazard ratio (HR) and P value are estimated. Dotted and solid lines represent the curves of the experimental and control arms, respectively. The four columns (starting from the left) show the OS curves for median progression-free survival (PFS) times of 3, 6, 9, and 12 months, respectively. The first to fourth rows are those obtained with the following proportions of subgroup 1: ϕ = 100%, 80%, 50%, and 20%, respectively. Note that ϕ = 100% corresponds to the case where all patients belong to subgroup 1; this subgroup is defined by having a longer survival postprogression (SPP) in arm E than in arm C (median SPP = 9 and 6 months, respectively).
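The curves in Figure 1 are Kaplan-Meier (product-limit) estimates. A minimal, standard-library-only sketch of the estimator follows; it is an illustration rather than the authors' R code, and it uses the usual convention that patients censored at a death time are still counted as at risk at that time. The function name is hypothetical.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate as a list of (event time, survival just after it).

    `events` holds 1 for death and 0 for censoring.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        while i < len(data) and data[i][0] == t:   # pool ties at time t
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / n_at_risk       # product-limit step
            curve.append((t, surv))
        n_at_risk -= removed                       # deaths and censorings leave the risk set
    return curve
```

For example, `kaplan_meier([1, 2, 2, 3], [1, 1, 0, 1])` steps down to 0.75, 0.5, and 0 at times 1, 2, and 3.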

Table 1 summarizes the results from the 50 000 simulated trials, grouped by sample size. As in Figure 1, the subgroup proportion and the relative lengths of PFS and SPP had a substantial impact on the OS analysis. To obtain 80% statistical power for detecting a between-treatment OS difference of three months, 600 patients might be sufficient when mPFS is three or six months, provided that all patients are from subgroup 1. If half or fewer of the patients were from subgroup 1, more than 1000 patients were needed to show an improvement in OS with 80% power.
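Table 1 is a simulation-based answer to the sample-size question. For comparison only, the widely used Schoenfeld approximation for a 1:1 log-rank test gives the number of deaths (not patients) needed to detect a given HR. It assumes proportional hazards, which holds only approximately when OS = PFS + SPP, and the HR inputs used below are simply median HRs read from Table 1; this is not the authors' method, and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def events_needed(hr, alpha=0.05, power=0.80):
    """Schoenfeld approximation: deaths needed for a two-sided 1:1 log-rank test."""
    z = NormalDist().inv_cdf
    return math.ceil(4 * (z(1 - alpha / 2) + z(power)) ** 2 / math.log(hr) ** 2)


for hr in (0.69, 0.77, 0.83, 0.87):
    print(f"HR {hr:.2f}: about {events_needed(hr)} deaths")
```

An HR of 0.69 needs roughly 230 deaths, whereas an HR of 0.87 needs more than 1600 deaths, which is consistent with the simulation finding that trials of 1000 patients were underpowered when half or fewer of the patients were in subgroup 1.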

A natural follow-up question is, “Is the assumption that only SPP, and not PFS, differs between treatment arms realistic, and could this sort of phenomenon occur in real clinical trials?” One possible answer involves the influence of subsequent treatments on SPP. For instance, if an experimental treatment was less toxic, so that the health of patients in arm E deteriorated less than that of patients in arm C, patients in arm E could be more likely to receive more intensive subsequent treatment (eg, chemotherapy) and to comply with it better. In addition, an experimental treatment might induce antitumor immunity that has a markedly positive effect on SPP. If patients in arm E were left with less compromised health and better immune function, SPP, and thus OS, might be longer in arm E even if PFS did not differ between arms E and C. To explain these phenomena, it may be important to establish well-validated systems or markers that measure aspects of patients’ functional status and adequately predict their SPP. Although we performed extensive simulations, our study has several limitations. The simulations used a specific statistical model to generate PFS and SPP in a specific setting with respect to the lengths of PFS and SPP, sample size, and patient enrollment and follow-up periods. Therefore, it may be difficult to apply these findings to more general situations.

Many previous studies have evaluated the surrogacy of PFS for OS in metastatic cancers and have shown that the relationship between PFS and OS may differ across cancer types. For example, PFS was highly correlated with OS in colorectal, ovarian, and renal cell cancers (9–12), but not in breast or non–small cell lung cancers (13,14). In addition, it has been widely argued that postprogression therapy may be one reason why improvements in PFS fail to translate into improved OS in some disease and treatment settings (1–4). The two phase III trials (5,6) cast further doubt on whether PFS is an adequate endpoint for evaluating current treatment regimens for metastatic cancers. When the main objective is to compare treatment sequences (strategies) rather than treatments within a specific line of therapy, other time-to-event measures, such as time to failure of strategy (TFS) (15), may be preferable to PFS.

Given the rapid recent development of targeted agents, it may be critical, when choosing the length of follow-up and the timing of data analyses in trial design, to consider the possibility that a treatment benefits only a subpopulation of patients. As an example that could produce contradictory PFS and OS results (not only within a single trial but also between trials), consider two patient subgroups in which an experimental treatment benefits only the subgroup with shorter PFS and OS. In a phase III trial, PFS and OS are compared between the experimental and control arms in the entire patient population at two interim analyses and one final analysis. Our additional simulation study (Supplementary Figure 1, available online) showed that both PFS and OS differed statistically significantly between treatment arms at the first analysis. At the second analysis, with longer follow-up, the PFS curves came closer because of late progressions in the second patient subgroup, and the PFS difference became nonsignificant, whereas the OS difference remained statistically significant. At the final analysis, the OS curves also came closer, and the OS difference became nonsignificant as well. Furthermore, if a sufficiently sensitive subpopulation can be identified, for instance in a phase II clinical trial, it may be possible to adequately power a subsequent phase III trial by enriching the study population for patients with specific characteristics (16–18).
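Why a subgroup-specific benefit can fade with longer follow-up can be seen from the population-level hazard ratio of a two-subgroup mixture. The sketch below is not the authors' Supplementary Figure 1 simulation; it assumes exponential PFS in each subgroup, equal subgroup sizes, and hypothetical medians: in subgroup A (shorter PFS) the treatment lengthens median PFS from 3 to 6 months, and in subgroup B median PFS is 12 months in both arms. All names and numbers are illustrative.

```python
import math


def _rate(median):
    """Exponential hazard rate for a given median."""
    return math.log(2) / median


def mixture_hazard(t, w, rate_a, rate_b):
    """Hazard at time t of an exponential mixture with weight w on subgroup A."""
    sa = w * math.exp(-rate_a * t)          # subgroup A patients still progression free
    sb = (1 - w) * math.exp(-rate_b * t)    # subgroup B patients still progression free
    return (rate_a * sa + rate_b * sb) / (sa + sb)


def population_hr(t, w=0.5, median_a_c=3.0, median_a_e=6.0, median_b=12.0):
    """Arm E vs arm C PFS hazard ratio in the whole population at time t.

    Hypothetical setting: benefit only in subgroup A; subgroup B unaffected.
    """
    h_e = mixture_hazard(t, w, _rate(median_a_e), _rate(median_b))
    h_c = mixture_hazard(t, w, _rate(median_a_c), _rate(median_b))
    return h_e / h_c


for t in (0, 6, 12, 24, 48):
    print(f"t = {t:>2} mo: HR {population_hr(t):.3f}")
```

Under these assumptions the hazard ratio starts at 0.6, rises above 1 once most arm C patients in subgroup A have already progressed while arm E still has subgroup A patients at risk, and then tends to 1 as only subgroup B remains. The later excess hazard in arm E offsets part of the early benefit, so a cumulative comparison such as the log-rank test weakens with longer follow-up.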

Funding

SM’s work was supported in part by a Grant-in-Aid for Scientific Research C-24500345 from the Ministry of Health, Labour, and Welfare of Japan. GY’s work was supported in part by a grant (705613) from the Research Grants Council of Hong Kong.

Supplementary Material

Supplementary Data

The authors had full responsibility for the design of the study, the collection of the data, the analysis and interpretation of the data, the decision to submit the manuscript for publication, and the writing of the manuscript. The authors have no conflicts of interest to declare. We thank the senior editor and the referees for their thoughtful and constructive comments and suggestions.

References

1. Broglio KR, Berry DA. Detecting an overall survival benefit that is derived from progression-free survival. J Natl Cancer Inst. 2009;101(23):1642–1649.
2. Booth CM, Eisenhauer EA. Progression-free survival: meaningful or simply measurable? J Clin Oncol. 2012;30(10):1030–1033.
3. Saad ED, Katz A, Buyse M. Overall survival and post-progression survival in advanced breast cancer: a review of recent randomized clinical trials. J Clin Oncol. 2010;28(11):1958–1962.
4. Korn EL, Freidlin B, Abrams JS. Overall survival as the outcome for randomized clinical trials with effective subsequent therapies. J Clin Oncol. 2011;29(17):2439–2442.
5. Cortes J, O’Shaughnessy J, Loesch D, et al. Eribulin monotherapy versus treatment of physician’s choice in patients with metastatic breast cancer (EMBRACE): a phase 3 open-label randomised study. Lancet. 2011;377(9769):914–923.
6. Heinemann V, von Weikersthal LF, Decker T, et al. FOLFIRI plus cetuximab versus FOLFIRI plus bevacizumab as first-line treatment for patients with metastatic colorectal cancer (FIRE-3): a randomised, open-label, phase 3 trial. Lancet Oncol. 2014;15(10):1065–1075.
7. Twelves C, Cortes J, Vahdat L, et al. Efficacy of eribulin in women with metastatic breast cancer: a pooled analysis of two phase 3 studies. Breast Cancer Res Treat. 2014;148(3):553–561.
8. R Core Team. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing; 2014. http://www.R-project.org/.
9. Buyse M, Burzykowski T, Carroll K, et al. Progression-free survival is a surrogate for survival in advanced colorectal cancer. J Clin Oncol. 2007;25(33):5218–5224.
10. Giessen C, Laubender RP, Ankerst DP, et al. Progression-free survival as a surrogate endpoint for median overall survival in metastatic colorectal cancer: literature-based analysis from 50 randomized first-line trials. Clin Cancer Res. 2013;19(1):225–235.
11. Bast RC, Thigpen JT, Arbuck SG, et al. Clinical trial endpoints in ovarian cancer: report of an FDA/ASCO/AACR Public Workshop. Gynecol Oncol. 2007;107(2):173–176.
12. Halabi S, Rini B, Escudier B, et al. Progression-free survival as a surrogate endpoint of overall survival in patients with metastatic renal cell carcinoma. Cancer. 2014;120(1):52–60.
13. Burzykowski T, Buyse M, Piccart-Gebhart MJ, et al. Evaluation of tumor response, disease control, progression-free survival, and time to progression as potential surrogate end points in metastatic breast cancer. J Clin Oncol. 2008;26(12):1987–1992.
14. Soria JC, Massard C, Le Chevalier T. Should progression-free survival be the primary measure of efficacy for advanced NSCLC therapy? Ann Oncol. 2010;21(12):2324–2332.
15. Chibaudel B, Bonnetain F, Shi Q, et al. Alternative end points to evaluate a therapeutic strategy in advanced colorectal cancer: evaluation of progression-free survival, duration of disease control, and time to failure of strategy--an Aide et Recherche en Cancerologie Digestive Group Study. J Clin Oncol. 2011;29(31):4199–4204.
16. Yin G. Clinical Trial Design: Bayesian and Frequentist Adaptive Methods. Hoboken, NJ: Wiley; 2012:297–309.
17. Freidlin B, McShane LM, Polley MY, et al. Randomized phase II trial designs with biomarkers. J Clin Oncol. 2012;30(26):3304–3309.
18. Morita S, Yamamoto H, Sugitani Y. Biomarker-based Bayesian randomized phase II clinical trial design to identify a sensitive patient subpopulation. Stat Med. 2014;33(23):4008–4018.

Articles from JNCI Journal of the National Cancer Institute are provided here courtesy of Oxford University Press
