JNCI Journal of the National Cancer Institute. 2009 Nov 9;101(23):1642–1649. doi: 10.1093/jnci/djp369

Detecting an Overall Survival Benefit that Is Derived From Progression-Free Survival

Kristine R Broglio, Donald A Berry
PMCID: PMC4137232  PMID: 19903805

Abstract

Background

Whether progression-free survival (PFS) or overall survival (OS) is the more appropriate endpoint in clinical trials of metastatic cancer is controversial. In some disease and treatment settings, an improvement in PFS does not result in an improved OS.

Methods

We partitioned OS into two parts and expressed it as the sum of PFS and survival postprogression (SPP). We simulated randomized clinical trials with two arms that had respective medians for PFS of 6 and 9 months. We assumed no treatment difference in median SPP. We found the probability of a statistically significant benefit in OS for various median SPP and observed P values for PFS. We compared the sample sizes required for PFS vs OS for various median SPP. We compared our results with the literature regarding surrogacy of PFS for OS by use of the correlation between hazard ratios for PFS and OS. All statistical tests were two-sided.

Results

For a trial with observed P value for improvement in PFS of .001, there was a greater than 90% probability for statistical significance in OS if median SPP was 2 months but less than 20% if median SPP was 24 months. For a trial requiring 280 patients to detect a 3-month difference in PFS, 350 and 2440 patients, respectively, were required to have the same power for detecting a real difference in OS that is carried over from the 3-month benefit in PFS when the median SPP was 2 and 24 months.

Conclusions

Addressing SPP is important in understanding treatment effects. For clinical trials with a PFS benefit, lack of statistical significance in OS does not imply lack of improvement in OS, especially for diseases with long median SPP. Although there may be no treatment effect on SPP, its variability so dilutes the OS comparison that statistical significance is likely lost. OS is a reasonable primary endpoint when median SPP is short but is too high a bar when median SPP is long, such as longer than 12 months.


CONTEXT AND CAVEATS

Prior knowledge

It is still controversial whether progression-free survival (PFS) or overall survival (OS) is the more appropriate endpoint in clinical trials of metastatic cancer.

Study design

Clinical trials with two arms having respective medians for PFS of 6 and 9 months were simulated. OS was the sum of PFS and survival postprogression (SPP). Probabilities of a benefit in OS were determined for various median SPP, by assuming no treatment-related difference in SPP, and for observed P values for PFS. Sample sizes required for various PFS and OS values were determined.

Contribution

OS was a reasonable primary endpoint when median SPP was short but was too high a bar when median SPP was long (eg, longer than 12 months).

Implications

As therapies for metastatic cancer improve, SPP would be expected to increase, which may decrease the utility of OS as a clinical endpoint.

Limitations

Simulations considered a specific difference in median PFS, accrual rate, and follow-up time. PFS and SPP were assumed to follow exponential distributions. The assumption that there was no difference in SPP may not be correct in a particular circumstance.

From the Editors

Whether progression-free survival (PFS) or overall survival (OS) is the more appropriate endpoint in clinical trials in patients with metastatic cancer is controversial (1–5). In some disease and treatment settings, an improvement in PFS does not result in an improved OS. Although there is general agreement that OS is more relevant clinically, powering a trial to show an OS benefit can be challenging. The added statistical power requires trade-offs, which include slowing drug development, increasing the cost of medical care, and using patient resources that might be better allocated by investigating other therapies.

The standard assumption in determining sample size in a clinical trial is that the hazard ratio associated with treatment is constant over time. This assumption has no empirical justification. The hazard of death is likely to change after disease progression, for example. So, if there is a difference in PFS between treatments, this standard assumption is flawed. As an illustration of this problem, consider a randomized clinical trial that is comparing two treatment regimens. The experimental treatment leads to a statistically significant improvement in PFS but not in OS. Does this observation mean there is no OS benefit? How likely is such an observation if, in fact, there is an OS benefit? Can a PFS benefit be taken to imply an OS benefit even though the observed OS benefit is not statistically significant?

We address these questions by use of a simple device. We partition OS into two parts by expressing it as the sum of PFS and survival postprogression (SPP) [ie, OS = PFS + (OS − PFS)]. The standard definition of “progression” includes death from any cause, and so the progression event may be death. If the progression event is death, then SPP equals 0; otherwise, SPP is greater than 0. SPP is a time-to-event measure that can be analyzed by using standard methods of survival analysis.

Patients in clinical trials of metastatic cancer usually continue on their assigned regimen until they show signs of disease progression. At progression, there are a variety of choices open to the patient and his or her physician, including staying on the same regimen, crossing over to another treatment arm in the trial, or switching to another regimen entirely such as participating in another clinical trial or receiving no additional therapy. The continuation strategy at the time of progression is not randomized; therefore, it is impossible to compare the efficacy of such strategies in an unbiased fashion. All continuation strategies are biased. An extreme example is “no therapy.” A patient whose progression event is death will receive no therapy and so no therapy will perform badly. But restricting the analysis to patients who are alive at progression is also biased. For example, patients with poor performance status at progression have poor prognoses and so are likely to receive no therapy or only palliative therapy.

The heterogeneity and lack of randomization in postprogression strategies make it difficult to compare the original randomized regimens on the basis of OS.

The primary focus of the hypothetical clinical trial setting in this article is OS. However, PFS contains information about OS. The purpose of our study was to describe the critical role of SPP in judging the utility of PFS for making inferences about OS. We addressed the relationship between PFS and OS and possible surrogacy of PFS for OS from three perspectives: clinical trial design, clinical trial analysis, and meta-analysis. From the perspective of clinical trial design, we determined the power available for detecting a benefit in OS for a randomized trial designed to detect a benefit in PFS, as well as the sample size necessary to have sufficient power to detect a benefit in OS. From the perspective of clinical trial analysis, we calculated the probability of detecting a statistically significant difference in OS on the basis of an observed statistical significance of PFS. To show the perspective of a meta-analysis meant to address whether PFS is a surrogate for OS, we mimicked the literature convention of presenting assessments of the empirical relationship between hazard ratios for PFS and OS within subsets of trials (6–10).

Methods

We used simulation methods to generate clinical trials having particular characteristics of PFS and SPP. We assumed a treatment benefit in PFS and no treatment effect on SPP. Therefore, the treatment-related improvement in PFS carries over exactly to a treatment-related improvement in OS.

We considered a clinical trial with equal numbers of patients randomly assigned to two arms. In the simulations, the control arm had median PFS of 6 months and the experimental arm had median PFS of 9 months. Both distributions were exponential. Because OS is the sum of PFS and SPP, we simulated each patient’s OS by adding to the simulated PFS an SPP that was generated from an exponential distribution that was the same for both treatments. Patients were accrued at the rate of 30 patients per month, with arrival times generated from a uniform distribution. We assumed an additional 9 months of follow-up after accrual was complete. The total trial duration was the accrual period plus 9 months. Patients’ results were censored as of that time. To have power of 80%, 85%, or 90% and a two-sided statistical significance level of .05 for PFS required accruing 280, 310, or 364 patients, respectively.
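The data-generating step described above can be sketched in a few lines. This is a minimal illustration in Python rather than the authors' R code; the function name `simulate_trial`, its arguments, and the dictionary keys are hypothetical, and alternating arm assignment is a simplification assumed here to obtain equal allocation.

```python
import math
import random

def simulate_trial(n_patients, median_spp, rng,
                   median_pfs=(6.0, 9.0), accrual_rate=30.0, follow_up=9.0):
    """Simulate one two-arm trial (times in months).

    Arm 0 is control and arm 1 is experimental. Returns one dict per patient
    with censored PFS and OS times and event indicators.
    """
    accrual_period = n_patients / accrual_rate
    cutoff = accrual_period + follow_up  # calendar time of the analysis
    patients = []
    for i in range(n_patients):
        arm = i % 2  # assumption: alternate arms to get equal allocation
        entry = rng.uniform(0.0, accrual_period)  # uniform arrival time
        # An exponential distribution with median m has rate ln(2) / m.
        pfs = rng.expovariate(math.log(2) / median_pfs[arm])
        # SPP has the same distribution in both arms (no treatment effect).
        spp = rng.expovariate(math.log(2) / median_spp) if median_spp > 0 else 0.0
        os_time = pfs + spp
        max_obs = cutoff - entry  # administrative censoring at the cutoff
        patients.append({
            "arm": arm,
            "pfs_time": min(pfs, max_obs), "pfs_event": pfs <= max_obs,
            "os_time": min(os_time, max_obs), "os_event": os_time <= max_obs,
        })
    return patients
```

For example, `simulate_trial(280, 12.0, random.Random(1))` would generate one trial of 280 patients with a median SPP of 12 months.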

For each scenario, we simulated 50 000 trials. At the end of each simulated trial, we compared treatments with respect to PFS and OS by use of a log-rank test. We estimated the probability of statistical significance as the number of simulated trials having a two-sided P value of less than .05 divided by 50 000. To determine the sample size necessary to have sufficient power to detect an OS improvement, we used trial and error, adjusting the sample size until the desired probabilities for detecting OS were achieved. For statistical significance in OS given statistical significance in PFS, the denominator was the number of trials for which PFS was statistically significant at the specified level. We carried out simulations for a variety of median SPP values.
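The two-sided log-rank comparison applied to each simulated trial can be sketched as follows. This is a plain-Python illustration of the standard two-group log-rank statistic, not the R routine the authors used; the function name `logrank_p_value` and its argument layout are assumptions made for this example.

```python
import math

def logrank_p_value(times, events, groups):
    """Two-sided log-rank P value for two groups coded 0 and 1.

    times: follow-up times; events: True if the event was observed,
    False if censored; groups: 0 or 1 for each patient.
    """
    groups = list(groups)
    data = sorted(zip(times, events, groups))
    n = len(data)
    at_risk = [groups.count(0), groups.count(1)]
    observed_minus_expected = 0.0
    variance = 0.0
    i = 0
    while i < n:
        t = data[i][0]
        deaths = [0, 0]
        leaving = [0, 0]
        # Collect every patient whose time equals t (this handles ties).
        while i < n and data[i][0] == t:
            _, event, g = data[i]
            leaving[g] += 1
            if event:
                deaths[g] += 1
            i += 1
        d = deaths[0] + deaths[1]
        total = at_risk[0] + at_risk[1]
        if d > 0 and total > 1:
            # Expected events in group 1 under the null, and the
            # hypergeometric variance of that count.
            expected1 = d * at_risk[1] / total
            observed_minus_expected += deaths[1] - expected1
            variance += (d * at_risk[0] * at_risk[1] * (total - d)
                         / (total ** 2 * (total - 1)))
        at_risk[0] -= leaving[0]
        at_risk[1] -= leaving[1]
    if variance == 0.0:
        return 1.0
    chi2 = observed_minus_expected ** 2 / variance
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2.0))
```

The estimated power is then the fraction of simulated trials for which this function returns a value below .05.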

For calculations mimicking the meta-analysis methodology of Sherrill et al. (6), for example, we considered a set of 67 trials (which may also be viewed as 67 clusters of clinical sites within a smaller number of trials) and simulated the results for each trial. We assigned each trial a median PFS for the control regimen by selecting randomly between 4 and 18 months. We selected the logarithm of the hazard ratio for PFS for the experimental regimen vs the control regimen from a normal distribution, with a mean of 0 and SD of 0.35. This distribution gave most values for PFS hazard ratios between 0.5 and 2.0. Sample sizes had a log-normal distribution, with a mean of 795 and SD of 845. This distribution gave most total sample sizes between 100 and 3000. The other assumptions were as described for the single trial. We calculated the hazard ratios for PFS and OS for each trial by use of a proportional hazards model and plotted the 67 pairs of hazard ratios. We estimated the association between the estimated hazard ratios for PFS and OS from a linear regression model weighted by the number of patients in each trial. Although the sample size and PFS times for each trial were fixed, the median SPP was varied (0, 3, 6, 9, 12, 18, and 24 months). The meta-analysis results were based on a single simulation of the 67 trials.
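Drawing the per-trial settings for the simulated meta-analysis can be sketched as below. Several details are assumptions made for illustration: the control median is taken as uniform on 4 to 18 months, the stated mean of 795 and SD of 845 are taken to describe the sample size on its original scale and are converted to log-scale parameters, and the names `lognormal_params` and `draw_trial_settings` are hypothetical.

```python
import math
import random

def lognormal_params(mean, sd):
    """Log-scale (mu, sigma) of a log-normal with the given mean and SD."""
    sigma2 = math.log(1.0 + (sd / mean) ** 2)
    return math.log(mean) - sigma2 / 2.0, math.sqrt(sigma2)

def draw_trial_settings(n_trials, rng):
    """Draw a control median PFS, PFS hazard ratio, and size for each trial."""
    mu, sigma = lognormal_params(795.0, 845.0)
    settings = []
    for _ in range(n_trials):
        control_median = rng.uniform(4.0, 18.0)  # assumption: uniform draw
        hazard_ratio = math.exp(rng.gauss(0.0, 0.35))  # log HR ~ N(0, 0.35)
        size = max(2, round(rng.lognormvariate(mu, sigma)))
        settings.append({
            "control_median_pfs": control_median,
            # For exponential PFS the median scales inversely with the hazard.
            "experimental_median_pfs": control_median / hazard_ratio,
            "pfs_hazard_ratio": hazard_ratio,
            "sample_size": size,
        })
    return settings
```

Under this reading, the central 95% of the log-normal distribution runs from roughly 100 to roughly 3000 patients, and the central 95% of hazard ratios runs from about 0.5 to about 2.0, which agrees with the ranges quoted in the text.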

All simulations were performed in R, version 2.4.1 (11). All statistical tests were two-sided.

Results

We present three example simulations of PFS and OS (Figure 1). Each row is a single simulated trial with 280 patients, designed to have 80% power to detect the stated difference in PFS. The leftmost plot shows PFS, and the other plots in the row show OS and differ only in median SPP (6, 12, and 18 months). The hazard ratios and P values shown are those observed for the single plot.

Figure 1.

Three typical examples of Kaplan-Meier progression-free survival (PFS) curves and associated overall survival (OS) curves from the simulations. Each row of plots is an example of a single simulated trial. The leftmost plot shows PFS, simulated to have median PFS of 6 months (control) and 9 months (experimental). The other three plots in each row show OS and differ only in median SPP (6, 12, and 18 months). The hazard ratios and P values shown are those observed for the single simulated example. These three examples were typical of the simulations carried out. PFS and OS were compared by the log-rank test, and all statistical tests were two-sided. HR = hazard ratio; med = median. Solid line = control arm; dashed line = experimental arm.

These examples showed the variability in PFS and OS from trial to trial and also the changes in the difference between treatment arms as the median SPP increases. When PFS was statistically significant, individual simulations tended to show a statistically significant benefit in OS for shorter median SPP times. However, even when there was a true underlying PFS benefit, the OS curves may overlap or the direction of the effect may be reversed depending on the actual observed difference in PFS, the simulated SPP, and sampling variability.

The first example in Figure 1 shows a statistically significant benefit for PFS, but not for OS, regardless of median SPP. For larger median SPP, the OS curves come closer together and the hazard ratio gets closer to 1.00. In the second example, PFS is not statistically significant, with P = .097. OS happens to be significant when median SPP is 6 months, but significance is lost when median SPP is 12 or 18 months. In the third example, PFS is significantly longer for the experimental arm than for the control arm. The comparison is still significant when median SPP is 6 months. Significance is (barely) lost when median SPP is 12 months, and the comparison is actually reversed when median SPP is 18 months, with OS favoring the control arm. All three examples were simulated to have a 3-month OS advantage for the experimental arm, with this advantage stemming from the corresponding improvement in (simulated) PFS.

Summaries of the OS results from the simulations are shown in Tables 1 and 2. Table 1 shows the median and 95% interval for OS hazard ratios by median SPP. The 95% interval of the OS hazard ratios extends from the 2.5 percentile to the 97.5 percentile for 50 000 simulations. The median hazard ratio for PFS was 0.67 in the simulations. The median hazard ratio for OS ranged from 0.69 (95% interval = 0.51–0.91) when median SPP was 2 months to 0.76 (95% interval = 0.42–1.33) when median SPP was 24 months. Table 2 shows the median and 95% interval for the OS hazard ratios and P values as well as the power available for detecting a difference in OS by the power for the difference in PFS. As above, the 95% interval extends from the 2.5 percentile to the 97.5 percentile for 50 000 simulations. For a trial with 80% power for detecting PFS, the median P value for OS was .013 (95% interval = <.001–.600) when median SPP was 2 months, .199 (95% interval = .001–.946) when median SPP was 12 months, and .322 (95% interval = .005–.971) when median SPP was 24 months.
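The PFS hazard ratio of about 0.67 follows from the exponential assumption stated in the Methods. The short derivation below is a sketch; the symbols \lambda_C, \lambda_E, m_C, and m_E (hazard rates and medians in the control and experimental arms) are notation introduced here, not taken from the article.

```latex
\begin{align*}
S(t) &= e^{-\lambda t}, \qquad S(m) = \tfrac{1}{2}
  \;\Longrightarrow\; m = \frac{\ln 2}{\lambda}, \\
\mathrm{HR}_{\mathrm{PFS}} &= \frac{\lambda_E}{\lambda_C}
  = \frac{m_C}{m_E} = \frac{6}{9} \approx 0.67 .
\end{align*}
```

When median SPP is greater than 0, OS = PFS + SPP is a sum of two exponential times rather than a single exponential time, so the same shortcut does not apply to the OS hazard ratio.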

Table 1.

Summary of hazard ratios for overall survival (OS)*

| Median SPP, mo | Median OS HR (95% interval) |
|---|---|
| 2 | 0.687 (0.514–0.909) |
| 4 | 0.710 (0.517–0.966) |
| 6 | 0.727 (0.511–1.023) |
| 8 | 0.736 (0.502–1.068) |
| 10 | 0.746 (0.491–1.100) |
| 12 | 0.749 (0.479–1.140) |
| 14 | 0.752 (0.470–1.174) |
| 16 | 0.758 (0.462–1.207) |
| 18 | 0.759 (0.448–1.241) |
| 20 | 0.762 (0.440–1.277) |
| 22 | 0.763 (0.428–1.304) |
| 24 | 0.763 (0.416–1.333) |

* The 95% interval of the OS hazard ratio values extends from the 2.5 percentile to the 97.5 percentile for 50 000 simulations. HR = hazard ratio; SPP = survival postprogression.

Table 2.

Summary of overall survival (OS) P values and power when the power for progression-free survival (PFS) is 80%, 85%, and 90%*

| Median SPP, mo | 80% power for PFS: median P (95% interval) | 80% power for PFS: OS power | 85% power for PFS: median P (95% interval) | 85% power for PFS: OS power | 90% power for PFS: median P (95% interval) | 90% power for PFS: OS power |
|---|---|---|---|---|---|---|
| 2 | .013 (<.001–.600) | 0.696 | .009 (<.001–.498) | 0.748 | .004 (<.001–.351) | 0.822 |
| 4 | .041 (<.001–.814) | 0.535 | .030 (<.001–.750) | 0.589 | .016 (<.001–.627) | 0.672 |
| 6 | .082 (.002–.888) | 0.415 | .063 (<.001–.867) | 0.461 | .040 (<.001–.811) | 0.540 |
| 8 | .125 (.001–.915) | 0.335 | .103 (.003–.905) | 0.370 | .069 (.002–.876) | 0.441 |
| 10 | .167 (.001–.935) | 0.279 | .141 (.001–.929) | 0.317 | .104 (.004–.901) | 0.370 |
| 12 | .199 (.001–.946) | 0.244 | .170 (.001–.937) | 0.276 | .132 (.001–.923) | 0.322 |
| 14 | .227 (.002–.951) | 0.217 | .202 (.001–.947) | 0.238 | .160 (.001–.936) | 0.286 |
| 16 | .257 (.002–.956) | 0.194 | .229 (.002–.949) | 0.214 | .190 (.001–.939) | 0.258 |
| 18 | .276 (.003–.957) | 0.183 | .253 (.002–.952) | 0.195 | .212 (.002–.946) | 0.234 |
| 20 | .298 (.003–.963) | 0.167 | .272 (.003–.956) | 0.182 | .235 (.002–.951) | 0.214 |
| 22 | .310 (.004–.966) | 0.156 | .289 (.003–.962) | 0.169 | .251 (.002–.952) | 0.197 |
| 24 | .322 (.005–.971) | 0.146 | .304 (.003–.966) | 0.160 | .265 (.003–.958) | 0.188 |

* The 95% interval extends from the 2.5 percentile to the 97.5 percentile of the OS P values from 50 000 simulations. SPP = survival postprogression.

We next investigated the probability of finding a statistically significant difference in OS for various lengths of median SPP for trials designed to have a power of 80%, 85%, or 90% to detect an improvement in median PFS from 6 to 9 months (Figure 2 and Table 2). As median SPP increased, the power for detecting a statistically significant difference in OS decreased.

Figure 2.

Probability of statistically significant differences in overall survival (OS) as a function of median survival postprogression (SPP). The three curves were indexed by the power for detecting the actual median progression-free survival (PFS) benefit that was simulated, 6 vs 9 months (ie, powers of 90%, 85%, and 80%).

If median SPP was only 2 months, there was a strong association between PFS and OS. Thus, for a study designed to have 80% power to detect the stated difference in PFS, the probability of also finding a statistically significant difference in OS was high (ie, 70%). Similarly, for studies with 85% and 90% power, the respective probabilities of detecting a statistically significant difference in OS were 75% and 82%. If median SPP was 12 months, the probabilities of detecting a statistically significant OS difference decreased substantially to 24%, 28%, and 32% for PFS powers of 80%, 85%, and 90%, respectively. If median SPP was 24 months, the corresponding power for OS decreased to 15%, 16%, and 19%.

We next investigated the approximate total trial sample sizes necessary to achieve sufficient power for detecting a statistically significant difference in OS (Figure 3). This hypothetical study requires a total of 280 patients to have 80% power to detect a difference in PFS. For this same scenario to have 80% power to detect a difference in OS, also at the level of a P value less than .05, 350 patients were required when median SPP was 2 months, 600 patients when median SPP was 6 months, 1050 when median SPP was 12 months, and 2440 when median SPP was 24 months. For the primary objective of PFS, this study would be complete in 18 months. However, for a primary objective of OS, this study would require 29, 44, and 90 months for median SPP values of 6, 12, and 24 months, respectively.
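The study durations quoted above follow from the accrual rate of 30 patients per month and the 9 months of additional follow-up stated in the Methods. The arithmetic is sketched below; the function name `trial_duration_months` is hypothetical.

```python
def trial_duration_months(n_patients, accrual_rate=30.0, follow_up=9.0):
    """Accrual period (patients / patients per month) plus fixed follow-up."""
    return n_patients / accrual_rate + follow_up

# Sample sizes from the text: 280 for PFS; 600, 1050, and 2440 for OS
# when median SPP is 6, 12, and 24 months.
for n in (280, 600, 1050, 2440):
    print(n, round(trial_duration_months(n)))
```

Rounded to whole months, these sample sizes give 18, 29, 44, and 90 months.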

Figure 3.

Sample sizes required for detecting a statistically significant difference in overall survival by median survival postprogression (SPP). The three curves were indexed by the power for overall survival (ie, powers of 90%, 85%, and 80%).

On the basis of a trial with 80% power to detect a 3-month improvement in median PFS, we calculated the associated probability of finding a statistically significant improvement in OS (Table 3). If the log-rank statistic for PFS was barely statistically significant (P = .05), the probability of a statistically significant improvement in the OS ranged from 33% when median SPP was 2 months to 8% when median SPP was 24 months. If the log-rank statistic for PFS was highly statistically significant (eg, P < .001) and SPP was short (median of 2 months), then there was a greater than 90% probability for statistical significance in OS. However, when median SPP was as long as 24 months, this probability was less than 20%.

Table 3.

Probability of finding a statistically significant benefit (two-sided P value of .05 by the log-rank test) in overall survival (OS) depending on the observed P value for progression-free survival (PFS)*

| Median SPP, mo | Probability when PFS P = .05 | Probability when PFS P = .03 | Probability when PFS P = .01 | Probability when PFS P = .001 |
|---|---|---|---|---|
| 2 | 0.33 | 0.46 | 0.75 | 0.97 |
| 4 | 0.23 | 0.29 | 0.46 | 0.73 |
| 6 | 0.18 | 0.23 | 0.33 | 0.54 |
| 8 | 0.15 | 0.19 | 0.27 | 0.43 |
| 10 | 0.13 | 0.15 | 0.22 | 0.35 |
| 12 | 0.12 | 0.14 | 0.19 | 0.30 |
| 14 | 0.11 | 0.13 | 0.17 | 0.26 |
| 16 | 0.10 | 0.12 | 0.15 | 0.23 |
| 18 | 0.09 | 0.11 | 0.14 | 0.21 |
| 20 | 0.08 | 0.11 | 0.13 | 0.20 |
| 22 | 0.08 | 0.11 | 0.13 | 0.19 |
| 24 | 0.08 | 0.10 | 0.12 | 0.18 |

* This probability depends on the median SPP (added to PFS) and on the observed P value of treatment effect for PFS. Median PFS in the control treatment arm was 6 months.

We also explored the association between the estimated hazard ratios for PFS and OS for a simulated meta-analysis by assuming various median SPP, including 0, 3, 6, 9, 12, 18, and 24 months (Figure 4). We present results of 67 hypothetical trials, each with fixed sample size and PFS times; we varied only median SPP. Each trial is represented by an open circle, with area proportionate to the sample size of the trial.

Figure 4.

Association between progression-free survival (PFS) and overall survival (OS) for a single simulation of 67 trials. Each study had a randomly selected sample size and PFS hazard ratio, which remains fixed across scenarios while median survival postprogression (SPP) times were allowed to vary (0, 3, 6, 9, 12, 18, and 24 months). Hazard ratios (HRs) for PFS and OS were estimated with a proportional hazards model, and the correlation was estimated from a linear regression model weighted by the number of patients in each trial. The size of the circle is relative to the total sample size of the study. The diagonal line is the fitted weighted linear regression line.

If median SPP was 0, then there was perfect association between the hazard ratio for PFS and for OS (correlation coefficient R = 1.00). As the median SPP increased, the correlation between the two hazard ratio estimates weakened (eg, R = .88 when median SPP = 9 months; R = .83 when median SPP = 12 months; and R = .57 when the median SPP = 24 months). Thirty-four of the 67 trials showed statistical significance for PFS. Of these 34 trials, 28 (82%), 23 (68%), 17 (50%), 17 (50%), 14 (41%), and 9 (26%) also showed statistical significance for OS at respective median SPPs of 3, 6, 9, 12, 18, and 24 months.
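A weighted linear fit of this kind, with each trial weighted by its number of patients, can be sketched as follows. The function name `weighted_linear_fit` is hypothetical, and the text does not state whether hazard ratios or their logarithms were regressed, so the inputs here are simply two paired lists of trial-level values.

```python
import math

def weighted_linear_fit(x, y, w):
    """Weighted least-squares line y = a + b * x and weighted correlation R.

    x, y: paired trial-level values (eg, PFS and OS hazard ratios);
    w: nonnegative weights (eg, number of patients in each trial).
    """
    sw = sum(w)
    mean_x = sum(wi * xi for wi, xi in zip(w, x)) / sw
    mean_y = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mean_x) ** 2 for wi, xi in zip(w, x))
    syy = sum(wi * (yi - mean_y) ** 2 for wi, yi in zip(w, y))
    sxy = sum(wi * (xi - mean_x) * (yi - mean_y) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return intercept, slope, r
```

When median SPP is 0, OS equals PFS for every patient, so the two lists are identical and the function returns a slope of 1 and R of 1 (up to rounding), which corresponds to the perfect association reported for that scenario.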

Qualitatively, results of this single simulation are typical of other simulations. In particular, the association between PFS and OS hazard ratios weakens as median SPP increases. Similarly, the proportion of trials with a statistically significant OS hazard ratio decreases with increasing median SPP.

Discussion

Our study has a number of important implications. The first and simplest is that it shows the importance of addressing SPP in a randomized clinical trial. Typically, there will be no statistical difference in SPP by treatment group. Second, we showed that a statistical benefit in PFS will likely be lost in OS when the median SPP is moderate (comparable to the control median PFS) and very likely lost when the median SPP is large. Third, powering trials to show a benefit in OS is very difficult in a disease with long median SPP. Finally, although we did not focus on whether PFS can be considered to be a surrogate endpoint for OS, we show that treatment comparisons of SPP can elucidate this question.

Our study has several limitations. Our simulations considered a specific difference in median PFS, accrual rate, and follow-up time. However, our qualitative conclusions apply generally. Similarly, our qualitative conclusions do not depend on the assumed benefit of experimental treatment or on the assumed sample sizes. Also, we took PFS and SPP to follow exponential distributions. This assumption has appeal for its interpretability and simplicity, but our overall conclusions apply for other distributions as well.

Our most important assumption was that there was no treatment difference in SPP. This assumption may not be correct in a particular circumstance, but it has intuitive appeal and empirical justification. Protocol therapy is not usually continued after progression; it is reasonable to expect that a therapy no longer being used is no longer benefiting the patient. And a therapy that is effective in delaying progression is unlikely to have a negative residual effect after it is withdrawn. Moreover, given the current state of treatment for metastatic cancer, there are few if any options for prolonging survival after progression, regardless of initial therapy.

The assumption of no differential treatment effect after progression has empirical justification over a variety of diseases and settings (12,13). Situations in which SPP depends on treatment occur when there are biases in the assessment of progression. A possible example is when unblinded investigators assess progression. Such biases can be addressed with a blinded central review, including when it is based on only a random sample of patients.

Two additional types of bias that are associated with SPP are possible in a randomized trial involving an experimental cancer drug vs a standard treatment. Both types of bias (usually) favor the standard treatment, and both can be assessed in advance and prevented by using an appropriate clinical trial design. One is crossing over, which applies whether or not the investigators are blinded. Suppose the experimental regimen is more effective than the standard. Patients who have been assigned to the experimental regimen are switched to a less effective therapy (provided they are well enough to be treated) or to no therapy at the time of disease progression. Some patients whose disease is progressing on the standard regimen will be switched to the experimental regimen. Because we assumed that the experimental drug is effective, SPP for those patients could be longer on the standard arm than on the experimental arm. So, with regard to OS, the randomized comparison is really that of the up-front experimental regimen vs experimental regimen delayed until disease progression. The survival benefits of the up-front experimental regimen are diminished by the differential effect in SPP. In the extreme, OS will be the same in both treatment arms—even though the experimental drug is more effective.

The second type of bias applies when the investigators are unblinded, which is typical in cancer drug trials. It occurs when the experimental drug is less effective than the standard therapy. Investigators may have undue faith in the benefit of the experimental regimen and continue some patients on that regimen even after disease progression. Such patients would be deprived of potentially effective next-line therapy, and SPP would be longer on the standard regimen. So just as for the previous type of bias, this one usually favors the standard treatment.

In our study, we found that when there was a true treatment benefit in PFS and no treatment effect on SPP, the probability of also observing a statistically significant difference in OS depended on the length of median SPP and the magnitude of the observed PFS difference. Patient heterogeneity and variability in treatment decisions made after disease progression diluted the OS differences between treatment arms. This dilution was greater for increasing values of median SPP. When the median SPP was small, there was usually a statistically significant benefit in OS when there was a statistically significant treatment benefit in PFS. Longer periods of SPP added randomness, diluting the treatment effect and making statistical significance in OS decreasingly likely.

We considered a clinical trial powered to detect a substantial treatment benefit in median PFS from 6 to 9 months. To detect a difference in OS with the same power required sample sizes that were more than twice as large when the median SPP was only 6 months (600 vs 280 patients) and nearly nine times as large when the median SPP was 24 months (2440 vs 280 patients). In the latter circumstance, OS is an unrealistic primary endpoint. It is reasonable to expect that some benefit in PFS will carry over into OS, but when there is no benefit in survival after progression, which we claim to be typical in metastatic cancer, insisting on statistical significance in OS is too high a hurdle. Powering studies of metastatic cancer for OS rather than PFS can result in very large trials and a much longer time to develop drugs.

Similarly, when median SPP was short, PFS and OS hazard ratios were highly correlated. In a meta-analysis setting, the correlation weakens as median SPP increases. These results are consistent with conclusions from actual meta-analyses investigating the possible surrogacy of PFS for OS. The simulated meta-analysis results when the median SPP was longer than 12 months are consistent with those of published meta-analyses in metastatic breast cancer. Specifically, Burzykowski et al. (14) analyzed individual level data from 11 trials of first-line treatment and reported a weak correlation coefficient of .48 between hazard ratios for PFS and OS. Hackshaw et al. (8) considered published reports of 42 randomized trials of first-line treatment with 5-fluorouracil, adriamycin, and cyclophosphamide or with 5-fluorouracil, epirubicin, and cyclophosphamide and reported a correlation coefficient of .71 between time to progression and OS. Sherrill et al. (6) considered published reports of 67 trials and reported a correlation coefficient of .54 between hazard ratios for PFS and hazard ratios for OS.

The simulated meta-analysis results when the median SPP was less than 6 months are consistent with those of meta-analyses in colon cancer. Buyse et al. (9) reported a correlation coefficient of .99 between OS and PFS in trials of advanced colorectal cancer. They concluded that PFS should be considered as a surrogate for OS in this disease. The median SPP of non–small cell lung cancer is intermediate between metastatic breast and advanced colon cancer. When Buyse et al. (15) compared docetaxel with vinca alkaloids in trials in non–small cell lung cancer, they found that hazard ratios of PFS and of OS were highly correlated (R = .85). This result is similar to our simulated meta-analysis result when median SPP is 9 months.

We did not specifically address whether PFS is a reasonable surrogate for OS. We focused instead on PFS and asked whether it was reasonable to expect that a treatment benefit in PFS carried over to OS. Our answer was “It depends.” For clinical trials with a PFS benefit, a lack of statistical significance in OS does not imply a lack of improvement in OS. For diseases with long median SPP, the variability in SPP so dilutes the OS comparison that statistical significance is likely lost. Thus, OS is a reasonable primary endpoint when median SPP is short, perhaps less than 6 months, but is too high a hurdle when median SPP is long, such as when median SPP is longer than 12 months.

As the clinician's armamentarium of salvage therapies grows and becomes more varied, SPP will get longer. When SPP gets sufficiently long, oncology researchers and regulators will have to drop OS as the primary endpoint in clinical trials.

Funding

There was no sponsor or funding agency.

Footnotes

Authors had full responsibility for design of the study, collection of the data, analysis and interpretation of the data, decision to submit the manuscript for publication, and the writing of the manuscript.

The authors thank Lee Ann Chastain for her editorial support.

References

1. Albain KS. Discussion presented at 2008 American Society of Clinical Oncology Annual Meeting. Chicago, IL: June 2008. Adding a new agent to the old in metastatic breast cancer: progress, promise, and challenges. http://www.asco.org//ASCO/Abstracts+%26+Virtual+Meeting/Virtual+Meeting?&vmview=vm_search_results_view&selectedConfs=55&SearchFilter=Speaker&fromView=vm_meeting_tracks_view&SearchTerm=albain. Accessed April 3, 2009.
2. Mayfield E. Progression-free survival: patient benefit or lower standard. Life Raft Group. 2007. http://www.liferaftgroup.org/news_sci_articles/pfs_benefit.html. Accessed March 23, 2009.
3. Allison M. Trouble at the office [abstract 23]. Nat Biotechnol. 2008;26(9):967–969. doi: 10.1038/nbt0908-967.
4. Chakravarty A, Sridhara R. Use of progression-free survival as a surrogate marker in oncology trials: some regulatory issues. Stat Methods Med Res. 2008;17(5):515–518. doi: 10.1177/0962280207081862.
5. Panageas KS, Ben-Porat L, Dickler MN, Chapman PB, Schrag D. When you look matters: the effect of assessment schedule on progression-free survival. J Natl Cancer Inst. 2007;99(6):428–432. doi: 10.1093/jnci/djk091.
6. Sherrill B, Amonkar MM, Wu Y, et al. Disease progression as a predictor of overall survival in metastatic breast cancer: a meta-analysis [abstract 564]. Eur J Cancer. 2008;6(7):216.
7. Burzykowski T, Buyse M, Yothers G, Sakamoto J, Sargent D. Exploring and validating surrogate endpoints in colorectal cancer. Lifetime Data Anal. 2008;14(1):54–64. doi: 10.1007/s10985-007-9079-4.
8. Hackshaw A, Knight A, Barrett-Lee P, Leonard R. Surrogate markers and survival in women receiving first-line combination anthracycline chemotherapy for advanced breast cancer. Br J Cancer. 2005;93(11):1215–1221. doi: 10.1038/sj.bjc.6602858.
9. Buyse M, Burzykowski T, Carroll K, et al. Progression-free survival is a surrogate for survival in advanced colorectal cancer. J Clin Oncol. 2007;25(33):5218–5224. doi: 10.1200/JCO.2007.11.8836.
10. Tang PA, Bentzen SM, Chen EX, Siu LL. Surrogate end points for median overall survival in metastatic colorectal cancer: literature-based analysis from 39 randomized controlled trials of first-line chemotherapy. J Clin Oncol. 2007;25(29):4562–4568. doi: 10.1200/JCO.2006.08.1935.
11. R Development Core Team. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing; 2006. http://cran.r-project.org/doc/manuals/refman.pdf. Accessed April 2, 2009.
12. Bowater RJ, Bridge LJ, Lilford RJ. The relationship between progression-free and post-progression survival in treating four types of metastatic cancer. Cancer Lett. 2008;262(1):48–53. doi: 10.1016/j.canlet.2007.11.032.
13. Berry D. Chicago, IL: Presented at 2008 American Society of Clinical Oncology Annual Meeting; June 2008. Discussion of “Prediction of survival benefits from progression-free survival in patients with advanced non-small cell cancer: evidence from a pooled analysis of 2,838 patients randomized in 7 trials,” by Marc E. Buyse.
14. Burzykowski T, Buyse M, Piccart-Gebhart MJ, et al. Evaluation of tumor response, disease control, progression-free survival, and time to progression as potential surrogate end points in metastatic breast cancer. J Clin Oncol. 2008;26(12):1987–1992. doi: 10.1200/JCO.2007.10.8407.
15. Buyse ME, Squifflet P, Laporte S, et al. Prediction of survival benefits from progression-free survival in patients with advanced non small cell lung cancer: evidence from a pooled analysis of 2,838 patients randomized in 7 trials. J Clin Oncol. 2008;26(15S):8019.

Articles from JNCI Journal of the National Cancer Institute are provided here courtesy of Oxford University Press
