Phase II designs for anticancer botanicals and supplements

Andrew J Vickers

Author manuscript; available in PMC: 2009 Sep 1.

Published in final edited form as: J Soc Integr Oncol. 2009 Winter;7(1):35–40.

PMCID: PMC2736093 NIHMSID: NIHMS121690 PMID: 19476733

Abstract

The purpose of a Phase II trial is to determine whether an anticancer agent is sufficiently promising to take forward to a definitive, randomized, Phase III study. Traditional Phase II trials use tumor response as an endpoint, defined as a 50% or greater decrease in tumor size. Anticancer botanicals and supplements are unlikely to bring about rapid tumor regression, even if they do extend survival. Accordingly, response needs to be defined in terms of survival, such as being progression-free at 6 months. Such an approach requires historical data on the expected survival rate in the absence of the botanical or supplement. We present a simple Phase II design for botanicals and supplements that is based on appropriate use of historical data, incorporating adjustment for both sampling variation and case mix. The basic principle is to use a historical cohort to generate a statistical prediction model, use this to predict results of patients in the Phase II study and then compare the predictions to the observed results. Such a design asks whether patients treated by the new agent are doing better than expected; if so, this suggests that the agent should be tested further in Phase III trials.

Keywords: research design, complementary medicine

Introduction

Phase III trials of anticancer agents are often described as “definitive” on the grounds that they can determine one way or another if an agent is of clinical value: patients are randomized to the current best standard-of-care or to an alternative regimen including the new agent and followed for many years; if survival or recurrence rates are superior in the experimental arm, the new agent can be considered for clinical use. A typical Phase III trial accrues thousands of patients at dozens of different hospitals and costs many millions of dollars to perform. As such, there is no possibility that Phase III trials could be conducted for all agents of possible benefit for cancer.

The purpose of a Phase II trial is to determine whether an agent is sufficiently promising to take forward to a large and expensive Phase III study. The basic idea is to conduct a small, preliminary trial to see if the new agent has better results than expected. If this is not the case, then there would be little justification for spending time, effort and money on further research.

In this didactic paper, I will review designs for Phase II trials, discuss the drawbacks of traditional designs for anticancer botanicals and supplements (hereafter “CAM agents”) and recommend a simple alternative.

Traditional Phase II design

Phase II designs were developed when the vast majority of novel agents were cytotoxics. A cytotoxic chemotherapy agent is only likely to have effects on patient survival or quality of life if it leads to tumor shrinkage. A typical Phase II design is therefore as follows:

1. Accrue patients with advanced disease, who have tumors visible on a scan, and are out of treatment options. Often patients with different types of cancer are accrued, as most cytotoxics are effective against a broad spectrum of tumors.
2. Treat a small number of patients, 25 to 60, taking scans before and after treatment.
3. Define patients as having a “response” if they experience a 50% or greater decrease in tumor size.
4. Calculate the proportion of responders. If the proportion is higher than a target prespecified during study design, then declare the agent worthy of further study in Phase III.

Perhaps the most important feature of a Phase II design is that there is a clear decision rule: the drug has to meet a certain target or it is not taken forward. Naturally, such a design can be extended in various ways. One of the most important is the use of an interim analysis designed to see if there is early evidence of an agent’s ineffectiveness; if so, the trial is stopped “for futility”. The rationale is that it is unethical to give patients an unproven, toxic agent if there are reasons to believe that it is not of benefit. An interim analysis might be implemented as follows:

1. Accrue 18 patients on trial.
2. If 2 or fewer patients respond, declare the agent “not promising” and stop the trial.
3. If at least 3 patients respond, accrue an additional 25 patients.
4. If 8 or more of the total 43 patients respond, declare the agent “worthy of further study”; if 7 or fewer patients respond, declare the agent “not promising”.

This type of design is generally described as a “Simon two-stage design” after Richard Simon, a statistician at the National Cancer Institute. Two-stage trials are subdivided into “optimal” or “minimax” designs, depending on how exactly the sample size is calculated; the above example is an “optimal” design.

To calculate the appropriate numbers for the design, trialists have to select an “interesting” and an “uninteresting” response rate, which correspond to the alternative and null hypothesis respectively. The most typical numbers used are 20% and 5% respectively: for one in five patients to experience a large reduction in tumor size is clearly a good thing, particularly for the sort of advanced patients with few treatment options typically accrued in Phase II; on the other hand, it is unlikely that patients would go through the side-effects of chemotherapy if they had only a 1 in 20 chance of a response, especially as a response does not always lead to an improvement in survival.
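The behavior of a two-stage rule such as the one described above can be checked directly from the binomial distribution. The following Stata lines are a minimal sketch, not part of the original appendix: they compute the probability of stopping early and the probability of declaring the agent worthy of further study for the 18-patient and 43-patient rule. The true response rates of 10% and 25% in the loop are illustrative assumptions, not values given in the text, and the scalar names are arbitrary.

```stata
* Sketch: operating characteristics of the two-stage rule
* (stop if 2 or fewer of 18 respond; declare worthy if 8 or more of 43 respond)
* The true response rates looped over are illustrative assumptions only
foreach p in 0.10 0.25 {
    * probability of stopping for futility after the first 18 patients
    scalar pet = binomial(18, 2, `p')
    * probability of declaring the agent worthy of further study
    scalar pgo = 0
    forvalues x1 = 3/18 {
        * responses still needed among the 25 second-stage patients
        local need = max(8 - `x1', 0)
        scalar pgo = pgo + binomialp(18, `x1', `p') * binomialtail(25, `need', `p')
    }
    display "true rate = `p': P(stop early) = " %5.3f pet "  P(declare worthy) = " %5.3f pgo
}
```

Run at the “uninteresting” rate, the second probability is the type I error of the design; run at the “interesting” rate, it is the power.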

Application of the traditional Phase II design to CAM agents

The obvious and immediate problem with using a Phase II design to test a CAM agent is that most such agents are unlikely to bring about rapid tumor regression, even if they do extend survival. Agents thought to have antiangiogenic properties, such as shark cartilage, may slow the spread of cancer but may not shrink existing tumors; agents purported to act via an immune stimulant mechanism, such as maitake, may not be effective in patients with tumors that are evaluable for response because such patients are normally late-stage patients and often have impaired immune function. Moreover, heightened immunity would not be expected to shrink away a tumor over a short course of treatment.

An alternative is to define response in terms of survival rather than tumor shrinkage. For example, instead of a responder being a patient with a 50% reduction in tumor size, a responder is defined by overall survival of at least a year. The “uninteresting” response rate for the null hypothesis might be obtained from the literature. For example, if a previous study reported that 30% of patients treated by standard therapy survived for one year, a Phase II design might use a 30% one-year survival as the null hypothesis and 50% as the survival rate deemed sufficiently promising to indicate a Phase III trial.

Using historical data to set the null hypothesis is becoming increasingly widespread for Phase II trials of conventional agents. This is for two reasons. First, targeted therapies are joining cytotoxic chemotherapy as the subject of Phase II trials. It is plausible that some of these agents may slow the growth of cancer – or prevent recurrence – without substantially reducing the size of a large tumor in a patient with advanced disease. Second, contemporary chemotherapy increasingly involves combinations of different agents. Imagine that an investigator wanted to know, for example, whether to conduct a Phase III trial comparing a taxane plus a platinum agent to the current standard of single agent platinum. A Phase II trial of taxane plus platinum with tumor response as an endpoint would only be interpretable if the response rate to platinum alone was known.

A systematic review of the use of historical data in Phase II trials

Last year, colleagues and I published a systematic review of Phase II trials in oncology that required historical data (see Further Reading). We first identified all Phase II trials published in one of two major oncology journals - Journal of Clinical Oncology or Cancer - between June 1, 2002, and June 1, 2005. We then used three objective criteria to determine whether each trial required historical data in order to determine the null. First, we included trials where the end point was survival, progression, or recurrence rate, on the grounds that expected event rates can only be assessed in the light of prior data. We also included trials where the null or alternative were explicitly justified with reference to historical data. Finally, we included trials where the specified null was a tumor response rate of more than 10%. This was on the grounds that it would be unusual to declare a treatment inactive for a response rate of 15% or more if the expected response rate in the absence of the investigational agent was close to zero, or if there were no standard therapies of demonstrated effectiveness. For all trials defined as requiring historical data, we also recorded the justification for the response rate chosen as the null hypothesis.

Of the 134 trials eligible for analysis, approximately half (70) were categorized as requiring historical data. This reemphasizes the point that use of historical data to set the null is not unique to CAM, but is common for conventional agents. Of these 70 papers, 32 (46%) gave absolutely no justification for their choice of null response rate, and only 9 of the remainder gave an explicit justification in the methods section. Particularly interesting was that only 3 of these 9 declared the study agent worthy of further study, compared to 46 of the 56 that did not give an explicit justification (5 studies had unclear results), a difference that was statistically significant.
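The comparison of 3 of 9 against 46 of 56 can be reproduced from the counts alone. The line below is a sketch using Fisher's exact test; the text does not say which test was actually used in the review, so the choice of test here is an assumption.

```stata
* Sketch: 2 x 2 table built from the counts in the text
* row 1: explicit justification given   (3 positive, 6 not positive)
* row 2: no explicit justification      (46 positive, 10 not positive)
* Fisher's exact test is an assumed choice, not necessarily the one in the review
tabi 3 6 \ 46 10, exact
```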

In sum, we found that many Phase II studies of conventional agents require historical data to set the null, and that most do not justify the null properly and are accordingly more likely to report positive results. Given these findings, we made a set of recommendations for Phase II trials requiring historical data. First, historical data should be cited and described in the methods. The description should include the type of study (phase II, phase III, cohort study), and details of any treatments given. Dates of accrual for the historical cohort should also be given, because, as treatment gradually improves over time, patients have a better prognosis. Second, a single estimate should be derived from the historical data: specifying only a range should be avoided, on the grounds that this offers no guidance as to the appropriate null. Third, the relationship between the null and the historical data should be detailed clearly. For example, in the case of a novel chemotherapy agent added to a single-agent therapy, the null might rationally be set close to or slightly higher than the historical response rate. Alternatively, for a less toxic or more convenient version of a standard single-agent therapy, it would be reasonable if the null was slightly lower than the historical response rate.

A simple prediction method for Phase II trials: general considerations

Even if the null is carefully justified and the historical cohort clearly described, two problems with historical data remain. First, the historical cohort is subject to sampling variation. For example, imagine that a prior study of 40 patients reported that 20 survived at least 12 months. Although an investigator might be tempted to set a null of 50%, the 95% confidence interval around the historical estimate is 34% to 66%. Now imagine that the true historical survival rate was 40%, which increased to 60% with the new agent. The investigator would compare 60% to a null of 50%, a difference that a small Phase II trial is unlikely to detect, and would declare the agent ineffective even though it was, in fact, highly beneficial.
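The width of the interval around a historical estimate of 20 survivors among 40 patients can be checked with Stata's immediate confidence interval command. This is a short sketch; the syntax differs between Stata versions, as noted in the comments.

```stata
* Sketch: exact binomial 95% confidence interval for 20 survivors among 40 patients
* Stata 14 or later:
cii proportions 40 20
* Stata 13 or earlier uses the older syntax instead:
* cii 40 20
```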

The second problem is one of case mix. The patients in a Phase II study might be a highly selected population with less advanced disease and fewer comorbidities than a historical cohort of “all comers”. This would lead to better survival in the Phase II study, irrespective of any effects of the agent.

Based on the pioneering work of statistical colleagues at Memorial Sloan-Kettering Cancer Center (see Further Reading), we have developed a simple design for Phase II trials that can account for both the problem of case mix and that of sampling variation. The basic principle is to use the historical cohort to generate a statistical prediction model, use this to predict results of patients in the Phase II study and then compare the predictions to the observed results. In short, we ask “Are patients treated by the new agent doing better than expected?”; if so, this suggests that the agent might be of benefit and should be tested further in a Phase III trial.

The first step is to obtain raw data from the historical cohort and to choose predictors that could be incorporated in a prediction model. In the case of localized prostate cancer, for example, these might be stage, grade and prostate-specific antigen; for breast cancer, tumor size, nodes, and hormone receptor status might be used. The historical cohort should be as large as possible, but should be chosen to be reasonably representative of the likely patients in the Phase II: a historical cohort consisting of all patients at a certain disease stage is obviously inappropriate if the Phase II will specify that only patients with good performance status will be accrued. Note that the methodology is contingent on good historical data: if no historical cohort is available, or the cohort is very small, or data on predictors are missing, our proposed methodology cannot be used.

The second step is to choose a binary endpoint, such as overall survival at one year, progression-free survival at 6 months, or recurrence within 18 months. The following criteria can be used as a guide: the treatment has the potential to affect the endpoint; there must be a reasonable number of events (e.g. if overall survival at one year is 98%, it will be difficult to show an improvement with treatment); the length of time should not be too long (a trial examining 5 year survival cannot take less than 5 years to complete); few patients should be censored in the historical cohort before the follow-up time. If progression is used as an endpoint, this should be formally defined in binary terms. This is because progression is typically evaluated by a scan or a blood test; a patient who schedules a clinic visit for 6 months and 1 day after entry onto the trial, and who is found to have experienced tumor growth, will nonetheless be defined as progression-free at six months by Kaplan-Meier methods. Hence progression-free survival should be defined in terms such as the following: “Response is defined as progression-free survival at 6 months. A responder is defined as anyone who meets any of the following criteria: a negative scan at 7 months or later; a negative scan at five months or later AND no positive scan before 7 months. A failure is defined as any positive scan before seven months. Patients with no negative scan after five months and no positive scan before seven months are excluded from analysis.”
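A definition of this kind translates directly into a coding rule. The Stata lines below are a sketch on a made-up five-patient data set: the variable names last_neg (month of the latest negative scan) and first_pos (month of the first positive scan, missing if none) are hypothetical, and the sketch assumes that no negative scan is recorded after a positive one.

```stata
* Sketch: coding progression-free survival at 6 months as a binary endpoint
* Data and variable names are hypothetical, for illustration only
clear
input id last_neg first_pos
1 8   .
2 5.5 .
3 3   6
4 4   .
5 6   9
end

generate byte pfs6 = .
* failure: any positive scan before seven months
replace pfs6 = 0 if first_pos < 7
* responder: negative scan at five months or later and no positive scan before seven months
* (this also covers patients with a negative scan at seven months or later)
replace pfs6 = 1 if last_neg >= 5 & !missing(last_neg) & !(first_pos < 7)
* patients still missing pfs6 meet neither definition and are excluded from analysis
list
drop if missing(pfs6)
```

In this example patient 3 is a failure, patient 4 is excluded, and the remaining patients are responders.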

The third step is to conduct the Phase II trial. A fixed number of patients are accrued (see below for sample size considerations), treated with the novel treatment regimen, and followed to the endpoint. Analysis is conducted and the prespecified decision rule applied. This might be, for example, “Progression-free survival is higher than expected on the new treatment, and a one-sided 95% confidence interval excludes no benefit”.

Statistical considerations

The method of statistical analysis involves a technique known as bootstrapping. This works as follows. Imagine that we had a data set for 80 patients undergoing treatment for colorectal cancer, including data on whether the patients had a tumor response (yes or no). We then create a new, artificial group of 80 patients by randomly sampling from this data set. Each patient in the original data set can be sampled once, more than once, or not at all (because a patient can be sampled more than once, this is known as “bootstrap resampling with replacement”). The response rate of the new set is likely to be close, although perhaps not identical, to that of the original sample. This new “bootstrap sample” gives us an example of what our results might have been had we repeated the study. If we repeat the bootstrap sample many times (2,000 – 10,000 is typical) we get a range of possible study results. These can be analyzed to conclude, for example, that although we saw a response rate of 20% in the original data set, “if we had run the study a large number of times, 95% of the time the response rate would have been between 12% and 30%”. As it turns out, we do not have to run a bootstrap to obtain these numbers because there is a simple formula for the confidence interval around a proportion. A bootstrap is helpful when the formula for a particular statistic is difficult to specify.
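The 80-patient example can be sketched in a few lines of Stata using the built-in bootstrap prefix. The data set is constructed artificially here (16 responders among 80 patients, that is, 20%), and the variable name and seed are arbitrary assumptions.

```stata
* Sketch: bootstrapping a response rate
* Hypothetical data set: 16 responders among 80 patients (20%)
clear
set obs 80
generate byte response = (_n <= 16)

* resample the 80 patients with replacement 2,000 times,
* recording the response rate in each bootstrap sample
bootstrap rate = r(mean), reps(2000) seed(12345): summarize response

* percentile-based 95% confidence interval from the bootstrap samples
estat bootstrap, percentile

* formula-based (exact binomial) interval for comparison (Stata 14 or later)
cii proportions 80 16
```

The two intervals should be similar, which is the point made in the text: for a simple proportion the formula suffices, and the bootstrap earns its keep only for statistics without a convenient formula.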

The step-by-step method for the statistical analysis of our proposed design is as follows:

1. Bootstrap resample with replacement from the historical cohort.
2. Generate a regression model on the bootstrap sample with the endpoint (e.g. progression-free survival at 6 months) as the dependent variable and the prespecified predictors (e.g. stage and grade of cancer) as independent variables; it is also a good idea to include date as a predictor on the grounds that the outcome of many cancers is gradually improving over time.
3. Bootstrap resample with replacement from the Phase II data set.
4. Use the regression model generated in step (2) to calculate the predicted outcome for the bootstrapped set of patients created in step (3).
5. Calculate the mean difference between the actual and predicted outcome for patients in the Phase II trial. Formally, we define S as the estimated change in outcome attributable to the new agent, Y as the actual outcome and $\hat{p}$ as the predicted outcome. For the n patients in the Phase II sample, indexed by i, we then compute:
$S = \frac{1}{n} \sum_{i = 1}^{n} \left( Y_{i} - \hat{p}_{i} \right)$
6. Repeat steps (1) to (5) 10,000 times, recording the value for S, the mean difference between actual and predicted outcome.
7. Calculate statistics as follows:
   1. The estimate for the change in outcome attributable to the Phase II treatment is the mean value of S over all iterations.
   2. The standard error of S, denoted by SE (S), is given by the standard deviation of S over all iterations.
   3. The 95% confidence interval is given by S ± 1.96 × SE (S); the lower bound of a one-sided 95% confidence interval is given by S − 1.64 × SE (S).
   4. The test statistic S / SE (S) can be related to a standard normal distribution to obtain the P value associated with the null hypothesis of no change in outcome.
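The summary statistics listed above can be written out as follows. The notation $S_b$ for the value of S in bootstrap iteration b and $B$ for the number of iterations (10,000 in the text) is introduced here for illustration, as are the numbers in the worked example; neither comes from the original paper.

```latex
\bar{S} = \frac{1}{B} \sum_{b=1}^{B} S_b ,
\qquad
SE(S) = \sqrt{ \frac{1}{B-1} \sum_{b=1}^{B} \left( S_b - \bar{S} \right)^2 } ,
\qquad
z = \frac{\bar{S}}{SE(S)}

% Worked example with hypothetical values \bar{S} = 0.15 and SE(S) = 0.08:
% two-sided 95% CI:        0.15 \pm 1.96 \times 0.08 = (-0.007,\ 0.307)
% one-sided lower bound:   0.15 - 1.64 \times 0.08 = 0.019
% test statistic:          z = 0.15 / 0.08 = 1.875
```

With these hypothetical values the one-sided 95% lower bound is above zero, so a decision rule of the form “a one-sided 95% confidence interval excludes no benefit” would be met, even though the two-sided interval includes zero.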

Some example code for this analysis is given in the appendix.

Power considerations

Power for a given sample size is best calculated using simulation methods (see appendix). Power is affected importantly by the sample size of the historical cohort and by how well outcome can be predicted. For example, take a trial with a historical cohort of 250 that required 30 patients for 80% power. Now imagine that only a single previous Phase II trial was available (n=50); the sample size requirements would nearly double. If predictors were available that were associated with an area-under-the-curve of 0.75, sample size requirements would be reduced by 10 – 15%.

Assuming that the historical cohort is relatively large (at least 5 times greater than the projected Phase II sample size) and the predictors are at least moderately predictive, a general rule of thumb is that the Phase II design will require a similar number of patients to a one-sample comparison of proportions.
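As a starting point for the rule of thumb, the sample size for a one-sample comparison of proportions can be obtained from Stata's built-in power commands. The sketch below uses the illustrative rates that appear elsewhere in this paper (a 30% historical rate and a 50% target rate) with 80% power and a one-sided alpha of 5%; these inputs are assumptions to be replaced by the values for a given trial.

```stata
* Sketch: sample size for a one-sample comparison of proportions
* null (historical) rate 30%, target rate 50%, 80% power, one-sided alpha 5%
* Stata 13 or later:
power oneproportion 0.3 0.5, power(0.8) alpha(0.05) onesided
* Stata 12 or earlier uses sampsi instead:
* sampsi 0.3 0.5, onesample power(0.8) alpha(0.05) onesided
```

The result is only a first approximation; the simulation approach in the appendix remains the recommended way to estimate power for the proposed design.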

Conclusions

We present a simple Phase II design for CAM agents that is based on appropriate use of historical data, incorporating adjustment for both sampling variation and case mix. The design is methodologically straightforward and involves only simple (though computationally intensive) statistics.

Appendix

1. Stata code for statistical analysis

* create a program that is one iteration
capture program drop ITERATION
program ITERATION, rclass


  version 9.0


  drop _all
  *load up historical cohort
  use "historical cohort.dta"
  * get bootstrapped sample
  bsample
  * create the model on the bootstrap sample
  * dependent variable is progression free survival at 6 months
  * predictors are treatment date (continuous), stage and grade
  * stage is one of 3 categories, therefore there are 2 dummy variables
  logit pfs6 date stage1 stage2 higrade


  *load up Phase II data
  use "phase ii cohort.dta", clear
  * get bootstrapped sample
  bsample
  *use the model created above to predict outcome in the Phase II cohort
  predict phat
  * work out difference between predicted and actual results
  *s is the effect size
  g s = pfs6 - phat
  * calculate the mean
  sum s
  * save the results
  return scalar s = r(mean)
end


** set seed so that you can replicate results
set seed 0510071709


** run the bootstrap and analysis 10,000 times
set more off
simulate s=r(s), reps(10000): ITERATION
set more on


*what is the effect size
sum s


*what is the 95% C.I.?
centile s, c(2.5 97.5)


*what is the p value?
sum s
local z=r(mean)/r(sd)
disp "z=`z'; p=" (1-normal(abs(`z')))*2

2. Stata code for power calculation

* create a program that creates a single Phase II cohort
capture program drop ONETRIAL
program ONETRIAL, rclass
  drop _all
  *load up historical cohort
  use "historical cohort.dta"
  * create the model
  * dependent variable is progression free survival at 6 months
  * predictors are treatment date (continuous), stage and grade
  logit pfs6 date stage1 stage2 higrade
  * create a new cohort of patients, the Phase II cohort
  * specify the sample size you want to estimate power for (e.g. 30)
  bsample 30
  *apply the statistical model, get linear predictor
  predict linear, xb
  *apply a Bayes factor to improve rates in Phase II cohort
  *first, put in the null from historical data.
  * e.g. say 6 month progression free survival was 30% in historical cohort
  scalar original=0.3
  *Put in the target survival rate for the new therapy
  *e.g. assume that progression free survival improved to 50% in phase II
  scalar phase2=0.5
  *create predicted probability, adding the Bayes factor
  replace linear=linear+log((phase2/(1-phase2))/(original/(1-original)))
  g phat=exp(linear)/(1+exp(linear))
  *create random number to determine event status
  * this is drawn from a 0,1 uniform distribution
  g u=uniform()
  * simulate whether patient had event
  replace pfs6=u<phat
end


*assume that you have a program called ANALYZE
* This program runs the bootstrap analysis
* and returns 1 if results are better on Phase II trial & p<5%, onesided
capture program drop POWER
program POWER, rclass
  *create a trial
  ONETRIAL
  *now analyze it
  ANALYZE
  *now save out the result
  return scalar result = $result
end


** set seed so that you can replicate results
set seed 0510071709


** run the bootstrap and analysis 2,000 times
set more off
simulate result=r(result), reps(2000): POWER


*calculate Power
sum result

Footnotes

Dr Vickers’ work on this research was funded by R21 CA103169-02 from the National Cancer Institute.
