Author manuscript; available in PMC: 2008 Dec 29.
Published in final edited form as: J Natl Cancer Inst. 2007 Oct 30;99(21):1577–1582. doi: 10.1093/jnci/djm185

Statistical Methods for Analyzing Sequentially Randomized Trials

Oliver Bembom 1, Mark J van der Laan 1
PMCID: PMC2610531  NIHMSID: NIHMS51225  PMID: 17971533

Abstract

In this issue of the Journal, Thall et al. present the results of a clinical trial that makes use of sequential randomization, a novel trial design that allows the investigator to study adaptive treatment strategies. Our aim is to complement this groundbreaking work by reviewing the current state of the art of statistical methods available for such analyses. Using the data collected by Thall et al. as an example, we focus on two different approaches for estimating the success rates of different adaptive treatment strategies of interest. By emphasizing the intuitive appeal and straightforward implementation of these methods and illustrating the striking findings to which these methods can lead, we hope to convince the reader that this novel trial design provides a rich source of information that is made readily accessible through current analytical approaches.


In this issue of the Journal, Thall et al. (1) present an analysis of a novel clinical trial in oncology that makes use of sequential randomization to one of four treatment regimens. During this trial, prostate cancer patients who were found to be responding poorly to their initially assigned regimen were randomly reassigned to one of the remaining candidate regimens. In contrast to conventional trials that are based on a single randomization, this design allows the investigator to study adaptive treatment strategies that adjust a patient’s treatment in response to the observed course of the illness. Such adaptive strategies, also referred to as dynamic or individualized treatment rules, form the basis of common medical practice in cancer chemotherapy, with physicians typically facing the following questions: Which regimen should be used to initially treat a patient? Which regimen should the patient be switched to if the first-line regimen fails to control the cancer? Given an observed intermediate outcome, such as a change in tumor size or prostate-specific antigen level, which threshold should be used to decide that the current regimen is failing?

In recent years, sequentially randomized trials have been recognized as being uniquely suited to the study of these exciting questions (2–5), with researchers in other clinical areas also beginning to implement this design (6–8). Our aim in this commentary is to complement the groundbreaking work by Thall et al. (1) by reviewing a number of alternative statistical methods that are currently available for studying dynamic treatment rules on the basis of sequentially randomized trials. By emphasizing the intuitive appeal and straightforward implementation of these methods and by illustrating the striking findings to which these methods can lead, we hope to convince the reader that such trials provide a rich source of information that is made readily accessible through current analytical approaches.

The data collected by Thall et al. (1) consist of a series of treatment assignments A̅ = (A1, A2,…) along with corresponding intermediate outcomes S̅ = (S1, S2,…) that indicate whether a given treatment course j was considered a success (Sj = 1) or a failure (Sj = 0). The four candidate drug regimens were cyclophosphamide, vincristine, and dexamethasone (CVD); ketoconazole plus doxorubicin alternating with vinblastine plus estramustine (KA/VE); paclitaxel, estramustine, and carboplatin (TEC); and paclitaxel, estramustine, and etoposide (TEE). After the initial randomization to one of these four regimens, patients were kept on their assigned regimen if they showed a favorable response to the initial 8-week treatment but were randomly assigned to one of the remaining three treatment options otherwise. The overall outcome Y for a patient was defined as a success (Y = 1) if a regimen yielded two consecutive successful responses and as an overall failure (Y = 0) if two unsuccessful responses accumulated.

Thall et al. (1) based their analysis on a logistic regression model that aims to explain the probability of a favorable intermediate outcome Sj as a function of the current regimen Aj as well as two summary measures of the patient’s observed treatment and response history. This approach, which is aimed at estimating the causal effect of the current regimen Aj on the probability of a successful response Sj, can be used to identify a dynamic treatment rule that at each time point j selects the treatment option that is estimated to give the highest probability of a successful response Sj at that time point. One can imagine scenarios, however, in which this strategy does not also maximize the probability of achieving a successful overall outcome Y. A particular regimen might, for example, work very well in the first-line setting but have no appealing salvage options, so that a slightly worse first-line regimen with better salvage options might in fact lead to a higher probability of an overall success. For this reason, we focus on statistical methods aimed at comparing candidate treatment strategies directly on the basis of their overall success rate.

Before using the data collected by Thall et al. (1) to illustrate two different approaches for estimating such success rates, we note that these analyses are based only on those patients who completed the trial. In accord with the approach taken by Thall et al. (1), we thus treat patient dropout as noninformative. Subsequently, we discuss a number of ways in which our analyses could be modified to adjust for this potential source of bias. The available data for those patients who completed the trial are summarized in Table 1. Note that the probability of a given first-line regimen leading to an overall success in the first two treatment rounds can be estimated in a straightforward manner based on the observed proportion of such successes among the treated patients. Of the 26 patients initially assigned to CVD, for instance, four had two consecutive positive responses, leading to an estimated probability of 0.15 (4/26 = 0.15) of CVD yielding a first-line success. Likewise, we can use an empirical proportion to estimate the probability that a given salvage regimen will lead to an overall success after a particular first-line regimen has failed in a patient. For instance, one of the six patients for whom CVD failed to produce a successful response and who were then randomly assigned to TEC achieved an overall success on TEC. This result leads to an estimated salvage rate of 0.17 (1/6 = 0.17) for TEC given after failure on CVD. These estimated first-line and salvage success rates can be used in a straightforward manner to obtain an estimate of the overall success rate of the dynamic rule d(CVD, TEC), which initially assigns patients to CVD until the treatment fails to produce a positive response, at which point the patients are switched to TEC. By this rule, 15% of patients would be expected to achieve an overall success with CVD, and the remaining 85% would be given TEC as a salvage regimen. One-sixth of these patients, corresponding to 14% of the original cohort, would achieve an overall success on TEC, leading to an estimated overall success rate for d(CVD, TEC) of 0.29 [0.15 + (1.00 − 0.15) × 0.17 = 0.29]. In the causal inference literature, this approach for estimating an overall success rate by first estimating the distribution of observed intermediate outcomes is referred to as the G-computation algorithm (9,10). Its application to the analysis of sequentially randomized trials has been described in more detail by Thall et al. (2) and Lavori et al. (4). Lavori et al. (3) also describe a very similar methodology that relies on imputation. Confidence intervals for estimates obtained in this manner are generally based on the bootstrap method (11).
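To make the G-computation arithmetic concrete, the following minimal sketch (in Python; the function name and argument layout are our own illustration, not part of the original analysis) reproduces the estimate for d(CVD, TEC) from the counts in Table 1:

```python
def g_computation(n_first, s_first, n_salvage, s_salvage):
    """G-computation estimate of the overall success rate of a rule d(f, s).

    n_first, s_first: patients initially assigned to f and the number of
        overall successes achieved on f alone.
    n_salvage, s_salvage: patients assigned to s after f failed and the
        number of overall successes achieved on s.
    """
    p_first = s_first / n_first        # estimated first-line success rate of f
    p_salvage = s_salvage / n_salvage  # estimated salvage success rate of s after f
    # A patient either succeeds on f or, with probability 1 - p_first, moves
    # on to s, where he succeeds with probability p_salvage.
    return p_first + (1 - p_first) * p_salvage

# Example: d(CVD, TEC), with 4/26 first-line and 1/6 salvage successes.
print(round(g_computation(26, 4, 6, 1), 2))  # 0.29, matching the text
```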

Table 1.

Estimation of overall success rates by G-computation*

| First-line regimen | No. | S | P | Salvage regimen | No. | S | P | Estimated overall success rate, P (95% CI) |
|---|---|---|---|---|---|---|---|---|
| CVD | 26 | 4 | 0.15 | KA/VE | 10 | 5 | 0.50 | 0.58 (0.28 to 0.86) |
| | | | | TEC | 6 | 1 | 0.17 | 0.29 (0.06 to 0.63) |
| | | | | TEE | 6 | 0 | 0.00 | 0.15 (0.04 to 0.31) |
| KA/VE | 28 | 7 | 0.25 | CVD | 7 | 0 | 0.00 | 0.25 (0.10 to 0.42) |
| | | | | TEC | 8 | 0 | 0.00 | 0.25 (0.10 to 0.42) |
| | | | | TEE | 6 | 0 | 0.00 | 0.25 (0.10 to 0.42) |
| TEC | 30 | 14 | 0.47 | CVD | 5 | 1 | 0.20 | 0.57 (0.33 to 0.85) |
| | | | | KA/VE | 4 | 0 | 0.00 | 0.47 (0.28 to 0.65) |
| | | | | TEE | 7 | 0 | 0.00 | 0.47 (0.28 to 0.65) |
| TEE | 24 | 10 | 0.42 | CVD | 4 | 1 | 0.25 | 0.56 (0.28 to 1.00) |
| | | | | KA/VE | 4 | 0 | 0.00 | 0.42 (0.22 to 0.61) |
| | | | | TEC | 6 | 1 | 0.17 | 0.51 (0.28 to 0.78) |
* The number of subjects (No.), number of overall successes (S), and proportion of overall successes (P) for both first-line and salvage regimens are summarized. G-computation estimates of the probability of an overall success for each of the 12 candidate treatment rules along with 95% bootstrap confidence intervals (CIs) are shown. The four candidate drug regimens used were cyclophosphamide, vincristine, and dexamethasone (CVD); ketoconazole plus doxorubicin alternating with vinblastine plus estramustine (KA/VE); paclitaxel, estramustine, and carboplatin (TEC); and paclitaxel, estramustine, and etoposide (TEE). The data used in this analysis were obtained directly from Thall et al. (1).

Although the G-computation approach is straightforward in the example considered above, it is somewhat limited in that it does not generalize easily to more complicated examples. The problems arising in such situations stem from the fact that G-computation typically requires an estimate of the distribution of each intermediate outcome Sj, given each possible treatment and response history up to that time point. In the example described above, we made use of a slightly simplified data structure, as summarized in Table 1, by focusing only on the overall outcome Y rather than all intermediate outcomes Sj. In the context of the resulting data structure, it is straightforward to estimate the distribution of Y, given each possible treatment and response history, because we only need to estimate the success rates of the four different regimens in the first-line setting as well as the 12 different salvage probabilities for patients for whom a given first-line regimen failed. As described above, such estimates can easily be obtained on the basis of empirical proportions. In the context of the original data structure containing up to four time points, however, estimates of the probability of a successful response Sj would not have been available for some treatment and response histories at the later time points, simply because no patients in the dataset had such histories. For example, the dataset at hand contains no patients who were originally assigned to KA/VE, then were reassigned to CVD at the third time point, and responded well to the initial round of treatment with CVD. We thus cannot use an empirical proportion to estimate the probability of a successful response S4 to CVD among patients with such a history. In such situations, the G-computation method has to rely on simplifying assumptions that might, for example, posit that the probability of a successful intermediate outcome Sj is only influenced by the current treatment assignment rather than the entire history of treatment assignments. A similar problem occurs if the intermediate outcomes Sj are continuous. Although it is often possible to reduce the intermediate outcome to a binary or at least categorical variable, we will describe below an important example in which this reduction is not possible. In such situations, estimates of the distribution of the intermediate outcome Sj always have to be based on simplifying assumptions that might, for example, state that Sj is normally distributed, with its mean depending only on the current treatment Aj and, in a linear fashion, on the previously recorded outcome Sj−1. If the estimates of the distribution of Sj, given the observed history, are not valid because the simplifying assumptions do not hold, the resulting G-computation estimate of the overall success rate for a given dynamic rule d is also likely to be biased.

Murphy et al. (12) proposed an alternative methodology for analyzing dynamic treatment rules that, in the setting of randomized trials, yields valid estimates without relying on such simplifying assumptions, both in simpler cases such as the one considered in this commentary and in more complex scenarios with continuous intermediate outcomes or a large number of time points or treatment options. Their methodology relies on the idea of inverse-probability-of-treatment weighting, which was first introduced by Robins and Rotnitzky (13) and Robins (14). To illustrate this approach, we use an alternative summary of the observed data as shown in Table 2. The basic idea of inverse-probability-of-treatment weighting consists of identifying all patients whose observed treatment history is compatible with the dynamic rule under consideration, weighting each of these patients by the inverse of the probability of having been assigned to his particular treatment history, and then simply taking the average outcome in this reweighted sample as an estimate of the corresponding overall success rate. In our example, the first step identifies two different groups of patients for each candidate rule: those who achieved an overall success on the first-line regimen and those who had to move on to the appropriate salvage regimen. Of the 14 patients identified for the rule d(TEE, CVD), for example, 10 patients achieved an overall success with TEE, one patient went on to achieve an overall success with CVD, and both regimens failed for the remaining three patients. Note that the treatment history of the first group of patients is also compatible with the two rules d(TEE, KA/VE) and d(TEE, TEC). The reweighting step is intended to create a sample of patients that is representative of a randomized trial in which all patients were assigned to follow the dynamic rule under consideration. In our example, patients are initially assigned to TEE with a probability of 0.25 and then are reassigned to CVD, provided that TEE failed to produce a positive response, with probability 0.33. We thus obtain weights of 4.00 for patients who experienced an overall success on their first-line regimen and 12.00 for patients who had to be assigned to a salvage regimen. Upweighting the latter group of patients by a factor of 3 relative to the former group is necessary because although a randomized trial that is based on the single rule d(TEE, CVD) would assign all patients for whom TEE failed to produce a positive response to the salvage regimen CVD, the actual trial assigned only about one-third of these patients to CVD, with the remaining two-thirds being assigned to KA/VE or TEC. The observed group of 14 patients whose treatment history was compatible with the rule d(TEE, CVD) thus contains too few patients for whom TEE failed to produce a positive response and who had to be assigned to CVD relative to those who achieved an overall success on TEE.
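This reweighting scheme can be written down compactly. The sketch below (our own construction; the record layout, function name, and default probabilities are assumptions for illustration) computes the inverse-probability-of-treatment–weighted success rate of a rule d(f, s), using the known randomization probabilities of 1/4 for the initial assignment and 1/3 for the salvage assignment:

```python
def iptw_success_rate(records, f, s, p_first=0.25, p_salvage=1 / 3):
    """IPTW estimate of the overall success rate of the rule d(f, s).

    records: one tuple per patient, (first_regimen, salvage_regimen, success),
        where salvage_regimen is None for patients who achieved an overall
        success on their first-line regimen and success is the overall outcome Y.
    """
    weighted_successes = total_weight = 0.0
    for first, salvage, success in records:
        if first != f:
            continue                              # wrong first-line regimen
        if salvage is None:
            weight = 1.0 / p_first                # compatible with any salvage choice
        elif salvage == s:
            weight = 1.0 / (p_first * p_salvage)  # reassigned, and reassigned to s
        else:
            continue                              # reassigned to a different regimen
        weighted_successes += weight * success
        total_weight += weight
    return weighted_successes / total_weight
```

For d(TEE, CVD), this recovers the weights of 4.00 and 12.00 discussed above.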

Table 2.

Estimation of overall success rates by inverse-probability-of-treatment weighting*

| Rule | No. | S1 | S2 | F | Reweighted S1 | Reweighted S2 | Reweighted F | Estimated overall success rate, P (95% CI) |
|---|---|---|---|---|---|---|---|---|
| d(CVD, KA/VE) | 14 | 4 | 5 | 5 | 2.2 | 5.9 | 5.9 | 0.58 (0.28 to 0.86) |
| d(CVD, TEC) | 10 | 4 | 1 | 5 | 1.5 | 1.4 | 7.1 | 0.29 (0.06 to 0.63) |
| d(CVD, TEE) | 10 | 4 | 0 | 6 | 1.5 | 0.0 | 8.5 | 0.15 (0.04 to 0.31) |
| d(KA/VE, CVD) | 14 | 7 | 0 | 7 | 3.5 | 0.0 | 10.5 | 0.25 (0.10 to 0.42) |
| d(KA/VE, TEC) | 15 | 7 | 0 | 8 | 3.8 | 0.0 | 11.2 | 0.25 (0.10 to 0.42) |
| d(KA/VE, TEE) | 13 | 7 | 0 | 6 | 3.2 | 0.0 | 9.8 | 0.25 (0.10 to 0.42) |
| d(TEC, CVD) | 19 | 14 | 1 | 4 | 8.9 | 2.0 | 8.1 | 0.57 (0.33 to 0.85) |
| d(TEC, KA/VE) | 18 | 14 | 0 | 4 | 8.4 | 0.0 | 9.6 | 0.47 (0.28 to 0.65) |
| d(TEC, TEE) | 21 | 14 | 0 | 7 | 9.8 | 0.0 | 11.2 | 0.47 (0.28 to 0.65) |
| d(TEE, CVD) | 14 | 10 | 1 | 3 | 5.8 | 2.0 | 6.1 | 0.56 (0.28 to 1.00) |
| d(TEE, KA/VE) | 14 | 10 | 0 | 4 | 5.8 | 0.0 | 8.2 | 0.42 (0.22 to 0.61) |
| d(TEE, TEC) | 16 | 10 | 1 | 5 | 6.7 | 1.6 | 7.8 | 0.51 (0.28 to 0.78) |
* The number of subjects (No.) whose treatment history is compatible with a given dynamic rule and the number of overall failures (F) as well as successes on the first-line regimen (S1) and the salvage regimen (S2) for that rule are shown. In addition to the original counts, the table shows them reweighted by the inverse of the estimated probability of a given patient being assigned to his observed treatment history. The inverse-probability-of-treatment–weighted estimates of the overall success rates of the different dynamic rules along with 95% bootstrap confidence intervals (CIs) are also shown. The dynamic rule d(f, s) assigns a patient to the first-line regimen f until the treatment fails to produce a positive response, at which point he is switched to the salvage regimen s. The four candidate drug regimens are cyclophosphamide, vincristine, and dexamethasone (CVD); ketoconazole plus doxorubicin alternating with vinblastine plus estramustine (KA/VE); paclitaxel, estramustine, and carboplatin (TEC); and paclitaxel, estramustine, and etoposide (TEE). The data used in this analysis were obtained directly from Thall et al. (1).

Perhaps somewhat counterintuitively, the performance of the inverse-probability-of-treatment weighting approach can generally be improved by ignoring the known randomization probabilities and instead estimating them through empirical proportions (15). For example, we would estimate the probability of being initially assigned to TEE as 0.22 (24/108 = 0.22) rather than 0.25; similarly, we would estimate the probability of being assigned to KA/VE after failure on TEC as 0.25 (4/16 = 0.25) rather than 0.33. Such empirical estimates are particularly useful in complete-case analyses, such as the one presented in this commentary, in which patient dropout can seriously affect the balance of treatment assignments. Table 2 also shows the reweighted observations that are compatible with each dynamic rule, normalized to the observed sample sizes. The overall success rate for a given rule can then be estimated by the observed success rate in the corresponding reweighted sample. For the rule d(CVD, TEE), for instance, the reweighted sample contains 1.5 patients who experienced an overall success on CVD and 8.5 patients for whom both regimens failed, leading to an estimated overall success rate for this rule of 0.15 (1.5/10 = 0.15). As above, confidence intervals for such estimates can be obtained with the bootstrap method. An inverse-probability-of-treatment–weighted estimate of an overall success rate is valid as long as the treatment assignment probabilities used to reweight observations are correctly estimated. In sequentially randomized trials, these probabilities are known a priori and can alternatively be easily estimated by empirical proportions, so that the inverse-probability-of-treatment weighting approach, unlike the G-computation algorithm, is guaranteed to provide valid estimates in the absence of any additional assumptions.
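As an illustration of these empirical weights, the estimate of 0.15 for d(CVD, TEE) can be reproduced directly from the counts in Table 1 (a sketch with our own variable names; the probabilities are the empirical proportions described above):

```python
# Empirical assignment probabilities from the observed counts in Table 1.
p_first = 26 / 108   # 26 of the 108 completers were initially assigned to CVD
p_salvage = 6 / 22   # 6 of the 22 patients for whom CVD failed went on to TEE

w_first = 1 / p_first                  # weight for first-line successes on CVD
w_salvage = 1 / (p_first * p_salvage)  # weight for patients reassigned to TEE

# Four patients succeeded on CVD; none of the six reassigned to TEE succeeded.
successes = 4 * w_first + 0 * w_salvage
total = 4 * w_first + 6 * w_salvage
print(round(successes / total, 2))  # 0.15, matching Table 2
```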

When Table 1 is compared with Table 2, we note that the two estimation approaches lead to identical estimates in the situation that we considered. This pattern is generally found if the G-computation estimates are not based on simplifying assumptions and if the inverse-probability-of-treatment–weighted estimates make use of empirical estimates of the randomization probabilities. Otherwise, these two approaches may provide slightly different results. Table 1 shows clearly that TEC and TEE had the highest initial response rates and that CVD was the worst first-line regimen by this criterion. The same table illustrates that salvage regimens generally offer very low response rates, so that the particular salvage regimen that is assigned after a given first-line regimen has failed may not be particularly important. The one exception to this observation may be the choice of KA/VE as a salvage regimen after a CVD failure, for which the salvage success rate is estimated at 50%, twice the 25% success rate of KA/VE in the first-line setting. At the same time, there is no evidence of salvage activity for KA/VE when it was given after TEC or TEE failed. In general, one might expect that poor first-line regimens would allow for higher salvage success rates because the pool of patients entering salvage therapy would tend to be healthier on the whole. After an initial treatment with CVD, however, the other two salvage regimens, TEC and TEE, appeared to have little salvage activity. Although the numbers involved in these comparisons are small, these observations do seem to hint at an interaction between the two regimens CVD and KA/VE. The three rules with the highest estimated success rates were d(CVD, KA/VE), d(TEC, CVD), and d(TEE, CVD). TEC and TEE as first-line regimens followed by other choices of salvage regimens were also estimated to lead to high success rates. The rule d(CVD, TEE), however, was estimated to offer the lowest success rate, followed by rules that involved KA/VE as the first-line regimen. The bootstrap approach can also be used to obtain confidence intervals for comparisons of two overall success rates. Such an analysis showed that the rule d(CVD, KA/VE) as well as those rules starting with TEC or TEE were estimated to provide statistically significantly higher success rates than the worst rule, d(CVD, TEE) (data not shown). In addition, the rule d(TEC, CVD) was estimated to be statistically significantly better than rules starting with KA/VE. The remaining comparisons were not statistically significant at the usual two-sided 0.05 level.
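Such comparisons can be mimicked with a percentile bootstrap over patients; the following sketch (our own construction, reusing the hypothetical iptw_success_rate function from above) returns a confidence interval for the difference between the success rates of two rules:

```python
import random

def bootstrap_diff_ci(records, rule_a, rule_b, n_boot=2000, alpha=0.05):
    """Percentile bootstrap CI for the difference in IPTW success rates of
    rule_a = (f, s) and rule_b = (f', s'). Note: with samples this small, a
    resample may contain no patients compatible with a rule; a practical
    implementation would guard against the resulting division by zero."""
    diffs = []
    for _ in range(n_boot):
        resample = random.choices(records, k=len(records))  # resample patients
        diffs.append(iptw_success_rate(resample, *rule_a)
                     - iptw_success_rate(resample, *rule_b))
    diffs.sort()
    return diffs[int(n_boot * alpha / 2)], diffs[int(n_boot * (1 - alpha / 2)) - 1]
```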

One of the most striking findings was the high estimated success rate for the rule d(CVD, KA/VE). On the basis of the first-line response rate, CVD was the worst choice for a first-line regimen. The strong salvage activity of KA/VE among patients for whom CVD failed, however, in fact appeared to make d(CVD, KA/VE) one of the best dynamic rules. These observations underline the caveat given earlier regarding the selection of an optimal rule that is based only on the probability of a successful response to the current treatment course. As evidenced by the analyses performed by Thall et al. (1), such an approach cannot identify the rule d(CVD, KA/VE) as a promising adaptive treatment strategy. We also note that the rule d(CVD, TEE) gave the lowest success rate in our analysis. In fact, the success rate for this rule was estimated to be statistically significantly lower than that for d(CVD, KA/VE). This result indicates that, among patients for whom CVD failed, the choice of salvage regimen can have a substantial impact on the overall success rate. Although the results for the rule d(CVD, KA/VE) are quite remarkable, we would like to stress again that the relevant sample size is quite small and that our analyses did not adjust for patient dropout. Adjusting for this potential source of bias could have led to lower estimated success rates for rules starting with CVD because a disproportionately high number of patients initially assigned to CVD dropped out of the trial because of disease progression (six patients as compared with one for KA/VE or TEC and two for TEE).

In some instances, it may be desirable to rely on a model that simultaneously describes the success rates of all dynamic treatment rules under consideration. If we were to assume, for example, that the effect of a salvage regimen on the overall success rate is independent of the first-line regimen that previously failed, we could use an additive model that does not include any interaction terms. Such a model would require only seven coefficients to describe the success rates of all 12 candidate rules and might thus yield a more parsimonious description of the relationships of interest than an analysis like the one presented above that estimates the success rate for each candidate rule separately. Estimates of these coefficients can be obtained in a straightforward manner by an extension of the inverse-probability-of-treatment weighting approach as described above (16). For the sake of illustration, we summarize the results of such an analysis in the appendix. We note, however, that our findings regarding the salvage activity of KA/VE indicate that the assumption of no interaction between first-line and salvage regimens is unlikely to hold in this situation, so that these results should be treated with care.

We have described another setting in which the model-based approach may seem more appealing (17). Specifically, we considered the case of a sequentially randomized trial that was aimed at comparing dynamic rules that are based not only on the choice of a first-line and a salvage regimen but also on the choice of a threshold that is used to decide when a measured continuous intermediate outcome is interpreted as evidence of an adequate response to the regimen examined. In the trial discussed by Thall et al. (1), for instance, an adequate response to the initial treatment course was defined primarily as a 40% or greater decline in the level of prostate-specific antigen relative to baseline, a criterion that was chosen a priori by expert consensus. It may also be of interest, however, to study the extent to which different switching thresholds lead to improved overall success rates. This question could be investigated through a slightly different sequentially randomized trial design. Suppose we can agree on a lower bound for the intermediate outcome below which patients could not reasonably be expected to continue their initial treatment. Patients whose intermediate outcomes are worse than this minimum level will thus have to be randomly assigned to one of the remaining treatment options as in the sequentially randomized trial described above. Unlike in that trial, however, patients with intermediate outcomes above this minimum level are not kept on their initial treatment but are once again randomly assigned to one of the four candidate regimens. We demonstrated previously (17) how data arising from such a trial could be analyzed parsimoniously by use of a model that includes linear as well as quadratic terms for the switching threshold, with coefficient estimates immediately implying an optimal choice for the first-line and salvage regimen as well as the switching threshold.
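Once such a model has been fit, the estimated optimal threshold follows directly from the coefficients. A minimal sketch, assuming hypothetical fitted coefficients gamma1 and gamma2 for the linear and quadratic threshold terms and a prespecified allowable range:

```python
def optimal_threshold(gamma1, gamma2, lower, upper):
    """Maximizer of gamma1 * theta + gamma2 * theta ** 2 over [lower, upper]."""
    if gamma2 < 0:
        # Concave in theta: take the peak of the parabola, clipped to the range.
        return min(max(-gamma1 / (2 * gamma2), lower), upper)
    # Otherwise the maximum is attained at one of the endpoints.
    return max((lower, upper), key=lambda t: gamma1 * t + gamma2 * t ** 2)
```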

We close by considering some options for addressing the problem of potentially informative patient dropout. We suggest that different sources of dropout may require different remedial approaches. If patients drop out for reasons that would not prohibit them from continuing with the trial, inverse-probability-of-censoring weighting may offer an attractive approach. By weighting each complete observation by the inverse of an estimate of the probability of having completed the trial, this methodology aims to create a reweighted sample that is representative of the ideal randomized trial, in which no patients dropped out. In some situations, however, it would be unrealistic—even in the context of such an ideal randomized trial—to force a patient to continue participating in the trial. In the prostate cancer trial of Thall et al. (1), for instance, some patients had to leave the trial because of rapid disease progression that required them to receive palliative care instead of one of the four regimens considered as part of the trial. Similarly, other patients had to be given alternative treatment options because of excessive toxic effects from the four candidate regimens. Such patients could not have been realistically expected to continue to participate in the trial, so that an inverse-probability-of-censoring weighting approach aimed at mimicking an ideal randomized trial in which all patients followed their assigned treatments would not be appropriate.
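For dropout of the first kind, the reweighting might look like the following minimal sketch (our own construction; p_complete stands for a hypothetical estimated model of the probability of completing the trial, for example a logistic regression fit to all enrolled patients):

```python
def ipcw_success_rate(completers, p_complete):
    """Inverse-probability-of-censoring–weighted success rate from completers.

    completers: list of (covariates, success) tuples for patients who finished
        the trial; p_complete(covariates) estimates P(completed | covariates).
    """
    weighted_successes = total_weight = 0.0
    for covariates, success in completers:
        weight = 1.0 / p_complete(covariates)  # upweight under-represented profiles
        weighted_successes += weight * success
        total_weight += weight
    return weighted_successes / total_weight
```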

We suggest that dropout for reasons such as disease progression or excessive toxic effects may be better addressed by modifying the definition of the candidate dynamic rules to ensure that each patient is guaranteed to be able, at least in theory, to comply with the assigned treatment course. In the trial discussed by Thall et al. (1), for example, one might modify the definition of the candidate rules d(f, s) as follows: a patient is assigned to the first-line regimen f until the treatment fails to produce a positive response, at which point he is switched to the salvage regimen s; if his disease progresses during the course of the trial, the treating physician is allowed to deviate from this algorithm to give palliative care; if the patient experiences an excessive toxic effect, the treating physician is likewise allowed to assign an alternative regimen of his choice that is likely to be better tolerated by the patient. If the investigator is then still able to record the outcome of interest for such patients, they can be used as part of the analysis just like any other patient. This approach, which is similar in flavor to an intention-to-treat analysis, would then allow the investigator to estimate the expected success rate if all patients were assigned to one of these more realistic dynamic rules that not only stipulate first-line and salvage regimens but also allow for the occurrence of disease progression and excessive toxic effects. In some sense, the use of such rules would be a direct extension of the arguments that form the basis for considering dynamic rules d(f, s) rather than single-decision rules that expect all patients to comply indefinitely with their initial treatment assignment: randomized trials can often benefit from focusing on treatment strategies that reflect actual clinical practice to ensure that any given patient can be realistically expected to comply with the assigned treatment. Treatment strategies that ignore scenarios that would preclude a patient from complying with treatment assignment may seem initially appealing from a causal inference point of view because they promise a clear measure of the causal effect of a candidate regimen, but they ultimately cannot live up to this promise because this measure is based on an unrealistic definition that prevents it from being estimable from observed data.

Acknowledgments

Funding

National Institutes of Health (GM071397).

Notes

We would like to thank Dr Randall Millikan from the University of Texas M. D. Anderson Cancer Center for kindly making available the dataset used here to illustrate the different statistical approaches. The authors had full responsibility for the analysis and interpretation of the data, the writing of the manuscript, and the decision to submit the manuscript for publication.

Appendix

We illustrate the idea of using a model to simultaneously describe the success rates of all candidate dynamic treatment rules by considering a logistic model according to which the logit of the overall success rate R(f, s) of a dynamic rule d(f, s) depends in an additive manner on the choice of the first-line regimen f and the salvage regimen s:

logit[R(f, s)] = β0 + β1 I(f = KA/VE) + β2 I(f = TEC) + β3 I(f = TEE) + β4 I(s = KA/VE) + β5 I(s = TEC) + β6 I(s = TEE),

where I() is the indicator function that equals one if the condition in parentheses is true and zero otherwise.

van der Laan and Petersen (16) describe how estimates of the coefficients βj can be obtained by first creating a new dataset that for each patient contains one line for each candidate treatment rule d that is compatible with his observed treatment history and then regressing these derived observations on the posited model using as weights, as above, the inverse of the estimated probability that a given patient would have been assigned to his treatment history. In the case considered here, we would thus perform a simple weighted logistic regression to obtain estimates of the coefficients βj, with confidence intervals obtained from the bootstrap method.
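In software, this amounts to an ordinary weighted logistic regression on the expanded dataset. A sketch using statsmodels (the expansion into one row per compatible rule is assumed to have been done already; the function and variable names are ours):

```python
import numpy as np
import statsmodels.api as sm

def fit_additive_model(X, y, weights):
    """Weighted logistic regression for the additive model given above.

    X: (n_rows, 6) indicator matrix [f = KA/VE, f = TEC, f = TEE,
       s = KA/VE, s = TEC, s = TEE], one row per (patient, compatible rule) pair.
    y: overall success (0/1) of the patient underlying each row.
    weights: IPTW weight of the patient underlying each row.
    """
    design = sm.add_constant(np.asarray(X, dtype=float))  # prepend beta_0
    fit = sm.GLM(np.asarray(y, dtype=float), design,
                 family=sm.families.Binomial(),
                 freq_weights=np.asarray(weights, dtype=float)).fit()
    # Rows derived from the same patient are correlated, so confidence
    # intervals should come from the bootstrap, as described in the text.
    return np.exp(fit.params[1:])  # odds ratios relative to CVD (cf. Table 3)
```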

Table 3 summarizes the resulting odds ratio estimates for the six possible regimen comparisons in the first-line and the salvage setting. The results indicate that TEC and TEE were the best first-line regimens and that CVD and KA/VE had the worst overall success rates. It should be noted, however, that CVD was estimated to lead to a higher overall success rate than KA/VE in spite of the first-line response rate for CVD being worse than that for KA/VE (Table 1). This result was based entirely on the high salvage activity of KA/VE after a prior failure of CVD, with no comparable salvage treatment options available for patients for whom KA/VE failed to produce a positive response. Again, this finding underscores the danger of judging first-line regimens solely on the basis of their first-line response rate. Table 3 also indicates that CVD and KA/VE appear to be better choices for salvage regimens than TEE or TEC, with TEE leading to statistically significantly worse overall success rates than either CVD or KA/VE. This model-based analysis would thus suggest that treatment rules be used that start with TEC or TEE, followed by CVD or KA/VE after those regimens have failed. It thus misses the rule d(CVD, KA/VE) that our previous analysis highlighted as quite promising, an oversight that can be attributed directly to the unrealistic assumption of no interaction between first-line and salvage regimens. For rules starting with TEC or TEE, this assumption would also cause us to overstate the importance of switching to CVD or KA/VE rather than TEC or TEE. Table 1 shows that most salvage regimens, with the exception of KA/VE given after failure of CVD, have a similarly low likelihood of producing an overall success.

Table 3.

Additive model*

| Comparison | First-line therapy, OR (95% CI) | Salvage therapy, OR (95% CI) |
|---|---|---|
| KA/VE vs CVD | 0.63 (0.14 to 2.12) | 0.96 (0.42 to 1.91) |
| TEC vs CVD | 1.79 (0.60 to 6.42) | 0.77 (0.33 to 1.55) |
| TEE vs CVD | 1.57 (0.47 to 6.18) | 0.54 (0.26 to 0.86) |
| TEC vs KA/VE | 2.85 (0.90 to 13.14) | 0.80 (0.36 to 1.81) |
| TEE vs KA/VE | 2.49 (0.73 to 12.60) | 0.56 (0.30 to 0.94) |
| TEE vs TEC | 0.87 (0.24 to 3.27) | 0.70 (0.35 to 1.20) |
* Odds ratio (OR) estimates for the six possible regimen comparisons for both the first-line and the salvage setting along with 95% bootstrap confidence intervals (CIs) are shown. The four candidate drug regimens are cyclophosphamide, vincristine, and dexamethasone (CVD); ketoconazole plus doxorubicin alternating with vinblastine plus estramustine (KA/VE); paclitaxel, estramustine, and carboplatin (TEC); and paclitaxel, estramustine, and etoposide (TEE). The data used in this analysis were obtained directly from Thall et al. (1).

References

1. Thall PF, Logothetis C, Pagliaro LC, Wen S, Brown MA, Williams D, Millikan R. Adaptive therapy for androgen-independent prostate cancer: a randomized selection trial including four regimens. J Natl Cancer Inst. 2007;99:1613–1622. doi: 10.1093/jnci/djm189.
2. Thall PF, Millikan RE, Sung H-G. Evaluating multiple treatment courses in clinical trials. Stat Med. 2000;19:1011–1028. doi: 10.1002/(sici)1097-0258(20000430)19:8<1011::aid-sim414>3.0.co;2-m.
3. Lavori PW, Dawson R. A design for testing clinical strategies: biased adaptive within-subject randomization. J R Stat Soc Ser A Stat Soc. 2000;163:29–38.
4. Lavori PW, Dawson R. Dynamic treatment regimes: practical design considerations. Clin Trials. 2004;1:9–20. doi: 10.1191/1740774s04cn002oa.
5. Murphy SA. An experimental design for the development of adaptive treatment strategies. Stat Med. 2005;24:1455–1481. doi: 10.1002/sim.2022.
6. Rush AJ, Trivedi M, Fava M. Depression, IV: STAR*D treatment trial for depression. Am J Psychiatry. 2003;160:237. doi: 10.1176/appi.ajp.160.2.237.
7. Schneider LS, Tariot PN, Dagerman KS, Davis SM, Hsiao JK, Ismail S, et al. Effectiveness of atypical antipsychotic drugs in patients with Alzheimer's disease. N Engl J Med. 2006;355:1525–1538. doi: 10.1056/NEJMoa061240.
8. Swartz MS, Perkins DO, Stroup TS, Davis SM, Capuano G, Rosenheck RA, et al. Effects of antipsychotic medications on psychosocial functioning in patients with chronic schizophrenia: findings from the NIMH CATIE study. Am J Psychiatry. 2007;164:428–436. doi: 10.1176/ajp.2007.164.3.428.
9. Robins JM. A new approach to causal inference in mortality studies with sustained exposure periods—application to control of the healthy survivor effect. Math Model. 1986;7:1393–1512.
10. Robins JM. Addendum to "A new approach to causal inference in mortality studies with sustained exposure periods—application to control of the healthy survivor effect." Comput Math Appl. 1987;14:923–945.
11. Efron B, Tibshirani RJ. An introduction to the bootstrap. Monographs on statistics and applied probability. New York: Chapman & Hall; 1993.
12. Murphy SA, van der Laan MJ, Robins JM. Marginal mean models for dynamic treatment regimens. J Am Stat Assoc. 2001;96:1410–1424. doi: 10.1198/016214501753382327.
13. Robins JM, Rotnitzky A. Recovery of information and adjustment for dependent censoring using surrogate markers. In: Jewell N, Dietz K, Farewell V, editors. AIDS epidemiology, methodological issues. Boston, MA: Birkhauser; 1992. pp. 297–331.
14. Robins JM. Information recovery and bias adjustment in proportional hazards regression analysis of randomized trials using surrogate markers. In: Proceedings of the Biopharmaceutical Section, American Statistical Association; 1993. pp. 24–33.
15. van der Laan MJ, Robins JM. Unified methods for censored longitudinal data and causality. Springer series in statistics. New York: Springer; 2003.
16. van der Laan MJ, Petersen ML. Causal effect models for realistic individualized treatment and intention to treat rules. Int J Biostat. 2007;3:Article 3. doi: 10.2202/1557-4679.1022.
17. Bembom O, van der Laan MJ. Analyzing sequentially randomized trials based on causal effect models for realistic individualized treatment rules. UC Berkeley Division of Biostatistics Working Paper Series; 2007. Technical Report No. 216. doi: 10.1002/sim.3268. Available at: http://www.bepress.com/ucbbiostat/paper216. Last accessed: October 10, 2007.
