Author manuscript; available in PMC: 2020 Dec 10.
Published in final edited form as: Stat Med. 2019 Sep 9;38(28):5317–5331. doi: 10.1002/sim.8363

Quantifying treatment effects using the personalized chance of longer survival

Ying-Qi Zhao a,*, Mary W Redman b, Michael L LeBlanc a
PMCID: PMC6842038  NIHMSID: NIHMS1046867  PMID: 31502297

Abstract

The hazard ratio (HR) is widely used to measure or summarize the magnitude of treatment effects, but it is justifiably difficult to interpret in a meaningful way for patients, and perhaps for clinicians as well. In addition, it is most meaningful when the hazard functions are approximately proportional over time. We propose a new measure, termed the personalized chance of longer survival. The measure, which quantifies the probability of living longer with one treatment than with another, accounts for individualized characteristics to directly address personalized treatment effects. Hence, the measure is patient-focused and can be used to evaluate subgroups easily. We believe it is intuitive to understand and clinically interpretable in the presence of non-proportionality. Furthermore, because it estimates the probability of living longer by some fixed amount of time, it encodes the probabilistic part of treatment effect estimation. We provide nonparametric estimation and inference procedures that can accommodate censored survival outcomes. We conduct extensive simulation studies to characterize the performance of the proposed method, and data from a large randomized phase III clinical trial (SWOG S0819) are analyzed using the proposed method.

Keywords: Chance of a longer survival, personalized medicine, kernel smoother, inverse probability of censoring weighting, bootstrap

1. Introduction

There has been a paradigm shift in oncology treatment. One focus of anticancer drug development is targeted therapies, which act on specific molecular targets to block the growth and spread of cancer. Usually, a subgroup of patients who have an appropriate target would benefit from a particular targeted therapy. Hence, targeted therapy has become the foundation of precision medicine, where individual information is used to prevent, diagnose, and treat disease. In addition, cancer immunotherapy (IO) has been developed, where therapeutic agents target immune cells rather than cancer cells. IOs work through the complexity of the immune system and the factors needed to activate it, provided there is sufficient time to allow the immune system to act on the cancer. In particular, delayed treatment benefits could occur with IOs, given that they do not work directly on a patient’s tumor. The targeted nature of targeted therapies and the “indirect” effect of IOs motivate the need to modernize trial analysis to evaluate agents where personalized effects and non-proportionality are expected.

Time-to-event endpoints, such as overall survival, are typically considered the gold standard for evaluating treatment efficacy and are widely used as primary endpoints in oncology clinical trials. Treatment effects are usually quantified in terms of the hazard ratio (HR), the ratio of hazard rates between two treatment arms. The HR is most meaningful when the hazard functions are approximately proportional over time. If the assumption of proportional hazards is not met, the true HR could change over time and the calculated HR would not best reflect the treatment benefit (1). More importantly, the hazard ratio is sometimes hard to interpret, since the concept of a hazard rate, representing the instantaneous rate of failure among survivors, can be difficult to explain to patients and practitioners. Researchers have attempted to investigate treatment effects based on other interpretable quantities. For example, milestone survival, defined as the Kaplan–Meier survival probability at a time point chosen a priori, has been evaluated; such a milestone usually represents a clinically meaningful benchmark (2). However, the milestone does not necessarily represent long-term survival, and this approach ignores all of the events occurring after the chosen time point (3). Other measures of treatment effect have been proposed, for example, the integrated survival probability difference over a prespecified time interval (4) and the ratio of restricted mean survival times (5). However, they have no straightforward application to the individual patient. As new cancer treatments such as targeted therapies become increasingly popular, determination and evaluation of the groups who derive the most benefit become more important and complicated.
Benefiting greatly from The Cancer Genome Atlas project (6) and the rapid pace of molecular and cancer immunology research, the current focus of most therapeutic development in oncology is on targeted therapies for molecular biomarker-defined subgroups, immunotherapies in immune biomarker-defined subgroups, or combinations of these therapies, also in subgroups (7; 8). Given this focus, incorporating robust identification and evaluation of the candidate subgroups who are likely to derive the most benefit within clinical trial design and analysis has become more important and complicated. Indeed, methods have been developed for estimating personalized treatment effects from censored data. For example, some authors discussed the construction of optimal individualized treatment rules using censored data (9; 10; 11). Tian et al (2014) developed a method to estimate interactions between a treatment and a large number of covariates for survival outcomes (12). Henderson et al (2018) focused on estimating heterogeneous treatment effects using a nonparametric accelerated failure time model (13). However, as discussed in (14), in today’s era of increasing therapy costs and associated toxicities, it is important not only to evaluate new treatments in terms of their statistical improvement over standard-of-care therapies but also to evaluate the therapies relative to benchmarks thought to be clinically meaningful. None of the aforementioned methods evaluates personalized treatment effects against a clinically meaningful amount. Consistent with the current paradigm, it would be useful to augment the current approaches so that we can describe personalized, clinically meaningful benefits.

Péron et al. (2016) (15) proposed a measure of treatment benefit, termed the net chance of a longer survival, to demonstrate the treatment effect in clinical trials. The net chance of a longer survival is defined as the probability that a random patient in the treatment group survives longer (by a certain time) than a random patient in the control group, minus the probability of the opposite situation (16; 15). For example, if the net chance were estimated to be 0.1, a random patient in the treatment group would have a 10% higher probability of a longer survival than a patient in the control group. However, their development did not account for censored data, and simply ignoring the censored observations could lead to biased results. Furthermore, this measure does not take into account the personal characteristics that might affect the net chance of a longer survival. It is likely that a treatment works in a subset of patients even when the overall treatment benefit is not significant. Hence, a more comprehensive measure is desirable to characterize the individualized treatment effect.

In this paper, we propose a more patient-oriented measure, the personalized chance of a longer survival: an augmented measure based on the net chance of benefit developed in (15), which incorporates the variability of standard (control arm) treatment efficacy and accommodates censored observations to estimate overall and subgroup treatment effects. The measure answers a patient’s question: “For a patient like me, what is the net chance of surviving longer with the new treatment than with standard care?” or, choosing a clinically meaningful benchmark of, for example, 6 months, “What is the net chance of surviving at least 6 months longer with the new treatment than with standard care?” Its population analog, the chance of a longer survival, is also presented. We develop valid estimation and inference procedures for the proposed measures using kernel smoothers, which can handle censored data.

In Section 2, we define the general concepts of chance and personalized chance of a longer survival. In Section 3, we discuss the nonparametric estimation and inference methods for the proposed measures. In Section 4, we conduct extensive simulation studies to examine the empirical performance. A data example using data from the SWOG Cancer Research Network (formerly known as Southwest Oncology Group) S0819 trial is presented in Section 5, and Section 6 provides a discussion.

2. Personalized chance of a longer survival

Let $T$ denote the overall survival time, $X = (X_1, \ldots, X_p)^T \in \mathbb{R}^p$ denote the baseline characteristics, and $A \in \{0, 1\}$ denote the binary treatment assigned, with 1 indicating the investigational treatment and 0 indicating the standard-of-care/control arm treatment. Let $T(a)$ denote the potential outcome (17; 18) under treatment $a$, $a = 0, 1$, and let $\tilde T(0)$ be an independent copy of $T(0)$. The chance of a longer survival by at least $m$ months is defined as

$$\Delta(m) = P\{T(1) > T(0) + m\} - P\{\tilde T(0) > T(0) + m\}.$$

Here, the choice of possible values for $m$ is constrained by $\tau$, and the value of $\Delta(m)$ can range from $-1$ to $1$. It is intuitive to set $m = 0$ in $\Delta(m)$, which directly quantifies the chance of a longer survival. However, a simple statistical improvement of a new therapy may not be adequate. Physicians and the FDA are evaluating therapies for durable and clinically meaningful effects, as treatment efficacy can be coupled with a rise in the incidence of toxicities or high costs associated with treatment. The ability to evaluate scenarios with $m > 0$ facilitates evaluation of the additional benefit in survival probability against clinical benefit benchmarks. Given that different patients may have different preferences over the tradeoff between efficacy and toxicity, a comprehensive investigation of scenarios with different $m$ values will be valuable for patients making decisions. The choice of $m$ will also depend on the specific disease setting. For example, in advanced lung cancer $m = 0$ may be adequate, but in earlier-stage breast cancer it may be important to choose a larger $m$.

In defining $\Delta(m)$, the second term serves as an offset that reflects the variability of the survival time for patients receiving the control arm treatment. By using this offset term, the proposed quantity reflects the benefit in a longer survival solely due to the treatment. The usefulness of such an adjustment depends on the tail behavior of the survival distribution: a longer and heavier tail leads to a more important adjustment. Furthermore, the adjustment may not play a role if $m$ is extremely large, as the probability will be close to 0. However, for small to moderate $m$, higher variability in the control arm will lead to a larger adjustment. If there is no variability of outcomes at all in the control arm, then we can completely attribute the chance of a better survival to the treatment arm. For example, while $P\{\tilde T(0) > T(0)\} = 1/2$ regardless of the control arm distribution, $P\{\tilde T(0) > T(0) + m\}$ for moderate $m > 0$ is larger with higher variability in the control arm.

Provided that we are using data from a randomized clinical trial, it is reasonable to assume that the potential outcomes are well defined, and we can express the quantity in terms of the data-generating model (19). Expressing $\Delta(m)$ as a function of observable outcomes gives

$$\Delta(m) = P\{T_{1i} > T_{0j} + m\} - P\{T_{0k} > T_{0j} + m\},$$

where $T_{1i}$ is the outcome of a patient $i$ from the treatment group, and $T_{0j}$ and $T_{0k}$ are the outcomes of patients $j$ and $k$ from the control group, respectively. This is the probability that a random patient in the treatment group survives longer than a random patient in the control group by at least $m$, minus the probability that a random patient in the control group survives longer than another random patient in the control group by at least $m$. Hence, the chance of a longer survival for a patient in the treatment group is adjusted by the probability that the patient could have a longer survival even in the control group.

Remark 1 Assume that the proportional hazards assumption holds. Furthermore, assume constant hazard rates in both arms, with hazard rate $\lambda_1$ in the treatment arm $A = 1$ and $\lambda_0$ in the control arm $A = 0$. This implies a constant HR between the two arms, $\Gamma = \lambda_1/\lambda_0$. Then,

$$\Delta(m) = P\{T_{1i} > T_{0j} + m\} - P\{T_{0k} > T_{0j} + m\} = \left(\frac{1}{1+\Gamma} - \frac{1}{2}\right) e^{-\lambda_1 m} = \frac{(1-\Gamma)e^{-\lambda_1 m}}{2(\Gamma+1)}.$$

For the special case of m = 0,

$$\Delta(0) = P\{T_{1i} > T_{0j}\} - P\{T_{0k} > T_{0j}\} = \frac{1-\Gamma}{2(\Gamma+1)}.$$

Remark 1 provides a way to determine the clinically meaningful difference for $\Delta(m)$. For example, assume that the median survival in the control group is 6 months. According to (14), relative improvements of at least 20% in median overall survival are necessary to deem an improvement in outcome clinically meaningful. We set our goal to increase the median survival to 8 months (hazard ratio $\Gamma = 0.75$, assuming an exponential survival function). In this case, a meaningful $m$ value is 2 months, and a clinically meaningful improvement is $\Delta(2) \approx 6\%$.
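As a numerical check on this example, the closed form of Remark 1 can be evaluated directly (a minimal sketch, assuming the exponential model and the values quoted in the text), together with a Monte Carlo confirmation of the m = 0 case:

```python
import numpy as np

def delta_exponential(m, lam0, lam1):
    # Closed-form Delta(m) from Remark 1 under exponential survival,
    # with constant hazard ratio Gamma = lam1 / lam0.
    gamma = lam1 / lam0
    return (1.0 / (1.0 + gamma) - 0.5) * np.exp(-lam1 * m)

lam0 = np.log(2) / 6.0   # control median survival: 6 months
lam1 = np.log(2) / 8.0   # goal median survival: 8 months (Gamma = 0.75)
print(round(delta_exponential(2.0, lam0, lam1), 3))   # 0.06, as in the text

# Monte Carlo confirmation of Delta(0) = (1 - Gamma) / {2 (Gamma + 1)}.
rng = np.random.default_rng(0)
n = 200_000
t1 = rng.exponential(1.0 / lam1, n)
t0 = rng.exponential(1.0 / lam0, n)
t0_tilde = rng.exponential(1.0 / lam0, n)
delta0_mc = np.mean(t1 > t0) - np.mean(t0_tilde > t0)
print(round(delta0_mc, 3))   # close to 0.25 / 3.5, about 0.071
```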

The quantity $\Delta(m)$ refers to a random patient without accounting for individual characteristics. We further propose the concept of the personalized chance of a longer survival, denoted $\Delta(m;x)$. For a patient with covariate $x$,

$$\Delta(m;x) = P\{T(1) > T(0) + m \mid X = x\} - P\{\tilde T(0) > T(0) + m \mid X = x\}.$$

$P\{T(1) > T(0) + m \mid X = x\}$ is the probability that a patient with characteristics $x$ survives at least $m$ months longer with the treatment, and $P\{\tilde T(0) > T(0) + m \mid X = x\}$ describes the probability that a patient with covariate $x$ lives at least $m$ months longer than another patient with covariate $x$ when both receive the control therapy. Denote the outcomes of a patient in the treatment or control group as $T_1$ and $T_0$, respectively. With data from a randomized clinical trial,

$$\Delta(m;x) = P\{T_1 > T_0 + m \mid X = x\} - P\{\tilde T_0 > T_0 + m \mid X = x\},$$

where T˜0 is an independent copy of T0.

Remark 2 Assume that the Cox proportional hazards model holds for the data. In particular, for a patient with covariate $x$ receiving treatment $a$, the hazard function is

$$\lambda(t \mid x, a) = \lambda_0(t)\exp(\gamma a)\exp(x^T\beta).$$

The cumulative baseline hazard function is denoted by Λ0(t). Then,

$$\begin{aligned}
P(T_1 > T_0 \mid X = x) &= \int \exp\{-\Lambda_0(t)\exp(\gamma)\exp(x^T\beta)\}\, f_{T_0}(t)\,dt \\
&= \int \exp\{-\Lambda_0(t)\exp(\gamma)\exp(x^T\beta)\}\exp\{-\Lambda_0(t)\exp(x^T\beta)\}\,\lambda_0(t)\exp(x^T\beta)\,dt \\
&= \frac{\exp(x^T\beta)}{\{\exp(\gamma)+1\}\exp(x^T\beta)} \int \{\exp(\gamma)+1\}\exp(x^T\beta)\,\lambda_0(t)\exp\left[-\Lambda_0(t)\{\exp(\gamma)+1\}\exp(x^T\beta)\right]dt \\
&= \frac{1}{\exp(\gamma)+1}.
\end{aligned}$$

Hence, Δ(0;x) is also a constant, where

$$\Delta(0;x) = \frac{1-\exp(\gamma)}{2\{\exp(\gamma)+1\}}.$$
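Remark 2's conclusion that Δ(0; x) is constant in x can be verified by simulation. The sketch below assumes a constant baseline hazard (so survival is exponential) and illustrative, assumed values for γ and β:

```python
import numpy as np

rng = np.random.default_rng(1)
gamma, beta = np.log(0.75), 0.5   # assumed illustrative coefficients
n = 200_000

def draw(a, x):
    # With constant baseline hazard lambda_0(t) = 1, the Cox model
    # lambda(t | x, a) = exp(gamma * a) * exp(x * beta) is exponential.
    rate = np.exp(gamma * a + beta * x)
    return rng.exponential(1.0 / rate, n)

deltas = []
for x in (-1.0, 0.0, 2.0):        # Delta(0; x) should not depend on x
    t1, t0, t0_tilde = draw(1, x), draw(0, x), draw(0, x)
    deltas.append(np.mean(t1 > t0) - np.mean(t0_tilde > t0))
print([round(d, 2) for d in deltas])   # each close to (1 - 0.75)/3.5, about 0.071
```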

Remarks 1 and 2 establish a connection between the proposed concepts and the popular Cox proportional hazards model. In particular, under the proportional hazards assumption we can estimate the personalized chance of a better survival using the Cox model, which is appealing in practice. Our development in later sections focuses on nonparametric estimation that does not require the assumption of proportional hazards for estimation and inference.

3. Estimation and inferences

3.1. Setup and pairwise comparison

Let $\tau$ be the end of study. Let $C$ denote the censoring time, which could go beyond $\tau$, and assume that $C$ and $T$ are independent given $(X, A)$. Our data consist of $n$ independent and identically distributed subjects, $\{Y_i = T_i \wedge C_i, \delta_i = I(T_i \le C_i), X_i, A_i\}$, $i = 1, \ldots, n$, where $\delta = I(T \le C)$ denotes the censoring indicator.

Let $Y_1$ denote the observed outcome from the treatment group and $Y_0$ the observed outcome from the control group. For any pair of individuals, we can compare their outcomes and define the pairwise indicator function for $l = 0, 1$ as

$$L(Y_{li}, Y_{0j}) = \begin{cases} 1 & \text{if } Y_{li} - Y_{0j} > m, \\ 0 & \text{otherwise.} \end{cases}$$

The pairwise comparison is straightforward if the outcomes are binary or continuous (16). However, in the presence of censoring with a time-to-event outcome, the pairwise comparison could be uninformative if either or both outcomes are censored. In the following sections, we propose nonparametric estimation and inference methods for both $\Delta(m)$ and $\Delta(m;x)$ based on the pairwise comparisons, where inverse probability of censoring weighting techniques are used to adjust for censored observations and kernel smoothing is used to derive personalized effects.

3.2. Estimating chance of a longer survival

If all patients are fully observed, i.e., Yi = Ti, Δ(m) can be estimated by Δ^(m), where

$$\hat\Delta(m) = \frac{\sum_{i=1}^{n_1}\sum_{j=1}^{n_0} L(Y_{1i}, Y_{0j})}{n_0 n_1} - \frac{\sum_{i=1}^{n_0}\sum_{j=1}^{n_0} L(Y_{0i}, Y_{0j})}{n_0^2}, \qquad (1)$$

with $n_0$ and $n_1$ the numbers of subjects in treatment arms $A = 0$ and $A = 1$, respectively. To simplify notation, we keep the same indices even though the pairs in the second term differ from those in the first. In estimating $\Delta(m)$, the second term serves as an offset that reflects the estimated variability of the survival time for patients in the control arm. Note that this quantity is a generalization of the Mann–Whitney U statistic, which estimates $P(X > Y)$, where $X, Y$ are two random variables generated from two distributions (16).
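To make the pairwise construction concrete, here is a minimal sketch of estimator (1) for fully observed outcomes (written in Python with NumPy; the function names are our own):

```python
import numpy as np

def L(y_a, y_b, m):
    # Pairwise indicator from Section 3.1: 1 if y_a exceeds y_b by more than m.
    return (np.asarray(y_a, float) - np.asarray(y_b, float) > m).astype(float)

def delta_hat_uncensored(y1, y0, m):
    # Estimator (1): average pairwise wins between arms, minus the
    # within-control offset (all n0^2 control pairs, diagonal included).
    y1 = np.asarray(y1, float)
    y0 = np.asarray(y0, float)
    term1 = L(y1[:, None], y0[None, :], m).mean()
    term2 = L(y0[:, None], y0[None, :], m).mean()
    return term1 - term2

# Exponential example: medians 8 vs 6 months, so Gamma = 0.75 and the
# true Delta(0) from Remark 1 is (1 - 0.75) / (2 * 1.75), about 0.071.
rng = np.random.default_rng(2)
y1 = rng.exponential(8.0 / np.log(2), 2000)
y0 = rng.exponential(6.0 / np.log(2), 2000)
est = delta_hat_uncensored(y1, y0, 0.0)
print(round(est, 2))
```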

In the presence of censoring, (1) may lead to biased results if we simply drop all uninformative pairs. We therefore weight each observed pair by the inverse probability of censoring weights, so that each observed pair represents multiple similar pairs that might have been censored. Let $\delta_1$ and $\delta_0$ denote the censoring indicators in groups 1 and 0, respectively, and let $S_C(t \mid x, a) = P(C > t \mid A = a, X = x)$ be the conditional treatment-specific survival function of the censoring time given covariates $x$. Then,

$$E\left\{\frac{I(\delta_1=1,\delta_0=1)\,L(Y_1,Y_0)}{S_C(Y_1\mid X,A)\,S_C(Y_0\mid X,A)}\,\middle|\,X,A\right\} = E\left\{\frac{I(C_1>T_1)\,I(C_0>T_0)\,I(T_1-T_0>m)}{S_C(T_1\mid X,A)\,S_C(T_0\mid X,A)}\,\middle|\,X,A\right\} = P(T_1-T_0>m\mid X,A),$$

where we have used the conditional independence of T and C given X,A. Consequently, Δ(m) can be estimated by

$$\hat\Delta(m) = \frac{\sum_{i=1}^{n_1}\sum_{j=1}^{n_0} \dfrac{I(\delta_{1i}=1,\delta_{0j}=1)\,L(Y_{1i},Y_{0j})}{\hat S_C(Y_{1i}\mid X_{1i},1)\hat S_C(Y_{0j}\mid X_{0j},0)}}{\sum_{i=1}^{n_1}\sum_{j=1}^{n_0} \dfrac{I(\delta_{1i}=1,\delta_{0j}=1)}{\hat S_C(Y_{1i}\mid X_{1i},1)\hat S_C(Y_{0j}\mid X_{0j},0)}} - \frac{\sum_{l=1}^{n_0}\sum_{j=1}^{n_0} \dfrac{I(\delta_{0l}=1,\delta_{0j}=1)\,L(Y_{0l},Y_{0j})}{\hat S_C(Y_{0l}\mid X_{0l},0)\hat S_C(Y_{0j}\mid X_{0j},0)}}{\sum_{l=1}^{n_0}\sum_{j=1}^{n_0} \dfrac{I(\delta_{0l}=1,\delta_{0j}=1)}{\hat S_C(Y_{0l}\mid X_{0l},0)\hat S_C(Y_{0j}\mid X_{0j},0)}}. \qquad (2)$$
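The IPCW weighting in (2) can be sketched as follows. For brevity, this illustration estimates the censoring survival function with a per-arm Kaplan–Meier estimator rather than the covariate-adjusted S_C(t | x, a) of the text; all function names are our own:

```python
import numpy as np

def km_censoring_survival(y, delta):
    # Kaplan-Meier estimate of the censoring survival function S_C, evaluated
    # at each subject's own observed time (a censoring event is delta == 0).
    y, delta = np.asarray(y, float), np.asarray(delta, int)
    order = np.argsort(y)
    n = len(y)
    at_risk = n - np.arange(n)                 # risk-set sizes at sorted times
    haz = (delta[order] == 0) / at_risk        # censoring-hazard increments
    sc = np.empty(n)
    sc[order] = np.cumprod(1.0 - haz)
    return np.maximum(sc, 1e-8)                # guard against zero weights

def delta_hat_ipcw(y1, d1, y0, d0, m):
    # Estimator (2): pairwise comparisons restricted to fully observed pairs,
    # reweighted by the inverse probability of remaining uncensored.
    sc1, sc0 = km_censoring_survival(y1, d1), km_censoring_survival(y0, d0)
    def term(ya, da, sa, yb, db, sb):
        w = np.outer((da == 1) / sa, (db == 1) / sb)
        return (w * (ya[:, None] - yb[None, :] > m)).sum() / w.sum()
    return term(y1, d1, sc1, y0, d0, sc0) - term(y0, d0, sc0, y0, d0, sc0)

# Exponential survival (medians 8 vs 6 months) with uniform censoring.
rng = np.random.default_rng(3)
n = 2000
t1, c1 = rng.exponential(8.0 / np.log(2), n), rng.uniform(0, 75, n)
t0, c0 = rng.exponential(6.0 / np.log(2), n), rng.uniform(0, 75, n)
y1, d1 = np.minimum(t1, c1), (t1 <= c1).astype(int)
y0, d0 = np.minimum(t0, c0), (t0 <= c0).astype(int)
est = delta_hat_ipcw(y1, d1, y0, d0, 0.0)
print(round(est, 2))   # close to the Remark 1 value of about 0.071
```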

3.3. Estimating personalized chance of a longer survival

The information in the observed dataset on a patient with covariate $x$ is usually limited, especially if the covariate is not discrete. To estimate the personalized chance of a longer survival, we therefore need to borrow information from other subjects. Given that we can compare different pairs, we can incorporate each pair’s information with a weight that represents the similarity between that pair and the patient of interest. For this purpose, we use a kernel smoother to estimate $\Delta(m;x)$.

Let $K$ be a kernel function and $b_n$ a bandwidth sequence. Define $K_{b_n}(x_0 - X) = K\{(x_0 - X)/b_n\}/b_n^p$, where $p$ is the number of covariates. Without any censoring, we propose the kernel estimator

$$\hat\Delta(m;x) = \frac{\sum_{i=1}^{n_1}\sum_{j=1}^{n_0} K_{b_n}(x - X_{1i}) K_{b_n}(x - X_{0j}) L(Y_{1i}, Y_{0j})}{\sum_{i=1}^{n_1}\sum_{j=1}^{n_0} K_{b_n}(x - X_{1i}) K_{b_n}(x - X_{0j})} - \frac{\sum_{l=1}^{n_0}\sum_{j=1}^{n_0} K_{b_n}(x - X_{0l}) K_{b_n}(x - X_{0j}) L(Y_{0l}, Y_{0j})}{\sum_{l=1}^{n_0}\sum_{j=1}^{n_0} K_{b_n}(x - X_{0l}) K_{b_n}(x - X_{0j})}. \qquad (3)$$

Here, we assume the same bandwidth parameter throughout for simplicity. The estimator incorporates all pairwise information between the two arms in the first term, weighted by $K_{b_n}(x - X_{1i}) K_{b_n}(x - X_{0j})$, which represents how similar $X_{1i}$ and $X_{0j}$ are to the covariate $x$ of the patient of interest. This in turn represents, to some extent, the similarity between the pair $(Y_{1i}, Y_{0j})$ and $(T_1, T_0)$ for that patient. Similarly, the second term incorporates all pairwise information within the control group to estimate $P\{\tilde T_0 > T_0 + m \mid X = x\}$ in $\Delta(m;x)$. The following proposition characterizes how the bias and variance of $\hat\Delta(m;x)$ depend on the bandwidth $b_n$.
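A sketch of the uncensored kernel estimator (3) for a scalar covariate follows. A Gaussian kernel is used for illustration (normalizing constants cancel between numerator and denominator), and the toy data-generating model with a biomarker-treatment interaction is our own:

```python
import numpy as np

def kernel_weight(u, b):
    # Unnormalized Gaussian kernel; constants cancel in the ratios of (3).
    return np.exp(-0.5 * (u / b) ** 2)

def delta_hat_personalized(x, x1, y1, x0, y0, m, b):
    # Estimator (3) for a scalar covariate, without censoring: pair weights
    # are products of kernel similarities between each subject and x.
    k1, k0 = kernel_weight(x1 - x, b), kernel_weight(x0 - x, b)
    def term(ka, ya, kb, yb):
        w = np.outer(ka, kb)
        return (w * (ya[:, None] - yb[None, :] > m)).sum() / w.sum()
    return term(k1, y1, k0, y0) - term(k0, y0, k0, y0)

# Toy trial with a strong biomarker-treatment interaction: the treatment
# hazard is exp(0.25 + x) times the control hazard (illustrative values).
rng = np.random.default_rng(4)
n = 4000
x_all = rng.normal(size=n)
a = rng.integers(0, 2, n)
t = rng.exponential(1.0 / np.where(a == 1, np.exp(0.25 + x_all), 1.0))
x1, y1 = x_all[a == 1], t[a == 1]
x0, y0 = x_all[a == 0], t[a == 0]
deltas = [delta_hat_personalized(x, x1, y1, x0, y0, 0.0, 0.3)
          for x in (-1.0, 1.0)]
print([round(d, 2) for d in deltas])   # benefit at x = -1, harm at x = +1
```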

Proposition 3.1 Assume that K(u) is symmetric about 0 and satisfies ∫ K(u)du = 1. Then,

$$\mathrm{bias}\{\hat\Delta(m;x)\} = \frac{b_n^2}{2}\,\sigma_K^4\, M(x) + O(b_n^4),$$

and

$$\mathrm{Var}\{\hat\Delta(m;x)\} = \frac{4R(K)^2\, v(x)}{n^2 b_n^{2p}},$$

where $v(x) = \mathrm{var}\{\Delta(m;X)\}\big|_{X=x}$, $\sigma_K^2 = \int u^2 K(u)\,du$, $R(K) = \int K(u)^2\,du$, and $M(x)$ is a function of $x$ related to the first and second derivatives of $\Delta(m;x)$ and the density function. Details of $M(x)$ can be found in the proof available in the Appendix.

The average mean squared error of $\hat\Delta(m;x)$ is of order $O(b_n^4) + O\{1/(n^2 b_n^{2p})\}$. For the average mean squared error to decrease to zero, we require $b_n \to 0$ and $n b_n^p \to \infty$. The optimal bandwidth $b_n$ is of order $n^{-1/(p+2)}$.

When data are subject to censoring, we propose the kernel estimator using inverse probability of censoring weighting techniques as

$$\hat\Delta(m;x) = \frac{\sum_{i=1}^{n_1}\sum_{j=1}^{n_0} \dfrac{K_{b_n}(x - X_{1i}) K_{b_n}(x - X_{0j})\, I(\delta_{1i}=1,\delta_{0j}=1)\, I\{L(Y_{1i},Y_{0j})=1\}}{\hat S_C(Y_{1i}\mid X_{1i},1)\hat S_C(Y_{0j}\mid X_{0j},0)}}{\sum_{i=1}^{n_1}\sum_{j=1}^{n_0} \dfrac{K_{b_n}(x - X_{1i}) K_{b_n}(x - X_{0j})\, I(\delta_{1i}=1,\delta_{0j}=1)}{\hat S_C(Y_{1i}\mid X_{1i},1)\hat S_C(Y_{0j}\mid X_{0j},0)}} - \frac{\sum_{l=1}^{n_0}\sum_{j=1}^{n_0} \dfrac{K_{b_n}(x - X_{0l}) K_{b_n}(x - X_{0j})\, I(\delta_{0l}=1,\delta_{0j}=1)\, I\{L(Y_{0l},Y_{0j})=1\}}{\hat S_C(Y_{0l}\mid X_{0l},0)\hat S_C(Y_{0j}\mid X_{0j},0)}}{\sum_{l=1}^{n_0}\sum_{j=1}^{n_0} \dfrac{K_{b_n}(x - X_{0l}) K_{b_n}(x - X_{0j})\, I(\delta_{0l}=1,\delta_{0j}=1)}{\hat S_C(Y_{0l}\mid X_{0l},0)\hat S_C(Y_{0j}\mid X_{0j},0)}}.$$

In general, the convergence rate of $\hat\Delta(m;x)$ is not affected by the inverse probability of censoring weighting, since fitting a working model for $S_C(t \mid x, a)$ yields a root-$n$ convergence rate for the weights. For example, we can use the Cox proportional hazards model for $S_C(t \mid x, a)$ (20). Let $\lambda_{Ci}(t)$ denote the hazard function of the censoring time for subject $i$, and let $Z$ denote regressors constructed from $X$ and $A$ used for modeling $C$. Under the Cox model, $\lambda_{Ci}(t) = \lambda_{C0}(t)\exp(\beta_C^T Z_{Ci})$, where $\lambda_{C0}(t)$ is the baseline hazard function for censoring. The estimator $\hat\beta_C$ of $\beta_C$ can be obtained by maximizing the partial likelihood, and the Breslow estimator $\hat\Lambda_{C0}(t)$ can be used for the cumulative baseline hazard function $\Lambda_{C0}(t)$. An estimator of $\Lambda_C(t \mid A_i, X_i)$, the cumulative hazard function of the censoring time for subject $i$, is thus $\exp(\hat\beta_C^T Z_{Ci})\hat\Lambda_{C0}(t)$, and an estimator for $S_C(t \mid A_i, X_i)$ is $\hat S_C(t \mid A_i, X_i) = \exp\{-\exp(\hat\beta_C^T Z_{Ci})\hat\Lambda_{C0}(t)\}$.

3.4. Bandwidth selection

As for any nonparametric estimation problem, it is crucial to select an appropriate bandwidth $b_n$. Since the theoretical optimal bandwidth is difficult to derive, we propose a practical empirical method for choosing the bandwidth. If the biomarker is discrete, the bandwidth can simply be chosen as a nonzero value; estimation then uses the information from all subjects in the dataset whose biomarker value equals $x$ when estimating $\hat\Delta(m;x)$. If the biomarker is continuous, we use cross-validation with $\hat\Delta(m)$ as the criterion to choose $b_n$ for $\hat\Delta(m;x)$. The estimated optimal bandwidth minimizes an average squared distance between the observed $\hat\Delta(m)$ and the predicted counterparts from aggregated $\hat\Delta(m;x)$ in the validation samples under a $K$-fold cross-validation setting. We randomly split the data into $K$ disjoint subsets of approximately equal sizes, denoted by $\{I_l, l = 1, \ldots, K\}$. For $X_i$ in each $I_l$, we estimate $\hat\Delta(m; X_i)$ using the data not in $I_l$ with a given $b_n$. We denote this estimator by $\hat\Delta^{(l)}(m; X_i)$. We then calculate $\hat\Delta^{(l)}(m; I_l)$ as

$$\hat\Delta^{(l)}(m; I_l) = \frac{1}{n_l}\sum_{i \in I_l} \hat\Delta^{(l)}(m; X_i),$$

where $n_l$ denotes the number of subjects in $I_l$. $\hat\Delta(m; I_l)$ can be estimated with the data from $I_l$ using (2). We calculate the squared error $\{\hat\Delta(m; I_l) - \hat\Delta^{(l)}(m; I_l)\}^2$ for the $l$th subset. Last, we sum the squared errors over $l = 1, \ldots, K$ and choose $b_n$ to minimize the sum over the $K$ folds.
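The K-fold procedure above can be organized as in the following sketch. Here `est_overall` and `est_personalized` stand in for estimators (2) and the censored version of (3) and are supplied by the user; the function names and the toy stand-in estimators are our own:

```python
import numpy as np

def select_bandwidth(data, bandwidths, est_overall, est_personalized, K=5, seed=0):
    # K-fold cross-validation for b_n. `est_overall(subset)` returns the
    # overall estimate; `est_personalized(train, x_i, b)` returns the
    # personalized estimate at x_i fit on `train`. This routine only
    # organizes the folds and scores each candidate bandwidth.
    n = len(data["x"])
    fold = np.random.default_rng(seed).permutation(n) % K
    scores = []
    for b in bandwidths:
        total = 0.0
        for l in range(K):
            test, train = fold == l, fold != l
            sub = {k: v[train] for k, v in data.items()}
            # Aggregate personalized estimates over the held-out fold...
            pred = np.mean([est_personalized(sub, xi, b)
                            for xi in data["x"][test]])
            # ...and compare with the overall estimate on the held-out fold.
            held = {k: v[test] for k, v in data.items()}
            total += (est_overall(held) - pred) ** 2
        scores.append(total)
    return bandwidths[int(np.argmin(scores))]

# Toy usage with stand-in estimators (kernel-weighted means of y).
data = {"x": np.linspace(-1, 1, 60), "y": np.linspace(-1, 1, 60) ** 2}
overall = lambda d: d["y"].mean()
personalized = lambda d, xi, b: np.average(
    d["y"], weights=np.exp(-0.5 * ((d["x"] - xi) / b) ** 2))
best = select_bandwidth(data, [0.1, 0.5, 2.0], overall, personalized)
print(best in (0.1, 0.5, 2.0))   # True
```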

3.5. Bootstrap methods for constructing confidence intervals

The limiting distribution of $\hat\Delta(m;x)$ is difficult to derive analytically in our case; however, it can be approximated using the bootstrap. It was shown in (21) that the limiting distribution of kernel regression estimates may be closely approximated by a Gaussian process under mild conditions. Although the kernel-type estimator does not achieve root-$n$ consistency, the bootstrap can be shown to consistently approximate the limiting distribution of the personalized net benefit function, using arguments such as those given in (22).

Let $(Y^*, \delta^*, X^*, A^*)$ denote a resample of $(Y, \delta, X, A)$ drawn randomly with replacement. The bootstrap estimate $\hat\Delta^*(m;x)$ is calculated from the resample, in which $(Y, \delta, X, A)$ is replaced by $(Y^*, \delta^*, X^*, A^*)$. We repeat this process $M$ times and obtain $M$ values of $\hat\Delta^*(m;x)$, denoted $\hat\Delta_1^*(m;x), \ldots, \hat\Delta_M^*(m;x)$. We then obtain the bootstrap percentile $(1-\alpha)$ confidence interval as $[\hat\Delta_{(\lfloor \alpha M/2 \rfloor)}^*(m;x), \hat\Delta_{(M - \lfloor \alpha M/2 \rfloor)}^*(m;x)]$, where $\lfloor t \rfloor$ is the largest integer smaller than or equal to $t$. The confidence interval for $\hat\Delta(m)$ can be obtained similarly. Hall et al (2001) showed that there is no general first-order theoretical improvement in the accuracy of bootstrap approximations when using a resampled bandwidth compared to the sample version, while the computational burden is much higher (23). Hence, we recommend not varying the bandwidth across resamples in the implementation.
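The resampling scheme can be sketched generically; the `estimator` argument stands in for any of the estimators above (the function names and the toy difference-of-means illustration are our own):

```python
import numpy as np

def bootstrap_percentile_ci(y, delta, x, a, estimator, M=500, alpha=0.05, seed=0):
    # Percentile bootstrap: resample subjects with replacement, re-estimate,
    # and take empirical quantiles. Any bandwidth inside `estimator` is held
    # fixed across resamples, as recommended in the text.
    rng = np.random.default_rng(seed)
    n = len(y)
    stats = np.empty(M)
    for b in range(M):
        idx = rng.integers(0, n, n)
        stats[b] = estimator(y[idx], delta[idx], x[idx], a[idx])
    stats.sort()
    lo = int(np.floor(alpha * M / 2))
    return stats[lo], stats[M - 1 - lo]

# Toy illustration: the "estimator" is just a difference of arm means.
rng = np.random.default_rng(5)
n = 400
a = rng.integers(0, 2, n)
y = rng.exponential(np.where(a == 1, 10.0, 8.0))
diff = lambda y_, d_, x_, a_: y_[a_ == 1].mean() - y_[a_ == 0].mean()
ci_lo, ci_hi = bootstrap_percentile_ci(y, np.ones(n), np.zeros(n), a, diff)
print(ci_lo < ci_hi)   # True
```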

4. Simulation Studies

We conducted simulation studies to illustrate our proposed method under differing treatment survival models. We generated the treatment variable $A \in \{0, 1\}$ with probability 0.5. The survival times for patients in the control group were generated from a Weibull distribution, which is sufficiently flexible to approximate various survival curves observed in cancers. The Weibull model is given by $S(t) = \exp(-\lambda t^k)$, where $\lambda$ is the scale parameter and $k$ is the shape parameter. For the control arm, we set $\lambda_0 = 0.295$ and $k = 2$. In Scenario 1, survival times in the treatment group followed a Weibull distribution with $k = 2$ and $\lambda_1 = 0.221$; the hazard ratio was constant and equal to 0.75. The proportional hazards assumption does not hold in Scenarios 2 and 3. In Scenario 2, survival times in the treatment group followed a Weibull distribution with $k = 2$ and $\lambda_1$ changed such that the HR was 0.4 between 0 and 4 months, 0.55 between 4 and 8 months, 0.7 between 8 and 12 months, 0.85 between 12 and 16 months, and 1 after 16 months. This corresponds to the case where there is better survival in the treatment arm in the beginning, but the treatment benefit disappears at later times after registration/randomization. In Scenario 3, survival times in the treatment group followed a Weibull distribution with $k = 2$ and $\lambda_1$ such that the hazard ratio was 1 between 0 and 4 months; it decreased to 0.875 between 4 and 8 months, to 0.7 between 8 and 12 months, to 0.5 between 12 and 16 months, and to 0.3 thereafter. This represents the case where there is a delayed effect and the treatment works better at a later stage. Such delayed separation has been observed in recent clinical trials with IOs (24; 25). See Figure 1 for the data generated under Scenarios 1–3.
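Survival times with a piecewise-constant hazard ratio relative to a Weibull baseline, as in Scenarios 2 and 3, can be drawn by inverting the cumulative hazard. The following sketch (function names and sample sizes are our own) reproduces the Scenario 2 configuration:

```python
import numpy as np

def sample_piecewise_hr(n, lam, k, cuts, hrs, rng):
    # Survival times whose hazard equals hr(t) times the Weibull(lam, k)
    # baseline hazard, with hr(t) piecewise constant on intervals split at
    # `cuts`. Uses inversion: solve Lambda(T) = E with E ~ Exp(1), where the
    # baseline cumulative hazard is Lambda_0(t) = lam * t**k.
    bounds = np.concatenate([[0.0], cuts, [np.inf]])
    e = rng.exponential(size=n)
    t = np.empty(n)
    for i, target in enumerate(e):
        acc = 0.0                                 # cumulative hazard so far
        for lo, hi, hr in zip(bounds[:-1], bounds[1:], hrs):
            seg = hr * lam * (hi**k - lo**k) if np.isfinite(hi) else np.inf
            if acc + seg >= target:               # event lands in [lo, hi)
                t[i] = (lo**k + (target - acc) / (hr * lam)) ** (1.0 / k)
                break
            acc += seg
    return t

rng = np.random.default_rng(6)
t0 = sample_piecewise_hr(5000, 0.295, 2.0, [], [1.0], rng)     # control arm
t1 = sample_piecewise_hr(5000, 0.295, 2.0, [4, 8, 12, 16],
                         [0.4, 0.55, 0.7, 0.85, 1.0], rng)     # Scenario 2
print(t1.mean() > t0.mean())   # True: the early benefit lifts treatment survival
```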

Figure 1. Kaplan–Meier curves: left panel, Scenario 1; middle panel, Scenario 2; right panel, Scenario 3.

We then considered cases where there existed a subgroup and the treatment effects could differ between subgroups. We generated a binary covariate X ∈ {0, 1} with equal probabilities. Patients were randomized with equal probabilities to treatment (A = 1) and control (A = 0). In Scenario 4, for patients with X = 0, survival times in both the treatment and control arms were generated from a Weibull distribution with λ0 = λ1 = 0.295 and k = 2. Data for patients with X = 1 were generated according to Scenario 2. In Scenario 5, survival times for patients with X = 0 in the treatment group followed a Weibull distribution with k = 2 and λ1 = 0.266; the hazard ratio was equal to 0.9, so there is a moderate treatment benefit. Survival times for patients with X = 1 were simulated following Scenario 3. In Scenario 6, the survival times for patients with X = 0 were generated using a constant hazard ratio equal to 2.5 (detrimental effect of the experimental treatment), and the survival times for patients with X = 1 were generated using a constant hazard ratio equal to 0.4 (benefit of the experimental treatment). See Figure 2 for the data generated under Scenarios 4–6.

Figure 2. Kaplan–Meier curves: left panel, Scenario 4; middle panel, Scenario 5; right panel, Scenario 6.

Finally, in Scenario 7, we investigated the case of a continuous biomarker, where $X$ was generated from a standard normal distribution. The survival time $T$ is the minimum of $\tau = 50$ and $\tilde T$, which is generated with the hazard rate function

$$\lambda_{\tilde T}(t \mid X, A) = \exp\{1 + (0.25 + X)A\}.$$

In all scenarios, we considered two censoring mechanisms. Under censoring mechanism 1, the censoring time $C$ was generated from a uniform distribution on $[0, 75]$. Under censoring mechanism 2, $C$ depends on the covariate $X$ and is the minimum of 120 and a time generated with hazard rate function $e^{2X}/20$.

A data set including the two treatment groups was generated for each scenario, with a total sample size of 500 or 1000. We compared the proposed method with the method ignoring censoring in estimating the chance of a longer survival with m = 0. The biases and the coverage of the associated 95% confidence intervals over 500 replications are shown in Tables 1 and 2, along with the true adjusted probability of patients living longer on treatment. The confidence intervals were constructed from 500 bootstrap samples. Ignoring censoring could lead to large biases and severe undercoverage in certain situations. The proposed method had much smaller biases and achieved nominal coverage most of the time, regardless of the censoring mechanism, and its performance improved as the sample size increased.

Table 1.

Simulation results for chance of a longer survival for n = 500

Case  Truth  Censoring Type  Bias: Proposed  Bias: Ignore Censoring  95% CI Coverage: Proposed  95% CI Coverage: Ignore Censoring
1 0.071 1 0.004 0.019 0.932 0.896
1 0.071 2 0.006 0.017 0.93 0.916
2 0.055 1 0.005 0.008 0.932 0.938
2 0.055 2 0.006 0.007 0.93 0.95
3 0.161 1 0.004 0.062 0.932 0.414
3 0.161 2 0.005 0.055 0.932 0.522
4 0.041 1 0 0.009 0.95 0.944
4 0.041 2 0 0.012 0.946 0.932
5 0.081 1 0 0.032 0.952 0.796
5 0.081 2 0.001 0.046 0.946 0.644
6 0 1 0.004 0.009 0.924 0.94
6 0 2 0.003 0.043 0.932 0.654
7 0.022 1 0.004 −0.012 0.938 0.942
7 0.022 2 0.012 0.032 0.942 0.735

Table 2.

Simulation results for chance of a longer survival for n = 1000

Case  Truth  Censoring Type  Bias: Proposed  Bias: Ignore Censoring  95% CI Coverage: Proposed  95% CI Coverage: Ignore Censoring
1 0.071 1 0.001 0.02 0.952 0.842
1 0.071 2 0.002 0.019 0.938 0.814
2 0.055 1 0.002 0.008 0.954 0.95
2 0.055 2 0.003 0.009 0.938 0.912
3 0.161 1 0.001 0.063 0.966 0.096
3 0.161 2 0.002 0.058 0.944 0.166
4 0.041 1 0.001 0.009 0.944 0.932
4 0.041 2 0.001 0.011 0.948 0.918
5 0.081 1 0.001 0.032 0.94 0.624
5 0.081 2 0 0.045 0.954 0.362
6 0 1 0.001 0.011 0.934 0.896
6 0 2 0.001 0.045 0.96 0.342
7 0.022 1 0 −0.013 0.964 0.884
7 0.022 2 0.008 0.03 0.914 0.558

We estimated the personalized chance of a longer survival using the proposed method for patients with X = x1 and X = x2, where x1 = 1 and x2 = 0 in Scenarios 1–6, and x1 and x2 were set to the 75th and 25th percentiles of the biomarker X in Scenario 7. The results are presented in Tables 3 and 4. The true personalized chance of living longer was the same for x1 and x2 in Scenarios 1–3, since there were no subgroups present in the data. It differed in Scenarios 4–7, given the interaction between the biomarker and the treatment in the generative model for the survival time. The proposed method performed well, although there was slight undercoverage in some scenarios with n = 500, perhaps due to the limited number of observations with biomarker values in the neighborhood of x1 (x2). This phenomenon disappeared when the sample size was increased to 1000.

Table 3.

Simulation results for personalized chance of a longer survival for x1

Case  Truth  Censoring Type  Bias: n = 500  Bias: n = 1000  95% CI Coverage: n = 500  95% CI Coverage: n = 1000
1 0.071 1 0.007 0.001 0.936 0.966
1 0.071 2 0.012 0.002 0.934 0.944
2 0.055 1 0.008 0.001 0.926 0.962
2 0.055 2 0.013 0.003 0.934 0.958
3 0.161 1 0.007 0 0.916 0.962
3 0.161 2 0.01 0.001 0.934 0.928
4 0.055 1 0.002 0.001 0.948 0.942
4 0.055 2 0.006 0.004 0.934 0.944
5 0.161 1 0.003 0 0.936 0.956
5 0.161 2 0.003 0.003 0.934 0.948
6 0.214 1 0.003 0.001 0.938 0.93
6 0.214 2 0.004 0.001 0.914 0.95
7 −0.089 1 0.023 0.015 0.914 0.932
7 −0.089 2 0.021 0.014 0.935 0.91

Table 4.

Simulation results on personalized chance of a longer survival for x2

Case  Truth  Censoring type  Bias (n = 500)  Bias (n = 1000)  95% CI coverage (n = 500)  95% CI coverage (n = 1000)
1 0.071 1 0.007 0.004 0.946 0.942
1 0.071 2 0.006 0.005 0.94 0.92
2 0.055 1 0.008 0.005 0.952 0.93
2 0.055 2 0.006 0.005 0.946 0.914
3 0.161 1 0.006 0.004 0.938 0.94
3 0.161 2 0.006 0.005 0.946 0.928
4 0.026 1 0.003 0.001 0.94 0.934
4 0.026 2 0.001 0.001 0.944 0.942
5 0 1 0.003 0.001 0.948 0.924
5 0 2 0.001 0.001 0.944 0.942
6 −0.214 1 0.01 0.004 0.944 0.942
6 −0.214 2 0.008 0.003 0.938 0.946
7 0.151 1 0.005 −0.004 0.945 0.97
7 0.151 2 −0.004 −0.007 0.962 0.962

5. Data analysis: SWOG S0819 trial

SWOG S0819 was an open-label, phase 3 study evaluating the addition of cetuximab to standard-of-care chemotherapy (with bevacizumab, as appropriate) as first-line treatment for advanced non-small cell lung cancer (NSCLC). Cetuximab, an EGFR-directed monoclonal antibody, blocks ligand-induced EGFR activation, stimulates receptor internalization, and is capable of inducing antibody-dependent cellular cytotoxicity. The design of this study has been described previously (26) and the results published (27). Prior to the launch of this trial, there was evidence that cetuximab might have broad activity within this population, and it was also hypothesized that the activity might be greater within subsets of patients defined by EGFR expression. The goal of the study was to evaluate whether the addition of cetuximab prolonged progression-free survival (PFS) among patients defined to be EGFR positive by the fluorescence in situ hybridization (FISH) assay; the study also enrolled an unselected population to evaluate the activity in this broader population as a coprimary objective. The primary endpoint for the entire study population was overall survival (OS). The sample size was based on attaining a specific number of EGFR FISH positive patients. In all, 1313 patients were randomized, with 657 patients in the control arm and 656 in the cetuximab-containing arm; 400 EGFR FISH positive patients were enrolled. The study was closed at an interim analysis after meeting the pre-specified rules for futility. PFS was not significantly different in the EGFR FISH positive group, and OS did not differ in the entire study population (HR 0.93, 95% CI 0.83–1.04; p = 0.22; median 10.9 months [95% CI 9.5–12.0] vs 9.2 months [8.7–10.3]). In addition to the evaluation within EGFR FISH positive patients, a pre-specified subgroup analysis motivated by the results of the SQUIRE trial (28) evaluated OS among EGFR FISH positive patients with squamous histology. This analysis found a benefit within this subgroup (HR 0.58, 95% CI 0.36–0.86; p = 0.0071). Overall survival did not differ significantly in the other subgroups.

We applied the proposed measure to data from the overall trial. We considered four subgroups of interest based on histology type and FISH status: x = (1, 1) indicating FISH positive with squamous cell histology (111 patients), x = (1, 0) indicating FISH positive with non-squamous cell histology (289 patients), x = (0, 1) indicating FISH negative with squamous cell histology (210 patients), and x = (0, 0) indicating FISH negative with non-squamous cell histology (703 patients). We present the chance and personalized chance of a longer survival in Figure 3. The chance of a longer survival in the overall population is around 0 across all values of m, suggesting that the probability of living longer on treatment is not any better than on control. However, patients with FISH positive status and squamous cell histology benefit substantially from the treatment: the probability of living at least 1 year longer on treatment is close to 10%. The 95% confidence intervals, calculated via the bootstrap method described earlier, are shown in Table 5. We observe that as m increases, the confidence intervals narrow. This is because both components of Δ(m) = P{T(1) > T(0) + m} − P{T̃(0) > T(0) + m} decrease as m increases, as do the variances of the estimated probabilities. In addition, when m is large enough, Δ(m) is effectively equal to 0: no pair of observations in the empirical dataset differs by m months, and the bootstrap estimate of the confidence interval degenerates to (0, 0). This is a practical limitation of empirical estimation of confidence intervals with large m and limited follow-up.
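The overall quantity Δ(m) and its percentile-bootstrap interval can be sketched as below for uncensored data. `chance_longer` compares all treated/control pairs for the first term and all distinct control/control pairs for the second (the independent-copy term T̃(0)); the function names and the default of 500 resamples are illustrative choices rather than the paper's exact implementation, which handles censoring via inverse probability of censoring weighting.

```python
import numpy as np

def chance_longer(y_trt, y_ctl, m):
    """Empirical Delta(m) = P{T(1) > T(0) + m} - P{T~(0) > T(0) + m},
    uncensored sketch: the second term compares the control arm with an
    independent copy of itself (all ordered pairs, self-pairs excluded)."""
    p_trt = np.mean(y_trt[:, None] > y_ctl[None, :] + m)
    ctl_pairs = y_ctl[:, None] > y_ctl[None, :] + m
    np.fill_diagonal(ctl_pairs, False)
    n0 = len(y_ctl)
    p_ctl = ctl_pairs.sum() / (n0 * (n0 - 1))
    return p_trt - p_ctl

def bootstrap_ci(y_trt, y_ctl, m, B=500, seed=1):
    """Percentile bootstrap CI, resampling each arm independently."""
    rng = np.random.default_rng(seed)
    stats = [chance_longer(rng.choice(y_trt, y_trt.size),
                           rng.choice(y_ctl, y_ctl.size), m)
             for _ in range(B)]
    return np.percentile(stats, [2.5, 97.5])
```

When m exceeds the largest observed pairwise difference, every resampled statistic is identically 0 and the interval collapses to (0, 0), which is exactly the large-m degeneracy noted in the paragraph above.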

Figure 3.

Chance and personalized chance of a longer survival in S0819

Table 5.

95% confidence intervals for chance and personalized chance of a longer survival in S0819

m Δ(m) Δ(m; x = (1, 1)) Δ(m; x = (1, 0)) Δ(m; x = (0, 1)) Δ(m; x = (0, 0))
0 (−0.056,0.057) (0,0.375) (−0.082,0.155) (−0.131,0.126) (−0.085,0.059)
3 (−0.063,0.058) (−0.065,0.368) (−0.101,0.154) (−0.16,0.115) (−0.098,0.068)
6 (−0.067,0.06) (−0.058,0.391) (−0.092,0.156) (−0.158,0.126) (−0.105,0.069)
9 (−0.071,0.056) (−0.083,0.346) (−0.092,0.146) (−0.13,0.137) (−0.122,0.054)
12 (−0.076,0.05) (−0.083,0.316) (−0.085,0.134) (−0.135,0.129) (−0.133,0.033)
15 (−0.079,0.041) (−0.062,0.279) (−0.088,0.116) (−0.116,0.138) (−0.137,0.024)
18 (−0.081,0.034) (−0.045,0.258) (−0.077,0.098) (−0.112,0.131) (−0.131,0.016)
21 (−0.074,0.032) (−0.037,0.204) (−0.075,0.076) (−0.07,0.135) (−0.126,0.018)
24 (−0.068,0.032) (−0.026,0.178) (−0.074,0.045) (−0.053,0.143) (−0.116,0.02)
27 (−0.063,0.031) (0,0.153) (−0.07,0.016) (−0.023,0.131) (−0.103,0.029)
30 (−0.056,0.027) (0,0.099) (−0.055,0.01) (−0.011,0.127) (−0.088,0.033)
33 (−0.047,0.026) (0,0.038) (−0.039,−0.003) (−0.005,0.104) (−0.073,0.037)
36 (−0.039,0.03) (0,0) (−0.021,0) (0,0.093) (−0.065,0.037)

6. Discussion

In this paper, we proposed new measures characterizing treatment benefit that are intuitive to understand and can be easily communicated to clinicians and practitioners. The overall measure, the chance of a longer survival, describes the treatment effect on the whole patient population; the personalized measure, the personalized chance of a longer survival, takes individual characteristics into account. Compared with usual summary measures such as the hazard ratio, the proposed quantities can be interpreted even when the proportional hazards assumption is violated.

As an alternative, the restricted mean survival time (RMST) has been used to characterize treatment effects in settings where the proportional hazards assumption does not hold. It can be estimated as the area between the two treatment-arm survival curves up to a fixed time point; hence, its interpretation varies with the chosen end time point. In contrast, the new method estimates the probability of living longer, or of living at least m months longer, which may be a more direct and natural way of describing a patient's expected experience. We therefore think it more directly encodes the probabilistic part of treatment effect estimation, and it likely also makes more sense to a physician attempting to explain the relative efficacy of a new therapy.
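For contrast, the RMST difference described above is the area between the two survival curves up to a horizon τ; absent censoring, the Kaplan-Meier curve reduces to the empirical survivor function, so the RMST is simply a mean of τ-truncated survival times. The toy numbers below are invented purely to show how the estimated contrast moves with the choice of τ.

```python
import numpy as np

def rmst(times, tau):
    """Restricted mean survival time up to tau (no-censoring sketch):
    the area under the empirical survival curve equals the mean of the
    survival times truncated at tau."""
    return float(np.mean(np.minimum(times, tau)))

t_trt = np.array([12.0, 18.0, 30.0, 7.0])   # invented survival times (months)
t_ctl = np.array([10.0, 9.0, 24.0, 6.0])
print(rmst(t_trt, 24) - rmst(t_ctl, 24))    # 3.0
print(rmst(t_trt, 12) - rmst(t_ctl, 12))    # 1.5: the contrast shifts with tau
```

This dependence on the truncation point τ is the interpretational drawback noted above.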

Along with the new measures, we also presented estimation and inference tools that are practically useful and theoretically valid. Note that even if we are only interested in the overall measure, the personalized measures remain valuable, since aggregating them over the distribution of X yields an estimate of the chance of a longer survival. The proposed work should prove to be a useful tool that aids in the analysis and interpretation of treatment benefits in clinical trials.

There are two important future directions that we are pursuing. First, we would like to develop methods for power analysis and sample size calculation, which are important in clinical trial design. Although there are connections with hazard ratios in some situations where existing tools can be applied directly, a more general procedure is needed to plan a sample size sufficient to guarantee that the personalized chance of a longer survival on treatment is estimated with the desired accuracy. In this case, the goal is to determine a minimal sample size such that

P\{|\hat{\Delta}(m;x)-\Delta(m;x)|\le d\}\ge 1-\alpha,

where α ∈ (0, 1) and d > 0. We will need to derive the asymptotic distribution of \hat{\Delta}(m;x). As shown in (29), the kernel estimators are asymptotically normal, and the variance of \hat{\Delta}(m;x) is of order (n b_n)^{-1}. Assume that \mathrm{var}\{\hat{\Delta}(m;x)\} = C/(n b_n) and b_n = n^{-r}. Then the minimum sample size is

\left(\frac{C\,z_{1-\alpha/2}^2}{d^2}\right)^{1/(1-r)},

where z_{1-α/2} denotes the (1 − α/2) quantile of the standard normal distribution. Even if these calculations do not drive the primary sample size of a Phase III trial design, which may be based on proportional hazards testing, the approximate variance calculations above will be useful. They can be used to assess the prospective statistical properties of personalized chance of living longer estimates from such a Phase III study when these are co-primary or key secondary objectives of the trial design. Second, individualized treatment rules (30; 31), which tailor treatment decisions through a function from patient characteristics to a recommended treatment, have become increasingly popular in recent years. Usually, such rules are constructed by maximizing the mean of a pre-specified clinical outcome when applied to recommend treatments in a population of interest. It would be interesting to develop treatment rules that use the proposed measure as the criterion.
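Under the working assumptions stated above (var{Δ̂(m; x)} = C/(n b_n) with b_n = n^{−r}), the minimum sample size formula can be evaluated directly; the function name is ours, and in practice C would come from a pilot estimate of the variance constant.

```python
import math
from statistics import NormalDist

def min_sample_size(C, d, alpha, r):
    """Smallest n with z_{1-alpha/2} * sqrt(C * n**(r - 1)) <= d, i.e.
    n >= (C * z**2 / d**2) ** (1 / (1 - r))."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return math.ceil((C * z ** 2 / d ** 2) ** (1.0 / (1.0 - r)))

print(min_sample_size(C=1.0, d=0.1, alpha=0.05, r=0.0))  # 385: the usual z^2/d^2 rule
```

Because the exponent 1/(1 − r) exceeds 1 for any bandwidth rate r ∈ (0, 1), nonparametric smoothing inflates the required n relative to the parametric z²/d² rule recovered at r = 0.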

Acknowledgement

The authors thank John Crowley for the helpful discussion. The authors gratefully acknowledge support by R01DK108073, U10CA180819, P30 CA015704, S10 OD020069 awarded by the National Institutes of Health, and Coltman Early Career Fellowship awarded by Hope Foundation.

Appendix. Proof of Proposition 3.1.

Let \hat{\Delta}(m;x) = \hat{g}_1(x) - \hat{g}_2(x), where \hat{g}_1(x) and \hat{g}_2(x) correspond to the two terms in (3). We first consider the bias of \hat{g}_1(x) relative to g_1(x) = g_1(x, x), where g_1(x_1, x_0) = P(Y_1 - Y_0 > m \mid X_1 = x_1, X_0 = x_0).

E\left[\frac{1}{n_1 n_0}\sum_{i=1}^{n_1}\sum_{j=1}^{n_0}K_{b_n}(x-X_{1i})K_{b_n}(x-X_{0j})L(Y_{1i},Y_{0j})\right]
=\int I(y_1-y_0>m)\,\frac{1}{b_n^{p}}K\left(\frac{x-u_1}{b_n}\right)\frac{1}{b_n^{p}}K\left(\frac{x-u_0}{b_n}\right)f(u_1,u_0,y_1,y_0)\,du_1\,du_0\,dy_1\,dy_0
=\int I(y_1-y_0>m)\,K(s_1)K(s_0)\,f(x-b_n s_1,x-b_n s_0,y_1,y_0)\,ds_1\,ds_0\,dy_1\,dy_0, \quad (4)

using the change of variables s_1 = (x-u_1)/b_n and s_0 = (x-u_0)/b_n. Note that f(x-b_n s_1, x-b_n s_0, y_1, y_0) = f(y_1, y_0 \mid x-b_n s_1, x-b_n s_0)\, f(x-b_n s_1, x-b_n s_0). The above quantity is equal to

\int K(s_1)K(s_0)\,f(x-b_n s_1,x-b_n s_0)\left\{\int I(y_1-y_0>m)\,f(y_1,y_0\mid x-b_n s_1,x-b_n s_0)\,dy_1\,dy_0\right\}ds_1\,ds_0
=\int K(s_1)K(s_0)\,f(x-b_n s_1,x-b_n s_0)\,g_1(x-b_n s_1,x-b_n s_0)\,ds_1\,ds_0.

Using Taylor series expansions for f(x-b_n s_1, x-b_n s_0) and g_1(x-b_n s_1, x-b_n s_0), we further obtain that (4) equals

f(x,x)g_1(x)+\frac{b_n^2\sigma_K^2}{2}\Big\{f_{x_1}^{(1)}(x,x)g_{1,x_1}^{(1)}(x,x)+f_{x_2}^{(1)}(x,x)g_{1,x_2}^{(1)}(x,x)+f_{x_1}^{(1)}(x,x)g_{1,x_2}^{(1)}(x,x)+f_{x_2}^{(1)}(x,x)g_{1,x_1}^{(1)}(x,x)+f(x,x)g_{1,x_1}^{(2)}(x,x)+f(x,x)g_{1,x_2}^{(2)}(x,x)+f(x,x)g_{1,x_1}^{(1)}(x,x)g_{1,x_2}^{(1)}(x,x)+f_{x_1}^{(2)}(x,x)g_1(x)+f_{x_2}^{(2)}(x,x)g_1(x)+2f_{x_1}^{(1)}(x,x)f_{x_2}^{(1)}(x,x)g_1(x)\Big\}+o(b_n^2),

where \sigma_K^2 = \int u^2 K(u)\,du. Similarly,

E\left[\frac{1}{n_1 n_0}\sum_{i=1}^{n_1}\sum_{j=1}^{n_0}K_{b_n}(x-X_{1i})K_{b_n}(x-X_{0j})\right]=f(x,x)+\frac{b_n^2\sigma_K^2}{2}\left\{f_{x_1}^{(2)}(x,x)+f_{x_2}^{(2)}(x,x)+2f_{x_1}^{(1)}(x,x)f_{x_2}^{(1)}(x,x)\right\}.

Therefore,

E\{\hat{g}_1(x)\}\approx\frac{f(x,x)\left[g_1(x)+\frac{b_n^2\sigma_K^2}{2}\{(*)\}\right]}{f(x,x)\left[1+\frac{b_n^2\sigma_K^2}{2}\left\{f_{x_1}^{(2)}+f_{x_2}^{(2)}+2f_{x_1}^{(1)}f_{x_2}^{(1)}\right\}\big/f\right]}\approx g_1(x)+\frac{b_n^2\sigma_K^2}{2}M(x),

where \{(*)\} denotes the braced terms in the preceding display and

M(x)=f_{x_1}^{(1)}(x,x)g_{1,x_1}^{(1)}(x,x)+f_{x_2}^{(1)}(x,x)g_{1,x_2}^{(1)}(x,x)+f_{x_1}^{(1)}(x,x)g_{1,x_2}^{(1)}(x,x)+f_{x_2}^{(1)}(x,x)g_{1,x_1}^{(1)}(x,x)+f(x,x)g_{1,x_1}^{(2)}(x,x)+f(x,x)g_{1,x_2}^{(2)}(x,x)+f(x,x)g_{1,x_1}^{(1)}(x,x)g_{1,x_2}^{(1)}(x,x),

using the approximation (1 + b_n^2 c)^{-1} \approx 1 - b_n^2 c for small b_n. Hence, the result on the bias of \hat{g}_1(x) follows.

Now,

\mathrm{Var}\left[\frac{1}{n_1 n_0}\sum_{i=1}^{n_1}\sum_{j=1}^{n_0}K_{b_n}(x-X_{1i})K_{b_n}(x-X_{0j})L(Y_{1i},Y_{0j})\right]
=\frac{1}{n_1 n_0}E\left[\left\{K_{b_n}(x-X_{1i})K_{b_n}(x-X_{0j})L(Y_{1i},Y_{0j})\right\}^2\right]-O(n^{-1})
\approx\frac{R(K)^2 f(x,x)}{n_0 n_1 b_n^{2p}}\left\{g_1(x)^2+v_1(x)\right\},

where R(K) = \int K(u)^2\,du and v_1(x) = \mathrm{var}\{L(Y_1, Y_0) \mid X_1 = x, X_0 = x\}. Also,

\mathrm{Var}\left[\frac{1}{n_1 n_0}\sum_{i=1}^{n_1}\sum_{j=1}^{n_0}K_{b_n}(x-X_{1i})K_{b_n}(x-X_{0j})\right]\approx\frac{R(K)^2 f(x,x)}{n_0 n_1 b_n^{2p}}.

Finally,

\mathrm{Cov}\left[\frac{1}{n_1 n_0}\sum_{i=1}^{n_1}\sum_{j=1}^{n_0}K_{b_n}(x-X_{1i})K_{b_n}(x-X_{0j}),\ \frac{1}{n_1 n_0}\sum_{i=1}^{n_1}\sum_{j=1}^{n_0}K_{b_n}(x-X_{1i})K_{b_n}(x-X_{0j})L(Y_{1i},Y_{0j})\right]\approx\frac{R(K)^2 f(x,x)g_1(x)}{n_0 n_1 b_n^{2p}}.

Hence, using the formula for the variance of the ratio of two random variables, together with the fact that n_0 n_1 \approx n^2/4 under approximately balanced randomization,

\mathrm{Var}\{\hat{g}_1(x)\}\approx\frac{R(K)^2 v_1(x)}{n_0 n_1 b_n^{2p}}\approx\frac{4R(K)^2 v_1(x)}{n^2 b_n^{2p}}.

The desired results thus follow. □

References

  • [1].Hattori S, Henmi M. Estimation of treatment effects based on possibly misspecified Cox regression. Lifetime Data Analysis 2012; 18(4):408–433. [DOI] [PubMed] [Google Scholar]
  • [2].Chen TT. Milestone survival: a potential intermediate endpoint for immune checkpoint inhibitors. JNCI: Journal of the National Cancer Institute 2015; 107(9). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [3].Klein JP, Logan B, Harhoff M, Andersen PK. Analyzing survival curves at a fixed point in time. Statistics in Medicine 2007; 26(24):4505–4519. [DOI] [PubMed] [Google Scholar]
  • [4].Zhao L, Tian L, Uno H, Solomon SD, Pfeffer MA, Schindler JS, Wei LJ. Utilizing the integrated difference of two survival functions to quantify the treatment contrast for designing, monitoring, and analyzing a comparative clinical study. Clinical Trials 2012; 9(5):570–577. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [5].Trinquart L, Jacot J, Conner SC, Porcher R. Comparison of treatment effects measured by the hazard ratio and by the ratio of restricted mean survival times in oncology randomized controlled trials. Journal of Clinical Oncology 2016; 34(15):1813–1819. [DOI] [PubMed] [Google Scholar]
  • [6].Weinstein JN, Collisson EA, Mills GB, Shaw KRM, Ozenberger BA, Ellrott K, Shmulevich I, Sander C, Stuart JM, Network CGAR, et al. The cancer genome atlas pan-cancer analysis project. Nature genetics 2013; 45(10):1113. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [7].Reck M, Rodriguez-Abreu D, Robinson AG, Hui R, Csoszi T, Fulop A, Gottfried M, Peled N, Tafreshi A, Cuffe S, et al. Pembrolizumab versus chemotherapy for pd-l1–positive non–small-cell lung cancer. New England Journal of Medicine 2016; 375(19):1823–1833. [DOI] [PubMed] [Google Scholar]
  • [8].Carbone DP, Reck M, Paz-Ares L, Creelan B, Horn L, Steins M, Felip E, van den Heuvel MM, Ciuleanu TE, Badin F, et al. First-line nivolumab in stage iv or recurrent non–small-cell lung cancer. New England Journal of Medicine 2017; 376(25):2415–2426. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [9].Goldberg Y, Kosorok MR. Q-learning with censored data. Annals of statistics 2012; 40(1):529. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [10].Zhao YQ, Zeng D, Laber EB, Song R, Yuan M, Kosorok MR. Doubly robust learning for estimating individualized treatment with censored data. Biometrika 2014; 102(1):151–168. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [11].Shen J, Wang L, Daignault S, Spratt DE, Morgan TM, Taylor JM. Estimating the optimal personalized treatment strategy based on selected variables to prolong survival via random survival forest with weighted bootstrap. Journal of biopharmaceutical statistics 2018; 28(2):362–381. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [12].Tian L, Alizadeh AA, Gentles AJ, Tibshirani R. A simple method for estimating interactions between a treatment and a large number of covariates. Journal of the American Statistical Association 2014; 109(508):1517–1532. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [13].Henderson NC, Louis TA, Rosner GL, Varadhan R. Individualized treatment effects with censored data via fully nonparametric bayesian accelerated failure time models. Biostatistics 2018. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [14].Ellis LM, Bernstein DS, Voest EE, Berlin JD, Sargent D, Cortazar P, Garrett-Mayer E, Herbst RS, Lilenbaum RC, Sima C, et al. American society of clinical oncology perspective: raising the bar for clinical trials by defining clinically meaningful outcomes. J Clin Oncol 2014; 32(12):1277–1280. [DOI] [PubMed] [Google Scholar]
  • [15].Peron J, Roy P, Ozenne B, Roche L, Buyse M. The net chance of a longer survival as a patient-oriented measure of treatment benefit in randomized clinical trials. JAMA Oncology 2016; 2(7):901–905. [DOI] [PubMed] [Google Scholar]
  • [16].Buyse M Generalized pairwise comparisons of prioritized outcomes in the two-sample problem. Statistics in Medicine 2010; 29(30):3245–3257. [DOI] [PubMed] [Google Scholar]
  • [17].Rubin DB. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 1974; 66:688–701. [Google Scholar]
  • [18].Holland PW. Statistics and causal inference. Journal of the American statistical Association 1986; 81(396):945–960. [DOI] [PubMed] [Google Scholar]
  • [19].Robins JM, Hernán MÁ, Brumback B. Marginal structural models and causal inference in epidemiology. Epidemiology 2000; 11(5):550–560. [DOI] [PubMed] [Google Scholar]
  • [20].Cox DR. Regression models and life-tables. Journal of the Royal Statistical Society. Series B 1972; 34:187–220. [Google Scholar]
  • [21].Mack YP, Silverman BW. Weak and strong uniform consistency of kernel regression estimates. Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete 1982; 61(3):405–415. [Google Scholar]
  • [22].Hall P On convergence rates of suprema. Probability Theory and Related Fields 1991; 89(4):447–455. [Google Scholar]
  • [23].Hall P, Kang KH. Bootstrapping nonparametric density estimators with empirically chosen bandwidths. Annals of Statistics 2001; 1443–1468. [Google Scholar]
  • [24].Borghaei H, Paz-Ares L, Horn L, Spigel DR, Steins M, Ready NE, Chow LQ, Vokes EE, Felip E, Holgado E, et al. Nivolumab versus docetaxel in advanced nonsquamous non–small-cell lung cancer. New England Journal of Medicine 2015; 373(17):1627–1639. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [25].Rittmeyer A, Barlesi F, Waterkamp D, Park K, Ciardiello F, Von Pawel J, Gadgeel SM, Hida T, Kowalski DM, Dols MC, et al. Atezolizumab versus docetaxel in patients with previously treated non-small-cell lung cancer (OAK): a phase 3, open-label, multicentre randomised controlled trial. The Lancet 2017; 389(10066):255–265. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [26].Redman MW, Crowley JJ, Herbst RS, Hirsch FR, Gandara DR. Design of a phase III clinical trial with prospective biomarker validation: SWOG S0819. Clinical Cancer Research 2012; 18(15):4004–4012. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [27].Herbst RS, Redman MW, Kim ES, Semrad TJ, Bazhenova L, Masters G, Oettel K, Guaglianone P, Reynolds C, Karnad A, et al. Cetuximab plus carboplatin and paclitaxel with or without bevacizumab versus carboplatin and paclitaxel with or without bevacizumab in advanced NSCLC (SWOG S0819): a randomised, phase 3 study. The Lancet Oncology 2018; 19(1):101–114. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [28].Thatcher N, Hirsch FR, Luft AV, Szczesna A, Ciuleanu TE, Dediu M, Ramlau R, Galiulin RK, Balint B, Losonczy G, et al. Necitumumab plus gemcitabine and cisplatin versus gemcitabine and cisplatin alone as first-line therapy in patients with stage iv squamous non-small-cell lung cancer (squire): an open-label, randomised, controlled phase 3 trial. The Lancet Oncology 2015; 16(7):763–774. [DOI] [PubMed] [Google Scholar]
  • [29].Schuster EF, et al. Joint asymptotic distribution of the estimated regression function at a finite number of distinct points. The Annals of Mathematical Statistics 1972; 43(1):84–88. [Google Scholar]
  • [30].Qian M, Murphy SA. Performance guarantees for individualized treatment rules. The Annals of Statistics 2011; 39:1180–1210. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [31].Zhao YQ, Zeng D, Rush AJ, Kosorok MR. Estimating individualized treatment rules using outcome weighted learning. Journal of American Statistical Association 2012; 107:1106–1118. [DOI] [PMC free article] [PubMed] [Google Scholar]
