Abstract
To evaluate the totality of one treatment’s benefit/risk profile relative to an alternative treatment in a longitudinal comparative clinical study, the timing and occurrence of multiple clinical events are typically collected during the patient’s follow-up. These multiple observations reflect the patient’s disease progression/burden over time. The standard practice is to create a composite endpoint from the multiple outcomes, namely the time to the first occurrence of any of the clinical events, and to evaluate the treatment via standard survival analysis techniques. Because it ignores all events after the first, this type of assessment may not be ideal. Various parametric or semiparametric procedures have been extensively discussed in the literature for analyzing multiple event-time data. Many existing methods were developed under extensive model assumptions. When the model assumptions are not plausible, the resulting inferences for the treatment effect may be misleading. In this article, we propose a simple, nonparametric inference procedure to quantify the treatment effect, which has an intuitive, clinically meaningful interpretation. We use the data from a cardiovascular clinical trial for heart failure to illustrate the procedure. A simulation study is also conducted to evaluate the performance of the new proposal.
Keywords: clinical trials, composite endpoint, counting process, multiple outcomes, nonparametric estimation, survival analysis, Wei-Lin-Weissfeld procedure
1 ∣. INTRODUCTION
In a longitudinal clinical study, each patient may experience any of several clinical events at various time points during the follow-up period. Such multiple event-time observations provide a temporal profile of the patients’ disease burden or progression. An important question is how to utilize these observations collectively, for instance, to evaluate a new therapy versus the standard care from a risk-benefit perspective. A common practice is to consider either the time from enrollment or randomization to a specific event or to the first occurrence of one of a collection of prespecified clinical events as the study’s primary endpoint and to then analyze such data using standard inference procedures from survival analysis. Such approaches, however, may not utilize all relevant information to fully answer the clinical question of interest.
As an example, a randomized comparative clinical trial, the “Beta-Blocker Evaluation of Survival Trial” (BEST), was conducted to evaluate whether the beta-blocking drug, bucindolol, would benefit patients with advanced chronic heart failure (BEST, 2001). For this study, 2708 patients were enrolled, randomized to receive either placebo or the beta blocker, and then followed for an average of two years. The patient’s overall survival time was chosen as the primary endpoint of the study. For the comparison of the two treatment groups, the p-value of the two-sample logrank test was 0.11 with a 0.95 confidence interval for the hazard ratio of (0.78, 1.02), numerically, but not significantly, in favor of the beta blocker. Although mortality is an important endpoint, an evaluation of the beta blocker’s benefits and risks should also include morbidity for chronic heart failure patients over the course of the study. Clinically important morbidity events for these patients are, for instance, hospitalization for worsening heart failure (WHF), nonheart failure hospitalization (NHFH), myocardial infarction (MI), and heart transplant (HT). The BEST study is a typical cardiovascular trial for which the times to nonfatal events prior to a terminal event (eg, death) can be potentially observed for each patient. If we follow the conventional approach using a composite endpoint, ie, the time of the first occurrence of any of the aforementioned five distinct events, the resulting Kaplan-Meier curves for the two arms are given in Figure 1. The 0.95 confidence interval for the hazard ratio is (0.85, 1.02) and the p-value of the logrank test is 0.10. Furthermore, if we consider the distribution of each specific component event, it is apparent that the composite event is more often an occurrence of NHFH and less often WHF in the bucindolol arm (Table 1), even though each of these types of events occurs in fewer patients randomized to bucindolol than in patients randomized to placebo. As with the mortality analysis, there is only modest statistical evidence of benefit for the beta blocker in this population with respect to this composite outcome.
FIGURE 1.
Time to first occurrence of worsening heart failure, nonheart failure hospitalization, heart transplant, myocardial infarction, or death [Colour figure can be viewed at wileyonlinelibrary.com]
TABLE 1.
Total number of patients experiencing each type of event, and specific type of clinical events represented by the composite outcome, by treatment group
| Event type | Placebo | Bucindolol |
|---|---|---|
| Worsening HF | 569 (42%) | 476 (35%) |
| Non-HF Hosp | 634 (47%) | 619 (46%) |
| Death | 449 (33%) | 411 (30%) |
| MI | 85 (6%) | 46 (3%) |
| Transplant | 41 (3%) | 29 (2%) |
| Composite | 971 (72%) | 931 (69%) |
| Composite event type | | |
| Worsening HF | 393 (40%) | 341 (37%) |
| Non-HF Hosp | 445 (46%) | 466 (50%) |
| Death | 99 (10%) | 103 (11%) |
| MI | 32 (3%) | 18 (2%) |
| Transplant | 2 (<1%) | 3 (<1%) |
| Total | 971 | 931 |
Abbreviations: HF, heart failure.
If the clinical questions regarding the risks and benefits of bucindolol extend beyond the simple analysis of mortality or of the occurrence of the first composite event, several novel statistical procedures for comparing two groups may be used to analyze such multiple event time observations.1-8 These methods generally utilize model-based parameters to quantify the between-group difference. As is the case with univariate survival analysis, when the model assumptions are not plausible, the resulting estimates for the parameters may be difficult to interpret clinically.9-14
In this article, in order to include both mortality and morbidity events beyond the first composite endpoint, we consider the patient’s endpoint based on a “reverse” counting process, R(t) over time t, which provides the profile of the multiple event times that comprise the aforementioned composite outcome. For example, with the aforementioned five clinical events, ie, WHF, NHFH, MI, HT, and death, in the BEST study, Figure 2 shows several realizations of the R(·) process. Each realization is a downward step function starting with a y-axis value of five, the number of distinct types of event under consideration. At the time of an occurrence of a nonterminal (nonfatal) event, R(·) drops by one unit, but at the time of the terminal event, R(·) drops to zero. At a specific time t, R(t) represents the number of the component events not yet experienced at time t. The area under this step function up to time t, A(t), is the sum of the five event-free survival times up to t. For example, for the first realization of R(·) in Figure 2, the observed A(48) is 118 (months). That is, this patient enjoyed 10 months of HF-free survival, 18 months of MI-free survival, 30 months of HT-free survival, 30 months of NHFH-free survival, and 30 months of overall survival. The cumulative total of these is 118 months of event-free survival. Noting that the ideal case of a patient without any increase in disease burden over the study period of interest (ie, here, 48 months) would correspond to A(48) = 240 (= 48 × 5) months, this particular patient experienced 49% (= 118/240) of the maximum possible cumulative event-free survival, or conversely, 51% (= 1 − 118/240) of the maximum possible disease burden over this time period, as measured by this combination of morbidity and mortality. The values A(t) or the aforementioned ratio, P(t), eg, would be clinically meaningful summaries for the temporal profile of patient health with regard to these multiple event times up to time t. Note that for the second realization in Figure 2, only one of the five outcomes is observed prior to the patient’s censoring at month 30. For this patient, A(48) is not fully observed, but the available partial information indicates that A(48) ∈ (140, 212] months (140 if the patient died the day after censoring; 212 if the patient experienced no subsequent events until month 48) and P(48) is between 0.12 and 0.42. It is important to note that in the presence of a terminal event such as death, the standard forward counting process as the patient’s endpoint is problematic, since this process is not well defined after death.
FIGURE 2.
Profile of observed data from three hypothetical patients from randomization to end of follow-up. HF, heart failure; MI, myocardial infarction [Colour figure can be viewed at wileyonlinelibrary.com]
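To make the construction of R(t), A(t), and P(t) concrete, the short sketch below (in Python; the function name and data layout are ours and purely illustrative) reproduces the R(48), A(48), and P(48) calculation for the first hypothetical patient of Figure 2.

```python
import numpy as np

def reverse_counting_summary(nonfatal_times, death_time, t, n_types=5):
    """Compute R(t), A(t), and P(t) for a single patient.

    nonfatal_times : dict mapping each nonfatal event type (0..n_types-2) to the
                     time of its first occurrence; omit types that never occurred.
    death_time     : time of the terminal event (np.inf if death was not observed).
    t              : truncation time of interest (eg, 48 months).
    n_types        : total number of distinct event types, K + 1.
    """
    # T_k = min(first occurrence of event k, time of the terminal event)
    t_k = [min(nonfatal_times.get(k, np.inf), death_time) for k in range(n_types - 1)]
    t_k.append(death_time)                      # the terminal event itself

    r_t = sum(tk > t for tk in t_k)             # number of events not yet experienced at t
    a_t = sum(min(tk, t) for tk in t_k)         # A(t): total event-free survival up to t
    p_t = 1.0 - a_t / (n_types * t)             # P(t): proportion of maximum disease burden
    return r_t, a_t, p_t

# First hypothetical patient of Figure 2: WHF at month 10, MI at month 18,
# death at month 30; HT and NHFH never occurred before death.
print(reverse_counting_summary({0: 10, 1: 18}, death_time=30, t=48))
# -> (0, 118, 0.508...), matching the 118 months and roughly 51% burden in the text
```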
For the comparison of two groups, the difference or ratio of the two expected values E(R(t)), E(A(t)), or E(P(t)) is a clinically interpretable, model-free summary to quantify the between-group contrast. In this paper, we present inference procedures for handling one- and two-sample problems. All of the proposals are illustrated with the data from the BEST study. Note that for the case with a single event time observation for each patient, E(R(t)) reduces to the survival rate S(t) and E(A(t)) is the so-called restricted mean survival time at time t, which has been extensively studied.13,15-21 Furthermore, classical methods either treat the terminal events as censoring,22 or extend the counting process of nonfatal and terminal events by assuming that there is no nonfatal event after death.6,23,24 The validity of the former approach relies on the non-informative censoring assumption, which unrealistically assumes that the terminal event is independent of the nonfatal events during the follow-up. In addition, this approach blurs an important distinction between censoring and terminal events, ie, the patient’s history exists and is still potentially observable after censoring but genuinely terminates at the terminal event. The latter approaches do not differentiate nonfatal events from fatal (ie, terminal) events, even though the latter are often of greater clinical concern, and may therefore yield misleading comparisons. For example, a significantly lower overall incidence rate of events in one arm may result from a higher early mortality rate. Ignoring such a mortality difference may lead to a false conclusion of a beneficial effect of the therapy under investigation. Other methods25,26 explicitly model the joint distribution of nonfatal and terminal events and produce estimators of the treatment effect on nonfatal and fatal events separately. These methods are heavily model dependent, and it is difficult to combine the two estimates into a single summary in a setting of binary decision making. A semiparametric model was recently proposed27 for a composite outcome based on prespecified weights for different types of events, relying on an assumption of multiplicative effects on the marginal rate function.
2 ∣. ONE- AND TWO-SAMPLE INFERENCE PROCEDURES
Suppose that for each study subject, there are (K + 1) distinct types of events of interest, which can be potentially observed during the study follow-up. In addition, assume that the (K + 1)th event is the only terminal event. Let R(·) be the reverse counting process described in the Introduction with respect to these K + 1 events. In this section, we are interested in making inferences about the parameters E(R)(·), E(A)(·), and E(P)(·). Now, let $T_k$, k = 1, … , K + 1, be the minimum of $T^*_k$ and $T^*_{K+1}$, where $T^*_k$ is the time to the first occurrence of the kth type of event. Then,

$$R(t) = \sum_{k=1}^{K+1} I(T_k > t), \qquad (1)$$

where I(·) is the indicator function,

$$A(t) = \int_0^t R(s)\,ds = \sum_{k=1}^{K+1} A_k(t), \qquad (2)$$

where $A_k(t)$ is the minimum of $T_k$ and t, and

$$P(t) = 1 - \frac{A(t)}{(K+1)\,t}. \qquad (3)$$

Note that $E(A_k)(t)$ is the restricted mean survival time up to time t for $T_k$, which is the area under the survival curve for $T_k$ up to time t.
The aforementioned processes may not be observed completely if the terminal event time is censored by a random variable C, which is assumed to be independent of $T^*_{K+1}$ and the nonterminal event times $\{T^*_1, \ldots, T^*_K\}$. Let $X_k$ be the minimum of $T_k$ and C, $\Delta_k = 1$ if $T_k$ is observed and zero otherwise, for k = 1, … , K, and $\Delta_{K+1} = 1$ if $T_{K+1}$ is observed and zero otherwise. The data, $\{(X_{ik}, \Delta_{ik}),\ k = 1, \ldots, K+1\}$, i = 1, … , n, with follow-up restricted to τ*, where τ* is the maximum study follow-up time, consist of n independent copies of $\{(X_k, \Delta_k),\ k = 1, \ldots, K+1\}$.
Using (1) to (3), E(R)(t), E(P)(t), and E(A)(t) can be consistently estimated with these n sets of possibly incomplete observations by

$$\hat{E}(R)(t) = \sum_{k=1}^{K+1} \hat{S}_k(t) \quad\text{and}\quad \hat{E}(P)(t) = 1 - \frac{\hat{E}(A)(t)}{(K+1)\,t}, \qquad (4)$$

where $\hat{S}_k(\cdot)$ is the Kaplan-Meier (KM) estimate for $T_k$ based on {Xik, Δik, i = 1, … , n} for k = 1, ⋯, K + 1, and

$$\hat{E}(A)(t) = \sum_{k=1}^{K+1} \hat{A}_k(t), \qquad (5)$$

where $\hat{A}_k(t) = \int_0^t \hat{S}_k(s)\,ds$ is the area under the KM curve up to time t.
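To fix ideas, a rough numerical sketch of (4) and (5) is given below; the helper names, the data layout, and the Riemann-sum approximation of the areas under the KM curves are illustrative choices of ours, not the authors' implementation.

```python
import numpy as np

def km_survival(times, events, grid):
    """Kaplan-Meier estimate of pr(T_k > t), evaluated at each time in `grid`."""
    times = np.asarray(times, float)
    events = np.asarray(events, int)
    grid = np.asarray(grid, float)
    surv = np.ones_like(grid)
    s = 1.0
    for u in np.unique(times):
        d = np.sum((times == u) & (events == 1))   # events observed at time u
        if d > 0:
            n = np.sum(times >= u)                 # number at risk just before u
            s *= 1.0 - d / n
            surv[grid >= u] = s
    return surv

def estimate_ERAP(X, Delta, t_grid):
    """Plug-in estimates of E(R)(t), E(A)(t), and E(P)(t) on a fine time grid.

    X, Delta : (n, K+1) arrays of follow-up times X_ik and event indicators
               Delta_ik, one column per event type (terminal event in the last column).
    t_grid   : strictly positive, increasing grid of time points.
    """
    t_grid = np.asarray(t_grid, float)
    K1 = X.shape[1]
    S = np.vstack([km_survival(X[:, k], Delta[:, k], t_grid) for k in range(K1)])
    ER = S.sum(axis=0)                                 # (4): sum of the K+1 KM curves
    widths = np.diff(np.concatenate(([0.0], t_grid)))
    EA = np.cumsum(S * widths, axis=1).sum(axis=0)     # (5): areas approximated on the grid
    EP = 1.0 - EA / (K1 * t_grid)
    return ER, EA, EP
```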
It is important to note that although the censoring variable C is assumed to be independent of all the event times, there may be positive correlation between, eg, the observed process value at censoring, $R_i(C_i)$, and the unobserved $R_i(t)$ for t beyond $C_i$, and similarly for the outcome processes A(t) and P(t).28,29 Such an induced dependence is also present in analyses of quality-of-life-adjusted survival and medical costs30,31 and creates some technical difficulty in deriving the large sample properties of $\hat{E}(R)(\cdot)$ and $\hat{E}(A)(\cdot)$. For the present case, due to the decompositions (4) and (5), one may use similar techniques1,2 to justify large sample mean-zero Gaussian approximations to the distributions of $\hat{E}(R)(\cdot) - E(R)(\cdot)$ and $\hat{E}(A)(\cdot) - E(A)(\cdot)$ as processes over t ≤ τ, where τ satisfies pr($X_k$ > τ) > 0 for all k (Appendix A in the Supporting Information section). In practice, approximations to these distributions can be obtained via a perturbation-resampling method. Specifically, a perturbed version of each KM estimate is
$$\hat{S}^*_k(t) = \exp\left\{-\sum_{i=1}^n \int_0^t \frac{V_i\,dN_{ik}(s)}{\sum_{j=1}^n V_j\,Y_{jk}(s)}\right\}, \qquad (6)$$
where $N_{ik}(t) = I(X_{ik} \le t, \Delta_{ik} = 1)$ and $Y_{jk}(t) = I(X_{jk} \ge t)$ are the event-counting and at-risk processes, t ≤ τ, and {Vi : i = 1, … , n} is a random sample of size n from the standard exponential distribution. For each realization of the random weights {Vi}, let $\hat{E}^*(R)(t) = \sum_{k=1}^{K+1} \hat{S}^*_k(t)$, $\hat{E}^*(A)(t) = \sum_{k=1}^{K+1} \hat{A}^*_k(t)$, and $\hat{E}^*(P)(t) = 1 - \hat{E}^*(A)(t)/\{(K+1)t\}$, where $\hat{A}^*_k(t) = \int_0^t \hat{S}^*_k(s)\,ds$ is the area under the perturbed KM curve up to time t.17-19 Then, the distribution of $\hat{E}(R)(t) - E(R)(t)$ can be approximated by the distribution of $\hat{E}^*(R)(t) - \hat{E}(R)(t)$ with a large number, M, of realizations of the random weights {Vi}. Denote the resulting empirical variance estimate by $\hat{\sigma}^2_R(t)$. Similarly, the distribution of $\hat{E}(A)(t) - E(A)(t)$ can be approximated by the distribution of $\hat{E}^*(A)(t) - \hat{E}(A)(t)$ with the corresponding variance estimate $\hat{\sigma}^2_A(t)$. Thus, a (1 – α) confidence interval for E(R)(t), for t ≤ τ, is given by

$$\hat{E}(R)(t) \pm z_{1-\alpha/2}\,\hat{\sigma}_R(t),$$
where z1–α/2 is the (1−α/2)th quantile of the standard normal distribution. To preserve the range of E(R)(·) ∈ [0, K + 1], we may also first construct a confidence interval for g−1(E(R)(·)) based on the proposed perturbation method, and then transform the resulting confidence interval using g(·) to obtain an appropriate confidence interval for E(R)(·), where g(·) is a given monotone function from (−∞, +∞) to [0, K + 1]. Similarly, a (1 – α) confidence interval for E(A)(·) is given by

$$\hat{E}(A)(t) \pm z_{1-\alpha/2}\,\hat{\sigma}_A(t).$$
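A minimal sketch of this resampling step for a single time point t0 is shown below, assuming the exponentiated weighted Nelson-Aalen form of the perturbed KM estimate displayed in (6); the function names and the unit-weight point estimate are illustrative choices.

```python
import numpy as np
from statistics import NormalDist

def perturbed_km(times, events, grid, V):
    """Perturbed survival curve: exponentiated Nelson-Aalen estimate with the
    exponential weights V applied to both event counts and risk sets, as in (6)."""
    times = np.asarray(times, float)
    events = np.asarray(events, int)
    grid = np.asarray(grid, float)
    Lam = np.zeros_like(grid)
    lam = 0.0
    for u in np.unique(times[events == 1]):
        lam += np.sum(V * ((times == u) & (events == 1))) / np.sum(V * (times >= u))
        Lam[grid >= u] = lam
    return np.exp(-Lam)

def pointwise_ci_ER(X, Delta, t0, alpha=0.05, M=500, seed=1):
    """Pointwise (1 - alpha) confidence interval for E(R)(t0) via perturbation resampling."""
    rng = np.random.default_rng(seed)
    n, K1 = X.shape
    grid = np.array([t0], dtype=float)
    # point estimate with unit weights (exponentiated Nelson-Aalen, close to the KM-based (4))
    est = sum(perturbed_km(X[:, k], Delta[:, k], grid, np.ones(n))[0] for k in range(K1))
    star = np.empty(M)
    for m in range(M):
        V = rng.exponential(size=n)          # one common set of weights per realization
        star[m] = sum(perturbed_km(X[:, k], Delta[:, k], grid, V)[0] for k in range(K1))
    sd = star.std(ddof=1)                    # sigma_hat_R(t0) from the M realizations
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return est - z * sd, est + z * sd
```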
This resampling technique has been utilized in dealing with various survival analysis problems.32,33
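The range-preserving construction mentioned above might look as follows, reusing the point estimate and the M perturbed estimates from the previous sketch; the specific logit-type choice of g(·) is an assumption of ours, not one prescribed by the paper.

```python
import numpy as np
from statistics import NormalDist

def range_preserving_ci(est, star, K_plus_1, alpha=0.05):
    """Confidence interval for E(R)(t) built on a transformed scale and mapped back,
    using one particular monotone g from the real line onto (0, K+1).

    est  : point estimate Ehat(R)(t); star : array of M perturbed estimates.
    """
    ginv = lambda x: np.log(x / (K_plus_1 - x))    # g^{-1}: (0, K+1) -> real line
    g = lambda y: K_plus_1 / (1.0 + np.exp(-y))    # g: real line -> (0, K+1)
    sd = np.std(ginv(np.asarray(star)), ddof=1)    # resampling SE on the transformed scale
    z = NormalDist().inv_cdf(1 - alpha / 2)
    y = ginv(est)                                  # assumes 0 < est < K+1
    return g(y - z * sd), g(y + z * sd)
```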
Now, suppose we are interested in constructing a simultaneous confidence band for E(R)(·) or E(A)(·) over a specific range t ∈ [a, b], where a is larger than the first observed event time and b is smaller than the largest observed follow-up time. The equal-precision (1 – α) confidence bands34 can be constructed by

$$\hat{E}(R)(t) \pm c_\alpha\,\hat{\sigma}_R(t), \quad t \in [a, b],$$

and

$$\hat{E}(A)(t) \pm d_\alpha\,\hat{\sigma}_A(t), \quad t \in [a, b],$$

where cα is chosen such that

$$\mathrm{pr}\left\{\sup_{t \in [a,b]} \frac{|\hat{E}^*(R)(t) - \hat{E}(R)(t)|}{\hat{\sigma}_R(t)} \le c_\alpha\right\} = 1 - \alpha,$$

and dα is chosen such that

$$\mathrm{pr}\left\{\sup_{t \in [a,b]} \frac{|\hat{E}^*(A)(t) - \hat{E}(A)(t)|}{\hat{\sigma}_A(t)} \le d_\alpha\right\} = 1 - \alpha,$$

with these probabilities evaluated empirically over the M realizations of the perturbed processes.
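Continuing the same resampling sketch, the critical value cα (and similarly dα) can be approximated empirically from the M perturbed realizations; the helper below and its argument names are ours.

```python
import numpy as np

def equal_precision_band(est, star, alpha=0.05):
    """Equal-precision simultaneous band over a time grid restricted to [a, b],
    chosen so that the pointwise standard errors are strictly positive.

    est  : point estimates on the grid; star : (M, len(grid)) perturbed realizations.
    """
    est, star = np.asarray(est), np.asarray(star)
    sd = star.std(axis=0, ddof=1)                         # pointwise sigma_hat(t)
    sup_dev = np.max(np.abs(star - est) / sd, axis=1)     # sup over t, one value per realization
    c_alpha = np.quantile(sup_dev, 1.0 - alpha)           # empirical critical value
    return est - c_alpha * sd, est + c_alpha * sd, c_alpha
```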
With the data from the placebo arm of the BEST study for the five distinct events discussed in the introduction, Figure 3 gives the estimate $\hat{E}(R)(t)$ with the 0.95 pointwise confidence intervals and simultaneous confidence bands for E(R)(t) for 1 ≤ t ≤ 48 months based on M = 500 sets of perturbed data. These bands are quite informative; for example, in the placebo group, at t = 48 months, on average, 2.00 of the five component events have not occurred before death, with a 0.95 pointwise confidence interval of (1.81, 2.18). The estimated sum of all the event-free survival times, $\hat{E}(A)(48)$, is 150.8 months with a 0.95 confidence interval of (146.8, 154.8) months. Correspondingly, the estimated proportion of the maximum morbidity/mortality experienced, $\hat{E}(P)(48)$, is 0.372 (0.355, 0.388).
FIGURE 3.
Point and interval estimates of E(R)(t) over time from the placebo arm of the Beta-Blocker Evaluation of Survival Trial. Solid curve represents point estimates, with 0.95 pointwise and simultaneous confidence intervals denoted by dashed lines and gray shading, respectively
Now, if we are interested in making inferences about the difference of E(Rj)(·) between two treatment groups j( = 0, 1), the resulting estimator of E(R1)(t) − E(R0)(t) can be obtained via the corresponding empirical counterparts, $\hat{E}(R_1)(t) - \hat{E}(R_0)(t)$. The distribution of this difference can be approximated via the aforementioned resampling method. Our procedure is an extension of a proposal35 for the case with univariate event time observations. The difference of two E(A)(t)s can be estimated analogously via $\hat{E}(A_1)(t) - \hat{E}(A_0)(t)$. With the data from the BEST study, Figure 4 shows the estimated process $\hat{E}(R_j)(t)$ for both the bucindolol and placebo groups, along with the corresponding contrast between the beta-blocker and control arms. At t = 24 months, the estimated difference is 0.19 events with a 0.95 confidence interval of (0.03, 0.36). At t = 48 months, the estimated difference is 0.18 but with a wider 0.95 confidence interval of (−0.09, 0.46). Note that for each of these comparisons, no information is used regarding the temporal profile of events occurring prior to the selected time point. In order to utilize both the occurrence and the timing of the events, we may use the estimated cumulative difference in total event-free survival time, $\hat{E}(A_1)(t) - \hat{E}(A_0)(t)$. At the end of follow-up, this is 7.6 months with a 0.95 confidence interval of (1.5, 13.7) months, demonstrating a significant overall beneficial effect of the active therapy over placebo. Alternatively, this overall treatment difference can be expressed as the ratio of the two estimated total event-free survival times, approximately 1.05, indicating an estimated 5% increase in event-free survival time, with p = 0.015 for the test of equality between treatment groups. Another interesting expression is via the comparison of the proportions of follow-up time lost to morbidity and mortality, Pj(t). The ratio of these two estimates shows an 8% decrease in morbidity/mortality.
FIGURE 4.
Left: estimated number of events not yet experienced in each treatment arm. Right: treatment effect as a function of follow-up time. Solid curve represents point estimates, with 0.95 pointwise confidence intervals denoted by dashed lines [Colour figure can be viewed at wileyonlinelibrary.com]
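For the two-sample contrast, the sketch below shows how the arm-specific estimates and perturbed realizations (obtained separately within each group, as above) might be combined into the reported differences, intervals, and p-values; the helper and its argument names are ours.

```python
import numpy as np
from statistics import NormalDist

def two_sample_contrast(est1, est0, star1, star0, alpha=0.05):
    """Between-group difference (eg, in Ehat(A)(t)) with a resampling-based
    confidence interval and two-sided p-value; the perturbed realizations are
    generated independently within each treatment arm."""
    diff = est1 - est0
    sd = np.std(np.asarray(star1) - np.asarray(star0), ddof=1)   # SE of the difference
    z = NormalDist().inv_cdf(1 - alpha / 2)
    p = 2.0 * (1.0 - NormalDist().cdf(abs(diff) / sd))
    return diff, (diff - z * sd, diff + z * sd), p
```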
3 ∣. SIMULATIONS
In order to assess the properties of the proposed area under the curve, $\hat{E}(A)(t)$, for the purpose of comparing two treatment groups, we performed an extensive simulation, intended to mimic a trial setting similar to that of the BEST trial. In the following simulations, we consider a trial with N = 1500, 1000, or 500 patients followed for a maximum time τ of 4 years, in which there are a total of four clinical events of interest, ie, three nonfatal events in addition to all-cause mortality. In all scenarios, the event times in the placebo group are drawn from Weibull distributions with shape parameter 0.8, and scale parameters 2000, 3000, and 4000 (in days) for the nonfatal events and 8000 for the fatal event, which correspond to survival probabilities of 46%, 57%, 64%, and 77%, respectively, at the end of the follow-up period. In order to reflect the common scenario in which event times are correlated within patients, we induce a shared frailty parameter drawn from a gamma distribution with unit mean and variance = 2.25,26,36

In Scenario 0, the treatment has no effect on any of the four clinical outcomes, representing the null hypothesis. We then consider treatment effects that reduce the time lost to morbidity/mortality by either 10% (a moderate effect) or 20% (a strong effect). In Scenario 1, the treatment effect is strong with respect to the two more frequent nonfatal events but moderate for the other two events. In Scenario 2, the treatment effect is strong with respect to the two less frequent events but moderate for the more frequent events. In Scenario 3, the treatment effect is strong with respect to all four events. Within each scenario, we considered that the treatment effect may manifest itself through a constant reduction in hazard (ie, the shape parameter remains constant and the scale parameter is increased in the treated arm; the PH assumption), or alternatively, through a delay in event times, such that the treatment and control groups’ survival curves become equal at the end of the study, but the treatment group’s survival curve is uniformly above the control group’s for the duration of the study (ie, the shape parameter is increased in the treated arm; the non-PH assumption). We assume independent administrative censoring, reflecting a hypothetical five-year trial with three years of uniform enrollment, so that every patient is followed for at least two years. We compare the proposed reverse counting process (RCP) method to the traditional “time-to-first” composite outcome analyzed via the log-rank (LR) test. Table 2 shows the proportion of simulated data sets in which the null hypothesis of no treatment effect is rejected at the α = 0.05 level.
TABLE 2.
Two-sample power
| Scenario | Treatment Effect: Frequent Events | Treatment Effect: Less Frequent Events | Proportional Hazards? | Method 1: RCP | Method 2: LR (first event) |
|---|---|---|---|---|---|
| n = 1500 | | | | | |
| 0 | none | none | Yes | 6% | 5% |
| 1 | strong | moderate | Yes | 48% | 48% |
| | | | No | 53% | 42% |
| 2 | moderate | strong | Yes | 52% | 31% |
| | | | No | 62% | 29% |
| 3 | strong | strong | Yes | 73% | 62% |
| | | | No | 83% | 57% |
| n = 1000 | | | | | |
| 0 | none | none | Yes | 6% | 4% |
| 1 | strong | moderate | Yes | 34% | 36% |
| | | | No | 40% | 28% |
| 2 | moderate | strong | Yes | 37% | 22% |
| | | | No | 44% | 20% |
| 3 | strong | strong | Yes | 59% | 46% |
| | | | No | 68% | 40% |
| n = 500 | | | | | |
| 0 | none | none | Yes | 4% | 6% |
| 1 | strong | moderate | Yes | 19% | 17% |
| | | | No | 21% | 18% |
| 2 | moderate | strong | Yes | 21% | 13% |
| | | | No | 24% | 13% |
| 3 | strong | strong | Yes | 37% | 31% |
| | | | No | 40% | 24% |
In Scenario 0, we find that type-I error is well controlled. In Scenarios 1 to 3, the proposed metric has power comparable to or greater than that of the standard “time-to-first” event approach in nearly all settings, with the largest gains when the treatment effect is strong with respect to the less frequent events (which include the fatal event) and when the PH assumption does not hold.
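For readers who wish to reproduce a data-generating mechanism of this type, the sketch below gives one plausible implementation of the control-arm setup described above; the gamma-frailty-times-Weibull construction, the frailty variance of 2, and the censoring details reflect our reading of the description rather than the authors' code.

```python
import numpy as np

def simulate_control_patient(scales=(2000, 3000, 4000, 8000), shape=0.8,
                             frailty_var=2.0, rng=None):
    """Draw one control-arm patient: correlated Weibull event times (in days),
    with nonfatal events truncated at death and administrative censoring."""
    rng = np.random.default_rng(rng)
    # shared unit-mean gamma frailty multiplying each event-specific hazard
    z = rng.gamma(shape=1.0 / frailty_var, scale=frailty_var)
    e = rng.exponential(size=len(scales))
    raw = np.asarray(scales, float) * (e / z) ** (1.0 / shape)   # conditional Weibull draws
    death = raw[-1]                       # last entry is the terminal event
    t_k = np.minimum(raw, death)          # nonfatal events cannot occur after death
    # censoring: 5-year trial, 3 years of uniform enrollment, follow-up truncated at 4 years
    c = min(rng.uniform(2 * 365.25, 5 * 365.25), 4 * 365.25)
    return np.minimum(t_k, c), (t_k <= c).astype(int)

X_i, Delta_i = simulate_control_patient(rng=2024)   # (X_ik, Delta_ik) for one patient
```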
4 ∣. DISCUSSION
Although many statistical methods are currently available to compare two treatment groups in the presence of multiple outcomes, a method that is not dependent on a particular parametric modeling assumption is preferable. The ability to produce estimates of treatment effects that cannot be undermined by model misspecification should be seen as a benefit to investigators, sponsors, and regulators, each of whom relies on the robustness of the inferences drawn from clinical studies. Moreover, an intuitive and interpretable measure of the magnitude of treatment effect expressed in concrete terms, such as the number of days spent event-free or the number of events prevented, is quite attractive. For example, the constant intensity or rate function model for recurrent event times6,22 may be theoretically interesting, but the results are difficult to interpret, especially when the model assumption is violated.
The methods proposed in this paper represent extensions of relatively standard concepts in the analysis of survival data to address an important open question in the general community of clinical trialists. We note that, under certain circumstances, it may be desirable to modify the starting value R(0) of the reverse counting process or the relative values of the individual events in the reverse counting process. For example, reducing the starting value to R(0) = 1 results in a conventional “time-to-first-event” analysis. One may also desire to implement weights wk associated with each of the K + 1 event types, similar to ad hoc procedures that have appeared in the clinical literature.37
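As a purely illustrative sketch of the weighted variant mentioned above (the specific form and the example weights are assumptions, not a procedure developed in the paper):

```python
import numpy as np

def weighted_reverse_count(t_k, t, weights=None):
    """Weighted reverse count at time t: sum_k w_k * I(T_k > t);
    w_k = 1 for all k recovers the unweighted R(t)."""
    t_k = np.asarray(t_k, float)
    w = np.ones_like(t_k) if weights is None else np.asarray(weights, float)
    return float(np.sum(w * (t_k > t)))

# eg, down-weighting nonfatal events relative to death (hypothetical weights):
# weighted_reverse_count([10, 18, 30, 30, 30], t=24, weights=[0.5, 0.5, 0.5, 0.5, 1.0])
```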
ACKNOWLEDGEMENTS
This paper was prepared using BEST Research Materials obtained from the NHLBI Biologic Specimen and Data Repository Information Coordinating Center and does not necessarily reflect the opinions or views of the BEST investigators or the NHLBI. This research was partially supported by US NIH grants and contracts.
Funding information
US NIH
Footnotes
SUPPORTING INFORMATION
Technical appendices and additional results are provided in the online Supplementary Materials.
Additional supporting information may be found online in the Supporting Information section at the end of the article.
REFERENCES
- 1. Wei LJ, Lin DY, Weissfeld L. Regression analysis of multivariate incomplete failure time data by modeling marginal distributions. J Am Stat Assoc. 1989;84(408):1065–1073.
- 2. Li QH, Lagakos SW. Use of the Wei–Lin–Weissfeld method for the analysis of a recurring and a terminating event. Statist Med. 1998;16(8):925–940.
- 3. Wang M-C, Qin J, Chiang C-T. Analyzing recurrent event data with informative censoring. J Am Stat Assoc. 2001;96(455):1057–1065.
- 4. Wang M-C, Chiang C-T. Non-parametric methods for recurrent event data with informative and non-informative censorings. Statist Med. 2002;21(3):445–456.
- 5. Ghosh D, Lin DY. Semiparametric analysis of recurrent events data in the presence of dependent censoring. Biometrics. 2003;59(4):877–885.
- 6. Lin DY, Wei LJ, Yang I, Ying Z. Semiparametric regression for the mean and rate functions of recurrent events. J R Stat Soc Ser B Stat Methodol. 2000;62(4):711–730.
- 7. Huang C-Y, Wang M-C. Joint modeling and estimation for recurrent event processes and failure time data. J Am Stat Assoc. 2011.
- 8. Wang M-C, Huang C-Y. Statistical inference methods for recurrent event processes with shape and size parameters. Biometrika. 2014;101(3):553–566.
- 9. Kalbfleisch JD, Prentice RL. Estimation of the average hazard ratio. Biometrika. 1981;68(1):105–112.
- 10. Struthers CA, Kalbfleisch JD. Misspecified proportional hazard models. Biometrika. 1986;73(2):363–369.
- 11. Lin DY, Wei LJ. The robust inference for the Cox proportional hazards model. J Am Stat Assoc. 1989;84(408):1074–1078.
- 12. Hernán MA. The hazards of hazard ratios. Epidemiology. 2010;21(1):13–15.
- 13. Uno H, Claggett B, Tian L, et al. Moving beyond the hazard ratio in quantifying the between-group difference in survival analysis. J Clin Oncol. 2014;32(22):2380–2385.
- 14. Uno H, Wittes J, Fu H, et al. Alternatives to hazard ratios for comparing the efficacy or safety of therapies in noninferiority studies. Ann Intern Med. 2015;163(2):127–134.
- 15. Karrison T. Restricted mean life with adjustment for covariates. J Am Stat Assoc. 1987;82(400):1169–1176.
- 16. Zucker DM. Restricted mean life with covariates: modification and extension of a useful survival analysis method. J Am Stat Assoc. 1998;93(442):702–709.
- 17. Royston P, Parmar MKB. The use of restricted mean survival time to estimate the treatment effect in randomized clinical trials when the proportional hazards assumption is in doubt. Statist Med. 2011;30(19):2409–2421.
- 18. Zhao L, Tian L, Uno H, et al. Utilizing the integrated difference of two survival functions to quantify the treatment contrast for designing, monitoring, and analyzing a comparative clinical study. Clin Trials. 2012;9(5):570–577.
- 19. Tian L, Zhao L, Wei LJ. Predicting the restricted mean event time with the subject’s baseline covariates in survival analysis. Biostatistics. 2014;15(2):222–233.
- 20. Trinquart L, Jacot J, Conner SC, Porcher R. Comparison of treatment effects measured by the hazard ratio and by the ratio of restricted mean survival times in oncology randomized controlled trials. J Clin Oncol. 2016;34(15):1813–1819.
- 21. A’Hern RP. Restricted mean survival time: an obligatory end point for time-to-event analysis in cancer trials? J Clin Oncol. 2016;34(28):3474–3476.
- 22. Andersen PK, Gill RD. Cox’s regression model for counting processes: a large sample study. Ann Stat. 1982;10(4):1100–1120.
- 23. Fine JP, Gray RJ. A proportional hazards model for the subdistribution of a competing risk. J Am Stat Assoc. 1999;94(446):496–509.
- 24. Ghosh D, Lin DY. Nonparametric analysis of recurrent events and death. Biometrics. 2000;56(2):554–562.
- 25. Liu L, Wolfe RA, Huang X. Shared frailty models for recurrent events and a terminal event. Biometrics. 2004;60(3):747–756.
- 26. Rondeau V, Mathoulin-Pelissier S, Jacqmin-Gadda H, Brouste V, Soubeyran P. Joint frailty models for recurring events and death using maximum penalized likelihood estimation: application on cancer events. Biostatistics. 2007;8(4):708–721.
- 27. Mao L, Lin DY. Semiparametric regression for the weighted composite endpoint of recurrent and terminal events. Biostatistics. 2016;17(2):390–403.
- 28. Glasziou PP, Simes RJ, Gelber RD. Quality adjusted survival analysis. Statist Med. 1990;9(11):1259–1276.
- 29. Lin DY. Regression analysis of incomplete medical cost data. Statist Med. 2003;22(7):1181–1200.
- 30. Lin DY, Feuer EJ, Etzioni R, Wax Y. Estimating medical costs from incomplete follow-up data. Biometrics. 1997;53(2):419–434.
- 31. Bang H, Tsiatis AA. Estimating medical costs with censored data. Biometrika. 2000;87(2):329–343.
- 32. Park Y, Wei LJ. Estimating subject-specific survival functions under the accelerated failure time model. Biometrika. 2003;90(3):717–723.
- 33. Cai T, Tian L, Uno H, Solomon SD, Wei LJ. Calibrating parametric subject-specific risk estimation. Biometrika. 2010;97(2):389–404.
- 34. Gilbert PB, Wei LJ, Kosorok MR, Clemens JD. Simultaneous inferences on the contrast of two hazard functions with censored observations. Biometrics. 2002;58(4):773–780.
- 35. Parzen MI, Wei LJ, Ying Z. Simultaneous confidence intervals for the difference of two survival functions. Scand J Stat. 1997;24(3):309–314.
- 36. Rondeau V, Gonzalez JR, Mazroui Y, Mauguen A, Diakite A, Laurent A. Package ‘frailtypack’; 2013.
- 37. Armstrong PW, Westerhout CM, Van de Werf F, et al. Refining clinical trial composite outcomes: an application to the Assessment of the Safety and Efficacy of a New Thrombolytic-3 (ASSENT-3) trial. Am Heart J. 2011;161(5):848–854.