Abstract
Recurrent event data are commonly encountered in clinical and epidemiological studies. A major complication arises when recurrent events are terminated by death. To assess the overall effects of covariates on the two types of events, we define a weighted composite endpoint as the cumulative number of recurrent and terminal events properly weighted by the relative severity of each event. We propose a semiparametric proportional rates model which specifies that the (possibly time-varying) covariates have multiplicative effects on the rate function of the weighted composite endpoint while leaving the form of the rate function and the dependence among recurrent and terminal events completely unspecified. We construct appropriate estimators for the regression parameters and the cumulative frequency function. We show that the estimators are consistent and asymptotically normal with variances that can be consistently estimated. We also develop graphical and numerical procedures for checking the adequacy of the model. We then demonstrate the usefulness of the proposed methods in simulation studies. Finally, we provide an application to a major cardiovascular clinical trial.
Keywords: Counting process, Dependent censoring, Intensity function, Inverse probability of censoring weighting, Mean function, Survival analysis
1. Introduction
In clinical and epidemiological studies, a subject can potentially experience multiple episodes of an adverse event, such as headache and pyogenic infection (Fleming and Harrington, 1991). Traditional survival analysis methods focusing on the time to the first event do not make full use of available data or characterize the entire clinical experience of the subject. It is statistically more efficient and clinically more meaningful to consider all recurrent events.
A number of statistical models and methods have been developed to analyze recurrent event data. Specifically, Andersen and Gill (1982) proposed the multiplicative intensity model by treating recurrent events as a non-homogeneous Poisson process, under which the risk of recurrence does not depend on the prior event history. To remove the Poisson assumption, Pepe and Cai (1993), Lawless and Nadeau (1995), and Lin and others (2000), hereafter referred to as LWYY, proposed to model the marginal rate function, which is easier to interpret than the intensity function. Prentice and others (1981) considered the hazard functions of the gap times between recurrent events, while Wei and others (1989) considered the marginal hazard function of each recurrent event.
Repeated occurrences of a serious adverse event, such as heart failure (Pfeffer and others, 2003), opportunistic HIV infection (Vlahov and others, 1991; Abrams and others, 1994), and cancer (Byar, 1980), tend to cause deterioration of health so that the subject may die during the course of the study. This phenomenon poses two challenges. First, the presence of a terminal event (i.e., death) invalidates the aforementioned methods for analyzing recurrent event data. Second, assessing the effects of treatments or other covariates on the entire clinical experience of a patient would need to take into account both recurrent and terminal events.
Two major approaches have been suggested to analyze recurrent and terminal events. The first one deals with the marginal rate or mean function of recurrent events, acknowledging the fact that there is no recurrent event after the terminal event (Cook and Lawless, 1997; Ghosh and Lin, 2000; Wang and others, 2001; Ghosh and Lin, 2002; Chen and Cook, 2004; Schaubel and others, 2006; Cook and others, 2009). The second one is the joint modelling for the two types of events (Huang and Wang, 2004; Liu and others, 2004; Ye and others, 2007; Zeng and Lin, 2009; Zeng and Cai, 2010). Both approaches treat recurrent and terminal events as two separate endpoints. The marginal rate and mean functions are affected by the distribution of the terminal event. The joint modelling approach assumes that a latent variable captures the dependence among recurrent events as well as the dependence between recurrent and terminal events, which is a simplistic and unverifiable assumption. For these reasons, the two approaches have rarely been used in actual clinical trials.
The current practice is to use the time to the first composite event (i.e., the first recurrent event or the terminal event, whichever occurs first) (Pfeffer and others, 2003; Yusuf and others, 2003; Anand and others, 2009; O'Connor and others, 2009; Zannad and others, 2011). This simple strategy is in line with the ICH guidelines (Lewis, 1999) that “There should generally be only one primary variable” and that “If a single primary variable cannot be selected from multiple measurements associated with the primary objective, another useful strategy is to integrate or combine the multiple measurements into a single or composite variable, using a predefined algorithm.” The first composite event, however, is statistically inefficient and clinically unsatisfactory because it disregards all the events beyond the first one and does not distinguish recurrent from terminal events, so that a subject whose first event is a hospital admission is treated the same as a subject whose first event is death.
Based on our recent conversations with cardiologists and regulatory statisticians, a weighted composite endpoint of all recurrent and terminal events, i.e., the cumulative number of recurrent and terminal events properly weighted by their degrees of severity, is an appealing alternative that is likely to be accepted by clinicians and regulatory agencies. This endpoint is a natural extension of the current practice of the first composite event so as to capture all the clinical events experienced by each patient. Compared with the first composite event, the weighted composite event process is not only statistically more efficient due to the use of all available data but also clinically more meaningful due to incorporation of the entire clinical experience of each patient and appropriate weighting of different types of events. This proposal reflects the recommendation of Neaton and others (2005) to optimally weight components of composite outcomes and to better use the entire event history of patients. An unweighted version of this composite endpoint is being used in a major clinical trial on the efficacy of an angiotensin receptor neprilysin inhibitor in reducing heart failures and cardiovascular death for patients with preserved ejection fraction.
The purpose of this article is to show how to properly analyze the weighted composite event process. We formulate the effects of treatments and other covariates on the weighted composite event process through a semiparametric proportional rates model and provide the corresponding inference procedures. In particular, we derive a non-parametric test statistic for assessing the treatment difference that does not involve any modelling assumption. The nonparametric nature is highly attractive for regulatory purposes. Because it is tempting to apply LWYY to the (unweighted) composite event process, we investigate the potential pitfalls of this strategy. We demonstrate the superiority of the new methods through simulated and real data.
2. Methods
Suppose that there are $K$ different types of events, including the terminal event, where $K$ is a fixed positive integer. For $k = 1, \ldots, K$, let $N_k(t)$ denote the cumulative number of the $k$th type of event the subject has experienced by time $t$. We assign the weight $w_k$ to the $k$th type of event according to its relative severity and define the weighted sum of the $K$ counting processes: $N^*(t) = \sum_{k=1}^{K} w_k N_k(t)$. Let $Z(\cdot)$ denote a $p$-vector of possibly time-varying external covariates, and $D$ denote the survival time, i.e., time to the terminal event. We specify that $Z(\cdot)$ has multiplicative effects on the marginal rate function of $N^*(\cdot)$, i.e.,

$$E\{dN^*(t) \mid Z(t)\} = \exp\{\beta_0^{\rm T} Z(t)\}\, d\mu_0(t), \qquad (2.1)$$

where $\beta_0$ is a $p$-vector of unknown regression parameters, and $\mu_0(\cdot)$ is an arbitrary increasing function. Note that the dependence structure among recurrent and terminal events is completely unspecified. For time-invariant covariates, model (2.1) reduces to the proportional means model: $E\{N^*(t) \mid Z\} = \mu_0(t)\exp(\beta_0^{\rm T} Z)$, where $\mu_0(\cdot)$ is the baseline mean function.
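To make the construction concrete, here is a minimal Python sketch of how the weighted composite process $N^*(t)$ is tallied from per-subject event records; the event times, types, and weights below are purely illustrative and not from the paper.

```python
import numpy as np

# Minimal sketch (not from the paper): tallying the weighted composite process
# N*(t) = sum_k w_k N_k(t) for one subject.  The event times, types, and
# weights below are purely illustrative.
event_times = np.array([0.8, 2.1, 3.5, 4.2])   # times of the subject's events
event_types = np.array([1, 1, 2, 3])            # 1 = hospitalization, 2 = stroke, 3 = death
weights = {1: 1.0, 2: 2.0, 3: 3.0}              # w_k for event types k = 1, ..., K

def weighted_composite(t, times, types, w):
    """Return N*(t): the severity-weighted number of events observed by time t."""
    return sum(w[k] for s, k in zip(times, types) if s <= t)

for t in (1.0, 3.0, 5.0):
    print(t, weighted_composite(t, event_times, event_types, weights))
```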
In practice, $N^*(\cdot)$ and $D$ are subject to right censoring. Let $C$ be the censoring time, which is assumed to be independent of $N^*(\cdot)$ and $D$ conditional on $Z(\cdot)$. Write $X = D \wedge C$, $\delta = I(D \le C)$, and $N(t) = N^*(t \wedge C)$, where $a \wedge b = \min(a, b)$, and $I(\cdot)$ is the indicator function. For a study with $n$ subjects, the data consist of $\{N_i(\cdot), X_i, \delta_i, Z_i(\cdot)\}$ $(i = 1, \ldots, n)$.
The only available approach to fitting model (2.1) is the LWYY estimating function

$$U_{\rm LWYY}(\beta) = \sum_{i=1}^{n} \int_0^{\tau} \{Z_i(t) - \bar Z(t;\beta)\}\, dN_i(t),$$

where $\bar Z(t;\beta) = \sum_{j=1}^{n} I(X_j \ge t)\, e^{\beta^{\rm T} Z_j(t)} Z_j(t) \big/ \sum_{j=1}^{n} I(X_j \ge t)\, e^{\beta^{\rm T} Z_j(t)}$, and $\tau$ denotes the end of the study. In this approach, which is only applicable to the unweighted composite event process, death is part of the composite endpoint and also a censoring variable. This estimating function can be written as

$$\sum_{i=1}^{n} \int_0^{\tau} I(X_i \ge t)\, \{Z_i(t) - \bar Z(t;\beta)\}\, \{dN_i^*(t) - e^{\beta^{\rm T} Z_i(t)}\, dR_0(t)\},$$

where $R_0(\cdot)$ is some positive function. For this estimating function to be unbiased, we must have $E\{dN^*(t) \mid Z(t), X \ge t\} = e^{\beta^{\rm T} Z(t)}\, dR_0(t)$, i.e., $E\{dN^*(t) \mid Z(t), D \ge t\} = e^{\beta^{\rm T} Z(t)}\, dR_0(t)$. Thus, the LWYY inference pertains to the conditional rate

$$E\{dN^*(t) \mid Z(t), D \ge t\} = e^{\beta^{\rm T} Z(t)}\, dR_0(t), \qquad (2.2)$$

where $R_0(\cdot)$ is the baseline conditional rate function. The integrated conditional rate does not have a clinical interpretation and is always greater than the marginal mean because $E\{dN^*(t) \mid Z(t)\} = \Pr(D \ge t \mid Z(t))\, E\{dN^*(t) \mid Z(t), D \ge t\} \le E\{dN^*(t) \mid Z(t), D \ge t\}$.
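To see the size of this gap numerically, here is a small Monte Carlo sketch contrasting the marginal mean with the integrated conditional rate in a simple frailty set-up; the distributions and numeric values are illustrative assumptions, not the paper's.

```python
import numpy as np

# Small Monte Carlo illustration (not from the paper) of why the integrated
# conditional rate targeted by LWYY exceeds the marginal mean E{N*(t)}:
# events stop at death, so conditioning on survival (D >= t) inflates the rate.
rng = np.random.default_rng(0)
n, t_max, h = 100_000, 3.0, 0.05
xi = rng.gamma(2.0, 0.5, n)            # frailty with mean 1, variance 0.5
D = rng.exponential(5.0, n)            # death times, independent of the frailty
rate = 0.8 * xi                        # event rate while the subject is alive

marginal_mean = 0.0                    # accumulates E{N*(t_max)}
integrated_conditional = 0.0           # accumulates the integral of E{dN*(s) | D >= s}
for s in np.arange(0.0, t_max, h):
    expected = rate * np.minimum(np.maximum(D - s, 0.0), h)  # events in (s, s+h]; zero after death
    marginal_mean += expected.mean()
    integrated_conditional += expected[D > s].mean()
print(round(marginal_mean, 2), "<", round(integrated_conditional, 2))
```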
It is possible for models (2.1) and (2.2) to hold for the same $\beta_0$, in which case LWYY would provide valid inference on $\beta_0$. For example, suppose that the process $N^*(\cdot)$ has the intensity function $\xi\, I(D \ge t)\, \exp\{\beta_0^{\rm T} Z(t)\}\, d\mu_0(t)$ with respect to the filtration $\sigma\{\xi, I(D \le s), N^*(s), Z(s): s \le t\}$, where $\xi$ is a positive frailty. Assume also that the distribution of the survival time $D$ does not depend on $\xi$ or $Z(\cdot)$. Then

$$E\{dN^*(t) \mid Z(t)\} = \Pr(D \ge t)\, E(\xi)\, \exp\{\beta_0^{\rm T} Z(t)\}\, d\mu_0(t)$$

and

$$E\{dN^*(t) \mid Z(t), D \ge t\} = E(\xi)\, \exp\{\beta_0^{\rm T} Z(t)\}\, d\mu_0(t).$$

Thus, proportionality holds for both the marginal and conditional rate functions, although the baseline functions are different. If $D$ depends on $Z(\cdot)$ or the dependence between recurrent events and death cannot be explained by a simple frailty, then the conditional rate model does not hold and LWYY will estimate a quantity that is different from $\beta_0$ of model (2.1).
To make valid inference for model (2.1), we need to exclude the dependent censoring by death from the "at-risk" indicators in the estimating function. Specifically, a subject should remain in the risk set until independent censoring occurs even if the subject dies before the independent censoring time (i.e., $D < C$). In other words, the at-risk process is $I(C \ge t)$ instead of $I(X \ge t)$. If there is no early withdrawal or loss to follow-up, then the censoring is all administrative (i.e., caused by the termination of the study) and the censoring time is known to be the difference between the study end date and the subject's entry time. Replacing $I(X_j \ge t)$ in the LWYY estimating function with $I(C_j \ge t)$, we obtain the estimating function

$$U(\beta) = \sum_{i=1}^{n} \int_0^{\tau} \{Z_i(t) - \tilde Z(t;\beta)\}\, I(C_i \ge t)\, dN_i^*(t), \qquad (2.3)$$

where $\tilde Z(t;\beta) = \sum_{j=1}^{n} I(C_j \ge t)\, e^{\beta^{\rm T} Z_j(t)} Z_j(t) \big/ \sum_{j=1}^{n} I(C_j \ge t)\, e^{\beta^{\rm T} Z_j(t)}$, which can be written as

$$\sum_{i=1}^{n} \int_0^{\tau} \{Z_i(t) - \tilde Z(t;\beta)\}\, I(C_i \ge t)\, \{dN_i^*(t) - e^{\beta^{\rm T} Z_i(t)}\, d\mu_0(t)\}.$$

This is an unbiased estimating function because $E\{dN^*(t) \mid Z(t), C \ge t\} = E\{dN^*(t) \mid Z(t)\} = e^{\beta_0^{\rm T} Z(t)}\, d\mu_0(t)$ under model (2.1).
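As a rough illustration of how (2.3) can be solved when the censoring times are known, the following Python sketch applies Newton–Raphson with a single time-invariant covariate; the simulated data, event weights, and all numeric values are illustrative assumptions, not the paper's.

```python
import numpy as np

# Rough sketch of solving the estimating equation (2.3) by Newton-Raphson when
# the censoring times C_i are known (e.g., purely administrative censoring).
rng = np.random.default_rng(1)
n = 200
Z = rng.binomial(1, 0.5, n).astype(float)        # treatment indicator
C = rng.uniform(2.0, 4.0, n)                     # known censoring times
events = [[(t, 1.0) for t in rng.uniform(0.0, 5.0, rng.poisson(2)) if t <= C[i]]
          for i in range(n)]                     # (time, weight) pairs observed before C_i

def U_and_slope(beta):
    """Estimating function (2.3) and its derivative, summing over observed events."""
    U, slope = 0.0, 0.0
    for i in range(n):
        for t, w in events[i]:
            at_risk = C >= t                     # I(C_j >= t): subjects stay at risk after death
            e = np.exp(beta * Z[at_risk])
            zbar = np.sum(e * Z[at_risk]) / np.sum(e)
            zvar = np.sum(e * Z[at_risk] ** 2) / np.sum(e) - zbar ** 2
            U += w * (Z[i] - zbar)
            slope -= w * zvar
    return U, slope

beta = 0.0
for _ in range(20):                              # Newton-Raphson iterations
    U, slope = U_and_slope(beta)
    beta -= U / slope
print("estimated log rate ratio:", round(beta, 3))
```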
In most studies, there is random loss to follow-up in addition to administrative censoring, so that $C$ is not fully observed. Thus, we use the inverse probability of censoring weighting technique (Robins and Rotnitzky, 1992). Define

$$w_i(t) = \frac{I(C_i \ge D_i \wedge t)}{G(D_i \wedge t \mid Z_i)},$$

where $G(t \mid Z) = \Pr(C \ge t \mid Z)$. Clearly, $E\{w_i(t) \mid N_i^*(\cdot), D_i, Z_i\} = 1$. We estimate $G(t \mid Z)$ by

$$\hat G(t \mid Z) = \exp\left\{-\int_0^t e^{\hat\gamma^{\rm T} Z(s)}\, d\hat\Lambda_0(s)\right\},$$

where $(\hat\gamma, \hat\Lambda_0)$ is the estimator of $(\gamma, \Lambda_0)$ under the proportional hazards model (Cox, 1972)

$$\lambda_C(t \mid Z) = e^{\gamma^{\rm T} Z(t)}\, \lambda_0(t). \qquad (2.4)$$

If $Z$ is discrete, we may set $\hat G(t \mid Z)$ to be the Kaplan–Meier estimator for covariate value $Z$. Replacing $I(C_i \ge t)$ in (2.3) with

$$\hat w_i(t) = \frac{I(C_i \ge D_i \wedge t)}{\hat G(D_i \wedge t \mid Z_i)},$$

we obtain an estimating function that allows unknown censoring times

$$U(\beta) = \sum_{i=1}^{n} \int_0^{\tau} \hat w_i(t) \left\{Z_i(t) - \frac{\sum_{j=1}^{n} \hat w_j(t)\, e^{\beta^{\rm T} Z_j(t)} Z_j(t)}{\sum_{j=1}^{n} \hat w_j(t)\, e^{\beta^{\rm T} Z_j(t)}}\right\} dN_i^*(t). \qquad (2.5)$$
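The weight $\hat w_i(t)$ depends only on the observed data $(X_i, \delta_i)$ and an estimate of the censoring distribution. Below is a minimal Python sketch using a Kaplan–Meier estimate of $G$ without covariates; the simulated follow-up data are illustrative, not the paper's.

```python
import numpy as np

# Minimal sketch of the IPCW weight w_i(t) = I(C_i >= D_i ^ t) / G(D_i ^ t),
# with G(.) = P(C >= .) estimated by the Kaplan-Meier method applied to the
# censoring times (no covariates).
rng = np.random.default_rng(2)
n = 200
D = rng.exponential(3.0, n)                      # death times (latent)
C = rng.uniform(1.0, 4.0, n)                     # censoring times (latent)
X, delta = np.minimum(D, C), D <= C              # observed follow-up time and death indicator

def km_censoring_survival(X, delta):
    """Kaplan-Meier estimate of G(t) = P(C >= t); the 'events' are censorings (delta == 0)."""
    order = np.argsort(X)
    times, surv, s, at_risk = [], [], 1.0, len(X)
    for x, d in zip(X[order], delta[order]):
        if not d:                                # an observed censoring time
            s *= 1.0 - 1.0 / at_risk
        at_risk -= 1
        times.append(x)
        surv.append(s)
    return np.array(times), np.array(surv)

km_t, km_s = km_censoring_survival(X, delta)

def G_hat(t):
    """P(C >= t): product of Kaplan-Meier factors at censoring times strictly before t."""
    idx = np.searchsorted(km_t, t, side="left") - 1
    return 1.0 if idx < 0 else km_s[idx]

def w_hat(i, t):
    """Estimated weight for subject i at time t; it depends only on (X_i, delta_i)."""
    numerator = 1.0 if (delta[i] or t <= X[i]) else 0.0   # I(C_i >= D_i ^ t)
    return numerator / G_hat(min(X[i], t))                # D_i ^ t equals X_i ^ t whenever it matters

print([round(w_hat(0, t), 2) for t in (0.5, 1.5, 3.0)])
```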
Let $\hat\beta$ denote the root of $U(\beta) = 0$, which is obtained by the Newton–Raphson algorithm. The estimator $\hat\beta$ is consistent and asymptotically normal with a covariance matrix estimator given in Section S.1 of supplementary material available at Biostatistics online. We make inference about $\beta_0$ or a subset of $\beta_0$ by the Wald method. If $Z$ is the treatment indicator and $\hat G$ pertains to the treatment-specific Kaplan–Meier estimator, then the Wald statistic provides a nonparametric test for the equality of the two mean functions (since there is no modelling assumption under $H_0: \beta_0 = 0$).

To estimate the baseline mean function $\mu_0(t)$, we employ a weighted version of the Breslow estimator

$$\hat\mu_0(t) = \sum_{i=1}^{n} \int_0^t \frac{\hat w_i(s)\, dN_i^*(s)}{\sum_{j=1}^{n} \hat w_j(s)\, e^{\hat\beta^{\rm T} Z_j(s)}}.$$

This estimator is consistent and asymptotically normal with a covariance function given in Section S.1 of supplementary material available at Biostatistics online. Since $\mu_0(t)$ is nonnegative, we construct the confidence interval for $\mu_0(t)$ based on the log transformation. To be specific, the $(1-\alpha)$ confidence interval for $\mu_0(t)$ is given by $\hat\mu_0(t)\exp\{\pm z_{1-\alpha/2}\, \hat\sigma(t)/\hat\mu_0(t)\}$, where $\hat\sigma^2(t)$ is the variance estimator for $\hat\mu_0(t)$, and $z_{1-\alpha/2}$ is the $(1-\alpha/2)$th percentile of the standard normal distribution. Incidentally, the LWYY estimator of $\mu_0(t)$ is

$$\hat R_0(t) = \sum_{i=1}^{n} \int_0^t \frac{dN_i(s)}{\sum_{j=1}^{n} I(X_j \ge s)\, e^{\hat\beta_{\rm LWYY}^{\rm T} Z_j(s)}},$$

which overestimates $\mu_0(t)$ because $E\{dN^*(t) \mid Z(t), D \ge t\} \ge E\{dN^*(t) \mid Z(t)\}$ for all $t$ and $Z$.
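For instance, a minimal sketch of the log-transformed confidence interval; the point estimate and standard error below are placeholders, not results from the paper.

```python
import numpy as np
from scipy.stats import norm

# Minimal sketch of the log-transformed confidence interval for mu_0(t):
# mu_hat * exp(+/- z_{1-alpha/2} * sigma_hat / mu_hat).
def log_transformed_ci(mu_hat, sigma_hat, alpha=0.05):
    z = norm.ppf(1.0 - alpha / 2.0)
    half_width = z * sigma_hat / mu_hat
    return mu_hat * np.exp(-half_width), mu_hat * np.exp(half_width)

print(log_transformed_ci(1.09, 0.14))   # approximately (0.85, 1.40); always positive
```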
To assess the adequacy of model (2.1), we define $\hat M_i(t) = \int_0^t \hat w_i(s)\, dN_i^*(s) - \int_0^t \hat w_i(s)\, e^{\hat\beta^{\rm T} Z_i(s)}\, d\hat\mu_0(s)$ and $\hat M_i = \hat M_i(\tau)$. Because the $\hat M_i(\cdot)$ are mean-zero processes under model (2.1), we plot the cumulative sum of the $\hat M_i(\cdot)$ against the model component of interest. Specifically, to check the functional form of the $j$th (time-invariant) covariate, we consider the cumulative-sum process

$$\sum_{i=1}^{n} I(Z_{ij} \le x)\, \hat M_i,$$

where $Z_{ij}$ is the $j$th component of $Z_i$. To check the exponential link function, we consider

$$\sum_{i=1}^{n} I(\hat\beta^{\rm T} Z_i \le x)\, \hat M_i.$$

To check the proportionality assumption, we consider the standardized “score” process

$$\hat V_{jj}^{-1/2}\, U_j(\hat\beta, t),$$

where $U(\beta, t)$ is the estimating function (2.5) with $\tau$ replaced by $t$, and $\hat V$ is a covariance matrix estimator for $U(\hat\beta, \tau)$. To check the overall fit of the model, we consider

$$\sum_{i=1}^{n} I(Z_i \le x)\, \hat M_i(t).$$
We show in Section S.2 of supplementary material available at Biostatistics online, that, under model (2.1), all the above processes are asymptotically zero-mean Gaussian processes whose distributions can be approximated by Monte Carlo simulation along the lines of LWYY. We can graphically compare the observed cumulative-sum process with a few realizations from its null distribution or perform a numerical test based on the maximum absolute value of the process.
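The numerical test can be organized as follows: compute the observed supremum, generate many realizations of the process from an approximation to its null distribution, and report the proportion of realizations whose suprema exceed the observed value. The Python sketch below illustrates the bookkeeping only; the residuals and the crude multiplier scheme standing in for the null realizations are placeholders for the Monte Carlo approximation of Section S.2.

```python
import numpy as np

# Sketch of the supremum test: compare the observed supremum of a cumulative-sum
# process with suprema of realizations drawn from (an approximation to) its null
# distribution.  Residuals and null realizations below are placeholders.
rng = np.random.default_rng(3)
n = 200
z = rng.normal(size=n)                       # covariate whose functional form is checked
Mhat = rng.normal(scale=0.8, size=n)         # stand-in residuals, mean zero under the model

def cumsum_process(covariate, resid, grid):
    """F(x) = sum_i I(covariate_i <= x) * resid_i evaluated on a grid of x values."""
    return np.array([np.sum(resid[covariate <= x]) for x in grid])

grid = np.sort(z)
observed_sup = np.max(np.abs(cumsum_process(z, Mhat, grid)))

null_sups = []
for _ in range(500):
    G = rng.normal(size=n)                   # independent standard normal multipliers
    null_sups.append(np.max(np.abs(cumsum_process(z, Mhat * G, grid))))
p_value = np.mean(np.array(null_sups) >= observed_sup)
print("supremum test p-value:", p_value)
```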
3. Simulation Studies
We assess the finite-sample performance of the new methods through extensive simulation studies. We consider one sequence of recurrent events, along with a terminal event. In order to compare with existing methods, we focus on the unweighted version of the composite endpoint. It is not trivial to generate the composite event process that satisfies the proportional means assumption. We outline below our data generation scheme while relegating the details to Section S.3 of supplementary material available at Biostatistics online.
Let $\tilde N(\cdot)$ be a homogeneous Poisson process with intensity $\lambda$, and let $T$ be a stopping time such that there is at least one event in the interval $[0, T]$. Then by labeling the last event in $[0, T]$ as the terminal event, we have a well-defined composite event process given by $N^*(t) = \tilde N(t \wedge T)$. If the distribution of $T$ is independent of $Z$ and uninformative about $\tilde N(\cdot)$, the optional sampling theorem implies that

$$E\{N^*(t)\} = \lambda\, E(t \wedge T).$$

Thus, the process satisfies the proportional means assumption. In fact, given some appropriate baseline mean function, we can simulate $T$ that follows the exponential distribution with a suitable hazard. Furthermore, we can introduce a frailty term $\xi$ to the intensity $\lambda$ and the hazard of $T$ of each subject so as to induce dependence among the event times of the same subject. We let the intensity and the hazard of $T$ depend on the frailty $\xi$ and on the treatment indicator $Z$, with the specific forms given in Section S.3 of supplementary material available at Biostatistics online. Let the administrative censoring time be generated from a fixed distribution and the random loss to follow-up be exponentially distributed with hazard 0.1. Let the frailty term $\xi$ follow the gamma distribution with mean 1 and variance $\sigma^2$. Under these conditions, each subject has an average of 2–3 events, and the censoring rate is approximately 30%.
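For concreteness, here is a simplified Python sketch of this type of generation scheme. All numeric values (intensity, hazard, frailty variance, follow-up horizon) are illustrative assumptions rather than the paper's actual settings, and the requirement of at least one event in $[0, T]$ is handled by crude rejection instead of the rigorous construction of Section S.3.

```python
import numpy as np

# Simplified sketch of the data-generation scheme: frailty-modulated Poisson
# events on [0, min(T, tau)], with the last event labeled as the terminal event.
rng = np.random.default_rng(4)

def one_subject(z, beta0=0.25, lam0=0.6, rho=0.1, frailty_var=0.5, tau=25.0):
    while True:
        xi = rng.gamma(1.0 / frailty_var, frailty_var) if frailty_var > 0 else 1.0
        T = rng.exponential(1.0 / (rho * xi))               # stopping time
        horizon = min(T, tau)
        n_events = rng.poisson(lam0 * xi * np.exp(beta0 * z) * horizon)
        if n_events >= 1:
            times = np.sort(rng.uniform(0.0, horizon, n_events))
            return times[:-1], times[-1]                    # recurrent event times, death time

recurrent, death = one_subject(z=1.0)
print(len(recurrent), "recurrent events; terminal event at", round(death, 2))
```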
We conduct two sets of simulation studies to compare the new and LWYY methods for making inference on $\beta_0$. In the first set, we let the hazard of $T$ be free of the treatment indicator for all subjects, so that the two treatment groups have the same distribution of the terminal event. The results are displayed in Table 1. The new estimator $\hat\beta$ is virtually unbiased, and its variance estimator accurately reflects the true variation. As expected, LWYY also provides correct inference since the simulation set-up also conforms to the conditional rate model (2.2).
Table 1.
Simulation results comparing the new and LWYY methods in estimating the treatment difference under equal distributions of death
| n | σ² | β₀ | New: Bias | SE | SEE | CP | LWYY: Bias | SE | SEE | CP |
|---|---|---|---|---|---|---|---|---|---|---|
| 100 | 0 | 0 | 0.002 | 0.158 | 0.161 | 0.952 | 0.002 | 0.163 | 0.165 | 0.951 |
| 100 | 0 | 0.5 | -0.001 | 0.150 | 0.150 | 0.957 | -0.002 | 0.149 | 0.149 | 0.946 |
| 100 | 0.25 | 0 | -0.001 | 0.200 | 0.198 | 0.954 | 0.003 | 0.205 | 0.204 | 0.955 |
| 100 | 0.25 | 0.5 | 0.005 | 0.192 | 0.190 | 0.953 | 0.003 | 0.191 | 0.188 | 0.948 |
| 100 | 0.5 | 0 | -0.003 | 0.240 | 0.243 | 0.953 | 0.003 | 0.240 | 0.239 | 0.952 |
| 100 | 0.5 | 0.5 | 0.007 | 0.213 | 0.214 | 0.957 | 0.002 | 0.212 | 0.212 | 0.950 |
| 200 | 0 | 0 | 0.002 | 0.111 | 0.114 | 0.945 | 0.002 | 0.111 | 0.114 | 0.956 |
| 200 | 0 | 0.5 | 0.001 | 0.106 | 0.106 | 0.951 | 0.002 | 0.107 | 0.109 | 0.951 |
| 200 | 0.25 | 0 | 0.003 | 0.140 | 0.141 | 0.944 | -0.002 | 0.143 | 0.144 | 0.945 |
| 200 | 0.25 | 0.5 | 0.000 | 0.130 | 0.131 | 0.952 | 0.005 | 0.132 | 0.130 | 0.946 |
| 200 | 0.5 | 0 | -0.001 | 0.164 | 0.162 | 0.948 | 0.002 | 0.168 | 0.167 | 0.950 |
| 200 | 0.5 | 0.5 | 0.000 | 0.148 | 0.146 | 0.944 | 0.001 | 0.156 | 0.153 | 0.952 |
| 500 | 0 | 0 | 0.001 | 0.067 | 0.067 | 0.951 | 0.002 | 0.072 | 0.071 | 0.944 |
| 500 | 0 | 0.5 | 0.001 | 0.066 | 0.064 | 0.954 | 0.000 | 0.069 | 0.068 | 0.950 |
| 500 | 0.25 | 0 | 0.002 | 0.093 | 0.094 | 0.947 | 0.002 | 0.085 | 0.083 | 0.945 |
| 500 | 0.25 | 0.5 | 0.002 | 0.088 | 0.087 | 0.953 | -0.002 | 0.086 | 0.089 | 0.952 |
| 500 | 0.5 | 0 | -0.001 | 0.101 | 0.099 | 0.954 | -0.002 | 0.106 | 0.108 | 0.955 |
| 500 | 0.5 | 0.5 | 0.000 | 0.091 | 0.090 | 0.948 | -0.002 | 0.097 | 0.097 | 0.951 |
Bias is the bias of the parameter estimator $\hat\beta$, SE is the empirical standard error of $\hat\beta$, SEE is the empirical mean of the standard error estimator of $\hat\beta$, and CP is the coverage probability of the 95% confidence interval. Each entry is based on 10 000 replicates.
In the second set of studies, we let the distributions of the stopping time differ between the two groups, such that the hazards of death are different while the mean functions of $N^*(\cdot)$ are the same between the two groups (i.e., $\beta_0 = 0$). Specifically, we generate a homogeneous Poisson process with the same intensity for both groups and label the last event in the time interval $[0, T]$ as death with probabilities $p_0$ and $p_1$ for groups 0 and 1, respectively. We fix $p_0$ and let $p_1 = 0.2$, 0.3, or 0.5. Thus, the mean functions for the two groups are identical for all $t$, but their death rates are different. We modify the administrative censoring distribution accordingly and keep the other conditions the same as before. As shown in Table 2, the new estimator is approximately unbiased and the corresponding test has correct type I error. In contrast, the LWYY method is biased and its type I error is inflated; the problem worsens as the death rates between the two groups become more different and as the sample size increases.
Table 2.
Simulation results comparing the new and LWYY methods in estimating and testing the treatment difference under unequal distributions of death
| n | p₁ | New: Bias | Size | LWYY: Bias | Size |
|---|---|---|---|---|---|
| 100 | 0.2 | 0.002 | 0.052 | 0.017 | 0.058 |
| 100 | 0.3 | -0.004 | 0.046 | 0.037 | 0.108 |
| 100 | 0.5 | 0.005 | 0.054 | 0.133 | 0.208 |
| 200 | 0.2 | 0.009 | 0.053 | 0.016 | 0.066 |
| 200 | 0.3 | 0.006 | 0.058 | 0.039 | 0.204 |
| 200 | 0.5 | -0.005 | 0.052 | 0.131 | 0.321 |
| 500 | 0.2 | -0.002 | 0.045 | 0.018 | 0.102 |
| 500 | 0.3 | 0.002 | 0.048 | 0.038 | 0.389 |
| 500 | 0.5 | -0.004 | 0.054 | 0.132 | 0.615 |
Bias is the bias of the parameter estimator, and size is the empirical type I error of the Wald statistic for testing $H_0: \beta_0 = 0$ at the nominal significance level of 0.05. Each entry is based on 10 000 replicates.
We adopt the first simulation set-up to assess the performance of the new and LWYY methods for estimating the baseline mean function $\mu_0(t)$. By treating death as censoring, the LWYY method will overestimate the mean function. Indeed, LWYY estimates the integrated conditional rate $R_0(t)$, which is strictly greater than $\mu_0(t)$. Simulation results are summarized in Table 3 and are consistent with these expectations.
Table 3.
Simulation results comparing the new and LWYY methods in estimating the baseline mean function
| n | σ² | t | μ₀(t) | New: Mean | SE | SEE | CP | LWYY: Mean | SE | SEE | CP |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 100 | 0 | 1 | 0.571 | 0.571 | 0.084 | 0.082 | 0.952 | 0.616 | 0.092 | 0.095 | 0.927 |
| | | 2 | 1.088 | 1.085 | 0.134 | 0.137 | 0.953 | 1.246 | 0.155 | 0.158 | 0.796 |
| | | 3 | 1.555 | 1.560 | 0.175 | 0.176 | 0.945 | 1.900 | 0.223 | 0.221 | 0.610 |
| | 0.5 | 1 | 0.558 | 0.556 | 0.106 | 0.104 | 0.944 | 0.590 | 0.119 | 0.120 | 0.923 |
| | | 2 | 1.041 | 1.041 | 0.173 | 0.170 | 0.947 | 1.182 | 0.208 | 0.206 | 0.863 |
| | | 3 | 1.463 | 1.461 | 0.230 | 0.232 | 0.950 | 1.753 | 0.299 | 0.301 | 0.784 |
| 200 | 0 | 1 | 0.571 | 0.570 | 0.059 | 0.062 | 0.950 | 0.617 | 0.064 | 0.067 | 0.892 |
| | | 2 | 1.088 | 1.091 | 0.099 | 0.098 | 0.951 | 1.250 | 0.115 | 0.117 | 0.681 |
| | | 3 | 1.555 | 1.558 | 0.122 | 0.124 | 0.944 | 1.907 | 0.158 | 0.159 | 0.595 |
| | 0.5 | 1 | 0.558 | 0.558 | 0.071 | 0.070 | 0.954 | 0.593 | 0.079 | 0.077 | 0.915 |
| | | 2 | 1.041 | 1.042 | 0.116 | 0.119 | 0.951 | 1.182 | 0.140 | 0.139 | 0.819 |
| | | 3 | 1.463 | 1.459 | 0.167 | 0.169 | 0.948 | 1.750 | 0.205 | 0.205 | 0.641 |
| 500 | 0 | 1 | 0.571 | 0.572 | 0.038 | 0.036 | 0.954 | 0.610 | 0.046 | 0.048 | 0.828 |
| | | 2 | 1.088 | 1.083 | 0.061 | 0.062 | 0.953 | 1.251 | 0.070 | 0.068 | 0.616 |
| | | 3 | 1.555 | 1.559 | 0.083 | 0.083 | 0.951 | 1.903 | 0.096 | 0.097 | 0.277 |
| | 0.5 | 1 | 0.558 | 0.554 | 0.045 | 0.047 | 0.945 | 0.590 | 0.047 | 0.049 | 0.736 |
| | | 2 | 1.041 | 1.042 | 0.077 | 0.078 | 0.949 | 1.177 | 0.096 | 0.097 | 0.656 |
| | | 3 | 1.463 | 1.459 | 0.102 | 0.103 | 0.945 | 1.750 | 0.131 | 0.129 | 0.309 |
Mean is the empirical mean of $\hat\mu_0(t)$, SE is the empirical standard error of $\hat\mu_0(t)$, SEE is the empirical mean of the standard error estimator of $\hat\mu_0(t)$, and CP is the coverage probability of the 95% log-transformed confidence interval. Each entry is based on 10 000 replicates.
We also compare the power of the new method using different weighting schemes with the current practice of performing the Cox regression on the time to the first composite event. The results for the first simulation set-up are shown in Table S1 of supplementary material available at Biostatistics online. The power of the new method decreases as the weight on death increases. This is not surprising since the distributions of death are identical between the two groups. For all weighting schemes, the new method yields much higher power than the Cox regression.
Next, we consider mis-specified censoring distributions. We use the first simulation set-up but generate the time to random loss to follow-up from a proportional odds model, so that the proportional hazards model (2.4) for censoring is mis-specified. We estimate the censoring distribution by the Kaplan–Meier estimator or under the Cox model. The results are summarized in Table S2 of supplementary material available at Biostatistics online. Under the Cox model, the type I error is only slightly inflated, and the power tends to be higher than with the Kaplan–Meier estimator.
Finally, we evaluate the type I error of the supremum tests for model adequacy. The simulation set-up is the same as the first one except that we let $Z_1$ be the treatment indicator and add a continuous covariate $Z_2$ that is standard normal. We simulate 1000 datasets, and for each dataset we obtain 1000 realizations from the null distribution to perform the supremum test at the nominal significance level of 0.05. We assess the functional form of $Z_2$, the proportionality assumption on $Z_1$, the exponential link function, and the overall goodness of fit. The empirical type I error rates of the four tests are all close to the nominal level. Thus, the goodness-of-fit tests are accurate for practical use.
4. A real example
Heart Failure: A Controlled Trial Investigating Outcomes of Exercise Training (HF-ACTION) was a randomized controlled clinical trial to evaluate the efficacy and safety of exercise training among patients with heart failure (O'Connor and others, 2009). A total of 2331 medically stable outpatients with heart failure and reduced ejection fraction were recruited between April 2003 and February 2007 at 82 centers in the USA, Canada, and France. Patients were randomly assigned to usual care alone or usual care plus aerobic exercise training that consists of 36 supervised sessions followed by home-based training. The usual care group consisted of 1172 patients (follow-up data not available for 1 patient), and the exercise training group consisted of 1159 patients. There were a large number of hospital admissions (due to heart failure, other cardiovascular causes, or non-cardiovascular causes) and a considerable number of deaths in each treatment arm, as shown in Table S3 of supplementary material available at Biostatistics online.
The primary endpoint was a composite of all-cause mortality and all-cause hospitalization. Secondary endpoints included the composite of cardiovascular mortality and cardiovascular hospitalization, and the composite of cardiovascular mortality and heart failure hospitalization. Under the Cox models on the time to the first event adjusting for heart failure etiology (ischemic or not), the p-values for these three endpoints were found to be 0.13, 0.14, and 0.06, respectively (O'Connor and others, 2009). This analysis disregarded all the clinical events that occurred after the first one and attached the same clinical importance to hospitalization and death.
To provide a statistically more efficient and clinically more relevant evaluation of the benefits of exercise training, we use the proposed weighted composite event process for death and recurrent hospitalization. For each of the three endpoints, we first consider an unweighted version of the composite event process, in which each event receives the same weight. To reflect the unequal severity of death versus hospitalization, we also consider a weighted version which assigns the weights of 2 and 1 to death and hospitalization, respectively. Because heart failure is a life-threatening event, we consider another weighting scheme which assigns the weights of 3, 2, and 1 to cardiovascular death, heart failure hospitalization, and other cardiovascular hospitalization, respectively. These weights are in line with the cardiology literature (e.g., Califf and others, 1990; Braunwald and others, 1992; Armstrong and others, 2011).
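To make the weighting schemes concrete, here is a small Python sketch that applies them to a hypothetical patient's event history; the event categories, labels, and counts are illustrative and not taken from the HF-ACTION data.

```python
# Hypothetical encoding of the three weighting schemes described above; the event
# categories and the example patient history are illustrative, not HF-ACTION data.
weighting_schemes = {
    "unweighted":                  {"cv_death": 1, "hf_hosp": 1, "other_cv_hosp": 1},
    "death 2, hospitalization 1":  {"cv_death": 2, "hf_hosp": 1, "other_cv_hosp": 1},
    "death 3, HF hosp 2, other 1": {"cv_death": 3, "hf_hosp": 2, "other_cv_hosp": 1},
}

history = [("hf_hosp", 0.7), ("other_cv_hosp", 1.4), ("hf_hosp", 2.2), ("cv_death", 2.9)]

for name, w in weighting_schemes.items():
    total = sum(w[category] for category, _ in history)   # weighted composite count
    print(f"{name}: {total}")
```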
We apply the proportional means model to the aforementioned weighted composite event processes. Table 4 displays the results on the ratios of the mean frequencies of the weighted composite events between exercise training and usual care adjusted for heart failure etiology. The p-values for the unweighted composite endpoints of all-cause mortality and all-cause hospitalization, cardiovascular mortality and cardiovascular hospitalization, and cardiovascular mortality and heart failure hospitalization are 0.06, 0.087, and 0.022, respectively, which are substantially smaller than the corresponding p-values in the analysis of the first event. Because treatment differences are more pronounced for hospitalization than for mortality, assigning more weight to death than hospitalization tends to reduce the level of statistical significance. Because there are a large number of recurrent hospital admissions, however, the use of the weighted composite event process (with less weight on hospitalization than on death) still tends to yield stronger evidence for the benefits of exercise training than the use of the first composite event, especially for the composite of mortality and heart failure hospitalization.
Table 4.
Proportional mean regression analysis of the HF-ACTION data under different weighting schemes
| Weights | Ratio | SE | 95% CI | p-value |
|---|---|---|---|---|
| All-cause mortality and all-cause hospitalization | | | | |
| Hospitalization = 1, death = 1 | 0.886 | 0.057 | | 0.060 |
| Hospitalization = 1, death = 2 | 0.895 | 0.061 | | 0.104 |
| Cardiovascular mortality and cardiovascular hospitalization | | | | |
| Hospitalization = 1, death = 1 | 0.892 | 0.059 | | 0.087 |
| Hospitalization = 1, death = 2 | 0.906 | 0.070 | | 0.196 |
| Cardiovascular mortality and heart failure hospitalization | | | | |
| Heart failure hospitalization = 1, death = 1 | 0.854 | 0.059 | | 0.022 |
| Heart failure hospitalization = 1, death = 2 | 0.863 | 0.062 | | 0.040 |
| Heart failure hospitalization = 2, other cardiovascular hospitalization = 1, death = 3 | 0.862 | 0.061 | | 0.037 |
Ratio is the estimated ratio of the mean frequencies of the weighted composite events between exercise training and usual care, SE is the (estimated) standard error for the ratio estimate, and 95% CI is the 95% confidence interval.
We compare the new and LWYY methods in the estimation of the mean functions for the unweighted composite event process of all-cause mortality and all-cause hospitalization. As shown in Figure 1, the LWYY estimates of the mean functions are considerably higher than ours. This phenomenon is consistent with the theory and simulation results.
Fig. 1.
Estimated mean functions for all-cause mortality and all-cause hospitalization by treatment group for non-ischemic patients in the HF-ACTION study: the left and right panels pertain to usual care and exercise training, respectively. The new and LWYY methods are denoted by the solid and dashed curves, respectively.
For further illustration, we apply the proportional means model to the last composite event process in Table 4 by adjusting for four additional covariates that were identified to be highly prognostic (O'Connor and others, 2009). These covariates are duration of the cardiopulmonary exercise test (CPX), left ventricular ejection fraction (LVEF), Beck Depression Inventory II score (BDI), and history of atrial fibrillation or flutter (AFF). The results on the regression effects are summarized in Table S4 of supplementary material available at Biostatistics online. With the covariate adjustment, the effect of exercise training is highly significant. Figure S1 of supplementary material available at Biostatistics online provides an example of predicting the number of events for a patient with given covariate values.
We check the model assumptions by using the diagnostic tools described in Section 2. The supremum tests for checking the functional forms of the continuous variables CPX, LVEF, and BDI have p-values of 0.523, 0.217, and 0.308, respectively. The supremum tests for checking the proportionality assumptions on CPX, LVEF, BDI, AFF, HF etiology, and treatment group have p-values of 0.138, 0.328, 0.070, 0.300, 0.105, and 0.256, respectively. The p-value for checking the exponential link function is 0.083. Thus, the model fits the data reasonably well. A subset of the residual plots is displayed in Figure S2 of supplementary material available at Biostatistics online.
5. Discussion
The presence of a terminal event poses serious challenges in the analysis of recurrent event data. The existing methods treating recurrent and terminal events as two separate endpoints have not been well received by clinicians or regulatory agencies. The nonparametric tests of Ghosh and Lin (2000) have been used in recent cardiovascular trials (Anand and others, 2009; Rogers and others, 2012), but only as secondary analysis; none of the other methods seem to have been used in actual clinical trials. The current practice is to use the first composite event as the primary endpoint. This endpoint disregards the information on the clinical events beyond the first one and does not distinguish the two types of events. The weighted composite event process is a natural extension of the current measure to enhance statistical power and clinical relevance. This endpoint is particularly useful when there are several types of recurrent events, some of which might have too few occurrences to be analyzed separately.
We have proposed a novel proportional rates/means model for studying the effects of treatments and other covariates on the weighted composite event process and provided the corresponding inference procedures. We have demonstrated that the proposed inference procedures have desirable asymptotic and finite-sample properties. We have shown both analytically and numerically that the LWYY approach always over-estimates the mean function of the (unweighted) composite event process (whether or not recurrent and terminal events are correlated) and generally yields biased estimation of the regression parameters.
Although the concept of proportional rates/means is simple and attractive, it is not obvious that the model can hold for the weighted composite event process. We have shown that there are realistic data generation mechanisms which satisfy this model. In addition, we have provided graphical and numerical methods to assess the adequacy of the model.
When constructing the estimating function for model (2.1), we exclude the censoring by death from the at-risk indicators. It seems counter-intuitive to regard a subject as being at risk after death. However, “at risk” is a mathematical construct to ensure unbiased estimating functions. If there is no censoring by $C$, the composite endpoint process $N^*(\cdot)$ will be fully observed. In that case, it is clear that censoring $N^*(\cdot)$ at $D$ is mathematically wrong.
Regulatory submissions require that the treatment efficacy be represented by a single parameter in the primary analysis. The rate (or mean) ratio for the weighted composite event process proposed in this article satisfies this requirement and provides a fuller and more meaningful characterization of the clinical course than the hazard ratio for the first composite event. It is sensible to combine death and life-threatening recurrent events (e.g., heart failure or stroke) with appropriate weighting in the primary analysis.
The analysis based on the composite event process provides an overall assessment of the treatment efficacy. A significant treatment effect on the composite endpoint does not imply significant treatment effects on all its components. The existing methods that treat terminal and recurrent events as two separate endpoints can be used to determine the nature of the treatment effect. If the treatment reduces the frequencies of both terminal and recurrent events, then its clinical benefits are clear. Because the occurrence of the terminal event precludes further development of recurrent events, it is possible for the treatment to reduce the risk of the terminal event and increase the incidence of recurrent events.
The choice of weights will affect the power of statistical analysis and the interpretation of results. If the treatment effect on the terminal event is similar to or smaller than the treatment effect on recurrent events, then giving more weight to the terminal event than to recurrent events will reduce statistical power, as evidenced by the simulation results and the HF-ACTION study. On the other hand, a composite endpoint that is dominated by recurrent events may not be of great interest to clinicians. One may choose the weights in a data-adaptive manner such that the weight for the terminal event depends on how many patients have experienced the terminal event. The weighting scheme should be specified a priori in consultation with the appropriate drug approval agency and clinicians.
Supplementary material
Supplementary material is available online at http://biostatistics.oxfordjournal.org.
Acknowledgments
Conflict of Interest: None declared.
Funding
This research was supported by the NIH grants R01GM047845, R01AI029168, and P01CA142538.
References
- Abrams D. I., Goldman A. I., Launer C., Korvick J. A., Neaton J. D., Crane L. R., Grodesky M., Wakefield S., Muth K., Kornegay S., Cohn D. L., Harris A., Luskin-Hawk R., Markowitz N., Sampson J. H., Thompson M., Deyton L. (1994). A comparative trial of Didanosine or Zalcitabine after treatment with Zidovudine in patients with human immunodeficiency virus infection. New England Journal of Medicine 330, 657–662.
- Anand I. S., Carson P., Galle E., Song R., Boehmer J., Ghali J. K., Jaski B., Lindenfeld J., O'Connor C., Steinberg J. S., Leigh J., Yong P., Kosorok M. R., Feldman A. M., DeMets D., Bristow M. R. (2009). Cardiac resynchronization therapy reduces the risk of hospitalizations in patients with advanced heart failure: results from the Comparison of Medical Therapy, Pacing and Defibrillation in Heart Failure (COMPANION) trial. Circulation 119, 969–977.
- Andersen P. K., Gill R. D. (1982). Cox's regression model for counting processes: a large sample study. The Annals of Statistics 10, 1100–1120.
- Armstrong P. W., Westerhout C. M., Van de Werf F., Califf R. M., Welsh R. C., Wilcox R. G., Bakal J. A. (2011). Refining clinical trial composite outcomes: an application to the Assessment of the Safety and Efficacy of a New Thrombolytic-3 (ASSENT-3) trial. American Heart Journal 161, 848–854.
- Braunwald E., Cannon C. P., McCabe C. H. (1992). An approach to evaluating thrombolytic therapy in acute myocardial infarction. The “unsatisfactory outcome” end point. Circulation 86, 683–687.
- Byar D. P. (1980). The Veterans Administration study of chemoprophylaxis for recurrent stage I bladder tumors: comparisons of placebo, pyridoxine, and topical thiotepa. In Pavone-Macaluso M., Smith P. H. and Edsmyr F. (editors), Bladder Tumors and Other Topics in Urological Oncology. New York: Plenum, pp. 363–370.
- Califf R. M., Harrelson-Woodlief L., Topol E. J. (1990). Left ventricular ejection fraction may not be useful as an end point of thrombolytic therapy comparative trials. Circulation 82, 1847–1853.
- Chen B. E., Cook R. J. (2004). Tests for multivariate recurrent events in the presence of a terminal event. Biostatistics 5, 129–143.
- Cook R. J., Lawless J. F. (1997). Marginal analysis of recurrent events and a terminating event. Statistics in Medicine 16, 911–924.
- Cook R. J., Lawless J. F., Lakhal-Chaieb L., Lee K. A. (2009). Robust estimation of mean functions and treatment effects for recurrent events under event-dependent censoring and termination: application to skeletal complications in cancer metastatic to bone. Journal of the American Statistical Association 104, 60–75.
- Cox D. R. (1972). Regression models and life-tables (with discussion). Journal of the Royal Statistical Society, Series B 34, 187–200.
- Fleming T. R., Harrington D. (1991). Counting Processes and Survival Analysis. New York: Wiley.
- Ghosh D., Lin D. Y. (2000). Nonparametric analysis of recurrent events and death. Biometrics 56, 554–562.
- Ghosh D., Lin D. Y. (2002). Marginal regression models for recurrent and terminal events. Statistica Sinica 12, 663–688.
- Huang C., Wang M. (2004). Joint modeling and estimation for recurrent event processes and failure time data. Journal of the American Statistical Association 99, 1153–1165.
- Lawless J. F., Nadeau C. (1995). Some simple robust methods for the analysis of recurrent events. Technometrics 37, 158–168.
- Lewis J. A. (1999). Statistical principles for clinical trials (ICH E9): an introductory note on an international guideline. Statistics in Medicine 18, 1903–1942.
- Lin D. Y., Wei L. J., Yang I., Ying Z. (2000). Semiparametric regression for the mean and rate functions of recurrent events. Journal of the Royal Statistical Society, Series B 62, 711–730.
- Liu L., Wolfe R. A., Huang X. (2004). Shared frailty models for recurrent events and a terminal event. Biometrics 60, 747–756.
- Neaton J. D., Gray G., Zuckerman B. D., Konstam M. A. (2005). Key issues in end point selection for heart failure trials: composite end points. Journal of Cardiac Failure 11, 567–575.
- O'Connor C. M., Whellan D. J., Lee K. L., Keteyian S. J., Cooper L. S., Ellis S. J., Leifer E. S., Kraus W. E., Kitzman D. W., Blumenthal J. A. and others (2009). Efficacy and safety of exercise training in patients with chronic heart failure: HF-ACTION randomized controlled trial. Journal of the American Medical Association 301, 1439–1450.
- Pepe M. S., Cai J. (1993). Some graphical displays and marginal regression analyses for recurrent failure times and time dependent covariates. Journal of the American Statistical Association 88, 811–820.
- Pfeffer M. A., Swedberg K., Granger C. B., Held P., McMurray J. J., Michelson E. L., Olofsson B., Ostergren J., Yusuf S. (2003). Effects of candesartan on mortality and morbidity in patients with chronic heart failure: the CHARM-Overall programme. The Lancet 362, 759–766.
- Prentice R. L., Williams B. J., Peterson A. V. (1981). On the regression analysis of multivariate failure time data. Biometrika 68, 373–379.
- Robins J. M., Rotnitzky A. (1992). Recovery of information and adjustment for dependent censoring using surrogate markers. In Jewell N., Dietz K. and Farewell V. (editors), AIDS Epidemiology. Boston: Birkhauser, pp. 297–331.
- Rogers J. K., McMurray J. J. V., Pocock S. J., Zannad F., Krum H., van Veldhuisen D. J., Swedberg K., Shi H., Vincent J., Pitt B. (2012). Eplerenone in patients with systolic heart failure and mild symptoms: analysis of repeat hospitalizations. Circulation 126, 2317–2323.
- Schaubel D. E., Zeng D., Cai J. (2006). A semiparametric additive rates model for recurrent event data. Lifetime Data Analysis 12, 389–406.
- Vlahov D., Anthony J. C., Munoz A., Margolick J., Nelson K. E., Celentano D. D., Solomon L., Polk B. F. (1991). The ALIVE study: a longitudinal study of HIV-1 infection in intravenous drug users: description of methods. Journal of Drug Issues 21, 758–776.
- Wang M. C., Qin J., Chiang C. T. (2001). Analyzing recurrent event data with informative censoring. Journal of the American Statistical Association 96, 1057–1065.
- Wei L. J., Lin D. Y., Weissfeld L. (1989). Regression analysis of multivariate incomplete failure time data by modeling marginal distributions. Journal of the American Statistical Association 84, 1065–1073.
- Ye Y., Kalbfleisch J. D., Schaubel D. E. (2007). Semiparametric analysis of correlated recurrent and terminal events. Biometrics 63, 78–87.
- Yusuf S., Pfeffer M. A., Swedberg K., Granger C. B., Held P., McMurray J. J., Michelson E. L., Olofsson B., Östergren J. (2003). Effects of candesartan in patients with chronic heart failure and preserved left-ventricular ejection fraction: the CHARM-Preserved Trial. The Lancet 362, 777–781.
- Zannad F., McMurray J. J., Krum H., van Veldhuisen D. J., Swedberg K., Shi H., Vincent J., Pocock S. J., Pitt B. (2011). Eplerenone in patients with systolic heart failure and mild symptoms. New England Journal of Medicine 364, 11–21.
- Zeng D., Cai J. (2010). A semiparametric additive rate model for recurrent events with an informative terminal event. Biometrika 97, 699–712.
- Zeng D., Lin D. Y. (2009). Semiparametric transformation models with random effects for joint analysis of recurrent and terminal events. Biometrics 65, 746–752.