Author manuscript; available in PMC: 2020 Dec 15.
Published in final edited form as: Biometrics. 2020 Feb 18;76(4):1229–1239. doi: 10.1111/biom.13229

Semiparametric Modelling and Estimation of Covariate-Adjusted Dependence Between Bivariate Recurrent Events

Jing Ning 1,*, Chunyan Cai 2, Yong Chen 3, Xuelin Huang 1, Mei-Cheng Wang 4
PMCID: PMC7384929  NIHMSID: NIHMS1596616  PMID: 31994170

Summary:

A time-dependent measure, termed the rate ratio, was proposed to assess the local dependence between two types of recurrent event processes in one-sample settings. However, the one-sample work does not consider modelling the dependence by covariates such as subject characteristics and treatments received. The focus of this paper is to understand how and in what magnitude the covariates influence the dependence strength for bivariate recurrent events. We propose the covariate-adjusted rate ratio, a measure of covariate-adjusted dependence. We propose a semiparametric regression model for jointly modeling the frequency and dependence of bivariate recurrent events: the first level is a proportional rates model for the marginal rates and the second level is a proportional rate ratio model for the dependence structure. We develop a pseudo-partial likelihood to estimate the parameters in the proportional rate ratio model. We establish the asymptotic properties of the estimators and evaluate the finite sample performance via simulation studies. We illustrate the proposed models and methods using a soft tissue sarcoma study that examines the effects of initial treatments on the marginal frequencies of local/distant sarcoma recurrence and the dependence structure between the two types of cancer recurrence.

Keywords: Bivariate recurrent event, Covariate-adjusted rate ratio, Dependence structure, Joint model, Rate ratio


Recurrent event data are often encountered in reliability experiments (Dalal and McIntosh, 1994), insurance warranty claims (Lawless and Nadeau, 1995), and biomedical studies (Cook and Lawless, 2007). In these situations, there may exist more than one type of recurrent event of interest (Cai and Schaubel, 2004; Schaubel and Cai, 2005; Sun et al., 2009; Cook et al., 2010; Zhu et al., 2010; Zhao et al., 2012). When analyzing such data, it is important to assess the dependence structure between different types of recurrent events, which, even if it is not of primary scientific interest, at least ensures the correctness of joint model assumptions and then guides the model selection. However, such an assessment is challenging as the dependence between recurrent events may vary, not only with time, but also with patients’ characteristics and initial treatments received. Previous work has focused on a time-variant property but ignored the fact that the strength of dependence can be affected by covariates (Ventura et al., 2005; Ning et al., 2015).

This research is motivated by a soft tissue sarcoma study, in which 674 patients with stage III soft tissue sarcoma who were treated between 1984 and 1999 were identified from two comprehensive cancer centers and then were followed for cancer recurrences (Cormier et al., 2004). The study documented the timing of bivariate recurrence events: local disease recurrence, in which tumor cells that remain in the original site form another tumor over time, and distant disease recurrence, in which sarcoma spreads through the bloodstream to distant sites such as the lungs or liver. Previous research has shown that the dependence between local and distant cancer recurrences for patients with soft tissue sarcoma was not constant over time (Ning et al., 2015). A natural question is whether the dependency structure of the bivariate sarcoma recurrences depends not only on time, but also on other covariates, such as the initial treatments received by the patients with sarcoma, and if so, in what way?

A time-dependent measure, termed the rate ratio, has been used to assess the local dependence between two types of recurrent event processes (Ning et al., 2015). One advantage of using the rate ratio as the dependence measure is that it has an attractive relative probability interpretation, which is simple for practitioners to understand. There are often factors, such as patient characteristics and treatments received, that affect the strength of the dependence between different types of recurrent events. It may be important to understand how the dependence structure depends on such factors, although little has been done to develop measures and models for characterizing the dependence structure. For bivariate survival time, there is some work in the literature regarding how to model the covariate effects on the dependence structure, in which the covariate effects are often modeled through marginal distributions. For example, Fan and Prentice (2002) used a proportional hazards model on the marginal survival function to accommodate covariate effects. However, when the dependence structure itself is of major interest, modeling the covariate effects via marginal regression models does not explicitly determine how covariates change the strength of dependence nor by how much. In this paper, we propose the covariate-adjusted rate ratio for the dependence structure and consider a joint model of the bivariate recurrent events. In the joint model, we directly link the covariates and the covariate-adjusted rate ratio via a regression model, where the covariate effects are multiplicative on the baseline rate ratio function.

The rest of the paper is organized as follows. In Section 1, we introduce the notation, the covariate-adjusted dependence measure, and the semiparametric model. We then construct a two-stage estimating procedure in which a pseudo-partial likelihood is used to estimate parameters. We establish the large sample properties for the estimated parameters. We assess the empirical performance of the proposed estimators in Section 2, describe an application to the soft tissue sarcoma study in Section 3, and conclude with a discussion in Section 4.

1. METHOD

1.1. Notation and Model

Let {N1(t), N2(t), t ⩾ 0} represent the bivariate counting process for the number of type-specific events during the time [0, t]. Assume Xj (t) to be a p-dimensional vector of possibly time-dependent covariates for the jth type of recurrent event. We denote λj{t|xj(t)} as the type-specific and covariate-adjusted rate function,

$$\lambda_j\{t \mid x_j(t)\} = \lim_{\Delta \to 0^+} P\{N_j(t+\Delta) - N_j(t) > 0 \mid x_j(t)\}/\Delta.$$

We generalize the rate ratio to accommodate the covariate effects on the dependence measure, termed the covariate-adjusted rate ratio,

$$\rho\{s,t \mid x_1(s), x_2(t)\} = \frac{\lambda_{1|2}\{s \mid t, x_1(s), x_2(t)\}}{\lambda_1\{s \mid x_1(s)\}},$$

where λ1|2{s|t,x1(s),x2(t)} is a covariate-adjusted conditional rate function defined as

$$\lim_{\Delta \to 0^+} \frac{P\{N_1(s+\Delta) - N_1(s) > 0 \mid N_2(t+\Delta) - N_2(t) > 0,\, X_1(s) = x_1(s),\, X_2(t) = x_2(t)\}}{\Delta}.$$

Under the assumption that λj(t|xj(t),xj′(s)) = λj(t|xj(t)) (j ≠ j′ ∈ {1, 2}), it can be shown that the covariate-adjusted rate ratio is symmetric,

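$$\rho\{s,t \mid x_1(s), x_2(t)\} = \frac{\lambda_{1|2}\{s \mid t, x_1(s), x_2(t)\}}{\lambda_1\{s \mid x_1(s)\}} = \frac{\lambda_{2|1}\{t \mid s, x_1(s), x_2(t)\}}{\lambda_2\{t \mid x_2(t)\}},$$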

where

$$\lambda_{2|1}\{t \mid s, x_1(s), x_2(t)\} = \lim_{\Delta \to 0^+} P\{N_2(t+\Delta) - N_2(t) > 0 \mid N_1(s+\Delta) - N_1(s) > 0,\, x_1(s), x_2(t)\}/\Delta.$$

The above assumption implies that, given the covariate information up to time t of one process, the covariate information of the other process does not provide any additional information on the marginal rate function of the process. The covariate-adjusted rate ratio shares the same desirable properties as the rate ratio, including the relative probability interpretation (Ning et al., 2015). We use the covariate-adjusted rates in both the numerator and denominator to eliminate the influence of covariate effects on the frequencies of recurrent events, such that the covariate-adjusted rate ratio can only capture the covariate effects on the strength of dependence between different types of recurrent events.

Note that the rate ratio can be rewritten as

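$$\rho\{s,t \mid x_1(s), x_2(t)\} = \frac{\lim_{\Delta \to 0^+} P\{N_2(t+\Delta) - N_2(t) > 0,\, N_1(s+\Delta) - N_1(s) > 0 \mid x_1(s), x_2(t)\}/\Delta}{\lambda_1\{s \mid x_1(s)\}\,\lambda_2\{t \mid x_2(t)\}}, \tag{1}$$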

in which two marginal rate functions are involved. It implies that the covariate-adjusted rate ratio depends not only on the joint probability, but also on how the covariates affect the two marginal rate functions. This expression gives an alternative interpretation for the rate ratio. It provides a standardized co-occurrence rate of an event pair consisting of a type-1 event at s and a type-2 event at t, with the standard being the co-occurrence rate in a situation where these two types of events are independent of each other.

Without any parametric assumption, estimation of the covariate-adjusted rate ratio can be computationally prohibitive in the presence of continuous covariates. Our strategy is to impose semiparametric regression models that are flexible and easy to interpret. There are certain considerations for constructing regression models with a dependence structure for bivariate recurrent event data. First, we choose the exponential function as the link function in the regression model for the rate ratio, because the regression coefficients are then easy to interpret and the rate ratio is non-negative by nature. Second, when calculating the rate of the observed event that subject i experiences both types of events respectively at times s and t, which plays a fundamental role in the construction of the likelihood function, we have

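$$P(dN_{i1}(s) = 1,\, dN_{i2}(t) = 1 \mid x_{i1}(s), x_{i2}(t)) = \lambda_1\{s \mid x_{i1}(s)\}\,\lambda_{2|1}\{t \mid s, x_{i1}(s), x_{i2}(t)\}\,ds\,dt = \lambda_1\{s \mid x_{i1}(s)\}\,\lambda_2\{t \mid x_{i2}(t)\}\,\rho\{s,t \mid x_{i1}(s), x_{i2}(t)\}\,ds\,dt. \tag{2}$$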

This equation implies that the probability of observing a pair of events again depends on both the rate ratio and two marginal rates. Both equations (1) and (2) illustrate that the covariate-specific marginal rate functions need to be specified to evaluate the covariate effects on the strength of dependence. Following the above considerations, we propose the following joint model.

Level 1: Proportional rate model for Nj(.) conditioning on Xj(.) (Lin et al., 2000),

$$\lambda_j\{s \mid x_j(s)\} = \lambda_{j0}(s) \exp\{\gamma_j^{T} x_j(s)\}, \quad j = 1, 2, \tag{3}$$

where λj0(s) is an unspecified rate function for the jth type of event and γj is a p-dimensional parameter vector to characterize the covariate effects on marginal rates.

Level 2: Proportional rate ratio model conditioning on (X1(.),X2(.)),

$$\rho\{s,t \mid x_1(s), x_2(t), \beta, \alpha\} = \rho_0(s,t;\beta) \exp\{\alpha_1^{T} x_1(s) + \alpha_2^{T} x_2(t)\}, \tag{4}$$

where $\alpha = (\alpha_1^{T}, \alpha_2^{T})^{T}$ is a 2p-dimensional parameter vector and ρ0(s,t;β) is a prespecified baseline rate ratio with a q-dimensional parameter β. For simplicity of notation, we use the same vector of covariates in the marginal models and the rate ratio model, but the two models are allowed to have different sets of covariates. The proportional rate ratio model directly links the covariates and the rate ratio by assuming that the covariate effect is multiplicative on the baseline rate ratio function. The parameter α describes the covariate effects on the strength of dependence of the bivariate recurrent events and has an interpretation as the log-relative rate ratio related to the covariate. The baseline rate ratio function serves to identify the degree of dependence over time and is set to be a constant under the commonly used joint random effect models (Sun et al., 2009; Zhu et al., 2010; Zhao et al., 2012; Ning et al., 2015). Flexible functions of time, such as polynomial functions or regression splines, could be used to specify the baseline rate ratio function.
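To make the interpretation of α concrete, the following minimal Python sketch evaluates model (4) for a single binary covariate. The constant baseline, the coefficient values, and the function name `rate_ratio` are illustrative assumptions, not quantities taken from the paper.

```python
import math

def rate_ratio(s, t, x1, x2, alpha1, alpha2, baseline):
    """Model (4): rho0(s, t) * exp(alpha1'x1 + alpha2'x2) for covariate lists x1, x2."""
    linear = sum(a * x for a, x in zip(alpha1, x1)) + sum(a * x for a, x in zip(alpha2, x2))
    return baseline(s, t) * math.exp(linear)

# Hypothetical constant baseline rate ratio rho0(s, t) = exp(beta1).
beta1 = 0.693
baseline = lambda s, t: math.exp(beta1)

# Illustrative coefficients for one binary covariate entering both processes.
alpha1, alpha2 = [-0.3], [-0.2]
rho_untreated = rate_ratio(1.0, 2.0, [0], [0], alpha1, alpha2, baseline)
rho_treated = rate_ratio(1.0, 2.0, [1], [1], alpha1, alpha2, baseline)

# The ratio of the two rate ratios is exp(alpha1 + alpha2), free of the baseline.
relative = rho_treated / rho_untreated
```

Because the baseline cancels in `relative`, the sum of the two coefficients is read directly as a log-relative rate ratio, whatever form ρ0(s,t;β) takes.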

The probability structure on the bivariate recurrent events is determined by the bivariate intensity function, h(s, t|x1(s),x2(t)), which is defined as

$$\lim_{\Delta \to 0^+} P\{N_1(s+\Delta) - N_1(s) > 0,\, N_2(t+\Delta) - N_2(t) > 0 \mid H(s,t), x_1(s), x_2(t)\}/\Delta.$$

Here H(s,t) is the history of the bivariate counting process up to time (s,t). The corresponding rate function represents the probability of events occurring at time (s,t) without conditioning on the event history. Although the proposed model assumes a parametric regression model on the rate ratio, it does not fully impose the probability structure of the bivariate processes (e.g., bivariate intensity function); it is therefore semiparametric and enjoys model robustness. In contrast to the shared random effects models with fully parametric assumptions, our semiparametric model framework provides more flexibility in modeling and data application.

1.2. Estimation Procedure

Consider a study that involves n independent subjects, each of whom may experience two types of recurrent events in the observation period [0,τ]. For subject i, denote Cij and Yij(t) = I(Cij ⩾ t) as the censoring time of the jth type of recurrent event and its risk function, j = 1, 2. Let Nij be the number of events of type j for subject i that occur over the interval [0, Cij] and mij = Nij(Cij), i = 1, ⋯, n, j = 1, 2. The observed event times of the ith subject are respectively ti11 ⩽ ti12 ⩽ ⋯ ⩽ ti1mi1 and ti21 ⩽ ti22 ⩽ ⋯ ⩽ ti2mi2 for the two types of recurrent events. Denote the observed covariate information as Xij(.). For simplicity of notation, denote xijk = Xij(tijk) for i = 1, ⋯, n, j = 1, 2 and k = 1, ⋯, mij.

The rate function gives the marginal probability, and its integral from 0 to t represents the mean of number of events occurring up to time t. Therefore, the parameters under the proportional rate models can be estimated by solving the following moment-based estimating equations (Lin et al., 2000),

$$U_j(\gamma) = \sum_{i=1}^{n} \int_0^{\tau} \{x_{ij}(u) - \bar{x}_j(\gamma_j; u)\}\, dN_{ij}(u), \tag{5}$$

where $a^{\otimes 0} = 1$, $a^{\otimes 1} = a$, $a^{\otimes 2} = aa^{T}$, $\bar{x}_j(\gamma_j; u) = S^{(1)}(\gamma_j; u)/S^{(0)}(\gamma_j; u)$, and

$$S^{(k)}(\gamma_j; t) = n^{-1} \sum_{i=1}^{n} Y_{ij}(t)\, x_{ij}(t)^{\otimes k} \exp\{\gamma_j^{T} x_{ij}(t)\}.$$
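As an illustration of how the estimating function in (5) is evaluated, the sketch below computes U_j(γ) for one event type in plain Python. It assumes time-fixed covariates (the paper allows time-dependent ones), and the function name and toy data are hypothetical.

```python
import math

def score_proportional_rates(gamma, event_times, censor_times, covariates):
    """Evaluate U_j(gamma) in (5) for one event type with time-fixed covariates.

    event_times[i]  : list of observed event times of subject i
    censor_times[i] : censoring time C_ij of subject i
    covariates[i]   : list of p covariate values of subject i
    """
    p = len(gamma)
    weights = [math.exp(sum(g * x for g, x in zip(gamma, xi))) for xi in covariates]
    score = [0.0] * p
    for i, times in enumerate(event_times):
        for u in times:
            # Subjects still under observation at u, i.e. Y_l(u) = I(C_l >= u) = 1.
            at_risk = [l for l, c in enumerate(censor_times) if c >= u]
            s0 = sum(weights[l] for l in at_risk)
            for d in range(p):
                s1 = sum(weights[l] * covariates[l][d] for l in at_risk)
                # The factor 1/n cancels in the ratio S1/S0.
                score[d] += covariates[i][d] - s1 / s0
    return score

# Toy data: three subjects, one binary covariate, common censoring at time 5.
toy_score = score_proportional_rates(
    gamma=[0.0],
    event_times=[[1.0], [2.0], []],
    censor_times=[5.0, 5.0, 5.0],
    covariates=[[0.0], [1.0], [1.0]],
)
```

Setting this function to zero, for example with a Newton-type root finder, gives the estimate of γj; the sketch stops at evaluating the function.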

Under the proposed model, we specify the rates and rate ratio of the recurrent event processes, but do not fully impose the probability structure of the bivariate recurrent event processes (e.g., bivariate intensity function). Accordingly, the full likelihood method cannot be applied to estimate the unknown parameters under the proposed joint model. Unlike the proportional rate model, the proportional rate ratio model is built on the ratio of conditional probabilities; it has no direct relationship with the mean or variance of the recurrent events. Hence the moment-based approach is not an ideal option for estimating the unknown parameters under the proportional rate ratio model.

To solve this estimation challenge, we construct a pseudo-partial likelihood by mimicking the partial likelihood under the Cox proportional hazards model (Cox, 1975). Denote R(s, t) = {i : Ci1 ⩾ s, Ci2 ⩾ t} as the bivariate risk set for bivariate recurrent events at time (s, t), and n(s,t) as the size of the risk set R(s, t). When we apply the partial likelihood idea to the observed bivariate recurrent event data, we need to identify all risk sets at all possible paired observation times for the different types of events. Hence, unlike the construction of univariate risk sets in the standard partial likelihood, the observed recurrent event times from one subject can contribute to multiple risk sets due to the unique data structure of bivariate recurrent event data. Note that the paired time points can come from the same subject or from two different subjects, even though they are different types of events.

For each bivariate risk set at time (s, t), we consider the following event, called eii′(s,t):

$$e_{ii'}(s,t) = \{i, i' \in R(s,t) : dN_{i1}(s) = 1,\, dN_{i'2}(t) = 1\}.$$

When i = i′, event eii′(s,t) occurs if subject i in the risk set R(s,t) experiences both types of events respectively at times s and t. When i ≠ i′, event eii′(s,t) occurs if subject i in the risk set R(s,t) experiences the first type of event and the other subject i′ in the risk set R(s,t) experiences the second type of event. Calculating the event probabilities conditional on the composition of the risk set, we have

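$$\Pr\{e_{ii}(s,t) \mid R(s,t)\} = \frac{I_{ii}(s,t)\,\lambda_1\{s \mid x_{i1}(s)\}\,\lambda_{2|1}\{t \mid s, x_{i1}(s), x_{i2}(t)\}}{\sum_{i=1}^{n} I_{ii}(s,t)\,\lambda_1\{s \mid x_{i1}(s)\}\,\lambda_{2|1}\{t \mid s, x_{i1}(s), x_{i2}(t)\} + \sum_{i=1}^{n} \sum_{i' \neq i} I_{ii'}(s,t)\,\lambda_1\{s \mid x_{i1}(s)\}\,\lambda_2\{t \mid x_{i'2}(t)\}}$$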

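and for i ≠ i′,

$$\Pr\{e_{ii'}(s,t) \mid R(s,t)\} = \frac{I(i \neq i')\,I_{ii'}(s,t)\,\lambda_1\{s \mid x_{i1}(s)\}\,\lambda_2\{t \mid x_{i'2}(t)\}}{\sum_{i=1}^{n} I_{ii}(s,t)\,\lambda_1\{s \mid x_{i1}(s)\}\,\lambda_{2|1}\{t \mid s, x_{i1}(s), x_{i2}(t)\} + \sum_{i=1}^{n} \sum_{i' \neq i} I_{ii'}(s,t)\,\lambda_1\{s \mid x_{i1}(s)\}\,\lambda_2\{t \mid x_{i'2}(t)\}}$$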

where Iii(s,t) = I(Ci1 ⩾ s, Ci2 ⩾ t) and Iii′(s,t) = Iii(s,t)Ii′i′(s,t). Using the proportional rate model and the definition of the rate ratio, we can eliminate the unspecified baseline rate function and further simplify the above two conditional probabilities:

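$$\Pr\{e_{ii}(s,t) \mid R(s,t)\} = I_{ii}(s,t)\,\rho(s,t \mid x_{i1}(s), x_{i2}(t), \eta) \exp\{\gamma_1^{T} x_{i1}(s) + \gamma_2^{T} x_{i2}(t)\} / P(s,t;\gamma,\eta)$$
$$\Pr\{e_{ii'}(s,t) \mid R(s,t)\} = I(i \neq i')\,I_{ii'}(s,t) \exp\{\gamma_1^{T} x_{i1}(s) + \gamma_2^{T} x_{i'2}(t)\} / P(s,t;\gamma,\eta)$$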

where γ = (γ1,γ2), η = (β,α) and

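$$P(s,t;\gamma,\eta) = \sum_{i=1}^{n} I_{ii}(s,t)\,\rho(s,t \mid x_{i1}(s), x_{i2}(t), \eta) \exp\{\gamma_1^{T} x_{i1}(s) + \gamma_2^{T} x_{i2}(t)\} + \sum_{i=1}^{n} \sum_{i' \neq i} I_{ii'}(s,t) \exp\{\gamma_1^{T} x_{i1}(s) + \gamma_2^{T} x_{i'2}(t)\}.$$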
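The simplified conditional probabilities can be checked numerically. The sketch below is a minimal illustration that assumes a single scalar, time-fixed covariate shared by both event types, so the two α coefficients collapse into one; the function name, the rate ratio function `rho`, and the toy inputs are hypothetical.

```python
import math

def pair_probabilities(s, t, c1, c2, x, gamma1, gamma2, rho):
    """Conditional probabilities of e_{ii'}(s, t) given the risk set R(s, t).

    c1[i], c2[i] : censoring times of the two event types for subject i
    x[i]         : scalar time-fixed covariate shared by both event types
    rho(s, t, xi): covariate-adjusted rate ratio for covariate value xi
    Returns a dict mapping (i, i') to Pr{e_{ii'}(s, t) | R(s, t)}.
    """
    risk = [i for i in range(len(x)) if c1[i] >= s and c2[i] >= t]
    terms = {}
    for i in risk:
        for k in risk:
            w = math.exp(gamma1 * x[i] + gamma2 * x[k])
            # Same-subject pairs carry the rate ratio; cross-subject pairs do not.
            terms[(i, k)] = rho(s, t, x[i]) * w if i == k else w
    total = sum(terms.values())  # this is P(s, t; gamma, eta)
    return {pair: value / total for pair, value in terms.items()}

# Illustrative rate ratio with a constant baseline and one covariate effect.
rho = lambda s, t, xi: math.exp(0.693 - 0.511 * xi)
probs = pair_probabilities(1.0, 2.0, [3.0, 4.0, 0.5], [3.0, 4.0, 5.0],
                           [0.0, 1.0, 1.0], 0.2, 0.4, rho)
```

In this toy call the third subject is censored for type-1 events before s, so only the first two subjects enter R(s, t) and the four pair probabilities sum to one.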

Of note, for each bivariate risk set that can be constructed from the observed event times (s, t), either eii(s, t) or eii′(s,t) (i ≠ i′) must occur. The pseudo-partial likelihood is the product of all of the conditional probabilities of eii or eii′ (i ≠ i′), given the corresponding risk sets constructed from the observed event times. Synthesizing this information, we have the log-pseudo-partial likelihood as

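$$L_p(\gamma,\eta) = \sum_{i=1}^{n} \sum_{k=1}^{m_{i1}} \sum_{k'=1}^{m_{i2}} \log\left[\frac{\rho(t_{i1k}, t_{i2k'} \mid x_{i1k}, x_{i2k'}, \eta) \exp\{\gamma_1^{T} x_{i1k} + \gamma_2^{T} x_{i2k'}\}}{P(t_{i1k}, t_{i2k'}; \gamma, \eta)}\right] + \sum_{i=1}^{n} \sum_{i' \neq i}^{n} \sum_{k=1}^{m_{i1}} \sum_{k'=1}^{m_{i'2}} I_{ii'}(t_{i1k}, t_{i'2k'}) \log\left[\frac{\exp\{\gamma_1^{T} x_{i1k} + \gamma_2^{T} x_{i'2k'}\}}{P(t_{i1k}, t_{i'2k'}; \gamma, \eta)}\right]. \tag{6}$$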
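For readers who want to see the double sum in (6) spelled out, the following Python sketch evaluates the log-pseudo-partial likelihood by brute force. It is a simplified illustration under stated assumptions: one scalar time-fixed covariate per subject, a user-supplied log rate ratio function, and hypothetical names and toy data throughout; it is not the authors' implementation.

```python
import math

def log_pseudo_partial_likelihood(events1, events2, c1, c2, x, gamma1, gamma2, log_rho):
    """Log-pseudo-partial likelihood (6) with a scalar time-fixed covariate.

    events1[i], events2[i] : observed type-1 / type-2 event times of subject i
    c1[i], c2[i]           : censoring times of the two event types
    log_rho(s, t, xi)      : log covariate-adjusted rate ratio
    """
    n = len(x)
    total = 0.0
    for i in range(n):
        for k in range(n):
            for s in events1[i]:
                for t in events2[k]:
                    # Both subjects must belong to the bivariate risk set R(s, t).
                    risk = [l for l in range(n) if c1[l] >= s and c2[l] >= t]
                    if i not in risk or k not in risk:
                        continue
                    log_num = gamma1 * x[i] + gamma2 * x[k]
                    if i == k:
                        log_num += log_rho(s, t, x[i])
                    denom = 0.0  # accumulates P(s, t; gamma, eta)
                    for a in risk:
                        for b in risk:
                            w = math.exp(gamma1 * x[a] + gamma2 * x[b])
                            denom += math.exp(log_rho(s, t, x[a])) * w if a == b else w
                    total += log_num - math.log(denom)
    return total

# Toy data: three subjects, common censoring at 10, no covariate or dependence effect.
value = log_pseudo_partial_likelihood(
    events1=[[1.0], [2.0], []], events2=[[3.0], [], [4.0]],
    c1=[10.0, 10.0, 10.0], c2=[10.0, 10.0, 10.0], x=[0.0, 1.0, 1.0],
    gamma1=0.0, gamma2=0.0, log_rho=lambda s, t, xi: 0.0,
)
```

With all coefficients at zero and a rate ratio of one, every pair in a risk set of size 3 is equally likely, so each of the four observed time pairs contributes log(1/9). Each term is the log of a probability, so the function never exceeds zero.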

This pseudo-partial likelihood naturally extends the partial likelihood method for right-censored single time-to-event data to bivariate recurrent event data. For the estimation procedure, we can simultaneously solve estimating equation (5) for the marginal rates and the score equation of the pseudo-partial likelihood. Alternatively, we can first estimate the covariate effects on the marginal rates, denoted as γ̂, by solving estimating equation (5), since that equation does not involve the parameter η. We then substitute γ̂ for γ in the log-pseudo-partial likelihood function, denoted as Lp(γ̂,η), and maximize Lp(γ̂,η) to obtain the estimator of η, denoted as η̂. The corresponding score equation is

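$$U(\eta, \hat{\gamma}) = \sum_{i=1}^{n} \sum_{i'=1}^{n} \sum_{k=1}^{m_{i1}} \sum_{k'=1}^{m_{i'2}} \left[ I(i = i')\, \nabla_{\eta} \log\{\rho(t_{i1k}, t_{i'2k'} \mid x_{i1k}, x_{i'2k'}, \eta)\} - I_{ii'}(t_{i1k}, t_{i'2k'})\, \nabla_{\eta} \log\{P(t_{i1k}, t_{i'2k'}; \hat{\gamma}, \eta)\} \right],$$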

where ∇η denotes the first derivative with respect to η. Theoretically, solving the two sets of equations simultaneously or sequentially should give identical estimators; numerically, the two estimators could differ because they are usually not the exact solutions to the equations. We have conducted simulation studies and confirmed that the differences between the two estimating procedures are negligible in our setting. Accordingly, we solved the two sets of equations sequentially in our simulation studies and application, because the sequential approach is computationally more efficient.

1.3. Asymptotic Behavior

The asymptotic properties of γ̂ have been well established (Lin et al., 2000). We need to consider the asymptotic performance of the pseudo-partial likelihood and the influence of plugging γ̂ into the pseudo-partial likelihood. Note that the proposed likelihood is not a true partial likelihood because the conditional events from which components are constructed do not come from a nested sequence; hence, martingale theory cannot be applied to derive the asymptotic properties of the estimators obtained by maximizing the pseudo-partial likelihood. Nevertheless, each component in the pseudo-partial likelihood is a legitimate conditional density, and it can be shown that the corresponding score equation is an unbiased estimating equation for η, although the correlation between the observed events on the observed bivariate risk sets is not accounted for in the construction of the pseudo-partial likelihood. We apply empirical process theory and U-statistic theory to establish the consistency and asymptotic normality of the estimators under the regularity conditions listed in the Supporting Information. We summarize these results in the following theorem and provide the proof in the Supporting Information.

Theorem 1: Under regularity conditions [C.1-C.5] listed in the Supporting Information, the estimator η̂ converges to the true value η0 in probability. Moreover, $\sqrt{n}(\hat{\eta} - \eta_0)$ converges in distribution to a normal distribution with mean 0 and the covariance matrix as defined in the Supporting Information.

2. SIMULATIONS

2.1. Design and Data Generation

We conducted simulation studies to evaluate the finite sample performance of the proposed method. We simulated bivariate recurrent event data through the shared random effects models. Specifically, we generated the shared random effects from a Gamma distribution with a shape parameter a and scale parameter b. We further increased the random effects by δ to avoid extremely small random effects that might lead to highly rare recurrent events. We considered two covariates for both the proportional rate models and proportional rate ratio models: X1 ~ binomial(0.5) and X2 ~ Uniform(0,1). We generated independent censoring times from a uniform distribution on the interval [3, 12]. We used two sample sizes (n = 200 and n = 400) with 1000 replications for each scenario. We generated bivariate recurrent event data from two scenarios with different dependence structures (Level 2 model). In Scenario 1, the two recurrent events have covariate-varying dependence, and in Scenario 2, the events have time-varying dependence in addition to covariate-varying dependence. Please see the Supporting Information for more details on the data generation.

For the variance estimation, because the explicit form of the asymptotic variance of the estimators is very complicated, involving unknown functions such as the baseline rate functions, it is computationally difficult and unstable to directly obtain a consistent variance estimator. Given the established root-n convergence rate, we can use resampling methods (e.g., bootstrap or Jackknife methods) to estimate the variance of the estimators. We choose the Jackknife method for two main reasons: 1) For time-to-event data with a small or moderate sample size, the nonparametric bootstrap method has the potential to produce a data set with fewer events, due to its random sampling with replacement, which may cause non-convergence of the estimation procedure for the re-sampled data. This is not an issue with the Jackknife method. 2) Based on our previous experience, the Jackknife bias-correction formula can reduce the estimator biases substantially when the sample size is small or moderate. Although the bootstrap can provide more general information (the sampling distribution of an estimator), considering our purpose, the Jackknife method is preferred for its explicit bias-correction formula (Shao and Wu, 1989). Denote $\hat{\eta}_{(i)}$ as the estimator after deleting the i-th observation. Then the bias-corrected Jackknife estimator is $\hat{\eta}_{\text{jack}} = n\hat{\eta} - (n-1)\hat{\eta}_{(\cdot)}$, where $\hat{\eta}_{(\cdot)}$ is the empirical average of the Jackknife replicates and equals $\sum_{i=1}^{n} \hat{\eta}_{(i)}/n$. The corresponding Jackknife standard error is defined as

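$$SE(\hat{\eta})_{\text{jack}} = \left\{\frac{n-1}{n} \sum_{i=1}^{n} \left(\hat{\eta}_{(i)} - \hat{\eta}_{(\cdot)}\right)^2\right\}^{1/2}.$$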
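The two Jackknife formulas above can be sketched for a generic scalar estimator as follows. The function name and the toy estimators are illustrative assumptions; in the setting of this paper, each deleted "observation" would be one subject with all of that subject's event times, and the estimator would be the full two-stage fit.

```python
import math

def jackknife(data, estimator):
    """Return the bias-corrected Jackknife estimate and the Jackknife standard error."""
    n = len(data)
    full = estimator(data)
    # Leave-one-out replicates: the estimator recomputed without observation i.
    replicates = [estimator(data[:i] + data[i + 1:]) for i in range(n)]
    mean_rep = sum(replicates) / n
    corrected = n * full - (n - 1) * mean_rep
    se = math.sqrt((n - 1) / n * sum((r - mean_rep) ** 2 for r in replicates))
    return corrected, se

def plug_in_variance(values):
    """Biased variance estimator that divides by n instead of n - 1."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)

sample = [1.0, 2.0, 3.0, 4.0]
corrected_var, _ = jackknife(sample, plug_in_variance)
_, se_mean = jackknife(sample, lambda v: sum(v) / len(v))
```

For the plug-in variance the bias correction recovers the usual unbiased sample variance, and for the sample mean the Jackknife standard error coincides with the familiar s/√n, which makes these two cases convenient sanity checks.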

2.2. Simulation Results

We obtained the proposed estimates by solving estimating equation (5) and maximizing the log-pseudo-partial likelihood specified in equation (6), which was implemented by the optim function in R (R Development Core Team). Tables 1 and 2 respectively summarize simulation results under Scenarios 1 and 2, including empirical biases of estimators and bias corrected Jackknife estimators, empirical standard deviation, standard error estimators by the Jackknife method and coverage probabilities of the 95% Wald type confidence intervals of the original estimators.

Table 1.

Simulation results under Scenario 1

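Left block (γ): proportional rate model. Right block (β,α): proportional rate ratio model.

Low event scenario: # of first type of events=1.48, # of second type of events=1.49

| n | γ | Bias | ESD | JSE | Biasj | CP | (β,α) | Bias | ESD | JSE | Biasj | CP |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 200 | γ11 = .2 | .010 | .168 | .165 | .005 | 94.4 | β1 = .693 | −.066 | .324 | .309 | .000 | 92.7 |
| | γ12 = 0 | .003 | .273 | .275 | .003 | 95.8 | β2 = .0 | −.001 | .021 | .020 | −.001 | 93.3 |
| | γ21 = .4 | .010 | .172 | .167 | .004 | 95.1 | β3 = .0 | −.001 | .021 | .021 | .000 | 95.4 |
| | γ22 = 0 | .004 | .267 | .268 | .004 | 95.5 | α1 = −.511 | .038 | .254 | .244 | −.004 | 92.6 |
| | | | | | | | α2 = .0 | .001 | .418 | .385 | .002 | 94.6 |
| 400 | γ11 = .2 | .001 | .121 | .116 | −.002 | 94.0 | β1 = .693 | −.048 | .248 | .282 | .015 | 94.8 |
| | γ12 = 0 | .001 | .205 | .194 | .001 | 93.2 | β2 = .0 | .000 | .014 | .018 | −.002 | 95.4 |
| | γ21 = .4 | −.003 | .120 | .117 | −.006 | 93.9 | β3 = .0 | −.001 | .015 | .018 | −.003 | 95.2 |
| | γ22 = 0 | .000 | .189 | .189 | .000 | 95.8 | α1 = −.511 | .017 | .189 | .224 | −.007 | 94.3 |
| | | | | | | | α2 = .0 | .024 | .332 | .331 | .022 | 94.9 |

High event scenario: # of first type of events=2.04, # of second type of events=2.07

| n | γ | Bias | ESD | JSE | Biasj | CP | (β,α) | Bias | ESD | JSE | Biasj | CP |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 200 | γ11 = .2 | .010 | .155 | .152 | .005 | 95.0 | β1 = .693 | −.064 | .295 | .286 | .009 | 91.0 |
| | γ12 = 0 | .004 | .255 | .252 | .004 | 94.6 | β2 = .0 | −.001 | .018 | .018 | −.001 | 95.1 |
| | γ21 = .4 | .008 | .158 | .154 | .003 | 94.5 | β3 = .0 | −.001 | .017 | .018 | −.001 | 94.9 |
| | γ22 = 0 | .004 | .246 | .244 | .004 | 95.5 | α1 = −.511 | .037 | .240 | .231 | −.006 | 91.4 |
| | | | | | | | α2 = .0 | .007 | .391 | .354 | .000 | 94.8 |
| 400 | γ11 = .2 | .000 | .111 | .108 | −.002 | 94.0 | β1 = .693 | −.047 | .208 | .212 | −.010 | 91.8 |
| | γ12 = 0 | −.001 | .183 | .179 | .000 | 94.4 | β2 = .0 | .001 | .012 | .013 | .000 | 95.1 |
| | γ21 = .4 | −.001 | .111 | .107 | −.004 | 93.6 | β3 = .0 | −.001 | .012 | .013 | −.001 | 95.4 |
| | γ22 = 0 | .003 | .171 | .172 | .003 | 95.5 | α1 = −.511 | .020 | .172 | .173 | −.005 | 92.5 |
| | | | | | | | α2 = .0 | .018 | .271 | .273 | .020 | 95.7 |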

Bias, empirical bias; ESD, empirical standard deviation; JSE, standard error estimator by the jackknife method; Biasj, empirical bias of bias corrected Jackknife estimator; CP, coverage probability of the 95% confidence intervals.

Table 2.

Simulation results under Scenario 2

Left block (γ): proportional rate model. Right block (β,α): proportional rate ratio model.

Low event scenario: # of first type of events=1.48, # of second type of events=1.49

| n | γ | Bias | ESD | JSE | Biasj | CP | (β,α) | Bias | ESD | JSE | Biasj | CP |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 200 | γ11 = .2 | .000 | .214 | .212 | .003 | 95.4 | β1 = .511 | −.073 | .424 | .393 | .013 | 94.9 |
| | γ12 = 0 | −.017 | .365 | .370 | −.018 | 95.0 | β2 = .182 | .005 | .225 | .220 | −.009 | 94.0 |
| | γ21 = .4 | .001 | .208 | .208 | .003 | 95.0 | α1 = .336 | −.049 | .340 | .323 | −.030 | 92.9 |
| | γ22 = 0 | −.008 | .371 | .368 | −.008 | 94.9 | α2 = .0 | −.032 | .648 | .585 | −.039 | 95.5 |
| 400 | γ11 = .2 | .000 | .147 | .149 | .001 | 94.7 | β1 = .511 | −.067 | .313 | .316 | −.004 | 94.8 |
| | γ12 = 0 | .009 | .266 | .262 | .009 | 94.4 | β2 = .182 | .012 | .170 | .170 | .003 | 93.7 |
| | γ21 = .4 | .000 | .148 | .146 | .002 | 95.3 | α1 = .336 | −.019 | .259 | .253 | −.015 | 92.1 |
| | γ22 = 0 | .013 | .269 | .262 | .013 | 94.5 | α2 = .0 | .007 | .505 | .479 | .012 | 94.6 |

High event scenario: # of first type of events=1.97, # of second type of events=2.08

| n | γ | Bias | ESD | JSE | Biasj | CP | (β,α) | Bias | ESD | JSE | Biasj | CP |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 200 | γ11 = .2 | −.003 | .203 | .202 | .001 | 95.4 | β1 = .511 | −.070 | .405 | .374 | .016 | 94.7 |
| | γ12 = 0 | −.018 | .348 | .354 | −.018 | 94.4 | β2 = .182 | .006 | .206 | .203 | −.007 | 94.4 |
| | γ21 = .4 | .002 | .195 | .197 | .005 | 95.4 | α1 = .336 | −.048 | .324 | .306 | −.032 | 92.5 |
| | γ22 = 0 | −.004 | .353 | .351 | −.004 | 94.8 | α2 = .0 | −.030 | .623 | .561 | −.037 | 95.3 |
| 400 | γ11 = .2 | −.001 | .144 | .143 | .001 | 93.9 | β1 = .511 | −.062 | .301 | .295 | −.011 | 94.9 |
| | γ12 = 0 | .008 | .254 | .252 | .007 | 93.8 | β2 = .182 | .010 | .161 | .155 | .002 | 93.8 |
| | γ21 = .4 | .001 | .141 | .139 | .002 | 94.8 | α1 = .336 | −.017 | .250 | .239 | −.004 | 92.7 |
| | γ22 = 0 | .012 | .255 | .250 | .012 | 94.5 | α2 = .0 | .003 | .490 | .457 | .009 | 94.1 |

Bias, empirical bias; ESD, empirical standard deviation; JSE, standard error estimator by the jackknife method; Biasj, empirical bias of bias corrected Jackknife estimator; CP, coverage probability of the 95% confidence intervals.

As seen in the left panel of Table 1, regression coefficients under the proportional rate models were well estimated with small empirical biases under Scenario 1. The associated empirical standard deviations and estimated standard errors agreed well, and the coverage probabilities of the 95% confidence intervals were close to 95%. As expected, the standard errors of the estimators decreased with increasing average numbers of recurrent events and increasing sample sizes. For the parameters under the proportional rate ratio model, the empirical biases were relatively large with a small sample (n=200), but these biases decreased quickly with increasing sample size (n=400). All coverage probabilities were in a reasonable range, from 91.0% to 95.7%. We found that the Jackknife method substantially reduced biases, particularly for small sample sizes, and accurately captured the variability of the proposed estimating procedure. By comparing the results from low-event and high-event scenarios, we found that an increasing frequency of recurrent events can improve the precision of the estimated regression coefficients under both the proportional rate and rate ratio models. Note that we included two time-related terms and an unrelated continuous covariate X2 under the model, although the degree of dependence between the two recurrent events did not depend on time or on this covariate. The corresponding parameters β2, β3, and α2 were well estimated with empirical biases smaller than 0.025, suggesting that the proposed method performed very well even when the true values of these parameters are zero.

We also evaluated the performance of the proposed method in the scenarios with time-varying dependence (Scenario 2). We fitted the proportional rate models specified in Equations (1) and (2) of the Supporting Information and the proportional rate ratio model specified in Equation (4) of the Supporting Information. As seen in Table 2, both the estimators and the associated inferences were accurate; the empirical biases were small and coverage probabilities were close to the nominal values. Again, the Jackknife method helped reduce the estimation biases for the scenarios with a small sample size (n=200). Under this scenario, the degree of dependency changed with the time of the second event occurrence, in addition to the covariate X1. The satisfactory performance of the estimated β2, α1 and α2 confirmed that the proposed model and method can simultaneously evaluate the time trend and covariate effects on the dependence structure.

The rate ratio function under the commonly used shared random effects model for bivariate recurrent event data (Zhu et al., 2010; Ning et al., 2017) is a constant, determined by the coefficient of variation of the shared random effects (Ning et al., 2015). It implies that the shared random effects models assume that the strength of dependency is constant over time and cannot be affected by covariates. Hence the shared random effects models cannot be used to characterize the dependence structure if the constant dependence assumption is violated. We conducted sensitivity studies to evaluate the robustness of the shared random effects model with respect to violations of this assumption. We fitted the semiparametric shared random effects model (Zhu et al., 2010; Ning et al., 2017), and summarized the results in Table S2 of the Supporting Information. The results confirmed that the shared random effects model cannot capture the underlying dependence structure between the bivariate recurrent events. For example, the estimated rate ratio under Scenario 1 was 1.56, although the true rate ratios were 1.2 and 2.0 for the two subgroups with X1 = 0 and X1 = 1, respectively. Interestingly, for the two scenarios, the estimated regression coefficients under the marginal rate models had reasonable performance. This suggests that the shared random effects models had a certain robustness in terms of the parameters under the marginal models when the dependence structure between bivariate recurrent events was misspecified.
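The constant rate ratio implied by a shared random effect can be illustrated with a short calculation. The sketch below assumes a multiplicative frailty Z acting on both rates, with the two processes conditionally independent Poisson processes given Z; under that assumption the marginal rates scale with E[Z] and the joint rate with E[Z²], so the rate ratio is E[Z²]/E[Z]² = 1 + CV(Z)². The shifted-Gamma form mirrors the simulation description (Gamma random effect increased by δ), but the function name and parameter values are hypothetical.

```python
def shared_frailty_rate_ratio(shape, scale, shift=0.0):
    """Rate ratio 1 + CV(Z)^2 for Z = G + shift with G ~ Gamma(shape, scale).

    Assumes a multiplicative shared frailty and conditionally independent
    Poisson processes given Z; this is an illustrative model, not the exact
    data-generating mechanism of the paper.
    """
    mean = shape * scale + shift     # E[Z]; the shift moves the mean only
    variance = shape * scale ** 2    # Var(Z) is unaffected by the shift
    return 1.0 + variance / mean ** 2

rr_plain = shared_frailty_rate_ratio(shape=2.0, scale=0.5)
rr_shifted = shared_frailty_rate_ratio(shape=2.0, scale=0.5, shift=1.0)
```

Neither time nor covariates appear in the result, which is the sense in which such a model fixes the strength of dependence at a single constant.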

In summary, the simulation studies suggest that the proposed method can accurately estimate the covariate effects on the frequency and dependence of recurrent events under both time-invariant and time-varying dependence, with small biases and well-estimated standard errors.

3. APPLICATION

Understanding the frequency and dependence structure between different types of cancer recurrence and associated treatment effects may help clinicians and patients to make better treatment decisions. We applied the proposed method to a soft tissue sarcoma study (Cormier et al., 2004). A cohort of 674 patients was identified from two major cancer centers, in which patients may experience local recurrence of sarcoma (in the same or nearby part of the body where the primary cancer occurred) and distant recurrence (in a different part of the body). All patients in the cohort received definitive surgical resection of the tumor as their initial treatment. A clinical question of interest is whether other initial treatment choices, such as chemotherapy and radiation, have an impact on the frequency and the dependence between the two different types of sarcoma recurrence. The follow-up period ranged from 0.01 to 18.57 years, with a median of 4.2 years. During the follow-up, 820 cancer recurrences were observed among the 674 patients. At least one event of local cancer recurrence was experienced by 235 patients, at least one event of distant cancer recurrence was experienced by 411 patients, and multiple events of cancer recurrence were experienced by 271 patients. Of the patients who experienced multiple events, 200 experienced both types of sarcoma recurrence.

We considered three sets of models to evaluate the relationship between initial treatments and the frequency and dependence of sarcoma recurrence after adjusting for patient age at the time of the initial treatment. We incorporated our previous findings about the time-varying dependence into the baseline rate ratio by dichotomizing the time scales of the local and distant recurrences at the median follow-up time (4.2 years):

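Model (1):

$$
\begin{aligned}
\lambda_1(s) &= \lambda_{101}(s)\exp\{\gamma_{111} I(\text{Chemotherapy}) + \gamma_{121}\,\text{Age}\},\\
\lambda_2(t) &= \lambda_{201}(t)\exp\{\gamma_{211} I(\text{Chemotherapy}) + \gamma_{221}\,\text{Age}\},\\
\rho(s,t) &= \exp\{\beta_{11} + \beta_{21} I(s<4.2) + \beta_{31} I(t<4.2) + \beta_{41} I(s<4.2,\, t<4.2) + \alpha_{11} I(\text{Chemotherapy}) + \alpha_{21}\,\text{Age}\};
\end{aligned}
$$

Model (2):

$$
\begin{aligned}
\lambda_1(s) &= \lambda_{102}(s)\exp\{\gamma_{112} I(\text{Radiation}) + \gamma_{122}\,\text{Age}\},\\
\lambda_2(t) &= \lambda_{202}(t)\exp\{\gamma_{212} I(\text{Radiation}) + \gamma_{222}\,\text{Age}\},\\
\rho(s,t) &= \exp\{\beta_{12} + \beta_{22} I(s<4.2) + \beta_{32} I(t<4.2) + \beta_{42} I(s<4.2,\, t<4.2) + \alpha_{12} I(\text{Radiation}) + \alpha_{22}\,\text{Age}\};
\end{aligned}
$$

Model (3):

$$
\begin{aligned}
\lambda_1(s) &= \lambda_{103}(s)\exp\{\gamma_{113} I(\text{Chemotherapy}) + \gamma_{123} I(\text{Radiation}) + \gamma_{133}\,\text{Age}\},\\
\lambda_2(t) &= \lambda_{203}(t)\exp\{\gamma_{213} I(\text{Chemotherapy}) + \gamma_{223} I(\text{Radiation}) + \gamma_{233}\,\text{Age}\},\\
\rho(s,t) &= \exp\{\beta_{13} + \beta_{23} I(s<4.2) + \beta_{33} I(t<4.2) + \beta_{43} I(s<4.2,\, t<4.2) + \alpha_{13} I(\text{Chemotherapy}) + \alpha_{23} I(\text{Radiation}) + \alpha_{33}\,\text{Age}\},
\end{aligned}
$$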

where λ1(·) and λ2(·) represent the marginal rates of local and distant disease recurrence, respectively. In this application, the two types of recurrent events shared the same covariate information (e.g., initial treatment and patients’ characteristics); therefore, the assumption λj(t|xj(t),xj′(s)) = λj(t|xj(t)) (j ≠ j′ ∈ {1, 2}) was satisfied.

Table 3 summarizes the parameter estimates under the proportional rate models and the proportional rate ratio model, the standard error estimates obtained by the jackknife method, and the associated p-values from the Wald test. The results of Model (1) suggest that, after adjusting for age at baseline, the use of adjuvant chemotherapy with definitive surgical resection was associated with rates of local and distant cancer recurrence that were exp(γ111) = 0.968 and exp(γ211) = 0.999 times those without chemotherapy, respectively. The use of chemotherapy was also associated with a 0.118 increase in the log rate ratio. That is, using the interpretation based on equation (1), the standardized co-occurrence rate of local and distant recurrences in the chemotherapy group is exp(0.118) = 1.125 times that in the no-chemotherapy group. However, these effects were not statistically significant. The age effects on the frequencies and dependence were not statistically significant either, although older patients tended to have weaker dependence between the two types of cancer recurrence: the rate ratio was multiplied by exp(α21) = 0.999 for each one-year increase in age at the initial treatment. Interestingly, the results of Model (2) indicated that the use of adjuvant radiation with definitive surgical resection had effects on the frequencies of local and distant recurrences in the opposite direction to those associated with adjuvant chemotherapy; however, neither of these effects was statistically significant, nor was the effect of adjuvant radiation on the rate ratio. After including both adjuvant radiation and chemotherapy in the marginal rate and rate ratio models, we observed results similar to those from Models (1) and (2).
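The arithmetic behind these statements can be checked with a short sketch. It assumes that the Wald test refers the ratio of an estimate to its jackknife standard error to a standard normal distribution; the function name `wald_p_value` is hypothetical and is not taken from the authors' code. Because Table 3 reports rounded estimates and standard errors, p-values recomputed this way need not match every table entry to the third decimal.

```python
import math


def wald_p_value(estimate, std_error):
    """Two-sided Wald p-value under a standard normal reference (assumed)."""
    z = estimate / std_error
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)), Phi the standard normal CDF
    return math.erfc(abs(z) / math.sqrt(2.0))


# Model (1), chemotherapy effect on the rate ratio (values as reported in Table 3)
chemo_rr_estimate, chemo_rr_jse = 0.118, 0.115
print(f"Wald p-value: {wald_p_value(chemo_rr_estimate, chemo_rr_jse):.3f}")

# Exponentiated coefficients act multiplicatively on rates and on the rate ratio
print(f"exp(-0.033) = {math.exp(-0.033):.3f}")  # local recurrence rate, chemotherapy
print(f"exp(0.118)  = {math.exp(0.118):.3f}")   # rate ratio, chemotherapy
```

The exponentiated values make explicit that the coefficients are multiplicative effects, so a value just below 1 corresponds to a small relative reduction rather than a reduction "by" that amount.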

Table 3.

Summary of data analysis for soft tissue sarcoma

| Parameter | Estimate | JSE (jackknife standard error) | Wald p-value |
|---|---|---|---|
| Model (1) | | | |
| Chemotherapy effect on rate of local recurrence | −0.033 | 0.127 | 0.798 |
| Age effect on rate of local recurrence | 0.004 | 0.004 | 0.270 |
| Chemotherapy effect on rate of distant recurrence | −0.001 | 0.098 | 0.991 |
| Age effect on rate of distant recurrence | −0.004 | 0.003 | 0.211 |
| Intercept | 1.170 | 0.215 | <0.001 |
| Early local recurrence (I(s<4.2)) | −1.036 | 0.294 | <0.001 |
| Early distant recurrence (I(t<4.2)) | −0.889 | 0.291 | 0.002 |
| Early recurrences of both events (I(s<4.2, t<4.2)) | 1.501 | 0.460 | 0.001 |
| Chemotherapy effect on rate ratio | 0.118 | 0.115 | 0.305 |
| Age effect on rate ratio | −0.001 | 0.003 | 0.739 |
| Model (2) | | | |
| Radiation effect on rate of local recurrence | 0.137 | 0.138 | 0.318 |
| Age effect on rate of local recurrence | 0.004 | 0.004 | 0.240 |
| Radiation effect on rate of distant recurrence | 0.006 | 0.127 | 0.961 |
| Age effect on rate of distant recurrence | −0.004 | 0.003 | 0.207 |
| Intercept | 1.183 | 0.209 | <0.001 |
| Early local recurrence (I(s<4.2)) | −1.044 | 0.291 | <0.001 |
| Early distant recurrence (I(t<4.2)) | −0.903 | 0.291 | 0.002 |
| Early recurrences of both events (I(s<4.2, t<4.2)) | 1.505 | 0.440 | 0.001 |
| Radiation effect on rate ratio | 0.114 | 0.140 | 0.413 |
| Age effect on rate ratio | −0.001 | 0.003 | 0.647 |
| Model (3) | | | |
| Chemotherapy effect on rate of local recurrence | −0.028 | 0.167 | 0.867 |
| Radiation effect on rate of local recurrence | 0.137 | 0.158 | 0.385 |
| Age effect on rate of local recurrence | 0.004 | 0.004 | 0.270 |
| Chemotherapy effect on rate of distant recurrence | −0.002 | 0.107 | 0.985 |
| Radiation effect on rate of distant recurrence | 0.006 | 0.153 | 0.968 |
| Age effect on rate of distant recurrence | −0.004 | 0.003 | 0.213 |
| Intercept | 1.083 | 0.279 | <0.001 |
| Early local recurrence (I(s<4.2)) | −1.043 | 0.285 | <0.001 |
| Early distant recurrence (I(t<4.2)) | −0.896 | 0.287 | 0.002 |
| Early recurrences of both events (I(s<4.2, t<4.2)) | 1.510 | 0.431 | <0.001 |
| Chemotherapy effect on rate ratio | 0.122 | 0.160 | 0.445 |
| Radiation effect on rate ratio | 0.117 | 0.196 | 0.549 |
| Age effect on rate ratio | −0.001 | 0.003 | 0.764 |

Similar to our previous findings (Ning et al., 2015), our analysis indicated that, conditional on the initial treatments and baseline age, the dependence between the local and distant cancer recurrences was significantly positive and not constant over time. For example, conditional on the use of adjuvant chemotherapy and baseline age, the risks of early local recurrence and early distant recurrence were positively dependent, with a rate ratio of exp(β11+β21+β31+β41) = 2.109 (p-value < 0.001). This indicates a 2.109-fold increase in the risk of early local cancer recurrence for patients who had early distant cancer recurrence relative to those who did not. Alternatively, based on the interpretation of equation (1), the standardized co-occurrence rate of early local and early distant disease recurrence is 2.109.
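The piecewise-constant baseline rate ratio of Model (1) can be evaluated directly from the Table 3 estimates. The following is a minimal sketch, not the authors' code: the function name `rate_ratio` and the dictionary layout are hypothetical, and the rounded estimates from Table 3 are used as inputs. With the covariate terms set to zero, the early/early cell gives approximately the 2.109 quoted above.

```python
import math

# Model (1) rate ratio estimates as reported (rounded) in Table 3
BETA = {"intercept": 1.170, "early_local": -1.036,
        "early_distant": -0.889, "early_both": 1.501}
ALPHA = {"chemo": 0.118, "age": -0.001}
CUTOFF_YEARS = 4.2  # median follow-up time used to dichotomize both time scales


def rate_ratio(s, t, chemo=0, age=0.0):
    """Fitted rho(s, t) under Model (1); s = local time, t = distant time (years)."""
    early_s = 1 if s < CUTOFF_YEARS else 0
    early_t = 1 if t < CUTOFF_YEARS else 0
    log_rho = (BETA["intercept"]
               + BETA["early_local"] * early_s
               + BETA["early_distant"] * early_t
               + BETA["early_both"] * early_s * early_t
               + ALPHA["chemo"] * chemo
               + ALPHA["age"] * age)
    return math.exp(log_rho)


# One representative time point per cell of the 2 x 2 time partition
for s, t in [(1.0, 1.0), (1.0, 6.0), (6.0, 1.0), (6.0, 6.0)]:
    print(f"s={s}, t={t}: rho = {rate_ratio(s, t):.3f}")
```

Evaluating all four cells shows how the dichotomized indicators let the fitted dependence differ between early and late recurrences while remaining constant within each cell.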

4. DISCUSSION

In this paper, we propose a semiparametric regression model for studying bivariate recurrent event data. The model consists of two levels: the first level employs proportional rate models for marginal rates, and the second level uses a proportional rate ratio model to characterize the dependence structure between the two different types of recurrent events. In the proposed models, we specify the rates and rate ratio (dependence structure) of the recurrent events, but not the full distribution. While the proposed pseudo-partial likelihood may be less efficient than the full likelihood, specifying the full distribution of bivariate processes is notoriously difficult. Furthermore, the commonly used shared frailty models assume that the dependence is constant regardless of time and covariate values, an assumption that is usually considered restrictive.

In this paper, we extend the partial likelihood method for right-censored single time-to-event data to bivariate recurrent event data to estimate parameters under the proportional rate ratio model. Although the proposed pseudo-partial likelihood involves conditional probabilities on the bivariate risk sets, the associated computational expense is reasonable. We used a high-performance computing system at the Texas Advanced Computing Center and parallel computing to conduct our simulations and application. For example, in a 200-run simulation for the data under Scenario 2 with a sample size of 200, it took 0.06 hours for the point estimation and 11.54 hours for the variance estimation by the jackknife method. For the soft tissue sarcoma data, it took 0.57 hours for both the point and variance estimation.
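The gap between point and variance estimation times is what one would expect if the jackknife refits the model once per deleted subject. A minimal sketch follows, assuming the delete-one-subject form of the jackknife; the paper cites a general jackknife theory and does not state in this section which variant was used. The names `jackknife_se` and `sample_mean` are hypothetical, and the sample mean stands in for the far more expensive pseudo-partial likelihood estimator.

```python
import math


def jackknife_se(subjects, estimator):
    """Delete-one jackknife standard error of estimator(subjects)."""
    n = len(subjects)
    # Refit the estimator n times, each time leaving out one subject
    leave_one_out = [estimator(subjects[:i] + subjects[i + 1:]) for i in range(n)]
    mean_loo = sum(leave_one_out) / n
    variance = (n - 1) / n * sum((theta - mean_loo) ** 2 for theta in leave_one_out)
    return math.sqrt(variance)


def sample_mean(values):
    """Toy estimator used only to illustrate the resampling scheme."""
    return sum(values) / len(values)


data = [1.0, 2.0, 3.0, 4.0, 5.0]
print(jackknife_se(data, sample_mean))
```

For the sample mean, this reproduces the usual standard error s/√n, which gives a quick check of the implementation. With n subjects, the cost is roughly n times that of one point estimate, and the refits are independent of one another, so they parallelize naturally.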

Another advantage of our proposed models is that the marginal rates and the rate ratio have separate models, which are not linked by shared random effects. Although the estimated marginal rates play a role in the pseudo-partial likelihood, the rate ratio model and its estimation do not affect the marginal models or their estimates. From this perspective, the proposed model framework is robust: misspecification of the rate ratio model does not bias inference on the marginal rates.

One limitation of the proposed inferential procedure is the assumption of non-informative censoring. In many applications, this non-informative censoring assumption may not hold. For regression analysis with single/multiple recurrent events, various approaches have been suggested to accommodate the informative censoring (Wang et al., 2001; Zhao et al., 2012). The inferential procedure for the dependency measure needs to be further generalized to relax the non-informative censoring assumption.

Another challenge of our method is the model specification for the rate ratio. In applications, as with a parametric baseline hazard function, a piecewise constant function is generally preferred for the baseline rate ratio function because it is easy to interpret. However, standard diagnostic tools, such as residual plots, cannot be used directly to assess the model’s adequacy. When all covariates are categorical, we can graphically evaluate the goodness-of-fit of the fitted model by the definition of the rate ratio. For each subgroup determined by the categorical variables under the model, we estimate the bivariate rate function in two ways: through the fitted rate ratio model and directly from its definition. Specifically, the two estimators are

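$$
\hat{\lambda}_{12}^{M}(s,t) = \hat{\rho}(s,t)\,
\sum_{i=1}^{n}\int_{0}^{\tau}\frac{K_1\!\left(\frac{s-u_1}{h}\right) dN_{i1}(u_1)}{\sum_{j=1}^{n} I(C_{j1}\ge u_1)}\;
\sum_{i=1}^{n}\int_{0}^{\tau}\frac{K_2\!\left(\frac{t-u_2}{h}\right) dN_{i2}(u_2)}{\sum_{j=1}^{n} I(C_{j2}\ge u_2)}
$$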

and

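$$
\hat{\lambda}_{12}^{J}(s,t) = \sum_{i=1}^{n}\int_{0}^{\tau}\!\!\int_{0}^{\tau}
\frac{K_{12}\!\left(\frac{s-u_1}{h},\frac{t-u_2}{h}\right) dN_{i1}(u_1)\, dN_{i2}(u_2)}{\sum_{j=1}^{n} I(C_{j1}\ge u_1,\; C_{j2}\ge u_2)},
$$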

where Kj, j = 1, 2, is a symmetric kernel function and K12 is a bivariate kernel function with bandwidth h. Here, the two marginal rate functions and λ^12J are computed by smoothing the empirical subject-specific rate estimators (Chiang et al., 2005), while λ^12M(s,t) uses the rate ratio fitted by our model. If the two estimated bivariate rate functions do not show a clear discrepancy, the assumed model for the rate ratio is reasonable. An alternative way to evaluate model adequacy is to use likelihood-ratio-based inference tools. However, because estimation here is based on the pseudo-partial likelihood rather than the full likelihood, the asymptotic behavior of the pseudo-partial likelihood ratio test for two nested models needs to be thoroughly studied first. Developing rigorous statistical tools for model checking is beyond the scope of this paper and is a worthy objective for future research.
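The model-based estimator in this comparison can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the Epanechnikov kernel is an arbitrary choice, the 1/h normalization is added here so that the smoothed quantity is on the rate scale (the displayed formulas may absorb it into the kernel), and the function names and the list-of-lists data layout are hypothetical.

```python
def epanechnikov(x):
    """Symmetric kernel supported on [-1, 1] (an assumed choice of K)."""
    return 0.75 * (1.0 - x * x) if abs(x) <= 1.0 else 0.0


def kernel_rate(s, event_times, censor_times, h, kernel=epanechnikov):
    """Kernel-smoothed marginal rate at time s.

    event_times: one list of event times per subject
    censor_times: one censoring time per subject
    """
    total = 0.0
    for subject_events in event_times:
        for u in subject_events:
            # Number of subjects still under observation at time u
            at_risk = sum(1 for c in censor_times if c >= u)
            if at_risk > 0:
                total += kernel((s - u) / h) / at_risk
    return total / h  # 1/h normalization is an assumption of this sketch


def model_based_bivariate_rate(s, t, rho_hat, events1, censor1,
                               events2, censor2, h):
    """Fitted rate ratio times the two smoothed marginal rates."""
    return (rho_hat
            * kernel_rate(s, events1, censor1, h)
            * kernel_rate(t, events2, censor2, h))


# Toy subgroup of two subjects, each with one event of each type
events_local = [[1.0], [1.0]]
events_distant = [[1.0], [1.0]]
censoring = [2.0, 2.0]
print(model_based_bivariate_rate(1.0, 1.0, 2.0, events_local, censoring,
                                 events_distant, censoring, h=0.5))
```

In a graphical check, values of this function over a grid of (s, t) would be plotted against the directly smoothed bivariate rate for the same subgroup.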

Acknowledgements

The authors thank the editor, associate editor, and reviewers for helpful comments and suggestions, which have led to improvements of this article. This work was partially supported by grants from the National Institutes of Health (R01CA193878, P30CA016672, and UL1TR003167) and the Andrew Sabin Family Fellowship. The authors acknowledge the Texas Advanced Computing Center at the University of Texas at Austin for providing high-performance computing resources that have contributed to the research results reported within this paper.

Footnotes

Supporting Information

Regularity conditions and proofs of Theorem 1 in Section 1, additional simulation details referenced in Section 2, and computational codes with example data sets are available with this paper at the Biometrics website on Wiley Online Library.

Data Availability Statement

We are not able to share the soft tissue sarcoma data per the data sharing policy of MD Anderson Cancer Center. In the Supporting Information, we have included simulated data sets for illustration with the computational code.

References

  1. Cai JW and Schaubel DE (2004). Marginal means/rates models for multiple type recurrent event data. Lifetime Data Analysis 10, 121–138.
  2. Chiang C-T, Wang M-C, and Huang C-Y (2005). Kernel estimation of rate function for recurrent event data. Scandinavian Journal of Statistics 32, 77–91.
  3. Cook RJ and Lawless JF (2007). The Statistical Analysis of Recurrent Events. Springer, New York.
  4. Cook RJ, Lawless JF, and Lee KA (2010). A copula-based mixed Poisson model for bivariate recurrent events under event-dependent censoring. Statistics in Medicine 29, 694–707.
  5. Cormier JN, Huang X, Xing Y, Thall PF, Wang X, Benjamin RS, Pollock RE, Antonescu CR, Maki RG, Brennan MF, and Pisters PW (2004). Cohort analysis of patients with localized, high-risk, extremity soft tissue sarcoma treated at two cancer centers: chemotherapy-associated outcomes. The Journal of Clinical Oncology 22, 4567–4574.
  6. Cox DR (1975). Partial likelihood. Biometrika 62, 269–276.
  7. Dalal SR and McIntosh AM (1994). When to stop testing for large software systems with changing code. IEEE Trans. Software Eng 20, 318–323.
  8. Fan J and Prentice RL (2002). Covariate-adjusted dependence estimation on a finite bivariate failure time region. Statistica Sinica pages 689–705.
  9. Lawless JF and Nadeau C (1995). Some simple robust methods for the analysis of recurrent events. Technometrics 37, 158–168.
  10. Lin D, Wei L, Yang I, and Ying Z (2000). Semiparametric regression for the mean and rate functions of recurrent events. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 62, 711–730.
  11. Ning J, Chen Y, Cai C, Huang X, and Wang M-C (2015). On the dependence structure of bivariate recurrent event processes: inference and estimation. Biometrika pages 345–358.
  12. Ning J, Rahbar MH, Choi S, Piao J, Hong C, del Junco DJ, Rahbar E, Fox EE, Holcomb JB, and Wang M-C (2015). Estimating the ratio of multivariate recurrent event rates with application to a blood transfusion study. Statistical Methods in Medical Research page 0962280215593974.
  13. Ning J, Rahbar MH, Choi S, Piao J, Hong C, Del Junco DJ, Rahbar E, Fox EE, Holcomb JB, and Wang M-C (2017). Estimating the ratio of multivariate recurrent event rates with application to a blood transfusion study. Statistical Methods in Medical Research 26, 1969–1981.
  14. Schaubel DE and Cai JW (2005). Semiparametric methods for clustered recurrent event data. Lifetime Data Analysis 11, 405–425.
  15. Shao J and Wu CJ (1989). A general theory for jackknife variance estimation. The Annals of Statistics pages 1176–1197.
  16. Sun L, Zhu L, and Sun J (2009). Regression analysis of multivariate recurrent event data with time-varying covariate effects. Journal of Multivariate Analysis 100, 2214–2223.
  17. Ventura V, Cai C, and Kass RE (2005). Statistical assessment of time-varying dependency between two neurons. J Neurophysiol. 94, 2940–7.
  18. Wang MC, Qin J, and Chiang CT (2001). Analyzing recurrent event data with informative censoring. Journal of the American Statistical Association 96, 1057–1065.
  19. Zhao X, Liu L, and Xu W (2012). Analysis of multivariate recurrent event data with time-dependent covariates and informative censoring. Biometrical Journal 54, 585–595.
  20. Zhu L, Sun J, Tong X, and Srivastava DK (2010). Regression analysis of multivariate recurrent event data with a dependent terminal event. Lifetime Data Analysis 16, 478–490.
