Author manuscript; available in PMC: 2025 Feb 1.
Published in final edited form as: Stat Methods Med Res. 2024 Jan 23;33(2):309–320. doi: 10.1177/09622802231226330

Regression Analysis of Multivariate Recurrent Event Data Allowing Time-Varying Dependence with Application to Stroke Registry Data

Wen Li 1,2, Mohammad H Rahbar 1,2,3, Sean I Savitz 4, Jing Zhang 2,5, Sori Kim Lundin 2,6, Amirali Tahanan 2, Jing Ning 7
PMCID: PMC11080814  NIHMSID: NIHMS1962401  PMID: 38263734

Abstract

In multivariate recurrent event data, each patient may repeatedly experience more than one type of event. Analysis of such data is further complicated by the time-varying dependence structure among different types of recurrent events. The available literature on the joint modeling of multivariate recurrent events assumes a constant dependency over time, which is restrictive and often violated in practice. To close this knowledge gap, we propose a class of flexible shared random effects models for multivariate recurrent event data that allow for time-varying dependence to adequately capture complex correlation structures among different types of recurrent events. We developed an expectation-maximization (EM) algorithm for stable and efficient model fitting. Extensive simulation studies demonstrated that the estimators of the proposed approach have satisfactory finite sample performance. We applied the proposed model and the estimating method to data from a cohort of stroke patients identified in the University of Texas Houston Stroke Registry, and evaluated the effects of risk factors and the dependence structure of different types of post-stroke readmission events.

Keywords: EM algorithm, Multivariate recurrent events, Random effects, Survival analysis, Time-varying dependence, Stroke

1. Introduction

Readmission after hospital discharge among stroke patients is commonly used as an indicator to evaluate the quality and efficiency of hospital-level care for stroke.12,13 Due to the complex nature of the disease, hospital readmission data of stroke patients have unique but complicated features. First, readmission events could be a mixture of stroke reoccurrences and other events.14 Second, readmission events could occur repeatedly. These repeated and mixed readmission events may lead to prolonged hospitalization, worsened functional outcomes, and increased mortality.15,16 It has been reported that appropriate medical management could provide substantial benefits in reducing the risk of hospital readmission, and stroke severity.17,18 Therefore, it is imperative to identify patient characteristics associated with different types of readmission events and apply appropriate preventive strategies.

This research is enabled by the availability of the University of Texas Houston Stroke Registry (UTHSR), which collects data on demographics, hospitalization, medication, lab and imaging results of stroke patients from multiple campuses of the Memorial Hermann Health System including 4 comprehensive stroke centers.19 Designed for improving the quality of care for stroke patients, the UTHSR offers valuable data for identifying the risk factors for hospital readmission in stroke patients. We identified a cohort of 20,253 patients encountered between January 2010 and December 2019 from UTHSR, in which patients may have experienced multiple episodes of stroke-related readmissions (including intracerebral hemorrhage, intraventricular hemorrhage, sub-arachnoid hemorrhage, ischemic stroke, and transient ischemic attack) and non-stroke readmissions (including pneumonia, other infection, myocardial infarction, other chest pain, and so on). In existing medical literature, it has been reported that stroke subtype, age, gender, and hospital teaching status are significant predictors of 30-day or 90-day readmission.14,20 However, in these studies, the recurrent nature of readmission events was ignored and only the binary readmission status during a period of time was analyzed. In addition, only a single type of event or a composite endpoint by combining multiple types of events was studied. Although these approaches simplify the analysis, they are not sufficient to fully assess the effects of risk factors on disease progression and the dependence between stroke-related readmissions and other readmission events.

One major challenge in modeling multi-type or multivariate recurrent event data is how to address the dependence between different types of events while studying the role of covariates. Commonly used approaches for multivariate recurrent event data analysis include marginal models21–23 and joint models,24–26 which treat dependency differently. Marginal models adopt the method of estimating equations and use the robust sandwich formula to account for the impact of dependency on variance estimation. One limitation of marginal models is that the dependence between different types of recurrent events is not modeled and evaluated. Joint models, on the other hand, use shared random effects to accommodate the dependency and are more efficient when correctly specified. In existing work on joint models, the dependence between different types of recurrent events is assumed to be constant over time. However, this assumption may not hold in real-life applications, and these existing approaches are not adequate for data with time-varying dependence. Recent work has considered this temporal variability in dependence for other types of survival data. For instance, models with auto-correlated random effects for univariate, bivariate, and clustered time-to-event data have been proposed based on compound birth-death processes.1,8 The time-varying dependence of bivariate time-to-event data can be subsequently quantified based on Clayton’s cross-ratio,3,6 as shown in Putter and Van Houwelingen.8 Munda et al.5 proposed a time-varying frailty model designed to assess heterogeneity in clustered survival data. While the literature has explored the incorporation of time-varying random effects in the context of univariate recurrent event data,4,7,9–11 these flexible modeling approaches have not yet been thoroughly investigated for multivariate recurrent event data.

To tackle this challenge, we developed a joint modeling approach tailored to multivariate recurrent event data that allows the dependence between event types to vary over time. Our goal is to study both the covariate effects on readmission incidence and the time-varying dependence.

The remainder of this paper is organized as follows. In Section 2, we describe a flexible shared random effects model in which time-varying dependence is allowed. In Section 3, we adopt the maximum likelihood approach and provide an expectation-maximization (EM) algorithm for the model fitting and inference. In Section 4, we evaluate the performance of the proposed approach by conducting simulation studies. In Section 5, we apply the proposed approach to data from the UTHSR introduced above. Last, we provide a brief discussion of our findings in Section 6.

2. Model

Consider a longitudinal study that consists of $n$ independent subjects and involves $K$ types of recurrent events. Let $N_{ik}(t)$ denote the number of type $k$ events that the $i$th patient has experienced before time $t$, $t \ge 0$, $k=1,\dots,K$, $i=1,\dots,n$. Let $X_i(\cdot)$ be a vector of covariates for patient $i$, which may contain time-dependent covariates, and let $b_i=(b_{i1},\dots,b_{iK})^T$ be a vector of event-specific random effects for patient $i$. We assume $b_i$ follows a multivariate distribution $f(b;\gamma)$ indexed by a parameter vector $\gamma$. There is some flexibility in the choice of the distribution for the random effects. As highlighted in Zeng and Lin,28 distributions with tails thinner than that of an exponential distribution would meet the necessary conditions for establishing the asymptotic properties for univariate recurrent events. Similar results have been stated for multivariate recurrent events.25 In the simulation studies and the real data application, we have adopted a multivariate normal distribution for $b_i$, featuring a zero mean vector and an unstructured variance-covariance matrix. The multi-type recurrent event times are usually subject to right censoring. We denote the censoring time of the $k$th type of recurrent event for subject $i$ as $C_{ik}$ and assume the following conditions:

A1. Conditioning on $X_i(\cdot)$ and $b_i$, $N_{ik}(\cdot)$ is a nonstationary Poisson process with the intensity function $\lambda_{ik}(\cdot)$.

A2. Conditioning on $X_i(\cdot)$ and $b_i$, $N_{i1}(\cdot),\dots,N_{iK}(\cdot)$ are independent.

A3. $C_{ik}$ is independent of $N_{ik}(\cdot)$ and $b_i$ given $X_i(\cdot)$.

To evaluate the effect of the covariate $X_i$ on the frequencies of the $K$ types of recurrent events, we assume these $K$ recurrent event processes arise from a shared random effects model through a nonstationary Poisson process with the intensity function $\lambda_{ik}(t)$. Specifically, we specify the intensity function $\lambda_{ik}(\cdot)$ for $N_{ik}(\cdot)$ given $b_i$ and the time-specific covariate $X_i(t)$ as follows:

$$\lambda_{ik}(t\mid X_i(t),b_{ik})=\lambda_{0k}(t)\exp\{h(t;\beta)b_{ik}+\alpha_k^TX_i(t)\}, \tag{1}$$

where $\lambda_{0k}(\cdot)$ is the unspecified baseline intensity function of event $k$, $h(\cdot)$ is a pre-specified time-varying function with a vector of parameters $\beta$, and $\alpha_k$ is a vector of event-specific regression parameters. Note that the function $h(\cdot)$ characterizes changes in dependence over time, and these changes are usually not dramatic. Consequently, polynomials up to the quadratic term or a piecewise constant function, as demonstrated in Ning et al.,27 suffice to capture the dynamics adequately. In both our simulation studies and data analysis, we have opted for linear functions and piecewise constant functions to balance model flexibility and model parsimony. In these cases, the function $h(t;\beta)$ can be specified as $\beta^T\{h_j(t)\}_{j=1}^p$, where $\beta=(\beta_1,\dots,\beta_p)^T$, $\{h_j(t)\}_{j=1}^p=\{h_1(t),\dots,h_p(t)\}$, and $h_j(t)$ denotes a basis function for polynomials or piecewise constant functions.
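The two forms of $h(t;\beta)$ used later in the paper (linear in $t/\tau$, and a two-piece constant function) and the intensity in model (1) can be written out in a few lines. The following is a minimal sketch rather than the authors' implementation; the function names `h_linear`, `h_piecewise`, and `intensity`, and the argument `baseline`, are hypothetical and chosen only for illustration.

```python
import numpy as np

def h_linear(t, beta, tau):
    """Linear form: h(t; beta) = beta[0] + beta[1] * t / tau."""
    t = np.asarray(t, dtype=float)
    return beta[0] + beta[1] * t / tau

def h_piecewise(t, beta, cut):
    """Two-piece constant form: h(t; beta) = beta[0] + beta[1] * I(t <= cut)."""
    t = np.asarray(t, dtype=float)
    return beta[0] + beta[1] * (t <= cut)

def intensity(t, x, b_ik, alpha_k, h_val, baseline):
    """Model (1): lambda_0k(t) * exp{h(t; beta) * b_ik + alpha_k' x}.

    `baseline` is any callable returning lambda_0k(t); `h_val` is h(t; beta)
    evaluated at the same t.
    """
    return baseline(t) * np.exp(h_val * b_ik + np.dot(alpha_k, x))

# Illustrative evaluation with a linear baseline intensity (hypothetical inputs).
lam01 = lambda t: 0.01 + 0.002 * t
value = intensity(10.0, np.array([1.0, 0.2]), 0.3, np.array([0.5, 0.5]),
                  h_piecewise(10.0, (1.0, 0.3), 12.0), lam01)
```

With `beta[0]` fixed at 1, the second coefficient alone controls how strongly the random effect enters the intensity in each time window.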

Conditional on the covariate $X_i$ and the latent variable $b_i$, the intensity function is the same as the rate function due to the independent increment property of Poisson processes. For notational simplicity, the same vector of covariates is used in all intensity functions, but different recurrent events can be regressed on different sets of covariates. Note that our proposed model is flexible enough to characterize the dependence structure between different types of recurrent events. Unlike commonly used shared random effects models such as that of Cook et al.,24 it allows time-varying dependence among different event processes and can capture the complicated dependence structure illustrated in our motivating study.

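Ning et al.27 proposed a dependence measure, called the rate ratio, to measure the dependence between two recurrent events. Without loss of generality, for any pair of event times $(s_1,s_2)$, with $s_1\ge 0$ denoting the event time from the type 1 recurrent event process $N_1(\cdot)$ and $s_2\ge 0$ denoting the event time from the type 2 recurrent event process $N_2(\cdot)$, the rate ratio $\rho(s_1,s_2)$ is defined as

$$\rho(s_1,s_2)=\frac{\lambda_{1\mid 2}(s_1\mid s_2)}{\lambda_1(s_1)}=\frac{\lambda_{2\mid 1}(s_2\mid s_1)}{\lambda_2(s_2)},$$

where $\lambda_1(s_1)$ and $\lambda_2(s_2)$ are the marginal rates for $N_1$ and $N_2$, respectively, and $\lambda_{1\mid 2}(s_1\mid s_2)$ and $\lambda_{2\mid 1}(s_2\mid s_1)$ are the conditional rates, defined as

$$\lambda_{k\mid k'}(s_k\mid s_{k'})=\lim_{\Delta\to 0^+}P\{N_k(s_k+\Delta)-N_k(s_k)>0\mid N_{k'}(s_{k'}+\Delta)-N_{k'}(s_{k'})>0\}/\Delta$$

with $(k=1,k'=2)$ or $(k=2,k'=1)$.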

Conditional on the covariate $X_i$ and the latent variable $b_i$, the intensity function under model (1) is the same as the rate function due to the independent increment property of Poisson processes. Then the rate ratio between two recurrent events $(N_{i1}(s_1),N_{i2}(s_2))$ is

$$\rho(s_1,s_2)=1+\frac{\mathrm{Cov}[\exp\{h(s_1;\beta)b_{i1}\},\exp\{h(s_2;\beta)b_{i2}\}]}{E[\exp\{h(s_1;\beta)b_{i1}\}]\,E[\exp\{h(s_2;\beta)b_{i2}\}]}.$$

The magnitude of the time-varying rate ratio depends on both the correlation between the latent variables and the time-varying function $h(t;\beta)$, indicating that our proposed model can capture a non-constant dependency structure between two recurrent event processes. When $h(t;\beta)$ is constant over time, the rate ratio $\rho(s_1,s_2)$ reduces to a constant and our proposed model in (1) reduces to a commonly used joint model.25
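For the zero-mean multivariate normal random effects adopted in the simulations and the data application, the rate ratio has a simple closed form. The worked derivation below is a sketch under that normality assumption only; the symbols $\sigma_1^2$, $\sigma_2^2$, and $\sigma_{12}$ for the variances and covariance of $(b_{i1},b_{i2})$ are introduced here for illustration and are not part of the original notation.

```latex
% Assumption: (b_{i1}, b_{i2}) is bivariate normal with mean zero,
% variances \sigma_1^2, \sigma_2^2 and covariance \sigma_{12} (illustrative symbols).
\begin{align*}
E[\exp\{h(s_k;\beta)b_{ik}\}]
  &= \exp\{h(s_k;\beta)^2\sigma_k^2/2\}, \qquad k=1,2,\\
E[\exp\{h(s_1;\beta)b_{i1}+h(s_2;\beta)b_{i2}\}]
  &= \exp\{[h(s_1;\beta)^2\sigma_1^2+h(s_2;\beta)^2\sigma_2^2
      +2h(s_1;\beta)h(s_2;\beta)\sigma_{12}]/2\},\\
\rho(s_1,s_2)
  &= \frac{E[\exp\{h(s_1;\beta)b_{i1}+h(s_2;\beta)b_{i2}\}]}
          {E[\exp\{h(s_1;\beta)b_{i1}\}]\,E[\exp\{h(s_2;\beta)b_{i2}\}]}
   = \exp\{h(s_1;\beta)\,h(s_2;\beta)\,\sigma_{12}\}.
\end{align*}
```

With $h\equiv 1$ and $\sigma_{12}=0.20$, this gives $\exp(0.20)\approx 1.221$, which matches the value reported for the constant dependence scenario in Section 4.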

3. Estimation

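We denote the observed data of patient $i$ as $O_i=\{N_{ik}(t),X_i(t);t\le C_{ik},k=1,\dots,K\}$ and all the observed data as $O=\{O_i,i=1,\dots,n\}$. Under the model specifications and conditional independence assumptions, the log-likelihood of the observed data is:

$$\sum_{i=1}^n\log\int_{b_i}f(O_i\mid b_i)f(b_i;\gamma)\,db_i=\sum_{i=1}^n\log\int_{b_i}\prod_{k=1}^K\left(\prod_t\left[\lambda_{0k}(t)\exp\{h(t;\beta)b_{ik}+\alpha_k^TX_i(t)\}\right]^{\Delta N_{ik}(t)}\times\exp\left[-\int_0^{C_{ik}}\exp\{h(u;\beta)b_{ik}+\alpha_k^TX_i(u)\}\,d\Lambda_{0k}(u)\right]\right)f(b_i;\gamma)\,db_i,$$

where $\theta=(\alpha_1^T,\dots,\alpha_K^T,\beta^T,\gamma^T)$, $\Lambda_0=(\Lambda_{01},\dots,\Lambda_{0K})$, and $\Delta N_{ik}(t)$ denotes the jump size of $N_{ik}$ at $t$. Here, $\Lambda_{0k}(t)=\int_0^t\lambda_{0k}(u)\,du$ is the cumulative baseline intensity function of event type $k$. If $\Lambda_{0k}$ is restricted to be absolutely continuous, the maximum of the above function does not exist. Therefore, following Zeng and Lin,28 we treat $\Lambda_{0k}$ as a step function with jumps at the observed recurrent event times of type $k$, and replace $\lambda_{0k}(t)$ with the jump size of $\Lambda_{0k}$ at time $t$, denoted by $\Lambda_{0k}\{t\}$. Hence, we maximize the following modified log-likelihood function

$$\log L(\theta,\Lambda_0;O)=\sum_{i=1}^n\log\int_{b_i}\prod_{k=1}^K\left(\prod_t\left[\Lambda_{0k}\{t\}\exp\{h(t;\beta)b_{ik}+\alpha_k^TX_i(t)\}\right]^{\Delta N_{ik}(t)}\times\exp\left[-\int_0^{C_{ik}}\exp\{h(u;\beta)b_{ik}+\alpha_k^TX_i(u)\}\,d\Lambda_{0k}(u)\right]\right)f(b_i;\gamma)\,db_i, \tag{2}$$

with respect to $\theta$ and $\Lambda_0$. For identifiability purposes, we set $\beta_1=1$. Denote the resulting nonparametric maximum likelihood estimators as $\hat\theta$ and $\hat\Lambda_0$.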

The direct maximization of the log-likelihood function in (2) is computationally challenging because the integration over $b_i$ has no closed form. We therefore develop an EM algorithm by treating $b_i$ as missing data. After initializing the unknown parameters, we iterate the E-step and M-step described below until convergence. Within the M-step, we extend the recursive formula proposed in Zeng and Lin28 to reduce a large number of unknown parameters to only $K$ parameters in the estimation of the baseline intensity functions.

E-step:

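In the E-step, we compute the conditional expectation of the complete data log-likelihood with respect to $b_i$ given the observed data and the parameter estimates from the previous iteration. Let $m_k$ denote the total number of events of type $k$, and $\omega_{1,k}<\omega_{2,k}<\omega_{3,k}<\dots<\omega_{m_k,k}$ denote the ordered event times of type $k$. Let $m_{ik}$ denote the total number of observed events before censoring for the $k$th type of event for subject $i$, let $t_{ik1}\le\dots\le t_{ikm_{ik}}$ denote these $m_{ik}$ event times, and let $b=\{b_i,i=1,\dots,n\}$. The complete data log-likelihood given $(O,b)$ takes the form:

$$\log L(\theta,\Lambda_0;O,b)=\sum_{i=1}^n\sum_{k=1}^K\left(\sum_{l=1}^{m_{ik}}\left[\log\Lambda_{0k}\{t_{ikl}\}+h(t_{ikl};\beta)b_{ik}+\alpha_k^TX_i(t_{ikl})\right]-\sum_{j=1}^{m_k}\left[I(\omega_{j,k}\le C_{ik})\exp\{h(\omega_{j,k};\beta)b_{ik}\}\exp\{\alpha_k^TX_i(\omega_{j,k})\}\Lambda_{0k}\{\omega_{j,k}\}\right]\right)+\sum_{i=1}^n\log f(b_i;\gamma). \tag{3}$$

Let $g(b_i)$ be some function $g(\cdot)$ of $b_i$ involved in (3). To simplify notation, we denote the conditional expectation $E\{g(b_i)\mid\hat\theta^{(s)},\hat\Lambda_0^{(s)},O_i\}$ as $\hat E\{g(b_i)\}$, where $\hat\theta^{(s)}$ and $\hat\Lambda_0^{(s)}$ are the parameter estimates in the $s$th iteration. We have

$$\hat E\{g(b_i)\}=\left[\int_{b_i}g(b_i)\prod_{k=1}^K\left\{\prod_{l=1}^{m_{ik}}\left[\exp\{h(t_{ikl};\hat\beta^{(s)})b_{ik}\}\right]\times\exp\left(-\sum_{j=1}^{m_k}\left[I(\omega_{j,k}\le C_{ik})\exp\{h(\omega_{j,k};\hat\beta^{(s)})b_{ik}+\hat\alpha_k^{(s)T}X_i(\omega_{j,k})\}\hat\Lambda_{0k}^{(s)}\{\omega_{j,k}\}\right]\right)\right\}\times f(b_i;\hat\gamma^{(s)})\,db_i\right]\times\left[\int_{b_i}\prod_{k=1}^K\left\{\prod_{l=1}^{m_{ik}}\left[\exp\{h(t_{ikl};\hat\beta^{(s)})b_{ik}\}\right]\times\exp\left(-\sum_{j=1}^{m_k}\left[I(\omega_{j,k}\le C_{ik})\exp\{h(\omega_{j,k};\hat\beta^{(s)})b_{ik}+\hat\alpha_k^{(s)T}X_i(\omega_{j,k})\}\hat\Lambda_{0k}^{(s)}\{\omega_{j,k}\}\right]\right)\right\}\times f(b_i;\hat\gamma^{(s)})\,db_i\right]^{-1}. \tag{4}$$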

The integration in (4) can be carried out with numerical integration methods such as Gauss-Hermite quadrature or Monte Carlo simulation.
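To make the quadrature step concrete, the sketch below approximates a ratio of integrals of the same shape as (4) for a single scalar, normally distributed random effect. It is an illustration under simplifying assumptions (one event type, $h\equiv 1$, $b\sim N(0,\sigma^2)$), not the authors' code: the function name `posterior_mean_gh`, the argument `log_weight`, and the example inputs are hypothetical. The multivariate case in the paper would need a product or adaptive rule over $K$ dimensions.

```python
import numpy as np

def posterior_mean_gh(g, log_weight, sigma, n_nodes=30):
    """Approximate E{g(b) | data} for a scalar b ~ N(0, sigma^2).

    log_weight(b) is the log of the data-dependent factor that multiplies the
    normal density in both the numerator and the denominator of the ratio.
    """
    # Nodes and weights for integrals of the form  int exp(-x^2) f(x) dx.
    x, w = np.polynomial.hermite.hermgauss(n_nodes)
    b = np.sqrt(2.0) * sigma * x           # change of variables b = sqrt(2) * sigma * x
    lw = log_weight(b)
    lw = lw - lw.max()                     # stabilise before exponentiating
    u = w * np.exp(lw)
    return np.sum(u * g(b)) / np.sum(u)    # normalising constants cancel in the ratio

# Scalar analogue of (4) with h = 1: m observed events and cumulative exposure
# A = sum_j I(omega_j <= C) exp{alpha' X} Lambda{omega_j}  (hypothetical numbers).
m, A = 2, 1.5
log_w = lambda b: m * b - A * np.exp(b)
e_b = posterior_mean_gh(lambda b: b, log_w, sigma=0.5)
e_exp_b = posterior_mean_gh(np.exp, log_w, sigma=0.5)
```

Because the same nodes and weights appear in the numerator and denominator, constants such as $1/\sqrt{\pi}$ never need to be computed.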

M-step:

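In the M-step, we maximize the conditional expectation of the complete data log-likelihood function from the E-step to update the estimates of the parameters of interest.

$$M(\theta,\Lambda_0)=\sum_{i=1}^n\sum_{k=1}^K\left(\sum_{l=1}^{m_{ik}}\left[\log\Lambda_{0k}\{t_{ikl}\}+h(t_{ikl};\beta)\hat E\{b_{ik}\}+\alpha_k^TX_i(t_{ikl})\right]-\sum_{j=1}^{m_k}\left[I(\omega_{j,k}\le C_{ik})\hat E\left(\exp\{h(\omega_{j,k};\beta)b_{ik}\}\right)\exp\{\alpha_k^TX_i(\omega_{j,k})\}\Lambda_{0k}\{\omega_{j,k}\}\right]\right)+\sum_{i=1}^n\hat E\{\log f(b_i;\gamma)\}. \tag{5}$$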

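Note that updating $\gamma$ in the M-step relies only on $\sum_{i=1}^n\hat E\{\log f(b_i;\gamma)\}$, and thus it can be done separately and straightforwardly. Next, we update the other parameters $\Lambda_0$, $\beta$, and $\alpha$ in the M-step. Define $f_k(t)=\Lambda_{0k}\{t\}/\Lambda_{0k}(\omega_{m_k,k})$ as the scaled jump size. Because $\Lambda_{0k}(t)=\Lambda_{0k}\{\omega_{1,k}\}+\dots+\Lambda_{0k}\{\omega_{j,k}\}$ for $\omega_{j,k}\le t<\omega_{j+1,k}$, we have $\sum_{j=1}^{m_k}f_k(\omega_{j,k})=1$. Let $\tilde\alpha_k=\{\log\Lambda_{01}(\omega_{m_1,1}),\dots,\log\Lambda_{0K}(\omega_{m_K,K}),\alpha_k^T\}^T$ and $\tilde X_i=\{I(k=1),\dots,I(k=K),X_i^T\}^T$. Subsequently, the objective function turns into

$$M(f,\beta,\tilde\alpha)=\sum_{i=1}^n\sum_{k=1}^K\left(\sum_{l=1}^{m_{ik}}\left[\log f_k(t_{ikl})+\hat E\{h(t_{ikl};\beta)b_{ik}+\tilde\alpha_k^T\tilde X_i(t_{ikl})\}\right]-\sum_{j=1}^{m_k}\left[I(\omega_{j,k}\le C_{ik})\hat E\left(\exp\{h(\omega_{j,k};\beta)b_{ik}+\tilde\alpha_k^T\tilde X_i(\omega_{j,k})\}\right)f_k(\omega_{j,k})\right]\right),$$

with constraints $\sum_{j=1}^{m_k}f_k(\omega_{j,k})=1$, $k=1,\dots,K$. For notational simplicity, we let $\tilde f_{j,k}=f_k(\omega_{j,k})$.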

The objective function is then

$$M(f,\beta,\tilde\alpha)=\sum_{k=1}^K\sum_{j=1}^{m_k}\log\tilde f_{j,k}+\sum_{i=1}^n\sum_{k=1}^K\left(\sum_{l=1}^{m_{ik}}\left[\hat E\{h(t_{ikl};\beta)b_{ik}+\tilde\alpha_k^T\tilde X_i(t_{ikl})\}\right]-\sum_{j=1}^{m_k}\left[I(\omega_{j,k}\le C_{ik})\hat E\left(\exp\{h(\omega_{j,k};\beta)b_{ik}+\tilde\alpha_k^T\tilde X_i(\omega_{j,k})\}\right)\tilde f_{j,k}\right]\right).$$

In the above equation, we have used the equality between $\sum_{i=1}^n\sum_{k=1}^K\sum_{l=1}^{m_{ik}}\log f_k(t_{ikl})$ and $\sum_{k=1}^K\sum_{j=1}^{m_k}\log\tilde f_{j,k}$. The two are equal for the following reasons. First, $f_k(t_{ikl})$ is the scaled jump size at time $t_{ikl}$ for event type $k$, and $t_{ikl}$ is one of the $m_{ik}$ event times for the $i$th subject within event type $k$. Therefore, the inner summation in $\sum_{i=1}^n\sum_{l=1}^{m_{ik}}\log f_k(t_{ikl})$ is over the $m_{ik}$ event times for the $i$th subject within the $k$th type, and the outer summation is over all $n$ subjects. Second, $\tilde f_{j,k}$ is shorthand for $f_k(\omega_{j,k})$, the scaled jump size at event time $\omega_{j,k}$, and $\omega_{j,k}$ is one of the $m_k$ event times for event type $k$. So, the summation in $\sum_{j=1}^{m_k}\log\tilde f_{j,k}$ is over the $m_k$ event times for the $k$th type of event. Note that, by the notation setup, $m_k=\sum_{i=1}^nm_{ik}$. Thus $\sum_{i=1}^n\sum_{l=1}^{m_{ik}}\log f_k(t_{ikl})$ is equivalent to $\sum_{j=1}^{m_k}\log\tilde f_{j,k}$, and accordingly, $\sum_{i=1}^n\sum_{k=1}^K\sum_{l=1}^{m_{ik}}\log f_k(t_{ikl})$ is equivalent to $\sum_{k=1}^K\sum_{j=1}^{m_k}\log\tilde f_{j,k}$.

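To maximize $M(f,\beta,\tilde\alpha)$ subject to the constraints, we introduce the Lagrange multipliers $\mu_1,\dots,\mu_K$. Then we solve the following score equations:

$$\sum_{i=1}^n\sum_{k=1}^K\left(\sum_{l=1}^{m_{ik}}\left[\hat E\{\{h_\nu(t_{ikl})\}_{\nu=1}^pb_{ik}\}\right]-\sum_{j=1}^{m_k}\left[I(\omega_{j,k}\le C_{ik})\hat E\left(\exp\{h(\omega_{j,k};\beta)b_{ik}+\tilde\alpha_k^T\tilde X_i(\omega_{j,k})\}\{h_\nu(\omega_{j,k})\}_{\nu=1}^pb_{ik}\right)\tilde f_{j,k}\right]\right)=0, \tag{6}$$

$$\sum_{i=1}^n\sum_{k=1}^K\left(\sum_{l=1}^{m_{ik}}\{\tilde X_i(t_{ikl})\}-\sum_{j=1}^{m_k}\left[I(\omega_{j,k}\le C_{ik})\hat E\left(\exp\{h(\omega_{j,k};\beta)b_{ik}+\tilde\alpha_k^T\tilde X_i(\omega_{j,k})\}\tilde X_i(\omega_{j,k})\right)\tilde f_{j,k}\right]\right)=0, \tag{7}$$

$$\frac{1}{\tilde f_{j,k}}-\sum_{i=1}^n\left[I(\omega_{j,k}\le C_{ik})\hat E\left(\exp\{h(\omega_{j,k};\beta)b_{ik}+\tilde\alpha_k^T\tilde X_i(\omega_{j,k})\}\right)\right]-\mu_k=0, \tag{8}$$

$j=1,\dots,m_k$, $k=1,\dots,K$, and

$$\sum_{j=1}^{m_k}\tilde f_{j,k}-1=0,\quad k=1,\dots,K. \tag{9}$$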

From (8), we have

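$$\frac{1}{\tilde f_{j,k}}=\frac{1}{\tilde f_{j+1,k}}+\sum_{i=1}^n\left[I(\omega_{j,k}\le C_{ik})\hat E\left(\exp\{h(\omega_{j,k};\beta)b_{ik}+\tilde\alpha_k^T\tilde X_i(\omega_{j,k})\}\right)\right]-\sum_{i=1}^n\left[I(\omega_{j+1,k}\le C_{ik})\hat E\left(\exp\{h(\omega_{j+1,k};\beta)b_{ik}+\tilde\alpha_k^T\tilde X_i(\omega_{j+1,k})\}\right)\right]. \tag{10}$$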

Notice that equation (10) is a recursive formula for calculating $\tilde f_{m_k-1,k},\dots,\tilde f_{1,k}$ from $\tilde f_{m_k,k}$. If $\tilde f_{m_k-1,k},\dots,\tilde f_{1,k}$ are treated as functions of $\tilde f_{m_k,k}$, $\beta$, and $\tilde\alpha_k$, then solving the above system of score equations is equivalent to solving (6), (7), and (9). We then apply the Newton-Raphson method to solve the equations. In essence, the recursive formula reduces a large number of equations to a few, which allows us to work on only a few parameters in the M-step. The maximizer from the M-step is then denoted as $\hat\theta^{(s+1)}$, $\hat\Lambda_0^{(s+1)}$ in the $(s+1)$th iteration.
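The role of the recursion can be illustrated for a single event type with $\beta$ and $\tilde\alpha_k$ held fixed. In the sketch below, `S[j]` stands for the risk-set sum $\sum_i I(\omega_{j,k}\le C_{ik})\hat E(\exp\{\cdot\})$ appearing in (8). This is a simplified illustration, not the authors' joint Newton-Raphson update: the function name `scaled_jumps` is hypothetical, and the constraint (9) is solved by plain bisection on the multiplier $\mu_k$ as a stand-in.

```python
import numpy as np

def scaled_jumps(S, tol=1e-12, max_iter=200):
    """Solve 1/f_j = S_j + mu subject to sum_j f_j = 1 for one event type.

    S[j] is the risk-set sum at the j-th ordered event time; all entries
    must be positive. Returns the scaled jump sizes f and the multiplier mu.
    """
    S = np.asarray(S, dtype=float)
    total = lambda mu: np.sum(1.0 / (S + mu))
    lo = -S.min() + 1e-12         # total(lo) is very large, so it exceeds 1
    hi = float(len(S))            # total(hi) < 1 because every S_j > 0
    for _ in range(max_iter):     # bisection on the decreasing function total(mu)
        mid = 0.5 * (lo + hi)
        if total(mid) > 1.0:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    mu = 0.5 * (lo + hi)
    # Backward recursion (10), started from the last jump f_m.
    inv_f = np.empty_like(S)
    inv_f[-1] = S[-1] + mu
    for j in range(len(S) - 2, -1, -1):
        inv_f[j] = inv_f[j + 1] + S[j] - S[j + 1]
    return 1.0 / inv_f, mu

f, mu = scaled_jumps([5.0, 3.0, 2.0])   # hypothetical risk-set sums
```

Once the last jump is known, every earlier jump follows from differences of risk-set sums, so only one unknown per event type remains.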

The variance of the parameter estimates can be obtained by inverting the observed information matrix, following the formula given in Louis,29 which requires the first and second derivatives of the complete-data log-likelihood. Specifically, we obtain the observed information matrix as

$$-\sum_{i=1}^n\hat E\{\nabla^2\log L_i(\theta,\Lambda_0;b_i,O_i)\}-\sum_{i=1}^n\left[\hat E\{\nabla\log L_i(\theta,\Lambda_0;b_i,O_i)^{\otimes 2}\}-\hat E\{\nabla\log L_i(\theta,\Lambda_0;b_i,O_i)\}^{\otimes 2}\right].$$

Here, $\log L_i(\theta,\Lambda_0;b_i,O_i)$ denotes the complete-data log-likelihood corresponding to subject $i$, $a^{\otimes 2}=aa^T$, and $\nabla$ and $\nabla^2$ are the first and second derivatives with respect to $\theta$ and $\Lambda_0$. The asymptotic properties of the estimates of the regression coefficients and the baseline cumulative intensity functions have been rigorously studied in Zeng and Lin, and Zhu et al.25,28 Let $\tilde\theta$ and $\tilde\Lambda_0(t)$ denote the true values of $\theta$ and $\Lambda_0(t)$. Following similar arguments, we can show that $\sqrt{n}(\hat\theta-\tilde\theta,\hat\Lambda_0-\tilde\Lambda_0)$ converges weakly to a mean-zero Gaussian process.

4. Simulation studies

In the simulation studies, we examined the finite sample performance of the proposed method in different scenarios with both constant and time-varying dependence structures.

Scenario 1 (Constant dependence):

We generated bivariate recurrent event times from the Poisson process with the intensity function $\lambda_{ik}(t\mid X_{i1},X_{i2},b_{ik})=\lambda_{0k}(t)\exp(b_{ik}+\alpha_{k1}X_{i1}+\alpha_{k2}X_{i2})$, $k=1,2$, where $X_{i1}$ followed a Bernoulli(0.5) distribution, $X_{i2}$ followed a standard normal distribution, $\alpha_{11}=0.5$, $\alpha_{12}=0.5$, $\alpha_{21}=1$, and $\alpha_{22}=0$. To introduce dependence between the two types of recurrent events, we generated the random effects $b_i$ from a bivariate normal distribution with mean $(0,0)^T$ and covariance matrix $\begin{pmatrix}0.25&0.20\\0.20&0.25\end{pmatrix}$. By the rate ratio formula in Section 2, $\rho(s_1,s_2)=1.221$ under this setting, indicating a positive and stationary dependence between the two types of events.

We considered both relatively low and moderate frequencies of recurrent events. We specified the baseline intensity function $\lambda_{0k}(t)$ as a linear function of time and tuned the intercept and slope parameters to achieve different recurrence frequencies. Specifically, we chose the baseline intensity functions to be $\lambda_{01}(t)=0.005+0.001t$ and $\lambda_{02}(t)=0.01+0.001t$ for a low recurrence frequency, and $\lambda_{01}(t)=0.01+0.002t$ and $\lambda_{02}(t)=0.02+0.002t$ for a moderate frequency. Last, we generated an independent censoring time for the recurrent events from the uniform $[10,\tau]$ distribution, where $\tau=30$. For each setting, we considered two sample sizes: $n=200$ and $n=400$.
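The paper does not state which algorithm was used to draw the event times. One standard option for a nonstationary Poisson process is thinning, sketched below for the linear baseline intensities of this scenario. The function name `simulate_events`, the seed, and the single-subject demonstration are illustrative assumptions; the regression coefficients, covariance matrix, and baseline intensities are the values given in the text.

```python
import numpy as np

def simulate_events(a, c, multiplier, censor, rng):
    """Draw event times on (0, censor] from intensity multiplier * (a + c * t).

    Uses thinning; requires a > 0, c >= 0 and a time-constant multiplier,
    e.g. multiplier = exp(b_ik + alpha_k1 * x1 + alpha_k2 * x2).
    """
    lam_max = multiplier * (a + c * censor)     # upper bound: intensity increases in t
    times, t = [], 0.0
    while True:
        t += rng.exponential(1.0 / lam_max)     # next candidate at constant rate lam_max
        if t > censor:
            return np.array(times)
        if rng.uniform() * lam_max <= multiplier * (a + c * t):
            times.append(t)                     # keep with probability lambda(t) / lam_max

# One illustrative subject under the moderate-frequency setting.
rng = np.random.default_rng(2024)               # seed chosen arbitrarily
cov = np.array([[0.25, 0.20], [0.20, 0.25]])
b = rng.multivariate_normal(np.zeros(2), cov)   # subject-level random effects
x = np.array([rng.binomial(1, 0.5), rng.standard_normal()])
censor = rng.uniform(10.0, 30.0)
alpha = np.array([[0.5, 0.5], [1.0, 0.0]])      # rows: event types 1 and 2
base = [(0.01, 0.002), (0.02, 0.002)]           # (intercept, slope) of lambda_0k
events = [simulate_events(a, c, np.exp(b[k] + alpha[k] @ x), censor, rng)
          for k, (a, c) in enumerate(base)]
```

Because the two processes share the correlated pair $(b_{i1},b_{i2})$ but are otherwise generated independently, the sketch respects the conditional independence assumption A2.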

For each of the simulated datasets, we applied the proposed model in equation (1) and the estimation procedure detailed in Section 3. We specified $h(t;\beta)=\beta_1+\beta_2t/\tau$, where $\beta=(\beta_1,\beta_2)$ and $\beta_1=1$. As a result, the true value of $\beta_2$ is 0 under the scenario with constant dependency. We set the starting value of each regression coefficient to 0, that of the covariance matrix to the identity matrix $\begin{pmatrix}1&0\\0&1\end{pmatrix}$, and that of each jump size of $\Lambda_{0k}$ to $1/m_k$.

Figure 1 and Table S1 in the Supplementary Materials present the simulation results, including the empirical means, empirical biases, empirical standard errors, estimated standard errors, and coverage probabilities of the 95% confidence intervals based on 1000 simulation replicates. When the sample size changed from 200 to 400, both the bias and the standard error of $\beta_2$ decreased as expected. Although the standard error of $\beta_2$ was overestimated when the sample size was small and the event rate was low, the resulting coverage probability was still close to the nominal level. This means that the variance estimation worked well. When the event rate increased, there was a slightly decreasing trend in the bias and standard error of $\beta_2$ as more recurrent events were available for estimation.

Figure 1. Simulation results under Scenario 1 with time-invariant dependence. The upper panel plots bias ± standard error (SE) and a horizontal solid black line at 0. The lower panel plots coverage probability (CP) and a horizontal solid black line at 0.95, the nominal level. In the low frequency setting, the average numbers of recurrent events per subject are 1.4 and 2.0 for the two processes, respectively. In the moderate frequency setting, they are 2.1 and 3.0. $\beta_1$ is set to 1 for identifiability.

The remaining parameters $\alpha_{11}$, $\alpha_{12}$, $\alpha_{21}$, $\alpha_{22}$, $\Lambda_{01}(\cdot)$, and $\Lambda_{02}(\cdot)$ were also estimated well, with empirical biases below 0.02 and coverage probabilities close to 0.95 in all settings. When the event rate increased, their standard errors decreased slightly. As expected, the standard error of the baseline cumulative intensity function increased over time as less information was available.

Scenario 2 (Time-varying dependence):

We generated bivariate recurrent events such that the dependence was strong in the beginning and became weak later, to mimic the varying dependence structure we would expect in the motivating study. Specifically, the true intensity function was $\lambda_{ik}(t\mid X_{i1},X_{i2},b_{ik})=\lambda_{0k}(t)\exp\{h(t;\beta)b_{ik}+\alpha_{k1}X_{i1}+\alpha_{k2}X_{i2}\}$, $k=1,2$, where $h(t;\beta)=1+\beta_2I(t\le 12)$ and $\beta_2=0.3$. We set $\lambda_{01}(t)=0.015+0.002t$ and $\lambda_{02}(t)=0.016+0.002t$ for the low frequency setting, and $\lambda_{01}(t)=0.023+0.002t$ and $\lambda_{02}(t)=0.023+0.003t$ for the moderate frequency setting. The other aspects of the data configuration were the same as those in the constant dependence scenario. In this scenario, the rate ratio $\rho(s_1,s_2)=1.402$ when $s_1,s_2\in(0,12]$; 1.221 when $s_1,s_2\in(12,30)$; and 1.296 otherwise, i.e., when ($s_1\in(0,12]$, $s_2\in(12,30)$) or ($s_1\in(12,30)$, $s_2\in(0,12]$), indicating a positive and decreasing dependence over time.
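The rate ratios quoted for this scenario can be checked numerically from the covariance expression for $\rho(s_1,s_2)$ in Section 2. The sketch below is a Monte Carlo check rather than the authors' code; the function name `rate_ratio_mc`, the number of draws, and the seed are illustrative choices, while the covariance matrix and the values $h=1.3$ (at or before time 12) and $h=1$ (after time 12) come from the text.

```python
import numpy as np

def rate_ratio_mc(h1, h2, cov, n_draws=1_000_000, seed=0):
    """Estimate rho = E[e1 * e2] / (E[e1] * E[e2]) = 1 + Cov(e1, e2) / (E[e1] * E[e2]),

    where e1 = exp(h1 * b1), e2 = exp(h2 * b2) and (b1, b2) is zero-mean
    bivariate normal with covariance matrix `cov`.
    """
    rng = np.random.default_rng(seed)
    b = rng.multivariate_normal(np.zeros(2), cov, size=n_draws)
    e1 = np.exp(h1 * b[:, 0])
    e2 = np.exp(h2 * b[:, 1])
    return float(np.mean(e1 * e2) / (np.mean(e1) * np.mean(e2)))

cov = np.array([[0.25, 0.20], [0.20, 0.25]])
early_early = rate_ratio_mc(1.3, 1.3, cov)   # both event times in (0, 12]
late_late = rate_ratio_mc(1.0, 1.0, cov)     # both event times in (12, 30)
```

Up to simulation error, `early_early` and `late_late` should land close to the reported values of 1.402 and 1.221.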

To each simulated dataset, we fit the proposed model with $h(t;\beta)=1+\beta_2I(t\le 12)$. We set the starting values to those specified in the constant dependence setting except for $\beta_2$, whose starting value was 0.1. Figure 2 and Table S2 in the Supplementary Materials summarize the simulation results based on 1000 simulations for different combinations of sample sizes and event frequencies under the time-varying dependence scenario. The parameter $\beta_2$ was estimated well, with an empirical bias as low as 0.027 when the sample size was 400 and the event rate was moderate. Considering that the initial value (0.1) differed from the truth (0.3), the proposed method satisfactorily captured the true time-varying dependence. For the other parameters, the estimators showed patterns similar to those under the scenario with constant dependency.

Figure 2. Simulation results under Scenario 2 with time-varying dependence. The upper panel plots bias ± standard error (SE) and a horizontal solid black line at 0. The lower panel plots coverage probability (CP) and a horizontal solid black line at 0.95, the nominal level. In the low frequency setting, the average numbers of recurrent events per subject are 1.2 and 1.8 for the two processes, respectively. In the moderate frequency setting, they are 1.8 and 2.7. $\beta_1$ is set to 1 for identifiability.

Scenario 3 (Higher baseline intensity):

Different baseline intensity functions were used, resulting in an S-shaped baseline intensity with a steep increase after year 5 and a plateau at around year 15. In particular, λ01(t)=λ02(t)=(2/3)|log(1+exp(t/25))0.004. The maximum baseline intensity was 0.30, substantially higher than the 0.11 used in Scenario 2. The other aspects of the data generation scheme were the same as those used in Scenario 2. The simulation results for $n=200$ and $n=400$ are shown in Table S3 in the Supplementary Materials. The proposed estimation procedure again yielded estimates with negligible biases and coverage probabilities close to the nominal level.

To examine whether the initial values in the EM algorithm heavily impacted the optimization, we repeated the estimation process with different random starting values. Table S4 in the Supplementary Materials provides the $\beta$ estimates using different starting values on the same data set under Scenario 1 with a sample size of 400 and average event rates of 1.4 and 2.0. The starting values ranged from −1 to 1, and the resulting estimates were identical up to the third digit after the decimal point. This demonstrates that the recursive formula, together with the optimization algorithm in Section 3, is robust and insensitive to initial values. Note that a large starting value may cause the term $\exp\{h(t;\beta)b_{ik}\}$ to overflow. In practice, we recommend using zero as the starting value for the estimation of $\beta$. Table S5 in the Supplementary Materials summarizes the convergence performance in the simulation studies. The algorithm had a good overall convergence rate, particularly with the larger sample size, $n=400$.

5. Data application

Understanding the frequency of, and dependence between, stroke-related readmissions and other readmission events helps provide clinical guidance on how to effectively apply preventive strategies to stroke patients. We applied the proposed method to data from the UTHSR cohort introduced in Section 1. Time zero of the analysis was set as the time of the first admission. Age at the first admission, or equivalently age at enrollment in our cohort, was used as a baseline covariate. The average age in the cohort was 63 years. A total of 2,055 recurrences were observed during the follow-up, and the number of recurrences per patient ranged from 0 to 6 (median=0). Eight hundred and eighty-four patients experienced at least one stroke-related readmission, 900 patients experienced at least one non-stroke readmission, and 299 patients experienced multiple readmissions. Of those who had multiple hospital readmissions, 122 had both types of readmissions, 101 had at least two stroke-related readmissions, and 132 had at least two non-stroke readmissions. Forty-eight percent of readmissions occurred within half a year and 60% within one year.

To examine the effect of age on the incidence of readmission events and the possibly time-varying dependence structure between stroke-related and non-stroke readmissions, we specified the following model:

$$\lambda_{ik}(t\mid\mathrm{Age}_i,b_{ik})=\lambda_{0k}(t)\exp\{h(t;\beta)b_{ik}+\alpha_k^T\mathrm{Age}_i\}, \tag{11}$$

where age was measured in units of ten years, and $k=1,2$ indexed stroke-related readmissions and non-stroke readmissions, respectively. We dichotomized the time scale at the median of the pooled event times and assessed the dependence on the resulting two-bin time grid:

$$h(t;\beta)=1+\beta_2I(t>\tau_0), \tag{12}$$

where $\tau_0=0.628$ years. Table 1 presents the parameter estimates, standard errors, and p-values from the Wald test. Overall, the dependence between stroke-related readmissions and non-stroke readmissions was positive, but its strength decreased over time ($h(t;\beta)$ was 1 before 0.628 years and 0.771 after 0.628 years). The p-value for $\beta_2$ was 0.17, suggesting that the decreasing trend in the dependence between the two types of readmissions was not statistically significant. Interestingly, the age effects on the incidences of the two types of events were in opposite directions. A ten-year increase in age corresponded to a 6.8% increase (exp(0.066) = 1.068) in the intensity of stroke-related readmissions but an 18.2% decrease (exp(−0.201) = 0.818) in the intensity of non-stroke readmissions. To better understand the dependence structure, we calculated the rate ratio:

$$\rho(s_1,s_2)=\begin{cases}1.031 & \text{when } s_1,s_2\le 12,\\ 1.018 & \text{when } s_1,s_2>12,\\ 1.023 & \text{when } (s_1\le 12,\ s_2>12)\ \text{or}\ (s_1>12,\ s_2\le 12).\end{cases}$$

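Table 1. Analysis Results for the Stroke Registry Data

| Parameter | Estimate | Standard error | p-value |
|-----------|----------|----------------|---------|
| $\beta_2$ | −0.229 | 0.166 | 0.17 |
| $\alpha_1$ | 0.066 | 0.030 | 0.03 |
| $\alpha_2$ | −0.201 | 0.029 | < 0.001 |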
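The p-values and percentage changes quoted for Table 1 follow from simple arithmetic on the estimates and standard errors. The sketch below reproduces them, assuming a two-sided Wald test with a standard normal reference distribution; the function names `wald_p` and `percent_change` are hypothetical.

```python
import math

def wald_p(estimate, se):
    """Two-sided p-value, 2 * (1 - Phi(|z|)), with z = estimate / se."""
    z = abs(estimate / se)
    return math.erfc(z / math.sqrt(2.0))

def percent_change(coef):
    """Percentage change in the intensity per one-unit increase in the covariate."""
    return 100.0 * (math.exp(coef) - 1.0)

# Estimates and standard errors as reported in Table 1.
p_beta2 = wald_p(-0.229, 0.166)
p_alpha1 = wald_p(0.066, 0.030)
p_alpha2 = wald_p(-0.201, 0.029)
change_stroke = percent_change(0.066)       # per ten-year increase in age
change_nonstroke = percent_change(-0.201)
```

Rounded to the precision used in the table and text, these give p-values of 0.17, 0.03, and below 0.001, and intensity changes of +6.8% and −18.2%.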

Stroke and non-stroke readmissions were positively dependent, as evidenced by $\rho(s_1,s_2)>1$, but the association was weak. For example, the rate ratio was 1.031 between early stroke readmission and early non-stroke readmission. This finding implies that the risk of having at least one early stroke readmission increased by 3.1% for patients who had an early non-stroke readmission compared to those who did not. We also plotted the estimated baseline cumulative intensity functions in Figure 3.

Figure 3. Estimated baseline cumulative intensity functions of stroke-related readmission (solid) and non-stroke readmission (dashed).

In addition, we present a conventional analysis using only the information related to the time to the first readmission. We fitted two Cox proportional hazards models separately for time to first stroke-related readmission and for time to first non-stroke readmission. The results are shown in Table S6 in the Supplementary Materials. Similarly, the baseline age was found to be a risk factor for stroke-related readmission and a protective factor for non-stroke readmission. Specifically, there was a 7.4% increase (exp(0.071) = 1.074) in the hazard of the first stroke-related readmission and a 15.7% decrease (exp(−0.170) = 0.843) in the hazard of the first non-stroke readmission relative to a ten-year increase in age. However, as discussed previously, the conventional analysis cannot quantify the dependence between the two types of readmission events and may lose power by ignoring subsequent recurrent events. This limitation underscores the significance of our proposed, more sophisticated model.

6. Discussion

In this work, we propose a class of flexible shared random effects models for multivariate recurrent event data. The proposed framework relaxes the restrictive assumption in the commonly used shared random effects models and allows for time-varying dependence between different types of recurrent events. A stable and computationally efficient EM algorithm is developed for estimation, in which the functional form of the baseline cumulative intensity function is left unspecified. Even though the number of unknown parameters in the baseline intensity function can be several hundred or more (as in the simulation studies and the stroke data application), the computation is relatively fast, partly due to the use of the recursive formula in the M-step. On a desktop with a 1.80 GHz CPU, one simulation run under Scenario 1 with a sample size of 200 and a moderate event rate took 12 minutes on average. The computational time can be further reduced if parallel computing is enabled. Although the proposed model is motivated by bivariate recurrent hospital readmission data for stroke patients, it can be readily applied to other disease settings with a similar data structure. Our proposed method can be extended to the setting with dependent terminal events by addressing the additional challenge posed by the dependence between recurrent events and terminal events.25,30

The UTHSR cohort data used in our analysis come from a local stroke registry database within one hospital system and may not capture all readmissions for stroke patients. The intensity functions of the two recurrent events are therefore likely to be underestimated. The findings should consequently be validated using regional or national databases that can accurately capture all readmissions for stroke patients.

Acknowledgements

We acknowledge funding from the National Institute of Neurological Disorders and Stroke (NINDS) through grant 1R03NS111178-01A1. We also acknowledge the support provided by the Biostatistics/Epidemiology/Research Design (BERD) component of the Center for Clinical and Translational Sciences (CCTS) for this project. CCTS is mainly funded by the NIH Centers for Translational Science Award (NIH CTSA) grant (UL1 RR024148), awarded to the University of Texas Health Science Center at Houston in 2006 by the National Center for Research Resources (NCRR), and by its 2012 renewal (UL1 TR000371) and a further 2019 grant (UL1TR003167) from the National Center for Advancing Translational Sciences (NCATS), also awarded to the University of Texas Health Science Center at Houston. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NINDS, NCRR, or NCATS. The authors acknowledge the Texas Advanced Computing Center (TACC) at The University of Texas at Austin for providing HPC resources that have contributed to the results reported in this paper.

Footnotes

Supplementary material

Supplementary material is available online at https://journals.sagepub.com/home/SMM.

Software

R code implementing the proposed method is available online at https://github.com/liwenmoi/Regression-Analysis-of-Multivariate-Recurrent-Events.

References

  • 1. Balan TA. dynfrail: Fitting Dynamic Frailty Models with the EM Algorithm. R package version 0.5.2, 2017.
  • 2. Bandeen-Roche K and Ning J. Nonparametric estimation of bivariate failure time associations in the presence of a competing risk. Biometrika 2008; 95(1): 221–232.
  • 3. Clayton DG. A model for association in bivariate life tables and its application in epidemiological studies of familial tendency in chronic disease incidence. Biometrika 1978; 65(1): 141–151.
  • 4. Fong DY, Lam K, Lawless J and Lee Y. Dynamic random effects models for times between repeated events. Lifetime Data Analysis 2001; 7: 345–362.
  • 5. Munda M, Legrand C, Duchateau L and Janssen P. Testing for decreasing heterogeneity in a new time-varying frailty model. Test 2016; 25: 591–606.
  • 6. Oakes D. Bivariate survival models induced by frailties. Journal of the American Statistical Association 1989; 84(406): 487–493.
  • 7. Pennell ML and Dunson DB. Bayesian semiparametric dynamic frailty models for multiple event time data. Biometrics 2006; 62(4): 1044–1052.
  • 8. Putter H and Van Houwelingen HC. Dynamic frailty models based on compound birth–death processes. Biostatistics 2015; 16(3): 550–564.
  • 9. Tallarita M, De Iorio M, Guglielmi A and Malone-Lee J. Bayesian autoregressive frailty models for inference in recurrent events. The International Journal of Biostatistics 2019; 16(1): 20180088.
  • 10. Yau K and McGilchrist C. ML and REML estimation in survival analysis with time dependent correlated frailty. Statistics in Medicine 1998; 17(11): 1201–1213.
  • 11. Yue H and Chan K. A dynamic frailty model for multivariate survival data. Biometrics 1997: 785–793.
  • 12. Cox ZL, Lai P, Lewis CM et al. Centers for Medicare and Medicaid Services’ readmission reports inaccurately describe an institution’s decompensated heart failure admissions. Clinical Cardiology 2017; 40(9): 620–625.
  • 13. McIlvennan CK, Eapen ZJ and Allen LA. Hospital readmissions reduction program. Circulation 2015; 131(20): 1796–1803.
  • 14. Bambhroliya AB, Donnelly JP, Thomas EJ et al. Estimates and temporal trend for US nationwide 30-day hospital readmission among patients with ischemic and hemorrhagic stroke. JAMA Network Open 2018; 1(4): e181190.
  • 15. Jørgensen H, Nakayama H, Reith J et al. Stroke recurrence: predictors, severity, and prognosis. The Copenhagen Stroke Study. Neurology 1997; 48(4): 891–895.
  • 16. Arsava EM, Kim GM, Oliveira-Filho J et al. Prediction of early recurrence after acute ischemic stroke. JAMA Neurology 2016; 73(4): 396–401.
  • 17. Rothwell PM, Coull A, Giles M et al. Change in stroke incidence, mortality, case-fatality, severity, and risk factors in Oxfordshire, UK from 1981 to 2004 (Oxford Vascular Study). The Lancet 2004; 363(9425): 1925–1933.
  • 18. Boan AD, Lackland DT and Ovbiagele B. Lowering of blood pressure for recurrent stroke prevention. Stroke 2014; 45(8): 2506–2513.
  • 19. Rahbar MH, Gonzales NR, Ardjomand-Hessabi M et al. The University of Texas Houston Stroke Registry (UTHSR): implementation of enhanced data quality assurance procedures improves data quality. BMC Neurology 2013; 13(1): 1–17.
  • 20. Kilkenny MF, Dalli LL, Kim J et al. Factors associated with 90-day readmission after stroke or transient ischemic attack: linked data from the Australian Stroke Clinical Registry. Stroke 2020; 51(2): 571–578.
  • 21. Cai J and Schaubel DE. Marginal means/rates models for multiple type recurrent event data. Lifetime Data Analysis 2004; 10(2): 121–138.
  • 22. Schaubel DE and Cai J. Semiparametric methods for clustered recurrent event data. Lifetime Data Analysis 2005; 11(3): 405–425.
  • 23. Chen X, Wang Q, Cai J et al. Semiparametric additive marginal regression models for multiple type recurrent events. Lifetime Data Analysis 2012; 18(4): 504–527.
  • 24. Cook RJ, Lawless JF and Lee KA. A copula-based mixed Poisson model for bivariate recurrent events under event-dependent censoring. Statistics in Medicine 2010; 29(6): 694–707.
  • 25. Zhu L, Sun J, Tong X et al. Regression analysis of multivariate recurrent event data with a dependent terminal event. Lifetime Data Analysis 2010; 16(4): 478–490.
  • 26. Abu-Libdeh H, Turnbull BW and Clark LC. Analysis of multi-type recurrent events in longitudinal studies; application to a skin cancer prevention trial. Biometrics 1990: 1017–1034.
  • 27. Ning J, Chen Y, Cai C et al. On the dependence structure of bivariate recurrent event processes: inference and estimation. Biometrika 2015; 102(2): 345–358.
  • 28. Zeng D and Lin D. Semiparametric transformation models with random effects for recurrent events. Journal of the American Statistical Association 2007; 102(477): 167–180.
  • 29. Louis TA. Finding the observed information matrix when using the EM algorithm. Journal of the Royal Statistical Society: Series B (Methodological) 1982; 44(2): 226–233.
  • 30. Huang CY and Wang MC. Joint modeling and estimation for recurrent event processes and failure time data. Journal of the American Statistical Association 2004; 99(468): 1153–1165.
