Abstract
Joint models can describe the relationship between recurrent and terminal events. Typically, recurrent events are modeled on the total time scale, assuming constant covariate effects across recurrent events. Modeling the gap times between recurrent events instead could allow covariate effects to vary from one event to the next and offer greater flexibility and accuracy. For instance, in HIV-infected patients, the time to the first opportunistic infection (OI) may follow a different distribution from the gap times between later OIs. However, limited research has focused on mediation analysis using joint modeling of gap times and survival time. In this work, we propose a novel joint modeling approach that studies the mediation effect of recurrent events on survival outcomes by modeling the recurrent events on the gap time scale. This allows us to handle cases where the first occurrence of a recurrent event behaves differently from subsequent events. Additionally, we use a relaxed “sequential ignorability” assumption to address unmeasured confounding. Simulation studies demonstrate that our model performs well in estimating both model parameters and mediation effects. We apply our method to an AIDS study to evaluate the comparative effectiveness of two treatments and the effect of baseline CD4 counts on overall survival, mediated by recurrent opportunistic infections modeled through gap times.
Keywords: Causal inference, Frailty model, Mediation analysis, Survival data
1. Introduction
Recurrent events, which occur repeatedly over time, are prevalent in many biomedical and clinical studies. Examples include recurrent hospitalizations in heart failure patients, multiple tumor recurrences in cancer patients, recurrent opportunistic infections (OIs) in HIV-infected individuals, and recurrent joint infections in patients who received joint replacement surgeries (Lahoz et al., 2022; Avanzini and Antal, 2019; Lu et al., 2017; Justiz Vaillant and Naik, 2024). Understanding these events can provide valuable insights into the progression of diseases and the effectiveness of treatments. In particular, when studying survival outcomes such as death, analyzing the association between recurrent events and survival outcomes becomes critical for understanding long-term patient outcomes and developing effective interventions.
For instance, HIV-infected patients have a higher chance of developing various OIs throughout their lifetime (Justiz Vaillant and Naik, 2024). These infections can recur either as the same type or as different types, appearing at various stages of HIV progression. It has been demonstrated that susceptibility to OIs is closely linked to CD4 T cell counts. The frequency and severity of OIs rise as CD4 T cell levels drop toward 200 or lower, and the number of OIs has also been associated with patient survival (French et al., 2007; Jayani et al., 2020). Typically, HIV-associated OIs manifest 7–10 years after the initial HIV infection. However, once the first OI is diagnosed, the patient’s immune system is significantly damaged, increasing the risk of subsequent OIs (Bielick et al., 2024). This highlights that the occurrence of initial and subsequent OIs may follow different distributions. Understanding the relationship between CD4 T cell counts, OIs, and survival outcomes is of significant interest.
In the Community Programs for Clinical Research on AIDS (CPCRA) study, individuals with or without previous AIDS-defining conditions (PrevOI) were randomized to receive one of two reverse transcriptase inhibitors (RTIs), didanosine (ddI) or zalcitabine (ddC). During the follow-up, patients developed multiple (up to 5) OIs, and around 40.3% of patients died before the end of the study (Abrams et al., 1994; Neaton et al., 1994). Previous research by Niu et al. (2023) utilized a joint frailty model of recurrent OIs and survival outcomes for causal mediation analysis. They showed that the number of OIs significantly mediated the effect of baseline CD4 counts on patient survival. However, their approach used the total time model for the recurrent events, which overlooked the sequential order of OIs and did not consider how the risk of a specific recurrent event may depend on the history of recurrent events. Such model misspecification might introduce bias in the estimation of model parameters as well as mediation effects. A more robust model is needed to account for the differences between the initial and subsequent OIs.
Joint modeling is frequently employed to examine the relationship between recurrent events and survival outcomes, given its capacity to account for the dependency between different end-points. Aalen (1978) established the theoretical foundation for modeling recurrent events through the development of counting processes and martingales, offering a robust mathematical framework. Later, the Cox proportional hazards model was extended using a counting process approach to handle recurrent events, introducing the concept of the intensity function (Andersen and Gill, 1982).
When modeling the intensity function, two primary time scales are often considered: the total-time scale, measured from the study’s origin, and the gap-time scale, which focuses on the time between consecutive events. Models based on the total-time scale typically assume that covariate effects remain constant across all recurrent events (Rondeau, 2010; Huang and Wang, 2004). However, in many applications, such as HIV research, the gap time between events can be more informative, offering greater flexibility and model accuracy. For example, in HIV-infected patients, after the first OI, subsequent OIs may exhibit different distributions due to the patient’s declining health. Evaluating recurrent events with a constant effect assumption may introduce bias, whereas gap-time models allow for dynamic changes in covariate effects, better reflecting the progression of events (Soh and Huang, 2021). Huang and Liu (2007) proposed a joint model for gap times and survival time that allows different sets of parameters for the effects of the covariates on gap times and on survival, making it possible to estimate the distributions of both survival and gap times. Despite these advantages, limited research has focused on joint modeling using gap-time scales, especially in mediation analysis.
In causal inference, the natural direct effect (NDE) and natural indirect effect (NIE) are key estimands that help to understand the pathways through which an exposure influences an outcome. When both recurrent events and terminal events are considered, the estimation of NDE and NIE captures the direct effects of exposure on survival time and the mediation effects through recurrent events. Several studies have explored the causal mechanism involving recurrent and terminal events. For example, Liu et al. (2018) proposed a joint model for both events to investigate the causality mechanism but primarily focused on comparing different models rather than quantifying the NDE and NIE. Zheng and Zhou (2017) defined the random interventions and proposed a model to estimate the NDE and NIE in longitudinal mediation analysis with time-varying mediators and exposures. Subsequently, Niu et al. (2023) applied mediation analysis to estimate NDE and NIE using joint models for recurrent and terminal events under relaxed sequential ignorability assumptions. However, none of these studies considered gap time to model recurrent events.
Here we summarize the main contributions of our paper. First, we propose a novel joint frailty model based on gap times that separates the causal effect of the recurrent event on the survival outcome from the baseline unmeasured confounding between the recurrent event and the survival outcome. Unlike previous joint models based on gap times, this is the first that allows both random effects and a function of gap times in the survival model, which lets researchers separate the causal effect from the baseline unmeasured confounding effect. Second, we allow flexible modeling of covariate effects in the gap time model. Unlike the Andersen-Gill type total time model used in previous work with a similar causal framework (Niu et al., 2023), the new model allows the first gap time, the second gap time, and subsequent ones to have different covariate effects. It is therefore more flexible and makes it easy to model how the non-constant risk of a given recurrence depends on the history of recurrent event times and potential time-varying covariates. Third, with a modified sequential ignorability assumption, the new model can handle time-varying confounders that are not exposure induced, whereas the previous method (Niu et al., 2023) can only handle baseline covariates. In the new model, the gap times are assumed to be independent of each other and of survival time, conditional on frailty and covariates, which can be either time-independent or time-dependent. The new model provides an alternative tool for performing mediation analysis with recurrent event mediators and survival outcomes. Fourth, our reanalysis of the CPCRA study using the new flexible model confirms the robustness of our previous conclusions.
The paper is organized as follows: Section 2 introduces the notations and provides the formula for calculating the NDE and NIE, along with a discussion of the numerical computation approach. Section 3 presents simulation studies to explore the finite sample performance of the proposed method, and evaluates its robustness against model misspecification when compared with the method in Niu et al. (2023). In Section 4, we apply the method to reanalyze the CPCRA study, evaluating the causal effect of recurrent OIs on the terminal event (death). We investigate how the comparative effectiveness of the two treatments (ddI/ddC) is mediated by the occurrence of opportunistic infections, and assess how the effect of baseline CD4 counts on the overall survival is mediated by changes in OIs recurrence patterns. In Section 5, we conclude with a summary of key findings and discuss potential avenues for future research.
2. Method
We denote the “treatment” (or primary exposure of interest) of individual as . In mediation analysis, the extended version of the Stable Unit Treatment Value Assumption (SUTVA) is a critical assumption. Specifically, SUTVA implies that the potential mediator for each individual under a given treatment level is uniquely determined, independent of the treatment assignments received by others. In other words, the treatment assigned to one individual should not influence the potential outcomes of other individuals. Furthermore, SUTVA requires that the potential outcome for each individual can only take a single value under a given combination of treatment and mediator levels, regardless of how the treatment and mediator are assigned to other individuals. This assumption is essential for ensuring the validity of causal inference methods and for correctly defining direct and indirect effects.
Within the potential outcome framework and under the SUTVA assumption, we can define as the potential times for the th recurrent event and as the th gap time between two consecutive recurrent events for individual if the individual was given treatment . The relationship of recurrent event time and gap time can be represented as . We denote as the total number of potential recurrent events that occur between where is the largest follow-up time. Thus, the recurrent event times for individual under treatment can be represented as . We denote as the potential counting process of recurrent events for individual at time if the individual were given treatment , where . Then we have . Let denote an arbitrary fixed counting process of recurrent events, where represents the number of observed events up to time . The potential time to the terminal event with treatment and the given recurrent events can be written as . Thus, we denote as the corresponding potential censoring time.
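The relationship between gap times, recurrent event times, and the counting process can be illustrated with a minimal sketch. The function names `event_times_from_gaps` and `counting_process`, the example gap times, and the follow-up limit `tau` are hypothetical and chosen only for illustration; they are not part of the proposed method.

```python
from bisect import bisect_right
from itertools import accumulate


def event_times_from_gaps(gaps, tau):
    """Recurrent event times are cumulative sums of gap times.

    Events falling after the largest follow-up time tau are discarded.
    """
    times = []
    for t in accumulate(gaps):
        if t > tau:
            break
        times.append(t)
    return times


def counting_process(event_times):
    """Return N(t), the number of recurrent events at or before time t."""
    def N(t):
        return bisect_right(event_times, t)
    return N


# Hypothetical gap times (illustration only) with follow-up limit tau = 2.6.
gaps = [0.8, 0.5, 1.2, 0.4]
times = event_times_from_gaps(gaps, tau=2.6)  # fourth event falls after tau
N = counting_process(times)
```

Here `N(1.0)` counts the events up to time 1.0, and the fourth gap is dropped because its event time would exceed `tau`.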
In our framework, the survival status inherently influences the gap times between recurrent events, as no additional events can occur after a terminal event for the subject. Same as in Niu et al. (2023), we address this issue with counterfactual for by considering two distinct and independent treatment effects: one targeting terminal events and the other influencing recurrent events. These treatments operate independently to control the respective processes. Thus, for , the first treatment is set at , impacting only the terminal events, while the second treatment remains at for the recurrent event process (Martinussen and Stensrud, 2023; Niu et al., 2023).
To connect the potential outcomes to observed outcomes, we apply the consistency assumption, which requires that the exposure and mediators are clearly defined and measured without error. The potentially observable recurrent mediator counting process , time to the terminal event , and censoring time where , can be defined. For censoring, we observe the follow-up time and define the terminal event indicator . While these definitions apply to both continuous and discrete survival times , in the joint model proposed in our study, we make the assumption that is a continuous variable.
The observed baseline covariates for individual can be defined as . Under all the assumptions described previously, the NDE and NIE on survival probability can be defined as:
| (1) |
| (2) |
For NDE and NIE among subgroups with covariates , they can be defined as
2.1. General Models and Determination of Causal Mechanisms
For each subject , we assume that all the potential gap times , , are correlated through a common random effect . Therefore, for each subject, conditional on the random effect and covariates (both baseline and number of events happened previously), gap times are independent of each other and also independent of the potential survival time with a prespecified mediator level. The hazard functions for the th gap time (for all ) and the survival time in the joint frailty model (Model I) can be defined as:
where and are the baseline hazard function for -th recurrent event process and the terminal event model respectively, represents the effect of unmeasured confounding on the terminal event hazard, and is a prespecified function form of the history of mediator up to time .
We can write where and is a pre-specified deterministic function. For example, we can use to represent time since last recurrent event or use to represent the rate of recurrent event happening. We can also make as a vector-valued function if we believe there is more than one transformation that will affect the hazard of the terminal event. To make our results and interpretation comparable to Niu et al. (2023), we use in the remaining part of the paper although the whole framework extends directly to other prefixed transformations. and are parameters for the th gap time. In real data analysis, we could assume covariate and treatment effects are homogeneous among different gaps and use common and instead of and . Here, are shared random effects (frailty) that are independent of is the hazard function for the potential time to terminal event , while is the hazard function for the th gap time of recurrent event process . Unlike traditional joint models using gap time, this novel causal model allows the separation of baseline confounding effect and recurrent event causal effect on the terminal event by including both the term and together in the survival model.
In scenarios where the random effect interacts with the mediator, an interaction term can be incorporated into the survival model (2.1) (Model II):
Our proposed framework allows us to explore different causal pathways. First, it can identify situations where the observed effect is completely driven by confounding from the shared random effect (i.e., when ). Second, it distinguishes cases where the relationship is purely causal, free from any confounding influences (i.e., when ). Finally, it captures scenarios where both causality and confounding are at play, as indicated by and .
To examine the causal pathways, we decompose the effect of on into a direct effect and an indirect effect mediated by . For a subgroup characterized by specific covariates , the estimators for the NDE and NIE are obtained by the survival probabilities differences as follows:
where is the estimated conditional survival function for the potential survival time given , whose specific form will be defined in the next section. By averaging these effects over the distribution of , we obtain the overall population-level estimates for the NDE and NIE:
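As a rough illustration of how such estimates could be assembled, the sketch below takes any estimated survival function and forms the two differences, then averages over covariate values. It assumes the common decomposition in which the NDE contrasts exposure levels for the terminal event while holding the recurrent event process at its reference-level distribution, and the NIE contrasts the recurrent event process while holding the terminal-event exposure fixed. The names `surv`, `nde_nie`, `population_effects`, and the toy survival function are hypothetical.

```python
import math


def nde_nie(surv, t, x, a=1, a_star=0):
    """Subgroup NDE and NIE at time t as differences of survival probabilities.

    surv(t, a_T, a_N, x) is assumed to return the survival probability at t
    when the terminal event is exposed to a_T, the recurrent event process is
    generated under a_N, and the covariates equal x.
    """
    s_a_a = surv(t, a, a, x)
    s_a_astar = surv(t, a, a_star, x)
    s_astar_astar = surv(t, a_star, a_star, x)
    nde = s_a_astar - s_astar_astar
    nie = s_a_a - s_a_astar
    return nde, nie


def population_effects(surv, t, covariates, a=1, a_star=0):
    """Average subgroup effects over the empirical covariate distribution."""
    effects = [nde_nie(surv, t, x, a, a_star) for x in covariates]
    n = len(effects)
    return sum(e[0] for e in effects) / n, sum(e[1] for e in effects) / n


def toy_surv(t, a_T, a_N, x):
    """Toy exponential survival function, for illustration only."""
    rate = 0.2 * math.exp(-0.4 * a_T - 0.3 * a_N + 0.5 * x)
    return math.exp(-rate * t)


nde, nie = population_effects(toy_surv, t=2.0, covariates=[-1.0, 0.0, 1.0])
```

By construction, the NDE and NIE sum to the total effect, which gives a quick sanity check on any implementation.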
2.2. Relaxation of Sequential Ignorability
In mediation analysis, addressing unmeasured confounding is a critical topic of study. Previous research has demonstrated that the relaxed Sequential Ignorability (SI) assumption is effective in modeling unmeasured confounders using latent factors within linear models with multiple exposures (Wang and Blei, 2019; McKennan and Nicolae, 2022). Niu et al. (2023) employed shared random effects in the joint modeling of survival outcomes and longitudinal measurements to manage unmeasured confounding. The key idea behind this relaxation is to utilize the shared random effect to represent the unmeasured time-independent confounders. Compared to the traditional SI assumption, the relaxed SI assumption not only requires no confounding between the exposure and the mediator process when conditioning on observed covariates, but also requires no confounding between the mediator process and the outcome of interest after conditioning on all shared random effects, observed covariates, and the exposure. In this study, we assume this weaker version of the SI assumption holds. The assumptions are outlined below:
| (3) |
The first part of the assumption requires that is ignorable given the measured covariates , while the second part requires the counting process is ignorable given the exposure , measured covariates and shared random effects . Under the relaxed assumption, we can calculate the survival function as:
where is the cumulative distribution function of the shared random effect distribution; is the conditional survival function given exposure, recurrent mediator, covariates, and shared random effect; with is the conditional measure for the infinite-dimensional recurrent mediator counting process given exposure, covariates and shared random effect. A modified version of SI that can incorporate certain time-varying covariates is discussed in Section 5.
Let be the function that maps an arbitrary mediator process to the -th jump time point, i.e., and be the function that maps an arbitrary mediator process to the -th gap time, i.e., , then we have
where is the number of jumping points within .
The estimates , and can be obtained under the independent censoring assumption. The piecewise constant approximation of baseline hazards of gap time and survival time is detailed in the Supplementary Material Section I. can be estimated by plugging in all the estimators above,
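A piecewise constant baseline hazard has a cumulative hazard that is linear within each interval, which makes both evaluation and inversion straightforward. The sketch below is one possible implementation; the function names, cut points, and hazard levels are hypothetical and not taken from the Supplementary Material.

```python
import math


def cumulative_hazard(t, cuts, rates):
    """Cumulative baseline hazard at t for a piecewise constant hazard.

    cuts  : increasing interior cut points, e.g. [1.0, 2.0]
    rates : hazard level on each interval, len(rates) == len(cuts) + 1
    """
    edges = [0.0] + list(cuts) + [float("inf")]
    total = 0.0
    for lo, hi, rate in zip(edges[:-1], edges[1:], rates):
        if t <= lo:
            break
        total += rate * (min(t, hi) - lo)
    return total


def inverse_cumulative_hazard(h, cuts, rates):
    """Time at which the cumulative hazard reaches h (all rates must be > 0)."""
    edges = [0.0] + list(cuts) + [float("inf")]
    remaining = h
    for lo, hi, rate in zip(edges[:-1], edges[1:], rates):
        mass = rate * (hi - lo)  # hazard mass available in this interval
        if remaining <= mass:
            return lo + remaining / rate
        remaining -= mass


def survival(t, cuts, rates, linear_predictor=0.0):
    """Survival probability under a proportional hazards linear predictor."""
    return math.exp(-cumulative_hazard(t, cuts, rates) * math.exp(linear_predictor))


H = cumulative_hazard(2.5, [1.0, 2.0], [0.1, 0.2, 0.4])
```

The inverse function is what an inverse transform sampler needs: drawing `h` as a unit exponential variate divided by the subject-specific hazard multiplier and passing it to `inverse_cumulative_hazard` yields an event time from the fitted model.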
2.3. Numerical Estimation of NDE and NIE
To estimate the NDE and NIE, we applied maximum likelihood estimation to obtain the parameters and . Detailed information on the joint likelihood, underlying assumptions, and the independent censoring assumption is provided in the Supplementary Material Section I. The joint likelihood was maximized using Gaussian quadrature tools available in standard statistical software, such as Proc NLMIXED in SAS (Liu and Huang, 2008).
Since has infinite dimensions, the integrals defining the NDE and NIE have no closed form. To address this computational challenge, we implemented a Monte Carlo method for numerical integration. We first sample , and then sample the th gap times and sequentially, for , using the inverse approach (Çinlar, 1975), until the sum of the simulated gap times exceeds . The recurrent event processes and are then constructed from the gap times. Subsequently, we compute , , , . To carry out the Monte Carlo integration over , we repeat the sampling process above for a sufficient number of batches, each with enough replicates. The estimate from each batch is the average over its replicates, and the between-batch variation is used to evaluate the Monte Carlo error of the numerical integration. The final estimate is the average of the batch estimates. To assess model performance, we generated 200 datasets under different scenarios (Model I, Model II, and misspecified models), with additional details provided in the following section. The NDE and NIE were estimated in every simulated dataset, and 95% confidence intervals were computed using a bootstrap that samples from the asymptotic variance-covariance matrix of the estimated parameters. Additionally, we examined the NDE and NIE by varying the number of time points to assess potential errors from the Monte Carlo method. More comprehensive information is available in the Supplementary Material Section II.
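The batch-based Monte Carlo scheme described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the implementation used in the paper: the frailty is taken to be normal, baseline hazards are constant (with a separate level for the first gap), the terminal-event hazard depends on the number of previous events, and all parameter names and values in `par` are hypothetical.

```python
import math
import random


def sample_event_times(rng, nu, a_n, t_max, r_first, r_later, eta):
    """Draw gap times by inverse transform until their sum exceeds t_max."""
    times, clock = [], 0.0
    while True:
        base = r_first if not times else r_later  # first gap has its own hazard
        rate = base * math.exp(nu + eta * a_n)
        clock += -math.log(1.0 - rng.random()) / rate  # exponential gap time
        if clock > t_max:
            return times
        times.append(clock)


def conditional_survival(t, times, nu, a_t, lam0, alpha, gamma, kappa):
    """P(T > t | event path, frailty) with hazard lam0*exp(alpha*a + gamma*nu + kappa*N(s))."""
    edges = [0.0] + [s for s in times if s < t] + [t]
    integral = sum(math.exp(kappa * k) * (edges[k + 1] - edges[k])
                   for k in range(len(edges) - 1))
    return math.exp(-lam0 * math.exp(alpha * a_t + gamma * nu) * integral)


def mc_survival(t, a_t, a_n, par, n_batches=20, n_reps=500, seed=1):
    """Batch-means Monte Carlo estimate of S(t; a_t, N(a_n)) and its MC error."""
    rng = random.Random(seed)
    batch_means = []
    for _ in range(n_batches):
        total = 0.0
        for _ in range(n_reps):
            nu = rng.gauss(0.0, par["sigma"])  # shared random effect
            times = sample_event_times(rng, nu, a_n, t, par["r_first"],
                                       par["r_later"], par["eta"])
            total += conditional_survival(t, times, nu, a_t, par["lam0"],
                                          par["alpha"], par["gamma"], par["kappa"])
        batch_means.append(total / n_reps)
    est = sum(batch_means) / n_batches
    var_between = sum((b - est) ** 2 for b in batch_means) / (n_batches - 1)
    return est, math.sqrt(var_between / n_batches)


# Hypothetical parameter values, for illustration only.
par = {"sigma": 0.5, "r_first": 0.3, "r_later": 0.6, "eta": -0.2,
       "lam0": 0.1, "alpha": -0.4, "gamma": 1.0, "kappa": 0.3}
s11, _ = mc_survival(2.0, 1, 1, par)
s10, se10 = mc_survival(2.0, 1, 0, par)
s00, _ = mc_survival(2.0, 0, 0, par)
nde, nie = s10 - s00, s11 - s10
```

The between-batch standard error returned by `mc_survival` gives a direct check on whether the number of batches and replicates is large enough for the desired numerical precision.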
3. Simulation
To assess the performance of parameter estimation for the joint modeling of gap time and survival time, as well as the performance of our proposed estimator for NDE and NIE, we conducted simulations using four distinct settings. Previous studies have examined the finite-sample performance of parameter estimation for joint models (Liu et al., 2004; Huang and Liu, 2007; Huang and Wang, 2004; Niu et al., 2023). Our focus here, however, is on evaluating the performance of NDE and NIE estimators and examining the robustness of these estimators under model misspecification. In Setting I, we employed correctly specified joint models of gap time and survival based on model I (i.e., equations (2.1) and (2.1)). Setting II utilized model II (i.e., equations (2.1) and (2.1)), which includes an interaction term between shared random effect and mediators, a feature absent in Model I. To investigate robustness under misspecification, Settings III and IV were examined. In Setting III, the random effect follows a gamma distribution, while in Setting IV, data were simulated using Model II (which includes the interaction term), but the interaction term was omitted when fitting the data using Model I. Detailed information regarding the simulation setup is provided in the Supplementary Material Section III.
We estimated the NDE and NIE at t = 1, 2, and 3 under the four settings, with the simulation results presented in Table 1. In Setting I, where the random effect distribution is accurately estimated and the joint model is correctly specified, both NDE and NIE estimates exhibit small bias and approximately correct coverage rates. In Setting II, even with the inclusion of interaction terms, the performance of the NDE and NIE estimators remains acceptable. However, when the random effect distribution is misspecified (Setting III), the estimation of NIE is more robust than that of NDE. The NDE shows higher bias and a lower coverage rate, particularly at later time points (t = 2 and t = 3). When data are simulated using Model II (which includes the interaction term) but fitted with Model I (which omits the interaction term; Setting IV), the estimators are robust at early time points but sensitive to misspecification at later time points. These findings highlight the importance of carefully assessing model fit in real data analysis to ensure that both fixed and random effects are appropriately specified, avoiding oversimplified models when sample sizes allow.
Table 1: True values, bias, empirical standard deviation (SD), median estimated standard error (MeSE), and coverage rate of the nominal 95% confidence interval (CR) from four simulation settings.
| Setting | Time | Effect | True Value | Bias | SD | MeSE | CR |
|---|---|---|---|---|---|---|---|
| I | 1 | NDE | 0.180 | −0.004 | 0.028 | 0.029 | 95.5% |
| I | 1 | NIE | 0.035 | −0.001 | 0.011 | 0.011 | 94.0% |
| I | 2 | NDE | 0.111 | −0.004 | 0.018 | 0.018 | 93.5% |
| I | 2 | NIE | 0.027 | −0.001 | 0.009 | 0.009 | 94.5% |
| I | 3 | NDE | 0.069 | −0.002 | 0.013 | 0.012 | 91.5% |
| I | 3 | NIE | 0.019 | −0.001 | 0.007 | 0.007 | 93.5% |
| II | 1 | NDE | 0.170 | −0.006 | 0.029 | 0.032 | 96.4% |
| II | 1 | NIE | 0.028 | 0.000 | 0.010 | 0.010 | 94.3% |
| II | 2 | NDE | 0.119 | −0.005 | 0.022 | 0.023 | 95.4% |
| II | 2 | NIE | 0.018 | −0.000 | 0.008 | 0.008 | 91.8% |
| II | 3 | NDE | 0.086 | −0.002 | 0.018 | 0.017 | 94.8% |
| II | 3 | NIE | 0.011 | −0.002 | 0.007 | 0.007 | 92.3% |
| III | 1 | NDE | 0.156 | 0.026 | 0.035 | 0.036 | 89.7% |
| III | 1 | NIE | 0.030 | −0.002 | 0.009 | 0.010 | 96.2% |
| III | 2 | NDE | 0.096 | 0.060 | 0.029 | 0.031 | 48.7% |
| III | 2 | NIE | 0.016 | 0.008 | 0.010 | 0.011 | 96.2% |
| III | 3 | NDE | 0.068 | 0.064 | 0.025 | 0.027 | 34.6% |
| III | 3 | NIE | 0.009 | 0.009 | 0.010 | 0.011 | 100.0% |
| IV | 1 | NDE | 0.170 | −0.019 | 0.029 | 0.029 | 90.2% |
| IV | 1 | NIE | 0.028 | 0.005 | 0.011 | 0.011 | 93.3% |
| IV | 2 | NDE | 0.119 | −0.020 | 0.019 | 0.019 | 79.4% |
| IV | 2 | NIE | 0.018 | 0.011 | 0.010 | 0.010 | 87.1% |
| IV | 3 | NDE | 0.086 | −0.015 | 0.014 | 0.014 | 74.7% |
| IV | 3 | NIE | 0.011 | 0.012 | 0.008 | 0.008 | 76.3% |
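The summary measures reported in Table 1 can be computed from the per-dataset estimates and standard errors along the following lines. This is a minimal sketch: the function name `summarize` and the toy inputs are hypothetical, and the coverage calculation assumes a symmetric Wald-type interval, which may differ from the resampling-based intervals used in the simulations.

```python
import statistics


def summarize(estimates, std_errors, truth, z=1.96):
    """Bias, empirical SD, median estimated SE, and coverage rate.

    Coverage uses estimate +/- z * SE, a Wald-type interval assumed
    here for illustration only.
    """
    bias = statistics.fmean(estimates) - truth
    sd = statistics.stdev(estimates)
    mese = statistics.median(std_errors)
    covered = [abs(est - truth) <= z * se
               for est, se in zip(estimates, std_errors)]
    cr = sum(covered) / len(covered)
    return {"bias": bias, "sd": sd, "mese": mese, "cr": cr}


# Toy inputs (not simulation output): four estimates of a true effect of 0.18.
result = summarize([0.18, 0.20, 0.16, 0.30], [0.03, 0.03, 0.03, 0.03], truth=0.18)
```

Comparing `sd` with `mese` indicates whether the estimated standard errors track the actual sampling variability, which is how the SD and MeSE columns are read.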
In previous work, Niu et al. (2023) reported NDE and NIE estimations using joint modeling of recurrent events and survival events. They focused on the intensity function of the recurrent events, where each event time is measured from the start of the study (total time model). In contrast, the current study considers the distribution of gap times between consecutive recurrent events, allowing each gap time to follow its own distribution, offering greater flexibility in capturing heterogeneity (gap time model). To compare the performance of the gap time model and the total time model and illustrate the importance of using the new flexible model, we further introduced Setting V, in which data were simulated with drastically different baseline hazards for different gap times and thus the total time model is severely misspecified. Specifically, there were significant fluctuations in baseline hazards over time, with notable drops or increases between the first recurrent event and subsequent recurrent events, as well as between jump points. Then, we compared the performance of NDE and NIE estimations using the gap time model and the total time model. Details of the simulation are provided in the Supplementary Material Section III. As shown in Table 2, the gap time model showed better performance with a higher coverage rate and smaller bias. These results demonstrate that the gap time model is more robust for the case when earlier and later recurrent events follow different distributions with variable baseline hazards.
Table 2: True values, bias, empirical standard deviation (SD), median estimated standard error (MeSE), and coverage rate of the nominal 95% confidence interval (CR) for the gap time model and the total time model under Setting V.
| Model | Time | Effect | True Value | Bias | SD | MeSE | CR |
|---|---|---|---|---|---|---|---|
| Gap time Model (Setting V) | 1 | NDE | 0.194 | 0.000 | 0.030 | 0.031 | 95.5% |
| Gap time Model (Setting V) | 1 | NIE | 0.032 | 0.000 | 0.011 | 0.011 | 95.5% |
| Gap time Model (Setting V) | 2 | NDE | 0.132 | 0.000 | 0.021 | 0.022 | 95.0% |
| Gap time Model (Setting V) | 2 | NIE | 0.028 | 0.001 | 0.010 | 0.010 | 95.5% |
| Gap time Model (Setting V) | 3 | NDE | 0.087 | 0.000 | 0.015 | 0.016 | 95.0% |
| Gap time Model (Setting V) | 3 | NIE | 0.021 | 0.000 | 0.007 | 0.008 | 94.5% |
| Total Time Model | 1 | NDE | 0.194 | −0.021 | 0.030 | 0.025 | 79.0% |
| Total Time Model | 1 | NIE | 0.032 | −0.018 | 0.020 | 0.019 | 87.0% |
| Total Time Model | 2 | NDE | 0.132 | −0.016 | 0.022 | 0.019 | 80.0% |
| Total Time Model | 2 | NIE | 0.028 | −0.017 | 0.014 | 0.013 | 75.0% |
| Total Time Model | 3 | NDE | 0.087 | −0.008 | 0.016 | 0.016 | 86.5% |
| Total Time Model | 3 | NIE | 0.021 | −0.013 | 0.010 | 0.009 | 73.5% |
4. Real Data Analysis
To illustrate our proposed method and compare it with the previous findings using total time models (Niu et al., 2023), we reanalyzed the CPCRA data using our proposed gap time-based joint model.
4.1. CPCRA Study
The Community Programs for Clinical Research on AIDS (CPCRA) study was a multicenter, randomized, open-label, community-based clinical trial. It enrolled 467 HIV-infected patients aged 13 years or older with CD4 lymphocyte counts of 300 or fewer cells per cubic millimeter. These subjects had previously undergone zidovudine (AZT) therapy that led to intolerance of the drug. Baseline variables, including weight, gender, previous AIDS-defining condition, hemoglobin, and CD4 counts, were collected. Patients were then randomized, stratified by AZT intolerance, and assigned to either the didanosine (ddI, n=230) or zalcitabine (ddC, n=237) group. Mortality, health status (CD4 cell counts), and opportunistic infection (OI) diagnoses were monitored throughout the study. The patients were followed for around 20 months; however, 188 patients exited early due to death (100 in the ddI group; 88 in the ddC group) (Neaton et al., 1994; Abrams et al., 1994). During the follow-up period, patients developed up to five OIs, with a total of 363 confirmed or probable OIs recorded (172 in the ddI group; 191 in the ddC group).
To assess the mediating role of recurrent OIs in the effect of treatment on survival, we applied a joint modeling approach combining a gap time model for recurrent events and a survival model. We assumed different distributions for the first gap time of recurrent OIs and subsequent gap times. Baseline covariates included in both models were: (1) treatment (1: ddI; 0: ddC); (2) previous AIDS-defining condition (PrevOI; 1: AIDS diagnosis at baseline; 0: no AIDS diagnosis); (3) stratum of response to AZT (2: AZT intolerance; 1: AZT failure); (4) gender (1: female; 0: male); (5) baseline hemoglobin (centered at mean = 12); and (6) baseline CD4 count. As noted in previous studies (Niu et al., 2023), the SUTVA, relaxed SI, and consistency assumptions hold for treatment, baseline CD4 counts, and OI mediators. Although we modeled recurrent events using gap time rather than study time, the same assumptions apply. Our analysis focuses on two exposures: treatment (ddI/ddC) (Subsection 4.2) and baseline CD4 counts (Subsection 4.3).
4.2. Comparative Effectiveness of the Two Treatment Arms
Didanosine (ddI) and zalcitabine (ddC) are reverse transcriptase inhibitors used in patients who are intolerant to zidovudine. Previous studies have indicated that both ddI and ddC can improve patients’ survival (Darbyshire et al., 2000). The CPCRA study evaluated the efficacy and safety of ddI versus ddC. In this section, we analyze and compare the impact of ddI and ddC on patients’ survival, focusing on how OIs mediate the comparative effectiveness of these treatments using the gap time model. The parameter estimates for the recurrent event model (2.1) and the survival model (2.1) (Model I) or (2.1) (Model II) are presented in Table 3. A similar investigation was carried out by Niu et al. (2023), who used the total time scale to model the recurrent events as mediators. The results are consistent across the two modeling frameworks. Using Model I as an example, after adjusting for PrevOI, gender, stratum of response to AZT, baseline HgB, and CD4 counts, no significant differences were observed in the occurrence of OIs between the two treatments (p = 0.97). However, ddC significantly improved patient survival compared to ddI through the direct effect pathway (Log Hazard Ratio (LHR) = −0.40, p = 0.03). The effects of covariates, including PrevOI, HgB, gender, and stratum of response to AZT, were consistent with findings from the previous study.
Table 3:
Fitted regression parameters from the data analysis of CPCRA study with (Model II) and without (Model I) interaction between shared random effect and the mediator.
| Variable | Model I Est | Model I SE | Model I p-value | Model II Est | Model II SE | Model II p-value |
|---|---|---|---|---|---|---|
| Recurrent Event | | | | | | |
| Treatment | 0.00 | 0.11 | 0.97 | 0.01 | 0.11 | 0.92 |
| PrevOI | 0.29 | 0.16 | 0.08 | 0.26 | 0.15 | 0.09 |
| Gender | −0.60 | 0.26 | 0.02 | −0.61 | 0.25 | 0.01 |
| Stratum of Response to AZT | −0.10 | 0.13 | 0.44 | −0.09 | 0.12 | 0.44 |
| HgB | −0.11 | 0.04 | 0.003 | −0.10 | 0.04 | 0.01 |
| Baseline CD4 | −0.48 | 0.10 | <0.001 | −0.46 | 0.10 | <0.001 |
| | −1.94 | 0.95 | 0.04 | −5.71 | 7.84 | 0.47 |
| | −2.50 | 0.33 | <0.001 | −2.44 | 0.30 | <0.001 |
| | −2.46 | 0.32 | <0.001 | −2.46 | 0.30 | <0.001 |
| | −2.51 | 0.32 | <0.001 | −2.61 | 0.30 | <0.001 |
| | −2.47 | 0.33 | <0.001 | −2.39 | 0.29 | <0.001 |
| | −2.49 | 0.40 | <0.001 | −2.50 | 0.39 | <0.001 |
| | −2.37 | 0.47 | <0.001 | −2.43 | 0.47 | <0.001 |
| Survival | | | | | | |
| Treatment | −0.40 | 0.19 | 0.03 | −0.36 | 0.19 | 0.05 |
| PrevOI | 0.84 | 0.29 | 0.004 | 0.73 | 0.30 | 0.02 |
| Gender | 0.21 | 0.32 | 0.52 | 0.29 | 0.30 | 0.33 |
| Stratum of Response to AZT | −0.09 | 0.21 | 0.65 | −0.11 | 0.20 | 0.57 |
| HgB | −0.34 | 0.08 | <0.001 | −0.34 | 0.08 | <0.001 |
| Baseline CD4 | −0.75 | 0.20 | <0.001 | −0.69 | 0.20 | 0.001 |
| Number of OIs | 0.31 | 0.15 | 0.04 | 0.53 | 0.21 | 0.01 |
| Shared Random Effect () | 2.37 | 1.42 | 0.10 | 9.28 | 29.72 | 0.76 |
| | NA | NA | NA | 6.17 | 27.79 | 0.82 |
| | −4.77 | 0.66 | <0.001 | −4.51 | 0.66 | <0.001 |
| | −3.97 | 0.61 | <0.001 | −3.85 | 0.62 | <0.001 |
| | −4.20 | 0.58 | <0.001 | −4.14 | 0.58 | <0.001 |
| | −4.11 | 0.57 | <0.001 | −4.11 | 0.56 | <0.001 |
| | −3.69 | 0.56 | <0.001 | −3.75 | 0.55 | <0.001 |
| | −3.70 | 0.56 | <0.001 | −3.82 | 0.54 | <0.001 |
| | −3.18 | 0.57 | <0.001 | −3.28 | 0.55 | <0.001 |
| | −3.15 | 0.56 | <0.001 | −3.25 | 0.55 | <0.001 |
| | −3.46 | 0.57 | <0.001 | −3.55 | 0.55 | <0.001 |
| | −2.96 | 0.57 | <0.001 | −3.12 | 0.56 | <0.001 |
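As a reading aid for Table 3, the sketch below converts a log hazard ratio and its standard error into a hazard ratio with a 95% Wald interval and a two-sided normal p-value. It is illustrative only: the function name `wald_summary` is hypothetical, and the normal (Wald) reference distribution is an assumption, since the test used to produce the reported p-values is not stated here. Because the table entries are rounded, small discrepancies from the reported p-values are expected.

```python
import math


def wald_summary(log_hr, se, z_crit=1.959964):
    """Return (HR, lower, upper, p) for a log hazard ratio and its SE.

    Assumes a normal (Wald) reference distribution; this is an assumption
    of the sketch, not a statement about how Table 3 was computed.
    """
    hr = math.exp(log_hr)
    lower = math.exp(log_hr - z_crit * se)
    upper = math.exp(log_hr + z_crit * se)
    z = log_hr / se
    # Two-sided normal p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return hr, lower, upper, p


# Treatment effect on survival in Model I (Est = -0.40, SE = 0.19 in Table 3)
hr, lo, hi, p = wald_summary(-0.40, 0.19)
print(f"HR = {hr:.2f}, 95% CI ({lo:.2f}, {hi:.2f}), Wald p = {p:.3f}")
```

The same helper can be applied to any Est/SE pair in the table, for example the Number of OIs coefficient in the survival model.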
In the current study, we divided gap times into two segments: the first gap time and the later gap times. For each segment, the baseline hazard was specified as a piecewise constant function with three intervals, divided at the 30th and 60th percentiles of the gap times. For survival time, the intervals were defined by every 10th percentile. As detailed in Table 3, the hazard for the terminal event increased over time, implying a progressively higher risk of death during follow-up. Moreover, a significant positive association was found between the number of OIs and mortality in both Model I (LHR = 0.31, p = 0.04) and Model II (LHR = 0.53, p = 0.01). In contrast, the interaction between the number of OIs and the random effect (random effect × Number of OIs) was not statistically significant (p = 0.82), providing no evidence of such an interaction.
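The piecewise constant baseline hazard described above can be sketched as follows. This is a minimal illustration, not the authors' SAS or R implementation: the function names, the toy gap times, and the log-hazard values are all hypothetical, and the proportional hazards form with a single linear predictor is an assumption of the sketch.

```python
import numpy as np


def piecewise_cumhaz(t, cuts, log_hazards):
    """Cumulative baseline hazard at time t for a piecewise constant hazard.

    cuts: increasing interior cut points; log_hazards: one log hazard per
    interval, so len(log_hazards) == len(cuts) + 1.
    """
    edges = np.concatenate(([0.0], np.asarray(cuts, dtype=float), [np.inf]))
    hazards = np.exp(np.asarray(log_hazards, dtype=float))
    widths = edges[1:] - edges[:-1]
    # Time at risk spent in each interval before t (zero for later intervals)
    exposure = np.clip(t - edges[:-1], 0.0, widths)
    return float(np.sum(hazards * exposure))


def survival_prob(t, cuts, log_hazards, linear_predictor=0.0):
    """S(t) = exp(-H0(t) * exp(linear predictor)) under proportional hazards."""
    return float(np.exp(-piecewise_cumhaz(t, cuts, log_hazards)
                        * np.exp(linear_predictor)))


# Hypothetical gap times; cut points at the 30th and 60th percentiles
gaps = np.array([12.0, 30.0, 45.0, 60.0, 90.0, 120.0, 200.0, 310.0])
cuts = np.percentile(gaps, [30, 60])
toy_log_hazards = [-5.0, -4.5, -4.0]  # illustrative values, not from Table 3
print(cuts, survival_prob(100.0, cuts, toy_log_hazards))
```

With separate hazard vectors for the first gap time and for later gap times, the same function can represent the two-segment specification; only the `log_hazards` argument changes.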
Using Model I, we estimated the natural direct effect (NDE), natural indirect effect (NIE), and total effect (TE) at time points from 30 to 450 days, in increments of 30 days, through numerical methods. Figure 1 presents these estimates along with bootstrapped 95% pointwise confidence intervals. The NDE for the treatment comparison (ddC vs. ddI) was consistently different from zero across all time points, indicating a significant direct influence on survival. Conversely, the NIE, representing the mediation effect through the number of OIs, did not reach significance, suggesting that mediation via this pathway is minimal. These results agree with the findings of the previous study and indicate that the conclusions of both analyses are robust to the model specification. Additionally, the TE estimated from a Cox regression model (green line) showed a pattern similar to that derived from the sum of the NDE and NIE, reinforcing the consistency between our method and the Cox model.
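For orientation, one common way to write these three quantities on the survival probability scale is given below. The notation is illustrative and may differ from the definitions used earlier in the paper: $S(t; a, a^{*})$ is assumed here to denote the survival probability at time $t$ when the exposure is set to $a$ and the mediator process takes the value it would have under exposure $a^{*}$.

$$
\begin{aligned}
\mathrm{NDE}(t) &= S(t; 1, 0) - S(t; 0, 0),\\
\mathrm{NIE}(t) &= S(t; 1, 1) - S(t; 1, 0),\\
\mathrm{TE}(t) &= \mathrm{NDE}(t) + \mathrm{NIE}(t) = S(t; 1, 1) - S(t; 0, 0).
\end{aligned}
$$

Under this convention the TE is the sum of the NDE and NIE by construction, which is why the sum can be compared directly with a total effect estimated by a separate model.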
Figure 1:

Estimation (solid lines) with bootstrapped 95% point-wise confidence intervals (dashed lines) of the natural direct effect (NDE), natural indirect effect (NIE), and total effect (TE) of Treatment (ddC vs ddI) on overall survival probability for the CPCRA study using gap time Model I and alternative methods. Note that the red line (NDE) almost overlaps with the black line (TE).
4.3. Effect of Baseline CD4 Count
Baseline CD4 counts were collected at the time of patient enrollment into the study as an indicator of HIV progression and immune system function. CD4 cell counts are known to be strongly associated with patients’ health outcomes, the recurrence of OIs, and overall survival (Martin-Iguacel et al., 2022; Justiz Vaillant and Naik, 2024; Liu and Huang, 2008). CD4 T cell counts have also been used as a biological marker to monitor patient health (Serrano-Villar and Deeks, 2015). In this study, we aim to investigate both the direct and indirect effects of baseline CD4 counts on patients’ survival. We applied the recurrent event model (2.1) and the survival model (2.1) or (2.1) to estimate the parameters. In Model I, baseline CD4 counts significantly influenced patient outcomes, showing a strong association with both survival and the occurrence of OIs (both p < 0.001). A similar pattern was observed in Model II, where lower baseline CD4 counts were linked to an elevated risk of mortality and increased recurrence of OIs.
To quantify the natural direct effect (NDE), natural indirect effect (NIE), and total effect (TE) of baseline CD4 counts on overall survival, we adopted an approach similar to that used for treatment effects in Model I. Since baseline CD4 is a continuous variable, we compared its effects by contrasting patients at the third quartile (Q3: 109 cells/mm³) with those at the first quartile (Q1: 11 cells/mm³). The estimates with bootstrapped 95% point-wise confidence intervals revealed that both NDE and NIE were significantly different from zero over time. This indicates that baseline CD4 counts not only directly affect survival but also exert an influence through the mediator of recurrent OIs. Importantly, the direct effect (NDE) was considerably stronger than the mediated effect (NIE).
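Pointwise bootstrap intervals of the kind reported here can be sketched as follows. The percentile method is an assumption of this sketch (the specific bootstrap interval used in the analysis is not stated in this section), the helper `percentile_ci` is hypothetical, and the replicate values are simulated placeholders rather than estimates from the CPCRA data. In practice each row would hold the NDE or NIE curve obtained by refitting the joint model to one resampled data set.

```python
import numpy as np


def percentile_ci(boot_estimates, level=0.95):
    """Pointwise percentile interval from an (n_boot, n_times) array."""
    alpha = 1.0 - level
    lower = np.percentile(boot_estimates, 100.0 * alpha / 2.0, axis=0)
    upper = np.percentile(boot_estimates, 100.0 * (1.0 - alpha / 2.0), axis=0)
    return lower, upper


# Evaluation grid: 30, 60, ..., 450 days (assumed to match the treatment analysis)
times = np.arange(30, 451, 30)

# Placeholder replicates standing in for refitted NDE(t) curves
rng = np.random.default_rng(0)
boot = rng.normal(loc=0.05, scale=0.02, size=(500, times.size))

lower, upper = percentile_ci(boot)
# An effect is called pointwise significant when the interval excludes zero
excludes_zero = (lower > 0) | (upper < 0)
print(times[excludes_zero])
```

These intervals are pointwise, so they describe uncertainty at each time separately and do not provide simultaneous coverage over the whole curve.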
Furthermore, the total effect (TE) estimated from a Cox regression model (comparing Q3 to Q1) was in close agreement with the sum of the NDE and NIE, reinforcing the consistency of our findings with the Cox model. As illustrated by the similarity between Figure 2 in the current study and Figure 3 of Niu et al. (2023), these results are in line with previous work and indicate that the findings of both analyses are robust to the model specification.
Figure 2:

Estimation (solid lines) with bootstrapped 95% point-wise confidence intervals (dashed lines) of the natural direct effect (NDE), natural indirect effect (NIE), and total effect (TE) of CD4 counts (Q3 vs Q1) on overall survival probability for the CPCRA study using gap time Model I and alternative methods.
5. Discussion
In this work, we extended the causal inference analysis between recurrent and terminal events by modeling the recurrent events using the gap time approach within a potential outcomes framework. This method offers a more informative perspective on the timing and frequency of recurrent events compared to the total time model. In HIV-infected patients, the recurrence of OIs can provide crucial insights into patients’ health trajectories, as subsequent OIs may follow different distributions due to the progressive decline in health status. By studying the time intervals between recurrent events, the gap time model captures the dynamic covariate effects, offering greater flexibility and accuracy compared to models that assume a constant effect. Note that as the gap times follow different distributions, the distribution of the actual total event time is harder to derive.
To address confounding between the mediator and the latent shared random effect, we adopted a gap time model for mediation analysis under a relaxed sequential ignorability (SI) assumption. While this assumption has been explored in previous studies using study time (Niu et al., 2023), our work is the first to implement it within a gap time framework. Notably, our joint models permit each gap time to follow its own distribution, allowing the shared random effect to vary across each gap time. Additionally, we incorporate frailty components in both the gap time and survival models, thereby quantifying the shared random effects across all gap times and survival outcomes.
If there are time-varying confounders that are not induced by the exposure or the mediator process, they can be handled by modifying the sequential ignorability assumption as:
where and and . Handling time-varying confounders that are induced by the exposure or the mediator process is challenging and merits future study.
Alternative approaches to linking terminal and recurrent events exist, such as using the rate of recurrent events, the time since the last recurrent event, or the cumulative intensity function; future studies could explore these methods further. Moreover, when the recurrent event or survival model does not follow the proportional hazards model, our proposed framework, which uses shared random effects for baseline confounding and general integration formulas for the natural direct and indirect effects (NDE(1) and NIE(2)), remains theoretically applicable. However, the specific formula for the survival function, the distribution of random effect , and the distribution of need to be modified accordingly. Due to the limited sample size, we used a univariate random effect in the analysis, which can be a limitation because unmeasured confounding might not have the same effect on different recurrent events. Mathematically, our method can be extended to allow the random effect to be a vector with different coefficients for different recurrent events as well as the terminal event, and the estimation procedure remains the same. However, because the random effects must be integrated out to compute the likelihood, adding dimensions to the random effects could substantially increase the computational burden and the variance of the estimated parameters. The performance of the NDE and NIE estimates relies heavily on selecting an appropriate joint model. We recommend model selection strategies, such as AIC, BIC, or tests for higher-order terms, when implementing the proposed method.
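The computational point about integrating out random effects can be illustrated with Gauss-Hermite quadrature, the technique named in the title of Liu and Huang (2008). The sketch below is not the authors' code: it assumes a scalar, normally distributed random effect with mean zero, and the function `marginal_likelihood` and the toy exponential hazard model are hypothetical.

```python
import numpy as np


def marginal_likelihood(cond_lik, sigma, n_nodes=20):
    """Approximate the integral of cond_lik(b) * N(b; 0, sigma^2) over b.

    Uses Gauss-Hermite quadrature with the change of variables
    b = sqrt(2) * sigma * x, which maps the weight exp(-x^2) to the
    N(0, sigma^2) density up to the factor 1 / sqrt(pi).
    """
    nodes, weights = np.polynomial.hermite.hermgauss(n_nodes)
    b = np.sqrt(2.0) * sigma * nodes
    values = np.array([cond_lik(bi) for bi in b])
    return float(np.sum(weights * values) / np.sqrt(np.pi))


# Toy conditional likelihood for one subject with an observed event at time t:
# hazard = lam * exp(b), so L(b) = lam * exp(b) * exp(-lam * exp(b) * t)
lam, t = 0.01, 120.0


def toy_lik(b):
    return lam * np.exp(b) * np.exp(-lam * np.exp(b) * t)


print(marginal_likelihood(toy_lik, sigma=0.8))
```

With a d-dimensional random effect and a tensor-product grid, the number of likelihood evaluations grows as `n_nodes ** d` per subject, which is one way to see why a vector-valued random effect raises the computational cost.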
Our approach also addresses practical challenges by allowing the relaxation of the SI assumption through the inclusion of latent shared random effects. This flexibility is crucial in real-world clinical settings. For example, unlike treatment assignments, which are typically independent of confounders in clinical trials, recurrent events cannot realistically be randomized to meet the conditions of equation (3). By addressing this limitation, our method offers a practical solution for such challenges.
Supplementary Material
Supplementary Material Sections I–IV mentioned in the main text are provided as a PDF supplement file. Data supporting the findings of this paper can be requested as described in the data availability statement on page 5 of the Supplementary Material. The SAS and R code for the simulation and data analysis in this paper is available at https://github.com/nfang-cloud/Joint_Model_Gap_Time.
Funding
This research is partly supported by the National Institute of General Medical Sciences under grant U54 GM115458, the National Heart, Lung, and Blood Institute under grant R01 HL136942, the National Institute on Aging under grants R21 AG063370 and R01 AG081244, and the Washington University Institute of Clinical and Translational Sciences under grant UL1TR002345 from the National Center for Advancing Translational Sciences (NCATS). This work was completed utilizing the Holland Computing Center of the University of Nebraska, which receives support from the UNL Office of Research and Innovation and the Nebraska Research Initiative.
References
- Aalen O (1978). Nonparametric inference for a family of counting processes. The Annals of Statistics, 6: 701–726.
- Abrams D, Goldman A, Launer C, Korvick J, Neaton J, Crane L, et al. (1994). A comparative trial of didanosine or zalcitabine after treatment with zidovudine in patients with human immunodeficiency virus infection. The Terry Beirn Community Programs for Clinical Research on AIDS. The New England Journal of Medicine, 330(10): 657–662.
- Andersen PK, Gill RD (1982). Cox’s regression model for counting processes: A large sample study. The Annals of Statistics, 10: 1100–1120.
- Avanzini S, Antal T (2019). Cancer recurrence times from a branching process model. PLoS Computational Biology, 15(11): e1007423.
- Bielick C, Strumpf A, Ghosal S, McMurry T, McManus K (2024). National hospitalization rates and in-hospital mortality rates of human immunodeficiency virus–related opportunistic infections in the United States, 2011–2018. Clinical Infectious Diseases, 79(2): 487–497.
- Çinlar E (1975). Introduction to stochastic processes. Prentice-Hall, New Jersey.
- Darbyshire J, Foulkes M, Peto R, Duncan W, Babiker A, Collins R, et al. (2000). Zidovudine (AZT) versus AZT plus didanosine (ddI) versus AZT plus zalcitabine (ddC) in HIV infected adults. Cochrane Database of Systematic Reviews, 3: CD002038.
- French M, Keane N, McKinnon E, Phung S, Price P (2007). Susceptibility to opportunistic infections in HIV-infected patients with increased CD4 T-cell counts on antiretroviral therapy may be predicted by markers of dysfunctional effector memory CD4 T cells and B cells. HIV Medicine, 8(3): 148–155.
- Huang CY, Wang MC (2004). Joint modeling and estimation for recurrent event processes and failure time data. Journal of the American Statistical Association, 99: 1153–1165.
- Huang X, Liu L (2007). A joint frailty model for survival time and gap times between recurrent events. Biometrics, 63: 389–397.
- Jayani I, Susmiati, Winarti E, Sulistyawati W (2020). The correlation between CD4 count cell and opportunistic infection among HIV/AIDS patients. The Journal of Physics: Conference Series, 1569: 032066.
- Justiz Vaillant A, Naik R (2024). HIV-1–Associated Opportunistic Infections. Treasure Island (FL): StatPearls Publishing.
- Lahoz R, Fagan A, McSharry M, Proudfoot C, Corda S, Studer R (2022). Recurrent heart failure hospitalizations increase the risk of mortality in heart failure patients with atrial fibrillation and type 2 diabetes mellitus in the United Kingdom: a retrospective analysis of Clinical Practice Research Datalink database. BMC Cardiovascular Disorders, 22(1): 234.
- Liu L, Huang X (2008). The use of Gaussian quadrature for estimation in frailty proportional hazards models. Statistics in Medicine, 27(14): 2665–2683.
- Liu L, Wolfe R, Huang X (2004). Shared frailty models for recurrent events and a terminal event. Biometrics, 60: 747–756.
- Liu L, Zheng C, Kang J (2018). Exploring causality mechanism in the joint analysis of longitudinal and survival data. Statistics in Medicine, 37: 3733–3744.
- Lu J, Han J, Zhang C, Yang Y, Yao Z (2017). Infection after total knee arthroplasty and its gold standard surgical treatment: Spacers used in two-stage revision arthroplasty. Intractable and Rare Diseases Research, 6(4): 256–261.
- Martin-Iguacel R, Reyes-Urueña J, Bruguera A, Aceitón J, Díaz Y (2022). Determinants of long-term survival in late HIV presenters: The prospective PISCIS cohort study. eClinicalMedicine, 52: 101600.
- Martinussen T, Stensrud M (2023). Estimation of separable direct and indirect effects in continuous time. Biometrics, 79(1): 127–139.
- McKennan C, Nicolae D (2022). Estimating and accounting for unobserved covariates in high-dimensional correlated data. Journal of the American Statistical Association, 117: 225–236.
- Neaton J, Wentworth D, Rhame F, Hogan C, Abrams D, Deyton L (1994). Considerations in choice of a clinical endpoint for AIDS clinical trials. Statistics in Medicine, 13(19–20): 2107–2125.
- Niu F, Zheng C, Liu L (2023). Exploring causal mechanisms and quantifying direct and indirect effects using a joint modeling approach for recurrent and terminal events. Statistics in Medicine, 42(22): 4028–4042.
- Rondeau V (2010). Statistical models for recurrent events and death: Application to cancer events. Mathematical and Computer Modelling, 52: 949–955.
- Serrano-Villar S, Deeks S (2015). CD4/CD8 ratio: an emerging biomarker for HIV. The Lancet: HIV, 2(3): e76–e77.
- Soh J, Huang Y (2021). A varying-coefficient model for gap times between recurrent events. Lifetime Data Analysis, 27: 437–459.
- Wang Y, Blei D (2019). The blessing of multiple causes. Journal of the American Statistical Association, 114: 1574–1596.
- Zheng C, Zhou X (2017). Causal mediation analysis on failure time outcome without sequential ignorability. Lifetime Data Analysis, 23: 533–559.
