Abstract
Regression methods, including the proportional rates model and additive rates model, have been proposed to evaluate the effect of covariates on the risk of recurrent events. These two models have different assumptions on the form of the covariate effects. A more flexible model, the additive-multiplicative rates model, allows the covariates to have both additive and multiplicative effects on the marginal rate of the recurrent event process. However, its use is limited to cases where the time-dependent covariates are monitored continuously throughout the follow-up time. In practice, time-dependent covariates are often only measured intermittently, which renders the current estimation method for the additive-multiplicative rates model inapplicable. In this paper, we propose a semiparametric estimator for the regression coefficients of the additive-multiplicative rates model to allow intermittently observed time-dependent covariates. We present simulation results comparing the proposed method with two simple methods, last covariate carried forward and linear interpolation, and apply the proposed method to an epidemiologic study aiming to evaluate the effect of time-varying streptococcal infections on the risk of pharyngitis among school children. The R package implementing the proposed method is available at www.github.com/TianmengL/rectime.
Keywords: additive-multiplicative rates model, kernel smoothing, recurrent events, semiparametric method, time-dependent covariates
1. Introduction
In various clinical and biomedical studies, the event of interest may happen multiple times, which is referred to as a recurrent event. Examples include recurrent bleedings in patients with hematologic malignancies (Stanworth et al., 2015) and recurrent cardiovascular events in subjects with diabetes (Van Der Heijden et al., 2013). During the follow-up of recurrent events, it is common to have repeated measurements of time-dependent covariates, and it is often of interest to investigate the effect of such covariates on the occurrence of recurrent events. Our motivation is from an observational study about pharyngitis among school children. Pharyngitis is often caused by viruses, but some bacteria, including streptococci, can cause pharyngitis as well. The goal of this study was to explore the effect of streptococci on the risk of pharyngitis. In this study, weekly visits were scheduled to monitor the recurrent occurrence of pharyngitis, and the status of streptococci infection was determined for those diagnosed with pharyngitis. In the meantime, monthly visits were scheduled for each participant to monitor the streptococci infection status regularly.
For the analysis of recurrent events, Prentice et al. (1981) and Andersen and Gill (1982) proposed the multiplicative model on the intensity function, which is interpreted as the instantaneous risk of event conditioning on the event history. To achieve a better interpretation and to allow flexible dependence structure among the recurrent events, various authors (Pepe and Cai, 1993; Lawless et al., 1997; Lin et al., 2000) discussed the regression models on mean or rate function of recurrent event process and assumed that the covariate effects were in a multiplicative fashion. Alternatively, the covariate effects may add to the baseline function. Liu and Wu (2011) considered the additive intensity model, and Schaubel et al. (2006) proposed the additive rates model where the covariate effects were in absolute forms. The multiplicative and additive models have different assumptions on the relationship between the covariate effects and the event process; thus it is desirable to estimate both types of effects under a general model setting. For univariate survival data where the event of interest only happens once, Lin and Ying (1995) proposed the additive-multiplicative model for the hazard function and Scheike and Zhang (2002) considered the Cox-Aalen model to allow the covariate effects to be time-varying. Recently, Cai et al. (2017a) studied modeling additive and multiplicative effects simultaneously in the mean residual life function. For recurrent event data, Han et al. (2016) proposed the additive-multiplicative models focusing on the markers contingent on recurrent events with an informative terminal event. Liu et al. (2010) considered the additive-multiplicative rates models for recurrent events, which allows the covariates to have both multiplicative and additive effects on the rate function of the recurrent event process.
Although the additive-multiplicative rates model allows the covariates to be time-dependent, it requires the observation of the complete history of the covariates, which is nearly impossible in practice. Typically, the time-dependent covariates are measured intermittently during the follow-up. One way to deal with the infrequently updated covariates is to predict the missing covariate values by smoothing the observed values, as was discussed in Raboud et al. (1993); Tsiatis et al. (1995); Boscardin et al. (1998); Bycott and Taylor (1998); Dafni and Tsiatis (1998) and summarized in Andersen and Liestøl (2003). Another commonly used approach is to jointly model the longitudinal covariate process and the event process. Joint models of repeatedly measured longitudinal data and time-to-event data have been studied extensively. Many authors considered the joint models of the two processes through latent random effects, including Wulfsohn and Tsiatis (1997); Xu and Zeger (2001); Vonesh et al. (2006), among others. In the setting of recurrent event data, Henderson et al. (2000) proposed to model the relationship between the time-dependent covariates and recurrent events by a latent Gaussian process, and Li (2016) considered the joint model of the recurrent event process and the binary covariate process. More complex models that consider the covariate process, the recurrent event process and the terminal event simultaneously have also been studied, including Kim et al. (2012) and Cai et al. (2017b) among others. To our knowledge, little research has been done to explore the additive-multiplicative rates model with intermittently observed time-dependent covariates.
Recently, Li et al. (2016) and Cao et al. (2015) have proposed kernel smoothed estimators for the proportional rates model and hazards model, respectively, with intermittently observed time-dependent covariates. Specifically, Cao et al. (2015) considered the scenario where the time-dependent covariates were not measured at event times but during regular visits only, and Li et al. (2016) focused on the case where the covariates were measured at both recurrent event times and regular visits. Lyu et al. (2021) and Sun et al. (2021a) considered kernel weighted estimation procedures for the additive rates model and the additive hazards regression model, respectively. In addition, Cao and Fine (2021) proposed a weighted last covariate carried forward approach for the proportional hazards model with time-dependent covariates not observed at failure times, and Sun et al. (2021b) proposed an estimation procedure based on inverse-rate-weighting and kernel smoothing to estimate the proportional rates model with intermittently observed time-dependent covariates measured at informative clinical visits. In this paper, we extend the kernel smoothing method to parameter estimation in the additive-multiplicative rates model. Because the proposed estimator does not require a model for the covariate process, it relies on fewer assumptions about the underlying covariate and recurrent event processes than imputation-based approaches and is expected to yield lower bias.
The rest of the paper is organized as follows. The additive-multiplicative rates model and the proposed estimator are introduced in Section 2. Simulation studies to evaluate the performance of the proposed estimator are presented in Section 3. Section 4 includes the analysis of the pharyngitis data as introduced in the motivating example. Finally, a concluding remark is included in Section 5.
2. Model and the Proposed Estimator
2.1. Additive-multiplicative rates model
Suppose that n subjects are recruited in a study. Let i = 1, …, n index the subjects. Let Ni*(t) denote the number of events that subject i has experienced at or prior to time t when there is no censoring. Let Wi(t) = (Zi(t)⊤, Xi(t)⊤)⊤ be a p × 1 vector of possibly time-dependent covariates and let θ0 = (γ0⊤, β0⊤)⊤ be the corresponding true regression parameters. Following Liu et al. (2010), we assume the rate function of the counting process, λ{t|Wi(t)}dt = E{dNi*(t)|Wi(t)}, has the following form,
$$\lambda\{t \mid W_i(t)\} = g\{\gamma_0^\top Z_i(t)\} + \lambda_0(t)\, h\{\beta_0^\top X_i(t)\}, \tag{1}$$
where λ0(t) is an unspecified baseline rate function, and the link functions g and h are assumed to be known. Specifically, if we let g(x) = x and h(x) = exp(x), then model (1) becomes
$$\lambda\{t \mid W_i(t)\} = \gamma_0^\top Z_i(t) + \lambda_0(t) \exp\{\beta_0^\top X_i(t)\}. \tag{2}$$
Therefore, the model can be regarded as a generalization of the semiparametric additive rates model and proportional rates model for recurrent event process.
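The two special cases can be checked numerically. The following is a minimal sketch of the rate in model (2); the function name `rate_model2` and the numeric inputs are illustrative assumptions, not part of the paper or of the rectime package.

```python
import numpy as np

def rate_model2(z, x, gamma, beta, lambda0_t):
    """Marginal rate under model (2): gamma'Z(t) + lambda0(t) * exp(beta'X(t)).

    Hypothetical helper for illustration only.
    """
    z = np.atleast_1d(z).astype(float)
    x = np.atleast_1d(x).astype(float)
    gamma = np.atleast_1d(gamma).astype(float)
    beta = np.atleast_1d(beta).astype(float)
    return float(gamma @ z + lambda0_t * np.exp(beta @ x))

# Full additive-multiplicative rate at one time point (illustrative values).
full = rate_model2(z=0.5, x=1.0, gamma=0.2, beta=0.5, lambda0_t=0.3)
# gamma = 0 leaves only the multiplicative part (proportional rates model).
prop = rate_model2(z=0.5, x=1.0, gamma=0.0, beta=0.5, lambda0_t=0.3)
# beta = 0 leaves gamma'Z(t) + lambda0(t) (additive rates model).
addi = rate_model2(z=0.5, x=1.0, gamma=0.2, beta=0.0, lambda0_t=0.3)
```

The difference `full - prop` equals the additive contribution γ⊤Z(t), which does not depend on the baseline rate, whereas the multiplicative contribution scales with λ0(t).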
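Let Ci denote the censoring time of subject i and we assume that Ci is independent of the counting process given Wi(t) in the sense that E{dNi*(t)|Wi(t), Ci ≥ t} = E{dNi*(t)|Wi(t)}, where Ni*(t) is the counting process in the absence of censoring. Define Yi(t) = I(Ci ≥ t) and denote by Ni(t) = Ni*(t ∧ Ci) the observed number of events. Let [0, τ] be a pre-specified time interval of interest, and the recurrent event process could potentially be observed beyond τ. For model estimation, we define the following process,
$$M_i(t;\theta) = N_i(t) - \int_0^t Y_i(s)\left[ g\{\gamma^\top Z_i(s)\}\,ds + h\{\beta^\top X_i(s)\}\,d\mu_0(s) \right],$$
where μ0(t) = ∫0t λ0(s)ds is the baseline mean function. Following Lin and Ying (1995), Liu et al. (2010) proposed the following estimating function for model (1):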
| (3) |
where Di(θ, t) is a p-dimensional smooth process involving Wi(t) and θ, and
According to Lin and Ying (1995) and Liu et al. (2010), a possible choice for Di(θ, t) is
For a given θ, by solving , the baseline mean function μ0(t) in (3) can be estimated by
| (4) |
thus the estimating function in (3) becomes
| (5) |
Since we are mostly interested in simultaneously modeling the relative and absolute difference in rate function due to the covariates, we focus on the model in (2) in the remainder of this paper. Under model (2), we have
and thus
The estimating function in (5) can be written as
| (6) |
Then the estimator can be obtained by solving U(θ) = 0.
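For orientation, the chain from the baseline estimator to the estimating function can be written out as follows. This is a sketch consistent with the definitions above and with the Lin and Ying (1995) construction, not a verbatim restatement of equations (3)–(6): the symbol \bar{D} and the grouping of terms are notational assumptions, and the choice of D_i under model (2) is inferred from the functions sz2x and szx listed in the Appendix.

```latex
% Sketch under stated assumptions; \bar{D} is notation introduced here.
\begin{align*}
\hat\mu_0(t;\theta) &= \int_0^t
  \frac{\sum_{i=1}^n \left[ dN_i(s) - Y_i(s)\, g\{\gamma^\top Z_i(s)\}\,ds \right]}
       {\sum_{j=1}^n Y_j(s)\, h\{\beta^\top X_j(s)\}}, \\
U(\theta) &= \sum_{i=1}^n \int_0^\tau
  \left\{ D_i(\theta,t) - \bar D(\theta,t) \right\}
  \left[ dN_i(t) - Y_i(t)\, g\{\gamma^\top Z_i(t)\}\,dt \right], \\
\bar D(\theta,t) &=
  \frac{\sum_{j=1}^n Y_j(t)\, h\{\beta^\top X_j(t)\}\, D_j(\theta,t)}
       {\sum_{j=1}^n Y_j(t)\, h\{\beta^\top X_j(t)\}}.
\end{align*}
% Under model (2), g(x) = x and h(x) = exp(x), with
\begin{align*}
D_i(\theta,t) &= \left( Z_i(t)^\top e^{-\beta^\top X_i(t)},\; X_i(t)^\top \right)^\top, \\
\bar D(\theta,t) &= \left(
  \frac{\sum_j Y_j(t) Z_j(t)^\top}{\sum_j Y_j(t) e^{\beta^\top X_j(t)}},\;
  \frac{\sum_j Y_j(t) X_j(t)^\top e^{\beta^\top X_j(t)}}{\sum_j Y_j(t) e^{\beta^\top X_j(t)}}
\right)^\top .
\end{align*}
```

The second line follows by substituting the first into the sum of the D_i-weighted increments of the process defined above: the baseline term is absorbed into \bar{D}, so the profiled estimating function no longer involves μ0.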
2.2. Estimation when covariates are intermittently observed
When time-dependent covariates are present, the evaluation of the estimating function requires the time-dependent covariates to be observed continuously throughout the entire follow-up time. However, as illustrated in the motivating example, time-dependent covariates were measured at intermittent visits, including the event times and monthly regular visits. Specifically, denote by Oi(t) the number of measurements of the covariates at regular visits at or prior to time t, and we have dOi(t) = 1 when the ith subject has a regular visit at t. Moreover, we assume that the observation process Oi(·) is independent of {Wi(·), Ni(·), Ci}, i = 1, …, n. The observed data are {Ni(t), Oi(t), Wi(t)dNi(t), Wi(t)dOi(t), Ci; 0 ≤ t ≤ Ci, i = 1, …, n}, which are assumed to be independent and identically distributed. Hence the values of covariates between measurement times are unknown, so the estimating function in (6) cannot be evaluated from the observed data alone. A simple approach to deal with the intermittently observed time-dependent covariates is to impute the missing values by carrying forward the last observed value. However, this approach imposes a strong assumption that the covariate processes are step functions and thus is expected to introduce bias in the model estimation in both survival and recurrent event analysis (Prentice, 1982; Faucett et al., 1998; Cao et al., 2015; Li et al., 2016; Lyu et al., 2021). Another possible approach is to impute the missing values between two observation times by linear interpolation. The linear interpolation method assumes that the covariate value is a linear function of time between every two adjacent observation times. However, this assumption may not hold in practice, especially for binary covariates. As an alternative approach, estimation methods based on kernel smoothing to deal with intermittently observed time-dependent covariates have gained growing interest recently (Cao et al., 2015; Li et al., 2016).
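The two simple imputation schemes described above can be sketched as follows for a single subject. The function names and the toy covariate record are illustrative assumptions; observation times are assumed to be sorted in increasing order.

```python
import numpy as np

def lccf(obs_times, obs_values, t):
    """Last covariate carried forward: value at the latest observation time <= t."""
    obs_times = np.asarray(obs_times, dtype=float)
    obs_values = np.asarray(obs_values, dtype=float)
    idx = np.searchsorted(obs_times, t, side="right") - 1
    if np.any(idx < 0):
        raise ValueError("t precedes the first observation time")
    return obs_values[idx]

def linear_interp(obs_times, obs_values, t):
    """Linear interpolation between adjacent observation times."""
    return np.interp(t, obs_times, obs_values)

# A binary covariate observed at three visits (toy record).
visit_times = [0.0, 1.0, 2.0]
visit_values = [0.0, 1.0, 0.0]
query = np.array([0.5, 1.0, 1.75])

imputed_lccf = lccf(visit_times, visit_values, query)
imputed_linear = linear_interp(visit_times, visit_values, query)
```

In this toy record the interpolated values 0.5 and 0.25 fall outside {0, 1}, which illustrates why linear interpolation is hard to justify for binary covariates, while LCCF treats the covariate as constant between visits.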
In what follows, we propose a novel semiparametric estimator by applying the kernel smoothing method to deal with intermittently observed time-dependent covariates in the additive-multiplicative rates model. In contrast with methods that impute the missing values of each individual, the proposed method estimates the mean covariate processes via kernel smoothing and is expected to yield lower bias.
First, we show that the estimating function in (6) can be written as a function of empirical processes. Given a vector a, we define the operator ⊗ such that a^{⊗0} = 1, a^{⊗1} = a, a^{⊗2} = aa⊤. For k = 0, 1, we define
Then the estimating function in (6) can be re-expressed as U(θ) = (U1(θ)⊤, U2(θ)⊤)⊤, where
| (7) |
In equation (7), the quantities and can be easily calculated because the covariates are observed when recurrent events occur, as shown in the motivating example. However, calculating the ratios , , , , and requires the covariate process W(·) to be observed throughout the follow-up period. Nevertheless, in real applications, it is usually not feasible to continuously monitor the covariates and the covariates are only measured intermittently.
Next, we present how to approximate these ratios based on the observed data by applying the kernel smoothing method. It can be easily seen that the processes , , Sz2x(t, β), and Szx(t), converge in probability to the limiting functions , , , and , respectively. With intermittently observed covariate data, we approximate the unknown ratios using kernel smoothing estimators that converge to the same target functions. Consider the following smoothed processes,
| (8) |
where Kh(t) = K(t/h)/h, K is a second-order kernel function with bounded support on [−1, 1], and h is the bandwidth parameter. As shown in the Appendix, the bandwidth must satisfy h = cn^{−v} with 1/4 < v < 1/2; we chose h = cn^{−1/3}, where the constant c was selected by the cross-validation approach presented in Lyu et al. (2021).
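As a concrete instance, the sketch below implements the scaled kernel Kh and the bandwidth rule with the Epanechnikov kernel, which is the kernel used in the simulations of Section 3. The function names are illustrative assumptions, and the constant c is taken as an input because the cross-validation step is not reproduced here.

```python
import numpy as np

def epanechnikov(u):
    """Epanechnikov kernel: a symmetric density supported on [-1, 1]."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)

def scaled_kernel(t, h):
    """K_h(t) = K(t / h) / h."""
    return epanechnikov(np.asarray(t, dtype=float) / h) / h

def bandwidth(n, c, v=1.0 / 3.0):
    """h = c * n^(-v); the theory requires 1/4 < v < 1/2."""
    if not 0.25 < v < 0.5:
        raise ValueError("v must lie strictly between 1/4 and 1/2")
    return c * n ** (-v)

# Numerical check of the kernel moments with a trapezoidal rule.
grid = np.linspace(-1.0, 1.0, 200001)
step = grid[1] - grid[0]

def trapezoid(values):
    return float(np.sum((values[1:] + values[:-1]) / 2.0) * step)

mass = trapezoid(epanechnikov(grid))                      # integrates to 1
first_moment = trapezoid(grid * epanechnikov(grid))       # zero by symmetry
second_moment = trapezoid(grid**2 * epanechnikov(grid))   # positive constant (1/5)

h_n100 = bandwidth(n=100, c=1.0)  # c = 1 is a placeholder, not a tuned value
```

With n = 100 and c = 1 the rule gives a bandwidth of about 0.215; in practice c would be chosen by cross-validation as described above.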
We define m(t) as the rate function of the observation process, that is, E{dOi(t)} = m(t)dt; then it can be shown that the kernel smoothed processes, , , , and , converge in probability to , , sz2x(t, β)m(t), szx(t)m(t), respectively. Since m(t) cancels out in the ratios, the ratios of the kernel smoothed processes converge to the ratios of the corresponding limiting processes. Thus, the ratios , , , and in (7) can be replaced by the kernel smoothed counterparts. Moreover, to account for estimation bias near the boundary t = 0 due to positive observation times, we set , , , for t ∈ [0, h). Then the proposed estimating function is , where
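The cancellation of m(t) can be illustrated with a toy computation. The sketch below forms a ratio of two kernel-weighted sums over pooled covariate records; the function names, the pooled-record layout, the simulated inputs, and the boundary rule (evaluating at h for t < h) are assumptions made for illustration and are not the authors' implementation.

```python
import numpy as np

def epanechnikov(u):
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)

def smoothed_sum(t, obs_times, values, h):
    """Kernel-weighted sum over records (s_k, value_k) taken while at risk."""
    t_eff = max(float(t), h)  # assumed boundary rule for t in [0, h)
    weights = epanechnikov((t_eff - obs_times) / h) / h
    return float(np.sum(weights * values))

def smoothed_ratio(t, obs_times, num_values, den_values, h):
    """Ratio of two smoothed sums; the observation rate cancels in the limit."""
    num = smoothed_sum(t, obs_times, num_values, h)
    den = smoothed_sum(t, obs_times, den_values, h)
    if den <= 0.0:
        raise ValueError("no usable observations within bandwidth h of t")
    return num / den

# Toy pooled records: Z has mean 0.5 - 0.05 s, X is Bernoulli(0.2).
rng = np.random.default_rng(2024)
obs_times = rng.uniform(0.0, 20.0, size=4000)
x = rng.binomial(1, 0.2, size=4000)
z = 0.5 - 0.05 * obs_times + rng.normal(0.0, 0.2, size=4000)
beta = 0.5

# Target at t = 5: E{Z} / E{exp(beta X)} = 0.25 / (0.8 + 0.2 * exp(0.5)).
target = 0.25 / (0.8 + 0.2 * np.exp(0.5))
ratio_all = smoothed_ratio(5.0, obs_times, z, np.exp(beta * x), h=1.0)

# Halving the visit frequency changes m(t) but not the target of the ratio.
keep = rng.random(4000) < 0.5
ratio_half = smoothed_ratio(5.0, obs_times[keep], z[keep],
                            np.exp(beta * x[keep]), h=1.0)
```

Both `ratio_all` and `ratio_half` estimate the same target, although the second uses roughly half as many records and is therefore noisier.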
The proposed estimator can be obtained by solving . The large sample properties of are summarized below. A detailed proof is included in the Appendix.
Theorem 1.
Under regularity conditions 1–10 in the Appendix, converges in probability to θ0. Moreover, as n → ∞, converges in distribution to a zero mean normal random variable with variance A(θ0)^{−1}V(θ0){A(θ0)^{−1}}⊤, where A(θ0) and V(θ0) are defined in the Appendix.
The asymptotic variance of involves unknown nuisance functions that need to be estimated using kernel smoothing methods. Hence bootstrap is recommended for variance estimation because of its better finite-sample performance.
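A bootstrap standard error can be sketched as below. Resampling whole subjects with replacement is an assumption made here (the text does not state the resampling unit), and `estimator` is a hypothetical placeholder for any function that maps a list of subject-level records to a parameter estimate.

```python
import numpy as np

def bootstrap_se(subject_data, estimator, n_boot=50, seed=0):
    """Standard error from a subject-level nonparametric bootstrap.

    subject_data: list with one record per subject.
    estimator: callable returning a scalar or 1-D array of estimates.
    """
    rng = np.random.default_rng(seed)
    n = len(subject_data)
    estimates = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # draw n subjects with replacement
        estimates.append(estimator([subject_data[i] for i in idx]))
    return np.std(np.asarray(estimates, dtype=float), axis=0, ddof=1)

# Toy usage: the "estimator" is the mean event count per subject.
rng = np.random.default_rng(7)
event_counts = list(rng.poisson(4.0, size=200))
se_mean_count = bootstrap_se(event_counts, lambda d: np.mean(d), n_boot=50)
```

Keeping each subject's events and covariate measurements together when resampling preserves the within-subject dependence, which is why the subject is the natural resampling unit in this sketch.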
For the estimation of the baseline mean function, we show that the estimator in (4) can be written as
Thus, the baseline mean function can be estimated by
Following Liu et al. (2010), to ensure that the estimated baseline mean function is monotone, an alternative estimator is .
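Because the additive part is subtracted from the event increments, a baseline mean estimate of this type need not be nondecreasing. One common monotone modification is the running maximum of the estimate over time; the sketch below shows that construction as an assumption, and it may differ from the alternative estimator the authors use.

```python
import numpy as np

def monotone_envelope(mu_hat):
    """Running maximum of a baseline mean estimate evaluated on a time grid."""
    return np.maximum.accumulate(np.asarray(mu_hat, dtype=float))

# Toy estimate on an increasing time grid with two small dips.
mu_hat = np.array([0.0, 0.30, 0.25, 0.60, 0.55, 0.90])
mu_mono = monotone_envelope(mu_hat)
```

The running maximum leaves the estimate unchanged wherever it is already nondecreasing and flattens the dips.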
3. Simulation
Simulation studies were conducted to evaluate the performance of the proposed method. For each simulation scenario, 1000 data replicates were generated with sample sizes of 100 and 200. The resampling size in the bootstrap method for variance estimation was set to 50. Since the rate model does not fully specify the probability structure of the recurrent event process, we simulated the recurrent events based on the following intensity model
| (9) |
where ui is the frailty variable with mean 1 and variance σ². The frailty variable induces the within-subject correlations. We explored two distributions of the frailty variable: gamma distribution and log-normal distribution, and two values of the variance σ² = 0.2, 0.4, for different levels of correlations. The baseline intensity function λ0(t) = 0.3I(t ≤ 10) + 0.5I(10 < t ≤ 20). Note that the intensity model in (9) implies the rate model λ{t|Wi(t)} = γ0Zi(t) + exp{β0Xi(t)}λ0(t). To evaluate the proposed method on different types of covariates, we let Zi(t) be a continuous covariate and Xi(t) be a binary covariate. Zi(t) was simulated by a linear function of time t as Zi(t) = b0i + b1it. The random intercept b0i was generated from a normal distribution with mean 0.5 and variance 0.05. The random slope b1i was simulated from a normal distribution with mean −0.05 and variance 5 × 10⁻⁴. With a negative mean of the random slope, the covariate Z(·) has a decreasing time trend at the population level. For the binary covariate Xi(t), we first generated the baseline Xi(0) from a Bernoulli distribution with probability 0.2. Then the binary covariate process was assumed to alternate between states 0 and 1. We assumed that the duration of state 0 of subject i followed an exponential distribution with rate function 1/(ξig(t)) and the duration of state 1 followed an exponential distribution with rate 1/ξi, where ξi was a subject-specific random effect which followed a gamma distribution with mean 1 and variance 0.25. The value of g(t) was 4 for t ∈ [0, 10] and changed to 6 afterwards, which indicates a decreasing time trend at the population level.
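The data-generating ingredients described above can be sketched as follows. This covers only the frailty, the baseline rate, and the two covariate processes, not the event times. The function names are illustrative assumptions, and evaluating g(t) at the start of each sojourn in state 0 is an assumption, since the text does not say at which time point g is evaluated.

```python
import numpy as np

def draw_frailty(rng, n, var, dist="gamma"):
    """Frailty with mean 1 and variance `var`."""
    if dist == "gamma":
        return rng.gamma(shape=1.0 / var, scale=var, size=n)
    if dist == "lognormal":
        s2 = np.log1p(var)  # log-scale variance giving mean 1, variance var
        return rng.lognormal(mean=-s2 / 2.0, sigma=np.sqrt(s2), size=n)
    raise ValueError("dist must be 'gamma' or 'lognormal'")

def baseline_rate(t):
    """lambda0(t) = 0.3 I(t <= 10) + 0.5 I(10 < t <= 20)."""
    t = np.asarray(t, dtype=float)
    return 0.3 * (t <= 10) + 0.5 * ((t > 10) & (t <= 20))

def draw_linear_covariate(rng, n):
    """Random intercepts and slopes of Z_i(t) = b0_i + b1_i * t."""
    b0 = rng.normal(0.5, np.sqrt(0.05), size=n)
    b1 = rng.normal(-0.05, np.sqrt(5e-4), size=n)
    return b0, b1

def draw_binary_path(rng, xi, t_max=20.0):
    """Switch times and states of the alternating binary covariate X_i."""
    state = int(rng.random() < 0.2)          # X_i(0) ~ Bernoulli(0.2)
    t, times, states = 0.0, [0.0], [state]
    while True:
        g = 4.0 if t <= 10.0 else 6.0        # assumed: g at sojourn start
        mean_duration = xi * g if state == 0 else xi
        t += rng.exponential(mean_duration)  # rate = 1 / mean_duration
        if t >= t_max:
            break
        state = 1 - state
        times.append(t)
        states.append(state)
    return np.array(times), np.array(states)

rng = np.random.default_rng(2023)
n = 200
u = draw_frailty(rng, n, var=0.2, dist="gamma")
b0, b1 = draw_linear_covariate(rng, n)
xi = rng.gamma(shape=4.0, scale=0.25, size=n)  # mean 1, variance 0.25
paths = [draw_binary_path(rng, xi_i) for xi_i in xi]
```

The gamma parameterization uses shape 1/σ² and scale σ², and the log-normal parameterization uses log-scale variance log(1 + σ²) with log-scale mean equal to minus half of that, so that both frailty distributions have mean 1 and variance σ².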
For the values of the regression coefficients, we considered three scenarios: (1) the true model included both an additive part and a multiplicative part: β0 = 0.5, γ0 = 0.2; (2) the additive-multiplicative model reduced to the additive rates model: β0 = 0, γ0 = 0.2; (3) the additive-multiplicative model reduced to the proportional rates model: β0 = 0.5, γ0 = 0.
In all scenarios, we assumed that the covariates of a subject were always observed at the event times of the same subject. For the regular visits, we assumed that the covariates were measured at the baseline visit (time 0) and at one pre-scheduled regular visit per unit time interval, which means that each subject had 20 regular visits in the time period (0, 20]. The time of the regular visit within each unit time interval was simulated from a uniform distribution on (0, 1). The censoring time was randomly generated from a uniform distribution on (0, 20). Event times and regular visits that occurred after the censoring time were dropped.
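The visit and censoring scheme can be sketched as below for one subject. The function name is an illustrative assumption; event times are not generated here.

```python
import numpy as np

def draw_visits_and_censoring(rng, t_max=20):
    """Baseline visit at 0, one uniform visit per unit interval, uniform censoring."""
    offsets = rng.uniform(0.0, 1.0, size=t_max)
    regular = np.arange(t_max) + offsets      # one visit in each unit interval
    visits = np.concatenate(([0.0], regular))
    censor = rng.uniform(0.0, float(t_max))
    return visits[visits <= censor], censor   # drop visits after censoring

rng = np.random.default_rng(11)
visits, censor = draw_visits_and_censoring(rng)
```

Because the censoring time is uniform on (0, 20), a subject contributes about half of the 20 scheduled regular visits on average, in addition to the baseline visit.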
We applied the proposed method to the simulated data and compared the results with those from the last covariate carried forward (LCCF) and linear interpolation approaches. We used the Epanechnikov kernel function in the proposed kernel smoothing method. The results for simulated datasets with gamma or lognormal frailty are presented in Tables 1 and 2, respectively. We provide the relative bias (Bias) and Monte Carlo standard deviation (SD) of the point estimates for each estimation method that we compared. For the proposed method, we also report the average of the standard errors estimated by the bootstrap method (ASE) and the coverage percentage (CP). When the true model is the additive-multiplicative model (β0 = 0.5, γ0 = 0.2), the LCCF method gives biased estimates for both β and γ. The linear interpolation method has small bias for the estimation of γ, which is likely due to the linear feature of covariate Z(·), but gives biased estimates for β. The proposed method provides virtually unbiased estimates for both regression coefficients. The ASEs are close to the empirical SDs and the coverage percentages are around 95%. As the variance of the frailty increases from 0.2 to 0.4, the Monte Carlo SDs of the estimates increase. When the true model reduces to the additive model or the multiplicative model with one nonzero coefficient (β = 0, γ = 0.2; β = 0.5, γ = 0), the proposed method provides virtually unbiased estimates for both coefficients, which indicates the robustness of the additive-multiplicative model.
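The summary measures reported in the tables can be computed as in the following sketch. The function name and the simulated inputs are illustrative assumptions, and the confidence intervals are taken to be Wald intervals (estimate ± 1.96 × bootstrap SE), which is an assumption about how CP was obtained.

```python
import numpy as np

def summarize(estimates, std_errors, truth, z=1.96):
    """Relative bias, Monte Carlo SD, average SE, and coverage proportion."""
    estimates = np.asarray(estimates, dtype=float)
    std_errors = np.asarray(std_errors, dtype=float)
    mean_est = estimates.mean()
    # Relative bias; when the true value is 0, report the mean estimate itself.
    bias = (mean_est - truth) / truth if truth != 0 else mean_est
    lower = estimates - z * std_errors
    upper = estimates + z * std_errors
    return {
        "Bias": float(bias),
        "SD": float(estimates.std(ddof=1)),
        "ASE": float(std_errors.mean()),
        "CP": float(np.mean((lower <= truth) & (truth <= upper))),
    }

# Toy replicates: unbiased estimates with correctly sized standard errors.
rng = np.random.default_rng(5)
toy_estimates = rng.normal(0.5, 0.1, size=1000)
toy_se = np.full(1000, 0.1)
toy_summary = summarize(toy_estimates, toy_se, truth=0.5)
```

In this toy setting the relative bias is close to 0, SD and ASE agree, and CP is close to 0.95, which is the pattern reported for the proposed method.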
Table 1:
Simulation results: the frailty followed a gamma distribution; n is the sample size; σ² is the variance of the frailty distribution; Avg. events is the average number of recurrent events; Bias is the relative bias computed by dividing the difference of the mean of the 1000 estimated parameters and the true value by the true value (if the true value is 0, Bias is the mean of the 1000 estimated parameters); SD is the standard deviation of the 1000 estimated values; ASE is the mean of the 1000 estimated standard errors by bootstrap method; CP is the proportion of 95% confidence intervals covering the true value.
| n | σ² | Avg. events | Parameter | LCCF Bias | LCCF SD | Linear Bias | Linear SD | Proposed Bias | Proposed SD | Proposed ASE | Proposed CP |
|---|---|---|---|---|---|---|---|---|---|---|---|
| β = 0.5, γ = 0.2 | | | | | | | | | | | |
| 100 | 0.2 | 4.23 | β | −0.136 | 0.126 | 0.145 | 0.158 | 0.012 | 0.144 | 0.148 | 0.933 |
| | | | γ | −0.575 | 0.109 | 0.009 | 0.104 | 0.034 | 0.105 | 0.103 | 0.937 |
| | 0.4 | 4.24 | β | −0.131 | 0.139 | 0.150 | 0.176 | 0.011 | 0.157 | 0.155 | 0.948 |
| | | | γ | −0.621 | 0.128 | −0.033 | 0.123 | −0.013 | 0.123 | 0.123 | 0.938 |
| 200 | 0.2 | 4.25 | β | −0.133 | 0.092 | 0.153 | 0.114 | 0.009 | 0.103 | 0.102 | 0.929 |
| | | | γ | −0.570 | 0.082 | 0.006 | 0.078 | 0.021 | 0.078 | 0.073 | 0.931 |
| | 0.4 | 4.24 | β | −0.139 | 0.099 | 0.143 | 0.124 | 0.000 | 0.111 | 0.108 | 0.946 |
| | | | γ | −0.602 | 0.096 | −0.023 | 0.092 | −0.009 | 0.092 | 0.088 | 0.935 |
| β = 0, γ = 0.2 | | | | | | | | | | | |
| 100 | 0.2 | 3.82 | β | −0.012 | 0.148 | −0.005 | 0.195 | −0.003 | 0.169 | 0.172 | 0.953 |
| | | | γ | −0.540 | 0.105 | −0.008 | 0.100 | 0.033 | 0.101 | 0.100 | 0.934 |
| | 0.4 | 3.83 | β | −0.016 | 0.163 | −0.011 | 0.216 | −0.006 | 0.181 | 0.183 | 0.946 |
| | | | γ | −0.526 | 0.128 | 0.008 | 0.123 | 0.051 | 0.123 | 0.119 | 0.938 |
| 200 | 0.2 | 3.84 | β | −0.010 | 0.101 | −0.002 | 0.131 | −0.002 | 0.113 | 0.120 | 0.952 |
| | | | γ | −0.546 | 0.075 | −0.021 | 0.073 | 0.016 | 0.073 | 0.070 | 0.943 |
| | 0.4 | 3.82 | β | −0.017 | 0.106 | −0.009 | 0.140 | −0.006 | 0.119 | 0.125 | 0.968 |
| | | | γ | −0.550 | 0.092 | −0.024 | 0.089 | 0.012 | 0.089 | 0.086 | 0.937 |
| β = 0.5, γ = 0 | | | | | | | | | | | |
| 100 | 0.2 | 3.92 | β | −0.148 | 0.122 | 0.143 | 0.152 | 0.007 | 0.136 | 0.143 | 0.955 |
| | | | γ | −0.112 | 0.108 | 0.001 | 0.103 | 0.006 | 0.103 | 0.099 | 0.931 |
| | 0.4 | 3.90 | β | −0.163 | 0.136 | 0.121 | 0.172 | −0.021 | 0.153 | 0.150 | 0.938 |
| | | | γ | −0.124 | 0.131 | −0.011 | 0.124 | −0.006 | 0.125 | 0.120 | 0.930 |
| 200 | 0.2 | 3.91 | β | −0.138 | 0.086 | 0.157 | 0.107 | 0.013 | 0.097 | 0.098 | 0.952 |
| | | | γ | −0.113 | 0.074 | −0.002 | 0.070 | 0.002 | 0.071 | 0.071 | 0.940 |
| | 0.4 | 3.91 | β | −0.136 | 0.093 | 0.161 | 0.117 | 0.015 | 0.103 | 0.104 | 0.949 |
| | | | γ | −0.113 | 0.096 | −0.002 | 0.090 | 0.002 | 0.090 | 0.086 | 0.929 |
Table 2:
Simulation results: the frailty followed a lognormal distribution; n is the sample size; σ² is the variance of the frailty distribution; Avg. events is the average number of recurrent events; Bias is the relative bias computed by dividing the difference of the mean of the 1000 estimated parameters and the true value by the true value (if the true value is 0, Bias is the mean of the 1000 estimated parameters); SD is the standard deviation of the 1000 estimated values; ASE is the mean of the 1000 estimated standard errors by bootstrap method; CP is the proportion of 95% confidence intervals covering the true value.
| n | σ² | Avg. events | Parameter | LCCF Bias | LCCF SD | Linear Bias | Linear SD | Proposed Bias | Proposed SD | Proposed ASE | Proposed CP |
|---|---|---|---|---|---|---|---|---|---|---|---|
| β = 0.5, γ = 0.2 | | | | | | | | | | | |
| 100 | 0.2 | 4.24 | β | −0.141 | 0.130 | 0.141 | 0.161 | 0.004 | 0.145 | 0.147 | 0.950 |
| | | | γ | −0.605 | 0.110 | −0.021 | 0.106 | 0.005 | 0.106 | 0.102 | 0.930 |
| | 0.4 | 4.22 | β | −0.141 | 0.135 | 0.140 | 0.167 | 0.004 | 0.150 | 0.157 | 0.954 |
| | | | γ | −0.575 | 0.134 | 0.010 | 0.129 | 0.032 | 0.130 | 0.123 | 0.938 |
| 200 | 0.2 | 4.25 | β | −0.141 | 0.088 | 0.141 | 0.110 | 0.006 | 0.100 | 0.102 | 0.943 |
| | | | γ | −0.560 | 0.078 | 0.015 | 0.076 | 0.030 | 0.076 | 0.074 | 0.940 |
| | 0.4 | 4.24 | β | −0.130 | 0.093 | 0.152 | 0.114 | 0.007 | 0.102 | 0.107 | 0.959 |
| | | | γ | −0.584 | 0.093 | −0.008 | 0.090 | 0.010 | 0.090 | 0.088 | 0.943 |
| β = 0, γ = 0.2 | | | | | | | | | | | |
| 100 | 0.2 | 3.83 | β | −0.029 | 0.160 | −0.027 | 0.210 | −0.019 | 0.177 | 0.173 | 0.934 |
| | | | γ | −0.554 | 0.105 | −0.020 | 0.101 | 0.026 | 0.102 | 0.099 | 0.940 |
| | 0.4 | 3.82 | β | −0.016 | 0.164 | −0.012 | 0.215 | −0.008 | 0.181 | 0.180 | 0.947 |
| | | | γ | −0.568 | 0.130 | −0.034 | 0.126 | 0.010 | 0.127 | 0.117 | 0.932 |
| 200 | 0.2 | 3.82 | β | −0.019 | 0.107 | −0.013 | 0.140 | −0.010 | 0.120 | 0.120 | 0.944 |
| | | | γ | −0.524 | 0.073 | −0.001 | 0.071 | 0.037 | 0.071 | 0.070 | 0.939 |
| | 0.4 | 3.82 | β | −0.014 | 0.110 | −0.007 | 0.144 | −0.003 | 0.122 | 0.125 | 0.941 |
| | | | γ | −0.559 | 0.092 | −0.033 | 0.089 | 0.001 | 0.089 | 0.085 | 0.940 |
| β = 0.5, γ = 0 | | | | | | | | | | | |
| 100 | 0.2 | 3.90 | β | −0.149 | 0.118 | 0.142 | 0.147 | 0.007 | 0.134 | 0.140 | 0.954 |
| | | | γ | −0.112 | 0.108 | 0.001 | 0.102 | 0.006 | 0.103 | 0.099 | 0.931 |
| | 0.4 | 3.91 | β | −0.155 | 0.129 | 0.133 | 0.161 | −0.009 | 0.146 | 0.150 | 0.951 |
| | | | γ | −0.111 | 0.127 | 0.002 | 0.120 | 0.007 | 0.121 | 0.119 | 0.947 |
| 200 | 0.2 | 3.91 | β | −0.146 | 0.086 | 0.148 | 0.108 | 0.005 | 0.098 | 0.098 | 0.957 |
| | | | γ | −0.110 | 0.075 | 0.001 | 0.071 | 0.004 | 0.071 | 0.070 | 0.944 |
| | 0.4 | 3.91 | β | −0.147 | 0.093 | 0.146 | 0.116 | 0.000 | 0.103 | 0.104 | 0.948 |
| | | | γ | −0.113 | 0.092 | −0.002 | 0.087 | 0.003 | 0.087 | 0.086 | 0.945 |
4. Real data analysis
In this section, we analyze the Indian pharyngitis data (Jose et al., 2018) using the proposed method. Pharyngitis is an infection of the back of the throat that can be caused by viruses or bacteria. When the cause is group A streptococcus (GAS), pharyngitis is also known as strep throat. The symptoms of GAS pharyngitis include sore throat, fever, and nausea, and if left untreated it may cause rare but serious diseases, including rheumatic heart disease. GAS pharyngitis is common in children from age 5 to age 15 and can be transmitted through saliva or nasal secretions. Meanwhile, other bacteria, for example, group G streptococcus (GGS), may cause pharyngitis with similar clinical symptoms. In the motivating example, 307 school children aged 7 to 11 years in a rural area in Vellore, India were recruited to investigate the relationship between streptococcal infections and the risk of pharyngitis. Each child was examined weekly for the symptoms of pharyngitis. For those who were diagnosed with pharyngitis, throat cultures were obtained to test for GAS and GGS. In the meantime, to monitor the streptococci status regularly, monthly regular visits were scheduled for each child. Since the regular visits were pre-scheduled, it is reasonable to assume that they are independent of the recurrent event, censoring and covariate processes, as is required by the proposed method.
We considered the following four candidate models: (1) both covariates have multiplicative effects (Model MM); (2) both covariates have additive effects (Model AA); (3) GAS has an additive effect and GGS has a multiplicative effect (Model AM); (4) GAS has a multiplicative effect and GGS has an additive effect (Model MA). The same approach as described in Section 2 was applied to select the bandwidth used in the proposed kernel smoothing method. Table 3 shows the regression results of the four models. All four models suggest that the presence of GAS increases the risk of pharyngitis, while the effect of GGS is not significant in any model. In light of the same direction of the effect and the similar statistical significance from the four candidate models, the choice would be based on study objectives and the interpretation of the effect. As discussed in Schaubel et al. (2006), in certain settings, the absolute covariate effect is of more interest than the relative covariate effect. For instance, the former can directly provide information for predicting change in event rate attributable to a covariate, while the latter would need information on the baseline rate.
Table 3:
Analysis of Indian pharyngitis data. AM is the additive-multiplicative rates model which includes GAS in the additive part and GGS in the multiplicative part; MA is the additive-multiplicative rates model which includes GAS in the multiplicative part and GGS in the additive part; MM is the proportional rates model; AA is the additive rates model. Est is the estimated regression coefficient; SE is the standard error estimated by bootstrap with resampling size 100; CI is the 95% confidence interval.
| Model | GAS Est | GAS SE | GAS 95% CI | GGS Est | GGS SE | GGS 95% CI |
|---|---|---|---|---|---|---|
| MM | 0.418 | 0.117 | (0.189, 0.647) | 0.146 | 0.118 | (−0.085, 0.377) |
| AA | 0.067 | 0.020 | (0.028, 0.106) | 0.020 | 0.017 | (−0.013, 0.053) |
| AM | 0.067 | 0.019 | (0.030, 0.104) | 0.148 | 0.121 | (−0.089, 0.385) |
| MA | 0.437 | 0.120 | (0.202, 0.672) | 0.021 | 0.023 | (−0.024, 0.066) |
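As a reading aid, the reported intervals can be compared with Wald intervals of the form Est ± 1.96 × SE. The sketch below copies the estimates, standard errors, and intervals from Table 3; the Wald form is an assumption about how the intervals were constructed, and the variable names are illustrative.

```python
import math

# (model, covariate): (Est, SE, reported 95% CI), copied from Table 3.
TABLE3 = {
    ("MM", "GAS"): (0.418, 0.117, (0.189, 0.647)),
    ("MM", "GGS"): (0.146, 0.118, (-0.085, 0.377)),
    ("AA", "GAS"): (0.067, 0.020, (0.028, 0.106)),
    ("AA", "GGS"): (0.020, 0.017, (-0.013, 0.053)),
    ("AM", "GAS"): (0.067, 0.019, (0.030, 0.104)),
    ("AM", "GGS"): (0.148, 0.121, (-0.089, 0.385)),
    ("MA", "GAS"): (0.437, 0.120, (0.202, 0.672)),
    ("MA", "GGS"): (0.021, 0.023, (-0.024, 0.066)),
}

def wald_ci(est, se, z=1.96):
    """Wald interval est +/- z * se."""
    return est - z * se, est + z * se

# Largest absolute gap between a recomputed and a reported interval endpoint.
max_gap = 0.0
for est, se, (low, high) in TABLE3.values():
    lo_hat, hi_hat = wald_ci(est, se)
    max_gap = max(max_gap, abs(lo_hat - low), abs(hi_hat - high))

# Under Model MM the GAS coefficient is a log rate ratio.
rate_ratio_mm_gas = math.exp(0.418)
```

All recomputed endpoints agree with the reported ones up to rounding to three decimals, so the intervals appear consistent with Wald intervals based on the bootstrap standard errors. Under Model MM, the GAS estimate corresponds to a rate ratio of about 1.52, whereas the additive estimates in Models AA and AM are on the scale of absolute rate differences.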
5. Discussion
In this paper, we proposed a semiparametric estimator for the regression coefficients of the additive-multiplicative rates model to deal with intermittently observed time-dependent covariates. The additive-multiplicative rates model generalizes the proportional rates model and the additive rates model, and hence allows some covariates to have multiplicative effects on the risk of recurrent events and others to have additive effects. The proposed method applies the nonparametric kernel smoothing approach to estimate the mean processes of the time-dependent covariates; thus, it does not rely on any assumption about the covariate distribution or any specification of the covariate process and is expected to be more robust. The proposed method requires that the rate function m(t) of the observation time process is positive and bounded on [0, τ]. As the observations of time-dependent covariates become denser, we expect the proposed method to perform better.
In this paper, we assume that the observation process is independent of the covariates, the recurrent event process, and the censoring time. When such an independence assumption is violated, the proposed method may yield biased estimation. When the rate function of the observation process is determined by observed covariates, one can assume a proportional rate model on the observation process and construct an inverse-rate-weighting in the kernel smoothing estimator for the outcome model, following the arguments of Sun et al. (2021b). Extending their method to additive-multiplicative rates model is a future research direction.
In practice, a problem is to determine which covariates to include in the additive part Z(t) and which in the multiplicative part X(t). As discussed in Liu et al. (2010), if a covariate is expected to greatly influence the risk difference or the researchers are most interested in the absolute risks, then it should be included in Z(t). Otherwise, if a covariate is expected to strongly influence the risk ratios or the researchers are interested in the relative risks, then it should be included in X(t). If the underlying biological process is not clear and the number of covariates is small, an alternative is to consider all the possible candidate models and then develop a rigorous model selection approach to determine the best model. The mean-square-type distance measure between the observed and expected recurrences implemented in Liu et al. (2010) could serve as a model selection criterion for the additive and/or multiplicative model structure. However, it cannot be directly applied to data with intermittently observed time-dependent covariates. Future research on model selection procedures for intermittently observed time-dependent covariates is warranted.
An R package has been developed to implement the proposed estimator for the additive-multiplicative rates model with intermittently observed time-dependent covariates, as well as the estimators for the proportional rates model and the additive rates model with such covariates. The R package rectime is available at www.github.com/TianmengL/rectime.
Acknowledgments
The authors thank Dr. Dean Follmann for kindly sharing the India pharyngitis study data. They also thank the University of Minnesota Supercomputing Institute for providing computing resources. Lyu and Luo were partially supported by the U.S. National Institutes of Health (R03MH112895). Luo was also supported by the U.S. National Institutes of Health (P30CA077598).
Appendix
Regularity Conditions
1. The observed data $\{N_i(t), O_i(t), W_i(t)\,dN_i(t), W_i(t)\,dO_i(t), C_i;\ 0 \le t \le C_i,\ i = 1, \ldots, n\}$ are assumed to be independent and identically distributed.
2. $N_i(\tau)$ is bounded. Define $\lambda_c(t)\,dt = E\{dN_i(t)\}$, and $\lambda_c(\cdot)$ is of bounded variation.
3. The true parameter $\theta_0$ is in a compact set and the baseline rate function $\lambda_0(t)$ is absolutely continuous.
4. For each element in the covariate vector $W_i(t)$, the covariate process $W_{ij}(t)$ has uniformly bounded total variation, namely, for some $M > 0$ for all $i$ and $j$. Without loss of generality, we assume $W_{ij}(t) \ge 0$.
5. The censoring time $C_i$ is independent of conditional on $W_i(\cdot)$ in the sense that , and $G(\tau) = P(C_i \ge \tau) > 0$.
6. The functions $s_{z2x}(t, \beta) = E[Y_i(t) Z_i(t)^{\otimes 2} \exp\{-\beta^\top X_i(t)\}]$, $s_{zx}(t) = E[Y_i(t) X_i(t) Z_i(t)^\top]$, , and , $k = 0, 1$, have bounded second derivatives for $t \in [0, \tau]$.
7. The observation time process $O_i(\cdot)$ is independent of $\{W_i(\cdot), N_i(\cdot), C_i\}$ and is bounded. Moreover, its rate function $m(t)$ is positive and has bounded second derivative for $t \in [0, \tau]$.
8. The matrix is nonsingular.
9. The kernel function $K(\cdot)$ is a symmetric density function with bounded support on $[-1, 1]$ which satisfies: , , and is a positive constant.
10. The bandwidth $h = c n^{-v}$, where $1/4 < v < 1/2$ and $c > 0$ is some constant.
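The kernel and bandwidth conditions can be checked numerically for a concrete choice. The sketch below uses the Epanechnikov kernel, which is one symmetric density supported on [−1, 1]; it is an assumed example and not necessarily the kernel used in the paper or in rectime. The function names and the default values of c and v are hypothetical.

```python
def epanechnikov(u):
    """Epanechnikov kernel: a symmetric density supported on [-1, 1]."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def kernel_moment(power, grid_size=20000):
    """Midpoint-rule approximation of the integral of u**power * K(u) over [-1, 1]."""
    step = 2.0 / grid_size
    total = 0.0
    for k in range(grid_size):
        u = -1.0 + (k + 0.5) * step
        total += (u ** power) * epanechnikov(u) * step
    return total


def bandwidth(n, c=1.0, v=1.0 / 3.0):
    """Bandwidth h = c * n**(-v), restricted to 1/4 < v < 1/2 and c > 0."""
    if not (0.25 < v < 0.5) or c <= 0:
        raise ValueError("need 1/4 < v < 1/2 and c > 0")
    return c * n ** (-v)


mass = kernel_moment(0)           # integrates to one (density)
first_moment = kernel_moment(1)   # zero by symmetry
second_moment = kernel_moment(2)  # positive constant, 0.2 for this kernel

# n * h**2 = c**2 * n**(1 - 2v) grows with n because v < 1/2.
nh2_small = 1000 * bandwidth(1000) ** 2
nh2_large = 10**6 * bandwidth(10**6) ** 2
```

With h = cn^(−v), the product nh² equals c²n^(1−2v), which diverges because v < 1/2, while nh⁴ equals c⁴n^(1−4v), which vanishes because v > 1/4.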
Proof of Consistency
To show the consistency of , it is sufficient to prove that the processes that constitute the estimating function , including , , , , , k = 0, 1, and , converge to their limits uniformly. We know $\theta_0 = (\gamma_0^\top, \beta_0^\top)^\top$, where $\gamma_0$ is an $m \times 1$ vector, $\beta_0$ is a $q \times 1$ vector, and $m + q = p$. Since $\theta_0$ is in a compact set $\Theta$ in $\mathbb{R}^p$ by assumption 3, $\beta_0$ is contained in a compact set in $\mathbb{R}^q$. The function classes , are monotone and , have bracketing numbers of polynomial order.
According to Theorem 2.14.9 in Van Der Vaart and Wellner (1996) and following similar steps in the Appendix of Li et al. (2016), we can show that , , , converge in probability to 0 when $nh^2 \to \infty$.
It can be seen that
and
Then the uniform consistency of , , k = 0, 1, and has been proved. By the law of large numbers, we can show that , , and converge in probability to E{Ni(t)}, , and , respectively.
Therefore, the proposed estimator converges to the true parameter θ0 in probability.
Proof of Asymptotic Normality
We prove the asymptotic normality of . We show that has the form
Then we have
and
Thus we have
where and
Similarly, it can be shown that , where , and
Define , then we have .
Define . We define and , then we have
Also, we have . By Taylor expansion, we have , where θ* is on the line segment between θ0 and . Since we can show that converges to A(θ0), we have converges to a normal distribution with mean zero and variance $A(\theta_0)^{-1} V(\theta_0) \{A(\theta_0)^{-1}\}^\top$, where $V(\theta_0) = E\{\phi_1(\theta_0)\phi_1(\theta_0)^\top\}$.
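The sandwich form of the limiting variance can be sketched numerically. The example below is a generic plug-in computation for a two-dimensional parameter, assuming that an estimate of A(θ0) and per-subject influence terms ϕi are already available; the helper names and the toy inputs are hypothetical and do not come from the paper or from rectime.

```python
def transpose(m):
    """Transpose of a matrix stored as a list of rows."""
    return [list(row) for row in zip(*m)]


def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def inverse_2x2(m):
    """Inverse of a 2 x 2 matrix; raises if the matrix is singular."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


def empirical_v(phis):
    """Sample average of the outer products phi_i phi_i^T."""
    n, p = len(phis), len(phis[0])
    return [[sum(phi[r] * phi[s] for phi in phis) / n for s in range(p)]
            for r in range(p)]


def sandwich_variance(a_matrix, phis):
    """Plug-in version of A^{-1} V (A^{-1})^T."""
    a_inv = inverse_2x2(a_matrix)
    return matmul(matmul(a_inv, empirical_v(phis)), transpose(a_inv))


# Toy inputs: a diagonal A and four mean-zero influence terms.
a_hat = [[2.0, 0.0], [0.0, 4.0]]
phi_hat = [[1.0, 0.0], [-1.0, 0.0], [0.0, 2.0], [0.0, -2.0]]
variance = sandwich_variance(a_hat, phi_hat)
```

The resulting matrix estimates the limiting variance of the scaled estimator; dividing it by n gives an approximate covariance matrix for the estimator itself. The nonsingularity requirement on the matrix in the regularity conditions is what makes the inverse in this formula well defined.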
Footnotes
Supplementary Material
The supplementary material includes the R code that implements the proposed methods. It also includes an example file to illustrate how to simulate data and estimate model parameters using the provided code files.
References
- Andersen PK, Gill RD (1982). Cox’s regression model for counting processes: a large sample study. The Annals of Statistics, 10(4): 1100–1120.
- Andersen PK, Liestøl K (2003). Attenuation caused by infrequently updated covariates in survival analysis. Biostatistics, 4(4): 633–649.
- Boscardin WJ, Taylor JM, Law N (1998). Longitudinal models for AIDS marker data. Statistical Methods in Medical Research, 7(1): 13–27.
- Bycott P, Taylor J (1998). A comparison of smoothing techniques for CD4 data measured with error in a time-dependent Cox proportional hazards model. Statistics in Medicine, 17(18): 2061–2077.
- Cai J, He H, Song X, Sun L (2017a). An additive-multiplicative mean residual life model for right-censored data. Biometrical Journal, 59(3): 579–592.
- Cai Q, Wang MC, Chan KCG (2017b). Joint modeling of longitudinal, recurrent events and failure time data for survivor’s population. Biometrics, 73(4): 1150–1160.
- Cao H, Churpek MM, Zeng D, Fine JP (2015). Analysis of the proportional hazards model with sparse longitudinal covariates. Journal of the American Statistical Association, 110(511): 1187–1196.
- Cao H, Fine JP (2021). On the proportional hazards model with last observation carried forward covariates. Annals of the Institute of Statistical Mathematics, 73(1): 115–134.
- Dafni UG, Tsiatis AA (1998). Evaluating surrogate markers of clinical outcome when measured with error. Biometrics, 54(4): 1445–1462.
- Faucett CL, Schenker N, Elashoff RM (1998). Analysis of censored survival data with intermittently observed time-dependent binary covariates. Journal of the American Statistical Association, 93(442): 427–437.
- Han M, Song X, Sun L, Liu L (2016). An additive-multiplicative mean model for marker data contingent on recurrent event with an informative terminal event. Statistica Sinica, 26(3): 1197–1218.
- Henderson R, Diggle P, Dobson A (2000). Joint modelling of longitudinal measurements and event time data. Biostatistics, 1(4): 465–480.
- Jose JJM, Brahmadathan KN, Abraham VJ, Huang CY, Morens D, Hoe NP, et al. (2018). Streptococcal group A, C and G pharyngitis in school children: a prospective cohort study in southern India. Epidemiology & Infection, 146(7): 848–853.
- Kim S, Zeng D, Chambless L, Li Y (2012). Joint models of longitudinal data and recurrent events with informative terminal event. Statistics in Biosciences, 4(2): 262–281.
- Lawless JF, Nadeau C, Cook RJ (1997). Analysis of mean and rate functions for recurrent events. In: Proceedings of the First Seattle Symposium in Biostatistics: Survival Analysis, eds. Lin DY and Fleming TR, 37–49. Springer.
- Li S (2016). Joint modeling of recurrent event processes and intermittently observed time-varying binary covariate processes. Lifetime Data Analysis, 22(1): 145–160.
- Li S, Sun Y, Huang CY, Follmann DA, Krause R (2016). Recurrent event data analysis with intermittently observed time-varying covariates. Statistics in Medicine, 35(18): 3049–3065.
- Lin DY, Wei LJ, Yang I, Ying Z (2000). Semiparametric regression for the mean and rate functions of recurrent events. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 62(4): 711–730.
- Lin DY, Ying Z (1995). Semiparametric analysis of general additive-multiplicative hazard models for counting processes. The Annals of Statistics, 23(5): 1712–1734.
- Liu Y, Wu Y, Cai J, Zhou H (2010). Additive–multiplicative rates model for recurrent events. Lifetime Data Analysis, 16(3): 353–373.
- Liu YY, Wu YS (2011). Semiparametric additive intensity model with frailty for recurrent events. Acta Mathematica Sinica, English Series, 27(9): 1831.
- Lyu T, Luo X, Huang CY, Sun Y (2021). Additive rates model for recurrent event data with intermittently observed time-dependent covariates. Statistical Methods in Medical Research, 30(10): 2239–2255.
- Pepe MS, Cai J (1993). Some graphical displays and marginal regression analyses for recurrent failure times and time dependent covariates. Journal of the American Statistical Association, 88(423): 811–820.
- Prentice RL (1982). Covariate measurement errors and parameter estimation in a failure time regression model. Biometrika, 69(2): 331–342.
- Prentice RL, Williams BJ, Peterson AV (1981). On the regression analysis of multivariate failure time data. Biometrika, 68(2): 373–379.
- Raboud J, Reid N, Coates RA, Farewell VT (1993). Estimating risks of progressing to AIDS when covariates are measured with error. Journal of the Royal Statistical Society. Series A, 156(3): 393–406.
- Schaubel DE, Zeng D, Cai J (2006). A semiparametric additive rates model for recurrent event data. Lifetime Data Analysis, 12(4): 389–406.
- Scheike TH, Zhang MJ (2002). An additive–multiplicative Cox–Aalen regression model. Scandinavian Journal of Statistics, 29(1): 75–88.
- Stanworth SJ, Hudson CL, Estcourt LJ, Johnson RJ, Wood EM (2015). Risk of bleeding and use of platelet transfusions in patients with hematologic malignancies: recurrent event analysis. Haematologica, 100(6): 740–747.
- Sun X, Song X, Sun L (2021a). Additive hazard regression of event history studies with intermittently measured covariates. Canadian Journal of Statistics. In press.
- Sun Y, McCulloch CE, Marr KA, Huang CY (2021b). Recurrent events analysis with data collected at informative clinical visits in electronic health records. Journal of the American Statistical Association, 116(534): 594–604.
- Tsiatis AA, Degruttola V, Wulfsohn MS (1995). Modeling the relationship of survival to longitudinal data measured with error. Applications to survival and CD4 counts in patients with AIDS. Journal of the American Statistical Association, 90(429): 27–37.
- Van Der Heijden AA, van’t Riet E, Bot SD, Cannegieter SC, Stehouwer CD, Baan CA, et al. (2013). Risk of a recurrent cardiovascular event in individuals with type 2 diabetes or intermediate hyperglycemia. Diabetes Care, 36(11): 3498–3502.
- Van Der Vaart AW, Wellner JA (1996). Weak Convergence and Empirical Processes. New York: Springer-Verlag.
- Vonesh EF, Greene T, Schluchter MD (2006). Shared parameter models for the joint analysis of longitudinal data and event times. Statistics in Medicine, 25(1): 143–163.
- Wulfsohn MS, Tsiatis AA (1997). A joint model for survival and longitudinal data measured with error. Biometrics, 53(1): 330–339.
- Xu J, Zeger SL (2001). Joint analysis of longitudinal data comprising repeated measures and times to events. Journal of the Royal Statistical Society: Series C, 50(3): 375–387.
