Summary
This paper proposes a unified framework to characterize the rate function of a recurrent event process through shape and size parameters. In contrast to the intensity function, which is the event occurrence rate conditional on the event history, the rate function is the occurrence rate unconditional on the event history, and thus it can be interpreted as a population-averaged count of events in unit time. In this paper, shape and size parameters are introduced and used to characterize the association between the rate function λ(·) and a random variable X. Measures of association between X and λ(·) are defined via shape- and size-based coefficients. Rate-independence of X and λ(·) is studied through tests of shape-independence and size-independence, where the shape- and size-based test statistics can be used separately or in combination. These tests can be applied when X is a covariable possibly correlated with the recurrent event process through λ(·) or, in the one-sample setting, when X is the censoring time at which the observation of N(·) is terminated. Because the proposed tests are shape- and size-based, when a null hypothesis is rejected the test results can serve to identify the source of the violation.
Keywords: Intensity function, Point process, Poisson process, Rate-independence, Rate function, Shape-independence, Size-independence
1. Introduction and background
Recurrent events are natural outcomes in many longitudinal studies. Conventionally, recurrent events of a subject are modelled as the realization of an underlying point process. Statistical models for analysing recurrent event data are usually formulated based on either the intensity function or the rate function of the point process. Many authors, including Gail et al. (1980), Prentice et al. (1981) and Andersen & Gill (1982), have considered modelling the occurrence probability of events given the preceding covariates and event history. Others, including Nelson (1988) and Lawless & Nadeau (1995), have developed nonparametric estimation methods to estimate the rate function of a recurrent event process. Lin et al. (2000) studied regression of recurrent events unconditional on event history. Readers are referred to Therneau & Grambsch (2000), Kalbfleisch & Prentice (2002) and Cook & Lawless (2007) for reviews of existing models and methods. Examples of econometric applications can be found in Lancaster (1990) and references therein.
Let N(t) denote the number of recurrent events occurring before or at time t, where t ∈ [0, τ], with the constant τ > 0 determined from the knowledge that recurrent events could potentially be observed up to time τ. Here we assume that the recurrent events are of the same type. Let Y denote the censoring time at which observation of N(·) is terminated, so that only {N(t) : 0 ≤ t ≤ Y} is observable. Let X be a random variable which is possibly correlated with N(·). The variable X can be a covariate, such as a treatment indicator or health measurement, or it could be the censoring time; in the latter case, X = Y.
The intensity function for the point process N(·) is defined as the event occurrence rate conditional on the event history,
$$\lambda\{t \mid \mathcal{H}(t)\} = \lim_{\Delta \to 0} \frac{\mathrm{pr}[N\{(t+\Delta)-\} - N(t) = 1 \mid \mathcal{H}(t)]}{\Delta},$$
where ℋ(t) = {N(u) : 0 ≤ u < t} represents the history of recurrent events up to t, for t ∈ (0, τ]. The intensity function uniquely determines the probability structure of the point process, provided that pr[N{(t + Δ)−} − N(t) > 1] = o(Δ) (Cox & Isham, 1980). The rate function λ(t) is defined as the occurrence rate unconditional on the event history; that is,
$$\lambda(t) = \lim_{\Delta \to 0} \frac{\mathrm{pr}[N\{(t+\Delta)-\} - N(t) = 1]}{\Delta}.$$
The rate function λ(t) may be interpreted as the instantaneous population-averaged count of recurrent events in unit time. Unlike the intensity function, the rate function does not fully determine the probability structure of a point process, and different intensity functions could share the same rate function. As illustrated by the following examples, the rate function is conceptually and quantitatively different from the intensity function.
Example 1
Suppose that N(·) is a nonstationary Poisson process with intensity function λ(t). Because of the independent-increments property of Poisson processes, the intensity function is memoryless and coincides with the rate function; that is, λ{t | ℋ(t)} = λ(t).
Example 2
Suppose that there exists a nonnegative random variable Z such that, given Z = z, N(·) is a nonstationary Poisson process with subject-specific intensity function zλ0(t), where λ0(t) is a nonnegative-valued function. The variable Z is assumed to be latent in the model and is used to account for population heterogeneity. Let G denote the distribution function of Z. Conditioning on Z = z, the intensity function zλ0(t) is also the rate function. Unconditional on Z, the rate function is λ(t) = μZ λ0(t), with μZ = E(Z). Note that λ(t) is the same for those distribution functions G with the same mean.
For a fixed t ∈ [0, τ], let t1 < ··· < tN(t) denote the ordered event times before or at t. Let $\Lambda_0(t) = \int_0^t \lambda_0(u)\,du$. For simplicity of notation, let p(·|·) denote a general conditional probability density function and P(·|·) a general conditional distribution function. The probability density function of the event history prior to t, given Z = z, is
$$p\{\mathcal{H}(t) \mid z\} = \left\{\prod_{k=1}^{N(t)} z\,\lambda_0(t_k)\right\} \exp\{-z\,\Lambda_0(t)\}.$$
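By Bayes’ theorem, we obtain that
$$dP\{z \mid \mathcal{H}(t)\} = \frac{z^{N(t)} \exp\{-z\,\Lambda_0(t)\}\, dG(z)}{\int_0^\infty s^{N(t)} \exp\{-s\,\Lambda_0(t)\}\, dG(s)}.$$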
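The memoryless property of the Poisson process implies that λ{t | z, ℋ(t)} = zλ0(t), so the intensity function can be expressed as
$$\lambda\{t \mid \mathcal{H}(t)\} = \int_0^\infty z\,\lambda_0(t)\, dP\{z \mid \mathcal{H}(t)\} = \lambda_0(t)\, \frac{\int_0^\infty z^{N(t)+1} \exp\{-z\,\Lambda_0(t)\}\, dG(z)}{\int_0^\infty z^{N(t)} \exp\{-z\,\Lambda_0(t)\}\, dG(z)}.$$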
Thus, the intensity function λ{t | ℋ(t)} is a transformation of {N(t), Λ0(t), G} and depends on the event history only through N(t). In the special case where Z is a gamma random variable with mean γ/β and variance γ/β², the rate function is (γ/β) λ0(t) and the intensity function can be expressed as
$$\lambda\{t \mid \mathcal{H}(t)\} = \frac{\gamma + N(t)}{\beta + \Lambda_0(t)}\, \lambda_0(t). \qquad (1)$$
The detailed calculation can be found in the Supplementary Material. Clearly, in this case, the intensity function λ{t | ℋ(t)} depends on the event history through the total number of events, N(t), and is structurally different from the rate function λ(t).
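As a numerical sanity check on equation (1), the sketch below evaluates the ratio of integrals in the Bayes representation of the intensity by quadrature and compares it with the gamma closed form {γ + N(t)}/{β + Λ0(t)}. It assumes a gamma frailty parameterized by shape γ and rate β (mean γ/β, variance γ/β²); the helper names and the truncation of the z-integral at 60 are illustrative choices, not part of the paper.

```python
import math

def gamma_density(z, shape, rate):
    """Gamma density with mean shape/rate and variance shape/rate**2 (gamma, beta in the text)."""
    return rate**shape * z**(shape - 1) * math.exp(-rate * z) / math.gamma(shape)

def simpson(fn, a, b, m=4000):
    """Composite Simpson rule on [a, b]; m must be even."""
    h = (b - a) / m
    total = fn(a) + fn(b)
    for k in range(1, m):
        total += (4 if k % 2 else 2) * fn(a + k * h)
    return total * h / 3

def intensity_multiplier_numeric(n_events, cum_baseline, shape, rate, z_max=60.0):
    """E{Z | H(t)}: ratio of the two dG(z)-integrals multiplying lambda0(t)."""
    def weight(z, power):
        return z**power * math.exp(-z * cum_baseline) * gamma_density(z, shape, rate)
    num = simpson(lambda z: weight(z, n_events + 1), 0.0, z_max)
    den = simpson(lambda z: weight(z, n_events), 0.0, z_max)
    return num / den

def intensity_multiplier_closed_form(n_events, cum_baseline, shape, rate):
    """Equation (1): lambda{t | H(t)} / lambda0(t) = {gamma + N(t)} / {beta + Lambda0(t)}."""
    return (shape + n_events) / (rate + cum_baseline)

# Frailty with mean 1 and variance 0.25 (gamma = beta = 4), N(t) = 3, Lambda0(t) = 2.5.
numeric = intensity_multiplier_numeric(3, 2.5, 4.0, 4.0)
closed = intensity_multiplier_closed_form(3, 2.5, 4.0, 4.0)
```

With γ = β = 4, N(t) = 3 and Λ0(t) = 2·5, both functions return approximately 1·077, the factor by which this observed history scales λ0(t).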
2. Size, shape and rate
Assume that the rate function λ(t) is continuous in t for t ∈ [0, τ]. The cumulative rate function, defined as $\Lambda(t) = \int_0^t \lambda(u)\,du$, is nondecreasing and continuous in t, with E{N(t)} = Λ(t) for t ∈ [0, τ]. In this section we introduce the shape and size parameters of the rate function. These two parameters serve as the basis for defining shape- and size-independence between X and λ(·). Shape- and size-based measures will be constructed to quantify the association between X and λ(·).
We first note that the rate function can be characterized by shape and size parameters, where the size parameter is defined to be the cumulative rate function evaluated at τ, i.e., Λ(τ), and the shape parameter is defined by
$$f(t) = \frac{\lambda(t)}{\Lambda(\tau)}, \qquad t \in [0, \tau].$$
Thus the rate function can be decomposed into the product of the shape and size parameters: λ(t) = f(t) × Λ(τ) for t ∈ [0, τ]. Clearly, f(t) defines a proper probability density function on [0, τ], and Λ(τ) can be interpreted as the total magnitude of the rate function or, more precisely, the expected number of events occurring in the prespecified interval [0, τ]. The function f(t) is essentially the standardized rate function over the interval [0, τ]. It is called the shape parameter because it characterizes the shape of the marginal occurrence rate over [0, τ] after adjusting for the magnitude of the rate function. In contrast, the size parameter quantifies the magnitude of the rate function without containing additional information on the pattern of the occurrence rate over [0, τ]. For given X = x, let the conditional rate function be denoted by λ(t | x). Based on the shape and size parameters, we define rate-, shape- and size-independence between X and λ(·) as follows.
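The decomposition λ(t) = f(t) × Λ(τ) is straightforward to compute for any continuous rate function. The following is a minimal sketch, assuming trapezoidal quadrature for Λ(τ); `shape_and_size` is a hypothetical helper, not a function from the paper.

```python
import math

def shape_and_size(rate_fn, tau, m=2000):
    """Return (shape function f, size Lambda(tau)) for a continuous rate function on [0, tau].

    Lambda(tau) uses the composite trapezoidal rule (an illustrative choice)."""
    h = tau / m
    values = [rate_fn(k * h) for k in range(m + 1)]
    size = h * (sum(values) - 0.5 * (values[0] + values[-1]))
    def shape(t):
        return rate_fn(t) / size  # f(t) = lambda(t) / Lambda(tau)
    return shape, size

f, size = shape_and_size(lambda t: 2 * math.exp(-t / 2), 10.0)
```

For λ(t) = 2 exp(−t/2) on [0, 10], the size is close to 4(1 − e⁻⁵) ≈ 3·97, and multiplying f(t) by the size recovers λ(t).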
Definition 1
The rate function λ(·) is rate-independent of X if λ(t | x) depends on (t, x) only through t or, equivalently, λ(t|x) = λ(t) for (t, x) ∈ [0, τ] × RX, where RX is the support of the distribution of X.
Definition 2
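The rate function λ(·) is shape-independent of X if, for (t, x) ∈ [0, τ] × RX, its shape function
$$f(t \mid x) = \frac{\lambda(t \mid x)}{\Lambda(\tau \mid x)}, \qquad \Lambda(\tau \mid x) = \int_0^\tau \lambda(u \mid x)\, du,$$
depends on (t, x) only through t or, equivalently, λ(t | x) = a(x) f(t) where a(·) is a nonnegative-valued function of x.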
Definition 3
The rate function λ(·) is size-independent of X if, for x ∈ RX, Λ(τ | x) depends on (τ, x) only through τ.
It can be shown that rate-independence holds if and only if both shape-independence and size-independence hold. In general, rate-independence is weaker than independence between N(·) and X. We now consider a number of rate-, shape- and size-independent models.
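The equivalence between rate-independence and the combination of shape- and size-independence follows from Definitions 1 to 3 in a few lines; one way to write it, using $\Lambda(\tau \mid x) = \int_0^\tau \lambda(u \mid x)\,du$ and $\int_0^\tau f(t)\,dt = 1$:

```latex
% (<=) Shape-independence: lambda(t | x) = a(x) f(t); integrating over [0, tau] gives a(x) = Lambda(tau | x).
% Size-independence then gives a(x) = Lambda(tau), so that
\lambda(t \mid x) = \Lambda(\tau \mid x)\, f(t) = \Lambda(\tau)\, f(t) = E\{\lambda(t \mid X)\} = \lambda(t).
% (=>) Rate-independence gives
\Lambda(\tau \mid x) = \int_0^\tau \lambda(u \mid x)\, du = \Lambda(\tau),
\qquad
f(t \mid x) = \frac{\lambda(t \mid x)}{\Lambda(\tau \mid x)} = \frac{\lambda(t)}{\Lambda(\tau)} = f(t).
```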
Example 3
Suppose that X coincides with Y, the censoring time when observation of N(·) is terminated. An independent censoring condition usually refers to the assumption that X is independent of N(·), although a weaker condition of rate-independence would be sufficient for nonparametric estimation of the rate function (Nelson, 1988, 1995; Lawless & Nadeau, 1995).
Example 4
Poisson count models are commonly used for modelling the number of events occurring in a fixed time interval [0, τ] in the absence of censoring. Such a model focuses only on the total event count at τ, i.e., N(τ), although it may implicitly impose a Poisson process assumption on the point process N(·). A typical question of interest is the association between a variable X and N(τ). In such a model, the count variable follows a Poisson distribution with mean parameter Λ(τ| x), where size-independence holds if Λ(τ| x) = Λ(τ) for x ∈ RX.
Example 5
The popular proportional rate model considered by Pepe & Cai (1993), Lin et al. (2000) and Schaubel & Cai (2005), among others, is given by λ(t | x) = f(t) exp(α + βx) for t ∈ [0, τ], where α and β are unknown parameters and f(t) is the shape function. This is a shape-independent model with a(x) = exp(α + βx). In the proportional rate model, the effect of X on N(·) is modelled only through the size parameter of the rate function.
Example 6
The shared-frailty proportional rate model is a commonly adopted model, in which the recurrent event process and censoring share the same subject-specific latent variable, Z. Given Z = z and X = x, the rate function is modelled as λ(t | z, x) = z f(t) exp(α + βx) for t ∈ [0, τ], where α and β are unknown parameters and f(t) is the shape function. This model has been considered by many authors, including Lancaster & Intrator (1998), Wang et al. (2001), Huang & Wang (2004), Liu et al. (2004) and Ye et al. (2007), and it is a shape-independent model with a(x) = E(Z | x) exp(α + βx). As in Example 5, the regression effect of X on N(·) is modelled only through the size parameter of the rate function.
3. Association measures
When studying the relationship between X and λ(·), a question of interest is how to quantify the dependence of λ(·) on X. In a general model setting, it is useful to characterize the dependence relationship through the shape and size components of the rate function. In this section we consider new measures, based on Pearson’s correlation coefficient and Kendall’s tau, for quantifying shape- and size-based associations between X and λ(·).
Conceptually, the shape parameter coincides with the population-averaged distributional pattern of event times in [0, τ], and the size parameter is the expected value of the total number of events, K = N(τ), occurring in [0, τ]. Define the processes
$$N^*(t) = \frac{N(t)}{\Lambda(\tau)}, \qquad S(t) = \int_0^t u\, dN^*(u), \qquad t \in [0, \tau].$$
Heuristically, E{dN*(t)} = f(t) dt, dS(t) = t dN*(t) and E{dS(t)} = t f (t) dt. The weighted process N*(t) identifies the shape of N(t) for t ∈ [0, τ] and therefore can be regarded as a shape process. Further, S(t) is a marked point process with mark t tracing the location of dN*(t) > 0. The processes N*(t) and S(t) can be used together to define the shape-based association measures. In what follows, we present two approaches to quantifying associations.
Pearson’s correlation coefficient can readily be used to define the size-based association measure
$$\rho_{sz} = \frac{E\{(X - \mu_X)(K - \mu_K)\}}{\sigma_X\, \sigma_K},$$
where μX and μK are the means and σX and σK the standard deviations of X and K. To define the shape-based association measure, we extend Pearson’s correlation coefficient to quantify the association between the variable X and the process dS(·) as follows. The covariance between X and dS(·) in the interval [0, τ] can be defined as $\int_0^\tau E[(X - \mu_X)\{dS(t) - t f(t)\,dt\}] = E[(X - \mu_X)\{S(\tau) - \mu_f\}]$, where $\mu_f = \int_0^\tau t f(t)\,dt$ is the mean of the shape function f. The shape-based association measure is thus defined as the correlation coefficient between X and the variable S(τ):
$$\rho_{sh} = \frac{E[(X - \mu_X)\{S(\tau) - \mu_f\}]}{\sigma_X\, \sigma_S},$$
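where σS is the standard deviation of S(τ). Because dN(t) takes values only from {0, 1}, so that {dN(t)}² = dN(t), one can show that
$$\sigma_S^2 = \left\{\frac{1}{\Lambda(\tau)} \int_0^\tau t^2 f(t)\, dt - \mu_f^2\right\} + \frac{1}{\Lambda(\tau)^2} \int_0^\tau\!\!\int_0^\tau I(u \neq t)\, u\, t\, E\{dN(u)\, dN(t)\}.$$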
Interestingly, the quantity $\Lambda(\tau)^{-1}\int_0^\tau t^2 f(t)\,dt - \mu_f^2$, which appears in σS², coincides with, is smaller than, or is larger than the variance of the shape function f, which is a probability density function, when Λ(τ) equals, is greater than, or is less than 1, respectively. This is intuitively reasonable: Λ(τ) is the expectation of N(τ), and S(τ) represents a type of mean occurrence time of recurrent events, so a larger value of Λ(τ) means that more event times are effectively averaged, giving a smaller variance of S(τ), and vice versa.
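For a Poisson process, E{dN(u) dN(t)} = λ(u)λ(t) du dt for u ≠ t, so var{∫ t dN(t)} = ∫ t² λ(t) dt and σS² reduces to $\Lambda(\tau)^{-1}\int_0^\tau t^2 f(t)\,dt$, which shrinks as Λ(τ) grows. The Monte Carlo sketch below checks this for a homogeneous Poisson process with rate 2 on [0, 1], for which σS² = 1/6; the process, the rate, the seed and the number of replicates are illustrative assumptions.

```python
import random

def draw_S_tau(rate, tau, lam_tau, rng):
    """One draw of S(tau) = (sum of event times) / Lambda(tau) for a homogeneous Poisson process."""
    t, total = 0.0, 0.0
    while True:
        t += rng.expovariate(rate)  # exponential gaps generate a homogeneous Poisson process
        if t > tau:
            return total / lam_tau
        total += t

def monte_carlo_var_S(rate=2.0, tau=1.0, reps=20000, seed=1):
    """Sample variance of S(tau) over independent replicates."""
    rng = random.Random(seed)
    lam_tau = rate * tau  # Lambda(tau) for a constant rate
    draws = [draw_S_tau(rate, tau, lam_tau, rng) for _ in range(reps)]
    mean = sum(draws) / reps
    return sum((d - mean) ** 2 for d in draws) / (reps - 1)

# Lambda(tau)^{-1} * int_0^1 t^2 f(t) dt with f(t) = 1 on [0, 1] and Lambda(tau) = 2.
theoretical = 1.0 / 6.0
```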
An alternative approach is to define the size-based association based on Kendall’s tau:
$$\pi_{sz} = E\{\mathrm{sgn}(X_1 - X_2)\, \mathrm{sgn}(K_1 - K_2)\},$$
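where {X1, N1(·)} and {X2, N2(·)} are assumed to be independent and identically distributed. In an analogous way, Kendall’s tau can be extended to define the shape-based association between the random variable X and the shape process N*(·):
$$\pi_{sh} = E\left\{\mathrm{sgn}(X_1 - X_2) \int_0^\tau\!\!\int_0^\tau \mathrm{sgn}(t - u)\, dN_1^*(t)\, dN_2^*(u)\right\},$$
where N1*(·) and N2*(·) are the shape processes of N1(·) and N2(·).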
Clearly, both πsz and πsh take values between −1 and 1. For πsh this holds because |πsh| ≤ E{N1*(τ) N2*(τ)} = 1, since E{dN1*(t) dN2*(u)} = f(t) f(u) dt du is essentially a probability density function on [0, τ]².
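With complete data (no censoring, so each Ni(·) is observed on all of [0, τ]), πsz and πsh have simple plug-in versions obtained by averaging over pairs of subjects. The sketch below is an illustration under that assumption only, since these measures are not identifiable under censoring; it replaces Λ(τ) in N*(·) by the mean observed count, and the function names are hypothetical.

```python
import itertools

def sgn(r):
    """Sign function: 1, -1 or 0."""
    return (r > 0) - (r < 0)

def pi_sz_hat(x, counts):
    """Pairwise average of sgn(Xi - Xj) sgn(Ki - Kj)."""
    pairs = list(itertools.combinations(range(len(x)), 2))
    return sum(sgn(x[i] - x[j]) * sgn(counts[i] - counts[j]) for i, j in pairs) / len(pairs)

def pi_sh_hat(x, event_times):
    """Pairwise average of sgn(Xi - Xj) * sum over event pairs of sgn(t - u), with
    each process scaled by 1 / Lambda_hat (mean count; assumed to be positive)."""
    lam_hat = sum(len(ev) for ev in event_times) / len(event_times)
    pairs = list(itertools.combinations(range(len(x)), 2))
    total = 0.0
    for i, j in pairs:
        inner = sum(sgn(t - u) for t in event_times[i] for u in event_times[j])
        total += sgn(x[i] - x[j]) * inner / lam_hat**2
    return total / len(pairs)
```

In this orientation, a positive πsh corresponds to subjects with larger X tending to have later event times.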
Theorem 1
(i) Shape-independence between X and λ(·) implies ρsh = 0. Size-independence between X and λ(·) implies ρsz = 0.
(ii) Shape-independence between X and λ(·) implies πsh = 0. Size-independence between X and λ(·) implies πsz = 0.
Theorem 1 quantifies the relationship between shape- or size-independence and the association measures. Further, since rate-independence holds if and only if shape- and size-independence both hold, it follows that rate-independence implies (ρsh, ρsz) = (0, 0) and (πsh, πsz) = (0, 0).
In general, it can be shown that the association measures ρsh, ρsz, πsh and πsz are invariant under linear transformations of the time scale t. In nonparametric models with censoring, the association measures ρsh, ρsz, πsh and πsz are neither identifiable nor estimable because of incompleteness in the observed samples. However, despite the nonidentifiability of these measures, nonparametric test statistics can still be constructed to test shape- and size-independence between X and λ(·) in the observable region of samples. These tests will be introduced in the next section.
4. Shape- and size-based tests
4·1. Test statistics
In this section we present a formal procedure for testing rate-independence between λ(·) and X for recurrent event data with censoring, where the null hypothesis is formulated as H: λ(t | x) = λ(t) for (t, x) ∈ [0, τ] × RX. Assume that the observed data {Xi, Yi, ℋi(Yi)} (i = 1, …, n) are independent and identically distributed copies of {X, Y, ℋ(Y)}. The proposed statistical test is a two-step procedure, where a shape-independence test is performed in step 1, followed by a size-independence test in step 2. By performing the shape- and size-independence tests separately, the source of any dependence between X and λ(·) can be identified. Because the validity of the size-independence test depends on shape-independence holding, the two tests are performed in succession. These test statistics can be used in the following two cases.
Case 1: A covariable X is possibly correlated with N(·). In this case, we require an additional assumption that the censoring time Y is rate-independent of λ(·) given X.
Case 2: The censoring time Y coincides with X, i.e., X = Y. In this case, testing the rate-independence of λ(·) and Y is of interest.
Tests for rate-independence in Case 1 can be used to examine the association between a potential risk factor and the recurrent event process. Tests in Case 2 are used to examine the independent censoring assumption that is typically required for recurrent event data analysis in one-sample settings. In applications, study subjects are often censored because of loss to follow-up or end of the study; yet it is also possible that a terminal event, such as death, is a component of the censoring mechanism. Let D denote the time to death and C the time to loss to follow-up or end of study. Then the censoring time is given by Y = min(D, C). When postulating statistical models, the recurrent event process, N(t), is defined for t ∈ [0, τ] and could be considered latent for D < t ≤ τ, should death occur before τ. As will be shown later in this section, the proposed shape- and size-based tests are constructed based on recurrent events in the observable region {N(t) : t ∈ [0, Y]}. Therefore, in Case 2, the proposed tests can also be viewed as quasi-independence tests between Y and λ(t), for t ∈ [0, Y], to avoid the controversy of allowing the occurrence of recurrent events after death under the model assumption (Ghosh & Lin, 2003). In Case 2, our test statistics are designed to test shape- and size-independence between Y and λ(·). Thus, if the dependence between D and λ(·) is of interest, the proposed tests may not be optimal for data analysis, even though the dependence between D and λ(·) is partially carried over to the dependence between Y and λ(·).
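Consider testing the null hypothesis that λ(·) is shape-independent of X, that is, Hsh: λ(t | x) = a(x) f(t) for (t, x) ∈ [0, τ] × RX. We propose a U-statistic to test the hypothesis based on the location of event times:
$$W_{sh} = \binom{n}{2}^{-1} \sum_{i<j} w^{sh}_{ij}\, \mathrm{sgn}(X_i - X_j) \int_0^\tau\!\!\int_0^\tau \Delta_{ij}(u, t)\, \mathrm{sgn}(t - u)\, dN_i(t)\, dN_j(u). \qquad (2)$$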
In (2), the sum is over all selections (i, j) from (1, …, n) such that i < j, $w^{sh}_{ij} = w^{sh}(X_i, X_j, Y_i, Y_j)$ is a nonnegative-valued weight function of (Xi, Xj, Yi, Yj) satisfying $w^{sh}_{ij} = w^{sh}_{ji}$, sgn(r) = 1, −1 or 0 if r is positive, negative or zero, respectively, and Δij(u, t) = I{max(u, t) ≤ min(Yi, Yj)}. Define the kernel function
$$h_{sh}\{(X_i, N_i), (X_j, N_j)\} = w^{sh}_{ij}\, \mathrm{sgn}(X_i - X_j) \int_0^\tau\!\!\int_0^\tau \Delta_{ij}(u, t)\, \mathrm{sgn}(t - u)\, dN_i(t)\, dN_j(u),$$
where Ni and Nj represent the recurrent event processes in [0, τ] for subject i and subject j, respectively. Then
$$W_{sh} = \binom{n}{2}^{-1} \sum_{i<j} h_{sh}\{(X_i, N_i), (X_j, N_j)\}$$
is a U-statistic with kernel hsh(·, ·), which is symmetric in its arguments. By the pairwise comparison of event times, Wsh is designed to test whether the shape of the recurrent event process is associated with X. The bivariate indicator Δij in (2) ensures the comparability of event times from subjects i and j.
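A direct sketch of Wsh, with unit weights by default, reading the kernel as $w^{sh}_{ij}$ sgn(Xi − Xj) ∫∫ Δij(u, t) sgn(t − u) dNi(t) dNj(u): event times beyond min(Yi, Yj) are dropped, which is the role of Δij. The function name and the data layout (one list of observed event times per subject) are assumptions; the double loop costs O(n²m²) for m events per subject, which is adequate for moderate sample sizes.

```python
import itertools

def sgn(r):
    """Sign function: 1, -1 or 0."""
    return (r > 0) - (r < 0)

def w_sh(x, y, event_times, weight=None):
    """Shape-independence U-statistic; weight(i, j) must be symmetric (default 1)."""
    n = len(x)
    total = 0.0
    for i, j in itertools.combinations(range(n), 2):
        c = min(y[i], y[j])  # Delta_ij(u, t) = 1 iff both event times are <= c
        ti = [t for t in event_times[i] if t <= c]
        tj = [u for u in event_times[j] if u <= c]
        inner = sum(sgn(t - u) for t in ti for u in tj)
        w = 1.0 if weight is None else weight(i, j)
        total += w * sgn(x[i] - x[j]) * inner
    return total / (n * (n - 1) / 2)
```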
Next, we consider testing for size-independence. In general, when censoring is present, the size-independence between X and λ(·) cannot be tested nonparametrically due to nonidentifiability of Λ(τ). Nevertheless, if shape-independence holds, a test statistic can be constructed to test the size-independence between X and λ(·) in the presence of censoring. Under shape-independence, the null hypothesis Hsz: Λ(τ | x) = Λ(τ) for x ∈ RX is equivalent to Hsz: λ(t | x) = Λ(τ) f(t) for (t, x) ∈ [0, τ] × RX.
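To test Hsz, an initial idea might be to construct a test statistic in the spirit of the log-rank test, such as
$$W_{LR} = \sum_{i=1}^n \int_0^\tau I(t \le Y_i)\{X_i - \bar X(t)\}\, dN_i(t), \qquad \bar X(t) = \frac{\sum_{j=1}^n I(t \le Y_j)\, X_j}{\sum_{j=1}^n I(t \le Y_j)}.$$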
However, the log-rank statistic requires independent censoring to guarantee the representativeness of the risk set under both null and alternative hypotheses. In Case 1, the censoring time is independent of the recurrent event process given X, so the log-rank statistic is a proper test. In Case 2, the censoring time Y is equal to X and the risk sets under the alternative hypotheses fail to be representative. Thus, the log-rank statistic can be applied to Case 1 but not Case 2. Here we consider a size-independence test statistic which can be used for both Case 1 and Case 2:
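$$W_{sz} = \binom{n}{2}^{-1} \sum_{i<j} w^{sz}_{ij}\, \mathrm{sgn}(X_i - X_j) \int_0^\tau \Delta_{ij}(t, t)\, \{dN_i(t) - dN_j(t)\}, \qquad (3)$$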
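where $w^{sz}_{ij}$ is a nonnegative-valued weight function of (Xi, Xj, Yi, Yj) satisfying $w^{sz}_{ij} = w^{sz}_{ji}$, and Δij(t, t) = I{t ≤ min(Yi, Yj)}. Define the kernel function
$$h_{sz}\{(X_i, N_i), (X_j, N_j)\} = w^{sz}_{ij}\, \mathrm{sgn}(X_i - X_j) \int_0^\tau \Delta_{ij}(t, t)\, \{dN_i(t) - dN_j(t)\}.$$
Then
$$W_{sz} = \binom{n}{2}^{-1} \sum_{i<j} h_{sz}\{(X_i, N_i), (X_j, N_j)\}$$
is a U-statistic with symmetric kernel hsz(·, ·).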
Both Wsh and Wsz can be considered Kendall’s-tau-type statistics, and both can handle censored and truncated data. The indicator function Δij(t, t) in (3) determines the time interval in which counts from subjects i and j are comparable, so the test statistic Wsz tests the size-independence of X and λ(·) based on recurrent event data from the common interval [0, Yi ∧ Yj]. This test strategy is appropriate only when the shape of λ(·) is unaffected by X: under shape-independence, $E\{N(c) \mid x\} = \Lambda(\tau \mid x)\int_0^c f(t)\,dt$ for every c ∈ [0, τ], so counts over a common interval are proportional to the sizes, and a difference in sizes can be detected by comparing these counts.
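A matching sketch of Wsz, reading the kernel as $w^{sz}_{ij}$ sgn(Xi − Xj){Ni(c) − Nj(c)} with c = min(Yi, Yj), which is ∫ Δij(t, t){dNi(t) − dNj(t)} written as a difference of counts over the common window. As before, the function name and data layout are assumptions.

```python
import itertools

def sgn(r):
    """Sign function: 1, -1 or 0."""
    return (r > 0) - (r < 0)

def w_sz(x, y, event_times, weight=None):
    """Size-independence U-statistic; weight(i, j) must be symmetric (default 1)."""
    n = len(x)
    total = 0.0
    for i, j in itertools.combinations(range(n), 2):
        c = min(y[i], y[j])  # common follow-up window [0, Yi ^ Yj]
        n_i = sum(1 for t in event_times[i] if t <= c)
        n_j = sum(1 for t in event_times[j] if t <= c)
        w = 1.0 if weight is None else weight(i, j)
        total += w * sgn(x[i] - x[j]) * (n_i - n_j)
    return total / (n * (n - 1) / 2)
```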
In either Case 1 or Case 2, the null hypothesis H of rate-independence is accepted if the null hypotheses of shape-independence and size-independence are both accepted. H is rejected if either Hsh is rejected in step 1, or Hsh is accepted in step 1 but Hsz is rejected in step 2.
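The two-step rule can be written as a small decision function. This sketch assumes the two p-values are already available and uses 0·025 per step, the level used in the simulation study to keep the overall Type I error rate at or below 0·05; the function name and return strings are hypothetical.

```python
def two_step_rate_test(p_shape, p_size, level=0.025):
    """Two-step test of H (rate-independence): shape first, then size.

    The size test is consulted only when the shape test does not reject."""
    if p_shape <= level:
        return "reject H: shape-independence rejected in step 1"
    if p_size <= level:
        return "reject H: size-independence rejected in step 2"
    return "H not rejected"
```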
4·2. Properties of the test statistics
This section establishes some properties of the proposed test statistics. The discussion will not distinguish between Cases 1 and 2, but one should keep in mind that Case 1 requires the additional assumption that Y is rate-independent of λ(·) given X.
Theorem 2
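Define μsh = E[hsh{(Xi, Ni), (Xj, Nj)}] and assume E[hsh{(Xi, Ni), (Xj, Nj)}²] < ∞. Then:
(i) n^{1/2}(Wsh − μsh) converges weakly to the normal distribution with mean zero and variance 4σ²sh, where σ²sh = var(E[hsh{(X1, N1), (X2, N2)} | X1, N1]);
(ii) under the null hypothesis Hsh, n^{1/2}Wsh converges weakly to the normal distribution with mean zero and variance 4σ²sh,0, where σ²sh,0 = E(E[hsh{(X1, N1), (X2, N2)} | X1, N1]²).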
Theorem 3
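Define μsz = E[hsz{(Xi, Ni), (Xj, Nj)}]. Assume that the null hypothesis Hsh holds and that E[hsz{(Xi, Ni), (Xj, Nj)}²] < ∞. Then:
(i) n^{1/2}(Wsz − μsz) converges weakly to the normal distribution with mean zero and variance 4σ²sz, where σ²sz = var(E[hsz{(X1, N1), (X2, N2)} | X1, N1]);
(ii) under Hsz, n^{1/2}Wsz converges weakly to the normal distribution with mean zero and variance 4σ²sz,0, where σ²sz,0 = E(E[hsz{(X1, N1), (X2, N2)} | X1, N1]²).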
The proofs of Theorems 2 and 3 are given in the Appendix and the Supplementary Material. To conclude this section, we remark that an alternative test statistic can be constructed by replacing the sign function sgn in Wsh and Wsz with a function ϕ(u, t) that satisfies ϕ(u, t) = −ϕ(t, u), such as ϕ(u, t) = (u − t)^k, where k is an odd integer. In addition, the asymptotic variances of the test statistics Wsh and Wsz can be consistently estimated by the corresponding sample U-statistics.
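A sketch of one standard variance estimate and the resulting normal-approximation test, assuming kernel values are supplied by a symmetric function `kernel(i, j)` (for example, the pairwise terms of Wsh or Wsz): the projection E{h | subject i} is estimated by the average of the n − 1 kernel values involving subject i, and 4 times their mean square estimates the asymptotic variance of n^{1/2}(W − μ). This is a common U-statistic estimator chosen for illustration, not necessarily the exact form used in the paper; the function name is hypothetical.

```python
import itertools
import math

def u_statistic_test(n, kernel, center_at_zero=True):
    """Return (W, estimated asymptotic variance, z, two-sided p-value).

    center_at_zero=True centres the projections at the null mean 0, as in
    part (ii) of Theorems 2 and 3; otherwise they are centred at W."""
    pairs = list(itertools.combinations(range(n), 2))
    h = {}
    for i, j in pairs:
        h[i, j] = h[j, i] = kernel(i, j)
    w = sum(h[p] for p in pairs) / len(pairs)
    # Estimated projections: average of the n - 1 kernel values involving subject i.
    proj = [sum(h[i, j] for j in range(n) if j != i) / (n - 1) for i in range(n)]
    center = 0.0 if center_at_zero else w
    asy_var = 4.0 * sum((v - center) ** 2 for v in proj) / n
    z = math.sqrt(n) * w / math.sqrt(asy_var)  # assumes asy_var > 0
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided standard normal tail
    return w, asy_var, z, p_value
```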
5. Numerical studies
Two simulation studies were conducted to examine the performance of the proposed testing procedures in Cases 1 and 2 described in § 4·1. In all simulations, 2000 datasets were generated, each with n independent subjects, where we took n = 100 and 200. For Case 1, a binary time-independent covariate X was generated from the Bernoulli distribution with success probability 0·5. A subject-specific frailty Z was simulated from a gamma distribution with mean 1 and variance 0·25 to account for population heterogeneity. Given X = x and Z = z, the recurrent event times of a subject were generated from a nonstationary Poisson process {N(t) : t ∈ [0, 10]} with intensity function
$$\lambda(t \mid x, z) = 2z \exp(-t/2 + \beta_0 x + \gamma_0 x t), \qquad t \in [0, 10]. \qquad (4)$$
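By integrating out the frailty variable, the rate function of N(·), conditional on X = x, is λ(t | x) = 2 exp(−t/2 + β0x + γ0xt). Thus, for γ0x ≠ 1/2, which holds in all scenarios considered, the shape parameter is
$$f(t \mid x) = \frac{(\gamma_0 x - 1/2) \exp\{(\gamma_0 x - 1/2)\, t\}}{\exp\{10(\gamma_0 x - 1/2)\} - 1}, \qquad t \in [0, 10],$$
and the size parameter is
$$\Lambda(10 \mid x) = \frac{2 \exp(\beta_0 x)\, [\exp\{10(\gamma_0 x - 1/2)\} - 1]}{\gamma_0 x - 1/2}.$$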
It is easy to see that γ0 = 0 implies that X is shape-independent of λ(·), while γ0 = β0 = 0 implies that X is rate-independent of λ(·). Finally, the censoring time Y was independently generated from a uniform distribution on [0, 10]. We considered combinations of simulation parameters γ0 = 0, 0·2 and β0 = 0, −0·3. The average number of observed recurrent events per subject ranges from 2·8 to 3·9 across the Case 1 scenarios.
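The size formula above also gives closed-form expected observed counts. The sketch below, assuming X ~ Bernoulli(0·5) and Y ~ Uniform[0, 10] independent of N(·) as in Case 1, computes E{N(Y)} = E{Λ(Y | X)} with Λ(y | x) = 2 exp(β0x){exp(cy) − 1}/c and c = γ0x − 1/2; the function names are illustrative.

```python
import math

def size_parameter(x, beta0, gamma0, tau=10.0):
    """Lambda(tau | x) for the rate 2 exp(-t/2 + beta0*x + gamma0*x*t); needs gamma0*x != 1/2."""
    c = gamma0 * x - 0.5
    return 2.0 * math.exp(beta0 * x) * (math.exp(c * tau) - 1.0) / c

def expected_observed_events_case1(beta0, gamma0, tau=10.0):
    """E{N(Y)} = E{Lambda(Y | X)}, X ~ Bernoulli(0.5), Y ~ Uniform[0, tau] independent of N."""
    def mean_over_y(x):
        c = gamma0 * x - 0.5
        # (1/tau) * int_0^tau 2 exp(beta0*x) {exp(c*y) - 1} / c dy, in closed form
        return 2.0 * math.exp(beta0 * x) / c * ((math.exp(c * tau) - 1.0) / (c * tau) - 1.0)
    return 0.5 * mean_over_y(0) + 0.5 * mean_over_y(1)
```

For the four Case 1 settings (γ0, β0) = (0, 0), (0, −0·3), (0·2, 0) and (0·2, −0·3) this gives about 3·21, 2·79, 3·88 and 3·29, in agreement with the m̄ column of Table 1 to one decimal place.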
Table 1 summarizes the estimated powers of the shape- and size-independence tests, used separately and in combination. The significance level was set to 0·05 when the tests were applied separately. To keep the overall Type I error rate at or below 0·05, the significance level of each individual test was set to 0·025 when the test statistics Wsh and Wsz were combined to test rate-independence. A unit weight function was used for both test statistics unless otherwise noted; that is, $w^{sh}_{ij} = w^{sz}_{ij} = 1$ for all i, j. When the shape parameter is independent of X, i.e., γ0 = 0, the empirical rejection rate of Hsh is very close to the nominal level, 0·05. When γ0 = 0 and β0 = 0, both the size- and shape-independence tests reject the corresponding null hypotheses Hsh and Hsz approximately 5% of the time, demonstrating satisfactory performance of the proposed tests at the 5% level. When Wsh and Wsz are used in conjunction to test rate-independence, the estimated overall rejection rate is approximately 0·05 when the null hypothesis holds. Moreover, as illustrated in the Supplementary Material, the distributions of the p-values derived from the proposed test statistics under the null hypotheses are very close to uniform, suggesting that the test statistics perform well at all significance levels.
Table 1.
Estimated power (%) for the shape-, size- and rate-independence tests, with nominal level 5%. Results are the percentages of rejected null hypotheses Hsh, Hsz and H among 2000 replications
| | γ0 | β0 | m̄ | Shape, n = 100 | Size, n = 100 | Rate, n = 100 | Shape, n = 200 | Size, n = 200 | Rate, n = 200 |
|---|---|---|---|---|---|---|---|---|---|
| Case 1 | 0 | 0 | 3·2 | 4 | 6 | 5 | 5 | 6 | 5 |
| | 0 | −0·3 | 2·8 | 4 | 47 | 37 | 5 | 75 | 66 |
| | 0·2 | 0 | 3·9 | 62 | 49 | 69 | 90 | 78 | 97 |
| | 0·2 | −0·3 | 3·3 | 53 | 6 | 42 | 87 | 5 | 79 |
| Case 2 | 0 | 0 | 3·2 | 5 | 5 | 4 | 5 | 5 | 5 |
| | 0 | 0·5 | 4·3 | 4 | 42 | 32 | 5 | 71 | 62 |
| | 0·4 | 0 | 5·6 | 62 | 55 | 69 | 92 | 85 | 96 |
| | 0·4 | 0·5 | 7·8 | 78 | 98 | 98 | 98 | 100 | 100 |
m̄, average number of observed events per subject.
Next, we evaluated the performance of the test statistics in Case 2. We generated X from a uniform distribution on [0, 1] and set the follow-up time to be Y = 10X, so that Y has a uniform distribution on [0, 10]. Recurrent events were simulated from the Poisson process model (4) with a subject-specific frailty Z drawn from a gamma distribution with mean 1 and variance 0·25. Thus γ0 = 0 implies that Y is shape-independent of λ(·), while γ0 = β0 = 0 implies rate-independence. As shown in Table 1, when shape-independence holds for the follow-up time, the empirical rejection rate of the shape-independence test based on Wsh is very close to the nominal level, 0·05. When both the size and shape parameters are independent of the follow-up time Y, the proportion of false rejections with combined use of the two test statistics is approximately 0·05. Q-Q plots in the Supplementary Material suggest that the test statistics under the null perform well at all significance levels.
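To generate a Case 2 dataset, one can draw events from model (4) by thinning (Lewis and Shedler): propose points from a homogeneous process whose rate bounds the intensity on [0, Y], and keep each with probability intensity/bound. The paper does not say how its events were generated, so thinning, the seed and the function name are our assumptions; `random.gammavariate(4, 0.25)` uses shape 4 and scale 0·25, giving mean 1 and variance 0·25.

```python
import math
import random

def simulate_case2(n, beta0, gamma0, seed=0):
    """One Case 2 dataset as (x, y, event_times): X ~ U[0, 1], Y = 10X,
    Z ~ Gamma(mean 1, variance 0.25), events from model (4) on [0, Y]."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.random()
        y = 10.0 * x
        z = rng.gammavariate(4.0, 0.25)  # shape 4, scale 0.25
        c = gamma0 * x - 0.5
        base = 2.0 * z * math.exp(beta0 * x)  # intensity at t = 0
        bound = base * max(1.0, math.exp(c * y))  # maximum of the intensity on [0, y]
        times, t = [], 0.0
        while True:
            t += rng.expovariate(bound)  # candidate point from a rate-`bound` process
            if t > y:
                break
            if rng.random() < base * math.exp(c * t) / bound:  # thinning step
                times.append(t)
        data.append((x, y, times))
    return data
```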
6. Data example
We applied the proposed testing procedures to data collected by the Danish Psychiatric Central Register. This case register records data from all admissions to psychiatric hospitals in the entire nation of Denmark, and participation is mandatory for all Danish psychiatric hospitals, relevant clinical departments and units treating patients with schizophrenia. The register covers psychiatric inpatient services since 1969, as well as outpatient contacts from 1994 onward (Munk-Jørgensen & Mortensen, 1997). We analysed data from a cohort of 1235 patients who had their first contact with psychiatric services between 1 April 1970 and 1 April 1973. Of these patients, 810 were male and 226 had early-onset schizophrenia, defined as onset before 20 years of age. To demonstrate the use of the proposed test statistics in Case 1, we artificially censored the follow-up for all individuals on 1 April 1973 so that the censoring time is expected to be conditionally rate-independent of the recurrent event process. As a result, the median follow-up was 1·3 years and the average number of inpatient psychiatric admissions per patient was 1·2. A total of 28 deaths was recorded during the three-year study period.
We first examined, within each gender group, the assumption that the censoring time Y is rate-independent of the rate function of repeated hospitalizations, N(·). The shape- and size-independence tests yield p-values of 0·56 and 0·06, respectively, for males, and 0·70 and 0·052 for females. Thus the null hypothesis that Y is rate-independent of λ(·) conditional on gender is not rejected. Next, we applied the shape- and size-independence tests to examine the association between recurrent inpatient psychiatric admissions and gender. The test for shape-independence yields a p-value of 0·70, and the test for size-independence yields a p-value of 0·0002. Hence there is no evidence that gender is associated with the shape parameter of λ(·), but gender is significantly associated with the size parameter. Thus, the proportional rate model (Pepe & Cai, 1993; Lin et al., 2000) is a natural choice for evaluating the effect of gender on rehospitalization risk.
Similarly, we checked the assumption of rate-independence of Y and λ(·) for individuals with early- and late-onset schizophrenia. The p-values given by the shape- and size-independence tests are 0·12 and 0·06, respectively, for the early-onset group, and 0·83 and 0·16 for the late-onset group. Thus, there is no evidence against the conditional rate-independence of Y and λ(·) for the first three years since initial hospitalization. Moreover, the shape-independence test for the association between N(·) and age of onset yields a p-value of 0·31, indicating no evidence that age of onset is associated with the shape parameter. On the other hand, the size-independence test for the association between N(·) and age of onset yields a p-value of 0·0002; thus the age of onset is significantly associated with the size parameter. The results suggest that the proportional rate model can be used to estimate the effect of the age of onset on the risk of repeated hospitalization.
The associate editor suggested conducting the tests with the two groups combined and the group effect accounted for. One can construct stratified tests in the presence of categorical covariates based on the weighted average of the test statistics from the subgroups, where the weights are inversely proportional to the sample sizes of the subgroups. The p-values of the stratified tests, stratified by gender and age of onset, are 0·21 for shape-independence and 0·39 for size-independence. Hence the rate-independence assumption of Y and λ(·) is not rejected after controlling for gender and age of onset.
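One way to implement such a stratified test is to form a weighted sum of the stratum statistics and divide it by its standard error, treating the strata as independent. In the sketch below the weights are passed in, so any weighting rule, including the one described above, can be used; the stratum variances are assumed to be estimated separately (for example, an estimated asymptotic variance divided by the stratum size), and the function name is hypothetical. Because the weights enter both the numerator and the standard error, their overall scale cancels.

```python
import math

def stratified_z(stats, variances, weights):
    """Combine independent stratum statistics W_k into one z-statistic and two-sided p-value.

    stats[k]: W_k from stratum k; variances[k]: estimated variance of W_k;
    weights[k]: stratum weight supplied by the user."""
    num = sum(w * s for w, s in zip(weights, stats))
    var = sum(w * w * v for w, v in zip(weights, variances))  # independence across strata
    z = num / math.sqrt(var)
    return z, math.erfc(abs(z) / math.sqrt(2.0))
```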
7. Discussion
When studying a recurrent event process, the rate function is often of interest due to its interpretation as a population average. By decomposing the rate function into shape and size parameters, the probability structure of the recurrent event process can be better characterized and analysed. In either Case 1 or Case 2, the statistic Wsh is designed to test shape-independence between X and λ(·). If Hsh is accepted, the statistic Wsz is then used to test the size-independence hypothesis Hsz. The rate-independence between X and λ(·) is rejected if either Hsh or Hsz is rejected.
As the shape- and size-based statistics Wsh and Wsz are constructed in the spirit of Kendall’s tau, their use would be most appropriate when, for example, the model under the null and alternative hypotheses possesses a stochastic ordering property, i.e., for the shape-independence test, that the cumulative shape function $F(t \mid x) = \int_0^t f(u \mid x)\,du$ is monotone in x. Similarly, with validity of shape-independence established, a stochastic ordering model where Λ(τ | x) is monotone in x would be appropriate for the size-based test.
Along the lines of using shape and size parameters to characterize the recurrent event process, it would also be of interest to quantify the relationship between the size or shape of a process and the covariates in a regression model. Such statistical models and the accompanying methods are currently under investigation and the results will be reported in a future paper.
Supplementary Material
Acknowledgments
Thanks are due to the four reviewers for suggestions. The authors also thank Professors Preben Bo Mortensen and William Eaton for providing the anonymized Danish schizophrenia data. This research was supported by the U.S. National Institutes of Health.
Appendix
Proof of Theorem 1
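If λ(·) is shape-independent of X, then dN(t | X = x) has rate λ(t | X = x) = a(x) f(t). This implies that E{dS(t | X = x)} = t f(t) dt and
$$E[(X - \mu_X)\{S(\tau) - \mu_f\}] = E\left[(X - \mu_X)\left\{\int_0^\tau t f(t)\, dt - \mu_f\right\}\right] = 0,$$
and hence ρsh = 0. Size-independence between X and λ(·) implies that
$$E\{(X - \mu_X)(K - \mu_K)\} = E[(X - \mu_X)\{\Lambda(\tau \mid X) - \Lambda(\tau)\}] = 0,$$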
and thus ρsz = 0.
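Part (ii) deals with the association measures πsh and πsz. If X is size-independent of λ(·), then πsz = 0 because of the independence between Xj and Kj (j = 1, 2). If X is shape-independent of λ(·), then the independent and identically distributed structure of {X1, N1*(·)} and {X2, N2*(·)}, together with the fact that ∫∫ sgn(t − u) f(t) f(u) dt du = 0, implies that
$$E\left\{\mathrm{sgn}(X_1 - X_2) \int_0^\tau\!\!\int_0^\tau \mathrm{sgn}(t - u)\, dN_1^*(t)\, dN_2^*(u) \,\Big|\, X_1, X_2\right\} = \mathrm{sgn}(X_1 - X_2)\, \frac{a(X_1)\, a(X_2)}{\Lambda(\tau)^2} \int_0^\tau\!\!\int_0^\tau \mathrm{sgn}(t - u)\, f(t)\, f(u)\, dt\, du = 0,$$
and so πsh = 0 follows by taking expectations.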
Proof of Theorem 2
Theorem 2 can be established by verifying the required conditions for the asymptotic normality of U-statistics. If holds, then Wsh converges to μsh in probability, and the asymptotic normality of n^{1/2}(Wsh − μsh) in (i) follows from the theory of U-statistics (Hoeffding, 1948).
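The U-statistic argument invoked here can be illustrated generically. For a symmetric kernel h, Hoeffding's theory gives n^{1/2}(U − μ) → N(0, 4ζ₁), where ζ₁ is the variance of the projection h₁(z) = E{h(z, Z)}. The sketch below computes a second-order U-statistic and a plug-in standard error from that formula; the function name `u_statistic_with_se` is hypothetical, and the variance estimator is the standard projection-based one, not necessarily the estimator used for Wsh in the paper.

```python
import math
from typing import Callable, Sequence, Tuple, TypeVar

T = TypeVar("T")


def u_statistic_with_se(z: Sequence[T], h: Callable[[T, T], float]) -> Tuple[float, float]:
    """Order-2 U-statistic with a Hoeffding-projection standard error.

    U is the average of h(z_i, z_j) over pairs i < j for a symmetric kernel h.
    The standard error uses sd(U) ~ sqrt(4 * zeta1 / n), where zeta1 is the
    variance of h1(z) = E{h(z, Z)}, estimated by the sample variance of the
    row means of the kernel matrix (diagonal excluded).
    """
    n = len(z)
    if n < 3:
        raise ValueError("need at least 3 observations")
    row_sums = [0.0] * n
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            v = h(z[i], z[j])
            row_sums[i] += v
            row_sums[j] += v
            total += v
    u = total / (n * (n - 1) / 2)
    # Estimated projections h1(z_i); their average equals u exactly.
    h1 = [s / (n - 1) for s in row_sums]
    zeta1 = sum((v - u) ** 2 for v in h1) / (n - 1)
    return u, math.sqrt(4.0 * zeta1 / n)


if __name__ == "__main__":
    # Kernel (a - b)^2 / 2 makes U the unbiased sample variance.
    u, se = u_statistic_with_se([1.0, 2.0, 3.0, 4.0], lambda a, b: (a - b) ** 2 / 2)
    print(u, se)
```

With the kernel h(a, b) = (a − b)²/2 the U-statistic equals the unbiased sample variance, which gives a quick sanity check: for the data 1, 2, 3, 4 it returns 5/3.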
The test statistic Wsh is designed to test shape-independence between X and λ(·). Let g(u, t, yi, yj) = Δij(u, t) sgn(t − u) f(u) f(t). Because Δij(u, t) = Δij(t, u) and sgn(u − t) = −sgn(t − u), g is antisymmetric in its first two arguments: g(u, t, yi, yj) = −g(t, u, yi, yj). Because the region of integration is symmetric in (u, t), relabelling the integration variables also gives ∫ g(u, t, yi, yj) du dt = ∫ g(t, u, yi, yj) du dt. Combining the two identities yields ∫ g(u, t, yi, yj) du dt = −∫ g(u, t, yi, yj) du dt, which implies that ∫ g(u, t, yi, yj) du dt = 0. Under the hypothesis of shape-independence, Hsh: λ(t | x) = a(x) f(t), one can then prove that μsh = E[hsh{(Xi, Ni), (Xj, Nj)}] = 0. The asymptotic normality of n^{1/2}Wsh under Hsh follows by a standard argument for U-statistics: n^{1/2}Wsh converges weakly to the normal distribution where , provided that .
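The cancellation step, that an integrand antisymmetric in (u, t) integrates to zero over a symmetric region, can be checked numerically. The sketch uses a midpoint rule on [0, τ]²; the symmetric weight `delta`, the density `f` and the value τ = 2 are hypothetical choices for illustration, not quantities from the paper. The contrast case uses a weight that is not symmetric, which breaks the cancellation.

```python
import math
from typing import Callable


def antisymmetric_integral(
    delta: Callable[[float, float], float],
    f: Callable[[float], float],
    tau: float,
    m: int = 200,
) -> float:
    """Midpoint-rule value of the double integral over [0, tau]^2 of
    g(u, t) = delta(u, t) * sgn(t - u) * f(u) * f(t)."""
    step = tau / m
    grid = [(k + 0.5) * step for k in range(m)]  # symmetric grid of midpoints
    total = 0.0
    for u in grid:
        for t in grid:
            s = (t > u) - (t < u)  # sgn(t - u)
            total += delta(u, t) * s * f(u) * f(t)
    return total * step * step


if __name__ == "__main__":
    tau = 2.0
    f = lambda t: t / 2.0                     # hypothetical density on [0, 2]
    sym = lambda u, t: math.exp(-abs(u - t))  # hypothetical symmetric Delta_ij
    asym = lambda u, t: 1.0 + u               # not symmetric, for contrast
    print(antisymmetric_integral(sym, f, tau))   # zero up to rounding error
    print(antisymmetric_integral(asym, f, tau))  # clearly negative
```

On a symmetric grid the terms for (u, t) and (t, u) cancel pairwise when the weight is symmetric, which is the discrete counterpart of the argument in the proof.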
The assumption in Theorem 2 is implied by the simple moment assumptions for i ≠ j and E{Ni(τ)^4} < ∞. Details are given in the Supplementary Material.
Proof of Theorem 3
Assertion (i) is derived directly from the theory of U-statistics. To prove (ii), we further examine the zero-unbiasedness of Wsz under the additional hypothesis Hsz. Size-independence and shape-independence together imply rate-independence. Thus, when both Hsh and Hsz hold, we have
One can also prove that is implied by simpler moment constraints, namely for i ≠ j and E{Ni(τ)^4} < ∞. These constraints are sufficient, though possibly stronger than necessary, to validate the asymptotic normality of n^{1/2}Wsz in Theorem 3. Details can be found in the Supplementary Material.
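The step in the proof of Theorem 3 that size-independence and shape-independence together imply rate-independence can be written out explicitly. The derivation assumes, as a reading of the notation in the Discussion, that the shape is F(t | x) = Λ(t | x)/Λ(τ | x) and the size is Λ(τ | x); under Hsz and Hsh these equal an x-free Λ(τ) and F(t), respectively.

$$
\Lambda(t \mid x) = \Lambda(\tau \mid x)\,F(t \mid x)
\;\overset{H_{sz},\,H_{sh}}{=}\;
\Lambda(\tau)\,F(t) = \Lambda(t),
\qquad 0 \le t \le \tau ,
$$

so that λ(t | x) = dΛ(t | x)/dt = λ(t) for every x, which is rate-independence.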
Supplementary material
Supplementary material available at Biometrika online includes the proofs of equation (1) and Theorems 2 and 3, as well as Q-Q plots for examining the null performance of the proposed tests.
Contributor Information
MEI-CHENG WANG, Email: mcwang@jhu.edu, Department of Biostatistics, Bloomberg School of Public Health, Johns Hopkins University, 615 N. Wolfe Street, Baltimore, Maryland 21205, U.S.A.
CHIUNG-YU HUANG, Email: cyhuang@jhmi.edu, Division of Biostatistics and Bioinformatics, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University, 550 N. Broadway, Baltimore, Maryland 21205, U.S.A.
References
- Andersen PK, Gill RD. Cox’s regression model for counting processes: A large sample study. Ann Statist. 1982;10:1100–20.
- Cook RJ, Lawless JF. The Statistical Analysis of Recurrent Events. New York: Springer; 2007.
- Cox DR, Isham V. Point Processes. London: Chapman & Hall; 1980.
- Gail MH, Santner TJ, Brown CC. An analysis of comparative carcinogenesis experiments based on multiple times to tumor. Biometrics. 1980;36:255–66.
- Ghosh D, Lin DY. Semiparametric analysis of recurrent events data in the presence of dependent censoring. Biometrics. 2003;59:877–85. doi: 10.1111/j.0006-341x.2003.00102.x.
- Hoeffding W. A class of statistics with asymptotically normal distribution. Ann Math Statist. 1948;19:293–325.
- Huang CY, Wang MC. Joint modeling and estimation for recurrent event processes and failure time data. J Am Statist Assoc. 2004;99:1153–65. doi: 10.1198/016214504000001033.
- Kalbfleisch JD, Prentice RL. The Statistical Analysis of Failure Time Data. 2nd ed. Hoboken, New Jersey: John Wiley & Sons; 2002.
- Lancaster T. The Econometric Analysis of Transition Data. Cambridge: Cambridge University Press; 1990.
- Lancaster T, Intrator O. Panel data with survival: Hospitalization of HIV-positive patients. J Am Statist Assoc. 1998;93:46–53.
- Lawless JF, Nadeau C. Some simple robust methods for the analysis of recurrent events. Technometrics. 1995;37:158–68.
- Lin DY, Wei LJ, Yang I, Ying Z. Semiparametric regression for the mean and rate functions of recurrent events. J R Statist Soc B. 2000;62:711–30.
- Liu L, Wolfe RA, Huang X. Shared frailty models for recurrent events and a terminal event. Biometrics. 2004;60:747–56. doi: 10.1111/j.0006-341X.2004.00225.x.
- Munk-Jørgensen P, Mortensen PB. The Danish psychiatric central register. Dan Med Bull. 1997;44:82–4.
- Nelson W. Graphical analysis of system repair data. J Qual Technol. 1988;20:24–35.
- Nelson W. Confidence limits for recurrence data—applied to cost or number of product repairs. Technometrics. 1995;37:147–57.
- Pepe MS, Cai J. Some graphical displays and marginal regression analyses for recurrent failure times and time dependent covariates. J Am Statist Assoc. 1993;88:811–20.
- Prentice RL, Williams BJ, Peterson AV. On the regression analysis of multivariate failure time data. Biometrika. 1981;68:373–9.
- Schaubel DE, Cai J. Semiparametric methods for clustered recurrent event data. Lifetime Data Anal. 2005;11:405–25. doi: 10.1007/s10985-005-2970-y.
- Therneau TM, Grambsch PM. Modeling Survival Data: Extending the Cox Model. New York: Springer; 2000.
- Wang MC, Qin J, Chiang CT. Analyzing recurrent event data with informative censoring. J Am Statist Assoc. 2001;96:1057–65. doi: 10.1198/016214501753209031.
- Ye Y, Kalbfleisch JD, Schaubel DE. Semiparametric analysis of correlated recurrent and terminal events. Biometrics. 2007;63:78–87. doi: 10.1111/j.1541-0420.2006.00677.x.
