Inverse Probability of Censoring Weighted Estimates of Kendall’s τ for Gap Time Analyses

Lajmi Lakhal-Chaieb; Richard J Cook; Xihong Lin

doi:10.1111/j.1541-0420.2010.01404.x

. Author manuscript; available in PMC: 2014 Jan 19.

Published in final edited form as: Biometrics. 2010 Dec;66(4):1145–1152. doi: 10.1111/j.1541-0420.2010.01404.x

Inverse Probability of Censoring Weighted Estimates of Kendall’s τ for Gap Time Analyses

Lajmi Lakhal-Chaieb ^1,^*, Richard J Cook ^2,^**, Xihong Lin ^3,^***

PMCID: PMC3895492 NIHMSID: NIHMS508440 PMID: 20337629

Summary

In life history studies interest often lies in the analysis of the inter-event, or gap times and the association between event times. Gap time analyses are challenging however, even when the length of follow-up is determined independently of the event process, since associations between gap times induce dependent censoring for second and subsequent gap times. This paper discusses nonparametric estimation of the association between consecutive gap times based on Kendall’s τ in the presence of this type of dependent censoring. A nonparametric estimator which uses inverse probability of censoring weights is provided. Estimates of conditional gap time distributions can be obtained following specification of a particular copula function. Simulation studies show the estimator performs well and compares favourably with an alternative estimator. Generalizations to a piecewise constant Clayton copula are given. Several simulation studies and illustrations with real data sets are also provided.

Keywords: copula, dependent censoring, gap times, Kendall’s tau

1. Introduction

In many studies of life history processes, interest lies in the occurrence of two or more consecutive events. Bebchuk and Betensky (2002), for example, consider a three-state model in which the initial state is infection-free, the intermediate state represents HIV infection via blood transfusion, and the terminal state corresponds to AIDS diagnosis. Lin et al. (1999) discuss the analysis of follow-up data from a randomized trial of patients with colon cancer (Moertel et al., 1990). Recurrence of disease and death are the intermediate and terminal events respectively in this setting.

We are concerned with the analysis of data from the TREMIN Trust Study, which is part of the TREMIN Research Program for Women’s Health. The particular study we consider is the Menstrual and Reproductive Health Study in which 1,997 women were recruited from 1935-1939 and followed prospectively for up to 40 years. During this time participants were asked to keep a detailed diary of menstrual bleeding. As in Nan et al. (2006), we aim to examine the association between the time of the first menstrual cycle of at least 45 days in duration and menopause to better the understanding of the transitional phase to menopause. Multi-state models provide a natural representation for processes with an initial, intermediate and terminal state in which transitions between these states represent a progression. See Hougaard (2000) for a recent review of methods for the analysis of multi-state data and how these can be used to help understanding the time course of processes. Markov models are often adopted for progressive and degenerative processes, in which the operational time scale is the time since process initiation or some other common origin. However, in many settings interest lies in the sojourn time distributions for particular states. In such cases, the canonical models are semi-Markov but it may not be plausible to assume that successive gap times are independent. When the process is subject to type I right-censoring and there is an association between gap times, the second sojourn time is subject to dependent censoring (Lin et al., 1999), and analysis must take this dependent censoring into account.

There have been considerable advances in the analysis of consecutive sojourn, or gap times in settings such as these in the recent years. Kessing et al. (2004) consider models which assume conditional independence across gap time distributions given shared frailties and Kvist et al. (2007) develop a goodness-of-fit test for the latter model when the frailty is Gamma distributed. Lin et al. (1999) and Schaubel and Cai (2004), among others, derived nonparametric estimators for the joint and conditional survival functions. None of these methods yield simple summary measures of the association between sojourn times, and there is considerable appeal in developing methods with simple measures of association which are robust to dependent censoring.

Kendall’s tau is among the most popular measures of association between two positive random variables. Kendall and Gibson (1990) gave an empirical estimate of τ from a sample of uncensored bivariate positive random variables. Several authors proposed estimators for τ with bivariate right-censored parallel observations; see Lakhal et al., (2009) for a review. Betensky and Finkelstein (1999) extended the estimation of τ to bivariate interval-censored observations and Wang and Wells (2000) derived an estimator for τ valid for any censoring scheme as long as a nonparametric estimator for the joint survivor function exists. Little work has been done, however, for nonparametric estimation of τ under more complicated censoring schemes, such as the dependent censoring scheme discussed above. In this paper we propose nonparametric estimators for Kendall’s τ measuring the association between two consecutive gap times. The proposed estimators use inverse probability of censoring weights to address the impact of dependent censoring on the second gap times.

In the recent years, copulas had become an attractive framework for modeling the joint distribution of multiple failure times (He & Lawless 2003, Nan et al. 2006). Indeed, under a copula model, association parameters and complex joint probabilities can be easily expressed and estimated. He and Lawless (2003) consider the case of serial events, assume a Clayton copula to model the dependence between two successive gap times, and derive maximum likelihood estimates for the copula parameter and the survival function of the second gap time under weak marginal assumptions. Oakes (1986) proposed a more general copula model under which the cross-ratio is constant within particular regions of the plane. Nan et al. (2006) adapt this model for a joint analysis of the occurence times of an intermediate and a terminal event, and derive estimators of the copula parameters using the maximum pseudo-likelihood procedure of Shih and Louis (1995).

The second purpose of this paper is to develop methods for consecutive gap times analysis based on a standard copula formulation, and then to generalize this in the spirit of Nan et al. (2006), assuming a piecewise-constant cross-ratio model for consecutive gap times. The parameters of the latter model will be estimated by inverting the estimator of Kendall’s tau derived in the first part of the paper.

The remainder of this paper is organized as follows. In the next section we define Kendall’s τ and propose nonparametric estimators for this association measure using censored serial gap time data. An estimate of the conditional distribution of second gap time based on an assumed Clayton copula function is described in Section 3. These estimates are assessed through simulation studies and illustrated with data from a randomized trial of patients with colon cancer (Moertel et al., 1990). A goodness-of-fit procedure for the assumed copula is derived. This method is generalized in Section 4 to accommodate piecewise-constant cross-ratios as in Oakes (1986) and Nan et al. (2006). Application to the data from the Tremin Trust study of women’s reproductive health illustrates the application of the proposed methods and facilitates comparisons with the findings of Nan et al. (2006). Section 5 contains general remarks and topics for further research.

2. Nonparametric Estimation of Kendall’s τ

Suppose X and Y two time-to-event random variables, with joint survivor function π(x, y) = P(X > x, Y > y) and (X₁, Y₁) and (X₂, Y₂) two independent replications of (X, Y). These points are said to be concordant if (X₁ − X₂)(Y₁ − Y₂) > 0, i.e. if the marginal rankings of individuals with respect to X and Y agree. They are otherwise discordant. Kendall’s τ is defined as the probability of concordance among two pairs of points minus the probability of discordance and given as

τ = \Pr {(X_{1} - X_{2}) (Y_{1} - Y_{2}) > 0} - \Pr {(X_{1} - X_{2}) (Y_{1} - Y_{2}) < 0} .

This association measure is independent of the marginal distributions of X and Y and is equal to zero under independence. Moreover, if ψ₁₂ = I{(X₁ − X₂)(Y₁ − Y₂) > 0} − I{(X₁ − X₂)(Y₁ − Y₂) < 0} is the concordance/discordance status, equal to 1 if the pairs (X₁, Y₁) and (X₂, Y₂) are concordant and −1 otherwise, then

τ = E (ψ_{12}) = 4 \int_{0}^{\infty} \int_{0}^{\infty} π (x, y) \frac{\partial^{2} π (x, y)}{\partial x \partial y} dxdy - 1 .

(2.1)

Kendall and Gibson (1990) estimated τ from uncensored bivariate data {(X_i, Y_i), i = 1, …, n} by its empirical version

{(\begin{matrix} n \\ 2 \end{matrix})}^{- 1} \sum_{i < j} ψ_{ij} .

2.1 Estimation with Parallel Survival Times

In the presence of censoring, it may not be possible to compute ψ_ij for some pairs of points, making estimation of τ more difficult; such pairs are called non-orderable, while pairs that can be ordered are orderable. Let C and D denote the censoring variables associated to X and Y, respectively. One may only observe (X̃, Ỹ, δ_X, δ_Y), where X̃ = min(X, C), Ỹ = min(Y, D), δ_X = I(X < C), and δ_Y = I(Y < D). Oakes (1982) shows that the pair (i, j) is orderable if {X̃_ij < C̃_ij, Ỹ_ij < D̃_ij} where X̃_ij = min (X_i, X_j), Ỹ_ij = min (Y_i, Y_j), C̃_ij = min (C_i, C_j) and D̃_ij = min (D_i, D_j). Denote by L_ij the indicator of this event. Oakes (1982) proposes to estimate τ by

{\hat{τ}}_{O} = {(\begin{matrix} n \\ 2 \end{matrix})}^{- 1} \sum_{i < j} L_{ij} ψ_{ij} .

This estimator is biased when τ is non-zero. Nevertheless, it is widely used to test independence of a pair of random variables based on censored data.

Recently, Lakhal et al. (2009) propose a method that greatly reduces the bias of τ̂_O by incorporating use of inverse probability of censoring weights. Let p̂_ij be an estimator of the probability of being orderable p_ij = {Pr(C > X̃_ij; D > Ỹ_ij∣X̃_ij, Ỹ_ij)}².

The weighted estimate is then

{\hat{τ}}_{mo} = {(\begin{matrix} n \\ 2 \end{matrix})}^{- 1} \sum_{i < j} \frac{L_{ij} ψ_{ij}}{{\hat{p}}_{ij}} .

(2.2)

This estimator may lie outside [−1, 1] for large values of |τ|, so one may also consider

{\hat{τ}}_{mo 2} = {(\sum_{i < j} \frac{L_{ij}}{{\hat{p}}_{ij}})}^{- 1} \sum_{i < j} \frac{L_{ij} ψ_{ij}}{{\hat{p}}_{ij}} .

(2.3)

These estimators are shown to be consistent, asymptotically normally distributed and empirically to be superior to existing competitors in finite samples.

2.2 Estimation With Serial Gap Times and Dependent Censoring

In this section, we discuss the estimation of Kendall’s tau between serial gap times. Let T₁ = X and T₂ = X+Y denote the times of occurrence of two successive events. Typically, the follow up process is subject to independent right-censoring by C. Denote by G(·) its survival function. Under this setting, one may only observe X̃ = T̃₁, Ỹ = T̃₂ − T̃₁, δ_X = I(X < C) and δ_Y = I(X + Y < C), where T̃₁ = min(T₁, C) and T̃₂ = min(T₂, C). Note that if T₁ is censored, T₂ is also censored and Ỹ = 0. Hence, Y is censored by D = max(0, C − X). So, unless X and Y are independent, D is associated to Y.

Under these conditions, Lin et al. (1999), among others, derived a nonparametric estimator for π(x; y) for each (x, y) such as x + y ≤ C̄, where C̄ > 0 satisfies G(C̄) > 0. This estimator may be expressed as

\hat{π} (x, y) = \frac{1}{n} \sum_{i = 1}^{n} \frac{I ({\tilde{X}}_{i} > x; {\tilde{Y}}_{i} > y)}{\hat{G} ({\tilde{X}}_{i} + y)},

(2.4)

where Ĝ(·) is the Kaplan–Meier estimator of G(·) based on {(X̃_k + Ỹ_k, 1 − δ_{Y_k}), k = 1, …, n}. One may incorporate (2.4) into (2.1) to estimate τ for serial gap times using the approach of Wang & Wells (2000) by

{\hat{τ}}_{W} = 4 \sum_{i = 1}^{n} \sum_{j = 1}^{n} \hat{π} ({\tilde{X}}_{(i)}, {\tilde{Y}}_{(j)}) \hat{π} (Δ {\tilde{X}}_{(i)}, Δ {\tilde{Y}}_{(j)}) - 1,

(2.5)

where X̃₍₀₎ = 0 < X̃₍₁₎ < ⋯ < X̃_(n), and Ỹ₍₀₎ = 0 < Ỹ₍₁₎ < ⋯ < Ỹ_(m) are the ordered samples of {X̃_k, k = 1, …, n} and {Ỹ_k, k = 1, …, n}, respectively, $m = \sum_{k = 1}^{n} δ_{X_{i}}$ and π̂(ΔX̃_(i), ΔỸ_(j)) = π̂(X̃_(i), Ỹ_(j)) − π̂(X̃_(i−1), Ỹ_(j)) − π̂(X̃_(i), Ỹ_(j−1)) + π̂(X̃_(i−1), Ỹ(_j−1)) is the estimated density mass at (X̃_(i), Ỹ_(j)).

Here, we derive alternative estimators for τ based on an adaptation of (2.2) and (2.3) to deal with serial gap times. The main challenge in this adaptation is to identify orderable pairs and to express and estimate their associated p_ij terms.

The orderability condition L_ij is still expressed as {X̃_ij < C̃_ij; Ỹ_ij < D̃_ij}. By continuity, any orderable pair (i, j) satisfies 0 < Ỹ_ij < D̃_ij and thus D_i = max(C_i− X_i, 0) > 0 and D_j = max(C_j − X_j, 0) > 0, which implies C_i > X_i and C_j > X_j. Hence, X_i and X_j are uncensored and L_ij reduces to {Ỹ_ij < min(C_i − X_i, C_j − X_j)}, which can be re-written as {C_i > X_i + Ỹ_ij; C_j > X_j + Ỹ_ij}. Note that for such pairs, Ỹ_ij is also observed, so the conditional probability of a pair being orderable is

\begin{array}{l} p_{ij} & = & \Pr {C_{i} > X_{i} + {\tilde{Y}}_{ij}; C_{j} > X_{j} + {\tilde{Y}}_{ij} ∣ X_{i}, X_{j}, {\tilde{Y}}_{ij}} \\ = & \Pr {C_{i} > X_{i} + {\tilde{Y}}_{ij} ∣ X_{i}, X_{j}, {\tilde{Y}}_{ij}} \times \Pr {C_{j} > X_{j} + {\tilde{Y}}_{ij} ∣ X_{i}, X_{j}, {\tilde{Y}}_{ij}} \\ = & G (X_{i} + {\tilde{Y}}_{ij}) \times G (X_{j} + {\tilde{Y}}_{ij}) . \end{array}

This probability is estimated by

{\hat{p}}_{ij} = \hat{G} (X_{i} + {\tilde{Y}}_{ij}) \times \hat{G} (X_{j} + {\tilde{Y}}_{ij}) .

(2.6)

Kendall’s tau is then estimated by (2.2) and (2.3), with p̂_ij given by (2.6) and we denote the resulting estimators by τ̂₁ and τ̂₂, respectively.

In many applications, as will be seen in Sections (3.2) and (4), interest lies on the conditional version of Kendall’s tau τ_A = E[ψ₁₂∣ν₁₂(A)] where A is a subset of [0; ∞) and ν₁₂(A) = I(X₁ ∈ A; X₂ ∈ A). One may compute ν_ij (A) for any orderable pair (i, j) since both X_i and X_j are uncensored for such a pair. Therefore, one may adapt τ̂₂ to estimate τ_A by

{\hat{τ}}_{A} = {\sum_{i < j} \frac{L_{ij} ν_{ij} (A)}{{\hat{p}}_{ij}}}^{- 1} \sum_{i < j} \frac{ψ_{ij} L_{ij} ν_{ij} (A)}{{\hat{p}}_{ij}} .

(2.7)

2.3 Asymptotic Properties

In web Appendix A, we prove the following result.

Theorem 1

Let ξ_ij denote a random variable, measurable for any orderable pair (i, j) and such that ξ_ij = ξ_ji and let γ = E[ξ₁₂]. Under some regularity conditions, the distribution of

\sqrt{n} {\hat{γ} - γ} = \sqrt{n} {{(\begin{matrix} n \\ 2 \end{matrix})}^{- 1} \sum_{i < j} \frac{L_{ij} ξ_{ij}}{{\hat{p}}_{ij}} - γ}

converges to a zero-mean normal with variance

σ_{γ}^{2} = I - \int_{0}^{\infty} \frac{q^{2} (u)}{\Pr (\tilde{X} + \tilde{Y} > u)} d Λ_{c} (u),

(2.8)

where I and q(·) are given in web Appendix A and Λ_c(·) = − log[G(·)] is the cumulative hazard function of C. This variance can be estimated by plugging in consistent estimators for unknown quantities in (2.8).

Applying this result with ξ_ij = ψ_ij and γ = τ yields the asymptotic normality of $\sqrt{n} {{\hat{τ}}_{1} - τ}$ and an expression for its asymptotic variance. We show in web Appendix B that

\sqrt{n} {{\hat{τ}}_{2} - τ} = \sqrt{n} {(\begin{matrix} n \\ 2 \end{matrix})}^{- 1} \sum_{i < j} \frac{L_{ij} (ψ_{ij} - τ)}{{\hat{p}}_{ij}} + o_{p} (1)

and thus applying Theorem 1 with ξ_ij = ψ_ij − τ and γ = 0 yields the asymptotic normality of $\sqrt{n} {{\hat{τ}}_{2} - τ}$ as well as an expression for its asymptotic variance. Similary, one can show that

\sqrt{n} {{\hat{τ}}_{A} - τ_{A}} = \frac{\sqrt{n}}{E [ν_{12}]} {(\begin{matrix} n \\ 2 \end{matrix})}^{- 1} \sum_{i < j} \frac{L_{ij} ν_{ij} (ψ_{ij} - τ_{A})}{{\hat{p}}_{ij}} + o_{p} (1) .

Asymptotic normality and an expression for the asymptotic variance follow by applying Theorem 1 with ξ_ij = ν_ij (ψ_ij − τ_A) and γ = 0.

3. A Clayton Copula Model for serial gap times (X, Y)

Once the first event occurs, say at X = x, the conditional survival function S_Y (·∣X = x) often becomes of interest. A convenient way to estimate this probability is to assume a Clayton copula for the pair (X, Y). In this section, we investigate such a model, derive related inference procedures, and discuss extensions.

3.1 Model and properties

Under a Clayton copula for (X, Y), the joint survival function is expressed as

π (x, y) = \max [{S_{X} {(x)}^{- (θ - 1)} + S_{Y} {(y)}^{- (θ - 1)} - 1}^{- 1 / (θ - 1)}, 0],

(3.1)

where θ = λ_Y (y∣X = x)/λ_Y (y∣X > x) is the cross-ratio and λ_Y (y∣·) = lim_dy↓0 Pr(Y ≤ y + dy∣Y > y; ·)/dy is the conditional hazard function of Y. Under this model, θ is constant and related to Kendall’s τ through τ = (θ − 1)/(θ + 1). Moreover, one has

S_{Y} (y ∣ X = x) = {S_{Y} (y ∣ X > x)}^{θ} for all 0 \leq x, y .

(3.2)

Lin et al. (1999) note that, S_Y (y∣X > x) can be estimated by π̂(x; y)/π̂(x; 0) for x + y < C̄, where π̂(·, ·) is given by (2.4). We can also estimate θ by θ̂ = (τ̂ + 1)/(τ̂ − 1), where τ̂ is the nonparametric estimator of τ, derived in Section (2.2). A natural estimator for S_Y (y∣X = x) is then obtained by plugging in estimators for unknown quantities in (3.2).

Taylor series expansions of

\sqrt{n} {{\hat{S}}_{Y} (y ∣ X = x) - S_{Y} (y ∣ X = x)} = \sqrt{n} [{{\hat{S}}_{Y} (y ∣ X > x)}^{\hat{θ}} - {S_{Y} (y ∣ X > x)}^{θ}],

along with the asymptotic presentations of $\sqrt{n} {{\hat{S}}_{Y} (y ∣ X > x) - S_{Y} (y ∣ X > x)}$ given by Lin et al. (1999) and of $\sqrt{n} {\hat{θ} - θ}$ given in Appendix A, prove that the distribution of $\sqrt{n} {{\hat{S}}_{Y} (y ∣ X = x) - S_{Y} (y ∣ X = x)}$ converges to a zero mean normal. However, its variance involves complex formulas and we suggest variance estimation based on the Jackknife procedure.

3.2 A Goodness-of-Fit Test for the Clayton Copula Under Dependent Censoring

In the previous section, we derived an estimator for the conditional survival function S_Y (y∣X = x) under a Clayton copula model for the pair (X, Y). If this assumption does not hold, the estimator for S_Y (y∣X = x) may be biased and it is therefore desirable to check the adequacy of the Clayton copula. Several copula goodness-of-fit tests exist for complete data (Genest et al., 2009) and parallel censored observations (Lakhal-Chaieb 2010), but no procedure exists for successive gap times. We develop such a procedure here for the Clayton copula.

Let M > 0 and A = [M; ∞). By Manatunga & Oakes (2006), the Clayton copula is the only family preserved under truncation; i.e. if (X, Y) follows a Clayton copula, then (X, Y ∣X > M) follows also a Clayton copula, with the same association parameter. In particular, the conditionnal Kendall’s tau τ_A is equal to τ. Our test is then based on $Q = \sqrt{n} {{\hat{τ}}_{2} - {\hat{τ}}_{A}}$ .

Under H₀, one may write the test statistics as $Q = \sqrt{n} {({\hat{τ}}_{2} - τ) - ({\hat{τ}}_{A} - τ_{A})}$ . It follows from web Appendix B that:

Q = \sqrt{n} {(\begin{matrix} 2 \\ n \end{matrix})}^{- 1} \sum_{i < j} \frac{L_{ij}}{{\hat{p}}_{ij}} (ψ_{ij} - τ) (1 - \frac{ν_{ij}}{E [ν_{12}]}) + o_{p} (1) .

Applying the result of Theorem 1 with $ξ_{ij} = (ψ_{ij} - τ) (1 - \frac{ν_{ij}}{E [ν_{12}]})$ and γ = 0 yields the asymptotic normality and an expression for the asymptotic variance of Q under H₀. We reject H₀ at level α if |Q/s.e.(Q)| > Z_α/2. The performance of this test will be investigated by simulations in the next Section.

Clearly, the choice of M dramatically affects the performance of our test. For small values of M, the difference between τ and τ_A is tiny for most copula families and the test may not be able to detect it. At the other extreme, for large values of M, the estimate of τ_A may be based on a relatively small number of points and thus may not be precise. Thus our test is most useful for moderate values of M. In our numerical investigations, we take M to be the estimated median from the Kaplan-Meier estimate based on {(X̃_k, δ_{X_k}), k = 1, ⋯, n}.

3.3 Applications Involving the Clayton Copula

Colon cancer data

Moertel et al. (1990) discuss a clinical trial where patients treated for colon cancer are randomized into two groups: therapy and placebo, including 304 and 315 patients respectively. Patients are at risk of ordered events and the serial gap times in this example are: the time from randomization to cancer recurrence (X) and the time from cancer recurrence to death (Y). At the end of the study, 108 patients died among the 119 who had cancer recurrence in the therapy group and 155 died among 177 who had cancer recurrence in the placebo group. We computed τ̂₁ and τ̂_W for both groups. We found τ̂₁ = 0.2725 (s.e.=0.062) and τ̂_W = −0.796 (s.e.=0.613) for the therapy group and τ̂₁ = 0.2685 (s.e.=0.058) and τ̂_W = 0.012 (s.e.=0.779) for the placebo group. Our estimator τ̂₁ detects a significant positive dependence between X and Y in both groups, as conjectured by Lin et al. (1999). Furthermore, τ̂₁ suggests that the magnitude of this dependence is not affected by the therapy. The variance of τ̂_W, estimated by the Jackknife procedure, is too large to make inference. A convenient way to illustrate this dependence is to investigate the conditional survival S_Y (·∣X = x) under a Clayton copula for the pair (X, Y). In Figure 1, we report the median of Ŝ_Y (·∣X = x) versus x for both groups.

Empirical estimates of conditional median time from cancer occurrence to death given cancer occurrence date.

Figure 1 suggests that therapy decreases survival time following cancer recurrence. This is in agreement with the conclusions of Lin et al. (1999) and He and Lawless (2003). The Clayton copula assumption is tested for each group by the procedure presented in Section (3.2). It gives Q = 1.983 (s.e.=2.8115; p=0.48) and Q = 1.906 (s.e.=2.068; p=0.36) for the placebo and treatment group, respectively.

Simulations

The first set of simulations was conducted to assess and compare the finite sample performances of τ̂₁, τ̂_W and the resulting estimators for S_Y(y∣X = x). Three real τ values (0.2, 0.5 and 0.8) and two sample sizes (100 and 200) were used to generate correlated pairs (X, Y) using a Clayton copula, with exponential margins with means equal to 1 and 1/2 respectively. The censoring variable C was generated from a Weibull distribution with parameters controlling the censoring fractions cf₁ = Pr(X > C) and cf₂ = Pr(X + Y > C). Two scenarios corresponding to (cf₁, cf₂) = (0.15, 0.30) and (0.20, 0.40) were considered. For each combination of the parameters above, 1000 samples were generated and for each simulated data set, we computed τ̂₁, var(τ̂₁), τ̂_W and the resulting Ŝ_Y (y_i∣X = x₀); i = 1, …, 4 at points x₀, y₁, …, y₄ such that S_X(x₀) = 1/2 and S_Y (y_i∣X = x₀) = i/5. The empirical means and mean square errors of τ̂₁, τ̂_W and Ŝ_Y (y_i∣X = x₀) are reported in Table 1.

Table 1.

Means and Mean square errors (in parentheses; ×10⁴) of τ̂₁ (IPCW) and τ̂_W (WW) and their resulting conditional survival estimates

n	cf₁ cf₂	τ	τ̂		Ŝ(y₁∣X = x₀)		Ŝ(y₂∣X = x₀)		Ŝ(y₃∣X = x₀)		Ŝ(y₄∣X = x₀)

			IPCW	WW	IPCW	WW	IPCW	WW	IPCW	WW	IPCW	WW
100	15% 30%	0.2	0.187 (6.7)	0.229 (19.0)	0.201 (4.9)	0.179 (8.0)	0.405 (7.2)	0.374 (13.2)	0.601 (7.7)	0.570 (14.2)	0.801 (6.0)	0.780 (9.5)
		0.5	0.492 (4.1)	0.547 (25.7)	0.210 (8.6)	0.175 (13.2)	0.410 (14.9)	0.354 (27.3)	0.618 (18.7)	0.557 (34.8)	0.809 (16.0)	0.748 (29.9)
		0.8	0.797 (1.0)	0.849 (23.4)	0.232 (28.5)	0.203 (35.6)	0.401 (44.8)	0.352 (63.9)	0.547 (50.2)	0.481 (88.3)	0.670 (56.6)	0.592 (115.6)

	20% 40%	0.2	0.185 (9.1)	0.231 (37.0)	0.193 (7.1)	0.175 (11.1)	0.398 (10.3)	0.367 (19.6)	0.599 (10.9)	0.566 (21.9)	0.801 (7.7)	0.774 (15.3)
		0.5	0.484 (5.0)	0.540 (40.9)	0.201 (10.3)	0.174 (16.9)	0.409 (26.8)	0.360 (36.0)	0.615 (26.8)	0.548 (46.32)	0.808 (21.1)	0.736 (34.6)
		0.8	0.791 (1.3)	0.830 (36.6)	0.246 (42.8)	0.259 (57.4)	0.401 (54.9)	0.401 (76.6)	0.503 (64.9)	0.478 (95.5)	0.620 (84.7)	0.575 (135.4)

200	15% 30%	0.2	0.195 (3.3)	0.218 (8.3)	0.199 (2.5)	0.185 (4.0)	0.399 (3.3)	0.380 (5.9)	0.604 (3.2)	0.587 (5.3)	0.805 (2.7)	0.795 (3.7)
		0.5	0.495 (2.1)	0.522 (11.3)	0.204 (3.3)	0.183 (6.8)	0.411 (7.1)	0.379 (13.8)	0.608 (8.3)	0.577 (15.4)	0.804 (7.0)	0.780 (11.2)
		0.8	0.797 (0.4)	0.824 (9.8)	0.216 (14.0)	0.188 (21.3)	0.417 (32.0)	0.365 (48.9)	0.587 (35.7)	0.531 (62.1)	0.821 (34.7)	0.667 (65.6)

	20% 40%	0.2	0.188 (4.4)	0.205 (15.9)	0.194 (3.8)	0.185 (6.4)	0.397 (5.3)	0.383 (10.1)	0.599 (5.5)	0.584 (10.3)	0.800 (4.0)	0.789 (7.3)
		0.5	0.490 (2.5)	0.515 (18.3)	0.204 (5.9)	0.186 (10.8)	0.412 (11.7)	0.381 (15.7)	0.608 (13.9)	0.575 (24.7)	0.801 (9.2)	0.777 (17.9)
		0.8	0.795 (0.5)	0.821 (21.6)	0.228 (26.5)	0.227 (37.5)	0.415 (45.8)	0.394 (63.8)	0.548 (50.0)	0.519 (75.4)	0.664 (61.0)	0.623 (103.4)

Open in a new tab

The empirical means of the estimates of Var(τ̂₁) along with their coverage rate of the 95% confidence interval are reported in Table 2.

Table 2.

Empirical means of the estimates of Var(τ̂₁) with their empirical coverage rate

	n = 100		n = 200
τ	cf₁ = 20%	cf₁ = 40%	cf₁ = 20%	cf₁ = 40%
0.2	(6.9) 95.3	(9.0) 94.0	(3.1) 95.0	(4.5) 94.5
0.5	(4.0) 94.6	(4.8) 94.8	(2.2) 94.2	(2.6) 95.5
0.8	(1.0) 95.2	(1.4) 93.8	(0.6) 94.3	(0.6) 94.5

Open in a new tab

As expected, the censoring fraction affects all estimators. Table 1 shows that τ̂₁ outperforms τ̂_W under all simulation conditions, in terms of mean squared error. Note that τ̂₁ is virtually unbiased, except under the extreme conditions cf₂ = 0.4 and τ = 0.8. On the other hand, the bias of τ̂_W is non-negligible, even under light censoring and small values of τ. The same conclusions are made for estimators of Ŝ_Y (y_i∣X = x₀) based on τ̂₁ and τ̂_W respectively. Table 1 suggests that the estimator of S_Y (y_i∣X = x₀) based on τ_W is not particularly reliable, especially when τ = 0.8. Table 1 also suggests that the performance τ̂₁ improves as τ increases while the opposite is observed for Ŝ_Y (y_i∣X = x₀) based on τ̂₁. The results in Table 2 suggest that our estimator of Var(τ̂₁) provides a reasonable measure of the variability of τ̂₁

A second set of simulations was conducted to investigate the performance of the goodness-of-fit test presented in Section 3.2. Samples of 200 pairs with (cf₁, cf₂) = (0.2, 0.4) were generated from the Clayton, Frank and Gumbel families using the transformation method of Genest & Rivest (1993). Proportions of rejection of the null hypothesis corresponding to the Clayton copula are reported in Figure 2. These simulations show that the rejection rate of this test is comparable with the nominal level. Furthermore, it indicates that the Gumbel copula is easier to distinguish from the Clayton copula than the Frank, in accordance with well known properties of these copulas.

Empirical power based on 1000 samples of size 200 from Gumbel, Frank and Clayton copulas.

4. A Piecewise Clayton Copula

4.1 Model and Method of Inference

Rejection of the null hypothesis by the test presented in the previous section implies that the assumption of a constant cross-ratio over the positive quadrant does not hold for the data at hand. Nan et al.(2006) discuss a copula model that relaxes this assumption and assumes that the cross-ratio depends on one of the time-to-events variables, say X. They considered a partition 0 = w₀ < w₁ < ⋯ < w_K of the support of X such as the cross-ratio is constant inside each grid (x, y) ∈ A_k × [0, ∞] and equal to θ_k, where A_k = [w_k−1, w_k). Under such conditions, (3.2) becomes

S_{Y} (y ∣ X = x) = {[S_{Y} (y ∣ X > x)]}^{θ_{k}} for (x, y) \in A_{k} \times [0, \infty] .

(4.1)

and the resulting model is referred to as a piecewise Clayton copula. In web Appendix C we show that the conditional Kendall tau τ_{A_k} = E[ψ₁₂∣ν₁₂(A_k)] is related to θ_k through

τ_{A_{k}} = g (a, b, θ_{k}) = \frac{4}{{(b - a)}^{2}} [\frac{θ_{k}}{2 (θ_{k} + 1)} (b^{2} - a^{2}) + J (a, b, θ_{k})] - 1,

(4.2)

where a = S_X(w_k), b = S_X(w_k−1) and

J (a, b, θ) = {\begin{cases} \frac{1}{θ - 1} \int_{0}^{a^{- (θ - 1)}} {(a^{- (θ + 1)} - t)}^{- 1 / (θ - 1)} {(b^{- (θ + 1)} - t)}^{- θ / (θ - 1)} dt - \frac{a^{2}}{2} & if θ_{k} < 1 \\ \frac{a^{2}}{2} - \frac{1}{θ - 1} {\int_{0}^{\infty} (a^{- (θ + 1)} + t)}^{- 1 / (θ - 1)} {(b^{- (θ + 1)} + t)}^{- θ / (θ - 1)} dt & if θ_{k} > 1 . \end{cases}

(4.3)

Nan et al. (2006) assume a piecewise Clayton copula for (T₁, T₂) and estimate the model parameters {θ_k, k = 1, …, K} by adapting the methodology of Shih and Louis (1995). This approach ignores the ordered nature of (T₁, T₂) and as a consequence, the resulting estimator of Pr(T₁ > T₂) not identically equal to zero, as it should be. A more appealing approach is to assume a piecewise Clayton copula for (X, Y) since no order restrictions are required for these variables. Moreover, if the presence of a cycle of 45 days or more signals early changes prior to the onset of menopause, Y may well be a more natural quantity to focus on for some scientific questions.

For k = 1, ⋯, K, we estimate τ_{A_k} by (2.7). An estimate of θ_k is then obtained as the solution of τ̂_{A_k} = g [Ŝ_X (w_k), Ŝ_X (w_k−1), θ_k]. The resulting estimate θ̂_k, along with (4.1), yields an estimator for S_Y (y∣X = x). The variance of θ̂_k involves those of τ̂_{A_k}, Ŝ_X (w_k) and Ŝ_X (w_k−1) and some covariance terms, and thus is complex to compute. We suggest using the Jackknife method to estimate Var(θ̂_k) and Var [Ŝ_Y (y∣X = x)].

4.2 Numerical Applications

Analysis of the Tremin Trust Data

We consider the data used by Nan et al. (2006) who analyzed follow up data from 562 women who were less than 25 years of age at the time of recruitment, provided data on the age of menarche, and were on study at 35 years of age. The purpose of this study is to characterize the association between several bleeding makers such as the age at the first 45-day cycle, and menopause.

Consider a reproductive life cycle where T₁ = X represents the age at the first cycle of at least 45 days duration, and T₂ is the age at menopause; Y = T₂ − X is then the time from the first 45 day cycle to menopause. At the end of the study, 193 women were observed to reach menopause among the 357 women who experienced the 45-day marker. We first test the assumption of a constant cross-ratio between X and Y by the procedure derived in Section (3.2). We find Q = −4.507, s.e.=1.0875 and p-value=3.4 × 10⁻⁵ leading us to reject the null hypothesis of a constant cross-ratio. We consider then a piecewise Clayton copula where the cross-ratio is assumed constant within the same intervals of T₁ = X adopted by Nan et al. (2006), namely: 35 – 39, 40 – 45, 46 – 49 and 50+ years of age. These boundaries satisfy Ŝ_X (w_k) = {0.8383, 0.562, 0.281}. We estimate τ̂_{A_k}, k = 1, 2, 3, 4 using equation (2.7) and obtain τ̂_A₁ = −0.049 (s.e.=0.160), τ̂_A₂ = −0.374 (s.e.=0.068), τ̂_A₃ = −0.235 (s.e.=0.095) and τ̂_A₄ = −0.139 (s.e.=0.112). Inverting these estimates yields θ̂₁ = 0.411; 95% C.I. [0.145;15.023], θ̂₂ = 0.184; 95% C.I. [0.129;0.260], θ̂₃ = 0.327; 95% C.I. [0.202;0.673] and θ̂₄ = 0.756; 95% C.I. [0.472;1.175].

We conclude that there is a significant association between the 45-day cycle and menopause only if the bleeding event occurs inside the age interval 40 – 49. This is in accordance with the results of Nan et al. (2006), who detect a significant association between T₁ and T₂ in the same region. Nan et al. (2006) present plots of Pr(T₂ > t₂∣T₁ = t₁) versus t₂ for different values of t₁. This probability is equal to S_y (t₂ − t₁∣X = t₁) and thus can be estimated by our model. The results are presented in Figure 3 for t₁ ∈ {36, 39, 42, 45, 48, 51}.

Empirical estimates of conditional survivor functions for menopause given age at first cycle of ≥ 45 days duration.

Figure 3 is again in accordance with the one produced by Nan et al. (2006) for women who experience the 45 days cycle marker after 40 years of age. However, we obtain different results for women who experience the 45 days cycle before age 40. In particular, we estimate that such a woman has about a 20% chance of experiencing menopause before age 45 (see the X = 36 and 39 curves), while this probability is estimated to be approximately null by Nan et al. (2006). This suggests a lack of fit of one of the models and thus there is a need for appropriate goodness-of-fit tests for such copulas models. This is beyond the scope of this paper.

Simulations

A second set of simulations were conducted to assess the performances of the estimator of τ_{A_k} given by (2.7). Censored samples were generated following the piecewise Clayton copula with conditions similar to those of the Tremin Trust dataset: n = 562, bound-aries such as S_X(w_k) ∈ {0.8383, 0.562, 0.281} and τ_{A_k} ∈ {−0.049, −0.374, −0.235, −0.139}. The marginal distributions of X and Y were exponential with means 1 and 1/2 respectively. The distribution of C was taken as Weibull with parameters such as cf₁ = 0.35 and cf₂ = 0.65. The empirical means of τ̂_{A_k} from 1000 iterations are -0.049 (s.e.=0.10), -0.373 (s.e.=0.066), -0.229 (s.e.=0.054) and -0.136 (s.e.=0.053).

The squared roots of the means of the estimated variances are {0.09, 0.064, 0.057, 0.055}. These simulations show that the τ̂_{A_k} are virtually unbiased and that their variances are consistently estimated.

5. Discussion

In this paper, we consider nonparametric estimation of the association between successive gap times when the second gap time is subject to dependent censoring. This dependent censoring is induced by the association between the first and second gap times and is present even when the process itself is independently right-censored. Inverse probability weights are used to obtain a nonparametic estimate of Kendall’s tau. This estimator is then used to make inferences about the conditional survivor function S_Y (y∣X = x) under a Clayton copula. While this is possible by (3.2) for the Clayton copula, this particular relation does not hold for other copula families. Alternative ways of estimating S_Y (y∣X = x) under arbitrary copula functions warrant investigation for settings where the Clayton copula is inappropriate.

Recurrent event processes are increasingly arising in health research and gap time models frequently offer an appealing framework for analysis. Generalizations of the proposed estimator of Kendall’s tau would facilitate inferences about gap time models for recurrent event processes with copula formulations. Such models would allow the estimation of

\Pr (T_{k} > t_{k} ∣ T_{1} = t_{1}, T_{2} = t_{2}, \dots, T_{k - 1} = t_{k - 1}) .

This is an area warranting development.

We employed a piecewise Clayton copula formulation in Section 4. Such models have received relatively little attention in the literature and methods to assist in specification of the regions with a constant cross-ratio would be helpful since at present they are based on ad hoc graphical methods. Derivation of formal goodness-of-fit tests for these models and objective procedures for specifying these regions would increase their practical utility.

In many settings event times are not observed precisely, but individuals are only assessed at periodic inspection times creating interval-censored data on gap times. In the context of a three-state progressive model, the intermediate event may be subject to interval-censoring and the terminal event right-censoring, or both events may be interval censored. In this case nonparametric estimation of the association between gap times is considerably more challenging, and parametric assumptions may be required.

Supplementary Material

Supplementary Materials

NIHMS508440-supplement-Supplementary_Materials.pdf^{(44.1KB, pdf)}

Acknowledgments

We are grateful to two anonymous referees and to the associate editor for their constructive comments that led to substantial improvement of the initial version of this paper. This work was partially supported by individual grants from the Natural Sciences and Engineering Research Council of Canada to the two first authors and by grants R37 CA76404 and P01 CA134294 from the National Cancer Institute to the third author.

Footnotes

Supplementary Materials

Web Appendices referenced in Sections 2.3 and 3.2 ara available under the Paper information link at the Biometrics website: http://www.biometrics.tibs.org

References

Bebchuk JD, Betensky RA. Local likelihood analysis of the latency distribution with interval censored intermediate events. Statist Med. 2002;21:3475–3491. doi: 10.1002/sim.1275. [DOI] [PubMed] [Google Scholar]
Betensky R, Finkelstein D. An extension of Kendall’s coefficient of concordance to bivariate interval-censored data. Statist Med. 1999;18:3101–3109. doi: 10.1002/(sici)1097-0258(19991130)18:22<3101::aid-sim339>3.0.co;2-5. [DOI] [PubMed] [Google Scholar]
Cheng SC, Wei LJ, Ying Z. Analysis of transformation models with censored data. Biometrika. 1995;82(4):835–845. [Google Scholar]
Genest C, Rémillard B, Beaudoin D. Goodness-of-fit tests for copulas: A review and a power study. Insurance: Mathematics and Economics. 2009;44:199–213. [Google Scholar]
Genest C, Rivest L-P. Statistical inference procedures for bivariate Archimedean copulas. J Am Statist Assoc. 1993;88:1034–1043. [Google Scholar]
Hougaard P. Multi-state models: A review. Lifetime Data Analysis. 2000;5:239–264. doi: 10.1023/a:1009672031531. [DOI] [PubMed] [Google Scholar]
He W, Lawless JF. Flexible maximum likelihood methods for bivariate proportional hazards models. Biometrics. 2003;12:837–848. doi: 10.1111/j.0006-341x.2003.00098.x. [DOI] [PubMed] [Google Scholar]
Kendall M, Gibbons JD. Rank Correlation Methods. Fifth. London: Edward Arnold; 1990. A Charles Griffin Title. [Google Scholar]
Kessing LV, Hansen MG, Andersen PK. Course of illness in depressive bipolar disorders: Naturalistic study, 1994–1999. Brit J Psych. 2004;185:372–377. doi: 10.1192/bjp.185.5.372. [DOI] [PubMed] [Google Scholar]
Kvist K, Gerster M, Andersen PK, Kessing LV. Non-parameteric estimation and model checking procedures for marginal gap time distributions for recurrent events. Statist Med. 2007;26:5394–5410. doi: 10.1002/sim.3135. [DOI] [PubMed] [Google Scholar]
Lakhal-Chaieb L. Copula inference under censoring. Biometrika. 2010 In Press. [Google Scholar]
Lakhal-Chaieb L, Rivest L-P, Beaudoin D. IPCW estimator for Kendall’s tau under bivariate censoring. International Journal of Biostatistics. 2009;5(1):8. [Google Scholar]
Lin DY, Sun W, Ying Z. Nonparametric estimation of the gap time distributions for serial events with censored data. Biometrika. 1999;86:59–70. [Google Scholar]
Manatunga AK, Oakes D. A meaure of association for bivariate frailty distributions. Journal of Multivariate Analysis. 1996;56:60–74. [Google Scholar]
Moertel CG, Fleming TR, McDonald JS, et al. Levamisole and fluorouracil for adjuvant therapy of resected colon carcinoma. New England Journal of Medicine. 1990;322:352–358. doi: 10.1056/NEJM199002083220602. [DOI] [PubMed] [Google Scholar]
Nan B, Lin X, Lisabeth LD, Harlow SD. Piece-wise constant cross-ratio estimates for the association between age at marker event and age at menopause. J Am Statist Assoc. 2006;101:65–77. [Google Scholar]
Oakes D. A concordance test for independence in the presence of censoring. Bioemtrics. 1982;38(2):451–455. [PubMed] [Google Scholar]
Oakes D. A model for bivariate survival data. In: Moolgavkar SH, Prentice RL, editors. Modern Statistical Methods in Chronic Disease Epidemiology. Wiley; New York: 1986. [Google Scholar]
Schaubel DE, Cai J. Regression methods for gap time hazard functions of sequentially ordered multivariate failure time data. Biometrika. 2004;91:291–303. [Google Scholar]
Shih JH, Louis TA. Inferences on the association parameter in copula models for bivariate survival data. Biometrics. 1995;51:1384–1399. [PubMed] [Google Scholar]
Wang W, Wells MT. Estimation of Kendall’s tau under censoring. Statist Sinica. 2000;10(4):1199–1215. [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

NIHMS508440-supplement-Supplementary_Materials.pdf^{(44.1KB, pdf)}

[R1] Bebchuk JD, Betensky RA. Local likelihood analysis of the latency distribution with interval censored intermediate events. Statist Med. 2002;21:3475–3491. doi: 10.1002/sim.1275. [DOI] [PubMed] [Google Scholar]

[R2] Betensky R, Finkelstein D. An extension of Kendall’s coefficient of concordance to bivariate interval-censored data. Statist Med. 1999;18:3101–3109. doi: 10.1002/(sici)1097-0258(19991130)18:22<3101::aid-sim339>3.0.co;2-5. [DOI] [PubMed] [Google Scholar]

[R3] Cheng SC, Wei LJ, Ying Z. Analysis of transformation models with censored data. Biometrika. 1995;82(4):835–845. [Google Scholar]

[R4] Genest C, Rémillard B, Beaudoin D. Goodness-of-fit tests for copulas: A review and a power study. Insurance: Mathematics and Economics. 2009;44:199–213. [Google Scholar]

[R5] Genest C, Rivest L-P. Statistical inference procedures for bivariate Archimedean copulas. J Am Statist Assoc. 1993;88:1034–1043. [Google Scholar]

[R6] Hougaard P. Multi-state models: A review. Lifetime Data Analysis. 2000;5:239–264. doi: 10.1023/a:1009672031531. [DOI] [PubMed] [Google Scholar]

[R7] He W, Lawless JF. Flexible maximum likelihood methods for bivariate proportional hazards models. Biometrics. 2003;12:837–848. doi: 10.1111/j.0006-341x.2003.00098.x. [DOI] [PubMed] [Google Scholar]

[R8] Kendall M, Gibbons JD. Rank Correlation Methods. Fifth. London: Edward Arnold; 1990. A Charles Griffin Title. [Google Scholar]

[R9] Kessing LV, Hansen MG, Andersen PK. Course of illness in depressive bipolar disorders: Naturalistic study, 1994–1999. Brit J Psych. 2004;185:372–377. doi: 10.1192/bjp.185.5.372. [DOI] [PubMed] [Google Scholar]

[R10] Kvist K, Gerster M, Andersen PK, Kessing LV. Non-parameteric estimation and model checking procedures for marginal gap time distributions for recurrent events. Statist Med. 2007;26:5394–5410. doi: 10.1002/sim.3135. [DOI] [PubMed] [Google Scholar]

[R11] Lakhal-Chaieb L. Copula inference under censoring. Biometrika. 2010 In Press. [Google Scholar]

[R12] Lakhal-Chaieb L, Rivest L-P, Beaudoin D. IPCW estimator for Kendall’s tau under bivariate censoring. International Journal of Biostatistics. 2009;5(1):8. [Google Scholar]

[R13] Lin DY, Sun W, Ying Z. Nonparametric estimation of the gap time distributions for serial events with censored data. Biometrika. 1999;86:59–70. [Google Scholar]

[R14] Manatunga AK, Oakes D. A meaure of association for bivariate frailty distributions. Journal of Multivariate Analysis. 1996;56:60–74. [Google Scholar]

[R15] Moertel CG, Fleming TR, McDonald JS, et al. Levamisole and fluorouracil for adjuvant therapy of resected colon carcinoma. New England Journal of Medicine. 1990;322:352–358. doi: 10.1056/NEJM199002083220602. [DOI] [PubMed] [Google Scholar]

[R16] Nan B, Lin X, Lisabeth LD, Harlow SD. Piece-wise constant cross-ratio estimates for the association between age at marker event and age at menopause. J Am Statist Assoc. 2006;101:65–77. [Google Scholar]

[R17] Oakes D. A concordance test for independence in the presence of censoring. Bioemtrics. 1982;38(2):451–455. [PubMed] [Google Scholar]

[R18] Oakes D. A model for bivariate survival data. In: Moolgavkar SH, Prentice RL, editors. Modern Statistical Methods in Chronic Disease Epidemiology. Wiley; New York: 1986. [Google Scholar]

[R19] Schaubel DE, Cai J. Regression methods for gap time hazard functions of sequentially ordered multivariate failure time data. Biometrika. 2004;91:291–303. [Google Scholar]

[R20] Shih JH, Louis TA. Inferences on the association parameter in copula models for bivariate survival data. Biometrics. 1995;51:1384–1399. [PubMed] [Google Scholar]

[R21] Wang W, Wells MT. Estimation of Kendall’s tau under censoring. Statist Sinica. 2000;10(4):1199–1215. [Google Scholar]

PERMALINK

Inverse Probability of Censoring Weighted Estimates of Kendall’s τ for Gap Time Analyses

Lajmi Lakhal-Chaieb

Richard J Cook

Xihong Lin

Summary

1. Introduction

2. Nonparametric Estimation of Kendall’s τ

2.1 Estimation with Parallel Survival Times